This page tracks open source contributions I’ve made over the years. It is incomplete, but should paint a picture. For open source projects which I have made myself, see the “Own Projects” heading below.

2024

MLIR/LLVM: [mlir,python] Fix case when FuncOp.arg_attrs is not set

GitHub PR #117188. In the MLIR Python API, accessing the arg_attrs of a FuncOp that did not have any raised a KeyError. This PR fixed that by returning an empty dictionary instead. An alternative would have been to return None, but I felt that an empty dictionary was more consistent with the behaviour of the rest of the API.
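The general pattern can be sketched in plain Python. This is only an illustration of the "return an empty dictionary instead of raising" idea: FakeFuncOp and arg_attrs_strict are hypothetical names made up for this example, and none of it is the actual MLIR bindings code.

```python
class FakeFuncOp:
    """Hypothetical stand-in for an op with an attribute dictionary (not the MLIR API)."""

    def __init__(self, attributes=None):
        self.attributes = dict(attributes or {})

    @property
    def arg_attrs_strict(self):
        # Old-style behaviour: a plain lookup raises KeyError when the key is absent.
        return self.attributes["arg_attrs"]

    @property
    def arg_attrs(self):
        # Fixed-style behaviour: fall back to an empty dictionary.
        return self.attributes.get("arg_attrs", {})


op = FakeFuncOp()          # an op with no arg_attrs set
safe_result = op.arg_attrs  # no exception, callers can iterate over it directly
```

Returning an empty container means callers can loop over the result without a special case, which is the main argument against returning None.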

MLIR/LLVM: [mlir,python] Expose replaceAllUsesExcept to Python bindings

GitHub PR #115850. MLIR’s Python bindings are great for quickly hacking and exploring IR transformations. However, I found that a useful method Value.replaceAllUsesExcept() was not exposed. I added this, with appropriate tests.
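To show what "replace all uses except" means, here is a toy model in plain Python. Value, Op, and replace_all_uses_except are hypothetical names for this sketch and are not the MLIR classes or the binding's real signature. The assumption is that every use of one value is swapped for another, apart from the uses inside one excluded operation.

```python
class Value:
    """Toy SSA value (not an MLIR class)."""

    def __init__(self, name):
        self.name = name


class Op:
    """Toy operation holding a list of operand values (not an MLIR class)."""

    def __init__(self, name, operands=()):
        self.name = name
        self.operands = list(operands)


def replace_all_uses_except(ops, old, new, exception):
    """Swap `old` for `new` in every op's operands, skipping `exception`."""
    for op in ops:
        if op is exception:
            continue
        op.operands = [new if v is old else v for v in op.operands]


x, y = Value("x"), Value("y")
wrapper = Op("wrap", [x])   # imagine this op produces y from x
user_a = Op("add", [x, x])
user_b = Op("mul", [x, y])
replace_all_uses_except([wrapper, user_a, user_b], x, y, wrapper)
```

The excluded operation matters when the new value is computed from the old one: replacing the use inside wrapper as well would make it consume its own result.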

MLIR/LLVM: [mlir] Fix remove-dead-values pass throws error when module has a name

GitHub PR #109990. I encountered a bug when using OpenXLA’s StableHLO, which gives MLIR modules names (e.g., module @IrToHlo.6443). This caused the remove-dead-values pass to throw an error, since it expected a module without a name. I could not find a good reason for this, so I fixed it and provided a test case to ensure it doesn’t regress.

Docker Suno API: Improved docker compose integration

GitHub PR #115. This repo provides an API to the Suno AI music generation tool. I used it as part of another project I was working on, which took a containerised approach (for better modularity, security controls, and clear trust boundaries). This PR made the repo easier to use in that kind of setup.

MLIR/LLVM: [mlir] Retain original identifier names for debugging

GitHub PR #79704. Currently under active development, with a high volume of discussion on the LLVM developer forums. This feature aims to add a flag to mlir-opt that keeps the original identifier names, which can be helpful for debugging compiler pipelines. Right now, identifier names are made anonymous, e.g., %my_input becomes %arg0. Meaningful variable names make it easier to reason about behaviour, which is why this feature is valuable. Three designs are under consideration. Since this touches the core of MLIR, caution and careful consideration are needed before merging.

Triton: [TESTING] Added precision option to benchmark CSV data saving

GitHub PR #2933. Triton has a built-in benchmarking suite, but I discovered that it was saving data with an unusually low level of precision (.1f). In this patch, I made the precision user-configurable and set the default to 6. I do not think the downside of higher precision, namely larger CSV files, is significant compared to the downside of losing data. Making the value configurable gives us the best of both worlds.
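A small sketch of why the precision matters. format_row is a hypothetical helper written for this example and is not Triton's benchmarking code. It only demonstrates the effect of formatting with one decimal place versus six.

```python
def format_row(values, precision=6):
    """Format floats for a CSV row with a configurable number of decimals."""
    return ",".join(f"{v:.{precision}f}" for v in values)


timings_ms = [0.0123456, 0.0131417]  # made-up timings for illustration

low = format_row(timings_ms, precision=1)   # "0.0,0.0": both timings collapse to zero
high = format_row(timings_ms)               # "0.012346,0.013142"
```

With one decimal place, two distinct sub-millisecond timings become indistinguishable, while six decimals keep them apart at the cost of a few extra characters per value.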

Triton: [CLEANUP] Fix typos across the project

GitHub PR #2876. This PR came from my initial reading of the documentation, and identification of a few spelling errors that impacted readability. After a suggestion from one of the maintainers, I used the automated spell checking tool codespell to do a more general cleanup of the codebase.

I was conservative in my correction criteria:

  • codespell provided suggestions, but I used my own discretion when applying them
  • I ignored anything in the third-party directory
  • Corrections were only on comments and docs, no code (even if a variable name was clearly a typo). Exceptions to this include:
    • An error message string in AxisInfo.cpp
    • An error message string in hip.c
    • An error message string in WSPipeline.cpp
    • Docstrings in tablegen files (still documentation, but compiled)

2023

Apache TVM: [fix][relay][qnn] Bug fix for 8-bit quantized mul

GitHub PR #14286. I identified a case where operations within quantized CNN models were not being supported adequately. I reproduced the error with this gist. Upon closer inspection, I found that the issue was related to the “Squeeze-and-Excitation block”, found in models such as EfficientNet, where the output of a sigmoid is multiplied with an earlier output. This broke some of the assumptions about how quantized mul operations were implemented in TVM. I fixed the bug.

2022

Apache TVM: [docs] Update debugger.rst

GitHub PR #11231. TVM’s debugger and profiler is a very powerful tool, but was/is quite new and underutilised. The documentation did not reflect its correct usage, and I had to reverse engineer how it was implemented. My PR updated the documentation to reflect how the debugger can actually be used.

Phabricator #D133977. A minor documentation fix, so that the Toy tutorial (many users’ first experience of MLIR, and a common reference point for MLIR developers) linked to the correct location.

2021

Apache TVM: Better grouped convolution for CPU targets

GitHub PR #6137. This pull request replaced the original grouped convolution algorithm in TVM for x86 and Arm targets, with the faster Grouped Spatial Pack Convolutions (GSPC) algorithm. I developed this algorithm in my ASAP’2020 paper “Optimizing Grouped Convolutions on Edge Devices”. This is now the default algorithm used in TVM for all CPU code for grouped convolutions.

pypylon: Update setup.py to fix #296 (deprecate version)

GitHub PR #314. This PR officially deprecated support for an old version of Pylon (5), since it was no longer supported in other parts of the system. This ensured that users with the old version installed would not encounter issues.

2020

Apache TVM: Asymmetric padding and dilation in conv2d workload

GitHub PR #7142. The goal of this pull request was to make asymmetric padding and dilation first-class citizens in 2D convolution. The previous workload description had only hpad and wpad, which cannot represent all of the possible configurations. Most conv2d implementations in TVM already support asymmetric padding in their algorithm, so allowing the workload description to reflect this means it can be exploited.

The process of developing this PR also uncovered a bug, where the output dimensions were not being properly calculated for fallback_schedules. Neither asymmetric padding nor dilation was being considered properly, which led to incorrect behaviour that was not covered by tests. In some cases, this could perhaps result in a schedule with a performance regression, but this has not been tested. I fixed the bug, and added a test case.
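For reference, the standard output-size calculation for one spatial dimension, with asymmetric padding and dilation both taken into account, can be sketched as follows. conv2d_out_dim and its parameter names are hypothetical and chosen for this example. This is the textbook formula, not a copy of TVM's code.

```python
def conv2d_out_dim(in_size, kernel, stride, pad_before, pad_after, dilation=1):
    """Output size along one spatial dimension of a 2D convolution."""
    # A dilated kernel covers (kernel - 1) * dilation + 1 input positions.
    dilated_kernel = (kernel - 1) * dilation + 1
    # Padding is added separately on each side, so it need not be symmetric.
    return (in_size + pad_before + pad_after - dilated_kernel) // stride + 1


asym = conv2d_out_dim(8, 3, 1, pad_before=0, pad_after=1)     # 7
sym = conv2d_out_dim(8, 3, 1, pad_before=1, pad_after=1)      # 8
dilated = conv2d_out_dim(8, 3, 1, 1, 1, dilation=2)           # 6
```

A single symmetric pad value per axis cannot express the (0, 1) case, and ignoring dilation would report 8 instead of 6 for the last call, which is the kind of mismatch described above.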

Own Projects

2024

  • RSPL Examples: Examples of using the RSPL language for the N64 RSP GPU.
  • Triton Samples: Some home-brewed Triton kernels, with varying degrees of optimisation

2023

2022