• DNN64: An ML Compiler Toolchain for the Nintendo 64 (Part 4) --- The Co-Processor

    This post is the fourth in my DNN64 series, where I discuss my project of compiling and accelerating deep neural networks (DNNs) on the Nintendo 64 using modern tools and techniques. My first post is available here. This post discusses the N64’s co-processor, the RSP, which I am using to accelerate my DNN computations.

    [Read more]
  • DNN64: Invited Talk

    I was delighted to give an invited talk at my alma mater (University of Glasgow) on my DNN64 project, where I have been building an accelerated DNN compiler for the retro N64 console (using modern tools and techniques!).

  • DNN64: An ML Compiler Toolchain for the Nintendo 64 (Part 3) --- Activation Maps

    This post is the third in my DNN64 series, where I discuss my project of compiling and accelerating deep neural networks (DNNs) on the Nintendo 64 system. My first post is available here, and the goal is to use modern tools and techniques on this constrained platform. This post builds upon the challenges of limited memory discussed in the previous post, this time looking at the memory requirements of the activation maps.

    [Read more]
  • DNN64: An ML Compiler Toolchain for the Nintendo 64 (Part 2) --- Weighty Matters

    This post is the second in my DNN64 series, where I discuss my project of compiling and accelerating deep neural networks (DNNs) on the Nintendo 64 system. My first post is available here. This post will talk about some of the challenges we face regarding the limited memory of the console compared to the high memory requirements of DNNs, and the changes we need to make to our code generator to increase efficiency, and thus the size of the models we can run. In particular, we’ll look at this from the perspective of the DNN weights (also known as parameters).

    [Read more]
  • DNN64: An ML Compiler Toolchain for the Nintendo 64 (Part 1)

    In the first of a series of blogposts, I discuss my experience developing and accelerating a compiler toolchain for running deep neural networks on the Nintendo 64 (1996). I endeavoured to use modern tools (e.g., Apache TVM), and acceleration techniques which are relevant to today’s models and hardware accelerators. But of course there will be special considerations required to fit them to the unique hardware and software environment of this beloved (by some, it was out before I was born) games console. This first post will give an overview of my system design, and future posts will go deeper into individual challenges (in ML, compilers, and hardware).

    [Read more]
  • Press: HiPEAC Info 71

    I’m pleased to say that I have been featured in issue 71 of the HiPEAC Info magazine, available here from the HiPEAC website. You can find it on pages 48-49. During the interview with a HiPEAC team member, I had the opportunity to discuss my PhD journey, share some of the techniques I developed, and offer insights that could be beneficial to current and prospective PhD candidates.

  • How to Instantly Open Files at Specific Positions in KDE Konsole

    I often use KDE Konsole for running terminal commands, but sometimes I’m using a tool (e.g., a compiler) which outputs a file path and line number that I may want to open in my text editor. For example:

    2 errors generated.
    In file included from /home/proj/lib/AsmParser/Parser.cpp:13:
    /home/proj/include/mlir/IR/MLIRContext.h:253:18: error: use of undeclared identifier 'Operation'; did you mean 'operator'?

    Wouldn’t it be handy if we could just click on/select the file in the terminal output, and open it in our text editor in the right place? This would reduce friction when debugging, potentially increasing productivity.
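    The diagnostic lines above follow a predictable `path:line:column` shape, so the terminal (or a helper script) just needs to pattern-match them. The following is a hypothetical sketch of that matching step, not the mechanism Konsole itself uses:

```python
import re

# Hypothetical sketch: pull the file path, line, and column out of a
# clang/gcc-style diagnostic such as
#   "/home/proj/include/mlir/IR/MLIRContext.h:253:18: error: ..."
DIAG_RE = re.compile(r"^(?P<path>/[^\s:]+):(?P<line>\d+):(?P<col>\d+)")

def parse_diagnostic(text: str):
    """Return (path, line, column) for the first matching line, or None."""
    for row in text.splitlines():
        m = DIAG_RE.match(row)
        if m:
            return m.group("path"), int(m.group("line")), int(m.group("col"))
    return None

output = """2 errors generated.
In file included from /home/proj/lib/AsmParser/Parser.cpp:13:
/home/proj/include/mlir/IR/MLIRContext.h:253:18: error: use of undeclared identifier 'Operation'"""

# prints ('/home/proj/include/mlir/IR/MLIRContext.h', 253, 18)
print(parse_diagnostic(output))
```

    Once parsed, the result could be handed to an editor that accepts a position on its command line, e.g. Kate’s `--line`/`--column` flags (other editors have equivalents).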

    [Read more]
  • Step-by-step Guide to Adding a New Dialect in MLIR

    For one of my projects, I needed to add a new dialect to the main MLIR tree. However, following the information available, I encountered some issues. I made a “clean” example dialect, which I was able to add correctly. This post discusses how this is achieved, and links to some code.

    [Read more]
  • Retro-AI Christmas Fireplace

    I’m sure many folk are sick of hyper-realistic AI generated images

    Let’s take it back, way back, antique style, 2021

    In between peeling tatties with my family for Christmas, I dusted off my old notebooks to produce this VQGAN+CLIP AI Fireplace

    Warm yourself on the latents:

    This uses the same stack that big Sean Cosgrove and I used for the “Bloodshot” music video, as seen in DJ Magazine and this blogpost.

    Happy holidays to aw yoos.

  • Google Project Management: Professional Certificate

    In this post, I am pleased to announce that I have received a certificate for completing the Google Project Management course, comprising six modules over six months. Although I have gained significant experience in managing projects during my time at gicLAB, as well as in my other professional endeavours, I felt that re-familiarising myself with the terminology and best practices of the field would serve me well. You can find my certificate of completion here.

    [Read more]
  • PhD Completed!

    Pleased to broadcast that I have completed my PhD at the University of Glasgow!

    My thesis title was “Compiler-centric Across-stack Deep Learning Acceleration”. In plain English, this means that neural networks (aka “AI”) are expensive, and to make them scale, we need to collaborate across machine learning, software, and hardware domains. My opinion is that compilers will be an increasingly important piece of this co-design challenge.

    Many thanks to Dr José Cano Reyes for his tireless mentorship and support, which has helped shape me into the researcher and engineer I am today, even through a global pandemic. We logged over 470 hours of one-on-one meetings — I challenge you to find someone more dedicated to his students.

    Additional thanks go out to my friends and colleagues at gicLAB, University of Glasgow, and beyond. Not to mention my ever supportive family, pals, lovers, and enemies.

    Interested in advancing deep learning acceleration or compilers? Let’s connect!

    Dr. P, signing off. 🫡✌️

  • Open Source: signal-compress, semi-secure LLM compression of Signal chats

    This project extracts messages from the Signal messenger app, and runs an LLM (large language model) to summarise what happened. This can be handy for extensive chats and for archival purposes, and the tool is intended to be run locally to preserve privacy.

    Signal is designed to be privacy-centric, and several other chat apps implement its protocol, including WhatsApp and the “secret conversation” modes of Facebook Messenger and Skype. Therefore, I was keen to minimise how much I compromised this security model.

    This project uses Docker Compose to make managing dependencies easier, since Signal encrypts its database using a particular encoding that requires some tool setup to access. Docker Compose also makes it slightly easier to control things like file and network access. The system runs the LLM locally, using the llama.cpp project, in an attempt to preserve the privacy of your messages compared to sending them to a third party like OpenAI. I’ve open sourced my code at

    See below for more technical details and design rationale.

    [Read more]
  • Thesis Analysis: Gender Ratios

    I recently submitted my PhD thesis, in which I cited 1741 authors (1392 unique authors) over 319 papers. I was interested in the composition of my bibliography; therefore I’ve cobbled together some code to analyse it. The first analysis I have run estimates the gender ratio of the authors I cite. The most reliable estimate of my gender ratio is 11.6-to-1 for men-to-women. I discuss my methodology more below.

    [Read more]
  • Open Source: Webcam Timelapse with Human Detection

    30 days of thesis writing (64% of my active working hours!), and version 1.0 is ready!

    I catalogued this journey with a timelapse, produced using a tool I developed called desk-cheese.

    See more info by expanding this post.

    [Read more]
  • Open Source: AI writing assistant

    Getting close to version 1.0 of my PhD thesis, so I’ll soon be roping in colleagues and pals to give some feedback.

    But first, I’ve written a script to give the first round of feedback automatically. It is an LLM-based LaTeX document review system (i.e., proofreading “AI”, using gpt-3.5-turbo). It provides feedback on your writing style and quality, and has some (sketchy) domain knowledge thanks to being trained on the web.

    Take its output with a pinch of salt (as you should with any human proofreader), and definitely don’t use it to write your papers. However, I’ve found it to be super-handy in catching issues that my spelling and grammar checkers have not.

    I’ve open sourced the code here; hopefully I’ll get a chance to discuss it more once my thesis is done!

    [Read more]
  • TVMCon2023 Appearance: Transfer-Tuning

    I was delighted to be accepted as a speaker at TVMCon2023, presenting my work on Transfer-Tuning. The whole event was great, with over 1000 registrants, and 60 speakers. My talk was recorded, and is available here on the OctoML YouTube channel.

  • Open Source: Perry's thesis-o-meter

    I’ll be starting hardcore thesis writing in a month or so. For measurement and motivation reasons, I’ve been advised to keep track of progress (e.g. word count added per day). Therefore, I have open sourced a “thesis-o-meter” tool, which I hope will help me.

    There are a few other metrics and analysis tools that I’ll add to it as time goes on, however if anyone else wants to use or improve it, feel free.

    The README has a roadmap, as well as system critique.

  • MOOC: Accelerated Deployment and Benchmarking on Bonseyes AI platform

    Dr Cano Reyes and I were invited to produce an online course (MOOC) for the BonsAPPs consortium. Off the back of our AccML paper on reproducible workflows for AI, the AIMDDE project, and our invited keynote in Madrid on the topic, they thought that we would be good advocates to motivate and demonstrate how SMEs can accelerate their AI application development using AI Assets.

    Topics touched upon include cookiecutter-style project templating, dataset processing reproducibility, continuous integration and deployment, containerisation, training, benchmarking, cross-framework exporting, and optimization (e.g., via quantisation).

    You can find the MOOC here at

  • Robot Burns Infinite

    For Burns’ Night 2023, I am pleased to announce Robot Burns Infinite, the follow-up to 2020’s Robot Burns!

    Robot Burns Infinite is a website that uses a GPT-3 derived model, trained on 6000 lines of Burns poetry, to generate new poems on demand. Buy some credits to get started; as before, all proceeds go to Samaritans.

    Check out the site at, with a brief technical write-up here. I used the Flask Python framework for the website, with an SQL database, AWS for hosting, and Cloudflare for DNS resolution and DDoS mitigation.

    Robot Burns Infinite was featured in articles on The Scotsman and MSN.

    [Read more]
  • Misc MLIR errors: for your convenience

    The following is a log of some errors I’ve encountered in MLIR, and how I fixed them. Solutions to many of these were found through search engines and various forums (e.g., the LLVM forum), however I am centralising errors for my own reference. I believe that many of these are likely to be common errors for people initially exploring MLIR. Note that at the time of writing, MLIR is changing quickly, and some of these fixes may no longer be relevant. In fact, several of my issues below emerged due to such changes. I may add additional errors here over time.

    [Read more]
  • PACT Paper: Transfer-Tuning

    I was delighted to have our paper “Transfer-Tuning: Reusing Auto-Schedules for Efficient Tensor Program Code Generation” accepted in the PACT 2022 conference in Chicago, where I was first author. You can view the paper on arXiv here, or from ACM.

    I presented a 25 minute presentation on the paper in person, as well as a poster. We also submitted an artifact for review, the code for which you can find here on GitHub. Our paper received all 3 artifact evaluation stamps.

    In short, transfer-tuning is an approach which allows us to achieve some of the speedups from auto-scheduling systems like Ansor, in a fraction of the search time.

    For more details, please check out the full paper!

    [Read more]
  • Research Visit: FHE+MLIR compilers (Northeastern, Boston, MA)

    I’m visiting the NUCAR lab, at Northeastern University, Boston. I’m working on a short project, exploring fully-homomorphic encryption compilers using MLIR. Meeting lots of interesting folk, eating great food, and getting into the Boston frame of mind.

    Many thanks to the Scottish Informatics and Computer Science Alliance (SICSA) for providing some of the funding!

  • Invited Keynote: BonsAPPs Matchmaking and Demo Day

    Thanks to our success in the AIMDDE project, the BonsAPPs consortium invited Dr Cano Reyes and me as keynote speakers at their “Matchmaking and Demo Day”.

    We highlighted to the audience (of SMEs and AI talents) the value of using effective community driven tooling for developing AI applications, even as researchers, and highlighted some of the key features of our AccML 2022 paper.

    [Read more]
  • Productive Reproducible Workflows for DNNs: A Case Study for Industrial Defect Detection

    I was delighted to have our paper “Productive Reproducible Workflows for DNNs: A Case Study for Industrial Defect Detection” accepted in the AccML 2022 workshop at the HiPEAC 2022 conference, where I was first author. You can view the paper on arXiv here. I presented a 20 minute presentation on the paper in-person.

    The paper came from our successful completion of the AIMDDE project, where we developed an AI-based solution to industrial defect detection. We were a small team, with one developer (me) and one project lead, thus we had to be as efficient as possible to achieve our goals. I exploited the Bonseyes toolchain and systems such as Docker and high-level domain-specific ML libraries to be highly productive, as well as reproducible. The paper evangelises the value of these workflows for research and SMEs, by using our experience with AIMDDE as a case-study.

    For more details, please check out the full paper!

    [Read more]
  • Bifrost: End-to-End Evaluation and Optimization of Reconfigurable DNN Accelerators

    This post gives a brief overview of the paper “Bifrost: End-to-End Evaluation and Optimization of Reconfigurable DNN Accelerators”, on which I was second author, after Axel Stjerngren. The paper was published at ISPASS 2022, and you can view it on IEEE Xplore if you’ve got access, or on arXiv for free. I presented a 15 minute presentation on the paper at the conference.

    [Read more]
  • Penguin eating a Tiger(lake): Fixing high battery usage on GNU/Linux+Intel

    I recently got a new laptop, a ThinkPad P14s, as part of a project I undertook at the university. However, despite an advertised 10 hour battery life, I found that it was going from 100% to 0% charge in under an hour. That’s fine in a work-from-home situation; however, as the world gradually opens up, I’m keen to become Mr Worldwide 😎, so an hour is not going to be enough for me. I eventually figured out a solution, but there was no clear documentation of this online. This post documents what I found, and how I fixed it.

    [Read more]
  • GPT-3 based Signal bot

    In experimenting with the OpenAI API, I developed a little bot for the Signal Messenger app.

    [Read more]
  • Circuit hacking: AI assisted clock control

    As a weekend project, I thought I would try to do something with an old Raspberry Pi 3 I had lying around. However, I have not actually done non-trivial breadboarding since high school, and I wasn’t sure where to start. Luckily I recently got invited to the OpenAI Codex Beta, so I was able to generate a lot of the code automatically!

    [Read more]
  • Open Source: PyTorch Lightning CIFAR

    I released my code for PyTorch Lightning CIFAR on GitHub, free under the MIT License. It is a fork of the classic PyTorch CIFAR codebase from @kuangliu, adding support for the productive research tooling that PyTorch Lightning package brings. I also include accuracies for the models trained using 200 epochs.

    [Read more]
  • AIMDDE BonsAPPs Project

    My research group, Glasgow Intelligent Computing Lab (gicLAB), was awarded a €28k grant as part of the BonsAPPs AI industry challenge. I was one of the writers of the grant, and am now pausing my studies for 2 months to be the technical lead on the project.

    The overall goal of the challenge is to produce AI models which can effectively detect defects in manufactured goods (e.g., rolled steel, or textiles), and deploy them in a constrained edge device (e.g., Nvidia AGX Xavier).

    [Read more]
  • A chat with GPT-3

    I recently got access to the OpenAI API Beta, and with that a trained GPT-3 model. It’s a very flexible NLP model, which can do things like question answering, translation, text summarisation, and more. One thing that’s drawn me in though is its chatbot capabilities.

    It is really quite unlike any system I’ve communicated with before; it can have countless different personalities depending on what topic of conversation you have with it. I’ve only had maybe 5 or 6 conversations so far; you can see a couple of them here. I’m posting this one on my blog because it gave me permission to, plus it went off in an interesting direction.

    [Read more]
  • AI Generated Music Video: SENGA 'Bloodshot'

    I recently had the opportunity to help produce a music video for the upcoming album from SENGA AKA Sean Cosgrove.

    You can read about the track, and the album in this piece in DJ Mag.

    The video is AI generated, and I thought I’d take this opportunity to give a brief overview of how I put it together.

    [Read more]
  • SECDA: Efficient Hardware/Software Co-Design of FPGA-based DNN Accelerators for Edge Inference

    I was fortunate to recently work on a paper on hardware design, led by the talented Jude Haris. SECDA was published in October 2021, at the 33rd IEEE International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD). It was hosted in sunny Belo Horizonte, Brazil; but unfortunately we had to attend remotely due to ongoing international travel difficulties.

    The work discusses a co-design methodology we developed for efficiently producing FPGA accelerators. You can read the full paper here on arXiv; the following post just gives a brief, accessible summary.

    [Read more]
  • Cross-post - Robot Burns: Using HPC to Create Modern-Day Poems Inspired by 18th Century Master

    My work on Robot Burns (my GPT-2 generated poetry pamphlet, in the style of Robert Burns - available for sale at 😉) was recently featured in an article on the SC21 (International Conference for High Performance Computing, Networking, Storage, and Analysis) blog.

    The blog was written by Cristin Merritt, and can be read here.

    I discuss how low-cost, casual HPC environments like Google Colab have been a boon to hobbyists and artists.

    [Read more]
  • Writing practice: a rant on encryption policies

    I originally wrote this as an email in response to a proposed EU resolution to ban end-to-end encryption on apps like WhatsApp, Signal, and others. The salient points still remain true. I am a dilettante when it comes to the finer points of security and cryptography, however I am always trying to learn more on the topic, challenge my beliefs, and influence policy decisions. If you want to talk to me more about this sort of thing feel free to get in touch.

    [Read more]
  • Flashing a Hikey 970 in 2020

    I recently had to flash a Hikey 970 from scratch, as I did not get any response from fastboot. However, the official documentation does not appear to have been updated since 2018. Hence, this post documents how far I got. Unfortunately I was not able to get the whole thing working, but I will update this post if I do. If you found this post and did get it working, please feel free to contact me so it can be updated.

    [Read more]
  • Installing perf on a development board (e.g. RPi4)

    I recently had a colleague encounter some troubles using perf on a new Raspberry Pi 4 device.

    Normally you would install it from a package repository (e.g. using apt, and the package name linux-tools).

    However, when you’re on a non-x86 platform, you cannot always rely on there being packages for your device, and even if there are, they might be broken in subtle and frustrating ways. This was the case for my colleague.

    Luckily, I had encountered a similar issue on an ODROID device, and had kept my notes. I’ve adapted the notes into an email to the colleague, and thought I might as well post it here.

    [Read more]
  • Robot Burns

    Leveraging some recent advances in natural language processing, for Burns Night 2020 I recently generated poems in the style of Scotland’s National Bard.

    I collaborated with Scottish illustrator, Alasdair Currie, who has provided their design magic.

    We’re selling a pamphlet of some of the system’s output at, for £10+P&P. All proceeds go to Samaritans.

    We were featured in newspaper articles, including The Scotsman, Digit News, and The Glasgow Times.

    I also had my first appearance on live TV, with a quick segment on EuroNews. In it, I think I focused too much on technical details and missed the narrative thrust, aspects of communication which I hope to continue working on during my PhD.

    Update: Robot Burns was featured in a blogpost on the SC21 conference website, see more info here!

    [Read more]
  • Podcast episode: Arm and HPC

    As part of my PRACE Summer of HPC, I produced a few podcast episodes, where I spoke with HPC leaders in research and industry.

    In this episode I was joined by Brent Gorda and Filippo Spiga of Arm Holdings. We discussed the movement of Arm into the HPC market, partly thanks to research collaborations with the PRACE-affiliated centre BSC.

    Link: PRACE SoHPC Podcast - ARM and HPC – with Brent Gorda and Filippo Spiga of Arm Holdings

    [Read more]
  • Building an old version of gcc, a journey of errors and solutions

    Hello neighbours. Another dry one today, intended first and foremost for my future self, and secondly for anyone coming after me in a similar boat.

    Due to working on a reproducibility study, it was necessary for me to build some old compiler versions that no longer exist in package repositories.

    I encountered a number of build errors along the way, and have documented how I solved them. This post doesn’t bring many new insights; it serves as a curation of the disparate Stack Overflow answers and other sources that I used to solve my issues.

    [Read more]
  • Containerisation for HPC

    For my reproducibility work I have recently been introduced to the Singularity containerisation workflow, which has some key differences from Docker, especially regarding the permissions that processes run with.

    You can find the post on the SoHPC website.

    [Read more]
  • Summer of HPC Introduction

    I have a summer placement through the Partnership for Advanced Computing in Europe. I will be doing some more posts for them this summer; you can find them here, or

    My first post is available here:

    [Read more]
  • Mnemonic

    In order for privacy preserving technology to see widespread adoption, it must be accessible.

    One of the key problems in making this technology accessible is the question of how to handle identity. In a centralised system, some authority can be in charge of which identifiers are associated with which identities. For example, Google runs Gmail, and keeps track of which user owns which username. To create an account, you ask them for one. They will look up their list of previously registered usernames, and tell you if someone else already has it. Eventually, you find an unallocated username, which Gmail grants you. From that point forward, if someone wants to contact you they need your username, and they send you their message via Google.

    In decentralised systems, however, this poses problems: maintaining a global table of registered usernames without a central authority is a complex engineering challenge.
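    The centralised flow described above can be sketched as a toy registry. This is a hypothetical illustration of the idea (first-come, first-served allocation by a single authority), not how Gmail is actually implemented:

```python
class CentralRegistry:
    """Hypothetical sketch of a centralised identity authority:
    one server owns the authoritative username -> user mapping."""

    def __init__(self):
        self._owners = {}  # username -> user id

    def register(self, username: str, user_id: str) -> bool:
        """Grant the username if unallocated; refuse it otherwise."""
        if username in self._owners:
            return False  # someone else already has it
        self._owners[username] = user_id
        return True

    def resolve(self, username: str):
        """Look up who owns a username, so a message can be routed."""
        return self._owners.get(username)

registry = CentralRegistry()
assert registry.register("alice", "user-1")      # first claim succeeds
assert not registry.register("alice", "user-2")  # already taken
print(registry.resolve("alice"))  # prints user-1
```

    The decentralised case is hard precisely because there is no single `_owners` table: every participant must somehow agree on who claimed a name first.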

    [Read more]

subscribe via RSS