Calling Clang’s Assembler from C++

LLVM is a collection of modular and reusable compiler and toolchain technologies. It provides a set of libraries and tools for building compilers, assemblers, linkers, and other related tools.

In a project I was working on, we were using clang, part of the LLVM project, to compile C and LLVM IR code. The code paths for these two kinds of source were similar enough that a single compile function could pass all the configuration flags for our use case.

However, for a new feature, I was generating assembly files directly and wanted to assemble them into an object file. This post briefly explains how I got this working and how you can use the clang libraries to assemble assembly files in C++.

As a normal user of the clang CLI tool, the workflow is the same for assembly as it is for C:

clang -c my_asm.s -o my_obj.o  # same interface for C and assembly (.s)
clang -c main.c -o my_obj.o

However, this approach didn’t work when I tried to use the clang libraries from C++. I encountered several errors about unrecognised flags. Some were easy to fix: they were flags specific to C or LLVM IR that I didn’t need for assembly. One error kept recurring, though: Error: unknown argument: '-filetype'. The flag seemed to be inserted automatically when setting up the job, and wasn’t something I could easily remove.

I decided to go back to basics and look at how the clang toolchain works under the hood. To see which flags were being inserted, I ran:

clang -### -c my_asm.s 2> cc1_dump.txt

The -### option makes the driver print the commands it would run, with all their flags, without executing them:

clang version [redacted]
Target: [redacted]-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
 (in-process)
 "/usr/lib/llvm-[redacted]/bin/clang" "-cc1as" "-triple" "[redacted]-linux-gnu" "-filetype" "obj" "-main-file-name" "test_tensor.s" "-target-cpu" "[redacted]" "-fdebug-compilation-dir=/tmp/[redacted]" "-dwarf-debug-producer" "clang version [redacted]" "-dwarf-version=[redacted]" "-mrelocation-model" "pic" "-o" "my_obj.o" "my_asm.s"

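You’ll note that clang infers from the file extension that the input is an assembly file, and so passes the -cc1as flag. The -filetype obj pair that caused my error also appears here, among the arguments to cc1as.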

It’s worth noting that the clang executable isn’t really a compiler; it’s a compiler driver. The compiler proper is clang -cc1 and the assembler is clang -cc1as, and the driver passes one of these flags to invoke the appropriate tool. It infers the type of each input from its file extension and passes the appropriate flags to the underlying tool. When we use the clang libraries directly from C++, however, we’re responsible for invoking cc1as ourselves to assemble assembly files.
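
The dispatch described above can be sketched in a few lines of C++. This is a simplified illustration, not clang’s actual logic: selectTool and toolFlag are hypothetical names, and the sketch assumes that only a lowercase .s extension goes to the assembler while everything else (such as .c or .ll) goes to the compiler. The real driver recognises many more input types; for example, uppercase .S files are run through the preprocessor before being assembled.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helpers, not part of any clang API.
enum class Tool { Compiler, Assembler };

// Simplified assumption: ".s" means assembly, anything else goes to the
// compiler (C sources, LLVM IR, ...).
Tool selectTool(const std::filesystem::path &input) {
  return input.extension() == ".s" ? Tool::Assembler : Tool::Compiler;
}

// Maps the chosen tool to the flag that appears first in the -### dump.
std::string toolFlag(Tool tool) {
  return tool == Tool::Assembler ? "-cc1as" : "-cc1";
}
```

With these helpers, toolFlag(selectTool("my_asm.s")) yields "-cc1as", while main.c maps to "-cc1". A single compile function that always takes the -cc1 path is therefore the wrong fit for assembly input.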

The entry point that cc1as uses is defined in clang/tools/driver/cc1as_main.cpp, and adapting it was enough to get this feature working.
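
As a rough sketch of the input such an adapted entry point might take, the helper below builds the arguments that follow -cc1as in the dump above. The flag names are copied from that dump; buildCc1AsArgs and toArgv are hypothetical names, and the debug-related flags are left out for brevity. How the resulting list is handed to the adapted code depends on how you restructure cc1as_main.cpp, so treat this as a starting point and not the exact code I used.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: assembles the cc1as arguments seen in the -### dump.
// The triple is supplied by the caller; "obj" and "pic" mirror the dump.
std::vector<std::string> buildCc1AsArgs(const std::string &triple,
                                        const std::string &input,
                                        const std::string &output) {
  return {
      "-triple", triple,
      "-filetype", "obj",
      "-mrelocation-model", "pic",
      "-o", output,
      input,
  };
}

// Hypothetical helper: exposes the arguments as C strings, the form that
// command-line parsing code usually expects. The returned pointers are only
// valid while `args` is alive and unmodified.
std::vector<const char *> toArgv(const std::vector<std::string> &args) {
  std::vector<const char *> argv;
  argv.reserve(args.size());
  for (const std::string &arg : args)
    argv.push_back(arg.c_str());
  return argv;
}
```

Keeping the arguments in a std::vector<std::string> and converting at the last moment avoids dangling pointers when the flags are built from temporaries.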