Calling Clang's Assembler from C++

LLVM is a collection of modular and reusable compiler and toolchain technologies. It provides a set of libraries and tools for building compilers, assemblers, linkers, and other related tools.
In a project I was working on, we were using the clang part of this project to
compile C and LLVM IR code. The code path for these two sources was similar
enough that we could have a single compile function that passed all the
configuration flags for our usecase.
However, for a new feature, I was generating assembly files directly and wanted
to assemble them into an object file. This post briefly explains how I got this
working and how you can use the clang libraries to assemble assembly files in
C++.
As a normal user of the clang CLI tool, the workflow is the same:
clang -c my_asm.s -o my_obj.o # same interface for C and assembly (.s)
clang -c main.c -o my_obj.o
However, this approach didn’t work when trying to use the clang libraries in
C++. I encountered several errors, with unrecognised flags. Some were easy to
fix as they were just flags that were specific to C or LLVM IR that I didn’t
need for assembly. However, one recurring one I got was
Error: unknown argument: '-filetype'. This seemed to be inserted automatically
when setting up the job, and wasn’t something I could easily remove.
I thought I’d go back to basics, and think about how the clang toolchain works
under the hood. I was curious about what flags were being inserted under the
hood, so I ran:
clang -### -c my_asm.s 2> cc1_dump.txt
This command printed all the flags passed to the underlying program:
clang version [redacted]
Target: [redacted]-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
(in-process)
"/usr/lib/llvm-[redacted]/bin/clang" "-cc1as" "-triple" "[redacted]-linux-gnu" "-filetype" "obj" "-main-file-name" "test_tensor.s" "-target-cpu" "[redacted]" "-fdebug-compilation-dir=/tmp/[redacted]" "-dwarf-debug-producer" "clang version [redacted]" "-dwarf-version=[redacted]" "-mrelocation-model" "pic" "-o" "my_obj.o" "my_asm.s"
You’ll note that clang infers from the file extension that it is an assembly
file, and so passes the -cc1as flag.
It’s worth noting that the clang executable isn’t really a compiler, it’s a
compiler driver. clang -cc1 is the compiler, and clang -cc1as is the
assembler, and these are flags that are passed to the clang driver to invoke
the appropriate tool. It can also infer the type of file being compiled based on
the file extension, and will pass the appropriate flags to the underlying tools.
However, when we’re using the clang libraries directly in C++, we’re
responsible for calling the cc1as program directly to assemble assembly files.
The main program that cc1as uses is defined under
clang/tools/driver/cc1as_main.cpp,
and adapting this was enough to get this feature working.