Improving dotnet iOS release build times on Apple Silicon

In 2023, the life of a C# iOS developer is pretty good. We have apple silicon, and dotnet supports it. The legacy Xamarin toolchain is not arm64 friendly and probably never will be, but once you migrate to the new stuff, you'll find yourself in an all-arm64 development nirvana, where builds zip away silently, and the hot, noisy days of intel past are but a faint memory.

Everything is as it should be 🏝️💻 . . .

...

...

...

Or is it? In this post we'll learn how to identify and replace some of the pesky intel binaries that sit between us and a trip to csrutil disable to remove Rosetta for good^, and speed up iOS publishes along the way.

an al-arm-ing discovery

You can follow along if you're on an M1, or just take my word for it:

Open Activity Monitor, sort the process list by Kind.
If you don't have the Kind column, you should 😤
(you can turn it on by right-clicking the column headers)

Open Terminal and get yourself to a dotnet ios/maui project on your machine somewhere.
If you don't have one, dotnet new maui -o gathering_intel && cd gathering_intel will get you set up with a starter project

Then kick off a publish
dotnet publish -c:Release -f:net7.0-ios -r:ios-arm64 -p:EnableCodeSigning=false -v:n

Before long, you'll start to see the mono-aot-cross invocations fill the terminal. They start off like this:

Tool /usr/local/share/dotnet/packs/Microsoft.NETCore.App.Runtime.AOT.osx-x64.Cross.ios-arm64/7.0.3/Sdk/../tools/mono-aot-cross execution started with arguments: ...

Take note of the path to mono-aot-cross for later. Now switch back to Activity Monitor, and try not to audibly gasp.

more like, "opt-out please" am i right 🤓

Just to be sure:

find /usr/local/share/dotnet/packs/Microsoft.NETCore.App.Runtime.AOT.osx-x64.Cross.ios-arm64/7.0.3/Sdk/../tools/ -type f | xargs file | grep executable

/usr/local/share/dotnet/packs/Microsoft.NETCore.App.Runtime.AOT.osx-x64.Cross.ios-arm64/7.0.3/Sdk/../tools/llc:                               Mach-O 64-bit executable x86_64
/usr/local/share/dotnet/packs/Microsoft.NETCore.App.Runtime.AOT.osx-x64.Cross.ios-arm64/7.0.3/Sdk/../tools/opt:                               Mach-O 64-bit executable x86_64
/usr/local/share/dotnet/packs/Microsoft.NETCore.App.Runtime.AOT.osx-x64.Cross.ios-arm64/7.0.3/Sdk/../tools/mono-aot-cross:                    Mach-O 64-bit executable x86_64

Yes it's true: even in our arm64 dotnet install, mono-aot-cross, llc and opt - the bits that handle AOT compilation and optimisation - currently ship as x86_64 binaries and are run under Rosetta.

eh, this only affects publishing, it's no big deal!

That's fair - most of the time, we don't care that much about how long release builds take. Because of changes to the build approach in dotnet ios, or maybe just because of most everything else being arm64 on apple silicon, the development-time experience is pretty zippy (I still make heavy use of tbc though).

But what about when you're gearing up for release and start to focus on bundle size, or things like startup performance? That's the thing: The only way to know the true impact of a change with respect to bundle size or performance is to perform a release build. So at some point in your project you might just find yourself in an 'inner-dev-loop' of release builds, and at that time, build times might matter.

I went in search and found that, unsurprisingly, the dotnet team had already identified this gap in the arm64 binaries, and this issue tracked it. The scope of that issue was eventually narrowed and the remainder (including macos arm64) is tracked here. That means that this should eventually get resolved, maybe even in a future net8 preview, and you could just wait for that. But what if you're doing size/performance optimisation work NOW? You'll have to get your hands dirty, but it is possible to solve this for yourself (for some definition of solve).

how much faster is using native AOT binaries over rosetta

So you can decide whether it's worth doing this, I've run some highly un-scientific speed tests. Here's a chart of my findings:

measured once on one machine only - ymmv

I tried four projects - dotnet new ios, dotnet new maui, eshop mobile client (from here) and one of my own. I added -clp:PerformanceSummary to the publish invocation to get the timings for the AOTCompile task.
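If you save the publish output to a file, the AOTCompile timing can be pulled out with a one-liner instead of scrolling for it. This is a minimal sketch: it assumes the task performance summary prints lines shaped like "80000 ms  AOTCompile  1 calls", and the function name and the publish.log file name are made up for illustration.

```shell
# Print the AOTCompile task time in milliseconds from saved publish output.
# Assumption: the performance summary has lines shaped like
#   "    80000 ms  AOTCompile      1 calls"
# Reads the files given as arguments, or stdin when there are none.
aot_compile_ms() {
  awk '$2 == "ms" && $3 == "AOTCompile" { print $1 }' "$@"
}

# Usage (publish.log is a hypothetical file name):
#   dotnet publish ... -clp:PerformanceSummary | tee publish.log
#   aot_compile_ms publish.log
```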

On my machine, the aot compilation time reduction ranged from 30-35% across the projects - let's call it a third. Apple says the M2 gives up to 20% faster CPU performance than the M1, so if like me you have an M1 and sometimes have irresponsible thoughts about an M2, this basically saves you six to eight thousand australian dollarydoos.
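For comparing your own before/after timings, the reduction is just (old - new) / old. A tiny sketch, with a made-up function name:

```shell
# Percentage reduction going from an old duration to a new one (same units).
pct_reduction() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}

# Usage, with the ~120s -> ~80s figures quoted for "my app" later in this post:
#   pct_reduction 120 80    # prints 33.3
```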

how to (high level)

We saw in the github issue that the dotnet team ran into issues doing this - how can we expect to be able to make it work? We can make it work because we have simpler goals. The dotnet team has to worry about pesky things like "passing build pipelines", "architectures other than arm64", "solutions that don't just work on one person's machine" and other realities of shipping an sdk and runtime. We don't need to concern ourselves with those kinds of hassles.

We just want to take our arm64 mac and produce arm64 aot compiler binaries that aot for ios-arm64, and then somehow have them be used by the build. For that, we can build our own out of dotnet/runtime, and then just overwrite the intel binaries we originally got from official sources with our bootleg ones. What could possibly go wrong?

Just like back in the day when we were rolling our own reflection-emit-enabled Xamarin.iOS versions, it goes without saying that you should exercise caution when replacing core parts of the dotnet build pipeline with custom built tools. It's true that we are building off tagged ("blessed") commits, but the reality is that arm64 aot compiling binaries aren't officially produced right now and the use case may not have been through the same testing rigour that supported use cases have. I haven't had any issues (yet?), but it's probably best to limit use of these binaries to the aforementioned 'inner-release-loop' scenarios only, and use the official binaries for builds you actually want to ship. No warranties provided, proceed at your own risk, etc. etc.

how to (in detail)

With disclaimers out of the way, if you're still on board we're ready to start making a mess. With any luck, this process should only take 10-20 minutes.

First, clone dotnet/runtime:

git clone https://github.com/dotnet/runtime.git && cd runtime

Then, check out the tag that matches the version of the sdk you're using to build. You can see it in the path of the aot invocation from earlier. In this post, the invocation was:

Tool /usr/local/share/dotnet/packs/Microsoft.NETCore.App.Runtime.AOT.osx-x64.Cross.ios-arm64/7.0.3/Sdk/../tools/mono-aot-cross execution started with arguments: ...

so we want 7.0.3. In dotnet/runtime, the version tags are preceded by a 'v', so:

git checkout v7.0.3

(It's important to build off the tag matching the version of the dotnet sdk you're using. If not, you may run into errors due to differences between versions. For example, you can't build off the tip of main, which right now is .net8, then use the outputs with a .net7 sdk; things will go badly. An implication of this is that when you update dotnet, or if you have different projects pinned to different versions of dotnet, you'll likely need to follow these steps and build binaries for each of them individually. Basically let's just hope arm64 binaries start shipping soon.)
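If you'd rather not read the version off by eye, the tag can be derived from the invocation path. This is only a sketch: it assumes the path contains ".Cross.ios-arm64/<version>/" as in the invocation above, and the function name is made up.

```shell
# Derive the dotnet/runtime tag from the path of the mono-aot-cross invocation.
# Assumption: the pack path contains ".Cross.ios-arm64/<version>/".
runtime_tag_from_path() {
  printf '%s\n' "$1" | sed -n 's|.*\.Cross\.ios-arm64/\([^/]*\)/.*|v\1|p'
}

# Usage:
#   git checkout "$(runtime_tag_from_path /usr/local/share/dotnet/packs/Microsoft.NETCore.App.Runtime.AOT.osx-x64.Cross.ios-arm64/7.0.3/Sdk/../tools/mono-aot-cross)"
```

If the path doesn't match the assumed shape, the function prints nothing, so check the output before handing it to git.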

Building this repo requires certain dependencies to be available on your system. You can run the below from the repo root (where you should already be) to get them, assuming you already have Homebrew:

brew bundle --no-lock --file eng/Brewfile

Ok, now we're ready to build things. There are flags you can pass to the runtime build script to isolate the build of the AOT cross compiler, but I didn't get great results with various combinations of these (either only some binaries came out arm64, or the build just didn't work - which is maybe what the updated issue tracks). So let's keep it simple:

./build.sh -s mono+libs -os ios -arch arm64 -c Release

This should take somewhere between 5 and 10 minutes, and complete without issues. It's pretty impressive really (go look in artifacts to see all the things we built with one command and no shenanigans).

Now make sure that we got what we wanted:

find . | grep cross/ios-arm64/ | xargs file

You should see:

./artifacts/bin/mono/iOS.arm64.Release/cross/ios-arm64/llc:            Mach-O 64-bit executable arm64
./artifacts/bin/mono/iOS.arm64.Release/cross/ios-arm64/opt:            Mach-O 64-bit executable arm64
./artifacts/bin/mono/iOS.arm64.Release/cross/ios-arm64/mono-aot-cross: Mach-O 64-bit executable arm64

Yes! arm64 all the things!
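Since the next step overwrites files in the sdk, it can be worth turning that eyeball check into something that fails loudly. A minimal sketch that reads `file` output on stdin (the function name is made up):

```shell
# Read `file` output on stdin and print any executable that is not arm64.
# Exit status is 0 only when every executable listed is arm64.
non_arm64_executables() {
  awk '/executable/ && !/executable arm64/ { print; bad = 1 } END { exit bad }'
}

# Usage:
#   find artifacts/bin/mono/iOS.arm64.Release/cross/ios-arm64 -type f | xargs file | non_arm64_executables && echo "all arm64"
```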

All that's left to do is to overwrite the official binaries with our own ones. Once again, the invocation from earlier tells us where these need to go. Just in case, let's keep a copy of the original bits around (also useful if you want to do comparisons).

(Remember to substitute the 7.0.3s here and below for your version if necessary)

sudo cp -R /usr/local/share/dotnet/packs/Microsoft.NETCore.App.Runtime.AOT.osx-x64.Cross.ios-arm64/7.0.3/Sdk/../tools/ /usr/local/share/dotnet/packs/Microsoft.NETCore.App.Runtime.AOT.osx-x64.Cross.ios-arm64/7.0.3/Sdk/../tools/backup

That put all the original binaries under a subdirectory called backup. Now copy our new files over.

sudo cp artifacts/bin/mono/iOS.arm64.Release/cross/ios-arm64/* /usr/local/share/dotnet/packs/Microsoft.NETCore.App.Runtime.AOT.osx-x64.Cross.ios-arm64/7.0.3/Sdk/../tools/

And that's it! Let's run another publish and see how it goes.

zooom

Now we're cooking with charcoal. Enjoy your 33% faster builds!
^(Only one intel binary left!)
(it's m l a u n c h)
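When it's time to build something you actually want to ship, the official binaries can be put back from the backup subdirectory created earlier. A minimal sketch, assuming the backup sits at tools/backup as above and that your shell can write to the tools directory (for example inside a `sudo -s` session). The function name is made up.

```shell
# Put the original binaries back from the "backup" subdirectory made earlier.
# $1 is the tools directory; llc, opt and mono-aot-cross are the three files
# that were overwritten. Assumes write access to that directory.
restore_official_aot_tools() {
  tools_dir=$1
  for name in llc opt mono-aot-cross; do
    cp -p "$tools_dir/backup/$name" "$tools_dir/$name" || return 1
  done
}

# Usage (substitute your version for 7.0.3):
#   restore_official_aot_tools /usr/local/share/dotnet/packs/Microsoft.NETCore.App.Runtime.AOT.osx-x64.Cross.ios-arm64/7.0.3/Sdk/../tools
```

Running the `file` check from earlier on the tools directory afterwards should show x86_64 again.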


bonus thoughts: other factors affecting build time

Switching from x64 to arm64 binaries is a nice 'free' build time improvement. There are a couple of other things that you can look at.

💡 Linking/Trimming

The less code you have, the less code needs to be AOT compiled. Using the linker will reduce the time spent in AOTCompile (and the output binary size). Some of it will be moved to the ILLink task, but the net effect should be a faster build and a happier user.

💡 Dealing with AOT-unfriendly assemblies

In the chart from earlier, "my app"'s AOT time went from ~120s to ~80s when switching to arm64 binaries. But when I first started looking at the build time, the non-arm64 AOT time was around 800s 🤯. Watching CPU usage and looking at build output made it clear - one assembly in the project took several times longer to AOT than all of the others.

The way the AOT step works is that the build system basically spawns an AOT process per assembly, for all assemblies at once, and lets the operating system manage their resource allocation. That's why in the screenshots of Activity Monitor in this post, you see a large number of processes using a small fraction of a cpu core - there are some 100+ processes trying to get their slice of 10 cores. Each of the AOT processes appears to operate on a single thread, which is fine in the beginning when there are more processes than cores and the cpu is oversubscribed. But if a single process takes much longer than the others, eventually it will be left running on its own on a single thread, which is far from optimal. Scraping the output, I was able to see this behaviour in my own build (names removed to protect the innocent):

one of these things is not like the other

(n.b. Because of the probably indeterminate nature of oversubscription, it's not truly fair to compare any of the specific numbers in the above diagram, but for general magnitudes we can use it)
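To put the single-slow-assembly effect in numbers: if every AOT process is single-threaded, the AOT step can't finish faster than the longest single assembly, and it can't finish faster than the total work divided evenly across the cores. Whichever is larger is the floor. A rough sketch of that arithmetic, with a made-up function name; it ignores scheduling overhead and anything else the build does.

```shell
# Lower bound on AOT wall time. Reads one per-assembly duration (seconds) per
# line on stdin; $1 is the number of cores. Assumes one thread per process.
aot_wall_time_floor() {
  awk -v cores="$1" '
    { total += $1; if ($1 > longest) longest = $1 }
    END { even = total / cores; print (even > longest ? even : longest) }'
}

# Usage with hypothetical durations, one 600s assembly plus two 20s ones on 10 cores:
#   printf '%s\n' 600 20 20 | aot_wall_time_floor 10    # prints 600
```

In that made-up example the even split would be 64 seconds, so nearly all of the floor comes from the one slow assembly.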

Essentially, the AOT of one assembly is responsible for blowing out the build time by 10+ minutes. In my case, the functionality being used in that library was something that could be replicated natively without too much hassle, so I switched to that and removed the assembly. Another option would have likely been to link aggressively on that assembly to remove more of the code causing the AOT work (my guess - heavy use of generics).

How did I produce this? Probably you can do something smart with binlogs, but I just scraped the msbuild output. First, log the build to a file by adding -flp:v=diag -flp:logfile=mylog.log to your build arguments. Then, use this gist to process the file.
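As a quick sanity check on the log before doing any real processing, you can count how many AOT processes the build kicked off. This is not the gist, just a sketch that greps for the "execution started" line shown earlier in this post. The function name is made up, and a diagnostic log may format or repeat that line differently, so treat the count as approximate.

```shell
# Count AOT compiler invocations recorded in a saved build log.
# Matches the "mono-aot-cross execution started with arguments" line
# that the publish output shows for each invocation.
count_aot_invocations() {
  grep -c 'mono-aot-cross execution started with arguments' "$1"
}

# Usage:
#   count_aot_invocations mylog.log
```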

💡 Opting to interpret some assemblies

This is more of a build size vs performance tip, and that should be your driving factor for this (not release build time), but I'll include it anyway. For a while now, we've had access to the interpreter option which enables various scenarios. In new dotnet ios, having it enabled is currently something you probably need to do because it is easy to unintentionally trigger code-gen (my theory is we had some special BCL assemblies in Xamarin days that avoided code-gen in certain methods but now that we share with dotnet you can hit it more easily). But don't just enable it and interpret everything!

Don't: (enable the interpreter and interpret all our assemblies)

<UseInterpreter>true</UseInterpreter>  

Do: (enable the interpreter and interpret none of our assemblies, but be ready to interpret any codegen)

<UseInterpreter>true</UseInterpreter>  
<MTouchInterpreter>-all</MTouchInterpreter>  

Doing the first one will skip AOTing everything, so I guess in the spirit of this blogpost it's going to make your release builds super fast and your outputs super small, but it's also going to make the app itself much slower at run time.

Consider: (enable the interpreter and interpret specific assemblies)

<UseInterpreter>true</UseInterpreter>  
<MTouchInterpreter>-all,AssemblyToNotAOT1,AssemblyToNotAOT2</MTouchInterpreter>  

Here we name AssemblyToNotAOT1 and AssemblyToNotAOT2 as assemblies that will be interpreted at run-time, so they won't be AOT-compiled at build time. This will reduce output size and release build time.

💡 Getting an M2

No no... just do some of the above 😎


By combining the use of arm64-specific binaries and maybe a few build tips, hopefully you can see an improvement in your release-build times like I did, plus better battery life and an improvement in your general health and wellbeing.

Finally, everything as it should be 🏝️💻 . . .