Taming Rust Compile Times in CI
How we took our Rust CI builds from painfully slow to fast feedback.
Rust’s compile times are the tax you pay for everything else it gives you. Locally, incremental compilation makes it manageable. But in CI, where every run starts from a clean slate, you feel the full weight of recompiling your entire dependency tree from source.
When our backend grew into a multi-crate workspace across several bounded contexts, CI builds had crept into the double-digit minutes. For a team that wants fast feedback on every PR, that’s not acceptable. Here’s what we did about it.
The problem: CI starts cold every time
Locally, cargo build is fast after the first compile because it only rebuilds what changed. But CI runners are ephemeral. Every job starts fresh. That means every build recompiles hundreds of dependency crates that haven’t changed since the last run.
Our dependency tree includes the AWS SDK, an HTTP framework, serialization libraries, and cryptography crates. That’s a lot of code to compile from scratch on every push. The AWS SDK alone pulls in dozens of service crates, each generating substantial code from Smithy models.
To put it in perspective: the overwhelming majority of our dependency graph is third-party crates — several hundred of them — that rarely change but get recompiled from scratch on every CI run.
Why Rust compiles are slow in the first place
Before jumping into solutions, it helps to understand what makes Rust compilation expensive.
Monomorphization is the big one. Every time you use a generic function with a concrete type, the compiler generates a specialized version. #[derive(Deserialize)] expands at compile time into a deserialization implementation for your struct, and then calling serde_json::from_str::<MyStruct>() monomorphizes the generic parsing code for that type. Multiply that across hundreds of types and you get a sense of the code generation burden.
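To make the cost concrete, here is a minimal, std-only sketch (the function name `render` is ours, chosen for illustration). It shows one generic function called with three element types, and rustc compiles it into three separate specialized copies.

```rust
use std::fmt::Display;

/// A generic function: rustc emits a separate, specialized copy of
/// `render` for every concrete `T` it is called with.
pub fn render<T: Display>(items: &[T]) -> String {
    items
        .iter()
        .map(|item| item.to_string())
        .collect::<Vec<_>>()
        .join(", ")
}

/// Three call sites with three element types mean three instantiations:
/// render::<u32>, render::<&str>, and render::<f64>.
pub fn demo() -> Vec<String> {
    vec![
        render(&[1u32, 2, 3]),
        render(&["a", "b"]),
        render(&[1.5f64, 2.25]),
    ]
}
```

Each instantiation goes through type checking, LLVM optimization and codegen on its own. Across hundreds of types and generic library functions, that is where a lot of the build time goes.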
Procedural macros add another layer. Macros like #[derive(Serialize, Deserialize)] run at compile time and generate code that then needs to be compiled itself. They’re incredibly useful, but they’re also invisible compilation work.
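The example below uses the compiler's built-in `#[derive(Debug)]` instead of serde so that it needs no external crates. serde's derives are true procedural macros and `Debug` is built in, but both emit code that then has to be compiled. The hand-written `ManualPoint` impl only approximates what the derive generates, and the real expansion differs in its details.

```rust
use std::fmt;

// The derive writes a Debug impl for us at compile time...
#[derive(Debug)]
pub struct Point {
    pub x: i32,
    pub y: i32,
}

// ...which is roughly equivalent to writing this by hand. Either way, the
// impl is real code that rustc must type-check, optimize, and compile.
pub struct ManualPoint {
    pub x: i32,
    pub y: i32,
}

impl fmt::Debug for ManualPoint {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Use the same struct name so the output matches the derived impl.
        f.debug_struct("Point")
            .field("x", &self.x)
            .field("y", &self.y)
            .finish()
    }
}
```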
Then there’s LLVM. Rust’s backend passes code through LLVM’s optimization pipeline, which is thorough but not fast. Each codegen unit goes through multiple optimization passes before producing machine code.
Understanding these costs informed our strategy. We couldn’t eliminate monomorphization or macro expansion, but we could reduce how often they happen in CI and control how much optimization LLVM does.
Step 1: Compilation caching with sccache
The single biggest improvement came from adding sccache with a cloud storage backend. sccache sits between Cargo and the Rust compiler, caching compiled artifacts by their inputs (source hash, compiler flags, target). When the same crate is compiled with the same inputs, sccache returns the cached result instead of recompiling.
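sccache is normally wired in by pointing Cargo's `RUSTC_WRAPPER` setting at the sccache binary. Conceptually, it behaves like the toy model below. It hashes everything that affects a crate's output and reuses the stored artifact when the hash matches. This is a simplified sketch and not sccache's implementation. The `CompileInputs` fields and the `ToyCache` type are illustrative. The real cache lives in S3 rather than an in-memory `HashMap`, and sccache's real key covers more inputs, such as environment variables.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified set of inputs that determine a crate's output.
#[derive(Hash)]
pub struct CompileInputs<'a> {
    pub source: &'a str,
    pub rustc_version: &'a str,
    pub flags: &'a [&'a str],
    pub target: &'a str,
}

/// In-memory stand-in for a shared compilation cache.
pub struct ToyCache {
    entries: HashMap<u64, String>,
    pub hits: u32,
    pub misses: u32,
}

impl ToyCache {
    pub fn new() -> Self {
        ToyCache { entries: HashMap::new(), hits: 0, misses: 0 }
    }

    /// Cache key: a hash over every input field.
    fn key(inputs: &CompileInputs<'_>) -> u64 {
        let mut hasher = DefaultHasher::new();
        inputs.hash(&mut hasher);
        hasher.finish()
    }

    /// Returns the cached artifact on a hit; otherwise "compiles" and stores it.
    pub fn compile(&mut self, inputs: &CompileInputs<'_>) -> String {
        let key = Self::key(inputs);
        if let Some(artifact) = self.entries.get(&key) {
            self.hits += 1;
            return artifact.clone();
        }
        self.misses += 1;
        let artifact = format!("artifact-{key:016x}"); // stand-in for real codegen
        self.entries.insert(key, artifact.clone());
        artifact
    }
}
```

Compiling the same inputs twice gives one miss followed by one hit. Changing a single flag, such as the opt-level, produces a different key and therefore another miss. This is why consistent compiler flags across runs matter so much.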
On a warm cache, dependency compilation drops from the bulk of the build to a small fraction of it. The cache persists across CI runs, so only genuinely new or changed code gets compiled.
The key insight: we stopped using traditional CI caching (which caches the entire target/ directory and often has poor hit rates due to path sensitivity) and switched entirely to sccache. It’s more granular, more portable across jobs, and doesn’t bloat over time.
Our sccache configuration points at an S3 bucket. One gotcha we hit early: sccache needs consistent compiler flags across runs to get cache hits. If anything changes in your compilation-relevant inputs (compiler version, rustc flags, or feature flags), the cache key changes and you get a miss. We pin our Rust toolchain version explicitly in rust-toolchain.toml and keep our CI build inputs deterministic.
We set a TTL on cached objects, and S3 lifecycle policies handle the cleanup automatically.
Our CI platform’s built-in cache (persisting target/) was our first attempt. The problems: it’s keyed on the Cargo.lock hash (so any dependency update invalidates everything), uploading a multi-gigabyte directory adds 2-3 minutes per run, and it’s path-sensitive. sccache caches each crate compilation separately, so a single dependency update only invalidates that crate and its downstream dependents, not the entire target directory.
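The granularity argument can be made concrete with a small sketch, in which the crate names are hypothetical. Given each crate's dependencies, a change to one crate invalidates that crate plus everything downstream of it. The rest of the graph stays cacheable.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// Given `deps` (crate -> crates it depends on) and one changed crate, return
/// every crate that must be recompiled: the changed crate plus all of its
/// transitive dependents. Everything else can still be served from cache.
pub fn invalidated<'a>(deps: &HashMap<&'a str, Vec<&'a str>>, changed: &'a str) -> BTreeSet<&'a str> {
    // Invert the edges: dependency -> crates that depend on it.
    let mut dependents: HashMap<&'a str, Vec<&'a str>> = HashMap::new();
    for (&krate, ds) in deps {
        for &d in ds {
            dependents.entry(d).or_default().push(krate);
        }
    }
    // Breadth-first walk over the reversed edges.
    let mut out = BTreeSet::new();
    let mut queue = VecDeque::from([changed]);
    while let Some(krate) = queue.pop_front() {
        if out.insert(krate) {
            if let Some(next) = dependents.get(krate) {
                queue.extend(next.iter().copied());
            }
        }
    }
    out
}
```

With a graph where `domain` and `worker` depend on `shared`, and `api` depends on `domain`, a change to `shared` recompiles those four crates and nothing else. The same reasoning explains why a value that changes every commit is so expensive when it sits near the bottom of the graph.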
Step 2: Tuning Cargo profiles for CI
Production builds and CI builds have different goals. Production wants small, fast binaries. CI wants fast compilation. We tuned our Cargo profiles accordingly.
In CI, we disable LTO and increase codegen units. Link-Time Optimization produces smaller binaries but dramatically increases link time, so turning it off cuts link time directly. Raising codegen units splits each crate into more pieces that LLVM can optimize and compile in parallel, which shortens code generation at some cost in runtime performance.
In production, we use thin LTO. The CD pipeline uses different profile settings, accepting slower builds for better runtime performance. For our production binaries, where startup time and size matter, the smaller binary and better cross-crate inlining are worth the extra build time.
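We disable debug info in CI. Setting debug = 0 skips generating DWARF debug information entirely. Tests run under this same ci profile, trading readable backtraces for faster compiles: a panic message still reports the file and line where the panic happened, but backtraces no longer map frames back to source lines.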
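```toml
# Fast-compiling profile for CI builds and tests.
[profile.ci]
inherits = "release"
opt-level = 2            # optimized, but cheaper than opt-level 3
lto = "off"              # no link-time optimization at all
codegen-units = 16       # let LLVM work on each crate in parallel
debug = 0                # no DWARF debug info
incremental = false      # sccache can't cache incrementally compiled crates
strip = "symbols"

# Slower-compiling profile for the binaries we deploy.
[profile.deploy]
inherits = "release"
opt-level = 3
lto = "thin"             # cross-crate optimization, smaller binaries
codegen-units = 1        # best optimization, least parallelism
debug = 0
strip = "symbols"
```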
The ci profile inherits from release so we test against optimized code (catching optimization-related bugs) without paying the full optimization cost.
Step 3: Parallelizing the pipeline
Our original CI ran sequentially: build, then lint, then test. We restructured into parallel jobs:
- Build job: Cross-compiles the release binaries and runs clippy (which needs the same compilation work anyway). Populates the sccache.
- Test job: Runs in parallel, compiling under the ci profile with its own sccache hits.
- Other checks (frontend, infrastructure, security scanning) run simultaneously.
Wall-clock time is now determined by the slowest parallel job, not the sum of all jobs.
The restructuring required thinking about which jobs actually depend on each other. Clippy and the build share compilation work, so they belong together. Tests compile under the ci profile with extra test-harness code, so they need their own cache entries anyway and have no reason to wait for the build job.
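The rule that wall-clock time is set by the slowest chain of jobs, not the sum of all jobs, is a longest-path calculation over the job graph. The sketch below assumes the graph is acyclic, and the `Job` type and durations are made up purely for illustration. They are not our measurements.

```rust
use std::collections::HashMap;

/// A CI job: its own duration and the jobs that must finish before it starts.
pub struct Job<'a> {
    pub minutes: u32,
    pub needs: Vec<&'a str>,
}

/// Earliest finish time of `name`, memoized: its own duration plus the
/// latest finish among the jobs it needs. Panics if a job name is unknown.
fn finish<'a>(
    name: &'a str,
    jobs: &HashMap<&'a str, Job<'a>>,
    memo: &mut HashMap<&'a str, u32>,
) -> u32 {
    if let Some(&done) = memo.get(name) {
        return done;
    }
    let job = &jobs[name];
    let start = job
        .needs
        .iter()
        .map(|&dep| finish(dep, jobs, memo))
        .max()
        .unwrap_or(0);
    let done = start + job.minutes;
    memo.insert(name, done);
    done
}

/// Wall-clock time of the pipeline when every job starts as soon as its
/// dependencies finish: the longest path through the job graph.
pub fn wall_clock<'a>(jobs: &HashMap<&'a str, Job<'a>>) -> u32 {
    let mut memo = HashMap::new();
    jobs.keys()
        .map(|&name| finish(name, jobs, &mut memo))
        .max()
        .unwrap_or(0)
}
```

For example, take a build of 6 minutes, tests of 5 and other checks of 2. If the tests wait for the build, wall-clock time is 11 minutes. If that false dependency is dropped, it falls to 6.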
We also moved cargo deny (license auditing) and our security tooling into their own parallel job. These don’t compile your code. Our security tooling checks Cargo.lock against advisory databases, and cargo deny examines dependency metadata and license files. Both finish quickly and never block the main build.
Step 4: Single compilation pass for multiple binaries
We have several binary entry points that share most of their code. Originally each was built separately, duplicating compilation work. We restructured to build all binaries in a single cargo build --workspace --bins invocation, which lets Cargo share compilation across targets. (The --workspace flag matters: our workspace root is itself a package, so a bare cargo build --bins only builds that root binary and skips the other member-crate binaries.)
Our workspace has a handful of binary targets that share a large common core. Building them one at a time means repeated orchestration overhead and missed scheduling opportunities, because each invocation resolves the graph again and none of them can overlap their work with the others. If the binaries enable different features of a shared dependency, separate invocations can also resolve different feature sets and compile that dependency more than once. A single invocation lets Cargo schedule all targets and their dependencies in one pass, reducing overhead and improving parallelism across the dependency graph.
Step 5: Avoiding unnecessary recompilation triggers
Some recompilation is caused not by code changes but by metadata changes that invalidate the cache.
We bake the git commit SHA into the binaries at compile time. It originally lived in our shared crate — which every other crate depends on — so the per-commit value invalidated shared and its entire downstream graph on every build. We moved it into a tiny build-info leaf crate that only the binaries depend on, so now only that crate and the binaries recompile when the SHA changes.
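A minimal sketch of what such a leaf crate could contain, assuming CI exports the SHA in an environment variable. The name `GIT_SHA` and the `version_string` helper are hypothetical. Cargo records environment variables read through `env!`/`option_env!` and rebuilds the crate that reads them when they change. Keeping the read in a leaf crate therefore confines that rebuild to this crate and the binaries.

```rust
// Hypothetical build-info leaf crate (src/lib.rs). Only the binaries depend
// on it, so a new commit SHA recompiles this crate and the binaries, not the
// shared crates underneath them.

/// Commit SHA captured at compile time from the (assumed) `GIT_SHA`
/// environment variable set by CI, or "unknown" when it is unset locally.
pub const GIT_SHA: &str = match option_env!("GIT_SHA") {
    Some(sha) => sha,
    None => "unknown",
};

/// Version string a binary might print in `--version` output or logs.
pub fn version_string(pkg_version: &str) -> String {
    format!("{pkg_version} ({GIT_SHA})")
}
```

A binary would then call something like `build_info::version_string(env!("CARGO_PKG_VERSION"))`. Its own CARGO_PKG_VERSION is stable across commits, so only the SHA varies.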
Cargo feature flags also change the compilation inputs. If one job enables a feature that another doesn’t, they can’t share cache entries. We standardized feature flags across all CI jobs so sccache entries are maximally reusable.
The results
The warm build — the common case, where dependencies haven’t changed — is the one that matters for daily development. With a populated cache, only your actual code changes get compiled; the hundreds of unchanged dependency crates come straight from sccache instead of going through the compiler. Cold builds, after a dependency bump, are slower but still benefit from the profile tuning and parallel jobs. The upshot: most PRs get fast feedback instead of a long wait.
Monitoring build performance over time
Optimizing CI once isn’t enough. Build times creep up as you add code and dependencies. We built monitoring into our pipeline to catch regressions early.
Every CI run surfaces sccache statistics — total requests, hits, misses, and time saved. A sudden drop in hit rate usually means something changed in the environment that’s invalidating entries unnecessarily.
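This is a sketch of the kind of check those statistics enable. It computes the hit rate from the hits and misses sccache reports, for example via `sccache --show-stats`, and flags runs that fall well below a baseline. Parsing sccache's output is omitted here. `CacheStats`, `hit_rate_regressed` and the thresholds used below are hypothetical choices and not part of sccache.

```rust
/// Counters as they might be collected at the end of a CI run.
/// (Field names are ours; parsing sccache's output is left out.)
#[derive(Debug, Clone, Copy)]
pub struct CacheStats {
    pub hits: u64,
    pub misses: u64,
}

impl CacheStats {
    /// Fraction of cacheable requests served from cache, in 0.0..=1.0.
    pub fn hit_rate(&self) -> f64 {
        let total = self.hits + self.misses;
        if total == 0 {
            0.0
        } else {
            self.hits as f64 / total as f64
        }
    }
}

/// True when this run's hit rate fell more than `max_drop` below `baseline`
/// (for example, a trailing average over recent runs).
pub fn hit_rate_regressed(baseline: f64, current: CacheStats, max_drop: f64) -> bool {
    baseline - current.hit_rate() > max_drop
}
```

Take a run with 40 hits and 60 misses, a 0.40 hit rate, checked against a 0.92 baseline with `max_drop = 0.10`. It would be flagged, whereas a run at 0.90 would not.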
We also keep an eye on total CI wall-clock time. When it trends upward, the culprit is usually a new heavy dependency. The AWS SDK is notorious for this: adding a single service crate can pull in dozens of transitive dependencies. We review dependency additions with the same scrutiny we give to code changes.
What we learned
CI builds and deployment builds serve different purposes. The CI build answers “is this code correct?” as fast as possible. The deployment build answers “is this binary production-ready?” with different priorities. Trying to make one build serve both purposes means compromising on both.
Compilation caching beats target directory caching. Don’t cache the output directory. Cache the compilation work itself. It’s more robust, more granular, and more portable.
Parallelism is free speed. Audit your pipeline for false dependencies. Does your test job really need the build job to finish first? In Rust, probably not, since cargo test compiles its own artifacts anyway.
Measure before optimizing. Without visibility into what’s actually being cached vs recompiled, you’re guessing. The first time we saw that our shared crate — and everything depending on it — was recompiling on every run because it baked in the per-commit git SHA, the fix paid for itself immediately.
Workspace structure is a compilation strategy. Small, focused crates with clear dependency boundaries compile faster incrementally than large monolithic crates. The upfront cost of splitting your workspace pays dividends every time someone pushes a commit.
Rust’s compile times are a real cost. But with the right caching strategy, profile tuning, and pipeline structure, you can make CI feel responsive even with a substantial codebase. Fast feedback means developers stay in flow. A long wait means they context-switch, lose focus, and batch up changes into larger, harder-to-review PRs. Fast CI isn’t just a developer experience nicety. It’s a force multiplier for code quality.