How We Use AI in Development (Without Losing Control)

AI coding assistants are everywhere now. The discourse tends to swing between two extremes: either AI is going to replace developers entirely, or it produces garbage that no serious team would ship. Our experience is neither.

Most of our code is written with AI assistance. Implementation, tests, refactoring. AI is deeply embedded in how we work, and we think that’s a strength when paired with the right oversight. Here’s how we think about it.

Experienced engineers driving AI, not the other way around

AI is excellent at implementation. Given a clear specification, well-defined boundaries, and a codebase with strong conventions, it produces working code quickly. The key word is “given.” The quality of AI output tracks the clarity of the input.

That clarity doesn’t come from clever prompting. It comes from years of building systems. Humans bring experience in systems architecture, domain modeling, and distributed systems design. That experience is what makes AI productive rather than dangerous.

When we scope a feature, we’re drawing on that experience to identify edge cases before they become bugs, define boundaries that will hold as the system evolves, and decompose work into pieces that are small enough for AI to handle cleanly. Good decomposition is the difference between AI producing correct, maintainable code and AI producing a plausible-looking mess. It’s a skill that comes from building systems, not from prompting.

Think about it like a general contractor directing specialists. The contractor doesn’t need to personally wire every outlet or plumb every pipe. But they need to know whether the work was done correctly, whether it meets code, and how it interacts with the rest of the structure. That judgment comes from experience, not from watching someone else do it.

The workflow

A typical feature flows like this:

Define the requirement. What problem are we solving? What’s the scope? What are we explicitly not building? AI helps enumerate considerations and surface edge cases, but the judgment about what matters comes from engineering experience and understanding the product.

Design the approach. Where does this live architecturally? What are the interfaces? How does it interact with existing bounded contexts? These decisions require understanding the long-term evolution of a multi-domain platform. AI can propose options. Knowing which option is right takes engineering maturity that only comes from having built (and maintained) complex systems. When we were designing VegaLoop’s architecture, for example, AI didn’t decide that domain logic should be isolated from infrastructure concerns. Engineers made that call based on years of watching tightly-coupled systems become unmaintainable.

Decompose into tasks. Break the work into small, well-bounded pieces with clear contracts. This is where the real leverage is. A well-scoped task with clear inputs, outputs, and constraints produces clean AI-generated code almost every time. A vague task produces vague code, regardless of the tool.

Implement. AI generates code within the constraints established above. It handles the mechanical work: translating specifications into functions, writing tests against defined behavior, generating boilerplate that follows established patterns. The quality of the output reflects the quality of the scoping.

Review. Every change gets reviewed. We verify it matches intent, check for edge cases the specification might have missed, and ensure it fits the broader system. This step is non-negotiable. AI is confident regardless of whether it’s correct, and plausible-looking code that subtly misses the point is worse than obviously broken code.

Verify. The compiler, test suite, and CI pipeline catch mechanical errors. Linting enforces conventions. In Rust specifically, the type system catches an enormous class of structural errors at compile time, which makes AI-generated code much easier to validate quickly.
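To make the Design and Decompose steps above a little more concrete, here is a minimal sketch of what a well-bounded task contract might look like when domain logic is kept apart from infrastructure. Every name in it (SleepLog, average_sleep, InMemorySleepLog) is hypothetical and chosen for illustration; it is not taken from our codebase.

```rust
// Hypothetical names throughout; an illustration, not production code.

/// Port owned by the domain: it states what the domain needs,
/// not how or where the data is stored.
pub trait SleepLog {
    /// Hours slept on each of the last `days` nights, most recent last.
    fn recent_hours(&self, days: usize) -> Vec<f32>;
}

/// Pure domain logic with no database, network, or clock dependency.
/// Contract: returns None when there is no data, otherwise the mean.
pub fn average_sleep(log: &impl SleepLog, days: usize) -> Option<f32> {
    let hours = log.recent_hours(days);
    if hours.is_empty() {
        return None;
    }
    Some(hours.iter().sum::<f32>() / hours.len() as f32)
}

/// Stand-in for infrastructure, used in tests.
/// A real adapter would wrap a database or an API client.
pub struct InMemorySleepLog(pub Vec<f32>);

impl SleepLog for InMemorySleepLog {
    fn recent_hours(&self, days: usize) -> Vec<f32> {
        // Take at most `days` entries from the end of the list.
        let start = self.0.len().saturating_sub(days);
        self.0[start..].to_vec()
    }
}
```

A task scoped this way has explicit inputs, outputs, and an empty-data rule, so a reviewer can check the generated body against the contract rather than guessing at intent.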

Why Rust makes this work

Rust’s compiler acts as a second reviewer that never gets tired. In a dynamically typed language, AI-generated code might look correct, pass a linter, and fail at runtime in edge cases you didn’t think to test. In Rust, type mismatches, ownership violations, and non-exhaustive pattern matches are caught immediately.
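As a small illustration of the exhaustiveness point, consider the sketch below. The SyncState enum and should_retry function are hypothetical names invented for this example.

```rust
// Hypothetical enum for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SyncState {
    Pending,
    Synced,
    Failed { retries: u8 },
}

/// Decide whether a record should be retried.
/// The match has no wildcard arm, so adding a new `SyncState` variant
/// makes this function fail to compile until the new case is handled.
pub fn should_retry(state: SyncState) -> bool {
    match state {
        SyncState::Pending => false,
        SyncState::Synced => false,
        // Retry failed records up to three times.
        SyncState::Failed { retries } => retries < 3,
    }
}
```

If generated code drops one of these arms, the compiler rejects it with a non-exhaustive patterns error (E0004) instead of letting the gap surface at runtime.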

This tightens the feedback loop dramatically. You can iterate quickly with AI because the compiler provides an immediate, thorough check at the structural level: types, ownership, and exhaustiveness. Human review then focuses on the semantic level: does this code do the right thing, not just a type-safe thing?

There’s another benefit specific to our stack. Rust’s trait system and module boundaries create natural “fences” for AI-generated code. When a function signature specifies exactly what types go in and what types come out, AI can’t easily drift into producing something structurally incompatible with the rest of the system. The constraints baked into the language act as guardrails that keep AI output aligned with the architecture.
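One way to picture these fences is a trait whose signature uses dedicated unit types. This is a sketch under assumed names: Grams, Kilocalories, EnergySource, and Protein are all hypothetical.

```rust
// Hypothetical types for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Grams(pub f64);

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Kilocalories(pub f64);

/// The contract any implementation, generated or hand-written, must satisfy.
pub trait EnergySource {
    fn energy(&self, amount: Grams) -> Kilocalories;
}

pub struct Protein;

impl EnergySource for Protein {
    fn energy(&self, amount: Grams) -> Kilocalories {
        // Roughly 4 kcal per gram of protein (general Atwater factor).
        Kilocalories(amount.0 * 4.0)
    }
}
```

With this signature, an implementation that returns a bare f64 or accepts Kilocalories where Grams is expected does not compile. The signature cannot verify that 4.0 is the right factor, though; that part still belongs to human review.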

We’ve written before about our other reasons for choosing Rust: performance, memory safety, and the serverless cost model all factored in. But the AI development story has become one of the strongest arguments for the choice. A language that enforces structural correctness at compile time pairs remarkably well with a tool that produces confident-looking code regardless of whether it’s correct.

The scoping problem nobody talks about

Most of the conversation around AI coding tools focuses on the generation step. Which model writes better code. Which IDE integration is smoother. Which assistant handles more context. That’s focusing on the least important part of the process.

The hard part is everything before generation. What should this code do? Where should it live? What are the boundaries? What happens when the network is down, the user sends garbage input, or two requests arrive simultaneously? These questions require understanding the system holistically. They require knowing what has broken before in similar systems, and why.

AI doesn’t have that context. It can’t tell you that a particular abstraction will leak under load because it saw the same pattern fail three years ago in a different system. It can’t tell you that a given data model will cause migration headaches in six months because it’s seen that shape before. Engineers carry that scar tissue. It’s what makes their “simple” designs actually simple rather than simplistic.

We’ve seen this firsthand designing cross-domain intelligence features. Connecting nutrition data to activity data to recovery signals involves subtle domain modeling choices that ripple through the entire system. Getting those choices wrong doesn’t produce compile errors. It produces a system that works today and becomes increasingly painful to extend tomorrow. AI won’t flag that. A human engineer will.

How we structure context for reliable output

One thing we’ve learned: the quality of AI output depends heavily on how much relevant context it has access to. Not more context. Relevant context. Dumping an entire codebase into a prompt doesn’t help. Curating the right examples, the right type signatures, and the right architectural context does.

For us, that means keeping modules small and well-documented. Not for AI’s benefit specifically, but because the same qualities that help humans understand code also help AI produce code that fits within it. Clear naming conventions. Consistent patterns. Explicit type boundaries. Documentation that explains why, not just what.

We also rely heavily on existing code as examples. When AI needs to implement a new API handler, showing it three existing handlers that follow our conventions produces better results than describing those conventions in prose. Pattern matching is something these models do well. Give them a pattern worth matching.

This is part of why conventions matter even more in an AI-assisted workflow. In a traditional workflow, inconsistent conventions cause confusion and slow down new team members. In an AI-assisted workflow, inconsistent conventions cause AI to generate inconsistent code. The cost multiplies because you’re generating more code faster. If the patterns aren’t clean, the mess accumulates faster too.

What we don’t do

We don’t merge without reviewing a change. AI confidence and human confidence are different things.

We don’t blindly accept AI’s design suggestions. It’s a useful thinking partner, but it optimizes for plausibility, not for the specific context of our system. A human engineer who understands the broader architecture catches when a plausible suggestion is wrong for our situation.

We don’t skip the scoping work because “AI can figure it out.” The scoping IS the work. The implementation is the easy part.

We don’t use AI for security-critical decisions without extra scrutiny. Authentication flows, authorization checks, data access controls. These get additional review because the consequences of subtle errors are high and the errors themselves can look perfectly reasonable in isolation. A function that checks permissions might be logically correct but check against the wrong context. AI won’t catch that. A reviewer who understands the security model will.
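Here is a hedged sketch of what “checking against the wrong context” can look like. The model is invented for illustration: User, Document, Role, and both functions are hypothetical and do not describe our actual authorization code.

```rust
// Hypothetical model for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Viewer,
    Editor,
}

pub struct User {
    pub home_workspace: u32,
    /// Pairs of (workspace id, role in that workspace).
    pub memberships: Vec<(u32, Role)>,
}

pub struct Document {
    pub workspace: u32,
}

impl User {
    /// The user's role in the given workspace, if they are a member.
    fn role_in(&self, workspace: u32) -> Option<Role> {
        self.memberships
            .iter()
            .find(|(id, _)| *id == workspace)
            .map(|(_, role)| *role)
    }
}

/// Type-checks and reads plausibly, but consults the wrong context:
/// the user's home workspace instead of the document's workspace.
pub fn can_edit_wrong(user: &User, _doc: &Document) -> bool {
    user.role_in(user.home_workspace) == Some(Role::Editor)
}

/// Checks the role in the workspace that actually owns the document.
pub fn can_edit(user: &User, doc: &Document) -> bool {
    user.role_in(doc.workspace) == Some(Role::Editor)
}
```

Both functions compile and both pass a test where the user only belongs to one workspace. The difference shows up only for a user who is an editor at home and a viewer elsewhere, which is the kind of case a reviewer who knows the security model thinks to check.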

We also don’t let AI make tradeoff decisions. Should we optimize for read latency or write simplicity? Should we denormalize this data or accept the join cost? These are product and architecture decisions that depend on understanding usage patterns, growth projections, and business priorities. AI can lay out the options. Choosing between them requires judgment that incorporates factors outside the code.

Measuring whether it’s actually working

Speed is the obvious metric. Features ship faster. But speed alone isn’t a useful signal if you’re shipping the wrong thing or accumulating technical debt.

We track a few things to stay honest. How often does AI-generated code need significant revision during review? If that number creeps up, it usually means our scoping has gotten sloppy rather than that the AI is getting worse. It’s a signal to invest more in the upstream work.

How often do bugs trace back to AI-generated code versus human-written code? In our experience, the rates are about the same. The bugs are different in character, though. AI tends to produce code that handles the happy path well but misses subtle edge cases around error states or concurrent operations. Human-written code more often has simple typos or copy-paste errors. Both need review. Neither is a reason to reject the tool.

How much time are we spending on review? If review time grows proportionally with generation speed, you haven’t actually gained anything. You’ve just shifted where time goes. We aim for review to be faster because the scoping phase catches most issues before code exists. Review should mostly be confirming that implementation matches intent, not discovering that intent was unclear.
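The signals above reduce to a couple of simple ratios. The sketch below assumes you can already count changes and minutes from your own tooling; the ReviewStats struct and its field names are hypothetical, and nothing here prescribes thresholds.

```rust
// Hypothetical counters; how they are collected depends on your tooling.
pub struct ReviewStats {
    pub ai_changes: u32,
    pub ai_changes_revised: u32,
    pub review_minutes: f64,
    pub authoring_minutes: f64,
}

impl ReviewStats {
    /// Fraction of AI-generated changes that needed significant revision.
    /// Returns None when there are no changes to measure.
    pub fn revision_rate(&self) -> Option<f64> {
        if self.ai_changes == 0 {
            return None;
        }
        Some(f64::from(self.ai_changes_revised) / f64::from(self.ai_changes))
    }

    /// Review time relative to authoring time.
    /// Returns None when no authoring time was recorded.
    pub fn review_ratio(&self) -> Option<f64> {
        if self.authoring_minutes <= 0.0 {
            return None;
        }
        Some(self.review_minutes / self.authoring_minutes)
    }
}
```

Tracked over time, a rising revision_rate points back at scoping, and a rising review_ratio suggests time is being shifted into review rather than saved.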

AI as accelerator, not replacement

There’s a narrative that AI will replace developers. We don’t see it that way.

AI raises the floor on implementation speed while making the ceiling more valuable than ever. The design thinking, the system understanding, the judgment calls about what to build and how to structure it. Those skills become more important, not less, when implementation is cheap.

A developer using AI well isn’t doing less work. They’re doing different work. Less time on the mechanical. More time on the meaningful. Less time typing. More time thinking about whether the abstraction is right, whether the boundary will hold, whether the system will evolve gracefully.

The developers who thrive with AI are the ones who were already good at the parts AI can’t do: understanding the problem deeply, designing solutions that last, and knowing when something is wrong even if it compiles and passes tests.

Anyone can prompt AI to generate code. Knowing what code to ask for, where it should live, and whether the result is actually correct for your system? That takes human experience. That’s what we bring.

The human skills that matter more now

If implementation becomes cheap, what becomes expensive? Judgment. Taste. The ability to look at a system and know what’s missing before it breaks.

Code review is one example. Reading AI-generated code is a different skill than reading human-written code. AI code tends to be syntactically clean and well-structured but sometimes misses the forest for the trees. It might implement exactly what you asked for while missing that what you asked for was the wrong thing. Catching that requires understanding the problem, not just the solution.

System design is another. When you can implement anything quickly, the bottleneck shifts to deciding what to implement. Which features matter? Which abstractions will hold? Where should boundaries be? These questions get harder as systems get larger, and AI doesn’t make them easier. If anything, the ability to build faster makes poor architectural decisions more costly because you accumulate more code on top of a shaky foundation before discovering the problem.

Communication also matters more. Clear specifications produce better AI output. That means writing clearly, thinking precisely, and articulating constraints explicitly. Developers who can translate a fuzzy product requirement into a precise technical specification get more out of AI tools than developers who jump straight to coding.

More to come

We have more to say about using AI responsibly in a development workflow. Future posts will cover how we think about security when AI generates code that touches authentication and data access, how we evaluate whether AI-assisted development is actually making us better or just making us faster, and how we handle the training and onboarding side of working this way.

Building with AI is still early. The tools change fast. The models improve. The workflows evolve. What doesn’t change is the need for humans who understand what they’re building and why. AI makes good engineers more productive. We’re learning as we go, and we’ll share what works and what doesn’t.