News·February 3, 2026·Rhyme

The LastMile Problem with AI-Generated Code

We've reviewed dozens of AI-generated codebases. Impressive demos, but missing everything that matters for production.

It Looked Done

A client came to us last year with an application built almost entirely with AI tools. The demo was impressive — clean UI, responsive, all the right features. They'd gone from idea to working prototype in under two weeks. They wanted us to "just deploy it."

We ran our standard review. The results weren't unusual — we've seen this pattern across dozens of codebases now — but they were sobering:

  • No input validation on any form. SQL injection would've been trivial.
  • Error handling consisted of console.log statements. In production, that means silent failures.
  • The app made 47 API calls on page load. Forty-seven.
  • Zero tests. Not "bad tests" — literally zero.
  • Secrets hardcoded in three separate files.
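The first two findings are also the cheapest to fix. A minimal sketch of the validation-plus-parameterization pattern, assuming a Node/TypeScript stack and a pg-style client with positional placeholders (the function and regex here are illustrative, not from the client's codebase):

```typescript
// The vulnerable pattern we keep finding in AI-generated code:
//   db.query(`SELECT * FROM users WHERE email = '${email}'`);
//
// Safer: validate the input, then let the driver bind parameters.
function buildUserLookup(email: string): { text: string; values: string[] } {
  // Reject anything that is obviously not an email before it reaches the DB.
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) {
    throw new Error("invalid email");
  }
  // $1 is a bound parameter: the driver treats whatever arrives as data,
  // so a payload like `' OR '1'='1` can never become SQL.
  return {
    text: "SELECT id, email FROM users WHERE email = $1",
    values: [email],
  };
}
```

The query object matches the shape node-postgres accepts, but any client with placeholder support works the same way.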

The prototype worked beautifully. The production application didn't exist yet. That gap — between "it works on my machine" and "it works for 10,000 users at 3 AM when the database hiccups" — is what we call the LastMile problem.

The 70% Illusion

AI gets you 70% of the way, fast. Genuinely fast. We use these tools ourselves every day at Rhyme. But that remaining 30% isn't easier — it's actually harder, because the illusion of completeness makes teams skip it.

Veracode's 2025 report found that nearly half of all AI-generated code contains security flaws. Not obscure theoretical vulnerabilities — real ones. XSS, injection, authentication bypasses. The code compiles, the tests pass (when there are tests), the feature works. It's just not safe.

We've been shipping production software for over a decade — for PostNord's logistics operations, ATG's betting platforms, Zound Industries' consumer electronics, MAIA Universe's music licensing system. We know what "production-ready" actually means because we've been responsible for keeping these systems running.

What We Keep Finding

After reviewing AI-generated codebases across different teams and industries, the gaps are remarkably consistent:

Security is an afterthought. AI writes the happy path. It doesn't think adversarially. It won't add rate limiting because it doesn't know about your API costs. It won't sanitize inputs because nothing in the prompt said "assume users are malicious."
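Rate limiting is a good example of how small the missing piece often is. A minimal fixed-window limiter sketch (in-memory and illustrative; a real deployment would back this with Redis or similar so limits survive restarts and apply across instances):

```typescript
// Fixed-window rate limiter: at most `limit` hits per `windowMs` per key
// (typically an IP address or API token).
class RateLimiter {
  private hits = new Map<string, { count: number; windowStart: number }>();

  constructor(private limit: number, private windowMs: number) {}

  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(key);
    // No entry yet, or the previous window has expired: start a fresh window.
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.hits.set(key, { count: 1, windowStart: now });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.limit;
  }
}
```

Usage: `new RateLimiter(100, 60_000)` allows 100 requests per minute per key; everything past that gets a 429 instead of running up your API bill.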

No failure modes. What happens when Stripe is down? When the email service returns a 500? When someone uploads a 2GB file? AI-generated code assumes everything works. Real software assumes everything breaks.
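The fix is rarely exotic: wrap every external call in a timeout and bounded retries, and surface the final failure instead of swallowing it. A hedged sketch of that posture (names and defaults are illustrative; adapt to your HTTP client):

```typescript
// Bounded retries with exponential backoff around any external call
// (payments, email, uploads). Assumes everything breaks, because it does.
async function withRetry<T>(
  fn: () => Promise<T>,
  { attempts = 3, baseDelayMs = 200 } = {}
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Exponential backoff: 200ms, 400ms, 800ms, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
    }
  }
  // After all attempts fail, throw a real error -- never log-and-continue.
  throw lastError;
}
```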

Performance at demo scale. The app works for one user clicking through a demo. At real load — concurrent users, cold starts, database contention, CDN cache misses — the architecture falls apart because nobody designed for it.
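One recurring fix for the "47 API calls on page load" pattern is request coalescing: if several components ask for the same resource at the same time, they should share one network call. A minimal sketch (names are illustrative; libraries like dataloader generalize this):

```typescript
// Coalesce duplicate in-flight requests: N concurrent callers asking for
// the same key share a single underlying fetch.
const inFlight = new Map<string, Promise<unknown>>();

function dedupedFetch<T>(key: string, fetcher: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing) return existing as Promise<T>;
  // Drop the entry once settled so later calls fetch fresh data.
  const p = fetcher().finally(() => inFlight.delete(key));
  inFlight.set(key, p);
  return p;
}
```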

The dependency problem. AI loves pulling in packages. We've seen AI-generated projects with 400+ dependencies for a simple CRUD app. Each one is an attack surface, a maintenance burden, and a potential breaking change.

Why We Built LastMile

We saw this pattern often enough that we formalized our approach. LastMile is four phases:

Audit — Our team reviews architecture, security, performance, and dependencies. Not a surface scan — a real review by engineers who've seen what breaks in production across industries from healthcare (Dentech) to media (SplayOne) to public sector (PostNord).

Harden — We fix what matters. Input validation, error handling, rate limiting, secret management, access control. The work that's invisible when done right and catastrophic when skipped.
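Secret management is a representative piece of this work. The fix for hardcoded secrets is conceptually tiny, which is exactly why it gets skipped. A hedged sketch (variable names are illustrative):

```typescript
// Secrets come from the environment, never from source files, and a
// missing secret fails loudly at startup instead of silently at 3 AM.
function requireEnv(
  name: string,
  env: Record<string, string | undefined> = process.env
): string {
  const value = env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Resolve at boot, not on the first request that needs the key, e.g.:
// const stripeKey = requireEnv("STRIPE_SECRET_KEY");
```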

Integrate — Real-world connections. Auth flows, payment processing, monitoring, alerting, logging. The infrastructure layer that separates a prototype from a product.
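Logging is the clearest before/after here. The console.log statements we found in that audit become structured JSON lines that monitoring and alerting can actually parse. A minimal sketch of the shape (in production you'd reach for an established logger such as pino or winston):

```typescript
// Structured logging: one JSON object per line, with a level, timestamp,
// and arbitrary context fields that alerting rules can match on.
type Level = "info" | "warn" | "error";

function log(
  level: Level,
  message: string,
  context: Record<string, unknown> = {}
): string {
  const line = JSON.stringify({
    level,
    message,
    time: new Date().toISOString(),
    ...context,
  });
  console.log(line);
  return line; // returned so the shape is easy to test
}
```

Usage: `log("error", "payment failed", { orderId: 42 })` is something an alert can trigger on; a bare `console.log("payment failed")` is not.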

Launch — CI/CD pipeline, meaningful test coverage, deployment runbooks, and 30 days of post-launch support. Because the first month in production always reveals something.

This Isn't Anti-AI

Let's be clear: we're not writing this from a position of "AI is bad, hire humans." Our team uses AI tools extensively. They make us significantly faster at the work we're already good at.

But that's the key phrase — "already good at." AI amplifies what you bring to the table. When a team with 10+ years of production experience uses AI, they get speed and quality. When someone with no engineering background uses AI, they get speed and a very convincing-looking prototype that may not survive its first real user.

The best software being built right now comes from experienced teams with AI tooling. Not one or the other. Both. That's been our bet at Rhyme, and everything we're seeing in the market confirms it.