Why do 95% of AI pilots fail to reach production?

MIT NANDA's State of AI in Business 2025, a multi-method study reviewing 300+ initiatives with 52 interviews and 153 surveys, found 95% of enterprise generative-AI efforts delivered no measurable P&L impact and only ~5% reached production. The bottleneck is the production gap (data readiness, governance, evaluation, monitoring, ownership), with poor data readiness the single most-cited technical cause: the pilot ran on a clean curated slice, while production runs on messy, permission-controlled, fragmented live data. There is also a decision-side failure. Harvard Business Review's finding that 40–60% of B2B deals are lost to inaction rather than to a competitor applies here too: many pilots simply never get a decision and sit in the demo until budget moves. The Goldsmith Method for Production AI treats closing that gap as the actual deliverable, not the model.

Is it cheaper to build or buy an AI solution?

Buying is materially cheaper up front (roughly $150K–$500K over three years versus $1.4M–$2.3M to build), but the more useful question is where the lines cross. Buy's cost is usage-elastic: per-seat or per-call pricing is cheap at typical volumes but scales with adoption. The crossover is driven by a handful of variables: your interaction volume, tokens per request, retries, and the vendor's per-seat or per-call price, set against a standing team's fixed cost. Directionally, monthly buy cost ≈ volume × price-per-interaction. Building makes sense once that exceeds the fixed team line (a senior ML engineer alone runs ~$207K a year in total compensation) and the workflow is differentiating enough to justify ~20–30% of build cost in annual maintenance forever. One build line teams forget: eval engineering alone is 30–40% of build cost (SFAI Labs 2026), and it is the most workload-specific part, which a packaged tool ships for you. Below that crossover, buy for commodity workflows; reserve building for capabilities worth funding a permanent team to own. The calculator computes where your own crossover sits from your volume.

Does building in-house or buying reach production more often?

Buying — and partnering. MIT NANDA's State of AI in Business 2025 found purchased AI solutions reached production roughly twice as often as internally built ones; 2026 analysts (TechAhead) read MIT's research as attributing a ~67% success rate to both the buy and the partner paths, versus roughly half that for pure internal build. Treat the exact 67% as the analyst gloss it is — the robust claim is the ~2× direction, and that it favors buying and partnering over building alone. Building has the higher ceiling for proprietary advantage but the lower base rate, so it suits durable moats rather than commodity work. Watch for the trap of treating a one-off internal build as "free" because the salaries are already on payroll — the success-rate gap is the hidden cost.

Should I do all three — is a hybrid approach better?

Yes — and treating it as a single pick is the more common mistake. The right answer is decided per workflow, not as one company-wide doctrine. Hybrid works on two axes. Architecturally, buy a mature base platform or model and build your own differentiating layer (prompts, retrieval, integrations, human checks) on top — the 2026 consensus answer for most strategic pilots. Temporally, sequence the paths: buy commodity workflows now, partner to ship your first stalled strategic pilot (build-operate-transfer, owned by you), staff an in-house team later. The risk to watch is letting the bought base dictate your architecture — keep the owned layer thin but genuinely yours.

Why not just hire ML contractors instead of a consultancy?

Sometimes you should. If you already know exactly what production-readiness requires for your pilot and just need hands to build it, one or two senior contractors can be cheaper and sufficient. The difference is accountability and risk: contractors are individuals you direct who carry the same key-person risk as a hire — the knowledge leaves when they do — whereas a partner brings a pre-built methodology, owns accountability for the production outcome, and bakes the handover to a named internal owner into the engagement. Rent contractors when the gap is "we lack hands." Bring a partner when the gap is "we're not sure what production-ready requires here, and we need it to stick."

How long does it take to get an AI pilot into production?

It depends on the path: buying a packaged tool takes 1–3 months (configure and integrate), a focused consultancy engagement takes roughly 6–12 weeks to close the production gap with your team, and building in-house realistically takes 9–18 months once hiring and ramp-up are counted. A partner is faster than the stalled in-house attempt because the gap work is a repeatable, pattern-based body of work assembled from existing templates with no hiring ramp — not a first-time build. The 9–18 month figure is the one teams underestimate most; it assumes you can hire specialists into a market with ~25–30% annual attrition. IDC/Lenovo's CIO Playbook 2025 found 88% of POCs never reach production at all.

What does "production-ready" actually require beyond a working demo?

Five things a demo skips: production-ready data (the demo ran on a clean curated slice; production runs on messy, permission-controlled, fragmented live data), an evaluation harness that catches regressions, governance and an audit trail that survive a security or compliance review, monitoring for drift and cost, and a named owner who can maintain it. The two that most pilots skip entirely are the data-readiness work and the cost-monitoring line: a demo never has to reconcile the live data foundation, and it never has a finance team asking what each request costs. The Goldsmith Method for Production AI works backward from all five and builds the scaffolding around your existing pilot, in your stack, so your team owns it.

Will a consultancy lock me into ongoing fees?

Not if the engagement is scoped honestly. A production-focused engagement is a one-time $150K–$400K cost over ~6–12 weeks, after which you carry only your own run-cost, because the system is built in your stack with IP assigned to you and knowledge transferred to your team. Insist that governance documentation and handover are explicit contracted deliverables, not afterthoughts; the lock-in risk lives in vague handover terms, not in the engagement model itself. The lock-in question cuts the other way too: when buying SaaS, confirm data, prompt, and log portability and an export path before you sign (see the section on buying).


June 19, 2026 · ROI & Business · 27 min read

Build vs. Buy vs. Hire a Consultancy in 2026: Which Gets Your AI Pilot to Production?

Build vs buy vs hire a consultancy for AI in 2026: an honest, sourced comparison of which path gets your stalled AI pilot to production.

QAIZEN

AI Governance Team

📖What is this?

The production gap

The distance between an AI pilot that works in a demo and a system that survives production — production-ready data, an evaluation harness, governance and an audit trail, monitoring, and a named owner. It is where most pilots stall, and the three paths (build, buy, partner) are three answers to who closes that gap and who owns it afterward.

95%

of enterprise GenAI initiatives showed no measurable P&L impact

Source: MIT NANDA 2025

88%

of AI proofs-of-concept never reach production

Source: IDC/Lenovo 2025

~2×

bought & partnered solutions reach production vs. built

Source: MIT NANDA 2025

Key Takeaways

Most AI pilots never ship: 95% of enterprise GenAI initiatives showed no measurable P&L impact (MIT NANDA 2025).
Your real 2026 choice is rarely "which model" — it is build in-house, buy a packaged tool, or partner (build-operate-transfer) to close the production gap.
Bought and partnered solutions reach production roughly twice as often as internally built ones (MIT NANDA 2025).
Buy commodity workflows; build a durable proprietary moat with a standing team; partner a stalled, governance-shaped pilot you want to own.
The decision is per-workflow, not company-wide — and the calculator turns the ranges below into a number for your specific pilot.

What is the honest cost and timeline of build vs. buy vs. consultancy in 2026?

Most pilots stall for the same reason: the demo and the production system are different engineering problems, separated by a gap of production-ready data, evaluation, governance, monitoring, and a named owner.

The three paths below are really three answers to who closes that gap and who owns it afterward. Here is what each costs and how long it takes, on the same units.

The single most uncomfortable fact to put on the table up front: MIT NANDA's State of AI in Business 2025 found that purchased AI solutions reached production roughly twice as often as internally built ones. If your use case is a commodity workflow, buying is not the lazy answer — it is statistically the more likely path to a system that actually ships.

For a commodity workflow, a mature 2026 tool like Glean or Microsoft Copilot (or Harvey for legal-style drafting, Salesforce Einstein and ServiceNow Now Assist for CRM/ITSM-embedded assistants, Decagon for support automation) already ships the evaluation, governance scaffolding, and monitoring that an in-house build would have to write from scratch.

As of mid-2026, here is how the three paths compare on like-for-like units:

Honest build vs. buy vs. consultancy comparison for getting an AI pilot to production, scored through the Goldsmith Method for Production AI (data readiness, evaluation, governance, monitoring, ownership). The consultancy column describes the partner / build-operate-transfer shape: a third party builds an asset you own. Cost figures are 2026 practitioner and agency estimates with a commercial stake; the senior-ML salary input is the structurally-neutral exception. No single column wins on every criterion.
Criterion	Build in-house	Buy (SaaS / packaged)	Hire a consultancy (partner / BOT)
Time-to-production (months)	9–18 months — hiring, ramp-up, and the production gap are all yours	Strongest option: 1–3 months — fastest; configure and integrate, not build	1.5–3 months (≈6–12 weeks) — gap closed with you, then handed over
Total cost (3-year)	Weakest on this axis: $1.4M–$2.3M (Year 1 ≈ $900K–$1.35M)	Strongest option: $150K–$500K — lowest at typical volumes; usage-elastic at scale	$150K–$400K one-time engagement, then your run-cost only
Governance / audit trail	Weakest on this axis: DIY — exists only if you staff and prioritize it	Vendor-defined — you inherit their controls, not yours	Strongest option: Deliverable — specified, built, and documented as part of scope
Key-person risk	Weakest on this axis: High — concentrated in a small specialist team; AI/ML attrition runs ~25–30%/yr	Strongest option: Low — vendor absorbs staffing risk	Medium — mitigated only if knowledge transfer is contracted
Maintenance burden	Weakest on this axis: You (permanent) — ~20–30% of build cost per year, forever	Strongest option: Vendor — included in subscription	You (post-handover) — but on a system designed to be maintainable
Control & IP	Strongest option: Full — you own every line and every decision	Weakest on this axis: Low — vendor owns the core; you own configuration and data	High (if contracted) — built in your stack, IP assigned to you at handover
Best for	Durable proprietary advantage + a standing ML team	Commodity workflows where speed and low cost win	Stalled pilots needing governance and a fast, owned path to production

A few notes on reading this table honestly:

No single column wins on every criterion. Buy wins decisively on speed and key-person risk, and on cost at typical volumes, but loses on control and IP. Build wins on control and durable advantage, but is the slowest and most expensive. A consultancy wins on closing a governance-shaped production gap quickly while leaving you the owner — but it is not free, and not the right tool for a commodity workflow.
Buy is cheapest at normal scale, but its cost is usage-elastic. Per-seat and per-call pricing scales with adoption, and at high volume it can cross over a build's fixed team cost. Directionally, your monthly buy cost is roughly interaction-volume × price-per-interaction, while a build's cost is a fixed standing-team line that barely moves with volume. The headline year-one price is the most misleading number in any buy decision. Buy still wins the cost row at typical volumes; where your own crossover point sits depends on those variables, and the calculator estimates it.
Cost ranges are 2026 practitioner and agency estimates — every source for them has a commercial stake; none is "vendor-neutral." The consultancy band sits inside published 2026 partner-engagement ranges (Xenoss puts a strategic AI partnership at $100K–$500K initial). The one structurally neutral input is the senior-ML labor line: Glassdoor's mid-2025 data put the median U.S. senior machine-learning engineer total compensation at roughly $207K.
Time rows are unit-normalized to months across all three columns so you are comparing like with like, not "weeks" against "quarters."
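The usage-elastic note above can be made concrete with a small sketch. This is a minimal Python illustration, not the page's calculator: the function names are hypothetical, the $0.50 price per interaction is an invented placeholder, and the fixed team line simply spreads the low end of the three-year build range ($1.4M) evenly over 36 months.

```python
def monthly_buy_cost(interactions_per_month: float, price_per_interaction: float) -> float:
    """Directional buy cost: volume times price per interaction."""
    return interactions_per_month * price_per_interaction


def crossover_volume(fixed_team_cost_per_month: float, price_per_interaction: float) -> float:
    """Monthly volume at which the buy bill equals the fixed team line."""
    if price_per_interaction <= 0:
        raise ValueError("price_per_interaction must be positive")
    return fixed_team_cost_per_month / price_per_interaction


# Hypothetical inputs, for illustration only.
build_monthly = 1_400_000 / 36   # low end of the 3-year build range, spread evenly
price = 0.50                     # assumed vendor price per interaction (USD)

volume = crossover_volume(build_monthly, price)
print(f"Buy stays cheaper below roughly {volume:,.0f} interactions per month")
```

Below the returned volume the buy bill sits under the fixed team line; above it, the comparison flips, which is the only point the sketch is meant to show. Real pricing tiers, discounts, and maintenance costs would shift the number.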

Why do so many AI pilots stall before production in 2026?

The honest reason most pilots stall is the gap named above: the demo proves feasibility; production demands production-ready data, evaluation, governance, monitoring, and a named owner. IDC/Lenovo's CIO Playbook 2025 found 88% of proofs-of-concept never reach production, and MIT NANDA's State of AI in Business 2025 reports 95% of enterprise generative-AI initiatives showed no measurable P&L impact — the gap is the rule, not the exception.

The pilot that wins applause in a demo is not the system that survives its first week in production. A demo has to convince a room of people who want to be convinced. Production has to survive adversarial inputs, regulatory questions (increasingly the EU AI Act for anyone serving EU users), on-call engineers at 3am, a finance team asking what it costs per request, and a security review that wants an audit trail.

MIT NANDA's study frames the drop-off as a funnel, and it is the page's signature stat:

[Funnel: Evaluation 60% → Pilot 20% → Production ~5%]

Of organizations engaging with AI in 2025, roughly 60% evaluated tools, about 20% advanced to a pilot, and only about 5% reached production deployment (MIT NANDA, State of AI in Business 2025).

The real enemy is not a competitor — it is the stalled pilot staying stalled. Harvard Business Review's classic finding, resurfaced across 2026 conversion research (via Unbounce), is that 40–60% of B2B deals are lost to inaction — no decision at all — rather than to a rival. The same physics governs an AI pilot: the most common outcome is not "we picked the wrong path," it is "we never decided, so it sat in the demo until the budget moved on."

The single most-cited technical cause is data, not the model. A pilot is built and demoed on a clean, hand-picked slice of data; production runs on the messy, fragmented, permission-controlled live reality the curated slice was chosen to avoid. This data-readiness gap — missing fields, inconsistent formats, access rules that block the very records the system needs — is the failure cause practitioner post-mortems name most consistently. If the data foundation is not production-ready, no amount of model work ships the pilot.

MIT NANDA's State of AI in Business 2025 is the most-cited evidence of the scale of this gap. It is a multi-method study: the researchers reviewed more than 300 publicly disclosed AI initiatives, conducted 52 structured interviews, and collected 153 survey responses. Its headline finding is that 95% of enterprise generative-AI initiatives produced no measurable profit-and-loss impact, while only about 5% of custom-built tools reached production.

The same report finds that external partnerships reached deployment roughly twice as often as internally built tools — the buy-tilt this page returns to. MIT's research, as read by 2026 analysts (TechAhead), attributes a success rate of roughly 67% to both the buy and the partner paths — versus roughly half that for pure internal build. Treat the exact 67% as the 2026 analyst gloss it is, not a verbatim MIT figure; the robust claim is the ~2× direction, and that it favors both buying and partnering over building alone.

IDC and Lenovo's CIO Playbook 2025 corroborates the direction from a separate dataset: 88% of AI proofs-of-concept never reach production. They report the same graduation rate as a 33-to-4 ratio — for every 33 AI POCs launched, only four reach production — which is the same 88% finding expressed as a shipping rate.
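The equivalence between the 33-to-4 ratio and the 88% figure is simple arithmetic, checked here in a short Python sketch. The variable names are illustrative; the two inputs are the figures quoted above.

```python
from fractions import Fraction

launched, shipped = 33, 4            # POCs launched vs. POCs reaching production

graduation_rate = Fraction(shipped, launched)
failure_rate = 1 - graduation_rate   # share that never reaches production

print(f"{float(graduation_rate):.1%} reach production")
print(f"{float(failure_rate):.1%} never do")
```

Four in 33 is about 12%, so roughly 88% of POCs never ship, which is the headline number.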

The reason this matters for your build-vs-buy-vs-consultancy decision is simple: the bottleneck is almost never the model. It is the production gap — the data readiness, governance, evaluation harness, monitoring, and ownership that turn a clever demo into a system your organization can run, audit, and trust.

Score your pilot in 60 seconds

Before you read the "when each wins" sections, run your own pilot through six factors. Each is phrased as two poles — a build-leaning signal and a buy-leaning signal. For each factor, decide which pole your pilot sits closer to.

One thing to fix in your head first: this is a per-workflow call, not a company-wide doctrine. Most organizations end up buying some workflows, partnering one stalled strategic pilot, and building a rare few — so run this lens against each candidate workflow separately, not once for the whole company. (If you are not sure which AI workflows you even have in flight, a shadow-AI audit surfaces them first.)

Differentiation. Is this workflow your competitive edge, or a commodity task anyone could buy off a shelf? (Edge: build-leaning. Commodity: buy-leaning.)
Data sensitivity and readiness. Does it run on regulated or sensitive data that cannot leave your environment — and is that data actually production-ready, or only clean in the curated demo slice? (Cannot leave, or data foundation still needs work: build/partner-leaning. Fine on a vendor's platform and already clean: buy-leaning.)
Integration depth. How deep does it reach into your live systems? (Deep, stateful integration: build/partner-leaning. Standalone: buy-leaning.)
Team capacity. Do you have a standing team that can maintain this indefinitely? (Yes: build-leaning. No: buy/partner-leaning.)
Speed-to-value. Do you need it live in weeks, or can you wait quarters? (Weeks: buy/partner-leaning. Quarters acceptable: build-leaning.)
Maintenance appetite. Can you absorb roughly 20–30% of build cost in upkeep every year, forever? (Yes: build-leaning. No: buy/partner-leaning.)

Reading rule:

Mostly buy-leaning: Buy. Speed and cost win; let a vendor absorb the production gap.
Mostly build-leaning and you have a standing team: Build. The capability is the advantage and you can fund owning it.
A working-but-stalled pilot with a governance-shaped gap (sensitive data, deep integration, no standing team, needed soon): partner / transfer-of-ownership. Close the gap once, fast, in your stack, owned by you afterward.

If your factors are split and the answer is not obvious, that ambiguity is itself useful signal — the calculator below turns these six judgments into a cost, time-to-value, and readiness-gap estimate for your specific pilot.
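The six factors and the reading rule can be sketched as a small scoring function. This is one possible encoding, not the page's calculator: the field names, the reading of "mostly" as at least four of six, and the choice to check the partner shape first are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PilotAnswers:
    is_competitive_edge: bool           # Differentiation: edge (True) vs. commodity
    data_must_stay_or_needs_work: bool  # Data sensitivity and readiness
    deep_integration: bool              # Integration depth
    has_standing_team: bool             # Team capacity
    needed_in_weeks: bool               # Speed-to-value: weeks (True) vs. quarters
    can_absorb_maintenance: bool        # Maintenance appetite
    works_in_demo_but_stalled: bool     # Context flag, not one of the six factors


def recommend(a: PilotAnswers, majority: int = 4) -> str:
    """Apply the reading rule; 'majority' is an assumed threshold for 'mostly'."""
    build_votes = sum([
        a.is_competitive_edge,
        a.data_must_stay_or_needs_work,
        a.deep_integration,
        a.has_standing_team,
        not a.needed_in_weeks,
        a.can_absorb_maintenance,
    ])
    buy_votes = 6 - build_votes

    # Governance-shaped gap: sensitive data, deep integration,
    # no standing team, needed soon, on a pilot that already works.
    governance_shaped = (
        a.works_in_demo_but_stalled
        and a.data_must_stay_or_needs_work
        and a.deep_integration
        and not a.has_standing_team
        and a.needed_in_weeks
    )
    if governance_shaped:
        return "partner"
    if build_votes >= majority and a.has_standing_team:
        return "build"
    if buy_votes >= majority:
        return "buy"
    return "split"


stalled_pilot = PilotAnswers(
    is_competitive_edge=False,
    data_must_stay_or_needs_work=True,
    deep_integration=True,
    has_standing_team=False,
    needed_in_weeks=True,
    can_absorb_maintenance=False,
    works_in_demo_but_stalled=True,
)
print(recommend(stalled_pilot))
```

A "split" result corresponds to the ambiguous case described above, where the factors do not point clearly at one path.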

When does building in-house genuinely win?

Build only when you can permanently fund a team — and when the AI capability is a durable, proprietary source of competitive advantage rather than a commodity workflow. The counterweight, stated plainly: building is the lowest-success-rate path of the three, so choose it when the ceiling is the point, not as a default.

Build when the model, the data, or the workflow is the product. If your edge depends on a system competitors cannot simply buy off a shelf — a proprietary ranking engine, a domain-specific model trained on data only you hold, a workflow so core that outsourcing it would outsource your moat — then building in-house is correct, and you should not hire a consultancy to do it for you.

The costs are unavoidable, so go in with the numbers in front of you. It is the slowest path (9–18 months once you account for hiring and ramp), the most expensive ($1.4M–$2.3M over three years, with Year 1 alone around $900K–$1.35M), and it carries permanent maintenance of roughly 20–30% of the build cost every year thereafter.

The single most underestimated line item is the evaluation engineering — building the eval suite and test harness, then re-running it against every new model and data shift. SFAI Labs' 2026 make-or-buy decision tree puts eval engineering at 30–40% of total build cost, and it is the most workload-specific part: it cannot be copied from anyone else, because it encodes what "correct" means for your data.

That is roughly a third of the build budget spent on exactly the scaffolding a packaged tool ships with by default. Key-person risk compounds it: AI/ML talent attrition runs around 25–30% per year, so a two-person "we built it ourselves" success can become a "nobody here understands it anymore" liability after a single resignation.
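To see how these percentages translate into dollars and risk, here is a rough Python sketch. It rests on assumptions the page does not state: it treats the low-end Year 1 figure ($900K) as the "build cost" the percentages apply to, and it models each engineer's departure as independent with the same annual probability. The function names are hypothetical.

```python
def share_of(build_cost: float, share: float) -> float:
    """Dollar value of a percentage share of the build cost."""
    return build_cost * share


def p_any_departure(team_size: int, annual_attrition: float) -> float:
    """Chance at least one team member leaves within a year,
    assuming independent departures (a simplifying assumption)."""
    return 1 - (1 - annual_attrition) ** team_size


build_cost = 900_000  # assumed: low end of the Year 1 build estimate

eval_low, eval_high = share_of(build_cost, 0.30), share_of(build_cost, 0.40)
maint_low, maint_high = share_of(build_cost, 0.20), share_of(build_cost, 0.30)

print(f"Eval engineering: ${eval_low:,.0f} to ${eval_high:,.0f}")
print(f"Annual maintenance: ${maint_low:,.0f} to ${maint_high:,.0f}")
for rate in (0.25, 0.30):
    print(f"Two-person team, {rate:.0%} attrition: "
          f"{p_any_departure(2, rate):.0%} chance of losing someone in a year")
```

Under the independence assumption, a two-person team at 25–30% attrition has close to even odds of losing one of its two specialists in any given year, which is why the key-person risk compounds the cost picture.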

When does buying a packaged solution genuinely win in 2026?

For any commodity workflow, start here — the evidence favors it. MIT NANDA's State of AI in Business 2025 found bought solutions reach production roughly twice as often as built ones, which makes buying frequently the higher-probability path to a live system, not just the cheaper one. It is also the fastest path (1–3 months) and the lowest three-year cost at typical volumes ($150K–$500K).

If your problem is document summarization, transcription, support-ticket triage, meeting notes, enterprise search, or any of the dozens of AI workflows now well-served by mature vendors, the honest answer is: buy it. Enterprise search and knowledge assistants are handled by tools like Glean or Microsoft Copilot; legal-style drafting and review by Harvey; CRM- and ITSM-embedded assistants by Salesforce Einstein and ServiceNow Now Assist; high-volume customer support by Decagon.

Do not build what you can configure, and do not pay a consultancy to build what you can subscribe to. A packaged tool gets you to production in weeks, transfers staffing and maintenance risk to the vendor, and costs a fraction of a build over three years.

The honest caveat is that buy's cost is usage-elastic. Per-seat and per-call pricing is cheap at typical volumes but scales with adoption, and at high enough volume it can cross over a build's fixed team cost — with little negotiation leverage once you are locked in. The handful of variables that move that crossover are knowable up front: your interaction volume, the tokens per request, retries, and the vendor's per-seat or per-call price set against a standing team's fixed cost.

The most misleading number in a buy decision is the year-one price; the steady-state bill is the one that matters. Buy stays the cheapest option at normal scale, but know where your crossover point sits before you commit; the calculator computes it from your own numbers.
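The variables named above (interaction volume, tokens per request, retries, and the vendor's price) can be combined into a rough steady-state bill. The Python sketch below is an illustration under stated assumptions: it assumes token-based pricing, treats the retry rate as the average number of extra attempts per request, and uses invented placeholder values throughout. It is not a vendor's actual pricing formula.

```python
def cost_per_interaction(tokens_per_request: float,
                         retry_rate: float,
                         price_per_1k_tokens: float) -> float:
    """Expected cost of one interaction, counting retried attempts."""
    expected_attempts = 1 + retry_rate
    return tokens_per_request * expected_attempts * price_per_1k_tokens / 1000


def steady_state_monthly_bill(interactions_per_month: float,
                              tokens_per_request: float,
                              retry_rate: float,
                              price_per_1k_tokens: float) -> float:
    """Monthly bill once adoption has settled at the given volume."""
    per_interaction = cost_per_interaction(
        tokens_per_request, retry_rate, price_per_1k_tokens
    )
    return interactions_per_month * per_interaction


# Hypothetical inputs, for illustration only.
bill = steady_state_monthly_bill(
    interactions_per_month=100_000,
    tokens_per_request=2_000,
    retry_rate=0.10,
    price_per_1k_tokens=0.01,
)
print(f"Estimated steady-state bill: ${bill:,.2f} per month")
```

The design point is that tokens per request and retries multiply the bill just as volume does, so a pilot that looks cheap at demo volume can look very different once all three grow together.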

What you give up beyond cost-at-scale is control and IP. You inherit the vendor's governance model rather than authoring your own, you are exposed to their roadmap and pricing decisions, and the core capability is theirs, not yours — you own your configuration and your data, but not the engine. For a commodity workflow, that is usually a fair trade; for a strategic differentiator, it is not.

Buy well: check the exit and the fit before you sign. The central risk of the SaaS path is dependency, and it is worth pricing before you commit, not after. One 2026 enterprise survey (Zapier/Centiment, 542 U.S. executives) found 81% of leaders concerned about AI-vendor dependency, yet only 6% confident they could switch providers without material disruption. Before you sign, confirm four things:

Prove it on your reality, not the demo. Run a short paid pilot on your own representative — messy, permission-controlled — production data, not the vendor's curated demo dataset, against written numeric pass/fail accuracy criteria agreed before the demo. This is the single highest-signal check there is.
Confirm the exit. Confirm you can export your data, prompts, and logs on demand and that the contract names a portability and exit path.
Read the renewal clauses. Read the auto-renewal and consumption-tier escalation terms (annual price increases in the 10–30% range are common).
Price the switching cost. Price the cost of unwinding the integration you are about to build, before that integration becomes the lock-in.
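The renewal-clause check in the list above is easier to weigh with the compounding written out. This is a minimal Python sketch: the $100K year-one price is a hypothetical placeholder, the function name is illustrative, and the sketch assumes the increase applies once per year on the prior year's price.

```python
def total_subscription_cost(year_one_price: float,
                            annual_increase: float,
                            years: int = 3) -> float:
    """Sum of yearly prices when the price rises by a fixed rate each renewal."""
    return sum(year_one_price * (1 + annual_increase) ** y for y in range(years))


year_one = 100_000  # hypothetical year-one contract price (USD)

for rate in (0.0, 0.10, 0.30):
    total = total_subscription_cost(year_one, rate)
    print(f"{rate:.0%} annual increase: ${total:,.0f} over three years")
```

At the top of the 10–30% range, the three-year total lands about a third above what the flat year-one price implies, which is why the escalation terms belong in the comparison before signing.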

A buy decision made with the exit understood and the vendor proven on your own data is a strong decision — this is how you choose Buy well, not a reason to avoid it.

When does hiring a consultancy genuinely win?

A consultancy fits cleanly in one situation: a pilot that already works in the demo but is stalled before production, where the missing pieces are production-ready data, governance, an evaluation harness, monitoring, and an owned path to ship. It is a one-time $150K–$400K engagement over roughly 6–12 weeks, not a permanent cost center — and what you are buying is transfer-of-ownership, not a subscription.

Analysts now call this the partner or build-operate-transfer path: a third party builds an asset you own, with IP assigned to you and knowledge transferred to your team, rather than a vendor renting you a black box. The visual shape of the engagement is an ownership baton that travels from the partner to you:

Partner builds → Built with your team → Handover (IP + named owner) → You own & operate

Build-operate-transfer: the partner builds the production scaffolding in your stack, with your team, then hands over the IP and a named internal owner — so you own and operate the system afterward, not rent it.

The consultancy is the wrong tool for a commodity workflow (buy that) and the wrong tool for building your permanent core moat (staff that). Its genuine fit is narrower: you have momentum, a pilot that works, but you lack the production scaffolding to ship it, and you lack the standing team to build that scaffolding quickly without it becoming a 12-month hiring project.

A good engagement closes the production gap with your people, documents the governance and audit trail as an explicit deliverable, and leaves your team able to maintain what it inherits — moving you up the AI governance maturity curve in the process.

What this looks like in practice. Two anonymized patterns, shaped to the horizontal niche this page addresses — teams on regulated or sensitive data, no industry attached:

A team working on access-controlled, permission-sensitive data had a pilot that demoed cleanly for roughly eight months and stalled on exactly two things: it had no evaluation harness to prove it stayed accurate, and no audit trail a compliance review would accept. The gap closed in about nine weeks — harness, governance documentation, and monitoring built in their own stack — and the system shipped under their own ownership, run by a named internal engineer who was in the room throughout.
A team with a working pilot on sensitive internal records was about to staff a multi-quarter in-house build to productionize it. A Readiness Review found the gap was governance-and-monitoring-shaped, not a research problem, and scoped it to a fixed engagement; the owned scaffolding was handed over in weeks rather than the hiring project they had budgeted.

These are pattern-shaped, not testimonials — the point is the shape of the work repeats.

What you actually receive. A Readiness Review is a written gap-map across data readiness, evaluation, governance, monitoring, and ownership (the things that decide whether a pilot ships), mapped to whichever framework governs you (NIST AI RMF or the EU AI Act), plus a go/no-go recommendation and a cost-and-timeline estimate to close each gap. A full engagement hands over the built scaffolding in your stack (the evaluation harness, monitoring, and governance controls), the governance and audit documentation, the running system itself, and a knowledge-transfer handover to a named internal owner. These are artifacts you keep, not a slide deck of advice.

What the Readiness Review itself costs. Because both CTAs on this page ask you to start here, its shape should be as honest as the full-engagement band. The Readiness Review is a fixed-fee, bounded engagement — a low-four-figure to low-five-figure review depending on the pilot's complexity — delivered in a matter of days to about two weeks, and creditable against a full engagement if you proceed. It is deliberately priced as an on-ramp you can approve without a procurement cycle.

Where the boundary is — and what happens if your gap is the hard one. The 6–12 week, $150K–$400K band assumes a specific shape: a pilot that genuinely works in the demo and a gap that is data/eval/governance/monitoring-shaped, not a fundamentally broken model or an unsolved research problem. If the underlying model does not actually work, or the data foundation needs rebuilding from scratch, that is a different and larger piece of work. The Readiness Review exists precisely to confirm the gap is the shippable shape before a fixed-scope engagement is quoted. If it is not, the Review says so, and you have spent a small, bounded amount to learn that rather than a large one.

Why a partner is faster than the in-house attempt that already stalled. The production gap is not a novel research problem — it is a repeatable, pattern-based body of work (eval harness, governance, monitoring, handover) that a team which has done it before assembles from existing templates rather than inventing from scratch, with no hiring ramp. The in-house attempt is slow precisely because it is a first-time build competing with the day job; a partner is fast because it is the hundredth time, not the first.

Where your data lives — and who gets inside the walls. Because the work happens in your own environment and stack, your data stays under your control — it does not move to a vendor platform. Engagements run under NDA, and for teams on regulated or sensitive data the audit trail itself is one of the deliverables, not an afterthought. The people-access question — who gets credentials, and at what level — is yours to control, not the partner's: your security team scopes and grants partner access under least-privilege, inside your environment, with your offboarding process.

Why hand the production outcome to a partner rather than rent contractors? A budget-holder will reasonably ask why not hire one or two senior ML contractors for less. Sometimes that is the right call — if you already know exactly what is missing and just need hands to build it, contractors can be cheaper and entirely sufficient. The honest distinction: contractors are individuals you direct and who carry the same key-person risk a permanent hire does — when they leave, the knowledge leaves with them. A partner brings a pre-built methodology and accountability for the production outcome itself, and bakes the handover to a named internal owner into the engagement.

Why the handover actually sticks. Knowledge transfer fails when it is bolted on at the end as a contract clause the buyer has to police alone. It sticks when the system is built with your team rather than for them, documented as it is built, designed to be maintainable, and assigned to a named internal owner identified during the work, not after. "Owned by you" should be verifiable in the artifacts you hold — running system, documentation, a named owner who was in the room — not a promise you have to chase.

This is what the Goldsmith Method for Production AI is built to do: treat the production gap as the deliverable, not the model. It works backward from "what does a system need to survive a security review, a finance review, and an on-call rotation" — data readiness, evaluation, governance, monitoring, ownership — and builds exactly that scaffolding around your existing pilot, in your environment, with a handover that leaves your team able to run it. Each stage produces an inspectable artifact, which is what gives the "hundredth time" claim something concrete behind it.

There is a second, increasingly common shape worth naming: buy a mature base, then build your own differentiating layer on top. The dominant 2026 practitioner consensus is that the right architecture for most strategic pilots is neither pure-build nor pure-buy — it is a bought foundation (a base platform or model) plus a thin owned layer of prompts, retrieval, integrations, and human checks that encode your advantage. A partner engagement is frequently exactly how that owned layer gets built and governed on top of a bought base.

The honesty test for any consultancy, including the one writing this page: if your situation is "buy the SaaS" or "build the moat in-house," a good consultancy should tell you so — and a go/no-go Readiness Review that concludes you should buy or build instead is a valid deliverable, not a failed sale. Read the table above and you will see Buy winning several rows outright — that is the comparison working as intended.

Why trust an unnamed team with a stalled, regulated-data pilot?

This page is published under a team name, not a personal brand, and the fair question from any budget-holder is: anyone can write a polished page — why should I believe you can do this? The honest answer is that you should not have to trust the brand up front. The engagement is structured so the first thing you receive is something you keep and can evaluate before you commit to anything larger.

Judge the artifacts, not the name. The Readiness Review is a written document — a gap-map, a go/no-go verdict, and a cost-and-timeline estimate — that is yours whether or not you proceed to a build. You evaluate the quality of that document directly; nothing about the decision rests on taking our word.
The track record has a shape you can recognize. The two anonymized patterns above are the repeatable shape of the work, not one-off luck. The methodology that produces them is itself inspectable: the gap-map template and stage deliverables can be reviewed before you commit.
The walk-away is the deliverable, not a failure. If the Readiness Review concludes you should buy a packaged tool or build in-house, that conclusion is the work product — you owe nothing further.
The handover is verifiable. The named internal owner is identified during the work, not promised after it, and that person can confirm the handover stuck. "Owned by you" is checkable in the artifacts you hold.

The experience behind the method. Anonymity is about the author, not the depth behind the work. The team has authored published technical books and video courses on IoT and cloud architecture (Éditions ENI), has trained engineers at Fortune-500 industrial, energy, and luxury-retail organizations and delivered production systems for them, and operates its own production AI agents: the Labs advisor and the ROI calculator on this very site are systems we built, run, and govern under the same method this page describes. The point is not credentials for their own sake; it is that the production-gap work described here is drawn from systems actually shipped and run, not theorized.

Who you actually contract with. Anonymity is about the author, not the entity. The engagement is signed with a registered legal company — QAIZEN TECH DWC-LLC — so your procurement and security teams have a real counterparty to onboard: an NDA and MSA to sign, a named jurisdiction, and the standard procurement-grade basics handled the way any enterprise vendor review expects. The team is anonymous; the company you sign with is not.

You don't have to trust us up front. You trust the structure: a real, signable entity, a track record with a recognizable shape, and a bounded first deliverable you keep and can judge before committing to the full build.

How we built this comparison: the cost and timeline bands are triangulated from 2026 practitioner and agency estimates (Xenoss, SFAI Labs), a structurally neutral salary input (Glassdoor mid-2025), and the primary research from MIT NANDA's State of AI in Business 2025 and IDC/Lenovo's CIO Playbook 2025, with each figure labelled inline by source and stake. Reviewed by the QAIZEN AI Governance Team, June 2026.

READINESS REVIEW

Ready to find out if your pilot can ship?

If your pilot works in the demo and stalls before production, a Readiness Review maps the exact data, governance, evaluation, and ownership gaps standing between it and a live system — delivered as a written gap-map plus a go/no-go verdict and a cost-and-timeline estimate, in weeks, not quarters. It is a fixed-fee, bounded first step you keep.

Book a Readiness Review

Prefer to scope it yourself first? The ROI calculator estimates the cost, time-to-value, and readiness gap for your pilot in a few minutes.

How do I decide which path fits my pilot? (decision summary)

Start from the reframe in the scorecard above: the right answer is decided per workflow, not picked once for the whole company. Most organizations land on a mix — buy several commodity workflows, partner one stalled strategic pilot, build a rare few — so the comparison table is a lens you run against each candidate workflow, not a single company-wide verdict.

The core verdict maps onto a simple 2×2 — where a workflow sits on differentiation and on its production-gap shape decides the path:

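Horizontal axis (differentiation): Commodity → Moat
Vertical axis (production-gap shape): Have team / clear path → Stalled, governance gap, no team

Stalled, governance gap, no team + Commodity: BUY. Stalled but commodity: still buy; don't partner a commodity workflow.
Stalled, governance gap, no team + Moat: PARTNER. Build-operate-transfer, owned by you afterward.
Have team / clear path + Commodity: BUY. Commodity workflow; the vendor absorbs the production gap.
Have team / clear path + Moat: BUILD. Durable moat plus a standing team to own it permanently.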

Build/buy/partner decision map. Commodity workflows → Buy (the vendor absorbs the gap). A durable moat with a standing team → Build. A stalled, governance-shaped pilot with no standing team → Partner (build-operate-transfer), owned by you afterward. A stalled commodity workflow is still a Buy — don't partner a commodity.

Within that, the highest-leverage move is to sequence these, not pick one. Buy the commodity workflows first, bring a partner in to ship the one stalled strategic pilot that's blocking you, and only stand up an in-house team once your AI footprint is large enough to justify the permanent payroll. That ordering minimizes spend while you learn which capabilities are actually worth owning — and it maps cleanly onto the three cases:

Commodity workflow: Buy. Speed and cost win; the vendor absorbs the production gap. Don't build it, don't pay anyone to build it.
Durable proprietary moat + standing team: Build. Accept the cost, time, and maintenance because the capability is the advantage. Don't outsource your moat.
Stalled-but-working pilot, governance-shaped gap, no standing team: Partner (consultancy, build-operate-transfer). Close the production gap once, fast, in your stack, owned by you afterward. This is where the Goldsmith Method for Production AI fits.
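
The decision map and the buy-first sequencing above can be sketched as a few lines of Python. This is a minimal illustration, not a tool the page ships: the names `Workflow`, `Path`, `choose_path`, and `sequence`, the two boolean fields, and the example workflows are all hypothetical, and collapsing each axis of the 2×2 to a yes/no answer is a simplifying assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Path(Enum):
    BUY = "buy"
    PARTNER = "partner"
    BUILD = "build"


@dataclass(frozen=True)
class Workflow:
    name: str
    is_moat: bool          # True = durable differentiator, False = commodity
    stalled_no_team: bool  # True = stalled, governance-shaped gap, no standing team


def choose_path(w: Workflow) -> Path:
    """Place one workflow on the 2x2 decision map."""
    # Commodity workflows are a Buy whether or not they are stalled.
    if not w.is_moat:
        return Path.BUY
    # Differentiating, but stalled with no standing team: Partner.
    if w.stalled_no_team:
        return Path.PARTNER
    # Differentiating, with a standing team to own it: Build.
    return Path.BUILD


# Ordering suggested in the text: buy first, partner second, build last.
SEQUENCE = {Path.BUY: 0, Path.PARTNER: 1, Path.BUILD: 2}


def sequence(workflows: list[Workflow]) -> list[tuple[str, Path]]:
    """Decide each workflow separately, then order the resulting work."""
    decided = [(w.name, choose_path(w)) for w in workflows]
    return sorted(decided, key=lambda item: SEQUENCE[item[1]])


if __name__ == "__main__":
    # Hypothetical portfolio, for illustration only.
    portfolio = [
        Workflow("pricing engine", is_moat=True, stalled_no_team=False),
        Workflow("claims triage pilot", is_moat=True, stalled_no_team=True),
        Workflow("meeting notes", is_moat=False, stalled_no_team=True),
    ]
    for name, path in sequence(portfolio):
        print(f"{name}: {path.value}")
```

The function runs once per workflow rather than once per company, which mirrors the per-workflow framing above. Real workflows rarely reduce to two clean booleans, so treat the output as a starting point for the conversation rather than a verdict.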

If you're genuinely unsure which bucket a given workflow is in, that uncertainty is itself useful signal — the FAQ below answers the questions that decide it.

A faster way to test the consultancy fit

Before you book anything, you can pressure-test the thinking live. QAIZEN Labs hosts a free AI advisor you can interrogate about your specific pilot: describe where it's stalled, ask it to challenge your build-vs-buy instinct, and see whether the production gap it surfaces matches your own read. No booking, no form, just a place to test your reasoning, running the same Goldsmith Method lens.

Pressure-test your pilot on QAIZEN Labs →

Get a number for your own pilot

Every figure on this page is a range. Your pilot sits at one point inside them — run your own numbers to find it. The ROI calculator estimates the cost, time-to-value, and readiness gap for your pilot — free, in minutes.

Estimate your pilot with the ROI calculator
Book a Readiness Review

Already know you want an independent review? Use the secondary action above, or pressure-test your thinking on QAIZEN Labs first.

Sources

[1] MIT NANDA. "The State of AI in Business 2025". MIT, July 15, 2025.
[2] IDC / Lenovo. "CIO Playbook 2025". Lenovo, April 10, 2025.
[3] SFAI Labs. "AI Make-or-Buy Decision Tree 2026". SFAI Labs, February 20, 2026.
[4] Xenoss. "AI Partnership Engagement Benchmarks 2026". Xenoss, March 5, 2026.
[5] Glassdoor. "Senior Machine Learning Engineer Salary (US)". Glassdoor, June 30, 2025.
[6] Zapier / Centiment. "AI Vendor Dependency Survey 2026". Zapier, January 22, 2026.

Frequently Asked Questions

AI Governance