Build vs. Buy vs. Hire a Consultancy in 2026: Which Gets Your AI Pilot to Production?
Your AI pilot works in the demo and stalls before production. An honest, sourced 2026 comparison of building in-house, buying SaaS (Glean, Copilot, Harvey), or hiring a consultancy (the partner / build-operate-transfer path) — with a real cost-and-time table and when each genuinely wins.
QAIZEN
AI Governance Team
The production gap
The distance between an AI pilot that works in a demo and a system that survives production — production-ready data, an evaluation harness, governance and an audit trail, monitoring, and a named owner. It is where most pilots stall, and the three paths (build, buy, partner) are three answers to who closes that gap and who owns it afterward.
95% of enterprise GenAI initiatives showed no measurable P&L impact (Source: MIT NANDA 2025)
88% of AI proofs-of-concept never reach production (Source: IDC/Lenovo 2025)
~2×: how much more often bought and partnered solutions reach production than built ones (Source: MIT NANDA 2025)
- Most AI pilots never ship: 95% of enterprise GenAI initiatives showed no measurable P&L impact (MIT NANDA 2025).
- Your real 2026 choice is rarely "which model" — it is build in-house, buy a packaged tool, or partner (build-operate-transfer) to close the production gap.
- Bought and partnered solutions reach production roughly twice as often as internally built ones (MIT NANDA 2025).
- Buy commodity workflows; build a durable proprietary moat with a standing team; partner a stalled, governance-shaped pilot you want to own.
- The decision is per-workflow, not company-wide — and the calculator turns the ranges below into a number for your specific pilot.
What is the honest cost and timeline of build vs. buy vs. consultancy in 2026?
Most pilots stall for the same reason: the demo and the production system are different engineering problems. The gap between them is production-ready data, evaluation, governance, monitoring, and a named owner.
The three paths below are really three answers to who closes that gap and who owns it afterward. Here is what each costs and how long it takes, on the same units.
The single most uncomfortable fact to put on the table up front: MIT NANDA's State of AI in Business 2025 found that purchased AI solutions reached production roughly twice as often as internally built ones. If your use case is a commodity workflow, buying is not the lazy answer — it is statistically the more likely path to a system that actually ships.
For a commodity workflow, a mature 2026 tool like Glean or Microsoft Copilot (or Harvey for legal-style drafting, Salesforce Einstein and ServiceNow Now Assist for CRM/ITSM-embedded assistants, Decagon for support automation) already ships the evaluation, governance scaffolding, and monitoring that an in-house build would have to write from scratch.
As of mid-2026, here is how the three paths compare on like-for-like units:
| Criterion | Build in-house | Buy (SaaS / packaged) | Hire a consultancy (partner / BOT) |
|---|---|---|---|
| Time-to-production (months) | 9–18 months — hiring, ramp-up, and the production gap are all yours | Strongest option: 1–3 months — fastest; configure and integrate, not build | 1.5–3 months (≈6–12 weeks) — gap closed with you, then handed over |
| Total cost (3-year) | Weakest on this axis: $1.4M–$2.3M (Year 1 ≈ $900K–$1.35M) | Strongest option: $150K–$500K — lowest at typical volumes; usage-elastic at scale | $150K–$400K one-time engagement, then your run-cost only |
| Governance / audit trail | Weakest on this axis: DIY — exists only if you staff and prioritize it | Vendor-defined — you inherit their controls, not yours | Strongest option: Deliverable — specified, built, and documented as part of scope |
| Key-person risk | Weakest on this axis: High — concentrated in a small specialist team; AI/ML attrition runs ~25–30%/yr | Strongest option: Low — vendor absorbs staffing risk | Medium — mitigated only if knowledge transfer is contracted |
| Maintenance burden | Weakest on this axis: You (permanent) — ~20–30% of build cost per year, forever | Strongest option: Vendor — included in subscription | You (post-handover) — but on a system designed to be maintainable |
| Control & IP | Strongest option: Full — you own every line and every decision | Weakest on this axis: Low — vendor owns the core; you own configuration and data | High (if contracted) — built in your stack, IP assigned to you at handover |
| Best for | Durable proprietary advantage + a standing ML team | Commodity workflows where speed and low cost win | Stalled pilots needing governance and a fast, owned path to production |
A few notes on reading this table honestly:
- No single column wins on every criterion. Buy wins decisively on speed and key-person risk, and on cost at typical volumes, but loses on control and IP. Build wins on control and durable advantage, but is the slowest and most expensive. A consultancy wins on closing a governance-shaped production gap quickly while leaving you the owner — but it is not free, and not the right tool for a commodity workflow.
- Buy is cheapest at normal scale, but its cost is usage-elastic. Per-seat and per-call pricing scales with adoption, and at high volume it can cross over a build's fixed team cost. Directionally, your monthly buy cost is roughly interaction-volume × price-per-interaction, while a build's cost is a fixed standing-team line that barely moves with volume. The headline year-one price is the most misleading number in any buy decision. Buy still wins the cost row at typical volumes; where your own crossover point sits depends on those variables, and the calculator estimates it.
- Cost ranges are 2026 practitioner and agency estimates — every source for them has a commercial stake; none is "vendor-neutral." The consultancy band sits inside published 2026 partner-engagement ranges (Xenoss puts a strategic AI partnership at $100K–$500K initial). The one structurally neutral input is the senior-ML labor line: Glassdoor's mid-2025 data put the median U.S. senior machine-learning engineer total compensation at roughly $207K.
- Time rows are unit-normalized to months across all three columns so you are comparing like with like, not "weeks" against "quarters."
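The crossover described in the notes above can be sketched in a few lines. This is a minimal illustration, not the calculator referenced on this page: the function names and every dollar figure are hypothetical placeholders, and it assumes a flat price per interaction and a build cost that does not move with volume.

```python
def monthly_buy_cost(interactions_per_month: float, price_per_interaction: float) -> float:
    """Usage-elastic cost: scales linearly with interaction volume."""
    return interactions_per_month * price_per_interaction


def crossover_volume(build_monthly_cost: float, price_per_interaction: float) -> float:
    """Monthly volume at which buying costs as much as the fixed standing team."""
    if price_per_interaction <= 0:
        raise ValueError("price_per_interaction must be positive")
    return build_monthly_cost / price_per_interaction


# Hypothetical inputs, for illustration only.
BUILD_MONTHLY = 50_000.0   # assumed fixed standing-team cost per month
PRICE = 0.25               # assumed vendor price per interaction

breakeven = crossover_volume(BUILD_MONTHLY, PRICE)
print(f"Buy stays cheaper below {breakeven:,.0f} interactions/month")
```

Below the break-even volume the buy bill is smaller than the team line; above it, the fixed team becomes the cheaper option under these simplified assumptions.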
Why do so many AI pilots stall before production in 2026?
The honest reason most pilots stall is the gap named above: the demo proves feasibility; production demands production-ready data, evaluation, governance, monitoring, and a named owner. IDC/Lenovo's CIO Playbook 2025 found 88% of proofs-of-concept never reach production, and MIT NANDA's State of AI in Business 2025 reports 95% of enterprise generative-AI initiatives showed no measurable P&L impact — the gap is the rule, not the exception.
The pilot that wins applause in a demo is not the system that survives its first week in production. A demo has to convince a room of people who want to be convinced. Production has to survive adversarial inputs, regulatory questions, on-call engineers at 3am, a finance team asking what it costs per request, and a security review that wants an audit trail.
The real enemy is not a competitor — it is the stalled pilot staying stalled. Harvard Business Review's classic finding, resurfaced across 2026 conversion research (via Unbounce), is that 40–60% of B2B deals are lost to inaction — no decision at all — rather than to a rival. The same physics governs an AI pilot: the most common outcome is not "we picked the wrong path," it is "we never decided, so it sat in the demo until the budget moved on."
The single most-cited technical cause is data, not the model. A pilot is built and demoed on a clean, hand-picked slice of data; production runs on the messy, fragmented, permission-controlled live reality the curated slice was chosen to avoid. This data-readiness gap — missing fields, inconsistent formats, access rules that block the very records the system needs — is the failure cause practitioner post-mortems name most consistently. If the data foundation is not production-ready, no amount of model work ships the pilot.
MIT NANDA's State of AI in Business 2025 is the most-cited evidence of the scale of this gap. It is a multi-method study — the researchers reviewed more than 300 publicly disclosed AI initiatives, conducted 52 structured interviews, and ran 153 survey responses. Its headline finding is that 95% of enterprise generative-AI initiatives produced no measurable profit-and-loss impact, while only about 5% of custom-built tools reached production.
The same report finds that external partnerships reached deployment roughly twice as often as internally built tools — the buy-tilt this page returns to. MIT's research, as read by 2026 analysts (TechAhead), attributes a success rate of roughly 67% to both the buy and the partner paths — versus roughly half that for pure internal build. Treat the exact 67% as the 2026 analyst gloss it is, not a verbatim MIT figure; the robust claim is the ~2× direction, and that it favors both buying and partnering over building alone.
IDC and Lenovo's CIO Playbook 2025 corroborates the direction from a separate dataset: 88% of AI proofs-of-concept never reach production. They report the same graduation rate as a 33-to-4 ratio — for every 33 AI POCs launched, only four reach production — which is the same 88% finding expressed as a shipping rate.
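The equivalence between the 33-to-4 ratio and the 88% figure is simple arithmetic, shown here as a quick check. The variable names are illustrative; the two input numbers are the ones quoted above.

```python
launched = 33            # AI POCs launched
reached_production = 4   # of those, the number that reach production

shipping_rate = reached_production / launched   # share of POCs that ship
stall_rate = 1 - shipping_rate                  # share that never reach production

print(f"shipping rate: {shipping_rate:.0%}")
print(f"stall rate:    {stall_rate:.0%}")
```

Four out of 33 is about 12%, which leaves about 88% that never ship.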
The reason this matters for your build-vs-buy-vs-consultancy decision is simple: the bottleneck is almost never the model. It is the production gap — the data readiness, governance, evaluation harness, monitoring, and ownership that turn a clever demo into a system your organization can run, audit, and trust.
Score your pilot in 60 seconds
Before you read the "when each wins" sections, run your own pilot through six factors. Each is phrased as two poles — a build-leaning signal and a buy-leaning signal. For each factor, decide which pole your pilot sits closer to.
One thing to fix in your head first: this is a per-workflow call, not a company-wide doctrine. Most organizations end up buying some workflows, partnering one stalled strategic pilot, and building a rare few — so run this lens against each candidate workflow separately, not once for the whole company.
- Differentiation. Is this workflow your competitive edge, or a commodity task anyone could buy off a shelf? (Edge: build-leaning. Commodity: buy-leaning.)
- Data sensitivity and readiness. Does it run on regulated or sensitive data that cannot leave your environment — and is that data actually production-ready, or only clean in the curated demo slice? (Cannot leave, or data foundation still needs work: build/partner-leaning. Fine on a vendor's platform and already clean: buy-leaning.)
- Integration depth. How deep does it reach into your live systems? (Deep, stateful integration: build/partner-leaning. Standalone: buy-leaning.)
- Team capacity. Do you have a standing team that can maintain this indefinitely? (Yes: build-leaning. No: buy/partner-leaning.)
- Speed-to-value. Do you need it live in weeks, or can you wait quarters? (Weeks: buy/partner-leaning. Quarters acceptable: build-leaning.)
- Maintenance appetite. Can you absorb roughly 20–30% of build cost in upkeep every year, forever? (Yes: build-leaning. No: buy/partner-leaning.)
Reading rule:
- Mostly buy-leaning: Buy. Speed and cost win; let a vendor absorb the production gap.
- Mostly build-leaning and you have a standing team: Build. The capability is the advantage and you can fund owning it.
- A working-but-stalled pilot with a governance-shaped gap (sensitive data, deep integration, no standing team, needed soon): partner / transfer-of-ownership. Close the gap once, fast, in your stack, owned by you afterward.
If your factors are split and the answer is not obvious, that ambiguity is itself useful signal — the calculator below turns these six judgments into a cost, time-to-value, and readiness-gap estimate for your specific pilot.
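As a rough illustration, the six-factor reading rule can be written as a small function. This is a sketch under stated assumptions, not the logic of the calculator: the factor keys, the `recommend_path` name, and the "at least four of six" reading of "mostly" are all hypothetical choices made for the example.

```python
from collections import Counter

FACTORS = (
    "differentiation",
    "data_sensitivity",
    "integration_depth",
    "team_capacity",
    "speed_to_value",
    "maintenance_appetite",
)


def recommend_path(leanings: dict[str, str], pilot_stalled: bool) -> str:
    """Apply the reading rule to six per-factor judgments.

    leanings maps each factor to "build" or "buy" (the pole it sits closer to).
    The thresholds below are illustrative assumptions.
    """
    missing = set(FACTORS) - set(leanings)
    if missing:
        raise ValueError(f"missing factors: {sorted(missing)}")

    # Partner pattern: a working-but-stalled pilot with a governance-shaped gap.
    governance_shaped = (
        leanings["data_sensitivity"] == "build"        # sensitive or unready data
        and leanings["integration_depth"] == "build"   # deep, stateful integration
        and leanings["team_capacity"] == "buy"         # no standing team
        and leanings["speed_to_value"] == "buy"        # needed in weeks
    )
    if pilot_stalled and governance_shaped:
        return "partner"

    counts = Counter(leanings[f] for f in FACTORS)
    if counts["buy"] >= 4:                             # assumed meaning of "mostly"
        return "buy"
    if counts["build"] >= 4 and leanings["team_capacity"] == "build":
        return "build"
    return "unclear"                                   # split factors: dig further
```

An "unclear" result corresponds to the split case described above, where the ambiguity itself is the signal to look at cost and timing in more detail.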
When does building in-house genuinely win?
Build only when you can permanently fund a team — and when the AI capability is a durable, proprietary source of competitive advantage rather than a commodity workflow. The counterweight, stated plainly: building is the lowest-success-rate path of the three, so choose it when the ceiling is the point, not as a default.
Build when the model, the data, or the workflow is the product. If your edge depends on a system competitors cannot simply buy off a shelf — a proprietary ranking engine, a domain-specific model trained on data only you hold, a workflow so core that outsourcing it would outsource your moat — then building in-house is correct, and you should not hire a consultancy to do it for you.
The costs are unavoidable, so go in with the numbers in front of you. It is the slowest path (9–18 months once you account for hiring and ramp), the most expensive ($1.4M–$2.3M over three years, with Year 1 alone around $900K–$1.35M), and it carries permanent maintenance of roughly 20–30% of the build cost every year thereafter.
The single most underestimated line item is the evaluation engineering — building the eval suite and test harness, then re-running it against every new model and data shift. SFAI Labs' 2026 make-or-buy decision tree puts eval engineering at 30–40% of total build cost, and it is the most workload-specific part: it cannot be copied from anyone else, because it encodes what "correct" means for your data.
That is roughly a third of the build budget spent on exactly the scaffolding a packaged tool ships with by default. Key-person risk compounds it: AI/ML talent attrition runs around 25–30% per year, so a two-person "we built it ourselves" success can become a "nobody here understands it anymore" liability after a single resignation.
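To make the key-person risk concrete, here is a minimal sketch of the chance that a small team loses at least one member in a year. It assumes each person leaves independently with the same annual probability, which is a simplification; the function name is hypothetical.

```python
def prob_at_least_one_departure(team_size: int, annual_attrition: float) -> float:
    """Chance that one or more people leave within a year.

    Assumes independent departures with equal probability (a simplification).
    """
    if not 0 <= annual_attrition <= 1:
        raise ValueError("annual_attrition must be between 0 and 1")
    return 1 - (1 - annual_attrition) ** team_size


for rate in (0.25, 0.30):
    p = prob_at_least_one_departure(team_size=2, annual_attrition=rate)
    print(f"attrition {rate:.0%}: {p:.0%} chance of losing at least one of two people")
```

Under these assumptions, a two-person team at 25–30% annual attrition has roughly a 44–51% chance of losing someone within a year.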
When does buying a packaged solution genuinely win in 2026?
For any commodity workflow, start here — the evidence favors it. MIT NANDA's State of AI in Business 2025 found bought solutions reach production roughly twice as often as built ones, which makes buying frequently the higher-probability path to a live system, not just the cheaper one. It is also the fastest path (1–3 months) and the lowest three-year cost at typical volumes ($150K–$500K).
If your problem is document summarization, transcription, support-ticket triage, meeting notes, enterprise search, or any of the dozens of AI workflows now well-served by mature vendors, the honest answer is: buy it. Enterprise search and knowledge assistants are handled by tools like Glean or Microsoft Copilot; legal-style drafting and review by Harvey; CRM- and ITSM-embedded assistants by Salesforce Einstein and ServiceNow Now Assist; high-volume customer support by Decagon.
Do not build what you can configure, and do not pay a consultancy to build what you can subscribe to. A packaged tool gets you to production in one to three months, transfers staffing and maintenance risk to the vendor, and costs a fraction of a build over three years.
The honest caveat is that buy's cost is usage-elastic. Per-seat and per-call pricing is cheap at typical volumes but scales with adoption, and at high enough volume it can cross over a build's fixed team cost — with little negotiation leverage once you are locked in. The handful of variables that move that crossover are knowable up front: your interaction volume, the tokens per request, retries, and the vendor's per-seat or per-call price set against a standing team's fixed cost.
The most dangerous number in a buy decision is the year-one price, not the steady-state bill. So buy stays the cheapest option at normal scale, but know where your crossover point sits before you commit; the calculator computes it from your own numbers.
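The way adoption growth pushes a usage bill toward a fixed team cost can be sketched as follows. This is an illustration only: it assumes per-token pricing, steady compound growth in volume, and retries that are billed like first attempts. The function names and parameters are hypothetical and do not describe any specific vendor's pricing.

```python
from typing import Optional


def monthly_usage_bill(interactions: float, tokens_per_request: float,
                       retry_rate: float, price_per_1k_tokens: float) -> float:
    """Estimated monthly bill under per-token pricing (an assumed pricing model)."""
    billable_requests = interactions * (1 + retry_rate)  # retries are billed too
    return billable_requests * tokens_per_request / 1000 * price_per_1k_tokens


def first_crossover_month(start_interactions: float, monthly_growth: float,
                          team_monthly_cost: float, horizon_months: int = 36,
                          **pricing: float) -> Optional[int]:
    """First month (1-based) where the usage bill exceeds the fixed team cost.

    Returns None if the bill stays below the team cost for the whole horizon.
    """
    interactions = start_interactions
    for month in range(1, horizon_months + 1):
        if monthly_usage_bill(interactions, **pricing) > team_monthly_cost:
            return month
        interactions *= 1 + monthly_growth  # compound adoption growth
    return None


# Hypothetical inputs, for illustration only.
month = first_crossover_month(
    start_interactions=100_000, monthly_growth=0.10, team_monthly_cost=40_000,
    tokens_per_request=2_000, retry_rate=0.10, price_per_1k_tokens=0.01,
)
print("crossover month:", month)
```

A `None` result means the year-one price is still representative across the horizon; an early crossover month means the steady-state bill, not the headline price, should drive the decision.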
What you give up beyond cost-at-scale is control and IP. You inherit the vendor's governance model rather than authoring your own, you are exposed to their roadmap and pricing decisions, and the core capability is theirs, not yours — you own your configuration and your data, but not the engine. For a commodity workflow, that is usually a fair trade; for a strategic differentiator, it is not.
Buy well: check the exit and the fit before you sign. The central risk of the SaaS path is dependency, and it is worth pricing before you commit, not after. One 2026 enterprise survey (Zapier/Centiment, 542 U.S. executives) found 81% of leaders concerned about AI-vendor dependency, yet only 6% confident they could switch providers without material disruption. Before you sign, confirm four things:
- Prove it on your reality, not the demo. Run a short paid pilot on your own representative production data (messy and permission-controlled), not the vendor's curated demo dataset, against written numeric pass/fail accuracy criteria agreed before the pilot starts. This is the single highest-signal check there is.
- Confirm the exit. Confirm you can export your data, prompts, and logs on demand and that the contract names a portability and exit path.
- Read the renewal clauses. Read the auto-renewal and consumption-tier escalation terms (annual price increases in the 10–30% range are common).
- Price the switching cost. Price the cost of unwinding the integration you are about to build, before that integration becomes the lock-in.
A buy decision made with the exit understood and the vendor proven on your own data is a strong decision — this is how you choose Buy well, not a reason to avoid it.
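Written numeric pass/fail criteria can be as simple as a list of thresholds checked against measured pilot results. The sketch below is one possible shape, with hypothetical metric names and threshold values; it is not a vendor API or a prescribed evaluation method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    metric: str      # hypothetical metric name, e.g. "answer_accuracy"
    minimum: float   # pass threshold agreed in writing up front


def evaluate_pilot(criteria: list[Criterion], measured: dict[str, float]) -> dict[str, bool]:
    """Compare measured pilot results against the pre-agreed thresholds.

    A metric that was never measured counts as a failure.
    """
    return {
        c.metric: measured.get(c.metric, float("-inf")) >= c.minimum
        for c in criteria
    }


# Illustrative values only.
criteria = [
    Criterion("answer_accuracy", 0.90),
    Criterion("citation_correctness", 0.95),
]
measured = {"answer_accuracy": 0.93, "citation_correctness": 0.91}

results = evaluate_pilot(criteria, measured)
print(results)
print("PASS" if all(results.values()) else "FAIL")
```

Fixing the thresholds before results exist is the point: the pilot either clears every agreed bar or it does not, and nobody renegotiates "good enough" after seeing the numbers.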
When does hiring a consultancy genuinely win?
A consultancy fits cleanly in one situation: a pilot that already works in the demo but is stalled before production, where the missing pieces are production-ready data, governance, an evaluation harness, monitoring, and an owned path to ship. It is a one-time $150K–$400K engagement over roughly 6–12 weeks, not a permanent cost center — and what you are buying is transfer-of-ownership, not a subscription.
Analysts now call this the partner or build-operate-transfer path: a third party builds an asset you own, with IP assigned to you and knowledge transferred to your team, rather than a vendor renting you a black box. The visual shape of the engagement is an ownership baton that travels from the partner to you:
- Partner builds
- Built with your team
- Handover (IP + named owner)
- You own & operate
The consultancy is the wrong tool for a commodity workflow (buy that) and the wrong tool for building your permanent core moat (staff that). Its genuine fit is narrower: you have momentum, a pilot that works, but you lack the production scaffolding to ship it, and you lack the standing team to build that scaffolding quickly without it becoming a 12-month hiring project.
A good engagement closes the production gap with your people, documents the governance and audit trail as an explicit deliverable, and leaves your team able to maintain what it inherits.
What this looks like in practice. Two anonymized patterns, shaped to the horizontal niche this page addresses — teams on regulated or sensitive data, no industry attached:
- A team working on access-controlled, permission-sensitive data had a pilot that demoed cleanly for roughly eight months and stalled on exactly two things: it had no evaluation harness to prove it stayed accurate, and no audit trail a compliance review would accept. The gap closed in about nine weeks — harness, governance documentation, and monitoring built in their own stack — and the system shipped under their own ownership, run by a named internal engineer who was in the room throughout.
- A team with a working pilot on sensitive internal records was about to staff a multi-quarter in-house build to productionize it. A Readiness Review found the gap was governance-and-monitoring-shaped, not a research problem, and scoped it to a fixed engagement; the owned scaffolding was handed over in weeks rather than through the hiring project they had budgeted.
These are pattern-shaped, not testimonials — the point is the shape of the work repeats.
What you actually receive. A Readiness Review is a written gap-map across data readiness, evaluation, governance, monitoring, and ownership — the things that decide whether a pilot ships — plus a go/no-go recommendation and a cost-and-timeline estimate to close each gap. A full engagement hands over the built scaffolding in your stack (the evaluation harness, monitoring, and governance controls), the governance and audit documentation, the running system itself, and a knowledge-transfer handover to a named internal owner. These are artifacts you keep, not a slide deck of advice.
What the Readiness Review itself costs. Because both CTAs on this page ask you to start here, its shape should be as honest as the full-engagement band. The Readiness Review is a fixed-fee, bounded engagement — a low-four-figure to low-five-figure review depending on the pilot's complexity — delivered in a matter of days to about two weeks, and creditable against a full engagement if you proceed. It is deliberately priced as an on-ramp you can approve without a procurement cycle.
Where the boundary is — and what happens if your gap is the hard one. The 6–12 week, $150K–$400K band assumes a specific shape: a pilot that genuinely works in the demo and a gap that is data/eval/governance/monitoring-shaped, not a fundamentally broken model or an unsolved research problem. If the underlying model does not actually work, or the data foundation needs rebuilding from scratch, that is a different and larger piece of work. The Readiness Review exists precisely to confirm the gap is the shippable shape before a fixed-scope engagement is quoted. If it is not, the Review says so, and you have spent a small, bounded amount to learn that rather than a large one.
Why a partner is faster than the in-house attempt that already stalled. The production gap is not a novel research problem — it is a repeatable, pattern-based body of work (eval harness, governance, monitoring, handover) that a team which has done it before assembles from existing templates rather than inventing from scratch, with no hiring ramp. The in-house attempt is slow precisely because it is a first-time build competing with the day job; a partner is fast because it is the hundredth time, not the first.
Where your data lives — and who gets inside the walls. Because the work happens in your own environment and stack, your data stays under your control — it does not move to a vendor platform. Engagements run under NDA, and for teams on regulated or sensitive data the audit trail itself is one of the deliverables, not an afterthought. The people-access question — who gets credentials, and at what level — is yours to control, not the partner's: your security team scopes and grants partner access under least-privilege, inside your environment, with your offboarding process.
Why hand the production outcome to a partner rather than rent contractors? A budget-holder will reasonably ask why not hire one or two senior ML contractors for less. Sometimes that is the right call — if you already know exactly what is missing and just need hands to build it, contractors can be cheaper and entirely sufficient. The honest distinction: contractors are individuals you direct and who carry the same key-person risk a permanent hire does — when they leave, the knowledge leaves with them. A partner brings a pre-built methodology and accountability for the production outcome itself, and bakes the handover to a named internal owner into the engagement.
Why the handover actually sticks. Knowledge transfer fails when it is bolted on at the end as a contract clause the buyer has to police alone. It sticks when the system is built with your team rather than for them, documented as it is built, designed to be maintainable, and assigned to a named internal owner identified during the work, not after. "Owned by you" should be verifiable in the artifacts you hold — running system, documentation, a named owner who was in the room — not a promise you have to chase.
This is what the Goldsmith Method for Production AI is built to do: treat the production gap as the deliverable, not the model. It works backward from "what does a system need to survive a security review, a finance review, and an on-call rotation" — data readiness, evaluation, governance, monitoring, ownership — and builds exactly that scaffolding around your existing pilot, in your environment, with a handover that leaves your team able to run it. Each stage produces an inspectable artifact, which is what gives the "hundredth time" claim something concrete behind it.
There is a second, increasingly common shape worth naming: buy a mature base, then build your own differentiating layer on top. The dominant 2026 practitioner consensus is that the right architecture for most strategic pilots is neither pure-build nor pure-buy — it is a bought foundation (a base platform or model) plus a thin owned layer of prompts, retrieval, integrations, and human checks that encode your advantage. A partner engagement is frequently exactly how that owned layer gets built and governed on top of a bought base.
The honesty test for any consultancy, including the one writing this page: if your situation is "buy the SaaS" or "build the moat in-house," a good consultancy should tell you so — and a go/no-go Readiness Review that concludes you should buy or build instead is a valid deliverable, not a failed sale. Read the table above and you will see Buy winning several rows outright — that is the comparison working as intended.
Why trust an unnamed team with a stalled, regulated-data pilot?
This page is published under a team name, not a personal brand, and the fair question from any budget-holder is: anyone can write a polished page — why should I believe you can do this? The honest answer is that you should not have to trust the brand up front. The engagement is structured so the first thing you receive is something you keep and can evaluate before you commit to anything larger.
- Judge the artifacts, not the name. The Readiness Review is a written document — a gap-map, a go/no-go verdict, and a cost-and-timeline estimate — that is yours whether or not you proceed to a build. You evaluate the quality of that document directly; nothing about the decision rests on taking our word.
- The track record has a shape you can recognize. The two anonymized patterns above are the repeatable shape of the work, not one-off luck. The methodology that produces them is itself inspectable: the gap-map template and stage deliverables can be reviewed before you commit.
- The walk-away is the deliverable, not a failure. If the Readiness Review concludes you should buy a packaged tool or build in-house, that conclusion is the work product — you owe nothing further.
- The handover is verifiable. The named internal owner is identified during the work, not promised after it, and that person can confirm the handover stuck. "Owned by you" is checkable in the artifacts you hold.
The experience behind the method. Anonymity is about the author, not the depth behind the work. The team has authored published technical books and video courses on IoT and cloud architecture (Éditions ENI), has trained engineers at and delivered production systems for Fortune-500 industrial, energy, and luxury-retail organizations, and operates its own production AI agents — the Labs advisor and the ROI calculator on this very site are systems we built, run, and govern under the same method this page describes. The point is not credentials for their own sake: it is that the production-gap work described here is drawn from systems actually shipped and run, not theorized.
Who you actually contract with. Anonymity is about the author, not the entity. The engagement is signed with a registered legal company — QAIZEN TECH DWC-LLC — so your procurement and security teams have a real counterparty to onboard: an NDA and MSA to sign, a named jurisdiction, and the standard procurement-grade basics handled the way any enterprise vendor review expects. The team is anonymous; the company you sign with is not.
You don't have to trust us up front. You trust the structure: a real, signable entity, a track record with a recognizable shape, and a bounded first deliverable you keep and can judge before committing to the full build.
How we built this comparison: the cost and timeline bands are triangulated from 2026 practitioner and agency estimates (Xenoss, SFAI Labs), a structurally-neutral salary input (Glassdoor mid-2025), and the primary research from MIT NANDA's State of AI in Business 2025 and IDC/Lenovo's CIO Playbook 2025, with each figure labelled by source and stake in-line. Reviewed by the QAIZEN AI Governance Team, June 2026.
Ready to find out if your pilot can ship?
If your pilot works in the demo and stalls before production, a Readiness Review maps the exact data, governance, evaluation, monitoring, and ownership gaps standing between it and a live system. It is delivered as a written gap-map plus a go/no-go verdict and a cost-and-timeline estimate, in days to about two weeks rather than quarters. It is a fixed-fee, bounded first step you keep.
Prefer to scope it yourself first? The ROI calculator estimates the cost, time-to-value, and readiness gap for your pilot in a few minutes.
How do I decide which path fits my pilot? (decision summary)
Start from the reframe in the scorecard above: the right answer is decided per workflow, not picked once for the whole company. Most organizations land on a mix — buy several commodity workflows, partner one stalled strategic pilot, build a rare few — so the comparison table is a lens you run against each candidate workflow, not a single company-wide verdict.
The core verdict maps onto a simple 2×2 — where a workflow sits on differentiation and on its production-gap shape decides the path:
| Production-gap shape | Commodity | Moat |
|---|---|---|
| Stalled | Buy: stalled but commodity, so still buy; don't partner a commodity workflow. | Partner: stalled, governance-shaped gap, no team → build-operate-transfer, owned by you afterward. |
| Not stalled | Buy: commodity workflow; the vendor absorbs the production gap. | Build: durable moat + a standing team to own it permanently. |
Within that, the highest-leverage move is to sequence these, not pick one. Buy the commodity workflows first, bring a partner in to ship the one stalled strategic pilot that's blocking you, and only stand up an in-house team once your AI footprint is large enough to justify the permanent payroll. That ordering minimizes spend while you learn which capabilities are actually worth owning — and it maps cleanly onto the three cases:
- Commodity workflow: Buy. Speed and cost win; the vendor absorbs the production gap. Don't build it, don't pay anyone to build it.
- Durable proprietary moat + standing team: Build. Accept the cost, time, and maintenance because the capability is the advantage. Don't outsource your moat.
- Stalled-but-working pilot, governance-shaped gap: Consultancy (partner / transfer-of-ownership). Close the production gap once, fast, in your stack, owned by you afterward. This is where the Goldsmith Method for Production AI fits.
If you're genuinely unsure which bucket a given workflow is in, that uncertainty is itself useful signal — the FAQ below answers the questions that decide it.
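The 2×2 and the sequencing advice above can be sketched as a small mapping plus a sort. This is an illustrative simplification: the function name, the boolean inputs, the example workflow names, and the handling of cases the 2×2 does not cover are all assumptions.

```python
def path_for_workflow(is_moat: bool, pilot_stalled: bool, has_standing_team: bool) -> str:
    """Map a workflow's position on the 2x2 to a path (a sketch, not a rule)."""
    if not is_moat:
        return "buy"        # commodity workflow, stalled or not
    if pilot_stalled and not has_standing_team:
        return "partner"    # stalled, governance-shaped gap, no team
    if has_standing_team:
        return "build"      # durable moat with a team to own it
    return "unclear"        # moat, not stalled, no team: outside the 2x2


# Sequencing: buy first, then partner, then build.
ORDER = {"buy": 0, "partner": 1, "build": 2, "unclear": 3}

# Hypothetical workflows: (is_moat, pilot_stalled, has_standing_team).
workflows = {
    "contract_risk_ranking": (True, False, True),
    "meeting_notes": (False, False, False),
    "claims_triage_pilot": (True, True, False),
}

plan = sorted(workflows, key=lambda name: ORDER[path_for_workflow(*workflows[name])])
print(plan)
```

Running the mapping per workflow, rather than once for the company, mirrors the per-workflow framing of the decision summary.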
A faster way to test the consultancy fit
Before you book anything, you can pressure-test the thinking live. QAIZEN Labs hosts a free AI advisor you can interrogate about your specific pilot — describe where it's stalled, ask it to challenge your build-vs-buy instinct, and see whether the production gap it surfaces matches your own read. No booking, no form: just a place to test your reasoning, running the same Goldsmith Method lens. Pressure-test your pilot on QAIZEN Labs →
Get a number for your own pilot
Every figure on this page is a range. Your pilot sits at one point inside them — run your own numbers to find it. The ROI calculator estimates the cost, time-to-value, and readiness gap for your pilot — free, in minutes.
Already know you want an independent review? Use the secondary action above, or pressure-test your thinking on QAIZEN Labs first.
Sources
- [1] MIT NANDA. "The State of AI in Business 2025". MIT, July 15, 2025.
- [2] IDC / Lenovo. "CIO Playbook 2025". Lenovo, April 10, 2025.
- [3] SFAI Labs. "AI Make-or-Buy Decision Tree 2026". SFAI Labs, February 20, 2026.
- [4] Xenoss. "AI Partnership Engagement Benchmarks 2026". Xenoss, March 5, 2026.
- [5] Glassdoor. "Senior Machine Learning Engineer Salary (US)". Glassdoor, June 30, 2025.
- [6] Zapier / Centiment. "AI Vendor Dependency Survey 2026". Zapier, January 22, 2026.
Frequently Asked Questions
- MIT NANDA's State of AI in Business 2025, a multi-method study reviewing 300+ initiatives with 52 interviews and 153 surveys, found that 95% of enterprise generative-AI efforts delivered no measurable P&L impact and only about 5% of custom-built tools reached production. The bottleneck is the production gap (data readiness, governance, evaluation, monitoring, ownership), with poor data readiness the single most-cited technical cause: the pilot ran on a clean curated slice, while production runs on messy, permission-controlled, fragmented live data. There is also a decision-side failure. Harvard Business Review's finding that 40–60% of B2B deals are lost to inaction rather than to a competitor applies here too: many pilots simply never get a decision and sit in the demo until the budget moves on. The Goldsmith Method for Production AI treats closing that gap as the actual deliverable, not the model.