
The Ad Hoc Copilot: Internal Tools Used to Take Months. Now They Take an Afternoon.

For twenty years, a small operational problem had three bad options: wait for the dev backlog, buy a SaaS subscription that almost fits, or pay an outside firm. There is now a fourth option, and it is quietly becoming the default. We call them copilots — small, purpose-built tools we stand up the same week we discover the need.


1. The Request That Used to Die in a Queue

Every operations leader carries a mental list of small problems that never get solved. Not the big ones — those get budget, headcount, a vendor evaluation, a steering committee. The small ones. The vendor whose invoices nobody reconciles. The customer feedback that arrives as anecdotes instead of data. The weekly report someone assembles by hand from four systems. The spreadsheet that one person understands and everyone fears.

These problems share a profile: each is worth solving, none is worth what solving it used to cost. The traditional menu was grim: wait for the dev backlog, buy a SaaS subscription that almost fits, or pay an outside firm.

So the rational move, for decades, was to live with the problem. The unserved backlog of small operational problems is, I would argue, the largest pool of untapped ROI inside most companies — precisely because every individual item in it was too small to justify the old acquisition paths.

That math has flipped. Over the past year my team has been standing up what we internally call copilots: small, single-purpose tools, built ad hoc, the same week — sometimes the same afternoon — we discover the need. This post is about two of them, and about the operating model that keeps "ad hoc" from becoming "chaos."

2. What I Mean by a Copilot

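The word gets used loosely, so here is my working definition. A copilot is an internal tool that is small, single-purpose, and built ad hoc, the same week (sometimes the same afternoon) we discover the need.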

The economics are worth stating plainly, because they're the whole story:

| | The old paths | The copilot |
|---|---|---|
| Time to first value | Months (backlog, procurement, onboarding) | Days, sometimes hours |
| Upfront cost | Dev sprints, license fees, or a consulting quote | An afternoon of builder time |
| Recurring cost | Per-seat pricing, retainers, maintenance contracts | Pennies of model spend; the core runs on code |
| Fit | 60% of what you need, 300% of what you'll use | Exactly the problem, nothing else |
| Ownership | The vendor's roadmap | Yours, including the data |

Two examples from our own operations make this concrete — one looking inward at costs, one looking outward at customers.

3. Copilot One: The Invoice Audit Nobody Had Ever Run

Every operations team has a vendor like this. They do good work. They bill monthly. The invoices arrive as scanned PDFs — hundreds of pages of tables, no text layer, no CSV export, no API. So nobody reconciles them line by line. Finance checks the total, pays the bill, moves on.

In our case it was a manufacturing services vendor billing against individual e-commerce orders. Five months of invoices, roughly 1,200 scanned pages, over a thousand individual jobs — each with line items, unit prices, and a reference back to one of our orders. The questions were basic: What does each order really cost us, fully loaded? Is the vendor's pricing drifting? And the one every ops leader quietly wonders — when they redo a job because of their own quality failure, who pays?

Under the old math, this audit doesn't happen. A person keying 1,200 pages into a spreadsheet is weeks of mind-numbing work. A document-processing SaaS wants a contract. The naive AI answer — feed every page to a vision model — means millions of tokens before you've asked a single analytical question, re-paid every month forever.

The copilot took one afternoon to build, and it rests on a principle that drove every design decision: deterministic tools do the volume; language models handle only the exceptions.

| Stage | Tool | Marginal cost |
|---|---|---|
| Page extraction | Native scan extraction, no re-rendering | $0 |
| OCR | The OS-native vision framework, locally | $0 |
| Table parsing | Python + column geometry | $0 |
| Validation | Arithmetic: unit × qty = line, Σ lines = invoice total | $0 |
| Repair of failed pages | Small vision-model agents, one per flagged page | Cheap, bounded |
| Analytics & math | SQL + Python, never a model | $0 |
| Narrative reports | One mid-tier model per report, fed pre-computed numbers only | Trivial |
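To make the "Python + column geometry" stage concrete, here is a minimal sketch of the idea, not the production pipeline. It assumes the OCR step yields words with a left edge `x` and a vertical position `y`; the `Word` type, the column names, and the column edges in `COLUMNS` are all hypothetical and would have to be measured from the real invoice layout.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge of the OCR bounding box (assumed page units)
    y: float  # vertical position of the word on the page


# Hypothetical column layout: (column name, left edge), sorted left to right.
COLUMNS = [
    ("order_ref", 0.0),
    ("description", 120.0),
    ("qty", 400.0),
    ("unit_price", 460.0),
    ("line_total", 540.0),
]


def group_rows(words, y_tolerance=4.0):
    """Cluster words into table rows by vertical position."""
    rows = []
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        if rows and abs(rows[-1][-1].y - word.y) <= y_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    return rows


def assign_columns(row):
    """Place each word in the column whose left edge is nearest on its left."""
    edges = [edge for _, edge in COLUMNS]
    cells = {name: [] for name, _ in COLUMNS}
    for word in sorted(row, key=lambda w: w.x):
        index = max(bisect_right(edges, word.x) - 1, 0)
        cells[COLUMNS[index][0]].append(word.text)
    return {name: " ".join(parts) for name, parts in cells.items()}
```

The appeal of this approach is that it is deterministic: the same scan always parses to the same rows, so any error it makes is repeatable and shows up in the arithmetic checks rather than hiding.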

The magic is in the validation stage. Invoices are self-checking documents: every row carries its own arithmetic, and every page carries a printed grand total. If a parsed row satisfies unit price × quantity = line total, the OCR was almost certainly correct. If an invoice's parsed lines sum to its printed total, the whole page was correct. You don't need a human or a model to bless the output — the document audits itself.
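The self-check described above fits in a few lines. This is an illustrative sketch rather than the actual pipeline: the field names (`unit_price`, `qty`, `line_total`) and the one-cent rounding tolerance are assumptions, and values are kept as strings straight from OCR so that garbled digits fail loudly instead of being coerced.

```python
from decimal import Decimal, InvalidOperation

TOLERANCE = Decimal("0.01")  # assumed rounding tolerance, one cent


def row_is_valid(row):
    """True when unit price x quantity matches the printed line total."""
    try:
        unit = Decimal(row["unit_price"])
        qty = Decimal(row["qty"])
        total = Decimal(row["line_total"])
        return abs(unit * qty - total) <= TOLERANCE
    except (InvalidOperation, KeyError, TypeError):
        # Unparseable or missing values mean the OCR cannot be trusted here.
        return False


def audit_invoice(rows, printed_total):
    """Check every row, then check that the rows sum to the printed total."""
    flagged = [i for i, row in enumerate(rows) if not row_is_valid(row)]
    if flagged:
        return {"ok": False, "flagged_rows": flagged, "total_matches": False}
    parsed_sum = sum((Decimal(r["line_total"]) for r in rows), Decimal("0"))
    try:
        total_matches = abs(parsed_sum - Decimal(printed_total)) <= TOLERANCE
    except (InvalidOperation, TypeError):
        total_matches = False
    return {"ok": total_matches, "flagged_rows": [], "total_matches": total_matches}
```

Using `Decimal` instead of floats matters for this check: currency arithmetic in binary floating point can produce spurious one-cent mismatches, which would send correctly parsed rows to repair for no reason.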

Only the rows that fail those checks go to AI. In our run, after the deterministic passes converged, that was about 100 repair units out of 1,200 pages. Each got a small, cheap vision agent with a forced JSON schema and one instruction that matters more than all the others: read the numbers exactly as printed; never infer. Inferred data poisons an audit. Flagged-and-missing data just narrows it honestly.
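One way to enforce "never infer" on the receiving end is to treat the model's reply as untrusted input. The sketch below is an assumption about how that could look, not a description of the agents we run: `REPAIR_SCHEMA` is a hypothetical JSON-Schema-style contract in which every field is a string copied from the page or `null` when unreadable, and `accept_repair` is a hypothetical gate that re-runs the same arithmetic before a repaired row is allowed back into the dataset. No model API is called here.

```python
import json
from decimal import Decimal, InvalidOperation

FIELDS = ("qty", "unit_price", "line_total")

# Hypothetical schema handed to the vision agent: strings as printed, or null.
REPAIR_SCHEMA = {
    "type": "object",
    "properties": {field: {"type": ["string", "null"]} for field in FIELDS},
    "required": list(FIELDS),
    "additionalProperties": False,
}


def accept_repair(reply_text):
    """Return (status, row). Only arithmetically consistent rows are accepted."""
    try:
        reply = json.loads(reply_text)
    except json.JSONDecodeError:
        return "rejected", None
    if not isinstance(reply, dict) or set(reply) != set(FIELDS):
        return "rejected", None
    if any(reply[field] is None for field in FIELDS):
        # The agent said "unreadable": keep the gap rather than guess.
        return "flagged_missing", None
    if not all(isinstance(reply[field], str) for field in FIELDS):
        return "rejected", None
    try:
        qty, unit, total = (Decimal(reply[field]) for field in FIELDS)
        consistent = abs(qty * unit - total) <= Decimal("0.01")
    except InvalidOperation:
        return "rejected", None
    return ("repaired", reply) if consistent else ("flagged_missing", None)
```

The design choice worth copying is the `null` path: giving the agent an explicit way to say "I cannot read this" removes the pressure to produce a plausible number, and the arithmetic re-check catches the cases where it guesses anyway.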

Three lessons from the OCR trenches, for anyone building something similar:

And the audit found things. Vendor unit pricing was flat and the overall picture was healthy — no drama. But two patterns were visible only at line-item granularity: the vendor double-documents every job across two billing streams (a standing double-payment risk that accounts payable now formally checks), and redone jobs — including same-month redos of identical work, the signature of a vendor-side quality failure — were re-billed at full price. Roughly 4% of spend on that vendor was rework billing, a meaningful share of it with no defensible justification we could find. That is now a tracked claims process with a named owner, and the opening exhibit in a remake-policy negotiation.
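For readers who want to run a similar check, here is a minimal SQL + Python sketch of how same-month re-billing can be surfaced once the line items are in a table. Everything specific in it is hypothetical: the `lines` table, its columns, and the four sample rows are invented for illustration and are not our data. It also assumes that a repeat of the same job on the same order in the same month is a rework candidate, which a human still has to confirm.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lines (order_ref TEXT, job TEXT, month TEXT, cents INTEGER)"
)
# Invented sample rows; amounts are integer cents to avoid float rounding.
conn.executemany(
    "INSERT INTO lines VALUES (?, ?, ?, ?)",
    [
        ("A-1001", "engrave", "2024-01", 2500),
        ("A-1001", "engrave", "2024-01", 2500),  # same job billed twice
        ("A-1002", "engrave", "2024-01", 2500),
        ("A-1003", "print", "2024-02", 5000),
    ],
)

# Everything billed beyond the first occurrence counts as candidate rework.
REWORK_SQL = """
    SELECT order_ref, job, month,
           COUNT(*) AS times_billed,
           SUM(cents) - MIN(cents) AS rework_cents
    FROM lines
    GROUP BY order_ref, job, month
    HAVING COUNT(*) > 1
    ORDER BY order_ref
"""

candidates = conn.execute(REWORK_SQL).fetchall()
total_cents = conn.execute("SELECT SUM(cents) FROM lines").fetchone()[0]
rework_share = sum(row[4] for row in candidates) / total_cents
```

The query only produces candidates. Whether a repeat was a vendor-side quality failure or a legitimate second job is a judgment call, which is why the output feeds a claims process rather than an automatic deduction.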

The cost to find this: an afternoon of building and a few dollars of model spend. The monthly re-run: one dropped PDF, one command, about fifteen minutes of unattended machine time.

4. Copilot Two: The Research Department We Didn't Hire

The second example looks outward. Voice-of-customer research is the kind of capability mid-size companies have always had to rent: a market-research firm engagement measured in weeks and five figures, or a survey SaaS that collects ratings nobody reads, plus analyst time to turn responses into anything a decision-maker can use.

We built it instead. Our voice-of-customer platform is an AI interviewer that conducts genuine adaptive conversations with customers — it asks follow-up questions based on what the customer actually said, in the customer's own language, at whatever time the customer feels like talking. The transcripts are synthesized into themes the same way a (very patient, very consistent) research analyst would do it, and the operations team gets notified as insights come in.

This one started as an ad hoc copilot and graduated into something more interesting: a piece of in-house innovation we keep iterating on weekly — new question flows, new languages, new product lines — at a pace no vendor relationship would tolerate. That's the ceiling on this pattern worth knowing about: most copilots stay small, and should. But because you own the code and the data, the occasional one compounds into a real capability — the kind of thing that would have been a line item in next year's budget, except it already exists.

The contrast with the rented alternative is the same as the invoice story: weeks became days, a recurring outside cost became pennies of model spend, and a generic instrument became one that knows our products, our customers, and our questions.

5. The Operating Model: Ad Hoc Tools Without Ad Hoc Chaos

The obvious objection: a company that builds a tool every time someone has a problem ends up with fifty orphaned scripts and a new kind of technical debt. The objection is right — if you build copilots without an operating model. Ours has four rules:

Notice what's not on the list: a committee, a platform team, a tooling budget line. The governance is proportional to the artifact. That proportionality is the entire advantage — reintroduce the old overhead and you reintroduce the old math.

And to be clear about the boundary: this is not an argument against buying software. Systems of record — the ERP, the commerce platform, the CRM — you buy, because correctness and continuity there are existential and undifferentiated. Copilots live in the connective tissue between those systems, the space that was historically served by spreadsheets, swivel-chair integration, and resignation.

6. The Takeaway

The constraint on internal tooling used to be engineering capacity. It is now something rarer: knowing your operations well enough to name the problem precisely. The afternoon of building is the easy part; the years of operating experience that tell you which 1,200 pages to audit and which questions to ask a customer — that's the scarce input now. Which is why this capability belongs inside the operating team, not delegated wholesale to IT or rented from a vendor: the people closest to the problems are finally the people who can solve them.

Somewhere in your company there is a drawer of scanned PDFs nobody has reconciled, a stream of customer feedback nobody has structured, and a dozen other small problems that were never worth what solving them used to cost. They're worth it now. The price changed. Most org charts just haven't noticed yet.