1. The Request That Used to Die in a Queue
Every operations leader carries a mental list of small problems that never get solved. Not the big ones — those get budget, headcount, a vendor evaluation, a steering committee. The small ones. The vendor whose invoices nobody reconciles. The customer feedback that arrives as anecdotes instead of data. The weekly report someone assembles by hand from four systems. The spreadsheet that one person understands and everyone fears.
These problems share a profile: each is worth solving, none is worth what solving it used to cost. The traditional menu was grim:
- The dev backlog. Write a ticket, make the business case, wait a quarter or two behind everything customer-facing. Most internal tooling requests die here, and rightly so — engineering time is the scarcest resource in the building.
- The SaaS subscription. Find a vendor whose product is 60% what you need and 300% what you'll use. Pay per seat, forever. Integrate it badly. Inherit their roadmap instead of yours.
- The outside firm. Scope it, quote it, onboard them, explain your business, review the deliverable, pay the invoice. For a small problem, the transaction costs exceed the build costs before anyone writes a line of code.
So the rational move, for decades, was to live with the problem. The unserved backlog of small operational problems is, I would argue, the largest pool of untapped ROI inside most companies — precisely because every individual item in it was too small to justify the old acquisition paths.
That math has flipped. Over the past year my team has been standing up what we internally call copilots: small, single-purpose tools, built ad hoc, the same week — sometimes the same afternoon — we discover the need. This post is about two of them, and about the operating model that keeps "ad hoc" from becoming "chaos."
2. What I Mean by a Copilot
The word gets used loosely, so here is my working definition. A copilot is an internal tool that is:
- Single-purpose. It does one job — audit these invoices, interview these customers, watch this queue. It is not a platform. Platforms are how internal tools used to justify their cost; copilots don't need to.
- Built next to the problem. The person who owns the problem is in the room — often driving. The spec is the actual mess on their desk, not a requirements document written three translations away from it.
- Deterministic at the core, AI at the edges. The volume work runs on code — parsers, SQL, arithmetic. The language model handles only what code can't: exceptions, judgment calls, conversation, narrative. This is what makes copilots cheap to run, not just cheap to build.
- Disposable by design. If the process changes, you rebuild it in a day. Nobody mourns it. The sunk-cost gravity that made legacy internal tools immortal simply doesn't accumulate.
The economics are worth stating plainly, because they're the whole story:
| | The old paths | The copilot |
|---|---|---|
| Time to first value | Months (backlog, procurement, onboarding) | Days, sometimes hours |
| Upfront cost | Dev sprints, license fees, or a consulting quote | An afternoon of builder time |
| Recurring cost | Per-seat pricing, retainers, maintenance contracts | Pennies of model spend; the core runs on code |
| Fit | 60% of what you need, 300% of what you'll use | Exactly the problem, nothing else |
| Ownership | The vendor's roadmap | Yours — including the data |
Two examples from our own operations make this concrete — one looking inward at costs, one looking outward at customers.
3. Copilot One: The Invoice Audit Nobody Had Ever Run
Every operations team has a vendor like this. They do good work. They bill monthly. The invoices arrive as scanned PDFs — hundreds of pages of tables, no text layer, no CSV export, no API. So nobody reconciles them line by line. Finance checks the total, pays the bill, moves on.
In our case it was a manufacturing services vendor billing against individual e-commerce orders. Five months of invoices, roughly 1,200 scanned pages, over a thousand individual jobs — each with line items, unit prices, and a reference back to one of our orders. The questions were basic: What does each order really cost us, fully loaded? Is the vendor's pricing drifting? And the one every ops leader quietly wonders — when they redo a job because of their own quality failure, who pays?
Under the old math, this audit doesn't happen. Keying 1,200 pages into a spreadsheet by hand is weeks of mind-numbing work. A document-processing SaaS wants a contract. The naive AI answer (feed every page to a vision model) means millions of tokens before you've asked a single analytical question, a cost re-paid every month, forever.
The copilot took one afternoon to build, and it rests on a principle that drove every design decision: deterministic tools do the volume; language models handle only the exceptions.
| Stage | Tool | Marginal cost |
|---|---|---|
| Page extraction | Native scan extraction, no re-rendering | $0 |
| OCR | The OS-native vision framework, locally | $0 |
| Table parsing | Python + column geometry | $0 |
| Validation | Arithmetic: unit price × quantity = line total; Σ line totals = printed total | $0 |
| Repair of failed pages | Small vision-model agents, one per flagged page | Cheap, bounded |
| Analytics & math | SQL + Python — never a model | $0 |
| Narrative reports | One mid-tier model per report, fed pre-computed numbers only | Trivial |
The magic is in the validation stage. Invoices are self-checking documents: every row carries its own arithmetic, and every page carries a printed grand total. If a parsed row satisfies unit price × quantity = line total, the OCR almost certainly read that row correctly. If a page's parsed lines sum to its printed total, the whole page was almost certainly read correctly. You don't need a human or a model to bless the output: the document audits itself.
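To make the self-checking idea concrete, here is a minimal sketch of the two checks. It is an illustration, not the actual pipeline: the `Row` fields and the sample figures are invented, and it assumes exact equality with no rounding, tax, or discount lines. A real invoice format might need a tolerance or extra terms.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Row:
    # Hypothetical field names; use whatever your parser emits.
    unit_price: Decimal
    quantity: int
    line_total: Decimal


def row_ok(row: Row) -> bool:
    """A row passes when unit price x quantity equals the printed line total."""
    return row.unit_price * row.quantity == row.line_total


def page_ok(rows: list[Row], printed_total: Decimal) -> bool:
    """A page passes when every row passes and the lines sum to the printed total."""
    lines_sum = sum((r.line_total for r in rows), Decimal("0"))
    return all(row_ok(r) for r in rows) and lines_sum == printed_total


# Made-up sample data: the third row simulates an OCR misread (7.10 x 3 is 21.30).
rows = [
    Row(Decimal("12.50"), 4, Decimal("50.00")),
    Row(Decimal("3.20"), 10, Decimal("32.00")),
    Row(Decimal("7.10"), 3, Decimal("27.30")),
]

# Indexes of rows that fail their own arithmetic and need a second look.
failed = [i for i, r in enumerate(rows) if not row_ok(r)]
```

`Decimal` is used instead of `float` so that currency comparisons are exact. Anything left in `failed` is a candidate for repair; everything else needs no further attention.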
Only the rows that fail those checks go to AI. In our run, after the deterministic passes converged, that was about 100 repair units out of 1,200 pages. Each got a small, cheap vision agent with a forced JSON schema and one instruction that matters more than all the others: read the numbers exactly as printed; never infer. Inferred data poisons an audit. Flagged-and-missing data just narrows it honestly.
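One way to enforce "never infer" on the receiving side is to treat the agent's reply as untrusted until it passes the same arithmetic as everything else. The sketch below is an assumption about how that gate could look, not a description of the actual agents: the three field names, the convention that values arrive as strings exactly as printed, and the use of `null` for "unreadable" are all hypothetical.

```python
import json
from decimal import Decimal, InvalidOperation
from typing import Optional, Tuple

# Hypothetical schema: three string fields, or null when the agent cannot read one.
FIELDS = ("unit_price", "quantity", "line_total")


def accept_repair(reply: str) -> Optional[Tuple[Decimal, int, Decimal]]:
    """Return the repaired row if it can be trusted, else None (row stays flagged)."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or set(data) != set(FIELDS):
        return None  # wrong shape: missing or extra keys
    if not all(isinstance(data[k], str) for k in FIELDS):
        return None  # null or non-string value: treat as unreadable
    try:
        unit = Decimal(data["unit_price"])
        qty = int(data["quantity"])
        total = Decimal(data["line_total"])
        reconciles = unit * qty == total
    except (InvalidOperation, ValueError):
        return None  # not a number as printed
    return (unit, qty, total) if reconciles else None


good = accept_repair('{"unit_price": "7.10", "quantity": "3", "line_total": "21.30"}')
unreadable = accept_repair('{"unit_price": null, "quantity": "3", "line_total": "21.30"}')
mismatch = accept_repair('{"unit_price": "7.10", "quantity": "3", "line_total": "27.30"}')
```

A reply that returns `None` is never patched or guessed at. The row simply stays in the flagged set, which keeps the audit narrower but honest.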
Three lessons from the OCR trenches, for anyone building something similar:
- Never re-render a scan. The PDFs embedded 300 dpi bilevel, fax-style images. Rendering pages to PNG at a "reasonable" resolution turned digits to mush. Extracting the embedded bitmaps natively, pixel for pixel, cut the error rate in half in one move. Resampling bilevel scans is destruction, not conversion.
- Dense tables break full-page OCR — so slice them. The vendor's statements used dotted separator lines between rows, and the vision framework silently dropped numbers in those regions. Detecting the separator grid with a few lines of numpy and OCR-ing each row band as its own image took accuracy from coin-flip to near-perfect.
- Exploit document redundancy before buying intelligence. When one of a pair of identical line items OCR'd cleanly and its twin didn't, copying the clean value — accepted only if the page total then reconciled exactly — repaired hundreds of rows for free. Every deterministic repair you find is a fleet of agents you don't pay for, this month and every month after.
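The row-slicing lesson can be sketched in a few lines of numpy. This is a guess at one workable heuristic, not the exact detector used: it assumes the page is already a 2-D boolean array (`True` = ink), that dotted separators span the full width, and that text rows leave visible gaps between columns. The `bucket`, `coverage`, and `min_height` parameters are invented defaults that would need tuning on real scans.

```python
import numpy as np


def separator_rows(page: np.ndarray, bucket: int = 8, coverage: float = 0.95) -> np.ndarray:
    """Boolean mask of pixel rows that look like full-width dotted separators.

    A pixel row counts as a separator when nearly every bucket-wide slice of it
    contains some ink. That holds for a dotted rule, not for text with column gaps.
    """
    height, width = page.shape
    usable = (width // bucket) * bucket
    has_ink = page[:, :usable].reshape(height, -1, bucket).any(axis=2)
    return has_ink.mean(axis=1) >= coverage


def row_bands(page: np.ndarray, min_height: int = 4) -> list:
    """Return (top, bottom) pixel ranges of the content between separators."""
    is_sep = separator_rows(page)
    bands, start = [], None
    for y, sep in enumerate(is_sep):
        if not sep and start is None:
            start = y  # a new band begins
        elif sep and start is not None:
            if y - start >= min_height:
                bands.append((start, y))
            start = None
    if start is not None and len(is_sep) - start >= min_height:
        bands.append((start, len(is_sep)))  # trailing band below the last separator
    return bands


# Synthetic page: dotted rules at rows 10 and 20, blobs of "text" in between.
page = np.zeros((30, 64), dtype=bool)
page[10, ::2] = True
page[20, ::2] = True
page[3:7, 5:20] = True
page[13:17, 5:20] = True
page[23:27, 40:60] = True

bands = row_bands(page)
crops = [page[top:bottom] for top, bottom in bands]  # each crop goes to OCR on its own
```

The OCR call itself is left out on purpose, since it depends on whichever framework you run locally. The point is only that each band becomes its own small image, with the separators excluded.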
And the audit found things. Vendor unit pricing was flat and the overall picture was healthy — no drama. But two patterns were visible only at line-item granularity: the vendor double-documents every job across two billing streams (a standing double-payment risk that accounts payable now formally checks), and redone jobs — including same-month redos of identical work, the signature of a vendor-side quality failure — were re-billed at full price. Roughly 4% of spend on that vendor was rework billing, a meaningful share of it with no defensible justification we could find. That is now a tracked claims process with a named owner, and the opening exhibit in a remake-policy negotiation.
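Once the rows are validated, finding same-month redos is plain SQL. The sketch below uses an in-memory SQLite table with a hypothetical schema and made-up rows, so the resulting share has nothing to do with the 4% figure above. It also assumes the two billing streams have already been deduplicated; otherwise every job would look like a redo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs ("
    "job_id INTEGER PRIMARY KEY, month TEXT, order_ref TEXT, item TEXT, cents INTEGER)"
)
# Invented sample data; amounts are in cents to keep the arithmetic exact.
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2024-03", "A100", "frame", 4000),
        (2, "2024-03", "A100", "frame", 4000),  # same order, same item, same month
        (3, "2024-03", "A101", "frame", 4000),
        (4, "2024-04", "A102", "panel", 6000),
        (5, "2024-04", "A100", "frame", 4000),  # different month: not flagged here
    ],
)

# A job is a same-month redo if an earlier job exists for the same order and item.
REDO_SQL = """
SELECT j.job_id, j.order_ref, j.item, j.cents
FROM jobs AS j
WHERE EXISTS (
    SELECT 1 FROM jobs AS earlier
    WHERE earlier.month = j.month
      AND earlier.order_ref = j.order_ref
      AND earlier.item = j.item
      AND earlier.job_id < j.job_id
)
ORDER BY j.job_id
"""

redos = conn.execute(REDO_SQL).fetchall()
total_cents = conn.execute("SELECT SUM(cents) FROM jobs").fetchone()[0]
rework_cents = sum(row[3] for row in redos)
rework_share = rework_cents / total_cents
```

The query only surfaces candidates. Whether a given redo was the vendor's fault is still a judgment call for the person who owns the claims queue.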
The cost to find this: an afternoon of building and a few dollars of model spend. The monthly re-run: one dropped PDF, one command, about fifteen minutes of unattended machine time.
4. Copilot Two: The Research Department We Didn't Hire
The second example looks outward. Voice-of-customer research is the kind of capability mid-size companies have always had to rent: a market-research firm engagement measured in weeks and five figures, or a survey SaaS that collects ratings nobody reads, plus analyst time to turn responses into anything a decision-maker can use.
We built it instead. Our voice-of-customer platform is an AI interviewer that conducts genuine adaptive conversations with customers — it asks follow-up questions based on what the customer actually said, in the customer's own language, at whatever time the customer feels like talking. The transcripts are synthesized into themes the same way a (very patient, very consistent) research analyst would do it, and the operations team gets notified as insights come in.
This one started as an ad hoc copilot and graduated into something more interesting: a piece of in-house innovation we keep iterating on weekly — new question flows, new languages, new product lines — at a pace no vendor relationship would tolerate. That's the ceiling on this pattern worth knowing about: most copilots stay small, and should. But because you own the code and the data, the occasional one compounds into a real capability — the kind of thing that would have been a line item in next year's budget, except it already exists.
The contrast with the rented alternative is the same as the invoice story: weeks became days, a recurring outside cost became pennies of model spend, and a generic instrument became one that knows our products, our customers, and our questions.
5. The Operating Model: Ad Hoc Tools Without Ad Hoc Chaos
The obvious objection: a company that builds a tool every time someone has a problem ends up with fifty orphaned scripts and a new kind of technical debt. The objection is right — if you build copilots without an operating model. Ours has four rules:
- Every copilot gets an owner. Not the builder — the person whose problem it solves. The invoice audit's claims queue is worked by a named team member. If nobody will own the output, the tool shouldn't exist.
- Every copilot gets an operating manual. One page: what it does, how to run it, what to do when it breaks. Written the day it ships, while the knowledge is fresh and the builder is honest.
- Every recurring copilot joins a loop. Ours get a slot in the bot fleet that runs our operations cadence — a reminder fires monthly, a human drops a file, a pipeline runs, a dashboard refreshes. A one-off audit decays; a loop compounds.
- Deterministic core, AI edges — always. This is the cost rule and the trust rule at once. Code does the math, models handle exceptions and language, and every number in a report traces back to a row that passed an arithmetic check. When the residual can't be validated, it gets labeled, not hidden.
Notice what's not on the list: a committee, a platform team, a tooling budget line. The governance is proportional to the artifact. That proportionality is the entire advantage — reintroduce the old overhead and you reintroduce the old math.
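The "labeled, not hidden" part of the fourth rule is easy to show in code. This is a minimal sketch with invented names and figures: each line carries a flag saying whether it passed its arithmetic check, and the report keeps the two totals apart instead of blending them.

```python
from decimal import Decimal


def spend_summary(lines: list) -> dict:
    """Split spend into validated and unvalidated parts.

    Each item in lines is (line_total, passed_arithmetic_check).
    """
    validated = sum((amount for amount, ok in lines if ok), Decimal("0"))
    unvalidated = sum((amount for amount, ok in lines if not ok), Decimal("0"))
    return {
        "validated": validated,
        "unvalidated": unvalidated,
        "total": validated + unvalidated,
    }


def render(summary: dict) -> str:
    """One report line that names the residual explicitly."""
    return (
        f"Validated spend: {summary['validated']} | "
        f"Unvalidated residual (flagged, not audited): {summary['unvalidated']}"
    )


# Made-up sample: two lines passed their checks, one did not.
lines = [
    (Decimal("50.00"), True),
    (Decimal("32.00"), True),
    (Decimal("27.30"), False),
]
summary = spend_summary(lines)
report_line = render(summary)
```

Any narrative written on top of this would be fed only the contents of `summary`, so a reader can always see how much of the spend the conclusions actually cover.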
And to be clear about the boundary: this is not an argument against buying software. Systems of record — the ERP, the commerce platform, the CRM — you buy, because correctness and continuity there are existential and undifferentiated. Copilots live in the connective tissue between those systems, the space that was historically served by spreadsheets, swivel-chair integration, and resignation.
6. The Takeaway
The constraint on internal tooling used to be engineering capacity. It is now something rarer: knowing your operations well enough to name the problem precisely. The afternoon of building is the easy part; the years of operating experience that tell you which 1,200 pages to audit and which questions to ask a customer — that's the scarce input now. Which is why this capability belongs inside the operating team, not delegated wholesale to IT or rented from a vendor: the people closest to the problems are finally the people who can solve them.
Somewhere in your company there is a drawer of scanned PDFs nobody has reconciled, a stream of customer feedback nobody has structured, and a dozen other small problems that were never worth what solving them used to cost. They're worth it now. The price changed. Most org charts just haven't noticed yet.