1. Why Generic Outreach Fails
Cold email has a bad reputation because most of it deserves one. The typical outbound sequence sends the same message to five hundred people who share a job title. The hook is vague, the offer is abstract, and the call to action asks for thirty minutes from someone who doesn't know you.
The problem isn't volume. The problem is that the message wasn't written for anyone in particular. A facilities manager at a manufacturing company has different pressures, different language, different objections than a director of operations at a services firm. Sending them the same email and calling it "personalization" because you merged in their first name isn't a strategy. It's noise.
Real personalization requires research. Research doesn't scale. Until agents.
Over the past few months I've been building an end-to-end B2B outbound system for a new product vertical. The system is designed around one central idea: every prospect gets a message that was written for them specifically — their industry, their role, their pain, their psychology — and the system learns over time which approaches actually work.
This is what I built.
2. The Full Pipeline
Before getting into the pieces, here is the shape of the whole:
Eight stages. Four categories. One thing that makes it different from standard outbound: the purple arrow at the bottom. That is the feedback loop. Reply outcomes flow back to the angle optimizer, which updates the weights the research agent uses to select angles next time. The system gets smarter on every send cycle.
3. The Landing Layer
Most B2B outreach sends prospects to a generic homepage. The homepage talks about everything the company does, serves every audience, and converts almost no one. If I'm writing a targeted email to a specific type of buyer, I want them to land somewhere built for them — where the language, the proof points, and the call to action match exactly what the email promised.
The landing layer is a set of ICP-specific sites, each tailored to a vertical. They're built with a headless architecture: the storefront frontend is completely decoupled from the commerce backend. The product catalog, pricing, and checkout live in the commerce platform. The landing experience — design, copy, social proof, CTAs — lives in a standalone site optimized for that vertical's decision-maker.
A facilities manager in heavy industry lands somewhere different than a director of operations at a service company. Same product. Different framing. Different proof. Different ask.
The sites deploy automatically from a shared repository. New verticals get new experiences without touching the core commerce infrastructure.
4. CRM as the Operating System
Everything flows into the CRM. This is not optional architecture — it's the backbone. Without a clean, tagged, structured CRM, the agents have nowhere to work and no record of what they've done.
Every contact imported carries:
- Vertical tag — which product line and market segment this contact belongs to
- ICP tier tag — which buyer persona within the vertical (there are four tiers per vertical, each with distinct deal motion and urgency)
- Research status — pending, in progress, done
- Outreach status — what has been sent, what was replied, what outcome was recorded
- Angle tag — which psychological angle was used in their outreach email
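As a sketch, a contact record carrying these tags might look like the following. The field names and values are illustrative, not the CRM's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Contact:
    """Illustrative contact record; names and values are hypothetical."""
    email: str
    vertical: str                 # product line / market segment
    icp_tier: int                 # 1-4, buyer persona within the vertical
    research_status: str          # "pending" | "in_progress" | "done"
    outreach_status: str          # what was sent, replied, recorded
    angle: Optional[str] = None   # psychological angle, set at draft time

c = Contact(email="fm@example.com", vertical="industrial",
            icp_tier=1, research_status="pending", outreach_status="none")
```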
The CRM also has a separate pipeline per vertical. Different buyer journeys warrant different stages. A deal that starts with a pilot program request has different intermediate steps than one that starts with an inbound product inquiry. Mixing them in a single pipeline produces noise. Separate pipelines produce signal.
5. The Agent Swarm
The pipeline is powered by a swarm of specialized agents running on a scheduled basis. No single agent does everything. Each agent has a narrow, well-defined job.
Each agent runs on a schedule appropriate to its role. The import agent runs weekly, once per vertical — there is no reason to import every day when the prospect pool moves slowly. The research and draft agents run nightly, working through the queue of pending contacts. The monitor runs every weekday morning. The optimizer runs once a week, after enough reply data has accumulated to be meaningful.
All of them read from and write to the same central store: the CRM and a structured database that tracks every email sent, every angle used, and every outcome recorded.
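The schedule described above could be expressed as a simple cron table. These expressions are illustrative, not the system's actual configuration:

```python
# Hypothetical agent schedule; cron expressions are illustrative.
SCHEDULES = {
    "import":    "0 6 * * 1",    # weekly, once per vertical
    "research":  "0 2 * * *",    # nightly, works the pending queue
    "draft":     "0 3 * * *",    # nightly, after research completes
    "monitor":   "0 8 * * 1-5",  # every weekday morning
    "optimizer": "0 22 * * 0",   # weekly, Sunday night
}
```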
6. The Research Agent
The research agent is what makes personalization scale. Its job is not to write an email. Its job is to understand the prospect well enough to inform one.
For each contact in the queue, the research agent does several things:
- Pulls company and role data from the CRM tags already applied at import
- Cross-references any available public intelligence about the organization
- Reads the current angle weights from the performance database
- Selects the best-performing angle for this vertical and ICP tier
- Writes its research notes and the selected angle to the CRM record
The angle selection step is where the psychology library lives. More on that in the next section.
The output of the research agent isn't an email. It's a brief: company context, role context, and a specific psychological hook that the draft agent will use to write the message. The research agent thinks. The draft agent writes.
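The brief handed from research agent to draft agent might be shaped like this. The keys and values are assumptions for illustration, not the system's real schema:

```python
# Hypothetical research brief; keys and contents are illustrative.
brief = {
    "contact_id": "crm-4812",
    "company_context": "Mid-size manufacturer, three plants, tight margins",
    "role_context": "Facilities manager; owns maintenance budget, risk-averse",
    "selected_angle": "compliance_liability",   # chosen from the weight table
    "hook": "Recent regulation changes create unquantified liability exposure",
}
```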
7. Marketing Psychology at Scale
Here is the piece I find most interesting architecturally.
The draft agent doesn't pick a template. It picks an angle — a specific psychological approach rooted in a principle of behavioral science. Each angle is a combination of a cognitive mechanism (loss aversion, social proof, endowment effect, identity, urgency) and a hook relevant to the buyer's specific context.
There are roughly twenty angles across the system, divided by vertical and ICP tier. A few examples of the types (abstracted from any specific industry):
| Angle Type | Psychological Principle | When It Works |
|---|---|---|
| Peer adoption signal | Social proof — "others like you are moving" | When buyers are cautious, not leading-edge |
| Loss window / timing | Loss aversion — "something available now won't be later" | Budget cycles, grant periods, seasonal windows |
| Zero-friction pilot offer | Endowment effect — "try it, you won't want to give it back" | High-consideration products, risk-averse buyers |
| Compliance/liability hook | Loss aversion — "here is a risk you may not have quantified" | Regulated industries, EHS titles, operations leads |
| Retention/ROI framing | Anchoring — "here is the math on what you're currently losing" | High-turnover industries, cost-conscious buyers |
| Identity / status signal | Self-concept — "modern organizations do this" | Buyers motivated by how their organization is perceived |
The draft agent takes the research brief, the selected angle, and the relevant template structure for this tier, and writes a personalized email. The output is not a merge-field form. It's a message where the hook, the framing, and the call to action all reflect a deliberate psychological strategy matched to this specific buyer.
The angle isn't decoration. It's the structural reason the email works or doesn't. Sending a loss-aversion hook to a buyer who is motivated by status gets you ignored. Sending a status hook to a buyer whose primary pain is compliance risk gets you the same result. The matching is the work.
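A slice of the angle library, mirroring the table above, could be stored as a simple mapping. The identifiers and descriptions are illustrative:

```python
# Illustrative slice of the angle library; names mirror the table above.
ANGLES = {
    "peer_adoption":        {"principle": "social proof",  "fits": "cautious, not leading-edge buyers"},
    "loss_window":          {"principle": "loss aversion", "fits": "budget cycles, seasonal windows"},
    "zero_friction_pilot":  {"principle": "endowment",     "fits": "risk-averse, high-consideration buyers"},
    "compliance_liability": {"principle": "loss aversion", "fits": "regulated industries, EHS titles"},
    "retention_roi":        {"principle": "anchoring",     "fits": "cost-conscious, high-turnover buyers"},
    "identity_status":      {"principle": "self-concept",  "fits": "perception-motivated buyers"},
}
```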
8. The Human in the Loop
The agents do not send email autonomously. Every draft gets reviewed before it goes.
This is a design choice, not a limitation. There are several reasons I want a human in this loop:
- Tone calibration. Agents produce good drafts, but occasionally the tone is slightly off-key for a specific person or context. Human review catches it.
- Relationship awareness. Sometimes a contact is someone I know, or someone referred by a partner, or someone who has already had a conversation with someone on the team. The agent doesn't know this. I do.
- Business timing. There are moments — a product launch, an industry event, a news cycle — where you want to hold or modify outreach. A human call doesn't require rebuilding the logic.
- Quality compounding. Every draft I read is feedback into the system. If I'm editing the same phrase pattern repeatedly, that's a prompt improvement waiting to happen.
In practice, most drafts go out with minimal edits. The review takes seconds per contact. But the option to intervene is always there, and it costs nothing to preserve it.
9. Execution: Send and Monitor
Approved drafts are sent from a dedicated outbound domain that is warming on a 40-day protocol before any real volume goes out. The system respects the warm-up constraints: daily rate limits, spacing between sends, and a gradual volume ramp.
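The volume ramp can be enforced with a simple daily cap that grows with domain age. The caps and the linear shape here are placeholders, not the actual warm-up protocol:

```python
def daily_send_cap(domain_age_days: int, warmup_days: int = 40,
                   start_cap: int = 5, full_cap: int = 100) -> int:
    """Linear ramp from start_cap to full_cap over the warm-up window.
    All numbers are illustrative placeholders."""
    if domain_age_days >= warmup_days:
        return full_cap
    frac = domain_age_days / warmup_days
    return int(start_cap + frac * (full_cap - start_cap))
```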
The monitor agent runs every weekday morning and reads the inbox for replies. For each reply it finds, it does three things:
- Classifies the sentiment — positive, neutral, negative, out of office, meeting booked
- Updates the CRM — reply recorded on the contact, pipeline stage advanced if positive
- Logs the outcome to the email record — crucially, it links the outcome back to the angle that was used, not just the contact
The third step is what enables the feedback loop. Without it, you know who replied. With it, you know why they replied — or at least, which psychological approach was in play when they did.
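That linkage can be as simple as writing the outcome onto an email record that already carries the angle. A minimal sketch, with hypothetical field names:

```python
def record_reply(email_log: dict, email_id: str, sentiment: str) -> dict:
    """Attach the classified outcome to the email record. The record
    already carries the angle used, so outcomes can later be
    aggregated by angle, not just by contact."""
    entry = email_log[email_id]
    entry["outcome"] = sentiment  # "positive", "neutral", "negative", ...
    return {"contact": entry["contact_id"],
            "angle": entry["angle"],
            "outcome": sentiment}

log = {"e1": {"contact_id": "crm-4812", "angle": "compliance_liability"}}
result = record_reply(log, "e1", "positive")
```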
10. The Feedback Loop: Self-Updating Angle Weights
Once per week, the optimizer runs.
It reads every row in the email log, aggregates by vertical × ICP tier × angle, and computes a positive reply rate for each combination. It then updates a weight table that the research agent reads on every subsequent run.
The weight formula is Laplace-smoothed to handle small sample sizes gracefully. An angle with three sends and two positive replies gets a strong weight, but not so strong that it shuts out other angles entirely. New angles start with prior weights derived from the psychological analysis that defined them. As real data accumulates — after roughly twenty sends per angle — real rates take over from priors.
There is also an exploration budget. Not every send goes to the highest-weighted angle. A small percentage — around twenty percent — goes to angles with low send counts, ensuring the system doesn't collapse to one approach before it has tested the others.
This is a simplified multi-armed-bandit strategy applied to email copy — closer to epsilon-greedy than true Thompson sampling, since the exploration fraction is fixed rather than drawn from posteriors. The terminology comes from reinforcement learning, but the concept is simple: explore enough to learn, exploit what you've learned.
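A sketch of the selection step, under the assumption that exploration picks uniformly among under-sampled angles. The thresholds and field names are illustrative:

```python
import random

def select_angle(stats: dict, explore_frac: float = 0.2,
                 min_sends: int = 20, rng=random) -> str:
    """stats maps angle -> {"weight": float, "sends": int}.
    With probability explore_frac, try an under-sampled angle;
    otherwise exploit the highest-weighted one.
    explore_frac and min_sends are illustrative values."""
    under_sampled = [a for a, s in stats.items() if s["sends"] < min_sends]
    if under_sampled and rng.random() < explore_frac:
        return rng.choice(under_sampled)
    return max(stats, key=lambda a: stats[a]["weight"])

# Once every angle has enough sends, selection is pure exploitation.
stats = {"a": {"weight": 0.30, "sends": 50},
         "b": {"weight": 0.10, "sends": 50}}
```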
11. The Weekly Angle Report
Every Sunday night, after the optimizer runs, it posts a report to the team Slack channel:
📊 Angle Performance — Week of Feb 23
Top angles this week:
🔥 zero_friction_pilot (tier_1): 38% positive (8 sent)
✅ compliance_liability (tier_2): 29% (14 sent)
📉 generic_baseline (tier_2): 7% (18 sent) — downweighting
Next week: generic_baseline weight cut in half.
Nobody asked for this report. The system generates it because I built it to. Every Monday morning I can see, at a glance, what's working and what isn't — not based on intuition, but on reply data mapped back to the specific psychological mechanism that drove each message.
12. Where This Stands
The full pipeline is live: the import cron, the landing sites, the CRM with tagged contacts, the research and draft agents, the approval workflow, the angle weight system, and the optimizer. Email domain warm-up is underway. First sends go out when the domain reaches the safety threshold — roughly six weeks out from setup.
The angle weight system starts with priors and no real data. By the time we're sending at volume, the priors will already be giving way to whatever the first few hundred sends teach us. That's intentional design: seed with your best thinking, replace it with evidence.
Three things I'll be watching:
- Which angles generalize across tiers and which are truly tier-specific. My hypothesis is that a handful will do well everywhere, and a few will be narrow but very effective for specific buyer profiles.
- Whether the exploration budget is right. Twenty percent feels correct for a small initial pool of angles, but as the library grows it may need to contract.
- Lag in the feedback loop. Reply latency varies. Some people open emails in minutes. Others take three weeks. The weekly optimizer needs to be robust to late replies that arrive after a weight update has already run.
None of this is magic. It's a CRM, a few scheduled agent jobs, a structured database, and a deliberate feedback loop. The sophistication isn't in any individual piece. It's in the architecture that connects them — so that every send is informed by every previous send, and the system quietly gets better every week without anyone having to manually update anything.
Written in collaboration with JBot and Claude. The architecture, decisions, and opinions are mine. The drafting, diagrams, and code are ours.