Roll-ups remain the most reliable value-creation lever in middle-market private equity. A platform bought at 8-10x EBITDA that bolts on 10 sub-scale operators at 4-6x, then exits the consolidated entity at 11-13x, produces 2.5-3.5x MOIC almost regardless of organic growth. The catch: that math only works if the sponsor can find, qualify, and close the add-ons. Most platform theses assume 8-15 closed transactions over a 4-5 year hold, which means corporate development teams need to evaluate 400-1,200 targets to land them. Traditional sourcing — buy-side bankers, conferences, ZoomInfo lists, cold outreach — cannot produce that volume with adequate fit precision. AI-enabled add-on identification has become the operating standard at firms running multi-vertical roll-up strategies.
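The multiple-arbitrage arithmetic behind that claim can be sketched as follows. The multiples are midpoints of the ranges quoted above; the EBITDA sizes ($10M platform, $2M per add-on) and the 50% debt share are purely illustrative assumptions, and the sketch ignores organic growth, debt paydown, interest, and fees.

```python
def rollup_moic(platform_ebitda, platform_multiple, addon_ebitdas,
                addon_multiple, exit_multiple, debt_share):
    """Return (blended entry multiple, unlevered multiple, equity MOIC).

    All inputs are hypothetical; EBITDA is held flat (no organic growth).
    """
    total_ebitda = platform_ebitda + sum(addon_ebitdas)
    total_cost = (platform_ebitda * platform_multiple
                  + sum(addon_ebitdas) * addon_multiple)
    blended_entry = total_cost / total_ebitda
    exit_ev = total_ebitda * exit_multiple
    debt = total_cost * debt_share        # assumed constant through the hold
    equity_in = total_cost - debt
    equity_out = exit_ev - debt
    return blended_entry, exit_ev / total_cost, equity_out / equity_in


# Assumed inputs: $10M platform at 9x, ten $2M add-ons at 5x, exit at 12x.
blended, unlevered, moic = rollup_moic(
    platform_ebitda=10.0, platform_multiple=9.0,
    addon_ebitdas=[2.0] * 10, addon_multiple=5.0,
    exit_multiple=12.0, debt_share=0.5,
)
print(f"blended entry {blended:.2f}x, unlevered {unlevered:.2f}x, MOIC {moic:.2f}x")
```

Under these assumptions the blended entry multiple falls to roughly 6.3x, and the equity multiple lands inside the quoted range without any growth in underlying EBITDA. Different leverage or deal sizes would move the result.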
Why Manual Add-On Sourcing Breaks at Scale
Consider the math facing the corporate development lead at a dental DSO platform. The U.S. has roughly 130,000 active dental practices. Maybe 60,000 fit basic platform criteria (general practice, 2+ operatories, $800K-$5M revenue). Of those, perhaps 8,000 are in the platform's target geographies. Of those, maybe 1,500 have ownership demographics suggesting near-term sale interest (owner age 55+, no associate pipeline). A two-person corp dev team with buy-side banker support typically touches 150-250 of these per year. The rest are invisible. The same dynamic plays out in HVAC (110,000+ contractors), veterinary practices (32,000 in North America), insurance brokerages (37,000 P&C agencies), IT managed service providers (40,000+), fire & life safety, behavioral health, ophthalmology, auto repair, residential services, and the other 20-30 verticals where PE roll-ups dominate.
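A quick way to see the coverage gap is to run the funnel numbers directly. The sketch below uses only the dental figures quoted above; the function names are illustrative.

```python
# Funnel stages for the dental DSO example (counts from the text).
FUNNEL = [
    ("active U.S. dental practices", 130_000),
    ("fit basic platform criteria", 60_000),
    ("in target geographies", 8_000),
    ("ownership suggests near-term sale", 1_500),
]
TOUCHED_PER_YEAR = (150, 250)  # two-person team with banker support


def pass_through(stages):
    """Share of each stage that survives from the previous stage."""
    return [(name, count / prev)
            for (_, prev), (name, count) in zip(stages, stages[1:])]


def coverage(touched, qualified):
    """Fraction of the qualified pool a team reaches in a year."""
    return touched / qualified


qualified = FUNNEL[-1][1]
low, high = (coverage(t, qualified) for t in TOUCHED_PER_YEAR)

for name, rate in pass_through(FUNNEL):
    print(f"{name}: {rate:.1%} of prior stage")
print(f"annual coverage of qualified pool: {low:.0%} to {high:.0%}")
```

Even against the narrowest pool of 1,500 likely sellers, a manual team reaches only about a tenth to a sixth of the names each year.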
The bottleneck is not capital or even buyer-seller fit — it is the cost-per-qualified-introduction. Pye-Barker Fire & Safety closed over 100 add-ons in five years. Heartland Dental has done 600+ affiliations. Roper Technologies, ServiceMaster, and Driven Brands all operate sourcing engines that look more like B2B marketing operations than traditional corp dev. AI is the only way mid-market sponsors can approach that volume without 30-person internal teams.
The AI Sourcing Stack
An effective add-on identification system has four layers: a target universe graph, enrichment and signal extraction, fit scoring, and outreach orchestration. The universe layer ingests data from Grata, SourceScrub, Inven, Cyndx, PitchBook, state licensing boards, NPI registries (for healthcare verticals), FCC licenses (for telecom/MSPs), DOT registries (for logistics), USPTO filings, and proprietary web crawls. For most fragmented services verticals, public data captures 70-85% of the relevant universe; the remaining 15-30% lives in regional directories, trade association rosters, and county-level business registrations that require custom scraping.
On top of the raw universe, NLP models (typically fine-tuned BERT variants or GPT-4-class LLMs prompted with extraction schemas) pull structured signals from unstructured sources: website copy, Google reviews, LinkedIn employee counts, Glassdoor listings, local news, court records, and building permits. For a residential HVAC roll-up, the signal set includes service area ZIP codes, brand affiliations (Carrier, Trane, Lennox), Nexstar or Service Nation membership, BBB rating, fleet size (visible from review photos and Google Street View), and ownership age estimates from Whitepages and voter registration cross-references. This same architecture is described in Article 1 on platform deal sourcing; the difference for add-ons is that the universe is narrower and the fit criteria far more specific.
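As a rough illustration of what an extraction schema for the HVAC signal set might look like, the sketch below defines a record type and validates a JSON payload of the kind an extractor could return. The field names, the `parse_signals` helper, and the validation rules are all assumptions made for illustration; no particular LLM API or vendor format is implied.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Brands named in the text; a real list would be longer.
KNOWN_BRANDS = {"Carrier", "Trane", "Lennox"}


@dataclass
class HvacSignals:
    """Hypothetical structured signals for one residential HVAC target."""
    service_zips: list[str]
    brand_affiliations: list[str]
    trade_groups: list[str]            # e.g. Nexstar, Service Nation
    bbb_rating: Optional[str]
    fleet_size_estimate: Optional[int]
    owner_age_estimate: Optional[int]


def parse_signals(raw_json: str) -> HvacSignals:
    """Validate extractor output, dropping values that fail basic checks."""
    data = json.loads(raw_json)
    zips = [z for z in data.get("service_zips", [])
            if isinstance(z, str) and len(z) == 5 and z.isdigit()]
    brands = [b for b in data.get("brand_affiliations", []) if b in KNOWN_BRANDS]
    fleet = data.get("fleet_size_estimate")
    age = data.get("owner_age_estimate")
    return HvacSignals(
        service_zips=zips,
        brand_affiliations=brands,
        trade_groups=list(data.get("trade_groups", [])),
        bbb_rating=data.get("bbb_rating"),
        fleet_size_estimate=fleet if isinstance(fleet, int) and fleet >= 0 else None,
        owner_age_estimate=age if isinstance(age, int) and 18 <= age <= 100 else None,
    )


# Made-up payload with one malformed ZIP, one unknown brand, an implausible age.
sample = json.dumps({
    "service_zips": ["30301", "3030"],
    "brand_affiliations": ["Trane", "Acme"],
    "trade_groups": ["Nexstar"],
    "bbb_rating": "A+",
    "fleet_size_estimate": 14,
    "owner_age_estimate": 140,
})
signals = parse_signals(sample)
print(signals)
```

Validating at the schema boundary keeps malformed extractions out of the universe graph, so downstream scoring sees either a clean value or an explicit missing one.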
| Metric | Traditional Corp Dev | AI-Augmented |
|---|---|---|
| Targets evaluated annually | 150-250 | 3,000-8,000 |
| Time to build qualified pipeline of 50 | 6-9 months | 3-5 weeks |
| Platform-fit precision (% of contacted that pass IOI screen) | 12-18% | 45-60% |
| Cost per closed add-on (sourcing only) | $180K-$350K | $45K-$110K |
| Coverage of total addressable universe | 8-15% | 65-85% |
| Repeat outreach errors (already-owned, recently sold) | Common | Near zero with graph dedup |
Fit Scoring: Where Most Implementations Fail
Generating a 5,000-name list is the easy part. Ranking it so the corp dev team works the top 200 first is where the model earns its keep. A well-built fit score combines three model outputs: (1) strategic fit — does the target match platform criteria on service mix, geography, customer concentration, and revenue scale; (2) operational fit — can the platform's shared services (ERP, billing, procurement, HR) absorb this target without significant rework, a question covered in detail in Article 6 on shared services; and (3) transactability — is the owner likely to sell, at what multiple, and on what timeline.
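One simple way to combine the three outputs into a single ranking is a weighted average, sketched below. The weights, the 0-1 score scale, and the target names are assumptions for illustration; the article does not specify how firms blend the components, and a production system might use a learned combiner instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetScores:
    """Three component scores for one target, each assumed to lie in [0, 1]."""
    name: str
    strategic: float
    operational: float
    transactability: float


# Assumed weights; they must sum to 1 so the composite stays in [0, 1].
WEIGHTS = {"strategic": 0.4, "operational": 0.2, "transactability": 0.4}


def fit_score(target: TargetScores, weights=WEIGHTS) -> float:
    """Weighted average of the three component scores."""
    return (weights["strategic"] * target.strategic
            + weights["operational"] * target.operational
            + weights["transactability"] * target.transactability)


def top_n(targets, n):
    """Return the n highest-scoring targets, best first."""
    return sorted(targets, key=fit_score, reverse=True)[:n]


# Hypothetical targets.
universe = [
    TargetScores("Target A", strategic=0.9, operational=0.5, transactability=0.2),
    TargetScores("Target B", strategic=0.6, operational=0.7, transactability=0.8),
    TargetScores("Target C", strategic=0.3, operational=0.9, transactability=0.4),
]
for t in top_n(universe, 2):
    print(t.name, round(fit_score(t), 2))
```

In this toy universe, Target B outranks Target A despite weaker strategic fit because its owner is far more likely to transact, which is the behavior a work queue for the corp dev team needs.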
The transactability score is the most underbuilt component in most PE sourcing stacks. Strong implementations use gradient-boosted models trained on closed-deal data from the firm's own historical pipeline plus enriched signals: owner age (inferred from LinkedIn tenure and public records), succession indicators (presence or absence of family members or named associates), capital structure stress (UCC filings, lawsuit records, tax liens), recent CapEx investment (building permits, equipment purchases), and life events (recent moves, business address changes, divorce filings in public court records). At one industrial services platform we worked with, the transactability model achieved an AUC of 0.78 against a holdout set of 340 historical owners, with top-decile names 4-5x more likely to be sellers within 18 months than bottom-decile names.
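AUC and decile lift are related but distinct measures, and it helps to compute both on the same holdout. The sketch below does this in plain Python on a tiny made-up dataset; the scores and labels are illustrative, not the platform's data.

```python
def auc(scores, labels):
    """Probability that a random seller outscores a random non-seller.

    Pairwise (Mann-Whitney) form; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one seller and one non-seller")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def decile_rates(scores, labels):
    """Seller rate in the top and bottom 10% of names ranked by score."""
    ranked = [y for _, y in sorted(zip(scores, labels),
                                   key=lambda pair: pair[0], reverse=True)]
    k = max(1, len(ranked) // 10)
    return sum(ranked[:k]) / k, sum(ranked[-k:]) / k


# Ten hypothetical owners: model score and whether they sold within 18 months.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

print(f"AUC: {auc(scores, labels):.3f}")
top_rate, bottom_rate = decile_rates(scores, labels)
print(f"top-decile seller rate {top_rate:.0%}, bottom-decile {bottom_rate:.0%}")
```

On a real holdout of a few hundred owners, the lift is the ratio of the two decile rates. With only ten rows, as here, each decile is a single name, so the ratio is not meaningful and is left uncomputed.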
Vertical Patterns: What the Top Roll-Ups Actually Look For
Each vertical has its own signal economics. The features that predict a good HVAC add-on are nearly orthogonal to those that predict a good ophthalmology add-on. A few patterns from active 2024-2025 platforms:
These signal sets are not theoretical — they map to real $50M-$500M platforms currently doing 6-15 deals per year. Pye-Barker, for instance, built proprietary scoring on fire alarm monitoring station type and AHJ (Authority Having Jurisdiction) territory overlap. Heartland Dental scores on case acceptance rate inferred from production-per-visit. The defensibility of the roll-up thesis increasingly lives in the quality of the proprietary scoring model, not in capital availability.
Workflow Integration: From Score to Closed Deal
A scored list that sits in a spreadsheet creates no value. The systems that actually move deals integrate scoring outputs directly into the corp dev workflow. Affinity and 4Degrees dominate the CRM layer in middle-market PE; both now have API hooks for ingesting external fit scores and surfacing them on contact records. Outreach sequencing typically runs through Apollo.io, Outreach, or Salesloft, with LLM-generated personalization pulling specifics from the target's website, recent press, and Google reviews. A well-tuned sequence drives 18-28% reply rates on cold owner outreach — versus 3-6% for generic banker letters.
Once an owner engages, the workflow shifts to NDA, financial submission, and LOI. AI-enabled QoE — covered in Article 3 — becomes essential at add-on scale because traditional QoE engagements at $80K-$150K per target destroy the economics of $3-8M EBITDA tuck-ins. Firms running high-velocity roll-ups have internal QoE-automation platforms that produce a defensible 60-page QoE on a sub-$5M EBITDA target in 5-7 business days for under $25K of marginal cost.
Implementation Path: 90 Days to Production
License core data (Grata + SourceScrub or equivalent), define vertical schema, build initial universe of 5K-50K names, deduplicate against existing CRM and known-owned competitors. Output: governed master list with confidence scores on each record.
Fine-tune NLP extractors on vertical-specific website and review data. Train fit-scoring model on historical pipeline (need at least 80-150 historical evaluations with outcome labels; firms without this data start with rule-based scoring and migrate to ML at month 9-12).
Wire scores into Affinity/4Degrees, build outreach sequences, and run a first cohort of 200-400 targets through the full funnel. Measure response rate and IOI conversion, and feed those signals back into the model retraining loop.
Monthly model retraining on closed-loop data, expansion to adjacent verticals or geographies, integration with portfolio company corp dev teams (not just sponsor corp dev), build of proprietary signals unique to platform's thesis.
Build vs. Buy vs. Hybrid
Three vendor archetypes compete for this budget. Pure-data players (PitchBook, SourceScrub, Grata, Inven) provide the universe and basic enrichment; pricing runs $60K-$250K per year per platform. Workflow players (Affinity, 4Degrees, DealCloud) own the CRM layer at $40K-$120K per year. Full-stack roll-up specialists — including emerging vendors like Cyndx, Stax AI, and several PE-focused boutiques — offer integrated sourcing-to-CRM stacks at $150K-$500K. The right architecture depends on platform count: sponsors with 2-3 active roll-ups typically buy point solutions and stitch; sponsors with 6+ active platforms (Audax, Trivest, Shore Capital, Alpine, Main Street Capital) build internal sourcing platforms because the marginal cost of an additional vertical drops to near zero once core infrastructure exists.
The defensibility of the roll-up thesis increasingly lives in the proprietary scoring model, not in capital availability. Cheap capital is everywhere; superior target identification is not.
— Observed pattern across 40+ active middle-market consolidation platforms
Governance and What Goes Wrong
Three failure modes recur across implementations. First, models drift when the platform's strategy shifts but the training labels don't update: a platform that pivots from suburban to urban acquisitions will keep getting scored toward old geographies for 12-18 months unless someone forces a retrain. Second, data hygiene erodes: as the same target gets contacted by the platform, the sponsor, and three competing platforms over 18 months, CRM records fork, duplicate, and go stale unless the firm enforces a master-record discipline. Third, outreach quality collapses when LLM-generated emails become generic, hitting spam filters and damaging the platform's brand with sellers. That is a real cost in tight-knit verticals where every owner knows every other owner.
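The master-record discipline in the second failure mode can be approximated with a normalized match key. The sketch below is a minimal version under stated assumptions: records are dicts with hypothetical `name`, `zip`, `source`, and `last_contacted` (ISO date string) fields, and matching on cleaned name plus 5-digit ZIP is a deliberately crude stand-in for real entity resolution.

```python
import re
from collections import defaultdict

# Tokens ignored when comparing business names (assumed list).
_NOISE = {"inc", "llc", "co", "corp", "ltd", "company", "pllc", "pc", "and", "the"}


def record_key(name: str, zip_code: str):
    """Normalize a business name and ZIP into a match key."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    core = [t for t in tokens if t not in _NOISE]
    return " ".join(core), zip_code.strip()[:5]


def merge_records(records):
    """Collapse duplicates into one master record per key.

    The most recently contacted record wins; all contributing
    sources are kept so the contact history is not lost.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[record_key(rec["name"], rec["zip"])].append(rec)
    masters = []
    for dupes in groups.values():
        latest = max(dupes, key=lambda r: r["last_contacted"])
        masters.append({**latest, "sources": sorted({d["source"] for d in dupes})})
    return masters


# Made-up CRM rows: the first two are the same business entered twice.
crm = [
    {"name": "Acme Heating & Air, LLC", "zip": "30301",
     "source": "platform", "last_contacted": "2024-03-01"},
    {"name": "ACME Heating and Air Inc.", "zip": "30301-1234",
     "source": "sponsor", "last_contacted": "2024-09-15"},
    {"name": "Blue Ridge HVAC", "zip": "28801",
     "source": "platform", "last_contacted": "2024-06-10"},
]
masters = merge_records(crm)
print(len(masters), masters[0]["sources"], masters[0]["last_contacted"])
```

Exact-key matching will miss renamed or relocated businesses, so a real pipeline would likely add fuzzy matching and a manual review queue for ambiguous pairs.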
The corp dev function inside portfolio companies is also changing. Platform CEOs increasingly demand that the sponsor provide sourcing infrastructure as a service — model output, target lists, and qualified leads — rather than just capital and board oversight. This is consistent with the broader portfolio operating model evolution where shared services move beyond finance and HR into commercial functions. The talent implications, including the rise of fractional analyst and corp dev roles, connect to the operating model questions covered in Article 12.
What to Expect on Returns
Sponsors with well-built AI sourcing capabilities report three measurable outcomes. Deal velocity rises 2-4x: typical platforms move from 3-4 add-ons per year to 8-12. Average entry multiple on add-ons drops 0.8-1.5x EBITDA because the platform reaches owners before competitive processes form. And the platform's exit valuation premium widens because acquirers pay for a demonstrated, repeatable sourcing engine, not just historical acquisitions. On a $250M EBITDA exit at 11x, the difference between a thesis backed by an industrialized sourcing engine and one backed by an ad hoc effort is typically worth 0.5-1.5x of exit multiple, or $125M-$375M of enterprise value. That is the math that justifies the $400K-$900K annual budget required to build the capability.
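The exit arithmetic in that paragraph is easy to reproduce. The sketch below uses only the figures quoted above, plus one assumption: a five-year hold for totaling the sourcing budget.

```python
EXIT_EBITDA_M = 250.0            # $M, from the text
BASE_EXIT_MULTIPLE = 11.0
MULTIPLE_UPLIFT = (0.5, 1.5)     # turns of EBITDA attributed to the sourcing engine
ANNUAL_BUDGET_M = (0.4, 0.9)     # $M per year
HOLD_YEARS = 5                   # assumption: upper end of a 4-5 year hold

base_ev = EXIT_EBITDA_M * BASE_EXIT_MULTIPLE
ev_uplift = tuple(EXIT_EBITDA_M * turns for turns in MULTIPLE_UPLIFT)
total_budget = tuple(b * HOLD_YEARS for b in ANNUAL_BUDGET_M)

# Most conservative comparison: smallest uplift against the largest budget.
worst_case_ratio = ev_uplift[0] / total_budget[1]

print(f"base enterprise value: ${base_ev:,.0f}M")
print(f"EV uplift: ${ev_uplift[0]:,.0f}M to ${ev_uplift[1]:,.0f}M")
print(f"budget over hold: ${total_budget[0]:.1f}M to ${total_budget[1]:.1f}M")
print(f"worst-case uplift-to-budget ratio: {worst_case_ratio:.1f}x")
```

Even pairing the low end of the uplift with the high end of the budget, the implied enterprise-value gain is a large multiple of the cumulative spend, assuming the uplift is in fact attributable to the sourcing engine.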
Add-on identification is no longer a banker-and-network exercise. It is a data engineering problem with a corporate development front end. The sponsors who treat it that way are doing 50-100 transactions per platform; the ones who don't are stalling at 4-6 and watching their exit multiples compress.