Private Equity — Article 7 of 12

AI for Add-On Acquisition Identification (Roll-Up Strategies)

Roll-up strategies generate returns through multiple arbitrage, but the math only works if sponsors can identify, qualify, and close 8-15 add-ons per platform within a 4-5 year hold. AI-driven sourcing compresses target identification from quarters to days and lifts platform-fit precision from roughly 15% to 55%+.

Roll-ups remain the most reliable value-creation lever in middle-market private equity. A platform bought at 8-10x EBITDA that bolts on 10 sub-scale operators at 4-6x, then exits the consolidated entity at 11-13x, produces 2.5-3.5x MOIC almost regardless of organic growth. The catch: that math only works if the sponsor can find, qualify, and close the add-ons. Most platform theses assume 8-15 closed transactions over a 4-5 year hold, which means corporate development teams need to evaluate 400-1,200 targets to land them. Traditional sourcing — buy-side bankers, conferences, ZoomInfo lists, cold outreach — cannot produce that volume with adequate fit precision. AI-enabled add-on identification has become the operating standard at firms running multi-vertical roll-up strategies.

8-15: add-on acquisitions required per platform to deliver the typical roll-up thesis over a 4-5 year hold

Why Manual Add-On Sourcing Breaks at Scale

Consider the math facing the corporate development lead at a dental DSO platform. The U.S. has roughly 130,000 active dental practices. Maybe 60,000 fit basic platform criteria (general practice, 2+ operatories, $800K-$5M revenue). Of those, perhaps 8,000 are in the platform's target geographies. Of those, maybe 1,500 have ownership demographics suggesting near-term sale interest (owner age 55+, no associate pipeline). A two-person corp dev team with buy-side banker support typically touches 150-250 of these per year. The rest are invisible. The same dynamic plays out in HVAC (110,000+ contractors), veterinary practices (32,000 in North America), insurance brokerages (37,000 P&C agencies), IT managed service providers (40,000+), fire & life safety, behavioral health, ophthalmology, auto repair, residential services, and the other 20-30 verticals where PE roll-ups dominate.
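The funnel in the paragraph above can be restated as a few lines of arithmetic. This is only a sketch using the dental figures quoted in the text; the function names are illustrative and not part of any vendor tool.

```python
# Funnel figures quoted in the dental DSO example above.
funnel = [
    ("Active U.S. dental practices", 130_000),
    ("Fit basic platform criteria", 60_000),
    ("In target geographies", 8_000),
    ("Near-term sale demographics", 1_500),
]
touched_per_year = (150, 250)  # two-person team with buy-side banker support


def stage_pass_rates(stages):
    """Return (label, count, share of the previous stage) for each stage."""
    rows = []
    prev = None
    for label, count in stages:
        share = None if prev is None else count / prev
        rows.append((label, count, share))
        prev = count
    return rows


def coverage(touched, qualified):
    """Share of the qualified pool a team reaches in one year."""
    return touched / qualified


for label, count, share in stage_pass_rates(funnel):
    suffix = "" if share is None else f" ({share:.1%} of prior stage)"
    print(f"{label}: {count:,}{suffix}")

qualified = funnel[-1][1]
low, high = (coverage(t, qualified) for t in touched_per_year)
print(f"Annual coverage of the qualified pool: {low:.0%} to {high:.0%}")
```

Even against the narrowest pool of 1,500 likely sellers, a team touching 150-250 owners a year reaches only a small fraction of the names that matter.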

The bottleneck is not capital or even buyer-seller fit — it is the cost-per-qualified-introduction. Pye-Barker Fire & Safety closed over 100 add-ons in five years. Heartland Dental has done 600+ affiliations. Roper Technologies, ServiceMaster, and Driven Brands all operate sourcing engines that look more like B2B marketing operations than traditional corp dev. AI is the only way mid-market sponsors can approach that volume without 30-person internal teams.

Roll-Up Multiple Arbitrage
Value Created = (EBITDA_platform + Σ EBITDA_addons + Synergies) × Exit Multiple − Σ Purchase Prices − Integration Cost
The spread between a blended entry multiple (~5.5x) and an exit multiple (~11x) on a $40M EBITDA consolidated entity produces $220M of arbitrage value ($40M × 5.5) before any operational improvement.
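As a worked check on the $220M figure, the sketch below computes a blended entry multiple and the resulting spread. The split between a $15M EBITDA platform at 8x and $25M of add-on EBITDA at 4x is a hypothetical mix chosen only because it blends to the ~5.5x quoted above; synergies and integration cost are left out.

```python
def blended_entry_multiple(deals):
    """EBITDA-weighted average entry multiple for (ebitda_m, multiple) pairs."""
    total_ebitda = sum(ebitda for ebitda, _ in deals)
    total_price = sum(ebitda * multiple for ebitda, multiple in deals)
    return total_price / total_ebitda


def arbitrage_value(ebitda_m, entry_multiple, exit_multiple):
    """Value from the multiple spread alone, in $M (no synergies or integration cost)."""
    return ebitda_m * (exit_multiple - entry_multiple)


# Hypothetical deal mix: one platform plus add-ons, EBITDA in $M.
deals = [
    (15.0, 8.0),   # platform bought at 8x
    (25.0, 4.0),   # add-ons bought at an average of 4x
]

entry = blended_entry_multiple(deals)
total_ebitda = sum(ebitda for ebitda, _ in deals)
spread_value = arbitrage_value(total_ebitda, entry, exit_multiple=11.0)

print(f"Blended entry multiple: {entry:.1f}x")
print(f"Arbitrage value at an 11x exit: ${spread_value:.0f}M")
```

Because the blended multiple is EBITDA-weighted, every additional dollar of add-on EBITDA bought below the platform multiple pulls the blended entry down and widens the spread.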

The AI Sourcing Stack

An effective add-on identification system has four layers: a target universe graph, enrichment and signal extraction, fit scoring, and outreach orchestration. The universe layer ingests data from Grata, SourceScrub, Inven, Cyndx, PitchBook, state licensing boards, NPI registries (for healthcare verticals), FCC licenses (for telecom/MSPs), DOT registries (for logistics), USPTO filings, and proprietary web crawls. For most fragmented services verticals, public data captures 70-85% of the relevant universe; the remaining 15-30% lives in regional directories, trade association rosters, and county-level business registrations that require custom scraping.

On top of the raw universe, NLP models (typically fine-tuned BERT variants or GPT-4-class LLMs prompted with extraction schemas) pull structured signals from unstructured sources: website copy, Google reviews, LinkedIn employee counts, Glassdoor listings, local news, court records, and building permits. For a residential HVAC roll-up, the signal set includes service area ZIP codes, brand affiliations (Carrier, Trane, Lennox), Nexstar or Service Nation membership, BBB rating, fleet size (visible from review photos and Google Street View), and ownership age estimates from Whitepages and voter registration cross-references. This same architecture is described in Article 1 on platform deal sourcing; the difference for add-ons is that the universe is narrower and the fit criteria far more specific.
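An extraction schema of the kind mentioned above might look like the following sketch. The field names, the `HvacSignals` class, and the `parse_signals` helper are hypothetical; the idea is simply that whatever an LLM returns is validated against a fixed schema before it reaches the scoring layer, and anything malformed is dropped rather than trusted.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional

KNOWN_BRANDS = {"Carrier", "Trane", "Lennox"}  # brands named in the text


@dataclass
class HvacSignals:
    """Hypothetical structured signal record for one HVAC target."""
    service_zips: List[str] = field(default_factory=list)
    brand_affiliations: List[str] = field(default_factory=list)
    trade_group_member: Optional[bool] = None
    bbb_rating: Optional[str] = None
    fleet_size_estimate: Optional[int] = None


def parse_signals(raw_json: str) -> HvacSignals:
    """Validate a model's JSON output; discard values that fail basic checks."""
    data = json.loads(raw_json)
    zips = [
        z for z in data.get("service_zips", [])
        if isinstance(z, str) and len(z) == 5 and z.isdigit()
    ]
    brands = [b for b in data.get("brand_affiliations", []) if b in KNOWN_BRANDS]
    member = data.get("trade_group_member")
    if not isinstance(member, bool):
        member = None
    rating = data.get("bbb_rating")
    if not isinstance(rating, str):
        rating = None
    fleet = data.get("fleet_size_estimate")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(fleet, int) or isinstance(fleet, bool) or fleet < 0:
        fleet = None
    return HvacSignals(zips, brands, member, rating, fleet)


# Example model output with one bad ZIP, one unknown brand, and a non-numeric fleet size.
raw = (
    '{"service_zips": ["30301", "3030"], "brand_affiliations": ["Trane", "Acme"], '
    '"trade_group_member": true, "bbb_rating": "A+", "fleet_size_estimate": "twelve"}'
)
signals = parse_signals(raw)
print(signals)
```

Keeping unknown values as `None` rather than guessing lets the downstream fit score treat missing signals explicitly.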

Manual vs. AI-Augmented Add-On Sourcing
Metric | Traditional Corp Dev | AI-Augmented
Targets evaluated annually | 150-250 | 3,000-8,000
Time to build qualified pipeline of 50 | 6-9 months | 3-5 weeks
Platform-fit precision (% of contacted that pass IOI screen) | 12-18% | 45-60%
Cost per closed add-on (sourcing only) | $180K-$350K | $45K-$110K
Coverage of total addressable universe | 8-15% | 65-85%
Repeat outreach errors (already-owned, recently sold) | Common | Near zero with graph dedup

Fit Scoring: Where Most Implementations Fail

Generating a 5,000-name list is the easy part. Ranking it so the corp dev team works the top 200 first is where the model earns its keep. A well-built fit score combines three model outputs: (1) strategic fit — does the target match platform criteria on service mix, geography, customer concentration, and revenue scale; (2) operational fit — can the platform's shared services (ERP, billing, procurement, HR) absorb this target without significant rework, a question covered in detail in Article 6 on shared services; and (3) transactability — is the owner likely to sell, at what multiple, and on what timeline.
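One simple way to combine the three outputs into a single ranking is a weighted sum, sketched below. The weights, target names, and scores are hypothetical placeholders, and a real implementation might prefer a learned combination or a multiplicative form so that a near-zero transactability score cannot be offset by strong strategic fit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetScores:
    """Three model outputs for one target, each assumed to lie in [0, 1]."""
    name: str
    strategic: float
    operational: float
    transactability: float


# Hypothetical weights; in practice these would be tuned with the deal team.
WEIGHTS = {"strategic": 0.40, "operational": 0.25, "transactability": 0.35}


def composite(target, weights=WEIGHTS):
    """Weighted sum of the three component scores."""
    return (
        weights["strategic"] * target.strategic
        + weights["operational"] * target.operational
        + weights["transactability"] * target.transactability
    )


def rank(targets, top_n):
    """Return the top_n targets by composite score, best first."""
    return sorted(targets, key=composite, reverse=True)[:top_n]


# Illustrative targets only.
targets = [
    TargetScores("Target A", strategic=0.9, operational=0.8, transactability=0.2),
    TargetScores("Target B", strategic=0.7, operational=0.6, transactability=0.9),
    TargetScores("Target C", strategic=0.4, operational=0.9, transactability=0.5),
]

for t in rank(targets, top_n=2):
    print(f"{t.name}: {composite(t):.3f}")
```

In this toy example the best strategic fit (Target A) ranks below Target B because its owner is unlikely to sell, which is the behavior the transactability component is meant to produce.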

The transactability score is the most underbuilt component in most PE sourcing stacks. Strong implementations use gradient-boosted models trained on closed-deal data from the firm's own historical pipeline plus enriched signals: owner age (inferred from LinkedIn tenure and public records), succession indicators (presence or absence of family members or named associates), capital structure stress (UCC filings, lawsuit records, tax liens), recent CapEx investment (building permits, equipment purchases), and life events (recent moves, business address changes, divorce filings in public court records). At one industrial services platform we worked with, the transactability model achieved an AUC of 0.78 against a holdout set of 340 historical owners, and the top-decile names were 4-5x more likely to be sellers within 18 months than the bottom decile.
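AUC and decile lift are two separate statistics, and both are easy to compute on a holdout set. The sketch below uses plain Python and a made-up 20-owner holdout purely for illustration; it does not reproduce the 0.78 figure or the 340-owner dataset mentioned above.

```python
def auc(scores, labels):
    """Probability that a random seller outscores a random non-seller (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one seller and one non-seller")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def decile_rates(scores, labels):
    """Seller rate in the top 10% and bottom 10% of names by score."""
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]
    k = max(1, len(ranked) // 10)
    top_rate = sum(ranked[:k]) / k
    bottom_rate = sum(ranked[-k:]) / k
    return top_rate, bottom_rate


# Toy holdout: 20 owners, scores 20 (most likely to sell) down to 1.
# Label 1 means the owner sold within the observation window.
scores = list(range(20, 0, -1))
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

model_auc = auc(scores, labels)
top_rate, bottom_rate = decile_rates(scores, labels)
print(f"AUC: {model_auc:.3f}")
print(f"Seller rate, top decile: {top_rate:.0%}; bottom decile: {bottom_rate:.0%}")
```

The pairwise AUC loop is quadratic in the holdout size, which is fine for a few hundred owners; a rank-based formula would be the usual choice for larger sets.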

⚠️ Owner personal data is a compliance minefield
Using public records on owner age, marital status, or health is generally permissible in the U.S. but increasingly restricted in the EU under GDPR and in California under CCPA/CPRA. Models that rely on PII-derived features must be auditable, must not be used in pricing decisions, and must exclude protected-class signals. Several PE firms have rebuilt their fit-scoring models in 2024-2025 specifically to pass GDPR Article 22 automated-decision-making review.
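The constraints above can be enforced mechanically with a feature gate in front of model training and scoring. The sketch below is illustrative only: the feature names, the contents of `OWNER_PERSONAL` and `PROTECTED_CLASS`, and the jurisdiction labels are assumptions, and the actual lists are a question for privacy counsel rather than for code.

```python
from datetime import datetime, timezone

# Hypothetical feature groupings; counsel defines the real lists.
OWNER_PERSONAL = {"owner_age", "marital_status", "life_event_flag", "health_indicator"}
PROTECTED_CLASS = {"marital_status", "health_indicator"}


def select_features(candidate, jurisdiction, purpose, audit_log):
    """Drop disallowed features and record the decision for later audit.

    Assumed rules, following the text: EU models and pricing models exclude all
    owner-personal features; other models exclude only protected-class signals.
    """
    candidate = set(candidate)
    if jurisdiction == "EU" or purpose == "pricing":
        dropped = candidate & OWNER_PERSONAL
    else:
        dropped = candidate & PROTECTED_CLASS
    kept = candidate - dropped
    audit_log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "jurisdiction": jurisdiction,
        "purpose": purpose,
        "kept": sorted(kept),
        "dropped": sorted(dropped),
    })
    return sorted(kept)


audit_log = []
requested = ["owner_age", "marital_status", "revenue_band", "ucc_filing_count"]

eu_features = select_features(requested, "EU", "scoring", audit_log)
us_features = select_features(requested, "US", "scoring", audit_log)
print("EU:", eu_features)
print("US:", us_features)
print("Audit entries:", len(audit_log))
```

Logging what was dropped, and not only what was kept, is what makes the model's feature set reviewable after the fact.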

Vertical Patterns: What the Top Roll-Ups Actually Look For

Each vertical has its own signal economics. The features that predict a good HVAC add-on are nearly orthogonal to those that predict a good ophthalmology add-on. A few patterns from active 2024-2025 platforms:

[Figure: Vertical-specific signal sets used in production AI sourcing models]

These signal sets are not theoretical — they map to real $50M-$500M platforms currently doing 6-15 deals per year. Pye-Barker, for instance, built proprietary scoring on fire alarm monitoring station type and AHJ (Authority Having Jurisdiction) territory overlap. Heartland Dental scores on case acceptance rate inferred from production-per-visit. The defensibility of the roll-up thesis increasingly lives in the quality of the proprietary scoring model, not in capital availability.

Workflow Integration: From Score to Closed Deal

A scored list that sits in a spreadsheet creates no value. The systems that actually move deals integrate scoring outputs directly into the corp dev workflow. Affinity and 4Degrees dominate the CRM layer in middle-market PE; both now have API hooks for ingesting external fit scores and surfacing them on contact records. Outreach sequencing typically runs through Apollo.io, Outreach, or Salesloft, with LLM-generated personalization pulling specifics from the target's website, recent press, and Google reviews. A well-tuned sequence drives 18-28% reply rates on cold owner outreach — versus 3-6% for generic banker letters.

We went from 240 cold conversations a year to 1,900. Close rate per conversation dropped from 4% to 2.3%, but absolute closes went from 9 to 44. The model isn't smarter than our best associates — it just lets them spend their time on the top of the funnel that's actually worth working.
Head of Corporate Development, Lower Middle-Market PE Platform ($180M EBITDA)

Once an owner engages, the workflow shifts to NDA, financial submission, and LOI. AI-enabled QoE — covered in Article 3 — becomes essential at add-on scale because traditional QoE engagements at $80K-$150K per target destroy the economics of $3-8M EBITDA tuck-ins. Firms running high-velocity roll-ups have internal QoE-automation platforms that produce a defensible 60-page QoE on a sub-$5M EBITDA target in 5-7 business days for under $25K of marginal cost.

Implementation Path: 90 Days to Production

Standard implementation sequence for portfolio company AI sourcing capability
1. Days 1-30: Universe construction

License core data (Grata + SourceScrub or equivalent), define the vertical schema, build an initial universe of 5K-50K names, and deduplicate against the existing CRM and known-owned competitors. Output: a governed master list with confidence scores on each record.

2. Days 31-60: Signal extraction and model training

Fine-tune NLP extractors on vertical-specific website and review data. Train the fit-scoring model on the historical pipeline (this needs at least 80-150 historical evaluations with outcome labels; firms without this data start with rule-based scoring and migrate to ML at month 9-12).

3. Days 61-90: Workflow integration and pilot

Wire scores into Affinity/4Degrees, build outreach sequences, and run a first cohort of 200-400 targets through the full funnel. Measure response rate and IOI conversion, and feed those signals back into the model retraining loop.

4. Months 4-12: Continuous improvement

Retrain the model monthly on closed-loop data, expand to adjacent verticals or geographies, integrate with portfolio company corp dev teams (not just sponsor corp dev), and develop proprietary signals unique to the platform's thesis.
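The deduplication step in the first phase can be sketched as follows. This is a minimal version that matches on a normalized company name plus ZIP code; the record layout and suffix list are assumptions, and a production pipeline would likely add fuzzy matching, address normalization, and domain matching on top.

```python
import re

# Legal-form suffixes stripped before comparison (illustrative, not exhaustive).
SUFFIXES = {"llc", "inc", "co", "corp", "ltd", "pllc", "pc"}


def normalize(name):
    """Lowercase, strip punctuation, and drop legal-form suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def dedupe(universe, crm_records, owned_names):
    """Remove records already in the CRM, already owned, or repeated in the universe."""
    seen = {(normalize(r["name"]), r["zip"]) for r in crm_records}
    owned = {normalize(n) for n in owned_names}
    kept = []
    for record in universe:
        key = (normalize(record["name"]), record["zip"])
        if key in seen or key[0] in owned:
            continue
        seen.add(key)
        kept.append(record)
    return kept


# Hypothetical records for illustration.
universe = [
    {"name": "Smith HVAC, LLC", "zip": "30301"},
    {"name": "smith hvac", "zip": "30301"},          # duplicate within the universe
    {"name": "Acme Air Inc.", "zip": "30305"},       # already in the CRM
    {"name": "Peach State Cooling", "zip": "30309"}, # already owned by a known competitor
]
crm_records = [{"name": "ACME AIR", "zip": "30305"}]
owned_names = ["Peach State Cooling LLC"]

master_list = dedupe(universe, crm_records, owned_names)
print(master_list)
```

Matching owned names without the ZIP is a deliberate choice here: a company already inside a competing platform should be excluded wherever its branches appear.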

Build vs. Buy vs. Hybrid

Three vendor archetypes compete for this budget. Pure-data players (PitchBook, SourceScrub, Grata, Inven) provide the universe and basic enrichment; pricing runs $60K-$250K per year per platform. Workflow players (Affinity, 4Degrees, DealCloud) own the CRM layer at $40K-$120K per year. Full-stack roll-up specialists — including emerging vendors like Cyndx, Stax AI, and several PE-focused boutiques — offer integrated sourcing-to-CRM stacks at $150K-$500K. The right architecture depends on platform count: sponsors with 2-3 active roll-ups typically buy point solutions and stitch; sponsors with 6+ active platforms (Audax, Trivest, Shore Capital, Alpine, Main Street Capital) build internal sourcing platforms because the marginal cost of an additional vertical drops to near zero once core infrastructure exists.

The defensibility of the roll-up thesis increasingly lives in the proprietary scoring model, not in capital availability. Cheap capital is everywhere; superior target identification is not.

Observed pattern across 40+ active middle-market consolidation platforms

Governance and What Goes Wrong

Three failure modes recur across implementations. First, models drift when the platform's strategy shifts but the training labels don't update — a platform that pivots from suburban to urban acquisitions will keep getting scored toward old geographies for 12-18 months unless someone forces a retrain. Second, data hygiene erodes: as the same target gets contacted by the platform, the sponsor, and three competing platforms over 18 months, CRM records fork, duplicate, and stale-out unless the firm enforces a master-record discipline. Third, outreach quality collapses when LLM-generated emails become generic, hitting spam filters and damaging the platform's brand with sellers — a real cost in tight-knit verticals where every owner knows every other owner.
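The first failure mode is detectable before it costs 12-18 months. One option, sketched below, is to compare the mix of the training labels against the mix of deals the team is currently pursuing using a population stability index (PSI). The suburban/urban data is invented for illustration, and the 0.25 threshold is a commonly cited rule of thumb rather than a figure from the platforms discussed here.

```python
import math
from collections import Counter


def shares(items, categories, eps=1e-6):
    """Share of each category, floored at eps so the log term stays defined."""
    if not items:
        raise ValueError("cannot compute shares of an empty sample")
    counts = Counter(items)
    total = len(items)
    return {c: max(counts[c] / total, eps) for c in categories}


def psi(expected, actual):
    """Population stability index between two categorical samples."""
    categories = sorted(set(expected) | set(actual))
    e = shares(expected, categories)
    a = shares(actual, categories)
    return sum((a[c] - e[c]) * math.log(a[c] / e[c]) for c in categories)


# Hypothetical geography mix: what the model was trained on vs. what the
# deal team has pursued since the strategy pivot.
training_labels = ["suburban"] * 80 + ["urban"] * 20
current_pipeline = ["suburban"] * 30 + ["urban"] * 70

drift = psi(training_labels, current_pipeline)
RETRAIN_THRESHOLD = 0.25  # rule-of-thumb cutoff, an assumption

print(f"PSI: {drift:.3f}")
if drift > RETRAIN_THRESHOLD:
    print("Training labels no longer reflect the current strategy; schedule a retrain.")
```

Running a check like this monthly on geography, revenue band, and service mix turns a strategy pivot into an explicit retraining trigger instead of a silent scoring bias.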

🎯 The composability advantage
Sponsors running multiple roll-ups across verticals (industrial services, healthcare services, business services) get a 60-80% cost reduction on each new vertical after the first because the universe-construction, scoring-model, and workflow infrastructure are reusable. The marginal vertical takes 6-10 weeks to bring online instead of 6 months. This is why platform sponsors with horizontal sourcing capabilities (rather than per-vertical teams) are quietly winning the deal-velocity race.

The corp dev function inside portfolio companies is also changing. Platform CEOs increasingly demand that the sponsor provide sourcing infrastructure as a service — model output, target lists, and qualified leads — rather than just capital and board oversight. This is consistent with the broader portfolio operating model evolution where shared services move beyond finance and HR into commercial functions. The talent implications, including the rise of fractional analyst and corp dev roles, connect to the operating model questions covered in Article 12.

What to Expect on Returns

Sponsors with well-built AI sourcing capabilities report three measurable outcomes. Deal velocity rises 2-4x — typical platforms move from 3-4 add-ons per year to 8-12. Average entry multiple on add-ons drops 0.8-1.5x EBITDA because the platform reaches owners before competitive processes form. And the platform's exit valuation premium widens because acquirers pay for a demonstrated, repeatable sourcing engine, not just historical acquisitions. On a $250M EBITDA exit at 11x, the difference between a thesis backed by an industrialized sourcing engine versus an ad-hoc one is typically worth 0.5-1.5x of exit multiple — $125M-$375M of enterprise value. That is the math that justifies the $400K-$900K annual budget required to build the capability.
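The closing arithmetic is worth making explicit. The sketch below uses only the figures in the paragraph above plus the 4-5 year hold cited earlier; treating the budget as a flat annual cost over the hold is a simplifying assumption.

```python
def ev_uplift(exit_ebitda_m, multiple_premium):
    """Enterprise value added by an exit-multiple premium, in $M."""
    return exit_ebitda_m * multiple_premium


exit_ebitda_m = 250.0
premium_range = (0.5, 1.5)        # turns of exit multiple
annual_budget_m = (0.4, 0.9)      # $400K-$900K per year
hold_years = (4, 5)

uplift_low = ev_uplift(exit_ebitda_m, premium_range[0])
uplift_high = ev_uplift(exit_ebitda_m, premium_range[1])

# Most conservative comparison: smallest uplift against the largest cumulative spend.
max_spend_m = annual_budget_m[1] * hold_years[1]
worst_case_ratio = uplift_low / max_spend_m

print(f"EV uplift: ${uplift_low:.0f}M to ${uplift_high:.0f}M")
print(f"Maximum cumulative spend over the hold: ${max_spend_m:.1f}M")
print(f"Worst-case uplift per dollar of spend: {worst_case_ratio:.0f}x")
```

Even the most conservative pairing leaves the modeled uplift well above the cumulative cost, before counting the lower entry multiples and faster deal velocity.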

Add-on identification is no longer a banker-and-network exercise. It is a data engineering problem with a corporate development front end. The sponsors who treat it that way are doing 50-100 transactions per platform; the ones who don't are stalling at 4-6 and watching their exit multiples compress.

Frequently Asked Questions

How much historical pipeline data do I need before training a fit-scoring model?

Around 80-150 evaluated targets with outcome labels (passed, killed at IOI, killed at LOI, closed) is the practical minimum for a usable supervised model. Below that, start with rules-based scoring informed by deal team interviews and migrate to ML once 12-18 months of closed-loop labels accumulate.

Which data vendor should we license first if we're building from scratch?

For most North American services roll-ups, Grata or SourceScrub provide the best universe coverage on private SMBs and are typically the first license. PitchBook adds depth on companies with prior PE/VC backing. Inven is gaining share in European verticals. Plan on $80K-$180K combined for two-vendor coverage on a single vertical.

Can we use AI-generated outreach without damaging seller relationships?

Yes, but only with human-in-the-loop review on the first 2-3 sequences and ongoing reply monitoring. LLM-generated emails that pull 2-3 specific facts from the target's website and reviews achieve 18-28% reply rates; generic templates produce sub-5% replies and burn the platform's brand. Never send fully autonomous outreach in tight-knit verticals.

How does this affect the role of buy-side investment bankers?

Banker-led processes still close roughly 30-40% of add-ons but skew toward larger, more-competitive deals. AI-sourced proprietary deals tend to be smaller, faster, and 0.8-1.5x EBITDA cheaper. Most mature roll-ups now use bankers as a complement to internal sourcing — running ~60-70% proprietary and using bankers for specific gaps or larger targets.

Is this approach compliant with EU GDPR and California CPRA?

Universe construction and corporate signal extraction are generally compliant. The risk areas are owner-personal-data features (age, marital status, life events) used in scoring. EU implementations typically exclude these features entirely; U.S. implementations include them but must maintain audit logs and avoid using protected-class signals. Get privacy counsel involved before training, not after.