ESG Data Collection and Assurance

ESG data collection from portfolio companies has become an industry obligation. Making the data useful, not just collected, is a different problem, and a harder one.

8 min read · Updated April 22, 2026

Five years ago, ESG reporting in private markets was an LP request that GPs fulfilled reluctantly. Today, it is a regulatory obligation under SFDR in Europe and a mainstream LP requirement globally. Most GPs have built some kind of ESG data collection process. Very few have built one that produces genuinely useful data, and fewer still can defend their numbers under audit.

The gap between collected and useful is where the current work is. Collecting data from portfolio companies annually is straightforward. Making that data reliable, comparable across the portfolio, and defensible in assurance engagements is the part firms are still figuring out.

Collecting ESG data is a procurement problem. Making it useful is a data engineering problem. Most firms solve the first and stop.

What LPs and regulators actually want

Two distinct requirements are often conflated.

SFDR-aligned PAI reporting. Under SFDR, Article 8 and Article 9 funds report principal adverse impacts (PAIs): GHG emissions, water use, board diversity, fossil fuel exposure, and the other indicators in a defined set. Data is required annually and must follow specified methodologies. Non-compliance creates regulatory exposure and marketing constraints.

LP-specific ESG reporting. Institutional LPs, especially European public pensions and sovereigns, have their own reporting frameworks — often aligned with SFDR but sometimes with additional metrics or different methodologies. LPs increasingly push for consistent reporting across their GP relationships, which means GPs face multiple overlapping requirements.

The two requirements share substantial overlap but are not identical. GPs that treat them as one problem end up with gaps in one or both. The cleaner approach is to capture a superset of data and map to different reporting frames as needed.

Requirement	Driver	Primary consumer	Stakes
SFDR PAI reporting	EU regulation	Regulators, prospectus disclosure	Regulatory + marketing
ILPA ESG DDQ	Industry convergence	Prospective LPs	Fundraising
LP-specific reporting	LP policy	Existing LPs	Ongoing relationship
Fund-level ESG reporting	Fund strategy	LPs, marketing	Brand + differentiation
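
One way to picture the superset approach is to store each metric once and treat every reporting frame as a list of the metric keys it needs. The sketch below is illustrative only: the metric names, values, and the contents of each frame are hypothetical placeholders, not the actual SFDR or LP-specific metric lists.

```python
# Hypothetical superset of metrics captured once per portfolio company.
# Metric keys, values, and frame contents are illustrative assumptions only.
SUPERSET = {
    "ghg_scope1_tco2e": 1200.0,
    "ghg_scope2_tco2e": 800.0,
    "water_use_m3": 15000.0,
    "board_female_pct": 30.0,
    "employee_turnover_pct": None,  # requested but not yet submitted
}

# Each reporting frame is simply the list of metric keys it needs.
FRAMES = {
    "sfdr_pai": [
        "ghg_scope1_tco2e",
        "ghg_scope2_tco2e",
        "water_use_m3",
        "board_female_pct",
    ],
    "lp_custom": [
        "ghg_scope1_tco2e",
        "board_female_pct",
        "employee_turnover_pct",
    ],
}


def build_report(frame, data):
    """Return (values, gaps) for one reporting frame from the shared superset."""
    values, gaps = {}, []
    for key in FRAMES[frame]:
        value = data.get(key)
        if value is None:
            gaps.append(key)  # the frame needs it, but it was never collected
        else:
            values[key] = value
    return values, gaps


for name in FRAMES:
    values, gaps = build_report(name, SUPERSET)
    print(name, "reported:", len(values), "gaps:", gaps)
```

Because gaps are computed per frame, a missing metric shows up against the specific requirement it affects rather than as a generic hole in the dataset.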

The collection problem

Collecting data from portfolio companies has specific failure modes that are predictable across GPs.

Portfolio company capability varies enormously. A $500M revenue company with a sustainability team produces accurate Scope 1, 2, and 3 GHG data on request. A $30M revenue company with no sustainability function produces estimated numbers that cannot withstand review. GPs cannot impose uniform expectations; they have to scale the ask to company capability.
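
Scaling the ask can be made explicit as a tiering rule. The following is a minimal sketch, assuming revenue is the only tiering input; the thresholds, tier names, and metric lists are hypothetical and would need to reflect each GP's own portfolio.

```python
# Hypothetical tiers: revenue floors and required metrics are illustrative
# assumptions, not an industry standard.
TIERS = [
    # (minimum revenue in USD millions, tier name, required metrics)
    (250, "full", ["scope1", "scope2", "scope3"]),
    (50, "core", ["scope1", "scope2"]),
    (0, "supported", ["scope1", "scope2_estimated"]),
]


def required_metrics(revenue_musd):
    """Return (tier name, metrics) for the first tier whose floor is met."""
    for floor, name, metrics in TIERS:
        if revenue_musd >= floor:
            return name, metrics
    raise ValueError("revenue must be non-negative")


print(required_metrics(500))
print(required_metrics(30))
```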

Methodology inconsistency. Scope 3 emissions can be calculated through multiple legitimate methodologies producing different numbers. Without specifying methodology, different portfolio companies submit incomparable data.

Data timeliness. Annual data is typically collected 6–9 months after year end. By the time it is aggregated and reported, it is 12–15 months old. This is acceptable for regulatory reporting and problematic for any kind of management or trend analysis.

Verification gaps. Most collected data is self-reported by portfolio companies without independent verification. Under assurance engagements, auditors ask about verification and often find it missing.

The GHG data pattern. Scope 1 emissions (direct) are reasonably well-reported by most portfolio companies. Scope 2 (purchased electricity) is usually available. Scope 3 (value chain) is the biggest methodological challenge and the largest source of variance across the portfolio. GPs that require Scope 3 without specifying methodology get inconsistent data that cannot support assurance.
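
The comparability problem can be made visible before aggregation by grouping Scope 3 submissions by their stated methodology. This is a sketch under assumptions: the company names, methodology labels, and figures are invented for illustration.

```python
from collections import defaultdict

# Hypothetical submissions; companies, methodology labels, and values are
# made up for illustration.
submissions = [
    {"company": "A", "scope3_tco2e": 5000.0, "method": "spend_based"},
    {"company": "B", "scope3_tco2e": 3200.0, "method": "activity_based"},
    {"company": "C", "scope3_tco2e": 4100.0, "method": "spend_based"},
    {"company": "D", "scope3_tco2e": 900.0, "method": None},
]


def group_by_method(rows):
    """Sum Scope 3 per methodology; unlabelled rows go to 'unspecified'."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["method"] or "unspecified"] += row["scope3_tco2e"]
    return dict(totals)


totals = group_by_method(submissions)
# A single portfolio-wide total is only meaningful if one methodology was used.
comparable = len(totals) == 1
print(totals, "comparable:", comparable)
```

Reporting the subtotals per methodology, rather than one blended figure, keeps the inconsistency in view instead of hiding it inside a sum.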

Making data useful

Three data engineering practices separate useful ESG data from collected ESG data.

Structured collection with validation. Data is submitted through structured templates with validation at submission. Out-of-range values, missing required fields, or methodology inconsistencies are caught at collection rather than discovered at reporting. This alone improves data quality materially.
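
Validation at submission can be as simple as a rule table checked before a template is accepted. The sketch below assumes hypothetical field names and plausibility ranges; a real template would carry many more fields and methodology checks.

```python
# Hypothetical template rules; field names and plausible ranges are
# illustrative assumptions.
RULES = {
    "scope1_tco2e": {"required": True, "min": 0.0, "max": 1_000_000.0},
    "scope2_tco2e": {"required": True, "min": 0.0, "max": 1_000_000.0},
    "board_female_pct": {"required": False, "min": 0.0, "max": 100.0},
}


def validate(submission):
    """Return a list of readable errors; an empty list means accepted."""
    errors = []
    for field, rule in RULES.items():
        value = submission.get(field)
        if value is None:
            if rule["required"]:
                errors.append(f"{field}: missing required field")
            continue
        if not rule["min"] <= value <= rule["max"]:
            errors.append(
                f"{field}: {value} outside {rule['min']}..{rule['max']}"
            )
    return errors


print(validate({"scope1_tco2e": 10.0, "scope2_tco2e": 5.0}))
print(validate({"scope1_tco2e": -1.0, "board_female_pct": 130.0}))
```

Returning every error at once, instead of stopping at the first, lets a portfolio company fix a submission in one pass.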

Methodology fingerprinting. Every data point is tagged with methodology — which emission factors, which boundary definitions, which reporting standard. Aggregated data preserves this fingerprinting so users understand what the data represents.
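
A minimal sketch of fingerprinting, assuming each data point carries an immutable methodology record and that aggregation keeps the set of distinct records behind a total. The class and field names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Methodology:
    # All three fields are hypothetical labels for illustration.
    standard: str          # reporting standard followed
    emission_factors: str  # which factor set and vintage
    boundary: str          # which boundary definition was applied


@dataclass(frozen=True)
class DataPoint:
    company: str
    metric: str
    value: float
    methodology: Methodology


def aggregate(points):
    """Sum values and keep the distinct methodologies behind the total."""
    fingerprints = {p.methodology for p in points}
    return {
        "total": sum(p.value for p in points),
        "methodologies": fingerprints,
        "mixed": len(fingerprints) > 1,
    }


m1 = Methodology("standard_x", "factors_2024", "operational_control")
m2 = Methodology("standard_x", "factors_2022", "operational_control")
result = aggregate([
    DataPoint("A", "scope2_tco2e", 800.0, m1),
    DataPoint("B", "scope2_tco2e", 650.0, m2),
])
print(result["total"], "mixed:", result["mixed"])
```

Frozen dataclasses are hashable, which is what allows the methodologies to be collected in a set and compared for equality.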

Time series integrity. ESG data is valuable primarily in trend form — is the portfolio company reducing emissions, improving diversity, managing water risk better. This requires consistent methodology over time or clear notation when methodology changes. Firms that restate historical data silently lose the trend utility.
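
The "clear notation when methodology changes" idea can be sketched as a year-over-year calculation that flags, rather than hides, any change that spans a methodology switch. The record fields and version labels below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    year: int
    value: float
    method_version: str  # hypothetical label for the methodology in force


def year_over_year(series):
    """Return (year, change, comparable) for each pair of consecutive years.

    A change is marked not comparable when the methodology differs between
    the two years, instead of being silently reported as a trend.
    """
    ordered = sorted(series, key=lambda o: o.year)
    changes = []
    for prev, curr in zip(ordered, ordered[1:]):
        comparable = prev.method_version == curr.method_version
        changes.append((curr.year, curr.value - prev.value, comparable))
    return changes


series = [
    Observation(2022, 100.0, "v1"),
    Observation(2023, 90.0, "v1"),
    Observation(2024, 120.0, "v2"),
]
for year, change, comparable in year_over_year(series):
    print(year, change, "comparable" if comparable else "methodology changed")
```

In this made-up series the 2024 increase is flagged as a methodology change, so a reader does not mistake it for a real rise in emissions.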

Maturity progression for ESG data

Stage 1: Annual collection, spreadsheet-based, self-reported
Stage 2: Structured templates with validation, portfolio-level aggregation
Stage 3: Methodology fingerprinting, time-series integrity, audit trail
Stage 4: Selective independent verification, aligned with assurance standards
Stage 5: Continuous data flows from portfolio company systems where feasible

Where assurance is heading

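ESG assurance is moving from voluntary to required faster than most GPs assumed. In the EU, CSRD requires limited assurance on sustainability reporting starting with 2024 fiscal year reporting for the first wave of filers, moving toward reasonable assurance over time. In the US, the SEC's climate disclosure rule, adopted in 2024 in a less sweeping form than originally proposed, included phased-in assurance over Scope 1 and 2 emissions for larger filers, although the rule has been stayed pending litigation and its future is uncertain.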

Assurance requires a different data quality posture than reporting. Assurance engagements examine methodology, verification evidence, and consistency. Data that was adequate for disclosure purposes often fails in assurance. GPs preparing for this are investing now in the data engineering that supports assurance rather than waiting to be forced.

Assurance readiness checklist

Methodology documented and consistently applied across portfolio
Data collection templates with validation and audit trail
Source data retention (not just aggregated outputs)
Clear boundary definitions (which entities are in scope)
Restatement policy and historical data governance
Independent verification for material metrics
Control framework documented for external review

For alternatives firms building ESG data infrastructure, the alternative investments capability model maps ESG against adjacent capabilities like portfolio monitoring, investor reporting, and fund accounting — useful for scoping investment in data engineering that supports multiple reporting frames rather than just immediate requirements.

Frequently Asked Questions

Does ESG data need to be assured to be reported?

Currently, assurance is required for CSRD filers in the EU and is voluntary elsewhere. The direction of travel is toward required assurance in most major markets within 2–3 years. GPs reporting ESG data that cannot be assured today face a rapidly closing window before that data becomes non-compliant.

Can smaller portfolio companies realistically meet GP ESG reporting expectations?

Not without support. A company with $30M in revenue cannot realistically calculate Scope 3 emissions without external help. GPs with realistic portfolio company programs either provide support directly (templates, methodology guidance, sometimes funding for an external consultant) or explicitly tier their expectations by company size.

How do sovereign funds and large pensions judge GP ESG data quality?

Increasingly, through dedicated technical review. Sophisticated LPs evaluate methodology documentation, coverage, and consistency in addition to headline numbers. GPs that present clean aggregated numbers but cannot defend methodology under technical questions fare worse than GPs with messier numbers and better governance.