P&C Insurance — Article 5 of 12

Fraud Detection in Claims — Social Network Analysis and Anomaly Detection

P&C insurers lose an estimated $45B annually to claims fraud in the US alone, yet most still rely on rules engines built in the 2000s. This article maps how graph-based social network analysis, unsupervised anomaly detection, and real-time scoring at FNOL are pushing referral precision from 15% to 60%+ — and what it takes to operationalize them.


The Coalition Against Insurance Fraud's 2022 study pegged total US insurance fraud at $308.6B annually, with P&C lines absorbing roughly $45B of that — about 10% of incurred losses across personal auto, commercial auto, workers' compensation, and property. The math is brutal: for a carrier writing $5B in P&C premium, every percentage point of fraud leakage recovered drops $30-50M to underwriting income. Yet the dominant fraud detection stack at most regional and even some top-25 carriers remains a rules engine built between 2005 and 2012, augmented by SIU referrals that hit between 12% and 18% precision. The fraudsters have moved on. The carriers have not.

What has changed in the last 36 months is the maturation of two techniques that finally let insurers see fraud the way it actually operates — as a network problem and a distribution problem, not a rules problem. Social network analysis (SNA) built on graph databases now resolves identities and relationships across hundreds of millions of claims, parties, addresses, phones, vehicles, medical providers, and bank accounts in sub-second query times. Unsupervised anomaly detection — autoencoders, isolation forests, density-based models — flags claims that don't match any historical pattern, including the ones your rules were never written to catch. Together, in production deployments I've led at three top-20 US carriers and two European composites, these techniques have lifted SIU referral precision from the mid-teens to 55-65% while doubling or tripling the dollar value of identified fraud per FTE.

$45B: Estimated annual P&C fraud leakage in the US, roughly 10% of incurred losses across personal auto, commercial auto, workers' comp, and property (Coalition Against Insurance Fraud, 2022)

The Fraud Taxonomy Carriers Actually Face

Before discussing detection technique, it's worth being precise about what is being detected. P&C fraud splits along two axes: severity (hard vs. soft) and organization (opportunistic vs. organized). Hard fraud is fully fabricated — staged collisions, arson-for-profit, ghost claimants, phantom inventory in commercial property losses. Soft fraud is exaggeration of legitimate claims: inflated contents lists in homeowners losses, padded medical treatment in bodily injury, prolonged disability in workers' comp. Industry estimates from NICB and ISO put soft fraud at 3-4x the dollar volume of hard fraud, but hard fraud — particularly organized rings — generates the catastrophic outliers.

Organized rings are where social network analysis earns its keep. A typical staged-accident ring in the New York, New Jersey, Florida, or California no-fault markets involves 4-8 recruiters, 20-40 paid passengers (often repeating across incidents), 3-5 cooperating clinics, 2-3 attorneys, and a small set of body shops. The ring will run 50-200 collisions over 12-18 months before law enforcement catches up — if it ever does. No single claim looks fraudulent in isolation. The signal lives entirely in the relationships: the same passenger appearing in three claims across three carriers, the same clinic billing the same CPT code patterns, the same attorney's office address shared with a chiropractor's mailing address.

Fraud types and which detection technique fits

| Fraud Pattern | Typical Dollar Range | Detection Technique That Works | Where Rules Fail |
| --- | --- | --- | --- |
| Staged auto collision ring | $200K-$5M per ring | Graph SNA + community detection | No single claim trips a rule |
| Inflated contents (homeowners) | $3K-$40K per claim | Anomaly detection on item velocity/price | Limits and averages miss outliers |
| Workers' comp prolonged disability | $50K-$500K per claim | Sequence models on treatment patterns | Each visit is individually plausible |
| Arson-for-profit | $100K-$10M per claim | Financial stress signals + network | Requires external data fusion |
| Premium fraud (commercial) | $10K-$2M per policy | Anomaly detection on payroll/exposure | Rules can't model industry norms |
| Ghost broker / fake policies | $500-$10K per policy | Identity graph + device fingerprint | Rules miss synthetic identities |

Social Network Analysis: From ER Diagrams to Property Graphs

The first technical decision is data model. Relational schemas optimized for policy administration and claims handling — the kind sitting in Guidewire ClaimCenter, Duck Creek Claims, Sapiens IDIT, or in-house mainframe systems — represent entities and relationships as foreign keys across dozens of tables. Asking a relational database to find all claims within three degrees of a known fraudulent provider requires recursive CTEs that bring even well-tuned Oracle Exadata to its knees beyond depth two. Graph databases — Neo4j, TigerGraph, Amazon Neptune, and increasingly Memgraph — model the same data as nodes (claim, person, address, vehicle, phone, bank account, provider, attorney) and edges (filed_by, treated_at, lives_at, drives, paid_to). A six-hop traversal that times out in SQL returns in 80-300 milliseconds on a properly indexed property graph.
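To make the "within N degrees of a known fraudulent provider" query concrete, here is a minimal sketch of a bounded breadth-first traversal over an in-memory adjacency map. It illustrates the shape of the query only; it is not how Neo4j, TigerGraph, or Neptune execute traversals, and the node ids, edges, and function names are hypothetical.

```python
from collections import deque

# Hypothetical toy graph: node ids are "type:key" strings, and each undirected
# edge stands in for a relationship such as filed_by, lives_at, or treated_at.
EDGES = [
    ("claim:C1", "person:P1"),
    ("person:P1", "address:A1"),
    ("address:A1", "person:P2"),
    ("person:P2", "claim:C2"),
    ("claim:C2", "provider:DR9"),  # DR9 plays the known-fraudulent provider
    ("claim:C3", "person:P3"),     # unrelated claim, not connected to DR9
]

def build_adjacency(edges):
    """Turn an edge list into {node: set(neighbors)}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def within_hops(adj, start, max_hops):
    """Return {node: hop_distance} for every node reachable within max_hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop budget
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

adj = build_adjacency(EDGES)
reach = within_hops(adj, "provider:DR9", max_hops=5)
linked_claims = sorted(n for n in reach if n.startswith("claim:"))
print(linked_claims)
```

In this toy data, claim C1 only connects to the provider through a shared address, which is the kind of indirect link that a depth-limited relational query tends to miss.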

The harder problem is entity resolution. A single physical person may appear in your claims data as Robert Johnson, Bob Johnson, R. Johnson, and Roberto Jonson with four address variants and three SSN typos. Without resolving these to one node, the graph is useless — the relationships you need to see disappear into noise. Vendors here include Senzing, Quantexa, Tamr, and increasingly cloud-native services like AWS Entity Resolution. The hard cases involve probabilistic matching across phone, email, device ID, and geolocation, with thresholds tuned per data source. In one carrier deployment, moving from deterministic SSN+DOB matching to probabilistic ER lifted entity merge rates from 71% to 94% and surfaced three previously invisible rings within 60 days.
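The difference between deterministic and evidence-weighted matching can be sketched in a few lines. The example below is a simplified illustration, not the method used by Senzing, Quantexa, Tamr, or AWS Entity Resolution: the records, the point weights (40/30/30), and the merge threshold of 70 are all hypothetical, and real systems tune such thresholds per data source.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records for illustration only.
RECORDS = [
    {"name": "Robert Johnson", "dob": "1980-03-14", "phone": "555-0101"},
    {"name": "Bob Johnson",    "dob": "1980-03-14", "phone": "555-0101"},
    {"name": "R. Johnson",     "dob": "1980-03-14", "phone": "555-0199"},
    {"name": "Roberto Jonson", "dob": "1980-03-14", "phone": "555-0199"},
    {"name": "Maria Smith",    "dob": "1980-03-14", "phone": "555-0142"},
]

def surname(name):
    return name.lower().split()[-1]

def match_points(a, b):
    """Add up evidence points for a pair of records (weights are assumptions)."""
    points = 0
    similarity = SequenceMatcher(None, surname(a["name"]), surname(b["name"])).ratio()
    if similarity >= 0.85:      # tolerates typos such as Jonson/Johnson
        points += 40
    if a["dob"] == b["dob"]:
        points += 30
    if a["phone"] == b["phone"]:
        points += 30
    return points

def resolve(records, threshold=70):
    """Merge records whose pairwise evidence meets the threshold (union-find)."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if match_points(records[i], records[j]) >= threshold:
            parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

clusters = resolve(RECORDS)
print(clusters)
```

The four Johnson variants collapse into one entity even though no single field agrees across all of them, while a shared date of birth alone is not enough to pull in the unrelated record.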

🔍 The data sources that actually move the needle
Internal claims and policy data alone produces a graph that is too sparse to find rings. The carriers seeing 55%+ SIU precision are fusing: ISO ClaimSearch (cross-carrier claims history, ~1.5B records), NICB ForeWARN, LexisNexis C.L.U.E., state Medicaid/workers' comp board data where permitted, vehicle title and salvage records, public business filings, and increasingly device/IP intelligence from vendors like Sift, Socure, or Neustar. The marginal lift from each additional high-quality data source is typically 8-15% in fraud capture rate until you hit diminishing returns around 6-8 sources.

Once the graph is built and resolved, community detection algorithms — Louvain, Leiden, Label Propagation — identify clusters of densely connected entities. Centrality measures (PageRank, betweenness) surface the brokers and recruiters who sit at the middle of multiple suspicious clusters. In one Florida PIP deployment I supervised in 2024, Louvain modularity over a 14M-node graph isolated 312 candidate communities. Of those, 47 had network features (shared addresses, repeated co-claimant pairs, attorney concentration) that exceeded the trained classifier's threshold. SIU investigated 47, confirmed fraud or material misrepresentation on 31, and recovered $18.4M against a fully-loaded analytics program cost of $4.2M in year one.
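As a rough illustration of what community detection does, the sketch below implements a small, deterministic variant of label propagation in plain Python. It is a teaching sketch under stated assumptions: production systems would use a graph library or database implementation of Louvain, Leiden, or label propagation, standard label propagation breaks ties randomly rather than by largest label, and the node names and the two toy rings are hypothetical.

```python
from collections import Counter
from itertools import combinations

def clique(nodes):
    """All pairwise edges among a group, e.g. co-claimants on shared claims."""
    return list(combinations(nodes, 2))

# Two hypothetical rings joined by a single incidental link (a4 - b1).
EDGES = (
    clique(["a1", "a2", "a3", "a4"])
    + clique(["b1", "b2", "b3", "b4"])
    + [("a4", "b1")]
)

def build_adjacency(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def label_propagation(adj, max_iters=20):
    """Each node repeatedly adopts the most common label among its neighbors.

    Ties are broken by taking the largest label so the run is reproducible;
    this is a simplification of the usual random tie-break.
    """
    labels = {node: node for node in adj}
    for _ in range(max_iters):
        changed = False
        for node in sorted(adj):
            counts = Counter(labels[nbr] for nbr in adj[node])
            top = max(counts.values())
            best = max(lbl for lbl, c in counts.items() if c == top)
            if labels[node] != best:
                labels[node] = best
                changed = True
        if not changed:
            break
    groups = {}
    for node, lbl in labels.items():
        groups.setdefault(lbl, []).append(node)
    return sorted(sorted(members) for members in groups.values())

communities = label_propagation(build_adjacency(EDGES))
print(communities)
```

The single bridging edge is outvoted by the dense links inside each group, so the two rings come out as separate candidate communities that can then be scored on features such as shared addresses or attorney concentration.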

Anomaly Detection: Catching What You've Never Seen Before

Supervised fraud models — gradient boosted trees on labeled fraud/no-fraud outcomes — are still the workhorse for individual claim scoring. XGBoost and LightGBM models trained on 200-400 engineered features routinely achieve AUC of 0.86-0.92 on personal auto BI claims at carriers with clean SIU label data. The problem is that supervised models can only learn what they've been shown. Novel fraud patterns — and organized rings are constantly evolving them — produce claims that look normal to the supervised model because nothing like them has been labeled fraudulent yet.

This is where unsupervised anomaly detection earns its place in the stack. Three techniques dominate production deployments. Isolation forests, computationally cheap and well-suited to tabular claims data, score observations by how few random splits it takes to separate them from the bulk distribution; fraudulent claims tend to be unusual along multiple dimensions simultaneously, so they isolate quickly. Autoencoders, trained to reconstruct normal claims, produce reconstruction error spikes on claims that don't match learned distributions; this works well when claim records include free-text adjuster notes, where transformer-based autoencoders catch linguistic anomalies that tabular models miss. Density-based methods such as Local Outlier Factor (LOF) and DBSCAN flag points that sit in sparser regions than their nearest neighbors, which is useful for catching claims that look normal globally but anomalous relative to their specific cohort.
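The isolation forest idea is easy to see in a from-scratch sketch. The version below is deliberately simplified and uses only the standard library; a production deployment would more likely use a maintained implementation such as scikit-learn's `IsolationForest`. The two features (claim amount, days from loss to report), the synthetic data, and the planted outlier are hypothetical.

```python
import math
import random

def c_factor(n):
    """Expected path length of an unsuccessful binary search tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def build_tree(rows, depth, max_depth, rng):
    """Recursively split on a random feature at a random value."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    return {
        "feature": feature,
        "split": split,
        "left": build_tree([r for r in rows if r[feature] < split], depth + 1, max_depth, rng),
        "right": build_tree([r for r in rows if r[feature] >= split], depth + 1, max_depth, rng),
    }

def path_length(tree, row, depth=0):
    """Number of splits needed to reach a leaf, plus a correction for leaf size."""
    if "size" in tree:
        return depth + c_factor(tree["size"])
    branch = "left" if row[tree["feature"]] < tree["split"] else "right"
    return path_length(tree[branch], row, depth + 1)

def anomaly_scores(rows, n_trees=100, sample_size=256, seed=0):
    """Scores in (0, 1); closer to 1 means easier to isolate. Needs >= 2 rows."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(rows, sample_size), 0, max_depth, rng) for _ in range(n_trees)]
    norm = c_factor(sample_size)
    return [2 ** (-sum(path_length(t, r) for t in trees) / (n_trees * norm)) for r in rows]

# Hypothetical data: (claim amount, days from loss to report).
data_rng = random.Random(7)
claims = [(data_rng.uniform(2_000, 8_000), data_rng.uniform(0, 10)) for _ in range(64)]
claims.append((40_000.0, 45.0))  # planted outlier, unusual on both features

scores = anomaly_scores(claims)
most_anomalous = max(range(len(claims)), key=scores.__getitem__)
print(most_anomalous, round(scores[most_anomalous], 2))
```

Because the planted claim is extreme on both features, almost any random split separates it at the root, which is exactly the "fewer splits" property described above.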

Composite Fraud Score (Production Pattern)
FS = 0.45·P_sup + 0.25·A_unsup + 0.20·N_graph + 0.10·R_rules
Where P_sup is the supervised classifier probability, A_unsup is the normalized anomaly score, N_graph is the network risk score from SNA, and R_rules is the legacy rules contribution. Weights are tuned per line of business; auto BI tends to weight graph higher (0.30-0.35), homeowners weights anomaly higher (0.30-0.35).
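A minimal sketch of the composite formula follows, assuming all four inputs are already normalized to the 0-1 range. The default weights come from the formula above; the auto BI weight set is a hypothetical example that moves graph into the stated 0.30-0.35 band, since the text does not say which other weights give way.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Weights:
    supervised: float = 0.45
    anomaly: float = 0.25
    graph: float = 0.20
    rules: float = 0.10

# Hypothetical line-of-business override: graph raised to 0.30 for auto BI.
AUTO_BI = Weights(supervised=0.40, anomaly=0.20, graph=0.30, rules=0.10)

def composite_score(p_sup, a_unsup, n_graph, r_rules, w=Weights()):
    """FS = w_sup*P_sup + w_anom*A_unsup + w_graph*N_graph + w_rules*R_rules."""
    parts = (p_sup, a_unsup, n_graph, r_rules)
    if any(not 0.0 <= x <= 1.0 for x in parts):
        raise ValueError("component scores must be normalized to [0, 1]")
    if abs(w.supervised + w.anomaly + w.graph + w.rules - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return (
        w.supervised * p_sup
        + w.anomaly * a_unsup
        + w.graph * n_graph
        + w.rules * r_rules
    )

# A claim that looks ordinary to the supervised model but sits in a risky network.
default_fs = composite_score(0.30, 0.20, 0.90, 0.0)
auto_bi_fs = composite_score(0.30, 0.20, 0.90, 0.0, w=AUTO_BI)
print(round(default_fs, 3), round(auto_bi_fs, 3))
```

The same claim scores higher under the graph-heavy weighting, which is the intended effect of tuning weights per line of business.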

The composite scoring approach matters because each component catches different things. In a 2025 deployment for a top-15 personal lines carrier, decomposing the year-end SIU win list showed 38% of confirmed fraud caught primarily by the supervised model, 24% primarily by the anomaly score, 27% primarily by the graph score, and 11% by the rules. Removing any one of the three model components would have left 20-30% of confirmed fraud undetected at the FNOL scoring stage. The same carrier found that its pre-existing rules engine, run in isolation, would have caught only 41% of what the full stack found, and would have generated 4.3x the false positives doing it.

Real-Time Scoring at FNOL and Throughout the Claim Lifecycle

The economics of fraud intervention are time-sensitive. A staged-accident claim flagged within 4 hours of FNOL, before the SIU desk loses control of the narrative and before the rental car, tow, and initial medical bills accrue, can be steered to an SIU adjuster for investigation rather than a fast-track adjuster for settlement. The same claim flagged at day 30 has typically accrued $4K-$12K in non-recoverable expenses and an attorney has been retained, dropping recovery probability by 60-70%. This is why the scoring architecture matters as much as the model accuracy.

Where fraud scoring fires across the claim lifecycle

1. FNOL (0-15 min): Initial composite score on basic loss facts, party data, device/IP signals. Routes claim to fast-track, standard, or SIU pre-screen queue.
2. Coverage confirmation (1-4 hr): Re-score after policy verification and prior loss history pull. Adds ISO ClaimSearch and CLUE hits.
3. First contact (4-48 hr): NLP scoring of recorded statements and adjuster notes. Autoencoder flags linguistic anomalies.
4. Reserve setting (3-7 days): Graph re-traversal as new parties (medical providers, attorneys, body shops) join the claim record.
5. Pre-settlement (varies): Final composite score and SIU review trigger if total incurred exceeds anomaly threshold for cohort.
6. Post-settlement audit (30-90 days): Retrospective network re-scoring as new claims enter the graph; identifies ring activity not visible at time of payment.

The integration pattern with modernized claims platforms (covered in depth in the companion articles "Claims Automation — First Notice of Loss (FNOL) to Settlement" and "Policy Administration System Modernization") is typically a synchronous REST call from the claims platform's FNOL workflow to a fraud scoring service, with a 600-1200ms SLA. The scoring service orchestrates calls to the supervised model (typically hosted on SageMaker, Vertex AI, or Azure ML), the graph database (Neo4j or Neptune), and the anomaly model, then returns a composite score plus the top contributing factors for adjuster review. Carriers running this pattern at scale process 15-40K FNOLs per day with p99 latencies under 1.5 seconds.
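The orchestration step can be sketched with `asyncio`: fan out to each backend concurrently, give every call a latency budget, and score with whatever came back in time. This is a sketch under assumptions, not a reference design. The backend functions are stubs with made-up return values, and renormalizing the weights over the components that responded is one possible degradation policy that the text does not prescribe.

```python
import asyncio

# Weights from the composite formula; backends below are hypothetical stubs.
WEIGHTS = {"supervised": 0.45, "anomaly": 0.25, "graph": 0.20, "rules": 0.10}

async def supervised_model(claim):
    await asyncio.sleep(0.001)
    return 0.60

async def anomaly_model(claim):
    await asyncio.sleep(0.001)
    return 0.40

async def graph_risk(claim):
    await asyncio.sleep(1.0)  # simulates a slow traversal that blows the budget
    return 0.90

async def legacy_rules(claim):
    await asyncio.sleep(0.001)
    return 0.0

BACKENDS = {
    "supervised": supervised_model,
    "anomaly": anomaly_model,
    "graph": graph_risk,
    "rules": legacy_rules,
}

async def call_with_budget(fn, claim, budget_s):
    """Return the backend's score, or None if it misses the latency budget."""
    try:
        return await asyncio.wait_for(fn(claim), timeout=budget_s)
    except asyncio.TimeoutError:
        return None

async def score_fnol(claim, budget_s=1.2):
    names = list(BACKENDS)
    values = await asyncio.gather(
        *(call_with_budget(BACKENDS[n], claim, budget_s) for n in names)
    )
    components = dict(zip(names, values))
    available = {n: v for n, v in components.items() if v is not None}
    weight_sum = sum(WEIGHTS[n] for n in available)
    # Assumption: renormalize over the components that answered in time.
    score = (
        sum(WEIGHTS[n] * v for n, v in available.items()) / weight_sum
        if weight_sum else None
    )
    factors = sorted(available, key=lambda n: WEIGHTS[n] * available[n], reverse=True)
    return {
        "score": score,
        "top_factors": factors[:3],
        "timed_out": [n for n in names if components[n] is None],
    }

# A tight budget is used here so the slow graph stub visibly times out.
result = asyncio.run(score_fnol({"claim_id": "demo"}, budget_s=0.1))
print(result)
```

Returning the list of timed-out components alongside the score lets the claims platform decide whether to route on a partial score or queue the claim for a re-score at the next lifecycle stage.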

The Vendor Landscape

Three categories of vendors dominate. Pure-play insurance fraud analytics — Shift Technology (used by AXA, Generali, Mapfre, and 100+ others), FRISS (heavy in European and Latin American markets), and BAE Systems NetReveal — ship pre-trained models, fraud rule libraries, and case management. They get carriers to production in 6-9 months but require significant tuning to local fraud patterns. Verisk's fraud suite, anchored on the ISO ClaimSearch network, has the broadest cross-carrier data advantage in North America but a less flexible modeling layer. SAS Fraud and Security Intelligence remains entrenched at large carriers with existing SAS investments.

The second category is general-purpose decision intelligence platforms — Quantexa, Palantir Foundry, and increasingly Databricks-native fraud accelerators — which provide entity resolution, graph, and ML tooling but require the carrier to build the fraud-specific models and workflows. Quantexa in particular has grown sharply in insurance after dominating in banking AML, with deployments at Standard Insurance, Admiral, and others. Build cost is higher (typically 12-24 months to production) but the resulting system is more adaptable and avoids vendor lock-in on the model layer.

The third category is the build-on-cloud-primitives approach: Neo4j or Amazon Neptune for the graph, Senzing or AWS Entity Resolution for ER, SageMaker or Vertex for the models, and a custom orchestration layer. This is what I see at the most analytically mature carriers (Progressive, USAA, Allstate, and Liberty Mutual all run variants of this pattern), and it produces the best long-term economics if the carrier has the data engineering bench to sustain it. It is the wrong answer for a carrier with 8 data scientists and no MLOps function.

[Chart: Typical SIU referral precision by detection approach (production deployments, 2024-2025)]

Governance, Model Risk, and Regulatory Reality

Fraud models live in an uncomfortable regulatory space. They are not directly rating or underwriting decisions, but they materially affect claim handling, settlement timing, and ultimately payment to insureds. The NAIC Model Bulletin on the Use of Artificial Intelligence Systems by Insurers, adopted in December 2023 and now implemented in 20+ states including Connecticut, New York, Illinois, and Colorado, requires insurers to maintain a written AIS program covering governance, model risk management, third-party model oversight, and consumer-facing transparency. The New York DFS Insurance Circular Letter No. 7 of 2024 specifically addresses external consumer data and AI in underwriting and pricing — but the principles extend to claims, and DFS has signaled enforcement attention on disparate impact in fraud referral patterns.

The practical implication: fraud models need documented fairness testing across protected class proxies (ZIP code, surname-based ethnicity inference, age), human-in-the-loop review before adverse claim action, and adverse action notices where state law requires them. Colorado Regulation 10-1-1 and similar emerging requirements in California and Washington raise the bar further. Carriers that deployed fraud ML in 2018-2021 without these controls are now scrambling to retrofit governance — I've seen three remediation projects in the last 12 months costing $3-8M each to bring legacy fraud models into compliance with current state AI bulletins.

⚠️ The disparate impact trap
A fraud model that simply optimizes for AUC against historical SIU labels will inherit every bias in those labels — including the well-documented tendency of human investigators to refer claims from certain ZIP codes at elevated rates. In one audit I conducted in 2024, a top-30 carrier's fraud model was referring claims from majority-Hispanic ZIP codes at 2.3x the rate of demographically similar majority-white ZIP codes, with no corresponding difference in confirmed fraud outcomes. Remediation required relabeling the training data using confirmed-fraud-only outcomes (not SIU referrals), removing geographic proxies, and adding fairness constraints during training. The retrained model lost 3 points of AUC and gained regulatory defensibility worth orders of magnitude more.
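The audit finding above boils down to comparing two ratios between groups: how often claims are referred, and how often fraud is actually confirmed. The sketch below shows that comparison. The counts are hypothetical, chosen only so the referral ratio reproduces the 2.3x figure, and the 1.25 tolerance is an assumed screening threshold; a real audit would add statistical significance testing and controls for claim mix.

```python
# Hypothetical audit counts for two demographically similar ZIP code groups.
GROUPS = {
    "zip_group_a": {"claims": 10_000, "referred": 460, "confirmed": 92},
    "zip_group_b": {"claims": 10_000, "referred": 200, "confirmed": 90},
}

def rates(group):
    """Per-group referral rate, confirmed-fraud rate, and referral precision."""
    return {
        "referral_rate": group["referred"] / group["claims"],
        "confirmed_rate": group["confirmed"] / group["claims"],
        "referral_precision": group["confirmed"] / group["referred"],
    }

a = rates(GROUPS["zip_group_a"])
b = rates(GROUPS["zip_group_b"])

referral_ratio = a["referral_rate"] / b["referral_rate"]
confirmed_ratio = a["confirmed_rate"] / b["confirmed_rate"]

# Assumed screening rule: flag when referrals diverge but confirmed fraud does not.
TOLERANCE = 1.25
flagged = referral_ratio > TOLERANCE and confirmed_ratio < TOLERANCE

print(f"referral ratio:  {referral_ratio:.2f}x")
print(f"confirmed ratio: {confirmed_ratio:.2f}x")
print(f"precision A vs B: {a['referral_precision']:.0%} vs {b['referral_precision']:.0%}")
print("flag for review:", flagged)
```

Measuring against confirmed outcomes rather than referrals matters here for the same reason it matters in relabeling the training data: referral counts carry the investigators' historical bias, while confirmed fraud does not to the same degree.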

An Implementation Roadmap That Actually Works

The carriers that have made this transition successfully share a sequencing pattern. Year one focuses on data foundations — building the unified claims/policy/party graph, standing up entity resolution, and replacing the worst of the legacy rules with a supervised model on a single high-value line (usually personal auto BI or workers' comp medical). Year two adds anomaly detection, expands to additional lines, and integrates cross-carrier data sources. Year three operationalizes graph-based SNA, real-time scoring at FNOL, and fairness/governance tooling. Attempting all three years simultaneously, which I've seen pitched in vendor proposals, fails 80%+ of the time because the data foundation can't support the advanced techniques and the SIU organization can't absorb the workflow change all at once.

[Figure: Pre-conditions for a fraud analytics program to succeed]

The SIU staffing question is the one most often underestimated. Lifting referral precision from 15% to 60% sounds like pure efficiency, but it also typically lifts referral volume by 40-80% because the analytics surface fraud that human triage was missing entirely. A carrier with 40 SIU investigators handling 8,000 referrals per year at 15% confirmation rate (1,200 confirmed cases) will, after a successful analytics deployment, face 12,000-14,000 higher-quality referrals producing 7,000-8,000 confirmed cases. Without SIU capacity expansion or workflow automation — case prioritization, automated evidence gathering, document intelligence — the analytics investment stalls in the queue.
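The capacity arithmetic in this example is worth working through explicitly. The sketch below uses the figures from the paragraph and assumes the 60% target precision for the post-deployment case; at that rate the confirmed-case range works out to 7,200-8,400, which the text rounds to 7,000-8,000. The "investigators needed" figure assumes each investigator keeps handling referrals at the baseline rate, which is a simplifying assumption.

```python
INVESTIGATORS = 40
BASELINE_REFERRALS = 8_000
BASELINE_PRECISION_PCT = 15
TARGET_PRECISION_PCT = 60          # assumption: the 60% figure cited above
NEW_REFERRAL_RANGE = (12_000, 14_000)

baseline_confirmed = BASELINE_REFERRALS * BASELINE_PRECISION_PCT // 100
referrals_per_investigator = BASELINE_REFERRALS // INVESTIGATORS

scenarios = []
for referrals in NEW_REFERRAL_RANGE:
    confirmed = referrals * TARGET_PRECISION_PCT // 100
    scenarios.append({
        "referrals": referrals,
        "confirmed": confirmed,
        "confirmed_per_investigator": confirmed // INVESTIGATORS,
        # Headcount needed if per-investigator referral throughput stays flat.
        "investigators_needed": referrals // referrals_per_investigator,
    })

print("baseline confirmed cases:", baseline_confirmed)
print("baseline confirmed per investigator:", baseline_confirmed // INVESTIGATORS)
for s in scenarios:
    print(s)
```

Confirmed cases per investigator rise from 30 to roughly 180-210 a year, which is why the constraint shifts from finding fraud to working the cases.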

The bottleneck moves from detection to disposition. Carriers that don't redesign the SIU operating model alongside the analytics see their ROI evaporate in case backlogs.

Pattern observed across 11 P&C fraud analytics implementations, 2022-2025

What Comes Next

Three developments are reshaping fraud detection over the next 18-24 months. Graph neural networks (GNNs) — particularly GraphSAGE and temporal graph networks — are beginning to replace separate community detection and supervised modeling steps with end-to-end learned representations over the entity graph. Early production deployments at two European carriers I advise are showing 8-12% AUC lift over the composite scoring approach. Second, large language models are entering fraud workflows not as classifiers but as adjuster copilots — summarizing case files, drafting investigation plans, flagging inconsistencies between recorded statements and physical evidence. This connects directly to the broader trend covered in virtual analyst copilots in adjacent financial services domains. Third, synthetic identity fraud — driven by generative AI's ability to produce convincing supporting documents, vehicle damage photos, and even synthetic medical records — is forcing carriers to invest in document forensics and image provenance verification at FNOL.

The carriers that will be hardest to defraud in 2028 are not the ones with the most sophisticated models. They are the ones that built clean, resolved, real-time graphs of their business — claims, policies, parties, providers, payments — and then made those graphs queryable by every downstream process that needs to ask 'has this entity, or anyone connected to it, done something we should worry about?' Fraud detection is the most visible application. Underwriting, distribution oversight, vendor management, and litigation analytics all draw from the same foundation. The carriers that treat this as an SIU technology project rather than an enterprise data foundation will solve last decade's fraud problem and miss the next one.

Frequently Asked Questions

How much does a full fraud analytics program cost to implement?

For a top-25 P&C carrier, expect $8-15M in year-one investment (platform, data, integration, initial models) and $4-7M in annual run-rate costs including data subscriptions, cloud compute, and the analytics team. Payback periods I've observed run 14-22 months on personal auto and workers' comp lines, longer for homeowners and commercial.

Should we buy a pure-play vendor like Shift or FRISS, or build on cloud primitives?

Carriers with fewer than 10 data scientists and limited MLOps maturity should buy. Carriers with established AI/ML organizations and existing graph or cloud investments typically get better long-term economics building on primitives like Neo4j or Neptune, SageMaker or Vertex, and Senzing for entity resolution. The Quantexa middle path works well for carriers that want flexibility without building entity resolution from scratch.

What regulatory frameworks apply to fraud detection models?

The NAIC Model Bulletin on AI Systems (2023), adopted by 20+ states, requires written governance, model risk management, and consumer transparency for AI in insurance — including claims. State-specific rules like Colorado Regulation 10-1-1 and the NY DFS Circular Letter 7 of 2024 add fairness testing and external data oversight requirements. Carriers should also expect adverse action notice obligations and disparate impact monitoring as table stakes.

How do we measure whether the fraud program is working?

Track four metrics monthly: SIU referral precision (confirmed fraud / total referrals), dollar value of identified fraud per FTE, model-attributable savings (dollars saved that legacy rules would have missed), and false positive rate by protected class proxy. The first three measure economic value; the last measures regulatory defensibility. A healthy program shows precision above 50%, per-FTE identified fraud growing year-over-year, and no statistically significant disparate impact.

What's the most common reason these programs fail?

SIU operating model and case management capacity. Analytics that triple qualified referrals into an SIU function still using 2015-era case management tools and staffing create a backlog that erases the value. Successful programs invest equally in detection analytics and downstream investigation workflow automation, including document intelligence, automated evidence gathering, and prioritization logic.