Fraud & AML

Why schema-on-read works for AML investigation

Schema-on-read lets AML investigators analyze unstructured financial data without predefined schemas by applying structure at query time rather than at ingestion. This approach reduces investigation preparation time by 60-80% and accommodates diverse data formats from multiple sources, including emails, PDFs, and third-party feeds.

Why It Matters

Traditional schema-on-write architectures require 2-6 weeks to onboard new AML data sources, delaying critical investigations. Schema-on-read reduces this to 2-3 days by eliminating ETL bottlenecks. Investigation teams report 40-50% faster case resolution when they can query raw transaction logs, correspondence, and regulatory filings simultaneously without waiting for data engineering teams to create structured tables.

How It Works in Practice

  1. Ingest raw data from multiple sources (SWIFT messages, emails, transaction logs) into a data lake without transformation
  2. Apply schemas dynamically at query time using tools like Apache Spark or Presto when investigators need specific data views
  3. Execute ad-hoc queries across structured and unstructured datasets to identify suspicious patterns or entity relationships
  4. Transform only the relevant subset of data into temporary structured views for detailed analysis
  5. Preserve original data formats for regulatory audit trails while enabling flexible investigation approaches
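
The steps above can be sketched in plain Python without a query engine. This is a minimal illustration, not a production design: the sample records, the field names, the `PAYMENT_SCHEMA` mapping, and the 9,000 to 10,000 amount band are all hypothetical. Raw lines are stored untouched, a per-source mapping is applied only when an investigator reads them, and records that fail to parse are collected instead of being silently dropped.

```python
import json
from datetime import datetime

# Raw records land in the lake exactly as received; nothing is parsed at ingestion.
# These sample lines and field names are hypothetical.
RAW_LAKE = [
    '{"source": "swift", "msg": {"sender": "ACME Ltd", "amount": "9800.00", "ts": "2024-03-01T10:15:00"}}',
    '{"source": "txlog", "account": "ACME Ltd", "amt": 9900, "time": "2024-03-01T11:02:00"}',
    '{"source": "email", "from": "ops@acme.example", "body": "please split the transfer"}',
    'not valid json',
]

# A schema here is a per-source mapping: output field -> (path into raw record, cast).
PAYMENT_SCHEMA = {
    "swift": {"party": (("msg", "sender"), str),
              "amount": (("msg", "amount"), float),
              "when": (("msg", "ts"), datetime.fromisoformat)},
    "txlog": {"party": (("account",), str),
              "amount": (("amt",), float),
              "when": (("time",), datetime.fromisoformat)},
}

def dig(record, path):
    """Follow a tuple of keys into a nested dict."""
    for key in path:
        record = record[key]
    return record

def read_with_schema(raw_lines, schema):
    """Apply the schema at read time and return (rows, rejects)."""
    rows, rejects = [], []
    for line in raw_lines:
        try:
            record = json.loads(line)
            mapping = schema.get(record.get("source"))
            if mapping is None:
                continue  # source not covered by this view; the raw line is untouched
            rows.append({name: cast(dig(record, path))
                         for name, (path, cast) in mapping.items()})
        except (ValueError, KeyError, TypeError, AttributeError) as exc:
            # Keep the failing line and the reason so quality problems stay visible.
            rejects.append((line, repr(exc)))
    return rows, rejects

rows, rejects = read_with_schema(RAW_LAKE, PAYMENT_SCHEMA)

# Ad-hoc query over the temporary view: payments in an illustrative amount band.
just_under = [r for r in rows if 9000 <= r["amount"] < 10000]
```

A different investigation could define another mapping over the same `RAW_LAKE` lines, for example one that reads only the email records, without re-ingesting anything. At scale the same idea is typically expressed through an engine such as Spark or Presto rather than a Python loop.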

Common Pitfalls

Query performance degrades significantly on large datasets without proper indexing strategies, potentially causing investigation delays

Meeting regulatory examiners' expectations for data lineage documentation becomes harder when schemas change between queries

Data quality issues remain hidden until query time, potentially compromising investigation accuracy and regulatory compliance

Storage costs increase 3-5× compared to compressed structured data, impacting long-term AML data retention economics
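
One possible mitigation for the lineage pitfall is to log, alongside every query, a fingerprint of the schema that was applied. The sketch below assumes schemas can be represented as JSON-serializable dictionaries; the function names, the log fields, and the sample case ID are hypothetical and not taken from any particular product.

```python
import hashlib
import json
from datetime import datetime, timezone

def schema_fingerprint(schema: dict) -> str:
    """Stable SHA-256 of a schema definition; key order does not affect the result."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def lineage_entry(case_id, query_text, schema, sources):
    """Build one audit-log record for a query run under a given schema."""
    return {
        "case_id": case_id,
        "query": query_text,
        "schema": schema,
        "schema_sha256": schema_fingerprint(schema),
        "sources": sorted(sources),
        "run_at": datetime.now(timezone.utc).isoformat(),
    }

# Two versions of a view over the same raw data (hypothetical field types).
schema_v1 = {"party": "string", "amount": "double"}
schema_v2 = {"party": "string", "amount": "decimal(18,2)"}

entry = lineage_entry(
    "CASE-001",
    "SELECT party, amount FROM payments",
    schema_v1,
    ["txlog", "swift"],
)

# Differing fingerprints make a schema change between two queries easy to spot.
schema_changed = schema_fingerprint(schema_v1) != schema_fingerprint(schema_v2)
```

Storing the full schema next to its hash lets a reviewer reconstruct exactly which structure produced a given result, even after the view definition has moved on.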

Key Metrics

| Metric | Target | Formula |
|---|---|---|
| Investigation Data Prep Time | <48 hours | Time from data source request to first investigative query execution |
| Query Response Time | <30 seconds | Average time for complex cross-source AML queries on 90-day transaction windows |
| Data Source Onboarding | <72 hours | Time from new external data feed receipt to investigator access availability |
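
As a worked example, the three metrics above can be computed from event timestamps and compared against their targets. All timestamps and query timings below are made-up sample values, and the variable names are illustrative.

```python
from datetime import datetime
from statistics import mean

def hours_between(start: datetime, end: datetime) -> float:
    """Elapsed time between two events, in hours."""
    return (end - start).total_seconds() / 3600

# Investigation data prep time: source request -> first investigative query.
prep_hours = hours_between(datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 5, 15, 30))

# Query response time: average over sampled cross-source queries, in seconds.
query_seconds = [12.4, 41.0, 18.6, 22.0]
avg_query_seconds = mean(query_seconds)

# Data source onboarding: feed receipt -> investigator access.
onboarding_hours = hours_between(datetime(2024, 3, 1, 8, 0), datetime(2024, 3, 3, 20, 0))

# Compare each measurement with the target from the table.
within_target = {
    "data_prep": prep_hours < 48,
    "query_response": avg_query_seconds < 30,
    "onboarding": onboarding_hours < 72,
}
```

Note that an average can hide slow outliers: in this sample one query took 41 seconds while the mean still sits under the 30-second target.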
