Schema-on-read lets AML investigators analyze unstructured financial data without predefined schemas: structure is applied at query time rather than at ingestion. This approach reduces investigation preparation time by 60-80% while accommodating diverse data formats from multiple sources, including emails, PDFs, and third-party feeds.
Why It Matters
Traditional schema-on-write architectures require 2-6 weeks to onboard new AML data sources, delaying critical investigations. Schema-on-read reduces this to 2-3 days by eliminating ETL bottlenecks. Investigation teams report 40-50% faster case resolution when they can query raw transaction logs, correspondence, and regulatory filings simultaneously without waiting for data engineering teams to create structured tables.
How It Works in Practice
1. Ingest raw data from multiple sources (SWIFT messages, emails, transaction logs) into a data lake without transformation
2. Apply schemas dynamically at query time using tools like Apache Spark or Presto when investigators need specific data views
3. Execute ad-hoc queries across structured and unstructured datasets to identify suspicious patterns or entity relationships
4. Transform only the relevant subset of data into temporary structured views for detailed analysis
5. Preserve original data formats for regulatory audit trails while enabling flexible investigation approaches
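The steps above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a production design: the sample records, the field names (`sender`, `receiver`, `amt`, `ts`), and the helpers `read_with_schema` and `total_sent_by` are all hypothetical, and a real deployment would more likely run the same idea on an engine such as Spark or Presto over files in a data lake.

```python
import json
from datetime import datetime
from decimal import Decimal

# Raw records kept exactly as they arrived (hypothetical sample data).
RAW_LINES = [
    '{"src": "swift", "sender": "ACME LTD", "receiver": "Globex", "amt": "9500.00", "ts": "2024-03-01T10:15:00"}',
    '{"src": "txn_log", "sender": "ACME LTD", "receiver": "Initech", "amt": "9800.50", "ts": "2024-03-01T11:02:00"}',
    '{"src": "email", "from": "ops@acme.example", "body": "Please split the payment."}',
]

# A schema here is just field name -> parser, chosen at query time.
TRANSFER_SCHEMA = {
    "sender": str,
    "receiver": str,
    "amt": Decimal,
    "ts": datetime.fromisoformat,
}


def read_with_schema(lines, schema):
    """Yield typed rows for records that contain every schema field."""
    for line in lines:
        record = json.loads(line)
        if not all(field in record for field in schema):
            continue  # record does not belong to this view (e.g. an email)
        yield {field: parse(record[field]) for field, parse in schema.items()}


def total_sent_by(lines, sender):
    """Ad-hoc query: sum of transfer amounts for one sender."""
    rows = read_with_schema(lines, TRANSFER_SCHEMA)
    return sum((r["amt"] for r in rows if r["sender"] == sender), Decimal("0"))


if __name__ == "__main__":
    print(total_sent_by(RAW_LINES, "ACME LTD"))
```

The raw lines are never modified, so the same store can serve a different schema (for example, one that reads the email records) in the next query.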
Common Pitfalls
- Query performance degrades significantly on large datasets without proper partitioning or indexing strategies, potentially causing investigation delays.
- Data lineage documentation, which regulatory examiners expect, becomes harder to maintain when schemas change between queries.
- Data quality issues remain hidden until query time, potentially compromising investigation accuracy and regulatory compliance.
- Storage costs increase 3-5× compared to compressed structured data, impacting long-term AML data retention economics.
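One way to soften the hidden-data-quality pitfall is to record rejects whenever a schema is applied, instead of silently skipping records that do not parse. The sketch below assumes newline-delimited JSON and a schema expressed as field name to parser; `validate_at_read` and the sample lines are hypothetical names used only for illustration.

```python
import json
from decimal import Decimal, InvalidOperation

# Hypothetical raw lines: one clean, three with different defects.
SAMPLE_LINES = [
    '{"sender": "ACME LTD", "amt": "100.00"}',
    '{"sender": "ACME LTD", "amt": "n/a"}',
    '{"sender": "ACME LTD"}',
    'not json',
]

SCHEMA = {"sender": str, "amt": Decimal}


def validate_at_read(lines, schema):
    """Return (typed rows, rejects); each reject is (line number, error name)."""
    rows, rejects = [], []
    for line_no, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
            rows.append({f: parse(record[f]) for f, parse in schema.items()})
        except (KeyError, TypeError, ValueError, InvalidOperation) as exc:
            # JSONDecodeError is a ValueError; a bad Decimal raises InvalidOperation.
            rejects.append((line_no, type(exc).__name__))
    return rows, rejects


if __name__ == "__main__":
    rows, rejects = validate_at_read(SAMPLE_LINES, SCHEMA)
    print(f"{len(rows)} usable rows, {len(rejects)} rejects: {rejects}")
```

Logging the reject list alongside the query and the schema it used also gives a simple record of which structure produced which result, which may help with lineage questions.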
Key Metrics
| Metric | Target | Definition |
|---|---|---|
| Investigation Data Prep Time | <48 hours | Time from data source request to first investigative query execution |
| Query Response Time | <30 seconds | Average time for complex cross-source AML queries on 90-day transaction windows |
| Data Source Onboarding | <72 hours | Time from new external data feed receipt to investigator access availability |
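The time-based metrics in the table reduce to simple timestamp arithmetic. The sketch below assumes ISO 8601 timestamps for the two events in each definition; the function names and the example timestamps are hypothetical, and the 48-hour threshold is taken from the table's target for Investigation Data Prep Time.

```python
from datetime import datetime

PREP_TIME_TARGET_HOURS = 48  # target for Investigation Data Prep Time


def hours_between(start_iso, end_iso):
    """Elapsed hours between two ISO 8601 timestamps."""
    start = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(end_iso)
    return (end - start).total_seconds() / 3600


def meets_prep_target(requested_iso, first_query_iso):
    """True if the first investigative query ran within the target window."""
    return hours_between(requested_iso, first_query_iso) < PREP_TIME_TARGET_HOURS


if __name__ == "__main__":
    # Hypothetical case: data requested at 09:00, first query 30 hours later.
    print(hours_between("2024-03-01T09:00:00", "2024-03-02T15:00:00"))
    print(meets_prep_target("2024-03-01T09:00:00", "2024-03-02T15:00:00"))
```

The same helper can be reused for Data Source Onboarding by swapping in the feed-receipt and investigator-access timestamps with a 72-hour threshold.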