Retail Banking — Article 8 of 12

Data Fabric for 360-Degree Customer View Across Silos


JPMorgan Chase processes 8.7 billion customer interactions annually across mobile apps, ATMs, branches, call centers, and digital channels. Each touchpoint generates data stored in different systems — Temenos T24 for core banking, Salesforce for CRM, FICO for decisioning, FIS for card processing, and dozens more. When a premier banking client calls about a declined mortgage application, the relationship manager must toggle between 12 screens to piece together the customer's complete financial picture. This fragmentation costs the bank $280 million annually in extended call times, missed cross-sell opportunities, and customer attrition.

Wells Fargo discovered in 2023 that 67% of its high-net-worth clients maintained significant assets with competitors because different product teams couldn't see the customer's total relationship. The mortgage division didn't know about the $2.4 million investment portfolio. The credit card team couldn't see the commercial banking relationship. Data fabric technology now connects these silos in real-time, creating what Gartner calls a 'self-integrating data ecosystem' that reduced Wells Fargo's customer data reconciliation time from 18 hours to 47 minutes.

47-83: Average number of distinct systems containing customer data in a mid-size retail bank

The Economics of Data Fragmentation

McKinsey's 2025 retail banking study found that data fragmentation costs the average $50 billion-asset bank between $180 million and $320 million annually. The breakdown: $80-120 million in IT maintenance for point-to-point integrations, $60-100 million in missed revenue from incomplete customer views, and $40-100 million in regulatory compliance inefficiencies when assembling customer data for KYC refreshes or suspicious activity reports. Bank of America calculated that each additional system requiring manual data reconciliation adds $3.2 million in annual operational overhead.

Traditional approaches to solving this problem — data warehouses, master data management (MDM), enterprise service buses (ESB) — require massive ETL projects that take 18-36 months and often fail to keep pace with new system additions. TD Bank spent $45 million on a centralized MDM initiative from 2019-2021 that still required batch processing and couldn't handle real-time use cases. When they pivoted to a data fabric approach using Denodo's virtualization platform in 2022, they achieved 360-degree customer views in 4 months at 20% of the projected MDM cost.

Traditional Integration vs. Data Fabric Approach
Aspect | Traditional ETL/MDM | Data Fabric
Integration Time | 12-24 months per system | 2-4 weeks per system
Data Freshness | Batch updates (T+1) | Real-time or near real-time
New Source Addition | 3-6 month project | 1-2 week configuration
Storage Requirements | 3-5x data duplication | Minimal duplication
Maintenance Overhead | $2-4M per year | $400-800K per year
Schema Changes | 6-12 week propagation | Automatic propagation

Understanding Data Fabric Architecture for Banking

Data fabric isn't a single product but an architectural approach that creates a unified data access layer across disparate systems without moving or duplicating data. Informatica defines it as 'an integrated layer of data and connecting processes' that uses metadata, active data catalogs, and knowledge graphs to provide seamless access. For retail banks, this means connecting core banking systems (Fiserv DNA, Jack Henry Silverlake), card processors (TSYS, First Data), loan origination systems (Ellie Mae Encompass, Black Knight), and digital channels (Backbase, Temenos Infinity) through a semantic layer that understands banking relationships.
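The core idea can be pictured in a few lines of code. Below is a minimal, hypothetical sketch (all adapter names, fields, and values are invented) of a fabric layer that assembles a customer view at query time from live source-system adapters instead of copying data into a central store.

```python
# Minimal sketch of the data-fabric idea: a federation layer that builds a
# customer view at query time from source-system adapters, without copying data.
# All system names, fields, and values here are hypothetical illustrations.
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

@dataclass
class DataFabric:
    # Each adapter is a callable that fetches live data from one source system.
    adapters: Dict[str, Callable[[str], Dict[str, Any]]] = field(default_factory=dict)

    def register(self, source_name: str, adapter: Callable[[str], Dict[str, Any]]) -> None:
        self.adapters[source_name] = adapter

    def customer_360(self, customer_id: str) -> Dict[str, Any]:
        # Federate: query every registered source on demand and merge the results,
        # namespaced by source as a stand-in for a real semantic model.
        view: Dict[str, Any] = {"customer_id": customer_id}
        for source_name, adapter in self.adapters.items():
            view[source_name] = adapter(customer_id)
        return view

# Hypothetical adapters standing in for core banking, cards, and CRM systems.
fabric = DataFabric()
fabric.register("core_banking", lambda cid: {"checking_balance": 12_450.23})
fabric.register("cards", lambda cid: {"open_cards": 2, "utilization": 0.31})
fabric.register("crm", lambda cid: {"segment": "premier", "last_contact": "2025-11-02"})

print(fabric.customer_360("C-1029384"))
```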

BBVA implemented Palantir Foundry as their data fabric backbone in 2023, connecting 127 source systems across 8 countries. The platform creates a knowledge graph where each customer node connects to products, transactions, interactions, and risk events through 2.3 billion defined relationships. When a customer applies for a mortgage in Spain, the system instantly surfaces their checking account balance in Mexico, credit card usage patterns in Argentina, and investment holdings in Colombia — all without moving data from source systems. Query response times dropped from 12-15 seconds to 180 milliseconds.

The technical stack typically combines several layers: data virtualization (Denodo, Dremio, Starburst), streaming integration (Apache Kafka, Confluent, Amazon Kinesis), semantic modeling (Apache Atlas, Collibra, Alation), and orchestration (Apache Airflow, Prefect, Dagster). Standard Chartered deployed Databricks' Lakehouse Platform with Unity Catalog to create their data fabric, processing 4.2 billion events daily from 73 banking systems. The semantic layer maps Standard Chartered's proprietary data models to industry standards like BIAN (Banking Industry Architecture Network) and FIBO (Financial Industry Business Ontology), enabling plug-and-play integration with new fintech partners.
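The semantic-mapping step can be thought of as a translation table between proprietary source fields and a shared vocabulary. The sketch below is purely illustrative; the field names and dotted terms are invented stand-ins and do not reproduce the actual BIAN or FIBO models.

```python
# Illustrative sketch of a semantic layer: renaming proprietary source-system fields
# into a shared, standards-inspired vocabulary so every consumer sees one model.
# Field names and mappings are invented; real BIAN/FIBO models are far richer.
PROPRIETARY_TO_SEMANTIC = {
    "core_banking": {"ACCT_BAL_CURR": "arrangement.balance", "CUST_NO": "party.identifier"},
    "cards":        {"crd_limit": "arrangement.creditLimit", "cust_id": "party.identifier"},
}

def to_semantic(source: str, record: dict) -> dict:
    """Rename a raw source record into the shared semantic vocabulary."""
    mapping = PROPRIETARY_TO_SEMANTIC[source]
    # Unmapped fields keep their source prefix so nothing is silently dropped.
    return {mapping.get(key, f"{source}.{key}"): value for key, value in record.items()}

print(to_semantic("core_banking", {"CUST_NO": "C-1029384", "ACCT_BAL_CURR": 12450.23}))
print(to_semantic("cards", {"cust_id": "C-1029384", "crd_limit": 15000}))
```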

💡 Did You Know?
Santander's data fabric processes 12 million customer queries per hour during peak times, aggregating data from 89 systems in under 200 milliseconds using Hazelcast's in-memory data grid combined with GraphQL federation.

Technical Implementation Patterns

Leading banks follow three primary patterns for data fabric implementation. The 'virtualization-first' approach, pioneered by ING, uses Denodo to create logical views across systems without data movement. ING's 42 million retail customers generate 850TB of data daily across core banking (Temenos T24), cards (Mastercard MDES), payments (Swift GPI), and digital channels (Backbase). Denodo's semantic layer translates queries in real-time, maintaining ACID compliance for transactional systems while enabling analytical queries across the entire ecosystem.

The 'event-streaming backbone' pattern, adopted by Capital One, uses Apache Kafka to capture every state change across banking systems. Capital One processes 127 million events per second through their Kafka clusters, with Flink-based stream processing creating materialized views for different use cases. Their real-time ledger updates customer balances within 50 milliseconds of any transaction, while the customer 360 view aggregates these streams with behavioral data from mobile apps, website clickstreams, and call center interactions.
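A stripped-down version of this pattern looks like the sketch below, which consumes change events with the kafka-python client and maintains an in-memory materialized customer view. The topic names, event shapes, and broker address are assumptions; a production deployment would use a stream processor such as Flink with durable state rather than a dictionary.

```python
# Sketch of the event-streaming pattern: consume change events from banking systems
# and maintain a materialized customer view. Topic names, event fields, and the broker
# address are assumptions made for illustration.
import json
from collections import defaultdict
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "core.transactions", "cards.authorizations", "digital.clickstream",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

# Materialized customer-360 view keyed by customer id, updated per event.
customer_view = defaultdict(lambda: {"balance": 0.0, "card_spend": 0.0, "last_channel": None})

for message in consumer:
    event = message.value  # e.g. {"customer_id": "C-1", "amount": -42.10, "channel": "mobile"}
    view = customer_view[event["customer_id"]]
    if message.topic == "core.transactions":
        view["balance"] += event["amount"]
    elif message.topic == "cards.authorizations":
        view["card_spend"] += event["amount"]
    else:
        view["last_channel"] = event.get("channel")
```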

The 'hybrid lakehouse' approach combines elements of both. Citi built their data fabric on Snowflake's platform with Fivetran for ingestion and dbt for transformation. Raw data lands in object storage (S3), streams through Snowpipe for near real-time loading, and serves unified views through Snowflake's data sharing capabilities. The architecture supports both operational queries (customer balance lookups completing in <100ms) and analytical workloads (customer segmentation models processing 500 million records in 3 minutes). Citi reported a 65% reduction in data preparation time for analytics teams and a 40% decrease in storage costs compared with their previous Teradata-based architecture.
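Serving both workload types from a single governed layer can be sketched with the Snowflake Python connector, as below. The connection parameters, the customer_summary view, and the column names are placeholders, and the ingestion side (Snowpipe, dbt models) is omitted entirely.

```python
# Illustrative sketch: one lakehouse-style platform answering an operational point query
# and an analytical aggregate over the same governed view. Account details, table and
# column names are placeholders invented for this example.
import snowflake.connector  # pip install snowflake-connector-python

conn = snowflake.connector.connect(
    account="my_account", user="svc_fabric", password="***",
    warehouse="SERVING_WH", database="CUSTOMER_360", schema="UNIFIED",
)
cur = conn.cursor()

# Operational lookup: a point query against the unified customer view.
cur.execute(
    "SELECT total_balance, open_products FROM customer_summary WHERE customer_id = %s",
    ("C-1029384",),
)
print(cur.fetchone())

# Analytical workload: a segmentation aggregate over the same data.
cur.execute(
    "SELECT segment, COUNT(*) AS customers, AVG(total_balance) AS avg_balance "
    "FROM customer_summary GROUP BY segment"
)
print(cur.fetchall())

cur.close()
conn.close()
```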

Typical Data Fabric Implementation Phases

Phase 1: Discovery & Catalog (Months 1-3)
Inventory data sources, profile data quality, establish the semantic model. Tools: Collibra, Alation. (A simple profiling sketch follows this list.)

Phase 2: Core Integration (Months 4-8)
Connect core banking, cards, and deposits; implement the virtualization layer. Tools: Denodo, Kafka

Phase 3: Channel Expansion (Months 9-12)
Add digital channels, CRM, and marketing systems; build first use cases. Tools: Informatica, Talend

Phase 4: Advanced Analytics (Months 13-18)
Deploy ML models, real-time decisioning, and personalization engines. Tools: Databricks, SageMaker
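To make the Phase 1 discovery work concrete, here is a tiny profiling sketch using pandas that checks completeness, cardinality, and duplicate keys on a sample extract. The columns and data are invented; enterprise catalogs such as Collibra or Alation automate this across thousands of tables.

```python
# Minimal data-profiling sketch for the discovery phase: completeness, cardinality,
# and duplicate-key counts on a small, invented customer extract.
import pandas as pd

customers = pd.DataFrame({
    "customer_id": ["C-1", "C-2", "C-2", "C-4"],
    "email": ["a@x.com", None, "b@x.com", "c@x.com"],
    "postcode": ["2000", "3000", "3000", None],
})

profile = pd.DataFrame({
    "null_rate": customers.isna().mean(),       # completeness per column
    "distinct_values": customers.nunique(),     # cardinality per column
})

print(profile)
print("duplicate customer_id rows:", customers["customer_id"].duplicated().sum())
```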

Real-World Implementations and Results

DBS Bank Singapore transformed their customer experience through Project Gandalf, an $85 million data fabric initiative completed in 2024. The bank connected 72 systems including core banking (Silverlake), cards (Way4), wealth management (Avaloq), and insurance (Guidewire) through Informatica's Intelligent Data Management Cloud. Customer data that previously required 6-8 hour batch processes for consolidation now updates in real-time. The unified view powers their AI-driven recommendation engine, which increased cross-sell rates by 43% and generated $127 million in additional revenue in the first year.

Barclays UK deployed Palantir Foundry to create 'Customer OS,' a data fabric serving 27 million retail and business banking customers. The platform ingests 2.1 billion daily events from systems including FIS Profile (core banking), Vocalink (payments), Black Knight MSP (mortgages), and Adobe Experience Platform (digital marketing). Machine learning models running on the unified dataset identify life events — job changes, marriages, home purchases — with 87% accuracy, triggering personalized product offers. The system prevented £43 million in customer attrition by identifying at-risk relationships 60 days before account closure, enabling proactive retention campaigns.

"Our data fabric reduced the time to launch new personalization use cases from 6 months to 2 weeks. We can now test 50 customer experience hypotheses in the time it used to take to implement one."
Chief Data Officer, Top 10 US Bank

RBC (Royal Bank of Canada) took a graph-based approach, implementing Neo4j as the core of their data fabric to model complex customer relationships across 16 million clients. The property graph connects customers to accounts, transactions, merchants, and life events through 8.7 billion edges, updated in real-time via CDC (change data capture) from 94 source systems. Graph algorithms identify householding relationships with 94% accuracy, revealing that 23% of customers previously viewed as single-product holders actually had multiple relationships through family members. This insight drove a targeted campaign that generated CAD $89 million in new deposits and investments.
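A householding lookup in this kind of graph might look like the sketch below, using the official Neo4j Python driver. The node labels, relationship types, and properties are invented for illustration; RBC's actual graph model is not public.

```python
# Sketch of a householding query against a customer graph: find customers linked to a
# given client through shared addresses or joint accounts, and list what they hold.
# Labels, relationship types, and properties are hypothetical.
from neo4j import GraphDatabase  # pip install neo4j

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

HOUSEHOLD_QUERY = """
MATCH (c:Customer {customer_id: $cid})-[:SHARES_ADDRESS|JOINT_ACCOUNT_WITH*1..2]-(member:Customer)
MATCH (member)-[:HOLDS]->(p:Product)
RETURN member.customer_id AS member_id, collect(p.type) AS products
"""

with driver.session() as session:
    for record in session.run(HOUSEHOLD_QUERY, cid="C-1029384"):
        print(record["member_id"], record["products"])

driver.close()
```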

ROI Metrics from Data Fabric Implementations

Vendor Landscape and Technology Stack

The data fabric vendor ecosystem for banking spans established players and specialized solutions. Informatica's IDMC (Intelligent Data Management Cloud) leads enterprise deployments, with 37% market share among Fortune 500 banks according to Gartner's 2025 Magic Quadrant. Their CLAIRE AI engine automates data discovery, quality assessment, and integration mapping. Bank of Montreal's implementation connected 67 systems in 14 months, with CLAIRE automatically generating 82% of the required data mappings and transformation logic.

Denodo specializes in data virtualization, crucial for banks with regulatory constraints on data movement. Their platform creates logical views without physical data replication, maintaining data residency compliance for GDPR and country-specific regulations. Société Générale uses Denodo to provide unified customer views across 12 European markets while keeping data in local systems. Query optimization reduces cross-system joins from minutes to milliseconds using intelligent caching and pushdown optimization.

Cloud-native solutions from Databricks, Snowflake, and AWS offer integrated data fabric capabilities. US Bank built their next-generation data platform on Databricks, combining Delta Lake for storage, Unity Catalog for governance, and SQL Analytics for serving. The platform processes 4.7 billion transactions daily from Jack Henry's SilverLake core, FIS card systems, and Black Knight's Empower loan platform. Databricks' Photon engine accelerates complex customer analytics queries by 12x compared to their previous Hadoop-based infrastructure.

Key Data Fabric Platform Vendors
Informatica IDMC
Enterprise leader with AI-powered automation. 500+ pre-built banking connectors. Average implementation: 9-15 months.
Denodo Platform
Pure-play virtualization, no data movement. Real-time federation across 300+ sources. Deployment: 3-6 months.
Palantir Foundry
Ontology-driven approach with built-in analytics. Used by 8 of top 20 global banks. Cost: $10-50M annually.
Databricks Lakehouse
Unified analytics and AI platform. Delta Lake for ACID transactions. Processes petabyte-scale workloads.
IBM Cloud Pak for Data
Integrated with Watson AI, strong mainframe connectivity. Red Hat OpenShift-based. 18% banking market share.
Talend Data Fabric
Open-source heritage, strong data quality tools. Trust Score™ for data reliability. Mid-market leader.

Overcoming Integration Challenges

Legacy system integration remains the primary technical challenge. Commonwealth Bank of Australia faced 43 different data formats across their mainframe-based core systems (CSC Hogan), all using proprietary EBCDIC encoding and hierarchical data models. They deployed Precisely's Connect CDC to capture changes from VSAM files, IMS databases, and DB2 tables, streaming them to Kafka in JSON format. Custom serializers handle Australian-specific fields like BSB codes and tax file numbers. The mainframe integration processes 127 million transactions daily with sub-second latency.
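The flavor of that mainframe-to-Kafka flow can be sketched as follows: decode a fixed-width EBCDIC record, map it to JSON, and publish it with the kafka-python producer. The record layout, field offsets, code page, and topic name are all assumptions; real pipelines rely on CDC tooling such as Precisely Connect and must also handle packed-decimal (COMP-3) fields, which plain text decoding cannot.

```python
# Sketch: decode a fixed-width EBCDIC record, map it to JSON, and publish to Kafka.
# The copybook layout, offsets, code page, and topic are invented for illustration.
import json
from kafka import KafkaProducer  # pip install kafka-python

EBCDIC = "cp037"  # a common EBCDIC code page; the correct one depends on the host

def parse_record(raw: bytes) -> dict:
    text = raw.decode(EBCDIC)
    return {
        "customer_id": text[0:10].strip(),
        "bsb": text[10:16],                  # Australian branch code, kept as a string
        "account_no": text[16:25].strip(),
        "balance_cents": int(text[25:37]),
    }

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda d: json.dumps(d).encode("utf-8"),
)

# One fabricated 37-byte record, encoded as EBCDIC for the demo.
sample = ("C000012345" + "012345" + "123456789" + "000000012345").encode(EBCDIC)
producer.send("mainframe.cdc.accounts", parse_record(sample))
producer.flush()
```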

Data quality and consistency across silos requires sophisticated reconciliation. Santander discovered that customer addresses were stored in 37 different formats across systems, with 23% containing errors or outdated information. They implemented Talend Data Quality with machine learning models that standardize addresses in real-time, achieving 97.3% accuracy. The system processes 2.4 million address updates daily, using postal service APIs for validation and Google Maps for geocoding. Address standardization alone reduced failed mail delivery costs by €4.2 million annually.
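As a greatly simplified illustration of the standardization step, the rule-based sketch below expands common abbreviations and normalizes formatting. Santander's production pipeline uses machine-learning models plus postal-service validation and geocoding, none of which is shown here.

```python
# Toy address standardizer: lowercase, strip punctuation, expand common abbreviations,
# and re-case. Real systems use ML models and postal/geocoding APIs for validation.
import re

ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue", "apt": "apartment"}

def standardize(address: str) -> str:
    tokens = re.sub(r"[.,]", " ", address.lower()).split()
    expanded = [ABBREVIATIONS.get(token, token) for token in tokens]
    return " ".join(expanded).title()

print(standardize("12  Harbour St., Apt 4"))  # "12 Harbour Street Apartment 4"
```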

Open banking regulations add complexity to data fabric architectures. European banks must expose customer data through PSD2 APIs while maintaining consent management and audit trails. ABN AMRO built a consent layer into their data fabric using Axiomatics' policy engine, which evaluates 85 million authorization decisions daily. Each data access request checks customer consent status, purpose limitation, and data minimization rules in under 10 milliseconds. The system maintains immutable audit logs for regulatory review, with 99.97% uptime since launch.
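Conceptually, the consent check sits in the fabric's access path and filters every request before any source system is touched. The sketch below is a toy version with invented field names and policies; ABN AMRO's actual engine (Axiomatics) is a full attribute-based access control product.

```python
# Toy consent check: a data request is reduced to the fields the customer has actually
# consented to share for the stated purpose, and denied if nothing remains.
# Purposes, field names, and expiry handling are invented for illustration.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Consent:
    purpose: str                 # e.g. "psd2_account_information"
    granted_fields: frozenset
    expires: datetime

def authorize(request_purpose: str, requested_fields: set, consents: list) -> set:
    """Return the subset of requested fields the caller may actually receive."""
    now = datetime.utcnow()
    allowed = set()
    for consent in consents:
        if consent.purpose == request_purpose and consent.expires > now:
            allowed |= requested_fields & consent.granted_fields
    return allowed  # an empty set means the request is denied outright

consents = [Consent("psd2_account_information",
                    frozenset({"iban", "balance", "transactions_90d"}),
                    datetime(2026, 6, 30))]
print(authorize("psd2_account_information", {"balance", "salary"}, consents))  # {'balance'}
```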

Critical Success Factors for Data Fabric Implementation

Measuring Success and ROI

HSBC developed a comprehensive ROI framework for their Connect360 data fabric program, tracking both hard and soft benefits. Hard savings included $67 million reduction in ETL development costs, $23 million in storage optimization, and $45 million from decommissioning redundant integration platforms. Soft benefits proved larger: improved customer experience metrics drove $234 million in increased deposits and $156 million in new lending. Customer satisfaction scores rose 18 points as service representatives could resolve issues in a single interaction instead of multiple callbacks.

Performance metrics demonstrate dramatic improvements. Chase's data fabric serves 78 million retail customers with average query response times of 147 milliseconds for account aggregation across 8-12 systems. During peak periods like Black Friday, the platform handles 450,000 queries per second while maintaining p99 latency under 500ms. The bank calculates that each 100ms reduction in response time increases digital engagement by 3.4% and reduces call center volume by 2.1%, translating to $18 million annual savings.

Customer Lifetime Value Impact
CLV Increase = (Cross-sell Rate × Average Product Value × Retention Improvement) / Churn Rate
Banks typically see 25-35% CLV improvement from complete customer visibility enabling better targeting and retention
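Plugging illustrative (invented) numbers into the formula above gives a feel for the per-customer uplift:

```python
# Worked example of the CLV uplift formula quoted above, with invented inputs.
# The formula is taken as given in the article; real CLV models are richer.
cross_sell_rate = 0.12           # 12% of customers take an additional product
average_product_value = 1_800    # annual value of that product, in dollars
retention_improvement = 0.05     # 5-point improvement in retention
churn_rate = 0.08                # 8% annual churn

clv_increase = (cross_sell_rate * average_product_value * retention_improvement) / churn_rate
print(f"CLV increase per customer: ${clv_increase:,.2f}")  # $135.00
```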

Regulatory compliance benefits often justify the entire investment. BNP Paribas reduced KYC review time from 3 days to 4 hours using their data fabric to automatically aggregate customer information from all touchpoints. The platform generates regulatory reports for 37 jurisdictions, handling variations in requirements through configurable templates. GDPR subject access requests that previously required 30 days of manual data gathering now complete in 48 hours automatically. The bank avoided €12 million in potential regulatory fines through improved data lineage and audit capabilities.

Future State: AI-Driven Insights and Autonomous Banking

The next evolution combines data fabric with generative AI to create autonomous banking experiences. Lloyds Banking Group pilots 'Project Nexus,' where GPT-4 models trained on unified customer data proactively identify financial optimization opportunities. The system analyzed 2.3 million mortgage customers in Q4 2025, identifying 340,000 who could save money by refinancing based on current rates, credit score improvements, and property value changes. Automated campaigns generated £1.2 billion in new mortgage originations with 73% lower acquisition costs than traditional marketing.

Graph neural networks running on data fabric architectures enable sophisticated fraud detection and risk modeling. Standard Chartered's implementation connects transaction graphs with customer behavior patterns, merchant networks, and device fingerprints. The GNN models identify money laundering patterns with 91% accuracy and 76% fewer false positives than rule-based systems. Processing happens in real-time, with the data fabric serving 1.2 million graph traversals per second during peak transaction periods.

🎯 Building vs. Buying Data Fabric
Most tier-1 banks combine platform products with custom development. Typical breakdown: 40% platform licensing (Informatica/Denodo/Palantir), 35% systems integration, 25% custom development for bank-specific requirements. Pure build approaches rarely succeed due to complexity. Pure buy approaches lack necessary customization for banking-specific needs.

Quantum computing readiness represents the frontier for data fabric architecture. JPMorgan's research team collaborates with IBM Quantum Network to explore quantum algorithms for portfolio optimization across millions of correlated positions. While production deployment remains 3-5 years away, forward-thinking banks design data fabrics with quantum-ready interfaces. The abstraction layer that enables today's classical computing will seamlessly integrate quantum processing units for specific use cases like cryptographic key generation and Monte Carlo simulations.

The convergence of data fabric with AI-native processes fundamentally changes banking operations. Real-time customer 360 views enable instant decisioning, proactive service, and hyper-personalization. Banks that successfully implement data fabric architectures report 30-50% improvements in operational efficiency, 40-60% faster product development cycles, and 25-40% increases in customer lifetime value. As banking evolves toward embedded finance and Banking-as-a-Service models, data fabric becomes the essential foundation for competing in an API-first, real-time world.

Frequently Asked Questions

How long does a typical data fabric implementation take for a mid-size bank?

Mid-size banks ($10-50B assets) typically complete initial data fabric implementations in 12-18 months. Phase 1 (3 months) focuses on cataloging and profiling 20-30 priority systems. Phase 2 (6-9 months) implements the integration layer and connects core systems. Phase 3 (3-6 months) adds analytics and use cases. Large banks may require 24-36 months for full implementation across all business units.

What's the difference between data fabric and traditional master data management (MDM)?

MDM creates a golden record by copying and consolidating data into a central repository, requiring extensive ETL and data governance. Data fabric leaves data in source systems and creates a virtualization layer that provides unified access. MDM projects typically cost $20-50M and take 2-3 years. Data fabric implementations cost 60-70% less and deploy 3-4x faster while providing real-time access instead of batch updates.

Which vendor should we choose for our data fabric platform?

Vendor selection depends on your existing technology stack and primary use cases. Banks with extensive on-premises systems often choose Denodo for pure virtualization or Informatica for hybrid scenarios. Cloud-first banks typically select Databricks or Snowflake. Palantir Foundry suits banks needing strong analytical capabilities built-in. Most implementations combine 2-3 vendors: virtualization platform + streaming backbone + governance/catalog tool.

How do we ensure data quality across hundreds of source systems?

Successful implementations embed data quality into the fabric layer using tools like Talend Data Quality or Informatica Data Quality. Implement automated profiling that runs continuously, not just during initial setup. Create data quality scorecards for each source system with thresholds that trigger alerts. Machine learning models can identify and fix common issues (address standardization, duplicate detection) in real-time. Expect 6-12 months to reach 95%+ data quality scores across critical customer attributes.

What are the ongoing operational costs after implementation?

Annual operational costs typically run 20-30% of initial implementation cost. For a $10M implementation: $1.5-2M for platform licenses, $0.5-1M for cloud infrastructure, $1-1.5M for dedicated support team (4-6 FTEs). Compared to traditional ETL approaches, data fabric reduces operational costs by 40-60% due to lower maintenance, faster changes, and reduced storage requirements. Banks report positive ROI within 18-24 months based on operational savings alone.