Data migration that makes AI work

Turning fragmented records into unified, governed, AI-ready assets

Data is everywhere—emails, purchases, appointments, apps. AI promises insight and automation from all of it. However, what happens when that data is scattered across legacy systems, inconsistent in format, and riddled with duplication? Would you trust AI to recommend actions if it can’t reconcile who the customer actually is? If your models are guessing because timestamps are missing or product codes don’t match, how confident can you be in the outputs?

Alicja reflects on the foundational role of data governance in enabling AI to deliver real business value. She emphasizes that successful data migration is not just technical execution, but a strategic transformation of fragmented data into unified, AI-ready assets.

The stakes are not abstract. Gartner estimates that poor data quality costs organizations an average of $12.9 million per year.

Gartner also predicts that at least 30% of generative AI projects will be abandoned after proof of concept by the end of 2025.

Forrester reports that while most enterprises are experimenting with GenAI, fewer than one in five have moved solutions into production. IDC continues to rank data quality and integration among the top barriers to AI ROI, even as AI-centric spending is forecast to exceed $300 billion by 2026. And PwC’s UK CIO survey (2024) found that 47% of CIOs are struggling to meet ROI expectations from technology investments. With numbers like these, the question becomes: how do you turn messy, multi-source data into an AI-ready asset that reliably drives business value? 

Problem statement

Data migration is often treated as a simple lift-and-shift exercise: export, import, done. In reality, moving data “as is” doesn’t solve problems—it multiplies them. Mismatched product codes, duplicate customer records, missing timestamps, inconsistent schemas—these everyday issues derail AI initiatives. 

When each source system speaks a different language (XML here, nested JSON there) and identifiers differ for the same person across channels, AI loses the context it needs to perform. The result? Predictable: 

  • Pilots that look promising but fail in production because the data doesn’t reconcile across processes. 
  • Analytics and AI that deliver inconsistent or contradictory insights, eroding stakeholder trust. 
  • Rising costs as teams repeatedly clean and rework data downstream instead of fixing it once upstream. 

Author’s perspective and expertise

Alicja Białek, Data Engineer, Cloud & Digital, PwC Poland 

Alicja Białek is a Data Engineer with more than three years of hands-on experience designing data transformations for complex migration projects. Working on customer journey data across multiple source systems, she has seen how small inconsistencies become big blockers—and how the right strategy, standards, and governance can turn fragmented records into a unified, governed, AI-ready asset.

Observations and learnings from recent projects

AI value is constrained not by models but by messy, inconsistent data that lacks a unified structure, a single source of truth, and clear governance. Treating migration as a transformation journey—not just a transfer—unlocks reliable AI outcomes. 

Proposed solution: Build an AI-ready migration playbook that unifies, governs, and enriches data before it lands in your target platform. 

  • Start with the end in mind: What decisions should AI support—next-best offer, churn risk, discount eligibility, fraud detection? Translate those decisions into required data elements and quality thresholds. 
  • Create a canonical data model: Normalize schemas across formats (XML, JSON, CSV) into shared definitions for customers, products, orders, events, and timestamps. Define the “golden” attributes and business rules that matter (see the first sketch after this list). 
  • Resolve identity and duplicates: Implement entity resolution (match/merge) so one customer is truly one customer across systems. Use deterministic keys plus fuzzy matching for names, addresses, and IDs (sketched after the practical example below). 
  • Preserve meaning, not just fields: Map semantics, not just columns. Ensure event timestamps, statuses, and references carry the same business meaning across sources.
  • Establish stewardship and governance: Appoint data owners and stewards for core domains (customer, product, order). Codify data standards, policies, quality rules, and escalation paths.
  • Instrument quality end-to-end: Profile, monitor, and alert on timeliness, completeness, accuracy, uniqueness, and consistency. Fix upstream, not downstream, wherever possible (see the profiling sketch after this list).
  • Build robust pipelines: Use well-structured ETL/ELT with change data capture (CDC), lineage, and metadata so transformations are traceable and auditable (see the lineage sketch after this list).
  • Make it usable for AI: Add a semantic layer, document business metrics, and expose well-governed datasets through APIs or feature stores so AI and analytics consume the same truth. 
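
To make the canonical data model concrete, below is a minimal Python sketch. The field names, the two hypothetical source systems (“web” and “pos”), and the input layouts are illustrative assumptions, not a prescribed standard; the point is that every source maps into one shared definition with timezone-aware timestamps.

    from dataclasses import dataclass
    from datetime import datetime, timezone

    @dataclass
    class CanonicalEvent:
        """One shared definition that every source maps into; names are illustrative."""
        customer_id: str    # golden identifier, set after entity resolution
        event_type: str     # e.g. "browse", "appointment", "purchase"
        event_ts: datetime  # always timezone-aware UTC
        source: str         # lineage: which system the record came from

    def from_web_json(record: dict) -> CanonicalEvent:
        """Normalize a nested JSON browse event; startedAt is assumed ISO-8601 with offset."""
        return CanonicalEvent(
            customer_id=record["session"]["user"]["id"],
            event_type="browse",
            event_ts=datetime.fromisoformat(record["session"]["startedAt"])
                     .astimezone(timezone.utc),
            source="web",
        )

    def from_pos_csv(row: dict) -> CanonicalEvent:
        """Normalize a flat CSV purchase row; the export is assumed to use epoch seconds."""
        return CanonicalEvent(
            customer_id=row["cust_no"],
            event_type="purchase",
            event_ts=datetime.fromtimestamp(int(row["ts_epoch"]), tz=timezone.utc),
            source="pos",
        )

An XML source would get its own small normalizer; what matters is that downstream consumers only ever see the canonical shape.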
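
Quality instrumentation can start just as small. The sketch below, reusing the CanonicalEvent class from above, profiles a batch for timestamp completeness and exact duplicates; the 0.99 gate at the end is an assumed threshold a data steward would own, not a given.

    def profile_batch(events: list) -> dict:
        """Profile a batch of CanonicalEvent records for completeness and uniqueness."""
        total = len(events)
        if total == 0:
            return {"row_count": 0}
        missing_ts = sum(1 for e in events if e.event_ts is None)
        distinct = {(e.customer_id, e.event_type, e.event_ts) for e in events}
        return {
            "row_count": total,
            "timestamp_completeness": 1 - missing_ts / total,
            "exact_duplicates": total - len(distinct),
        }

    # A steward-agreed rule can then gate the load instead of letting bad data through:
    # assert profile_batch(batch)["timestamp_completeness"] >= 0.99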
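
Lineage can be captured at the point where transformation happens. A minimal sketch, assuming dict-shaped outputs: a decorator stamps every transformed record with its source, a fingerprint of the raw input, and the transform that produced it, so any canonical record can be traced back upstream during an audit.

    import hashlib
    import json
    from datetime import datetime, timezone

    def with_lineage(transform):
        """Wrap a transform so every output record carries audit metadata."""
        def wrapped(raw: dict, source: str) -> dict:
            out = transform(raw)
            out["_lineage"] = {
                "source": source,
                "transform": transform.__name__,
                "raw_fingerprint": hashlib.sha256(
                    json.dumps(raw, sort_keys=True).encode()
                ).hexdigest(),
                "processed_at": datetime.now(timezone.utc).isoformat(),
            }
            return out
        return wrapped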

A practical example

Imagine stitching together a unified customer profile. One system logs an online browse session in nested JSON, another records a store appointment in XML, and a third tracks purchases in relational tables. The same person appears as “Jane A. Smith,” “J. Smith,” and “Jane Smith-Account 842.” Without identity resolution, your AI can’t see the full journey. However, with a canonical model, match/merge rules, and steward-approved standards, the profile becomes coherent: timestamps align, events sequence correctly, and duplicates collapse into a single entity. Now an AI assistant can surface relevant actions—flag a missed discount, recommend an appointment reminder, or detect churn risk—with context you trust.
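
A minimal sketch of that match/merge step, using only Python’s standard library: a deterministic key (here, email) decides first, and fuzzy name similarity is the fallback. The normalization rule, field names, and the 0.8 threshold are illustrative assumptions; production migrations typically rely on dedicated entity-resolution tooling and steward-approved matching rules.

    from difflib import SequenceMatcher

    def normalize_name(name: str) -> str:
        """Crude normalization: lowercase, drop '-Account ...' suffixes and punctuation."""
        name = name.lower().split("-account")[0]
        return " ".join(part.strip(".,") for part in name.split())

    def same_customer(a: dict, b: dict, threshold: float = 0.8) -> bool:
        """Deterministic key first; fuzzy name match only as a fallback."""
        if a.get("email") and a.get("email") == b.get("email"):
            return True
        score = SequenceMatcher(
            None, normalize_name(a["name"]), normalize_name(b["name"])
        ).ratio()
        return score >= threshold

    records = [
        {"name": "Jane A. Smith", "email": "jane@example.com"},
        {"name": "J. Smith", "email": "jane@example.com"},
        {"name": "Jane Smith-Account 842", "email": None},
    ]
    print(same_customer(records[0], records[1]))  # True: shared email
    print(same_customer(records[0], records[2]))  # True: names are ~0.9 similar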

A word of caution

  • Lift-and-shift is tempting, but costly: Moving broken data to a new platform doesn’t fix it; it just relocates the problem and multiplies rework. 
  • Don’t skip semantics: Field mapping without shared definitions leads to “fast wrong” AI. Agree on what a “customer,” “order,” or “event” means before you migrate. 
  • Beware local optimizations: Cleaning one source in isolation can break cross-channel consistency. Govern shared standards centrally, apply locally. 
  • Watch hidden costs: Without upstream quality controls, inference and storage bills rise as teams repeatedly reprocess data for every new use case. 
  • Security and privacy matter: Align identifiers and enrichment with consent, retention, and masking policies. Bad governance can introduce compliance risk just as quickly as bad data introduces model risk. 

Concluding point

If you want AI to drive real business value, stop treating data governance as an afterthought and migration as a simple move. Transform fragmented records into unified, governed, AI-ready assets: reconcile identities, normalize schemas, codify standards, and instrument quality. Do that, and your models become more reliable, your insights more actionable, and your ROI more defensible.

Supporting perspectives

To see how strong data foundations translate into business impact:

  • Applying GenAI to Strategic Initiatives like Revenue Optimization by Adam Rogalewicz shows how solid data, pricing, and trade terms come together to drive growth. Read Adam’s perspective.

  • Automating and accelerating software development with AI agents by Wiktor Witkowski explains how engineering teams can compress design, testing, and delivery cycles—without sacrificing quality. Read Wiktor’s perspective.

Contact us

Mariusz Chudy

Partner, PwC Poland

Tel: +48 502 996 481

Paweł Kaczmarek

Director, PwC Poland

Tel: +48 509 287 983

Marek Chlebicki

Partner, PwC Poland

Tel: +48 519 507 667

Jakub Borowiec

Partner, Analytics & AI Leader, PwC Poland

Tel: +48 502 184 506

Michael Norejko

Senior Manager, PwC Poland

Tel: +48 519 504 686

Mariusz Strzelecki

Senior Manager, PwC Poland

Tel: +48 519 505 634