Building the Data Foundations for AI Deployment on the Cloud


If you’ve ever led a strategic initiative, you know how often great plans slip, stall, or never fully make it to production. It’s exciting to see AI, Agentic AI, and GenAI promise step changes in performance. However, reality bites: in a recent PwC CEE survey (2025), over 75% of companies reported investing in AI, but only a small fraction have production applications running at scale. So, what’s holding them back?

Adam provides an example of how Strategic Initiatives drive transformation in Consumer Goods, Retail, and Manufacturing sectors

75%

of companies we have surveyed have started investing in AI – but few have advanced into creating production applications.

A well-defined Data Strategy turns big ideas into real business value, mitigating the risk of failing to meet original transformation goals and ensuring the expected ROI from strategic initiatives.

Problem statement

Many organizations force GenAI into strategic initiatives without the data foundations needed to sustain them. Jumping straight to sophisticated LLMs before establishing a single source of truth, clear data ownership, and fit-for-purpose techniques typically results in the following:

  • Pilots that look promising but are too costly and impossible to scale;
  • Inconsistent results, an inability to reach production, and a widening gap between expectations and outcomes;
  • Solutions that fail to meet business end users’ needs, eroding trust and, with it, adoption.

Research suggests this is a widespread problem:

  • Gartner reports 79% of executives see AI as critical, yet warns that 30% of GenAI projects may be abandoned after POC by 2025.
  • Forrester finds most enterprises are experimenting with GenAI, but fewer than 1 in 5 have moved into production - highlighting the chasm between inspiration and industrialization.
  • IDC estimates AI-centric spending will exceed $300 billion by 2026, while consistently citing data quality and integration as top barriers to ROI.
  • PwC’s UK CIO survey (2024) reports 47% of CIOs struggle to meet ROI expectations, underscoring the need to anchor AI in business outcomes and sound data architecture.

Author’s perspective and expertise

Adam Rogalewicz, Manager, SWAT Team, PwC Poland

Adam Rogalewicz shares insights drawn directly from data transformation projects that enable strategic initiatives. His work focuses on building the data foundations, including effective governance, that allow GenAI and other forms of deep learning to move beyond the proof of concept into production and deliver measurable business outcomes.

Observations and learnings from recent projects

Strategic initiatives drag on not because we do not know how to deploy the technology, but because solid data foundations are missing. For example, when master data is not standardized, harmonized, and enriched, we lack a consistent ontology that defines key business domains such as Customer, Product, Services, and Vendors, and the relationships between them. Without consistent master records, AI outputs become unreliable, with hallucinations and biases driving up validation costs and ultimately stalling the entire initiative.

To mitigate this, we re-configure the Enterprise Data Platform to provide access to cleansed datasets. Once the data is cleansed, we can combine deterministic techniques with LLMs to increase the accuracy and performance of simple tasks such as matching and merging master records.

To put this into practice and make it less abstract, let’s consider a simple example: a consumer goods company bottling and selling sparkling water through a network of distributors and retailers. The company allocates EUR 100 million of promotional spend to boost volume, but its master records defining Products down to the Stock Keeping Unit (SKU) are duplicative and inconsistent, e.g. ‘Sparkling Water 500ml’, ‘Sparkling 0.5L’, and ‘Sparkle Water 500’ all refer to the same product. Now imagine the impact when there are over 10,000 SKUs across thousands of product lines and hundreds of thousands of different materials and packaging types used to manufacture these product variants. The EUR 100 million of promotional spend is typically misallocated, simply because commercial teams like Marketing define ‘Product’ differently from Finance, Supply Chain, and Procurement.

This simple yet very prevalent inconsistency leads to reduced net revenue and, worse, cannibalization of margin-accretive products that should be promoted but are not. To solve it, we could try using LLMs to detect duplicates and clean the product catalogue. Initially the results may look promising, but once we consider the number of permutations across product categories, brands, SKUs, and departments, the token costs become too high for this to be commercially viable to operate on a daily basis. So what is a more viable solution? A combination of fuzzy matching and rule-based normalization, with LLMs and clerical review reserved for the more complex cases. By adopting such a ‘hybrid’ approach to matching and merging, in which LLMs were just one of many methods, the organization reduced the misallocation of promotional spend and the cannibalization of profitable products, and ultimately improved net revenue at a fraction of the original cost.
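To illustrate the ‘hybrid’ tiering described above, here is a minimal sketch in Python. The SKU strings, unit aliases, similarity thresholds, and escalation labels are illustrative assumptions, not the actual solution delivered for the client; in practice the normalization rules and thresholds would be tuned to the product catalogue, and the escalation tier would call an LLM or route records to clerical review.

```python
# Hybrid matching sketch: rule-based normalization + fuzzy matching,
# escalating only ambiguous pairs to an LLM or clerical review.
# UNIT_MAP, thresholds, and the example SKUs are illustrative assumptions.
import re
from difflib import SequenceMatcher

UNIT_MAP = {"0.5l": "500ml", "0,5l": "500ml", "500": "500ml"}  # example unit aliases

def normalize(sku: str) -> str:
    """Deterministic, rule-based normalization: lower-case, strip noise, unify units."""
    cleaned = re.sub(r"[^a-z0-9., ]", " ", sku.lower())
    tokens = [UNIT_MAP.get(token, token) for token in cleaned.split()]
    return " ".join(tokens)

def similarity(a: str, b: str) -> float:
    """Cheap fuzzy similarity on the normalized strings (no LLM involved)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def classify_pair(a: str, b: str, hi: float = 0.90, lo: float = 0.70) -> str:
    """Route each candidate pair to the cheapest technique that can resolve it."""
    score = similarity(a, b)
    if score >= hi:
        return "auto-merge"        # deterministic + fuzzy techniques are enough
    if score >= lo:
        return "llm-or-clerical"   # escalate only the ambiguous minority
    return "distinct"

candidate_pairs = [
    ("Sparkling Water 500ml", "Sparkling 0.5L"),
    ("Sparkling Water 500ml", "Sparkle Water 500"),
    ("Sparkling Water 500ml", "Still Water 1.5L"),
]
for a, b in candidate_pairs:
    print(f"{a!r} vs {b!r}: {classify_pair(a, b)} (score={similarity(a, b):.2f})")
```

The point of the tiering is cost: the overwhelming majority of pairs never leave the deterministic and fuzzy layers, so LLM tokens are spent only where the cheap techniques are genuinely inconclusive.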

A word of caution

  • Avoid leading with the most expensive tool: if 80–90% of inconsistencies, errors, and conflicts can be resolved with deterministic or fuzzy techniques, reserve LLMs for the more complex cases.
  • Without a consistent definition of key business domains across your organization, such as Product, Services, Materials, Customer, Vendors, and Employees, LLMs lack the ontology they need to understand and interpret your business.
  • Maintaining a consistent definition is much harder than creating one in the first place. This is why data governance shouldn’t be perceived as bureaucracy but as an enabler: clear data stewardship, naming conventions as part of a set of standards, and change control prevent rework.
  • Measure total cost of ownership across core data components, including pipelines, integration endpoints, and systems of record and reference, not just the LLMs. Beyond this, continue to track cost per resolution against the business benefits; a simple sketch of that metric follows this list.
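As a companion to the last point, a back-of-the-envelope view of cost per resolution by technique tier might look like the sketch below. All volumes and unit costs are hypothetical placeholders, purely to show the shape of the metric rather than any benchmark.

```python
# Hypothetical cost-per-resolution tracking across matching tiers.
# The volumes and unit costs below are placeholders, not benchmarks.
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    resolved: int          # record pairs resolved by this tier
    unit_cost_eur: float   # estimated cost per resolved pair (compute, tokens, review time)

tiers = [
    Tier("rule-based normalization", resolved=70_000, unit_cost_eur=0.0005),
    Tier("fuzzy matching",           resolved=20_000, unit_cost_eur=0.002),
    Tier("LLM escalation",           resolved=8_000,  unit_cost_eur=0.05),
    Tier("clerical review",          resolved=2_000,  unit_cost_eur=1.50),
]

total_cost = sum(t.resolved * t.unit_cost_eur for t in tiers)
total_resolved = sum(t.resolved for t in tiers)

print(f"Blended cost per resolution: EUR {total_cost / total_resolved:.4f}")
for t in tiers:
    print(f"  {t.name}: {t.resolved / total_resolved:.0%} of resolutions "
          f"at EUR {t.unit_cost_eur} each")
```

Tracked over time, it is this blended figure, not the headline LLM bill alone, that should be weighed against the business benefits.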

Concluding point

Strategic initiatives like Net Revenue Optimization, Zero-Based Budgeting, and migrations to new instances of ERPs and CRMs such as SAP S/4HANA and Salesforce, to name just a few, will always depend on reliable data drawn from multiple sources both within and outside the organization. Of course, there is now the opportunity to improve and accelerate strategic initiatives with GenAI and a multitude of new and sophisticated LLMs. In doing so, however, we should not forget to also use much simpler techniques, including rule-based Machine Learning and Natural Language Processing, combined with human supervision in the form of clerical reviews. Adopting such a multifaceted approach may require additional upfront configuration effort, but it offers a worthwhile trade-off in lower operating costs should LLM token costs remain comparatively high. So, instead of throwing expensive LLMs at every initiative, explore different combinations that may yield a greater return on investment in the long term.

Supporting perspectives

For complementary viewpoints on turning AI into business outcomes:

  • Wiktor Witkowski’s perspective on automating software development shows how AI agents can compress design, testing, and delivery cycles by an order of magnitude—without sacrificing quality. Read Wiktor’s perspective.

Contact us

Mariusz Chudy
Partner, PwC Poland
Tel: +48 502 996 481

Paweł Kaczmarek
Director, PwC Poland
Tel: +48 509 287 983

Marek Chlebicki
Partner, PwC Poland
Tel: +48 519 507 667

Jakub Borowiec
Partner, Analytics & AI Leader, PwC Poland
Tel: +48 502 184 506

Michael Norejko
Senior Manager, PwC Poland
Tel: +48 519 504 686

Mariusz Strzelecki
Senior Manager, PwC Poland
Tel: +48 519 505 634