Core Data Components for deploying AI on the Cloud


If GenAI is supposed to transform how work gets done, what does that look like for the people who actually run the business: finance partners, marketers, supply chain managers and so on? Think about the processes they oversee every day, including Order to Cash, Purchase to Pay and Record to Report, to name just a few. These processes are often riddled with manual reconciliations, ad hoc interventions and frequent handovers. Previously we discussed how strategic initiatives like Net Revenue Optimization can be enhanced by GenAI; now let's consider how GenAI can improve and automate key business processes from the perspective of the business end user.

In this short video, Michael provides an overview of the core data components that form part of solid data foundations, with a focus on Data Governance, Master Data Management and Data Quality.

50%

of Master Data Management programs fail to meet business expectations due to complexity and lack of governance.

Source: Gartner, 2023.

The problem statement

Organizations want GenAI to fix broken processes, but the work is often done in a siloed manner across multiple departments, without the necessary solid data foundations. Heads of departments attempt to deploy LLMs on fragmented data and inconsistent taxonomies, with no relative measure of performance. Based on our observations from recent projects where clients attempted to transform key business processes and enable strategic initiatives, the outcome is more often than not the same: great proofs of concept that stall at production, no way of comparing results, and mounting costs.

The author’s perspective and expertise

Michael Norejko

Michael Norejko, Data Engineering Lead, Cloud & Digital, PwC Poland

Michael, who has 15 years of experience building data and analytics capabilities to support digital transformation efforts, emphasizes the importance of starting with a robust business case for the deployment of GenAI. Whilst this is an obvious starting point, it is important to note that the proposed business case needs to be refined across all core business functions and processes. Furthermore, this needs to happen in parallel with quick, iterative proofs of concept and the configuration of core data components, including Enterprise Data Architecture, Master Data Management and Data Governance, to name just a few.

Observations and learnings from recent projects

As an example, let's consider a scenario where a EUR 50 billion consumer goods and manufacturing business discovered 4% commercial leakage in the form of obsolete stock, missing vendor discounts, and phantom orders that its Finance department typically does not notice until months later. On a EUR 50 billion turnover, 4% is a substantial value at risk. In such an instance, LLMs can parse hundreds of thousands of documents, including the goods receipts, purchase orders, invoices and contracts that feed the key business processes, to identify discrepancies such as missed discounts. But this is just one example. To create a relative measure of ROI, the proposed application of GenAI has to be contextualized across multiple processes and use cases, not just Order to Cash within the realm of Supply Chain Finance. That's why it's important to create an ontology of your business in the form of key business domains, functions, processes and use cases, which in turn will inform the demand for core data capabilities.

Without standardized, harmonized and enriched master data, well-defined metadata, and readily available transactional data, GenAI will remain a proof of concept or, at best, a minimum viable product with negligible impact. So, to scale GenAI beyond a mere proof of concept, ensure that core data components are aligned to key business use cases across key business functions and processes.

More specifically, start with a strategic review focused on identifying pockets of value by raising questions such as: do we have instances of obsolete stock, missed discounts, pricing errors or duplicate orders? If so, what would a 1% reduction in leakage mean for the P&L within two quarters?
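The back-of-the-envelope arithmetic behind these questions can be sketched in a few lines of Python, using the EUR 50 billion turnover and 4% leakage figures from the scenario above; reading "a 1% reduction in leakage" as one percentage point is an assumption of this sketch.

```python
# Back-of-the-envelope value-at-risk calculation for the scenario above.
turnover_eur = 50_000_000_000   # EUR 50 billion annual turnover
leakage_rate = 0.04             # 4% commercial leakage

value_at_risk = turnover_eur * leakage_rate
print(f"Value at risk: EUR {value_at_risk / 1e9:.1f} billion")

# Reading "a 1% reduction in leakage" as one percentage point (4% -> 3%):
recovered_per_year = turnover_eur * 0.01
print(f"Recovered per year: EUR {recovered_per_year / 1e9:.2f} billion")
```

Even the more conservative relative reading, 1% of the 4% leakage (EUR 20 million), would be material enough to anchor a business case.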

Proceed with identifying, documenting and prioritizing business use cases across key processes and functions. As an example, from the perspectives of Finance, Supply Chain and Procurement with a focus on Purchase-to-Pay as one of many key business processes:

  • Finance: automate the manual reconciliation of invoices to purchase orders to detect missing discounts.
  • Supply chain: identify slow-moving stock by monitoring levels of supply against demand signals including macro-economic variables.
  • Procurement: identify discrepancies in key contract terms, including duplicate vendors, products, services, parts, components and materials.
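As a toy illustration of the Finance use case above, the following sketch reconciles invoices against purchase orders with plain rules. In practice an LLM or rules engine would operate on fields extracted from documents at scale; all record identifiers, amounts and discount terms here are invented.

```python
# Toy reconciliation of invoices to purchase orders, flagging invoices
# billed at full PO value despite an agreed discount. Illustrative data.

purchase_orders = {
    "PO-1001": {"amount": 10_000.0, "discount_pct": 2.0},  # 2% discount agreed
    "PO-1002": {"amount": 5_000.0,  "discount_pct": 0.0},
}

invoices = [
    {"invoice_id": "INV-9001", "po_id": "PO-1001", "amount": 10_000.0},
    {"invoice_id": "INV-9002", "po_id": "PO-1002", "amount": 5_000.0},
]

def find_missed_discounts(invoices, purchase_orders):
    """Return (invoice_id, issue) pairs for discrepancies."""
    findings = []
    for inv in invoices:
        po = purchase_orders.get(inv["po_id"])
        if po is None:
            findings.append((inv["invoice_id"], "no matching PO"))
            continue
        expected = po["amount"] * (1 - po["discount_pct"] / 100)
        if inv["amount"] > expected:
            findings.append((inv["invoice_id"],
                             f"missed discount of {inv['amount'] - expected:.2f}"))
    return findings

print(find_missed_discounts(invoices, purchase_orders))
```

Here INV-9001 is flagged because it was billed at the full PO amount even though a 2% discount was agreed; the same shape of check extends to duplicate orders and pricing errors.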

Only then consider configuring core data components, including but not limited to: 

  • Integration end-points and pipelines: for efficient extraction and loading of data, using tools like Azure Data Factory, in increments and/or as large single migrations.
  • Transformation scripts: to transform transactional, master and metadata from both within and outside the organization, using tools like Databricks.
  • Master Data Management: to match and merge Product, Customer, Vendor and Employee master records into golden records as the single source of truth.
  • Semantic layer for data modelling: to provide hierarchies and attributes down to the lowest level of granularity that can be used by all key departments in a consistent manner.
  • Data lineage and cataloguing: to make it easier for business end users to discover and gain access to data assets and products.
  • Data quality management: to remediate poor data quality by profiling and cleansing data using remediation workflows, rules, metrics and actions.
  • Access, privacy and security: role-based access, policy enforcement and auditability designed to meet regulations and internal standards.
  • Release management: to ensure that solutions like GenAI are deployed in secure environments from Development through to Production, with version control and human-in-the-loop review.
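To make the match and merge idea concrete, here is a minimal fuzzy-matching sketch using only Python's standard library. Real MDM platforms apply far richer matching and survivorship rules; the vendor names, the similarity threshold and the "keep the longest name" survivorship rule are all illustrative assumptions.

```python
from difflib import SequenceMatcher

# Toy MDM match & merge: cluster near-duplicate vendor names and keep
# one surviving "golden" record per cluster. Names are invented.

vendors = ["Acme Corp", "ACME Corporation", "Globex Ltd", "Acme Corp."]

def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_and_merge(records, threshold=0.7):
    """Greedily cluster records whose similarity to a cluster's first
    member exceeds the threshold; survive the longest name per cluster."""
    clusters = []
    for rec in records:
        for cluster in clusters:
            if similarity(rec, cluster[0]) >= threshold:
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return [max(cluster, key=len) for cluster in clusters]

print(match_and_merge(vendors))
```

The three "Acme" variants collapse into a single golden record while "Globex Ltd" stays separate; production matching would also weigh tax IDs, addresses and bank details rather than names alone.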

A word of caution

  • Avoid building components with no utility: migrating data to the cloud and cleansing data across all sources is not necessary, considering that typically 80% of decisions are based on 20% of the data.
  • Beware of novelty bias: compare LLMs against much simpler and more auditable rules-based Machine Learning and Natural Language Processing techniques, which can reduce complexity and lower costs.
  • Embed governance: create clear roles and responsibilities for data stewards, data owners, domain leads, business process owners and delivery teams.
  • Validate with business end users: identify business end users across key business functions who will own key business use cases and validate the proposed solution in a pre-production environment, with clear responsibilities from the outset.

Concluding point

Deploying GenAI without a clearly defined ontology of business functions, processes and use cases will limit its scalability just as much as the absence of core data components. GenAI needs to be designed, developed and deployed in the context of both demand from the wider business and supply from IT in the form of core data components. In summary, avoid building sub-optimal solutions that do not address key business challenges.

Michael presents a simple model showing the interaction between key business use cases, key business processes and the core data components necessary for GenAI to drive business value.

Supporting perspectives

For complementary viewpoints on turning GenAI into measurable outcomes:

  • Adam Rogalewicz’s point of view on applying GenAI to Strategic Initiatives like Revenue Optimization looks at pricing, offers, and trade terms in depth. Read Adam’s perspective.
  • Wiktor Witkowski's perspective on automating software development shows how AI agents can compress design, testing, and delivery cycles by an order of magnitude, without sacrificing quality. Read Wiktor's perspective.

Digital Foundations Hub - for Cloud, Data & AI

Discover our video series

Contact us

Mariusz Chudy

Partner, PwC Poland

Tel: +48 502 996 481

Paweł Kaczmarek

Director, PwC Poland

Tel: +48 509 287 983

Marek Chlebicki

Partner, PwC Poland

Tel: +48 519 507 667

Jakub Borowiec

Partner, Analytics & AI Leader, PwC Poland

Tel: +48 502 184 506

Michael Norejko

Senior Manager, PwC Poland

Tel: +48 519 504 686

Mariusz Strzelecki

Senior Manager, PwC Poland

Tel: +48 519 505 634