For a decade, their team of Data Engineers attempted to adopt a Data Mesh-style architecture, which promotes decentralized data ownership and a data-as-a-product approach, enabling each department to create its own reports, dashboards, and applications. However, as is often the case, implementing a decentralised architecture that facilitates self-service of Product, Customer and Vendor data, to name just a few domains, across multiple departments required significant effort to create the domain-specific pipelines, interfaces, and scripts needed to transform and enrich the data for consumption. The Data Engineers could not keep up with requests from multiple business users whilst also addressing data quality issues, which led to shadow data steward roles emerging across key departments. This in turn led to ad hoc remediation efforts for the profiling and cleansing of data, as well as an array of divergent data standards and naming conventions being adopted by different teams across the Group.
So, with the advent of Agentic AI and tools like Databricks Agent Bricks, Amazon Bedrock and Microsoft Fabric, can CTOs, CDOs and other senior executives make the transition to a more federated architecture, one that enables business users to utilise data assets and data products as and when they require, on a self-serve basis? The answer lies in how reliable the agents are and how easy it is to deploy them using readily available toolkits.
Michael Norejko, Data Engineering Lead, Cloud & Digital, PwC Poland
Michael, who has 15 years of experience building data and analytics capabilities to support digital transformation efforts, emphasizes the importance of starting with a robust business case for the deployment of GenAI. Whilst this is an obvious starting point, it is important to note that the proposed business case needs to be refined across all core business functions and processes. Furthermore, this refinement needs to happen in parallel with quick, iterative proofs of concept and the configuration of core data components, including Enterprise Data Architecture, Master Data Management, and Data Governance, to name just a few.
By way of a quick recap, decentralised (i.e., federated) architectures assign data ownership to the business domains that best understand how specific datasets are used, e.g., Sales, Supply Chain, Finance. These domains manage their data as data assets with defined data contracts, quality guarantees, and metadata, and publish data products (e.g., dashboards, applications, reports) that are easily discoverable and usable by other teams; a sketch of such a contract follows the list below. However, organizations often face challenges such as the following:
Manual effort: manually creating and maintaining the vast number of data pipelines, scripts, sources and data products for all data types is labour-intensive.
Complexity in governance: ensuring consistent governance, security, and compliance across numerous data domains and systems is a significant challenge.
Interoperability between domains: different domains use different naming conventions, formats, and models, which makes seamless data sharing and collaboration difficult.
Lack of technical expertise: business users often lack the technical skills to interact directly with complex data platforms.
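To make the data-as-a-product idea concrete, below is a minimal sketch of what a domain data contract might capture, expressed as a Python dataclass. The field names and checks are illustrative assumptions rather than a standard; in practice many teams express such contracts as YAML specifications validated in their pipelines.

```python
from dataclasses import dataclass, field

@dataclass
class DataContract:
    """Illustrative contract a domain team might publish alongside a data product."""
    product_name: str                 # e.g. "sales.orders_daily"
    owner_domain: str                 # the accountable business domain
    schema: dict[str, str]            # column name -> declared type
    freshness_sla_hours: int          # maximum tolerated staleness
    quality_checks: list[str] = field(default_factory=list)  # declarative expectations
    pii_columns: list[str] = field(default_factory=list)     # columns requiring masking

# Hypothetical contract the Sales domain could publish for its orders product:
orders_contract = DataContract(
    product_name="sales.orders_daily",
    owner_domain="Sales",
    schema={
        "order_id": "string",
        "order_date": "date",
        "amount": "decimal(10,2)",
        "customer_email": "string",
    },
    freshness_sla_hours=24,
    quality_checks=["order_id IS NOT NULL", "amount >= 0"],
    pii_columns=["customer_email"],
)
```

Consumers in other domains can then discover the product and rely on the declared schema, freshness and quality guarantees instead of reverse-engineering the producing pipeline.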
Agentic AI systems, characterized by their ability to reason, can overcome these challenges by supporting teams of Data Architects, Engineers and Analysts. These AI agents can interpret, plan, and act to achieve specific goals with human oversight, effectively serving as an "Agentic Data Layer" within the enterprise data architecture. More specifically, key contributions typically include:
Data product creation: agents can automate routine tasks like data ingestion, schema mapping, and pipeline setup, significantly reducing the manual effort required to create new data products.
Intelligent interfaces: agents, such as Databricks' AI/BI Genie, can provide natural language interfaces, allowing non-technical business users to query data and extract insights without needing SQL expertise.
Federated governance: agents can have built-in self-governance, automatically handling access requests and enforcing data policies (e.g., PII filters), ensuring compliance while maintaining domain autonomy; a sketch of such a policy check follows this list.
Seamless interoperability: an agent mesh architecture, much like a service mesh for microservices, provides standardized communication protocols (e.g., A2A, MCP) that allow specialized agents to collaborate and exchange information seamlessly across domains.
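As a concrete illustration of the federated-governance point, the sketch below shows the kind of deterministic PII filter a self-governing agent could apply before results leave a domain. The patterns and function names are hypothetical; on Databricks this would more naturally be enforced with Unity Catalog masking, but the principle of enforcing policy in code rather than trusting the model to self-censor is the same.

```python
import re

# Hypothetical PII patterns a governance agent might enforce on outgoing answers.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone": re.compile(r"\+?\d[\d\s()-]{7,}\d\b"),
}

def mask_pii(text: str) -> str:
    """Redact anything matching a PII pattern before the response is returned."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label} redacted>", text)
    return text

def answer_with_policy(agent_response: str, requester_has_pii_clearance: bool) -> str:
    # The policy decision sits outside the model, so it cannot be prompted away.
    return agent_response if requester_has_pii_clearance else mask_pii(agent_response)

print(answer_with_policy("Contact jane.doe@example.com on +48 123 456 789", False))
# -> Contact <email redacted> on <phone redacted>
```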
Databricks' Agent Bricks is a low-code, modular framework that simplifies building and deploying high-quality, domain-specific AI agents directly within the Databricks Data Intelligence Platform. It provides:
Agent Development: users can specify a use case (e.g., information extraction, knowledge assistant, custom LLM tasks) and point to their data, and Agent Bricks automatically builds and optimizes the agent system, eliminating complex prompt engineering and manual configuration.
Agent Optimization: Agent Bricks automatically generates synthetic evaluation benchmarks and tests various models and configurations to achieve the best balance of quality and cost for a specific task. This ensures that the agents underpinning the data products are reliable and efficient.
Agent Integration: Agent Bricks agents are seamlessly integrated with the core Databricks platform, leveraging Unity Catalog for data governance, security, and lineage, and MLflow for monitoring and observability. This provides the necessary control and transparency for enterprise-grade AI; a sketch of calling a deployed agent endpoint follows this list.
Multi-Agent Orchestration: the Multi-Agent Supervisor in Agent Bricks allows the creation of a coordinated system of agents to tackle multiple tasks across different domains.
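Since Agent Bricks agents are typically exposed as Databricks model serving endpoints, downstream applications can call them through the platform's OpenAI-compatible interface. The sketch below assumes an already-deployed agent; the workspace host, token and endpoint name are placeholders.

```python
from openai import OpenAI

# Databricks serving endpoints speak the OpenAI chat completions protocol;
# authenticate with a workspace token and address the endpoint by name.
client = OpenAI(
    api_key="<databricks-personal-access-token>",            # placeholder
    base_url="https://<workspace-host>/serving-endpoints",   # placeholder
)

response = client.chat.completions.create(
    model="<agent-endpoint-name>",  # the deployed agent's serving endpoint
    messages=[{
        "role": "user",
        "content": "Summarise last quarter's vendor spend by category.",
    }],
)
print(response.choices[0].message.content)
```

This is what lets a business user's application, or indeed another agent, treat a deployed agent as just another governed data product interface.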
The orchestration layer coordinates a selection of AI agents to support the extraction, transformation and loading of data assets and the governance of data products across different data domains. These agents include, but are not limited to, the following (a simplified supervisor sketch follows the list):
Orchestration agent (i.e., multi-agent supervisor):
coordinates multiple specialized agents to tackle complex, multi-step tasks more effectively than a single model could alone. The supervisor acts as a manager, delegating specific sub-tasks to other agents and combining their results to produce a comprehensive final answer.
integrates various components, including other Agent Bricks agents (such as extraction or knowledge assistants), Unity Catalog functions (tools), or external services, managing their collaboration and information exchange. This is crucial for building sophisticated workflows, such as handling customer inquiries that require intent detection, document retrieval, and compliance checks.
Custom LLM agent:
enables users to perform custom text-based tasks like summarization, classification, content rewriting, or generating domain-specific content (e.g., marketing copy, reports). It avoids reliance on generic, off-the-shelf models by specializing the model to the domain data.
optimizes the underlying prompt engineering and model configuration, based on tasks defined by the user in natural language.
Extraction agent:
transforms large volumes of unstructured text data from documents like PDFs, emails, reports, or scanned forms into structured fields (e.g., names, dates, amounts, specific entities).
eliminates manual data entry and parsing logic by using AI to reliably extract information into a defined schema, typically a structured table in Unity Catalog. The framework automatically generates evaluations to ensure field-level accuracy and consistency in the extracted data.
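To ground the orchestration pattern, here is a deliberately simplified, framework-agnostic sketch of a supervisor delegating to an extraction specialist and a custom LLM specialist and combining their outputs. It is not the Agent Bricks API: in a real deployment the specialists would be served endpoints and the routing would be performed by an LLM planner rather than keyword matching.

```python
from dataclasses import dataclass
from typing import Callable

# Stand-ins for deployed specialist agents; real ones would be endpoint calls.
def extraction_agent(task: str) -> str:
    return f"[structured fields extracted for: {task}]"

def custom_llm_agent(task: str) -> str:
    return f"[domain-specific summary for: {task}]"

@dataclass
class Supervisor:
    """Toy multi-agent supervisor: route sub-tasks, then combine the results."""
    specialists: dict[str, Callable[[str], str]]

    def handle(self, request: str) -> str:
        results = []
        # Naive routing for illustration; a production supervisor would plan
        # the delegation with an LLM and manage shared context between agents.
        if "invoice" in request.lower():
            results.append(self.specialists["extract"](request))
        results.append(self.specialists["summarise"](request))
        return "\n".join(results)

supervisor = Supervisor({"extract": extraction_agent, "summarise": custom_llm_agent})
print(supervisor.handle("Process the Q3 invoice PDFs and summarise vendor spend"))
```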
In summary, based on these initial observations, multi-agent orchestration has the potential to enable a more seamless flow of information across data domains, thereby facilitating the transition towards self-serve platforms built on decentralized architecture principles.
The convergence of decentralized architecture principles and agentic AI, enabled by tools like Databricks Agent Bricks, offers a path towards self-serve platforms. By enabling self-serve platforms through AI agents, CDOs and their teams can mitigate silos in the form of shadow data steward roles and activities, which in turn will drive greater operational efficiency in D&A.