As an example, let's consider the choice between adopting a configurable 'off-the-shelf' Master Data Management (MDM) solution and a highly customized one, using specific Azure tools. The decision is a classic example of multiple trade-offs that require evaluation to determine the optimal outcome. More specifically, we can evaluate the decision to invest in either of the two MDM solutions, i.e., configurable vs. custom, based on the trade-offs involved in transforming and migrating master data to Azure using tools like Azure Databricks and Azure Data Factory.
Michael Norejko, Data Engineering Lead, Cloud & Digital, PwC Poland
Michael, who has 15 years of experience building data and analytics capabilities to support digital transformation efforts, emphasizes the importance of starting with a robust business case for the deployment of GenAI. Whilst this is an obvious starting point, it is important to note that the refinement of the proposed business case needs to be done across all core business functions and processes. Furthermore, this needs to happen in parallel with quick, iterative proofs of concept and the configuration of core data components, including Enterprise Data Architecture, Master Data Management, and Data Governance, to name a few.
The "Off-the-Shelf" configurable solution:
Configurable solutions integrate well with tools like Azure Data Factory, used for data extraction and loading, and Azure Databricks for transformation using SQL and PySpark scripts. The out-of-the-box functionality accelerates the harmonization, standardization, and enrichment of master data.
The success of the configurable solution both depends on and enables the cloud migration and data quality remediation efforts, as the master, reference, and transactional data needs to be standardized before and/or at the point of ingestion into the physical data model within the MDM solution.
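To make the standardization step concrete, here is a minimal, stdlib-only Python sketch of the kind of normalization applied before ingestion; in practice this logic would typically run as a PySpark script in Azure Databricks, and the field names and rules here are hypothetical illustrations:

```python
import re

def standardize_customer(record: dict) -> dict:
    """Normalize a raw customer record before MDM ingestion.

    Illustrative rules: collapse whitespace and title-case names,
    lower-case e-mail addresses, and strip non-digits from phone numbers.
    """
    return {
        "name": " ".join(record.get("name", "").split()).title(),
        "email": record.get("email", "").strip().lower(),
        "phone": re.sub(r"\D", "", record.get("phone", "")),
    }

raw = {"name": "  ada   LOVELACE ", "email": " Ada@Example.COM ", "phone": "+44 (0)20-1234"}
print(standardize_customer(raw))
# → {'name': 'Ada Lovelace', 'email': 'ada@example.com', 'phone': '440201234'}
```

The same rules would be expressed as DataFrame transformations at scale; the point is that they must be defined and agreed upon regardless of whether the MDM solution is configurable or custom.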
Initially, the configurable solution requires upfront licensing costs and carries potential vendor lock-in, which competes for budget against hiring an internal data engineering team to manage custom development.
The "highly customized" solution:
Azure Data Factory and Databricks provide the necessary functionality for building custom data pipelines, quality checks, and matching & merging logic tailored exactly to unique business rules, on a "pay-as-you-go" basis.
Furthermore, this approach requires expertise in data governance, data architecture, and engineering across the Azure technology stack. The software ("the custom code") is useless without the human expertise to build and maintain it.
The custom solution competes for long-term maintenance resources. While it avoids initial license fees, the long-term operational costs and the risk of "technical debt" can be substantial, competing for future budgets and talent that could otherwise be used for data analytics or AI projects.
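The matching-and-merging logic mentioned above can be sketched in a few lines of stdlib Python; this is a simplified illustration of fuzzy matching plus a survivorship rule, where the similarity threshold and the "most recently updated non-empty value wins" rule are assumptions (a production pipeline would implement equivalent logic in PySpark at scale):

```python
from difflib import SequenceMatcher

def is_match(a: dict, b: dict, threshold: float = 0.85) -> bool:
    """Fuzzy-match two records on name similarity (illustrative rule)."""
    score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return score >= threshold

def merge(records: list[dict]) -> dict:
    """Survivorship: prefer the most recently updated non-empty value."""
    ordered = sorted(records, key=lambda r: r["updated"], reverse=True)
    golden = {}
    for rec in ordered:
        for key, value in rec.items():
            if value and key not in golden:
                golden[key] = value
    return golden

a = {"name": "Jon Smith", "email": "", "updated": "2024-01-01"}
b = {"name": "John Smith", "email": "j.smith@example.com", "updated": "2023-06-01"}
if is_match(a, b):
    print(merge([a, b]))  # golden record combining the freshest non-empty values
```

It is exactly this kind of bespoke rule set, easy to express in code but hard to fit into a vendor's configuration model, that drives the build-versus-buy trade-off.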
Listing key trade-offs:
To help evaluate the options, we can first list the key trade-offs as follows:
| Trade-off | Configurable | Customized |
| --- | --- | --- |
| **Advantages** | | |
| Strategic priority | Faster implementation, leveraging pre-built workflows. | Slower initial build time due to custom development. |
| Data complexity | Enforces industry best practices and data models. | Allows perfect alignment with unique business processes. |
| Expertise | Vendor manages updates, bugs, and technical support. | Full internal responsibility for maintenance and evolution. |
| **Disadvantages** | | |
| Flexibility | Rigid data models; customization can be difficult/costly. | High technical debt potential; requires constant internal management. |
| Cost | Significant upfront licensing fees. | Avoids vendor lock-in, but operational costs can be high over time. |
| Integration | Often easier with standard APIs/connectors. | Requires bespoke integration code for every system. |
To help evaluate the different options, we would then raise a series of clarifying questions, such as (but not limited to) the following:
| Decision Point | Clarifying Question | Option 1: Configurable | Option 2: Customized |
| --- | --- | --- | --- |
| 1. Strategic priority | Is "Time-to-Value" critical, or is "Unique Functionality" a competitive advantage? | Time-to-Value is a priority (need MDM quickly with proven processes) | Unique functionality is a priority (business has highly specific data rules) |
| 2. Internal expertise | Do we have senior data engineers/architects capable of building and maintaining enterprise software? | No expertise exists (creates a dependency on vendor support and third-party system integrators) | Sufficient expertise exists (creates a dependency on certified Azure Databricks engineers) |
| 3. Costs | Do we prefer predictable OpEx, or higher CapEx with ongoing maintenance costs? | Predictable OpEx (licensing fees, with a clear support path) | Higher CapEx (leverages existing cloud spend, but custom maintenance is ongoing) |
| 4. Data complexity | Are our master data rules mostly standard (e.g., standard customer info) or extremely unique/niche? | Standard data rules (configurable solution handles 80% of cases out-of-the-box) | Niche/complex data rules (requires bespoke logic difficult to configure in an off-the-shelf solution) |
Making a choice on whether to invest in cloud, data, and/or software typically requires further evaluation of the degree of customization and the associated trade-offs. If the goal is flexibility and control over intellectual property, building a 'custom' solution will be the preferred option.
Customized solutions are sometimes preferred over highly configurable options primarily when an organization needs the MDM solution to align perfectly with unique, highly complex, or niche business processes that off-the-shelf software cannot adequately support, even with extensive configuration.
Alternatively, combining a highly customized Proof of Concept (PoC) with a subsequent Request for Proposal (RFP) for a configurable solution can help define the requirements before engaging vendors. More specifically, this approach:
- Identifies edge cases: surfaces the most complex data harmonization and standardization rules that off-the-shelf software might miss.
- Validates technical feasibility: confirms that the underlying cloud infrastructure can handle the data volume and processing requirements, independently of any vendor software.
- Builds internal expertise: reinforces the expertise provided by the vendor, system integrators, and external consultants.
- Establishes a baseline: creates a clear performance benchmark against which all vendor solutions in the subsequent RFP will be measured.
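As a toy illustration of the baseline point, a PoC can record simple throughput figures that later anchor the RFP evaluation; the workload and metric below are hypothetical stand-ins for a real harmonization job:

```python
import time

def standardize(name: str) -> str:
    """Toy transformation standing in for a real harmonization rule."""
    return " ".join(name.split()).title()

# Hypothetical PoC workload: 100,000 raw name strings.
records = [f"  customer   {i} " for i in range(100_000)]

start = time.perf_counter()
cleaned = [standardize(n) for n in records]
elapsed = time.perf_counter() - start

throughput = len(records) / elapsed
print(f"Baseline: {throughput:,.0f} records/sec on this environment")
```

Numbers like these, captured against the organization's own data and infrastructure, give vendors in the RFP a concrete target to beat rather than a generic capability claim.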
The PoC can then inform the RFP by asking each vendor to demonstrate how their solution addresses the complex scenarios identified in the PoC, making the evaluation objective. In summary, proceeding with a customized PoC in advance of an RFP acknowledges the interdependence of "build vs. buy" decisions. It is not an "either/or" choice but a sequential one. This hybrid approach ensures the configurable solution is fit-for-purpose, avoiding the common trap of picking a generic solution that causes ongoing inefficiencies because it fails to address critical, unique requirements.