Data Engineering

Rethinking Data Domains

Adding the Transaction Level

Gustavo Gama

Dec 26, 2024 • 6 min read

AI Generated

As organizations expand and innovate, conventional methods of managing data domains need to be rethought and redefined, paving the way for more dynamic and efficient solutions.

Data Silos vs. Data Mesh:

In today's hyper-connected world, businesses require smooth data flow, not merely for analytics but also at the transactional level, where real-time decisions occur. The shift-left movement in data is here, moving data quality and governance closer to where data is created. Let's look at how data silos and data mesh shape data domains, how the mesh enables data products, and how it improves the entire data lifecycle.

Data Silos: The Current Transactional Reality

Data silos are a reality for many organizations. Here, data domains are often confined to operational systems, each optimized for specific functions like sales, finance, or supply chain. While these silos serve their purpose locally, they create barriers at the organizational level.

The Problem at the Transaction Level

Disconnected Operations: Systems operate independently, making it difficult to unify data for transactional workflows.
Delayed Insights: Real-time decisions suffer when data isn't shared efficiently.
Duplication: Overlapping but inconsistent copies of data across silos increase operational risk.

Data Mesh: A Vision Beyond Silos

The data mesh concept shifts the paradigm by treating data as a product that is developed, managed, and consumed across decentralized domains. At the transactional level, this model provides smooth data access for operational workflows and decision-making.

The Data Mesh Vision

Decentralized Domains: Each domain (e.g., customer, inventory, payments) is responsible for its transactional data but adheres to shared organizational protocols.
Data as a Product: Transactional data is not just for internal use but is designed for consumption by other teams, APIs, or systems.
End-to-End Lifecycle Management: From creation and validation to sharing, use, and archival, the lifecycle of data is a core responsibility of each domain.

My Understanding of Data Domains

I care about data domains primarily because I value efficiency. I do not like doing things twice, and duplicated data means duplicated work, duplicated storage, duplicated maintenance, and duplication of everything else.

To avoid this, let's divide data domains into two types: business data domains, where the logical representation of the data (its structure, relationships, context, and governance) is defined, and technical data domains, where the data is persisted, organized, maintained, and secured.

Business Data Domains

These domains represent data from a business perspective, aligned with the organization's core functions and operations.

Technical Data Domains

These domains represent the technical implementations best suited to store the data and meet the requirements of the business data domains. Put plainly, a single solution rarely fits every requirement.

Bridging the Gap: Mapping Architecture

A mapping architecture that connects business data domains to technical data domains is one way to bridge the two.

Key Principles of a Mapping Architecture

Domain Mapping
- Create a one-to-one or many-to-one mapping between business data domain entities and tech data domain schemas.
- Define clear ownership for business data domains and an independent authority for the data operations run by technical data domains, reducing ambiguity and clarifying responsibilities.
Standardized Interfaces
- Use APIs, catalogs, lineage, and schemas to expose the data domains in formats that align with business domain needs.
- Avoid direct dependencies between systems, ensuring scalability and adaptability.
Minimize Transformations
- Prioritize schema alignment and data contracts over transformations to preserve data integrity.
- Reduce unnecessary pipelines by designing technical domains that natively support business use cases.
Federated Governance
- Implement cross-domain policies to ensure consistency without creating bottlenecks.
- Automate metadata capture to maintain traceability across domains.
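To make the domain-mapping principle concrete, here is a minimal sketch, assuming a simple in-memory registry. It shows a many-to-one mapping in which each business entity, keyed by its business domain, resolves to exactly one technical schema, while one schema can serve several business entities. Every system, table, domain, and team name is hypothetical.

```python
from dataclasses import dataclass

# Hypothetical sketch: every system, table, domain, and team name below is illustrative.


@dataclass(frozen=True)
class TechSchema:
    """Technical data domain: where and how the data is persisted."""
    system: str   # e.g. an operational database
    table: str    # physical schema/table name
    steward: str  # team running data operations for this store


@dataclass(frozen=True)
class EntityMapping:
    """Links a business entity to its owner and its technical schema."""
    owner: str          # business team accountable for meaning and governance
    schema: TechSchema


ORDERS = TechSchema("orders_db", "sales.orders", "data-platform")
LEDGER = TechSchema("payments_db", "ledger.payments", "data-platform")

# Keys are (business domain, entity). Dict keys are unique, so each business
# entity resolves to exactly one schema, while several entities may share one
# schema (many-to-one).
DOMAIN_MAP: dict[tuple[str, str], EntityMapping] = {
    ("sales", "Order"): EntityMapping("sales-team", ORDERS),
    ("support", "OrderStatus"): EntityMapping("support-team", ORDERS),
    ("payments", "Payment"): EntityMapping("payments-team", LEDGER),
}


def resolve(domain: str, entity: str) -> TechSchema:
    """Return the technical schema behind a business entity, or raise KeyError."""
    return DOMAIN_MAP[(domain, entity)].schema


def entities_backed_by(schema: TechSchema) -> list[str]:
    """List the business entities served by one technical schema."""
    return sorted(f"{d}.{e}" for (d, e), m in DOMAIN_MAP.items() if m.schema == schema)


print(resolve("support", "OrderStatus").table)  # sales.orders
print(entities_backed_by(ORDERS))  # ['sales.Order', 'support.OrderStatus']
```

Keeping ownership (the business team) separate from stewardship (the technical team) in the mapping mirrors the split of responsibilities described above; a real implementation would likely keep this metadata in a catalog rather than in code.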

How This Approach Avoids Common Issues

Duplication

Aligning business and technical domains through mapping eliminates redundant datasets, reducing storage and maintenance overhead and, most importantly, giving everyone a single, clear source of truth.

Complex Pipelines

Mapping architecture simplifies workflows by exposing clean, pre-validated data products, reducing the need for extensive pipelines.

ETL Operations

Fewer transformations and extractions are needed as data is created and stored in a format aligned with business needs, enabling direct consumption.

Data Type Changes

Minimizing transformations ensures that data remains consistent across its lifecycle, reducing the risks of mismatched data types or corrupted insights.
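One way to keep types consistent without extra transformation steps is to check records against a data contract at the point of publication. The sketch below assumes a contract expressed as a plain mapping from field name to Python type (a hypothetical format, not any specific contract tool). It rejects mismatches instead of coercing them, so type drift is caught at the source.

```python
from datetime import datetime
from decimal import Decimal

# Hypothetical contract for an order record; field names are illustrative only.
ORDER_CONTRACT: dict[str, type] = {
    "order_id": str,
    "amount": Decimal,
    "created_at": datetime,
}


def validate(record: dict, contract: dict[str, type]) -> list[str]:
    """Return contract violations; an empty list means the record conforms.

    Types are checked, never coerced: a string "19.99" is reported as a
    violation instead of being silently converted to a number.
    """
    errors = []
    for name, expected in contract.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            actual = type(record[name]).__name__
            errors.append(f"{name}: expected {expected.__name__}, got {actual}")
    # Fields the contract does not declare are also flagged.
    for name in sorted(record.keys() - contract.keys()):
        errors.append(f"unexpected field: {name}")
    return errors


good = {"order_id": "A-1", "amount": Decimal("19.99"), "created_at": datetime(2024, 12, 26)}
bad = {"order_id": "A-2", "amount": "19.99", "note": "gift"}

print(validate(good, ORDER_CONTRACT))  # []
print(validate(bad, ORDER_CONTRACT))
```

Reporting rather than fixing the bad record keeps responsibility with the producing domain, which is what lets downstream consumers skip their own cleanup pipelines.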

How Data Mesh Produces Data Products

A data product is a reusable and reliable dataset that serves specific operational or transactional needs. Here's how the mesh enables this:

Key Goals of a Data Product

Discoverability: Each domain ensures its data products are easily searchable and well-documented.
Quality: Automated validation and testing ensure data accuracy.
Real-time Accessibility: APIs enable integration into transactional systems.
Governance: Shared standards ensure interoperability without stifling domain autonomy.

Key Components of a Data Product

Schema Registry
- What It Is: A central repository that defines every product's structure, rules, and data format.
- Purpose: Ensures data consistency, validation, and clarity across domains. Because schemas are registered in a machine-readable form, data can be validated and processed automatically, without manual intervention.
- How It Helps: A clear schema provides a consistent structure for the data, reducing the risk of errors during consumption or integration and ensuring that the data is usable across different systems and use cases.
Data Catalog
- What It Is: A searchable, organized inventory of all available data products and their metadata.
- Purpose: It makes it easy for business and technical users to discover, access, and understand the data products within the organization.
- How It Helps: With a data catalog, users can quickly find the right data product for their needs, understand its structure, and learn how to use it effectively. It also helps maintain transparency and ensures that no data product is overlooked.
Data Lineage
- What It Is: The tracking of data as it moves through the data ecosystem—from its origin in the technical domains, through transformations, and into business domains for consumption.
- Purpose: Provides transparency into how data is created, processed, and used, ensuring all stakeholders can trace the data's journey and understand its history.
- How It Helps: Data lineage builds trust in data by showing its source, its transformations, and how it has been handled over time. Lineage is vital for identifying and resolving errors and for correcting any mismatches or inconsistencies that arise.
Governance & Security
- What It Is: The policies, rules, and mechanisms that keep data protected, properly managed, and compliant with legal and regulatory standards.
- Purpose: Establishes access control, data quality, and privacy standards, ensuring that data products are secure and comply with internal and external regulations.
- How It Helps: Governance and security frameworks allow data to be shared across the organization without risking security breaches or compliance violations. They ensure that sensitive data is handled properly, with role-based access and encryption where needed.
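The catalog and lineage components can share one structure if each catalog entry records the products it is derived from. The sketch below is a minimal in-memory illustration under that assumption: `search` supports discovery, and `lineage` walks upstream sources breadth-first. The product names, domains, and tags are hypothetical, and real catalogs typically capture lineage automatically rather than from hand-entered lists.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical sketch: product names, domains, and tags are illustrative only.


@dataclass
class CatalogEntry:
    name: str
    domain: str
    description: str
    tags: set[str] = field(default_factory=set)
    upstream: list[str] = field(default_factory=list)  # products this one is derived from


class Catalog:
    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        """Add a product; its upstream sources must already be registered."""
        for parent in entry.upstream:
            if parent not in self._entries:
                raise ValueError(f"unknown upstream product: {parent}")
        self._entries[entry.name] = entry

    def search(self, term: str) -> list[str]:
        """Discovery: substring match on name or description, exact match on tags."""
        term = term.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if term in e.name.lower()
            or term in e.description.lower()
            or term in {t.lower() for t in e.tags}
        )

    def lineage(self, name: str) -> list[str]:
        """Lineage: every upstream product of `name`, nearest first (breadth-first)."""
        seen: set[str] = set()
        order: list[str] = []
        queue = deque(self._entries[name].upstream)
        while queue:
            parent = queue.popleft()
            if parent in seen:
                continue
            seen.add(parent)
            order.append(parent)
            queue.extend(self._entries[parent].upstream)
        return order


catalog = Catalog()
catalog.register(CatalogEntry("orders_raw", "sales", "Order events from checkout", {"orders"}))
catalog.register(CatalogEntry("payments_raw", "payments", "Captured card payments", {"payments"}))
catalog.register(CatalogEntry("order_status", "sales", "Current status per order",
                              {"orders", "realtime"}, ["orders_raw"]))
catalog.register(CatalogEntry("order_reconciliation", "finance", "Orders matched to payments",
                              {"finance"}, ["order_status", "payments_raw"]))

print(catalog.search("orders"))
print(catalog.lineage("order_reconciliation"))
```

Tracing `order_reconciliation` back to `orders_raw` and `payments_raw` is exactly the kind of question lineage answers when a finance figure looks wrong.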

Transactional Data Product Examples

Customer Orders: Real-time visibility into order statuses across sales, fulfillment, and customer support.
Inventory Levels: Accurate, up-to-date data for supply chain optimization.
Payments: Streamlined reporting and reconciliation for finance and operations.

Data Lifecycle in a Data Mesh

The data lifecycle is central to the success of a data mesh. Here's how it unfolds:

Creation: Data is generated at the source by transactional systems.
Validation: Domains apply quality checks to ensure integrity.
Publishing: Data that is already standardized, or has been harmonized into standardized products, is shared through APIs or data platforms.
Consumption: Other domains and systems consume the data, enabling workflows and decision-making.
Evolution: Data products are continuously updated, enriched, or deprecated as needs evolve.
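The lifecycle steps can be sketched as a small domain-owned product object, assuming a simple in-process publish/subscribe model in place of a real streaming platform or API. Records created at the source are validated, quarantined if invalid, delivered to subscribing domains if valid, and the product can evolve into a new version. `DataProduct`, `check_inventory`, and the inventory fields are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch: the product name, fields, and rules are illustrative only.
Record = dict
Validator = Callable[[Record], list[str]]
Consumer = Callable[[Record], None]


@dataclass
class DataProduct:
    """A domain-owned product that validates records before sharing them."""
    name: str
    version: int
    validator: Validator
    subscribers: list[Consumer] = field(default_factory=list)
    quarantine: list[tuple[Record, list[str]]] = field(default_factory=list)
    deprecated: bool = False

    def subscribe(self, consumer: Consumer) -> None:
        self.subscribers.append(consumer)

    def publish(self, record: Record) -> bool:
        """Validate a record created at the source, then deliver it to consumers."""
        if self.deprecated:
            raise RuntimeError(f"{self.name} v{self.version} is deprecated")
        errors = self.validator(record)  # Validation
        if errors:
            # Invalid records stay with the owning domain instead of being shared.
            self.quarantine.append((record, errors))
            return False
        for consumer in self.subscribers:  # Publishing and consumption
            consumer(record)
        return True

    def evolve(self, validator: Validator) -> "DataProduct":
        """Evolution: release the next version, deprecate this one, keep subscribers."""
        self.deprecated = True
        return DataProduct(self.name, self.version + 1, validator, list(self.subscribers))


def check_inventory(record: Record) -> list[str]:
    qty = record.get("qty")
    if isinstance(qty, int) and qty >= 0:
        return []
    return ["qty must be a non-negative integer"]


inventory = DataProduct("inventory_levels", 1, check_inventory)
received: list[Record] = []
inventory.subscribe(received.append)  # another domain consumes the product

inventory.publish({"sku": "X1", "qty": 5})   # accepted and delivered
inventory.publish({"sku": "X2", "qty": -1})  # quarantined
print(received)
print(len(inventory.quarantine))
```

Placing validation inside the product, before any consumer sees the record, is one way to express the idea that lifecycle management is the owning domain's responsibility.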

Data Democratization

In traditional data-siloed architectures, access to data is typically limited to specific teams or technical roles due to the complexity of the data and its transformation processes. In a data mesh, by contrast, the democratization of data ensures that data products are accessible to everyone in the organization who needs them, regardless of technical expertise. Here's how data democratization works:

Self-Service Access:
Through clear, standardized APIs, business users and teams can directly access the required data products without relying on IT or data engineers.
Role-Based Permissions:
Even with broad access, security and governance can be enforced by granting permissions and access levels based on roles. Business teams can work with data directly while respecting privacy and security boundaries.
Collaboration Across Domains:
The ability to access data from various domains fosters collaboration between business teams without putting stress on the technical teams, enabling faster decision-making and more agile responses to changes.
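Self-service access and role-based permissions can coexist if each read is filtered by a per-role policy. The following is a minimal sketch, assuming policies are simple sets of visible fields per role and that restricted fields are masked rather than removed. The product, roles, and fields are hypothetical, and production systems would enforce this in the data platform or API gateway, not in application code.

```python
# Hypothetical sketch: product, role, and field names are illustrative only.
POLICIES: dict[str, dict[str, set[str]]] = {
    "customer_orders": {
        "support_agent": {"order_id", "status", "customer_email"},
        "analyst": {"order_id", "status", "amount"},
    },
}
MASK = "***"


def read(product: str, role: str, rows: list[dict]) -> list[dict]:
    """Self-service read: visible fields depend on the caller's role."""
    allowed = POLICIES.get(product, {}).get(role)
    if allowed is None:
        raise PermissionError(f"role {role!r} has no access to {product!r}")
    # Fields outside the role's policy are masked rather than dropped,
    # so every role sees the same record shape.
    return [{k: (v if k in allowed else MASK) for k, v in row.items()} for row in rows]


orders = [{"order_id": "A-1", "status": "shipped", "amount": 42, "customer_email": "a@example.com"}]
print(read("customer_orders", "analyst", orders))
print(read("customer_orders", "support_agent", orders))
```

Masking keeps the schema identical for every role, so the same dashboards and integrations work across teams while sensitive fields stay hidden.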

Pitfalls to Avoid in a Transaction-Level Mesh

Misaligned Ownership: Without clear domain responsibility, data quality suffers.
Underestimating Interoperability: APIs and standards must be robust to handle real-time transactional needs.
Governance Overload: Balancing autonomy with control is key—over-regulation can stifle innovation.
Neglecting Culture: Transitioning from silos to a mesh is as much about mindset as it is about technology.

Visual Takeaway: Data Silos vs. Data Mesh

Aspect	Data Silos	Data Mesh
Data Domains	Isolated by systems	Decentralized but interconnected
Transaction Support	Fragmented, delayed, error-prone, and dependent on heavy transformations	Real-time and seamless, with transformations minimized by schema-first design
Data Lifecycle	Ad-hoc and opaque	Structured and transparent
Data Products	Siloed reports	Reusable, API-enabled, domain-driven
Data Democratization	Restricted to specific teams or roles	Broad, role-based access with APIs and self-service tools

Final Thought:

Rethinking data domains from both business and technical perspectives, and applying data mesh at the transactional level, isn't just about breaking silos; it's about reimagining data as an enabler of real-time operations, collaboration, and innovation. By focusing on the lifecycle of data products, organizations can unlock agility, reduce duplication, and stay competitive in a fast-paced digital world.

Making data democratization a core part of the data mesh approach lets organizations break down barriers to data access, empowering business users to make better-informed decisions without the bottlenecks of traditional data management.

Ready to leap from silos to a mesh? Let's map out the journey!