The Data Contract Pivot in Data Engineering | Issue #20
Community Highlights & Pivotal Pieces that Shaped the Data Contracts Journey
Hi, welcome to Modern Data 101! If you’re new here, this is an all-in-one community platform to enhance your data IQ. We curate insights, ideas, and innovations to help data practitioners and leaders squeeze every drop of value from their organization's data, all while fostering a global community of data trailblazers.
Editorial 🤓
Today, data moves at lightning speed within diverse systems and processes; thus, ensuring integrity, quality, and compatibility is paramount. That is where Data Contracts come into play.
A Data Contract can be understood as a formal agreement between Data Producers and Data Consumers. It assures that data meets the agreed requirements for quality, governance, SLAs, and semantics, and is fit for consumption by downstream data pipelines. Not surprisingly, contracts also play a key role in enabling organisations to transition into a unified data architecture and deliver unified experiences across data governance, metadata, and semantics.
To understand this better, Chad Sanderson brought together data experts who have notably voiced their opinions and shared innovations around data contracts during the past few years. It was an honour to be among this crowd and address some important open questions. The piece covers fundamentals as well as advanced specifics. A few interesting bytes →
Understanding Data Contracts with a Software Analogy
Chad Sanderson simply explains Contracts as APIs for data. If implemented at every node in a lineage graph supporting some data product, the schema checks they facilitate are equivalent to end-to-end integration testing.
It seemed almost the entire panel loved the API analogy. Typically, an API is an interface that defines how two systems communicate, with rules that direct and govern the exchange of information. Contracts do the same for data: they lay down a set of quality, semantics, and governance SLOs as codified checks and govern the exchange of data between any two exchange points.
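To make the API analogy concrete, here is a minimal sketch of a contract expressed as codified checks. It is not any panelist's implementation: the `DataContract` class, the `orders` entity, and its columns and check names are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable

Row = dict[str, Any]


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract: an expected schema plus named quality checks."""
    name: str
    schema: dict[str, type]                    # column name -> expected Python type
    checks: dict[str, Callable[[Row], bool]]   # check name -> predicate over one row

    def violations(self, row: Row) -> list[str]:
        """Return readable violations for one row; an empty list means it passes."""
        problems = []
        for column, expected in self.schema.items():
            if column not in row:
                problems.append(f"missing column: {column}")
            elif not isinstance(row[column], expected):
                problems.append(f"wrong type for {column}: expected {expected.__name__}")
        # Run quality checks only when the schema holds, so predicates can trust types.
        if not problems:
            for check_name, predicate in self.checks.items():
                if not predicate(row):
                    problems.append(f"failed check: {check_name}")
        return problems


# Assumed example entity; real contracts would also carry SLAs, ownership, and so on.
orders_contract = DataContract(
    name="orders",
    schema={"order_id": str, "amount": float, "currency": str},
    checks={
        "amount_is_positive": lambda r: r["amount"] > 0,
        "currency_has_three_letters": lambda r: len(r["currency"]) == 3,
    },
)

good_row = {"order_id": "o-1", "amount": 19.99, "currency": "USD"}
bad_row = {"order_id": "o-2", "amount": -5.0, "currency": "USD"}

print(orders_contract.violations(good_row))
print(orders_contract.violations(bad_row))
```

Running a check like this at each exchange point between a producer and a consumer is one way to read the "integration testing for data" idea: the producer cannot hand over rows that break the agreed interface without the violation being surfaced.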
The Controversial Angle: Relationships 💛
This thread had tons of contrasting opinions. Jean-Georges Perrin shared that they didn't put relationships in the data contract, but it depends on which level of relationship you want to consider. They didn't define relationships in the physical structure of the data contracts at PayPal because they didn’t need it at the time. But notably, he points out that if you want to have a data mesh of multiple data products, the contract should not describe the mesh and should stay within the boundary of the data product.
We have a different approach to this. In our implementation, Data Contracts have a one-to-one relationship with data entities to preserve the independence of these entities at the contract level. By design, relationships fall under the higher layer of the data model. If a contract for an entity included relations, the entity would depend on other entities to establish its SLOs. The contract and the entity it defines should be independently validatable. Interestingly, there are different folks with different implementations on both sides of this debate.
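The separation described above can be sketched roughly as follows. This is an illustrative assumption rather than our actual implementation: the `customer` and `order` entities, their fields, and all function names are hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def validate_customer(row: Row) -> bool:
    """Contract for the customer entity: looks only at customer fields."""
    return isinstance(row.get("customer_id"), str) and "@" in str(row.get("email", ""))


def validate_order(row: Row) -> bool:
    """Contract for the order entity: customer_id is checked as a plain field, not a join."""
    amount = row.get("amount")
    return (
        isinstance(row.get("customer_id"), str)
        and isinstance(amount, (int, float))
        and amount > 0
    )


def orphan_orders(orders: list[Row], customers: list[Row]) -> list[Row]:
    """Relationship check that lives in the data model layer, outside both contracts."""
    known_ids = {c["customer_id"] for c in customers}
    return [o for o in orders if o["customer_id"] not in known_ids]


customers = [{"customer_id": "c-1", "email": "a@example.com"}]
orders = [
    {"customer_id": "c-1", "amount": 10.0},
    {"customer_id": "c-9", "amount": 5.0},  # valid on its own, but has no matching customer
]

# Each contract can be evaluated without loading the other entity.
print(all(validate_customer(c) for c in customers))
print(all(validate_order(o) for o in orders))
# The cross-entity rule is evaluated separately, one layer up.
print(orphan_orders(orders, customers))
```

In this sketch the second order passes its own contract even though no matching customer exists; that mismatch is reported by the model-layer check rather than by either contract, which is what keeps each contract independently validatable.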
Contract-enabled Data Products & in essence, the Data Mesh
Andrew Jones of GoCardless emphasises that Data Products require an interface that defines the expectations around that data, the schema, the version, how it evolves, and so on – all of which are key parts of a Data Contract.
In fact, I'd say if those expectations are not defined, you don't have a data product. Or put another way, you don’t have a data product if you don't have a data contract around it.
A data product has four fundamental stages: Inputs, Transformations, SLOs, and Outputs, where SLOs are a bunch of requirements applicable to the other stages. Contracts come in handy at each stage.
Shane Murray of Monte Carlo shared that the value of Data Contracts largely depends on how data teams organise themselves and the overall data ecosystem. That said, design often makes the communication gap for decentralised data teams wider.
Different domains need to produce interoperable data products, or else Data Mesh can become Data Silo. He added that data contracts can instil trust in the underlying data product to encourage a “build once, use many times” model.
These were just the highlights from a handful of the panel. Feel free to dive into the entire piece here from Chad Sanderson: Practical Data Contracts. The piece covers opinions across a broad range of data experts including Ananth Packkildurai, Andrea Gioia, Andrew Jones, Chad Sanderson, Jean-Georges Perrin, Sarah Floris, Shane Murray, Shirshanka Das, and yours truly.
Community Space 🫂
We’ve always had a lot of inspiration from the community and often source resonating ideas from the larger group. So it was high time to create a dedicated space for all the voices that have been shifting the needle and can help us go a step further in our data journey.
Data Contracts have seen amazing contributions from the community, especially from excellent data contract advocates such as Chad Sanderson, Andrew Jones, and Jean-Georges Perrin! In a different take this time, we’ll highlight some of the biggest stirs in the Data Conversations that helped us take grand strides.
The Rise of Data Contracts by Chad Sanderson was one of the most defining articles in the data contract evolution. He guides readers right from the basics to understanding the value of contracts.
A question I often get when talking about data contracts is ‘what happens to my existing pipelines? Do they go away?’ In my opinion, NO.
How? Find out here.
Andrew Jones published his implementation story with Data Contracts, which was one of the biggest stirs to get the conversation started on Contracts.
It’s been 6 months since I introduced Data Contracts as our initiative to improve data quality at GoCardless. So, how are we getting on? What’s gone well, and what are the challenges we’ve faced?
Jean-Georges Perrin, along with the PayPal team, released the first widely known data contract specification, which stirred several data practitioners and drove the community several steps towards data contract standardisation.
In its current version (v2.1.1), PayPal’s data contract focuses on eight sections: demographics, dataset & schema, data quality, pricing, stakeholders, roles, service-level agreement, and other properties. As you can read, it does not limit itself to a mere schema.
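As a rough illustration of a contract that goes beyond schema, the sketch below checks a draft contract for the eight sections named above. The key names are hypothetical spellings of those section names, not the field names used in PayPal's actual template, and `missing_sections` is an assumed helper.

```python
# Section names taken from the quote above; the snake_case keys are an assumption.
REQUIRED_SECTIONS = (
    "demographics",
    "dataset_and_schema",
    "data_quality",
    "pricing",
    "stakeholders",
    "roles",
    "service_level_agreement",
    "other_properties",
)


def missing_sections(contract: dict) -> list[str]:
    """List the required sections that a draft contract has not filled in yet."""
    return [section for section in REQUIRED_SECTIONS if section not in contract]


# A draft that only covers what a schema-centric contract would typically hold.
draft = {"demographics": {}, "dataset_and_schema": {}, "data_quality": {}}

print(missing_sections(draft))
```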
Upcoming Data Events 📢
Sit-Down CDO | CIO, Singapore
Economists are predicting that the developed world is heading for a recession this year, which means that budgets will be tight and technology and data teams will need to do more with less. Enterprises that make the best use of their available data to provide real solutions to customer problems, and that are able to effectively secure that data, will have a real competitive advantage.
Speakers Include - Wouter Van Groenestijn (Head of Data and Analytics - EYP), Deep Thomas (Group CDO - Nomura), Samuel Koh (Director - Alteryx), Robin Fong (VP & GM - Denodo), and many more experienced folks from the industry.
Event Date - 29 Aug 2023 | Mode - Offline | Register
2023 ANA Data, Analytics & Measurement Conference
The 2023 ANA Data, Analytics & Measurement Conference, presented by Google, will showcase the power of measurement and the value of a data-driven marketing strategy. The conference will go beyond the numbers – with a program set to inform and inspire you to harness all the data at your fingertips and bring the numbers to life.
Speakers Include - Andy Hasselwander (CAO - MarketBridge), Christine Turner (MD - Google), Marc Guldimann (Founder & CEO - Adelaide), Kyle Shank (Director - The Hershey Company), and more experienced people from the business world.
Event Date - 21 - 23 August, 2023 | Mode - Chicago and Virtual | Register
Thanks for Reading 💌
Here’s a breather for you for sticking around till the end!
Follow for more on LinkedIn and Twitter to get the latest updates on what's buzzing in the modern data space.
Feel free to reach out to us on this email or reply with your feedback/queries regarding modern data landscapes. Don’t hesitate to share your much-valued input!
ModernData101 has garnered a select group of Data Leaders and Practitioners among its readership. We’d love to welcome more experts in the field to share their stories here and connect with more folks building for the better. If you have a story to tell, feel free to email us!