In the last article, we talked about the four cataclysmic causes and consequences of Data Debt, or the four horsemen. The “four horsemen” is an apt analogy for a pre-apocalyptic state, and the data world is arguably already in an apocalyptic stage internally, given the massive data debt shouldered every day by weak data pipelines and overstretched data teams.
Data is scattered, unreliable, and of poor quality due to broken, unplanned architectures. Security is often compromised because there is no common integration plane that is centrally governed or logged. Data is pulled out in one-off extracts, stripped of context, and siloed, with no effort to write the resulting insights or versions back to the central structure that operates across the organization. At a high level, these problems can be grouped under data untrustworthiness, unmanageable data swamps, and high maintenance costs.
I could drone on about the problems around the current data ecosystem, but let’s not open those gates. Instead, we can view these problems through a concise lens by revisiting the four horsemen or the apocalyptic causes behind the silently rupturing data ecosystem.
No accountability across a complex architecture
A wide gap between data and business teams
Fragile data pipelines that rupture with change
Delayed insights delivered in isolation
What is the one common solution that often comes to mind when solving these problems? Data Modeling. Data modeling might not solve each of these problems entirely, but it is a common thread between these four problems and a step closer to approaching a federated solution.
First, what is data modeling?
Data modeling is an approach that has saved data and business teams for decades, and whatever the hype or trends suggest, it is not going anywhere.
Data modeling is essentially a framework that interweaves all entities a business requires through logical definitions and relationships. This framework is materialized by combining the skills of both business and IT teams, where businesses define the logic and the IT teams are responsible for mapping the data accordingly.
Data modeling is materialized through three key layers:
Conceptual layer: Where business teams define the high-level business logic and relationships to design a user view
Logical layer: Where the business logic is linked to the physical data through well-defined mappings or logical representations
Physical layer: Where the actual data and data schema sits across data sources such as databases, lakehouses, or warehouses
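To make these layers a bit more concrete, here is a minimal sketch in Python. The entity, table, and column names (`Customer`, `crm_customers`, `ltv_usd`) are hypothetical stand-ins rather than references to any specific tool; the point is only how a business term travels from the conceptual layer through a logical mapping down to physical data.

```python
import sqlite3
from dataclasses import dataclass

# --- Conceptual layer: business teams define entities and attributes in their own words.
CONCEPTUAL = {
    "Customer": {"attributes": ["customer id", "lifetime value"]},
}

# --- Logical layer: each business term is mapped to a physical table and column.
@dataclass
class Mapping:
    entity: str
    business_term: str
    table: str
    column: str

LOGICAL = [
    Mapping("Customer", "customer id", "crm_customers", "cust_id"),
    Mapping("Customer", "lifetime value", "crm_customers", "ltv_usd"),
]

# --- Physical layer: the actual schema and data (here, an in-memory SQLite table).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE crm_customers (cust_id TEXT, ltv_usd REAL)")
db.execute("INSERT INTO crm_customers VALUES ('C-001', 1250.0)")

def fetch(entity: str, business_term: str):
    """Answer a business-level question by walking from the conceptual term
    through the logical mapping down to the physical data."""
    assert business_term in CONCEPTUAL[entity]["attributes"]  # the term exists conceptually
    m = next(m for m in LOGICAL if m.entity == entity and m.business_term == business_term)
    return db.execute(f"SELECT {m.column} FROM {m.table}").fetchall()

print(fetch("Customer", "lifetime value"))  # -> [(1250.0,)]
```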
What and where is the issue?
The issue seems to be that data modeling, even though it is a great framework, is rarely manifested in a systematic way. The ideal outcome of a perfect data model is a high degree of operationalization of data for its specific business domain. But how that model is created and maintained is a whole different story.
Is the logic defined by business teams effectively mapped by data teams?
Do semantic changes from the business side instantly reflect on the data side?
How often do the data pipelines break due to poor change management?
How much effort and resources are invested to maintain high uptime for the data model?
The issues behind materializing a data model could easily fill a long essay, or even a best-selling tragedy. But then, why do we say that data models are a savior when it comes to complex information architectures?
Data Models actually work
Data modeling has been a phenomenal lever for solving challenges that existed before its onset. It has achieved the feat of untangling high volumes of data by defining a structure and giving IT teams a clear starting point for logical mapping in the right direction.
Do the processes to establish Data Models work?
With the present volume and evolving nature of data, achieving up-to-date data models is almost impossible with legacy processes and systems. Such processes and tools are slow and at best can accommodate a shallow layer of business logic.
Data modeling seems to be a lost art amid the ever-rising heat of information. Data models crumble easily under high data volume and pressure, and the weak foundation on which they are materialized was never meant to weather the heavy data rain.
There are several causes behind this rupture, but for the scope of this article, we’ll focus the limelight on one: No Agile for the data ecosystem.
The growing need for Agile
Back in 2001, a few brilliant minds came together to create the Agile Manifesto for Software Development. And we all know how rapidly it revolutionized the software industry by enabling software products to become dependable, fast-evolving, and valuable to the business.
They achieved this by redefining objectives through a few tweaks to prioritization strategy, as demonstrated in the image below: “while there is value in the items on the right, we value the items on the left more.”
Image Credit: Agile Manifesto
These ideals were powered by the twelve principles of Agile development, which software teams are still following two decades after their introduction.
How does Agile manifest in the data world?
While the software industry benefited greatly from the Agile movement, no such initiative has benefited the data industry in a comparably revolutionary way. However, there is reason for hope: scattered initiatives that not only ideate but also define tangible practices for implementing agile for data have started to pop up like little sparks.
The Agile mindset brings us full circle to Data Modeling. While data modeling is the ideal framework to operationalize data, the agile data movement is the way to materialize the framework into working models that:
Allow physical data to stay in sync with evolving business logic
Manage the volatility of data with the least impact on the business side
Reduce the data-to-insights gap by bridging teams on either side
Abstract complexity for businesses while preserving flexibility for IT
Agile + Data Modeling = A Stable Data Stack
How does agile translate from the software to the data industry to facilitate data models and clear out generations of data debt?
It defeats the root causes of a cataclysmic data world:
No accountability across a complex architecture
Simplicity is one of the key principles of Agile, and it translates into the data world as hiding complexity from business counterparts while preserving flexibility for IT teams. The simple agile approach cuts out unnecessary complex branches and instead maintains a single composable plane for the “right-to-left”, or business-to-IT, flow of data control and operations.
This framework also supports the inclusion of producers and consumers to define accountability. Who fixes a data table when something goes wrong? While it seems obvious that organizations would have a fall-back process for such issues, the hard reality is that in most organizations no accountability, and therefore no fall-back process, is defined.
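One lightweight way to make that accountability concrete is to register every dataset on the common plane with an explicit producer and consumers, so that “who fixes this table?” always has an answer. A minimal sketch, assuming a hypothetical in-memory catalog (the dataset and team names are made up):

```python
from dataclasses import dataclass

@dataclass
class Dataset:
    name: str
    producer: str        # team accountable for fixing the data when it breaks
    consumers: list      # teams that must be notified on incidents

# Hypothetical central catalog: every dataset registered on the common plane
# carries accountability metadata, so there is always a defined fall-back.
catalog = {
    "orders_daily": Dataset("orders_daily", producer="checkout-team",
                            consumers=["finance-analytics", "growth"]),
}

def who_fixes(dataset_name: str) -> str:
    """Return the accountable producer, or fail loudly if none is defined."""
    ds = catalog.get(dataset_name)
    if ds is None:
        raise KeyError(f"{dataset_name} is not registered -- no accountability defined")
    return ds.producer

print(who_fixes("orders_daily"))  # -> checkout-team
```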
A wide gap between data and business teams
Agile methods or DataOps practices enable a semantic bridge between business and IT teams, producing consistent pipelines that can carry evolving logic without overburdening data engineers or creating IT overhead for business teams. When business teams own and control both the definition and implementation of data logic, data engineers stop being bottlenecks and data pipelines stop being volatile.
This wide gap is not a recent problem; it has persisted for decades. One reason a multitude of solutions has recurrently failed is that there was no technical way to enforce the requirements of the business side on the data side. Agile thinking solves this by enabling a contractual abstraction over physical data that makes quality and conditions enforceable.
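What might that contractual abstraction look like in practice? Here is a minimal, illustrative sketch of a data contract as declared expectations that the pipeline checks before anything lands downstream. The column names and thresholds are invented for the example, not taken from any particular contract specification:

```python
# The business side declares what "good" means; the data side enforces it.
contract = {
    "required_columns": {"order_id", "amount_usd", "placed_at"},
    "not_null": ["order_id", "amount_usd"],
    "min_amount_usd": 0.0,
}

def validate(rows: list[dict]) -> list[str]:
    """Return a list of contract violations for a batch of rows."""
    violations = []
    for i, row in enumerate(rows):
        missing = contract["required_columns"] - row.keys()
        if missing:
            violations.append(f"row {i}: missing columns {sorted(missing)}")
        for col in contract["not_null"]:
            if row.get(col) is None:
                violations.append(f"row {i}: {col} is null")
        if (row.get("amount_usd") or 0) < contract["min_amount_usd"]:
            violations.append(f"row {i}: amount_usd below contractual minimum")
    return violations

batch = [
    {"order_id": "A1", "amount_usd": 19.9, "placed_at": "2023-01-01"},
    {"order_id": None, "amount_usd": -5.0, "placed_at": "2023-01-02"},
]
print(validate(batch))  # flags the null order_id and the negative amount
```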
Fragile data pipelines that rupture with change
Agile encourages sustainable development, which is the opposite of pipelines that break whenever any change, minor or major, comes along. Agile methods evolve rigid pipelines into pipelines capable of supporting continuous integration and delivery (CI/CD). “Continuous” is the key term here, and it is especially applicable to the data industry.
Any continuous flow can be simulated through a cushioning layer between business and IT teams that reduces friction or, as I like to think of it, “puts out the fire” between the two ends. This layer would encompass metadata to facilitate continuous knowledge, and a plug-and-play interface where business teams could define logic through simple SQL queries or UI interfaces.
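As a rough sketch of that cushioning layer, business logic could live as plain SQL plus ownership metadata, with a lightweight CI-style check that compiles every definition against the current physical schema, so breaking changes surface before deployment. The metric, table, and team names below are hypothetical:

```python
import sqlite3

# Hypothetical "cushioning layer": business teams declare logic as plain SQL
# plus metadata; a CI-style dry run validates it against the physical schema.
metric_definitions = {
    "active_customers": {
        "owner": "growth-team",
        "sql": "SELECT COUNT(DISTINCT cust_id) FROM crm_customers WHERE ltv_usd > 0",
    },
}

def ci_check(schema_ddl: list[str]) -> dict:
    """Compile every business definition against the given schema; report pass/fail."""
    db = sqlite3.connect(":memory:")
    for ddl in schema_ddl:
        db.execute(ddl)
    results = {}
    for name, definition in metric_definitions.items():
        try:
            db.execute("EXPLAIN " + definition["sql"])  # prepares the query without running it
            results[name] = "ok"
        except sqlite3.OperationalError as e:
            results[name] = f"broken: {e}"
    return results

# Simulate a schema change renaming a column -- the check flags the broken metric.
print(ci_check(["CREATE TABLE crm_customers (cust_id TEXT, ltv_usd REAL)"]))
print(ci_check(["CREATE TABLE crm_customers (cust_id TEXT, lifetime_value REAL)"]))
```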
Delayed insights delivered in isolation
Siloed data is a major problem in the industry. Brilliant insights may be generated on the analyst’s side, but they stay restricted to, say, a spreadsheet and a couple of calls. After a few layers of those calls, emails, and files, the insight fades into just another isolated data point. It never reaches customer-facing applications in an operational, real-time mode.
The Agile ideology is to deliver a good balance of speed and quality. Translated into the data world, that means an enforceable set of rules all the way from the bottom to the top, across the data, information, and application layers: a common integration plane to tie together all data sources, reverse ETL pipelines to carry insights back from data applications, and a semantic bridge between the information and application layers.
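A toy illustration of closing that loop: instead of leaving a score in a spreadsheet, a reverse-ETL-style step writes it back into an operational store that customer-facing applications read from, enforcing a simple rule at the boundary. The stores and field names here are hypothetical stand-ins:

```python
# Output of the analytics/information layer (e.g. an analyst's churn model).
analytics_output = [
    {"cust_id": "C-001", "churn_risk": 0.87},
    {"cust_id": "C-002", "churn_risk": 0.12},
]

operational_store = {}  # stand-in for the application layer's datastore

def reverse_etl(rows, key="cust_id"):
    """Sync scored insights into the operational store, enforcing a rule on the way."""
    for row in rows:
        if not (0.0 <= row["churn_risk"] <= 1.0):   # enforceable rule at the boundary
            raise ValueError(f"invalid score for {row[key]}")
        operational_store[row[key]] = {"churn_risk": row["churn_risk"]}

reverse_etl(analytics_output)
# The customer-facing app can now act on the insight in near real time:
print(operational_store["C-001"])  # -> {'churn_risk': 0.87}
```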
Final Note
Data debt has been stacking up in existing architectures and data stacks, leading to higher operational and storage costs. The problems arising from data debt are more digestible and manageable with working data models that enable a swift bridge between business logic and physical data. While data models are a working and desirable solution, the methods to set up such models are broken and non-functional when it comes to managing the current volume and nature of data.
The agile data mindset and approach, already being implemented by several tech leaders and giants, is one of the ways to combat the challenges around people, processes, and tools in the data industry. The way Agile revolutionized software is a proven testament to its capabilities. Agile methods can support finer and faster ways to develop and maintain operational data models that could eventually erase most of the data debt nibbling away at valuable resources.