What comes to mind when you think Data Infrastructure? | Issue #17
Data Infrastructure in the context of Data Products
Hi, welcome to Modern Data 101! If you’re new here, this is an all-in-one community newsletter to enhance your data IQ. We curate insights, ideas, and innovations to help data practitioners and leaders squeeze every drop of value from their organization's data, all while fostering a global community of data trailblazers.
Editorial 🤓
Did you think of ingestion, transformation, storage, and so on at first glance? I used to as well. In fact, most articles you’ll read on data infrastructure concern these capabilities and treat them as separate facets. No doubt, these capabilities and more are ultimately what we want to achieve for data, and there have been enough conversations, debates, and articles on them. But infrastructure is more than that.
For a long time, we were used to the concept of time as “linear”: one thing happens after another. But then we found out that time could also be “circular”: an object comes back to its own state after travelling across different states.
We could start an interesting debate on physics here, but let’s move past that and consider the analogy instead. We’ve always viewed data infrastructure as linear: transformation after ingestion, storage after transformation, analytics after storage. We can change the order, but the way we perceive it remains linear. Given the dynamic and transient nature of data, we cannot restrict it to linear systems, where each component is, in a way, siloed and contained within its own boundaries. You can glue multiple tools together to create a web of capabilities, but the ultimate direction is always from one tool to another, one layer to another.
This linear model seems natural, but it doesn’t help data bear optimal fruit. Instead, the value of data is truly unlocked through unification. Your data stack should behave like connective tissue, where each capability merges with the others so that the whole works as one limb.
The simplest example is a data plane that taps into polyglot data sources so that, for the user, it feels like querying one big database instead of writing specific access patterns for each source. Moreover, this plane would be discoverable, secure, quality-approved, and observable. All these capabilities appear as parts of one whole instead of separate tools and layers for observability, governance, and discoverability.
This is the bare minimum infrastructure for a data product.
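To make the idea of a single data plane over polyglot sources a little more concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than a real product's API: the `DataPlane` class, the `sqlite_reader` and `csv_reader` helpers, and the sample `orders` and `customers` data are all hypothetical. It only shows the shape of the idea: one query interface, a discoverable catalog, and a crude access log standing in for observability.

```python
import csv
import io
import sqlite3
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]


class DataPlane:
    """Hypothetical unified access layer over heterogeneous sources."""

    def __init__(self) -> None:
        self._sources: Dict[str, Callable[[], Iterable[Row]]] = {}
        self.access_log: List[str] = []  # crude stand-in for observability

    def register(self, name: str, reader: Callable[[], Iterable[Row]]) -> None:
        """Attach a source; the reader hides the source-specific access pattern."""
        self._sources[name] = reader

    def catalog(self) -> List[str]:
        """Discoverability: list every dataset the plane exposes."""
        return sorted(self._sources)

    def query(self, name: str, **filters: str) -> List[Row]:
        """Same call shape for every source, with simple equality filters."""
        self.access_log.append(name)
        rows = self._sources[name]()
        return [
            row for row in rows
            if all(str(row.get(key)) == str(value) for key, value in filters.items())
        ]


def sqlite_reader(conn: sqlite3.Connection, table: str) -> Callable[[], List[Row]]:
    """Reader for a relational source (table name is assumed to be trusted)."""
    def read() -> List[Row]:
        cursor = conn.execute(f"SELECT * FROM {table}")
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, map(str, record))) for record in cursor]
    return read


def csv_reader(text: str) -> Callable[[], List[Row]]:
    """Reader for a file-like source, here an in-memory CSV string."""
    def read() -> List[Row]:
        return [dict(record) for record in csv.DictReader(io.StringIO(text))]
    return read


# Two different kinds of sources (assumed sample data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "EU"), (2, "US")])

plane = DataPlane()
plane.register("orders", sqlite_reader(conn, "orders"))
plane.register("customers", csv_reader("id,region\n10,EU\n11,APAC\n"))

# The caller uses one interface regardless of where the data lives.
eu_orders = plane.query("orders", region="EU")
eu_customers = plane.query("customers", region="EU")
print(plane.catalog(), eu_orders, eu_customers)
```

The caller never writes SQL or CSV parsing code; adding a new source means registering one more reader, while the catalog and access log stay part of the same object instead of living in separate tools.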
The data infrastructure, from the perspective of data products, inherently supports a product-like experience. In other words, you can imagine that your data stack serves as a product instead of submitting to the chaos of complex pipelines and fragmented tooling. This infrastructure checks off key objectives such as unification, interoperability, polarity, reactiveness, ripple, and resonance. We’ve drafted a rather elaborate piece on the attributes that must be checked off by an infrastructure that intends to support data products. Read the full story here:
Community Space 🫂
We’ve always drawn a lot of inspiration from the community and often source resonating ideas from the larger group. This is a dedicated space for all the voices that have been moving the needle and can help us go a step further in our data journey.
What we admire about the community is that it has specifically focused on the value of data products and the infrastructure behind them without getting carried away by the glamour of another new paradigm.
Deepak Bhardwaj recently talked about five ways to accelerate innovation with self-service infrastructure.
He defines self-service infrastructure as a principle that enables developers and other stakeholders to provision and manage the resources they need to create and deploy data products more quickly and efficiently without relying on IT or operations teams. It is a crucial part of a data mesh architecture, a decentralised and domain-oriented approach to managing data within organisations.
The five targets he sets for organisations with regard to infrastructure include:
Competitive advantage
Cost savings and resource allocation
Culture of innovation and speed
Customer satisfaction and revenue growth
Alignment of IT with business goals
In the second half of this piece, he elaborates on the implementation angle.
However, some pitfalls must be avoided while building or enhancing data infrastructure. Benjamin Rogojan talks about falling into the trap of a duct-tape data stack and how to avoid it.
This style of architecture remains a popular solution because it is so easy to set-up. Depending on how complex your pipelines are, how many you have and how often they are run. It really might not be the worse solution to have a few scripts that run your pipelines 1-2 times a quarter.
But.
Eventually, as your teams start needing daily updates, and live data feeds, you will have to switch over to a more mature and modern solution.
We also came across an amazing podcast after a recent conversation with Jon Cooke on a LinkedIn thread. Jon talks about the objectives of a data product and how exactly it meets business goals, which is ultimately the primary goal of the whole exercise.
In this podcast, he shares his perspective on data products and the infrastructure behind them in much more depth.
You get more fine grained as you go forwards, and the infrastructure needs to be able to support that…You break it down into individual technology components for the different use case with shared services like cloud, you actually get a much more flexible, much more Agile-type infrastructure that can actually cope with change and actually can cope with the full lifecycle.
Events 🎙️
Future Data Driven Summit 2023
Future Data Driven Summit is a free online event focused on Data Platforms. The summit aims to update attendees on the latest developments in Data & AI, DevOps, PowerBI & Visualization, Integration & Automation, and cloud infrastructure. This event is ideal for IT professionals, data engineers & analysts, data scientists, AI & machine learning engineers, business analysts, and developers.
Some amazing folks from the modern data space will be leading informative sessions on new trends in the field. To name a few: Taiob Ali (Microsoft MVP - Data Platform), Surbhi Pokharna (Charles River Development, Director, Cloud Data Platform Services), Minesh Chande (AWS - Sr Solutions Architect), and many more!
Event: 27 - 28 September 2023
Mode: Online
Gartner Data & Analytics Summit
Gain access to vendors at the forefront of technology. Don’t miss out on this unique opportunity to meet your peers, evaluate solution providers and explore what they can do for you.
View the conference brochure featuring special programs that will take your conference experience to the next level.
Speakers include some renowned names from the data space. To name a few: Alex Burton (CBA - GM business data products and data architecture), Emily Cornock (Suncorp - Manager: Core Data Platforms), Amjad Bashir (Chief Data Officer), and more!
Event: 31 July – 1 August 2023
Mode: Offline (Sydney, Australia)
Thanks for Reading 💌
Here’s a breather for you for sticking around till the end!
🙌🏻 Follow us on LinkedIn and Twitter for the latest updates on what's buzzing in the modern data space.
Feel free to reach out to us on this email or reply with your feedback/queries regarding modern data landscapes. Don’t hesitate to share your much-valued input!
ModernData101 has garnered a select group of Data Leaders and Practitioners among its readership. We’d love to welcome more experts in the field to share their stories here and connect with more folks building for the better. If you have a story to tell, feel free to email us!