Over the past decade, the landscape of data tooling and infrastructure has transformed dramatically. Having founded a cloud data infrastructure company in 2009 and a meetup community for data engineers in 2013, I have watched this community evolve since long before "data engineer" became a formal job title. That perspective allows me to reflect on the lessons of the past and how they should shape the new AI era.
In the realm of tech anthropology, 2013 marked the transition from the "big data" era to the "modern data stack" (MDS) era. During the big data period, the prevailing belief was that more data meant better insights, and that those insights held the key to unlocking new business value.
As a strategic consultant for a major internet company, I was once tasked with developing a strategy to analyze the massive data output from billions of daily DNS queries to uncover a potential $100 million insight. Unfortunately, despite our efforts, we were unable to identify any such insights within the project's limited timeline. This experience reinforced a crucial lesson: while storing vast amounts of data is relatively straightforward, extracting meaningful insights is a complex and resource-intensive endeavor.
Recognizing this challenge, companies rushed to strengthen their data infrastructures, driven by the mantra that insights could only be generated if their data systems were optimized. This rush led to an explosion of data tools, as vendors claimed to offer the missing piece of a complete data stack that could yield those elusive insights.
The term "explosion" is not used lightly: according to Matt Turck's 2024 MAD (Machine Learning, AI, and Data) Landscape, the number of companies offering data infrastructure tools surged from 139 in 2012 to 2,011 in 2024, a staggering 14.5X increase.
The Challenge of Tool Overload
Several factors shaped today's data landscape. Many enterprises migrated their on-premises workloads to the cloud, with modern data stack vendors providing managed services designed for reliability, flexibility, and scalability.
However, as companies expanded their toolsets during the zero interest rate policy (ZIRP) period, significant challenges emerged. The complexity of utilizing multiple disparate tools, integration difficulties, and underutilized cloud services raised doubts about whether MDS could deliver on its promises.
Many Fortune 500 companies invested heavily in data infrastructure without a coherent strategy for realizing value from that data. The allure of collecting a wide array of tools led to redundancies, as teams within the same organization often leveraged overlapping platforms, such as Tableau and Looker, leading to inflated costs without corresponding benefits.
Despite the eventual bursting of the ZIRP bubble, the MAD landscape continues to expand. Why is this?
The New AI Stack
Many data tooling companies, well capitalized during the ZIRP era, remain operational despite tighter enterprise budgets and shrinking market demand. A significant factor is the strong interest in AI, which has spawned a new wave of data tooling before the previous era's tools underwent any substantial market consolidation.
The “AI stack” represents a fundamentally new paradigm. While traditional data stacks were designed for structured data, the new wave of AI thrives on massive unstructured data sets—text, images, and video. Additionally, generative AI models distinguish themselves from older, deterministic machine learning models by producing varied outputs even from unchanged inputs, as seen with tools like ChatGPT.
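To make that contrast concrete, here is a minimal, hypothetical Python sketch; the classifier rule, vocabulary, and token probabilities are all invented for illustration and do not reflect any particular model. A deterministic model maps the same input to the same output every time, while a generative model samples from a probability distribution, so identical prompts can produce different completions.

```python
import random

# Deterministic ML: the same input always yields the same output.
def classify(features):
    # A fixed rule stands in for a trained classifier.
    return "spam" if features["link_count"] > 3 else "not spam"

# Generative AI: the output is sampled, so identical prompts can diverge.
# The vocabulary and probabilities below are invented for illustration.
NEXT_TOKEN_DIST = {"insights": 0.4, "value": 0.35, "noise": 0.25}

def generate(prompt, n_tokens=3):
    words, weights = zip(*NEXT_TOKEN_DIST.items())
    tokens = random.choices(words, weights=weights, k=n_tokens)
    return prompt + " " + " ".join(tokens)

features = {"link_count": 5}
print(classify(features), classify(features))  # identical every run
print(generate("Data yields"))                 # run it twice and the
print(generate("Data yields"))                 # outputs will often differ
```

This sampling behavior is precisely why exact-match testing, the workhorse of traditional ML validation, breaks down for generative systems.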
Given these differences, developers must adopt new methodologies to evaluate and monitor AI model outputs, ensuring ethical governance and effective integration. Key areas of focus should include agent orchestration (inter-model communication), the development of specialized models for niche use cases, and innovative workflow tools for dataset curation.
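As a rough illustration of what such a methodology might look like, the sketch below scores a batch of sampled outputs against a simple rubric and reports a pass rate rather than a single verdict. The `call_model` function and the rubric checks are hypothetical placeholders, not any vendor's API.

```python
import random

def call_model(prompt: str) -> str:
    # Hypothetical stand-in for a generative model call; real outputs
    # would come from an LLM and vary run to run, as simulated here.
    return random.choice([
        "You are eligible for a refund within 30 days.",
        "Refunds are processed in 5-7 business days.",
        "We guarantee lifetime refunds on everything.",  # a bad output
    ])

def passes_rubric(output: str) -> bool:
    # Hypothetical checks: stays on topic and avoids an
    # over-promise the business cannot honor.
    return "refund" in output.lower() and "guarantee" not in output.lower()

def evaluate(prompt: str, n_runs: int = 20) -> float:
    # Score many samples and report a pass *rate*, since a single
    # run of a non-deterministic model proves little.
    hits = sum(passes_rubric(call_model(prompt)) for _ in range(n_runs))
    return hits / n_runs

print(f"pass rate: {evaluate('Explain our refund policy'):.0%}")
```

Gating deployment on an aggregate threshold, say a 90% pass rate, acknowledges the variance in model outputs instead of pretending it away.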
Numerous startups are already addressing these challenges, leading to the emergence of cutting-edge tools within the new AI stack.
Building Smarter in the New AI Era
As we navigate this new AI era, it's crucial to acknowledge our past. Data serves as the foundation of AI, and the myriad tooling options available today have paved the way for treating data as a vital asset. Yet we must ask ourselves how to avoid repeating the excesses of the past as we forge ahead.
One approach is for enterprises to clarify the specific value they expect from any particular data or AI tool. Overcommitting to technology trends without a strategic purpose can be detrimental, especially as the AI buzz consumes both attention and budgets. It’s essential to prioritize tools that demonstrate clear value and measurable ROI.
Founders should also be cautious about creating "me too" solutions. Before pursuing a new tool in a crowded market, they should evaluate whether their team possesses unique insights and differentiated expertise that would truly add value.
Investors, too, need to critically assess where value will aggregate across the data and AI tooling stack before investing. Relying solely on a founder’s pedigree from prestigious companies can lead to an oversaturated market filled with undifferentiated products.
A compelling question was posed at a recent conference: “What is the cost to your business if a single row of your data is inaccurate?” This prompts businesses to establish a clear framework for quantifying the value of data and data tooling within their operations.
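One hypothetical back-of-the-envelope way to start answering that question, with every figure below invented purely for illustration: multiply the error rate by the number of affected rows and the cost per bad row, then compare the result against what a remediation tool would actually cost.

```python
# All figures are invented for illustration, not benchmarks.
rows_per_year = 50_000_000   # rows flowing through a pipeline annually
error_rate = 0.001           # fraction of rows that are inaccurate
cost_per_bad_row = 0.25      # dollars: rework, bad decisions, churn

expected_annual_cost = rows_per_year * error_rate * cost_per_bad_row
print(f"expected cost of bad rows: ${expected_annual_cost:,.0f}")  # $12,500

# A data-quality tool is worth buying only if it prevents more cost
# than it adds: tool_price + integration_effort < expected_annual_cost.
```

Even a crude estimate like this gives a budget ceiling against which any data-quality tool can be judged.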
Without this clarity, no amount of investment in data and AI tools will resolve existing confusion.
Pete Soderling is the founder and general partner of Zero Prime Ventures.