Data Boundaries are Blurring in a Multi-Cloud World

Snowflake and Databricks join the Big Three in race to build cloud data platforms

Cloud Data Platforms

All of us in the tech field have been talking for years about the value of data. Data is everything to a company, especially as more and more of it is created every day. Without data, no company will survive, let alone thrive, in the current digital age. During AWS re:Invent, the keynotes from both Swami Sivasubramanian and Adam Selipsky focused heavily on data, AI, and ML, underscoring how data is the genesis of any and all innovation today. But as Charles Lamanna said during our recent Intelligent Applications Summit, “Big data is dead.” The world is moving from Big Data to Big AI, and companies need to figure out their AI strategy to truly take advantage of the data they have access to and be on the cutting edge of insights and predictive capabilities.

It is generally understood that companies have massive amounts of data, but what they’re doing with that data is more critical than ever. At Madrona, we believe the next generation of applications will be intelligent applications — especially if the companies developing them want to thrive. And these applications need access to data — not just any data — they need a mix of public data and real-time, actionable, proprietary data from across their company to feed directly into AI/ML models, creating a continuous learning system that produces powerful and predictive insights.

To accomplish this and remain competitive in the future, every company must accept that the boundaries around data are blurring. As much as companies want to standardize and unify their data storage solution, the reality is that every company must deal with multiple data storage systems. We live in a world with robust data infrastructure needs that require operating in multiple cloud data platforms — nobody has just one, though the providers would like you to.

Blurred lines

The evolution to cloud data platforms that allow companies to create these next-generation data-driven intelligent applications has only happened in the last two to three years. While every company wants one easy solution to solve all its data needs, that is not the reality. Companies will forever try to standardize, but even as they approach standardization, a department will make a decision independent of another, or a company that operates on a different platform will be acquired. Each cloud data platform makes it easy to share data within its own environment, encouraging customers to keep their data there. But for companies operating on multiple clouds, much of the cloud-native tooling leaves functional gaps. As different components in an environment are stitched together, too many places require heavy lifting with human intervention. With more data constantly being created (10x more data in the next five years, for example), the more these data platforms can remove friction and automate, the more sought after they’ll be. That is the new reality in which services need to be anchored as well: one that takes into consideration automation, governance, privacy, and security for data streaming in from multiple locations on multiple clouds and multiple data systems.

Convergence

While the Big Three cloud providers have cloud data platforms and offer services for them, surprisingly, we have five companies that have emerged as leaders in this nascent space. Over the last several years, Snowflake and Databricks have emerged as platforms for data-centric applications. They each came at the data application equation from different directions, but their focus on these business-critical applications has brought them into the battle for IT and development budgets alongside their partners and competitors — Amazon, Microsoft, and Google. Each of these five players started with their roots somewhere else. But they’re now converging in this cloud data platform category to build similar platform functionality for users.

Amazon has Redshift, DynamoDB, and other data-related services sitting on top of AWS. Microsoft has multiple services and capabilities under the umbrella of its Azure Synapse cloud data platform. Google’s cloud data platform is BigQuery, which has multiple functionalities centered around it. Snowflake started out dealing strictly with data warehousing in the cloud. Databricks started out storing and processing large amounts of unstructured data in data lakes so users could do interesting things with data. But now they’ve both stepped into each other’s territory.

One thing all five of these companies need to consider is how to build and deliver a great developer experience, with tooling that supports end-to-end application development. Whichever platform does that best will win a disproportionate share of developer mindshare.

Winner

Humanity loves to try to predict who will come out on top. All across the internet, you can find arguments over which cloud provider is currently best. You would think that because the modern data stack is centered around the cloud, the Big Three would be dominant in the cloud data platform category, but Databricks and Snowflake have shown that startups can grow up to compete in the big leagues. While the traditional cloud provider debate is still happening, people have now also moved on to debating who will win the cloud data platform piece of the cloud. But I would argue there is enough space and need for all five cloud data platforms to be the foundation of the next-generation, data-driven intelligent applications being developed, as long as the platforms remove the friction of moving data and the services available are multi-cloud, eliminating the prominent vertical silos of yesteryear.

While there are opportunities for multiple players to have meaningful market share, what matters is that these platforms are now available to build the next generation of intelligent applications in the best, fastest way possible. That is the breakthrough that gave us automatic transcription services, programs that edit photographs automatically, and the ability to copy documents and get auto-generated summaries of their content. We just had to wait for the technological advancements of the Snowflakes and Databricks of the world and for people to be comfortable operating their businesses in the cloud. But to ensure other layers of the modern data stack can be used to their full potential, enabling more and more applications to emerge from the nearly endless possibilities of AI and ML models, companies and customers alike must move on from the world of single-cloud data platform services.

What we have seen over the last two to three years is an evolution that will be as great as, if not greater than, the emergence of the public cloud, and we are excited to see how these five and other visionary companies will continue to build the capabilities of the modern data stack, making it more accessible to a broader range of companies and users.

Related Insights

    Foundation Models: The future isn’t happening fast enough — Better tooling will make it happen faster
    Data Visionary Bob Muglia on the Modern Data Stack and Lessons from Snowflake
    Introducing the 2022 Intelligent Applications 40
    Intelligent Applications 40 winners 2022
