In this week’s episode, which is leading up to our Intelligent Applications Summit on November 2nd, Soma speaks with Bob Muglia. Bob has thought deeply about the Modern Data Stack, and they speak about it here: what is needed in the data stack to enable Intelligent Applications (or data-driven apps, as Bob calls them) and the opportunities for new companies to innovate. Bob is also well known as the CEO who took Snowflake from a promising application for the public cloud to success by focusing on the problem of scaling a data warehouse in the cloud and by building product and sales teams that could win the hearts and minds of their loyal customers. Bob talks here about the early days after he joined Snowflake, what he did to get a product to market, and how partnering with the big public cloud providers worked and had its challenging moments. It’s a great view into how both Soma and Bob are thinking about the future of enterprise data and intelligent applications.
Note: Bob is on the board of IA40 company Fivetran, which was the focus of a recent podcast, and he is chairman of the board of FaunaDB and RelationalAI, Madrona portfolio companies. Madrona holds shares in Snowflake.
This transcript was automatically generated and edited for clarity.
Soma: Bob, good afternoon. It’s fantastic to have you here with us. I’m very excited to talk to you about the future of data and data-driven and intelligent applications.
Bob: Good to be here with you, Soma.
Soma: Absolutely, Bob. As you know, we at Madrona have been longtime believers in ML/AI and, more importantly, in how we apply ML/AI to different enterprise use cases and scenarios to build what we refer to as next-generation intelligent applications.
As I was getting ready for this session, I couldn’t think of a better person to have this conversation with, and let me tell you why I say that. First of all, let me introduce you. Here with me is Bob Muglia, the former CEO of Snowflake, and prior to that, a long-term senior executive at Microsoft. He has done a variety of incredible things in his career, a lot of it data-driven, and this is where I come back to why I think you are the best guy for this conversation. Bob, ever since I’ve known you, and I’ve known you for almost 30 years now, I have thought of you as a data guy first and foremost.
Go back to when you started your career at Microsoft: you were a Product Manager in SQL Server. And through the following decades, I’ve seen you do something or other with data in one way, shape, or form. After leaving Microsoft, you decided to take on the reins of Snowflake when it was a pre-product and pre-revenue company. You spent over six years at Snowflake, growing it literally from zero to hundreds of millions of dollars of revenue. And I think you laid a lot of the foundation for Snowflake to be the leader that it is today in the cloud data platform world. After your stint at Snowflake, you’ve been working with half a dozen or more startups, private companies I should say, as an investor, adviser, and board director.
The one common thread among all these companies is they all are doing something or other with data. I just look at the body of work behind you, and I say, “What a fantastic opportunity for us, and by extension, our audience to be able to hear from you about the future of data and how you see the world of intelligent applications evolving.” With that as a backdrop, I thought let’s just dive into some questions to kickstart this conversation. Let’s first go back to your days at Snowflake. As I just finished mentioning, when you started at Snowflake the team was still working on a product.
The product wasn’t in the market, and you went through what I call the “growing pains” of birthing a product and bringing it to market, thinking about the business model, and getting it to scale. But along the way, I’m sure there were a handful of what I call “defining moments,” where you had to make a decision or think about something that literally laid the foundation for why Snowflake is who it is today and what it is today. Can you think about a couple of those defining moments and share with us what they were and how you navigated through them?
Bob: Sure. There were a couple of things that happened in the early time of Snowflake. You’ve got to go back to the period we’re talking about, which is 2014, 2015, when it was the early days of the cloud. Really, AWS was the most viable cloud at the time. Azure was still very early, and GCP was in some ways even earlier. It was a very different time. And a lot of the focus of Snowflake was really about changing that. But a big part of it was also getting the product to the market, because we were fortunate in the sense that we could scale to data of basically any size and to as many users as you wanted to throw on it, while having only one copy of data for the whole organization instead of having copies scattered hither, thither, and yon, which was the default at the time. So, it was a revolutionary product, but it still had to come to market.
And it was funny because when I started at Snowflake, the founders said to me that their plan was to make the product generally available to enterprises in about six months. That was in June of 2014. I knew that was somewhat unlikely, frankly, from all of our experience. You’re smiling, Soma, so you know what this is like in terms of developers in the early days. And I watched them for a period as they went through a couple of these two-month milestones that they were doing. And I had this observation that during those two months, they said they were going to do a bunch of things, and basically, none of them got finished during that period of time.
Other things did get done. They were certainly working hard, but it wasn’t like they were really working towards some well-defined goals. One of the things I focused on was really trying to help bring some rigor and discipline to what it means to be an enterprise-class product. Over the period of the next year or so, a little bit less than that, we went through a process whereby we really defined what general availability meant, and we focused on getting those tasks done.
I literally turned weekly team meetings into a project review. The only thing the salespeople cared about was the status of the product; it was what everybody cared about. We went through a focused effort and got the product shipped in the middle of June. And that was really the beginning of the Snowflake experience.
Another thing: there are always things that happen to companies in their early days, and they survive them or not. One of the more challenging things we went through early on involved FoundationDB. Inside Snowflake, the transactional heart of the product is a technology called FoundationDB.
And FoundationDB at the time, in 2014, was a company. It was actually a sister company of one of our VCs, Sutter Hill, so we knew the company well. I was in the process of negotiating an agreement, and I was able to negotiate that if anything happened to that technology, if it went off the market, we had access to it through a code escrow. Of course, we hoped that would never happen, but it turns out it did happen.
Seven or eight months later, Apple bought FoundationDB and immediately took the product off the market and made it unavailable, which was our worst nightmare. Fortunately, we were running it, we’re a bunch of database people, and the source code escrow actually worked. We got the product through that, but we had to do what it took to learn how to be good at fixing bugs in FoundationDB.
And we pretty much had to do that all from scratch. That was a very big deal. Had we not had access to FoundationDB, there really was no other good choice. Not at that time. Products like Fauna and Cockroach didn’t exist. And I don’t know what we would’ve done, to be honest with you.
I honestly don’t know what we would’ve done. But we survived that, and now, fortunately, FoundationDB is open sourced, and that’s very healthy. Snowflake is actually a major contributor to it. So it’s actually a really good story, but it was a tough one.
So things like that happen in your process. I would say the other thing is just customers. I focused all the time on being successful with customers. We didn’t lose customers, basically speaking, because we didn’t take on things we couldn’t do. Occasionally we would turn people down because we couldn’t do the work they wanted us to do. And we really focused on the success of working with customers.
Soma: That’s super helpful, Bob. I also remember that there was a lot of talk during the initial days of Snowflake about separating out compute and storage, and how that could enable us to get to the right level of scale and economics, which would be good for our customers and hence for us. Any color on how that came about?
Bob: I think architecturally, separating compute and storage was always part of the design that Benoit and Thierry had. The architecture at Snowflake has something called Global Services, which manages the metadata and does the query planning.
And then they have an execution processor, the virtual warehouse, that runs the actual SQL jobs. And now I believe it’s running Python jobs too, and other languages. So it’s become multi-language, really. The evolution of that whole thing changed dramatically over time, including how we stored the metadata, and that separation of the metadata was a fundamental component of Snowflake.
But the way we did it certainly changed over time, and I think we were able to stay ahead from a scale perspective. I always said it was interesting because we were just ahead of our customers. In the early days, they were chasing our tail on scale in a variety of ways. We were always working hard to stay ahead of customers so that customers had a great experience.
Soma: Bob, I do want to take this opportunity to say thank you for allowing both Madrona and me to invest in Snowflake, be a part of the journey along with you, and see it come to scale.
Bob: …you guys were super helpful too. At the time we were opening the Bellevue office, which was, I think, a very pivotal office for Snowflake. Of course, Madrona has such strength in the Seattle area.
Soma: I’m glad it all worked out well. But when I got involved in Snowflake, one of the things I heard a fair bit was that you’ve got all these big cloud platform providers, whether it’s AWS or Azure or GCP, all wanting to have their own solution in the space, that they were going to crush Snowflake, and that a fledgling startup just could not compete with any of these massive, at-scale cloud providers. But somehow Snowflake navigated through that, reached a level of scale and success, and is literally a leader in the cloud data platform world today.
There are two questions I want to ask you in that context. First, how did you feel when everybody was probably telling you, “Hey, all these big guys are there, they’ve got their own data warehousing solution in the cloud,” and what made you confident that Snowflake was going to be able to navigate through it?
The second part, which I want to focus on, is the partnership that Snowflake had with all the cloud providers. On the one hand, you could argue that if a customer goes with Snowflake on Azure, it is still a win for Azure. On the other hand, if you think about Snowflake running on AWS, Snowflake is competing with Redshift on AWS, right?
So you’ve got what I call a cooperative-and-competitive middle ground: you are partnering with the platform, but you’re competing with the service. How did that whole landscape work out for you?
Bob: Yeah. So in terms of how we competed with the big cloud vendors, we had a better product. It was really that simple. As I said many times, if Snowflake had been 10% or 20% better than Redshift, Snowflake wouldn’t have gained any material share, but it was many times better.
It worked in situations where Redshift didn’t work. And Redshift is a very good product. By bringing out Redshift very early in the marketplace, which they did, Amazon paved the road for the cloud data warehouse, and for that I’m eternally grateful.
The thing is, it was an on-premises product brought to the cloud, so it didn’t really take advantage of the cloud. It was cheaper, definitely cheaper than anything you could buy on premises, but it didn’t ultimately scale.
People in the cloud world particularly wanted it to scale. In fact, I remember vividly being on the subway in New York with one of my earliest salespeople, Vince Trada, in February or so of 2015. We were seeing customers who were talking about adopting Redshift.
And Vince said to me, “Bob, don’t worry about this, because every one of those customers that adopts Redshift is going to come to us in the next 18 months when they run out of gas.” And he was right. That’s essentially what happened. A lot of Snowflake’s early business was Redshift conversions, as well as working with semi-structured data, which we did a good job on and nobody else did.
Certainly we were better than Hadoop, which is what people were using at the time. And so that was a major part of the success for us. We were just better, and frankly, particularly in AWS, we had a much better product. We were also lucky in our timing, and that’s the other thing: it was the right time. That was the time for establishing a position in the data space in the cloud, because it was all pretty new.
In terms of our relationships with the vendors, they were challenging, to say the least. We certainly had many challenging times with Amazon, who we were competing with in Redshift. What I would say, first of all, is that Amazon did an incredibly good job of supporting Snowflake at all times. They were great at support, and AWS is a great product to build on top of.
But they fought brutally against us in the business marketplace in the early days, and it was pretty challenging sometimes. But we were winning. We won those challenges partially because, again, we had a better product, and frankly, we had a much better trained sales team.
Our sales team was able to outsell Amazon’s. So that was the early days with Amazon. Then, when we moved to Azure, which we established as our second cloud in part because of my relationships with Microsoft people, we were able to build good partner relationships there. We actually had some amazing, very positive go-to-market motions with Microsoft in the early days, where they did a bunch of joint selling with us, and we really discovered a whole different business.
What we discovered was that Azure had a whole set of customers we’d just never seen before. It’s almost a whole different market. We always said that customers choose their cloud first and then choose their data warehouse.
And Snowflake ran on all of them, so that makes it a little easier. But at the time we were just running on AWS and then Azure, so it was positive. It was a win-win situation in some senses for Microsoft and Snowflake to go to market together. I think that about the time I left Snowflake in 2019, Snowflake was probably becoming more competitive in a number of ways.
And in some senses the strength of Snowflake’s partnerships flipped, really. They had a rough time with the Azure folks for a while, and they actually built some very strong relationships with AWS.
There are a lot of good things happening there. I think Google is still tough, if I’m not mistaken. Google is, generally speaking, not the most partner-centric company on the planet. And I know that’s been a little bit more challenging for Snowflake, in part because they really love BigQuery, and they have the same feelings about BigQuery that the AWS folks used to have about Redshift.
Only time will tell. These relationships are challenging because they’re both complementary and competitive.
Soma: Yeah, the thing that was interesting for me to watch is that there would be a phase when you would think, hey, this particular cloud provider is the best partner. And then things would change, and then things would change back. There was real volatility in the partnerships as Snowflake went from strength to strength, depending on where the other cloud providers were. It was fascinating to see what an ever-changing landscape it was.
Bob: It just proves my first rule of partnership, Soma: partnerships are tactical. When it’s win-win, they work. When it’s not, they start to falter a bit.
Soma: But I think, you know, Snowflake could be an inspiration, a good role model, or a good case study for a lot of the new startups that are coming up and saying, “Hey, am I competing with the cloud providers or am I partnering? How do I navigate this tough thing?”
Depending on what space they are in and what the cloud provider’s aspirations are, many companies could be in a similar situation. That’s why I wanted to make sure that we talked a little bit…
Bob: That’s very true, in fact. I have this conversation with a number of the companies that I talk to, about their potential conflicts with cloud vendors. A lot of the stuff people are working on these days is complementary; it’s new things that I don’t think have the same kind of conflicts that we had with Snowflake.
I do think in general, though, that Snowflake is a good role model for building a partner-centric company. In addition to really working with the strategic cloud vendors and spending a lot of energy there, we spent a massive amount of time working to build an ecosystem and working with partners all around, whether they be BI partners, ML partners, ETL partners, whatever it might be. I feel very good about what Snowflake has done in that space.
And I definitely felt like I had something to do with that. Our shared history together at Microsoft is where I learned those lessons.
Soma: Great, Bob. I thought it’d be good to take a step back now. As I mentioned, you’ve been working in data in one way, shape, or form for the last 30 years. How have you seen the…
Over 30 years…
Bob: Been over 30 years. It was Windows NT Summit. It was Windows NT.
Soma: Over 30 years. But during this period of time, Bob, how have you seen the world of data evolve? New platforms, new computing paradigms, new devices, new everything has happened. But the importance of data seems to have only gone from strength to strength, and it has gone up exponentially in the last 10, 15 years.
Now, I wanted to get your quick thoughts on “Hey, where do you see data today and where do you see data moving forward?”
Bob: It was just over 30 years ago that Bill did his “Information at Your Fingertips” talk, which I was really working on. I was a program manager on SQL Server when I started at Microsoft, and I had been working on database things, really building applications inside a company, before I joined Microsoft.
So I’ve been focusing on data pretty much my whole career. And while I’ve been focusing on SQL on the business side, I still feel in some senses like the perspective began with Information at Your Fingertips and all the focus that we had at Microsoft on information of all kinds, on building out businesses, and on enabling people to work with data.
I was involved in SQL Server from very early on. And then I watched as other folks at Microsoft built SQL Server into the business that it really became. I watched that transform. I watched these kinds of data systems, together with the applications that sit on top of them, transform businesses of all sizes.
And Microsoft’s contribution was the “of all sizes,” really. If you were a big company, you could buy a big, expensive set of systems from IBM, Digital, or Sun. But Microsoft made servers that were quite inexpensive and brought computing to literally millions of small businesses around the world, maybe tens of millions, that never had it before.
And data was really a central part of that. Now, clearly, we’ve gone through the internet era, and the evolution of that has been new types of data that have become important. In particular, semi-structured data that’s generated in large quantities by machines is, in some ways, some of the most important data we analyze today.
We’re now living in a cloud-centric world, which allows us to do things that we never could do before. I am a big believer that data is generated everywhere, but you need to centralize it to a certain extent to do analysis around it, to bring different types of data together so that you can look at the relationships between them and perform the kinds of dashboarding that people want to do, as well as deeper analysis with machine learning. So things have changed dramatically from a fairly simple environment where people literally worked with pencil and paper.
Excel, or 1-2-3, was a massive step forward in dealing with information. Now we have this world of the cloud, where we have this vast amount of data available to us. It’s pretty amazing, really.
Soma: You just summed it up, Bob. It’s pretty amazing how far we’ve come. But for all the progress we’ve made, I feel like there is still a ton more that is waiting to happen. And the rate of innovation is only getting faster, not slower, as we move forward here.
Today, you can’t have a conversation about data without talking about the modern data stack. It’s a buzzword or a new concept, however you want to think about it. But everybody talks about the modern data stack. In your mind, how do you define the modern data stack?
Bob: People have been trying to work with data in a variety of ways, and fundamentally, the ability for companies to work together to provide a complete solution for organizations on the cloud has never been as strong as it is today. That’s really what the modern data stack is about.
It’s really about enabling the industry to work together to provide solutions to companies. Those solutions take on a certain shape in the modern data stack, and there are three defining characteristics that I think exemplify it. The modern data stack is really data analytics that is delivered as a SaaS cloud service, first and foremost. That means that instead of building these components, you’re purchasing them from third parties that are providing the service for you, and it means that a lot of things are taken care of for the customer. So the first thing is that it’s a SaaS service.
The second thing is that it runs in the cloud, and it takes advantage of the scalability that the cloud provides to allow you to work with all of your users and any kind of data. I mentioned earlier that data is both structured data that people work with in SQL and semi-structured data that comes out of machine-generated systems.
But more and more, it’s also other types of data that are quite rich in terms of their content. People sometimes refer to this as unstructured data. I really tend to think of it as complex data. Data types such as video, audio, and photos turn out to be a rich source of complex data.
All of these things that exist in business in the form of documents of all kinds and recordings of all kinds are essentially sources of data for the modern data stack. And with the cloud, it needs to scale to work with as much data and as many users as you want. The final point is that when you’re doing the analysis against it, the data is modeled for a SQL database.
And that, I think, is a distinguishing element of the modern data stack. When the data comes into the system, there are multiple techniques for actually transforming it, so let’s put that aside. But the target environment you’re modeling it for is a SQL database, and you use relational commands, relational algebra basically, to operate against that data in a relational form. So three things: data analytics as a service, which leverages the cloud for scale and models data for SQL.
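Editor’s note: Bob’s third characteristic, modeling data for SQL, can be illustrated with a small sketch. The snippet below is only an illustration, not how Snowflake or any other product works: the event format, the field names, and the `readings` table are hypothetical, and Python’s built-in `sqlite3` module stands in for a cloud data warehouse. It flattens a few machine-generated, semi-structured JSON events into relational rows and then analyzes them with an ordinary SQL query.

```python
import json
import sqlite3

# Hypothetical machine-generated events (semi-structured JSON). The field
# names are illustrative assumptions, not taken from any real system.
RAW_EVENTS = [
    '{"device": "sensor-1", "ts": "2022-10-01T00:00:00", "reading": {"temp_c": 21.5}}',
    '{"device": "sensor-1", "ts": "2022-10-01T01:00:00", "reading": {"temp_c": 23.5}}',
    '{"device": "sensor-2", "ts": "2022-10-01T00:00:00", "reading": {"temp_c": 19.0, "humidity": 40}}',
]


def load_events(conn, raw_events):
    """Flatten each JSON event into one row of a relational table."""
    conn.execute(
        "CREATE TABLE readings (device TEXT, ts TEXT, temp_c REAL, humidity REAL)"
    )
    rows = []
    for line in raw_events:
        event = json.loads(line)
        reading = event.get("reading", {})
        # Fields missing from an event become NULL in the table.
        rows.append(
            (event["device"], event["ts"], reading.get("temp_c"), reading.get("humidity"))
        )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", rows)


def average_temp_by_device(conn):
    """Once the data is modeled relationally, analysis is plain SQL."""
    cursor = conn.execute(
        "SELECT device, AVG(temp_c) FROM readings GROUP BY device ORDER BY device"
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for a warehouse
load_events(conn, RAW_EVENTS)
averages = average_temp_by_device(conn)
print(averages)
```

The point of the sketch is the target shape: however the raw data arrives and is transformed, it ends up in tables that relational queries can operate on.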
Soma: That’s great. And Bob, I don’t know whether I would’ve predicted this; in hindsight it’s easy to say I thought about it this way. But today you could say that there are a few big technology vendors providing vast parts of the modern data stack.
You’ve got the three cloud platform guys in Microsoft, Amazon, and Google. And then you’ve got Snowflake and Databricks. The fact that Snowflake and Databricks have literally come out of nowhere in the last, let’s say, eight years or so is fantastic, because it shows that you can innovate, you can get to scale, and you can get to a level of success even outside the biggest platform guys.
And that, I think, is just goodness for the whole innovation ecosystem. Do you feel like five is too many? Do you feel five is going to become eight? Any thoughts on that?
Bob: It’s about right. The database market has always been somewhat fragmented. It’s never been winner-take-all. Oracle has classically been the largest winner in the database market, but even they’re only like 40%. It is a market that has a number of vendors, and I think that’ll change.
My guess is you’ll actually see some more vendors appear. We see some smaller players coming in, trying to take on these five vendors in a variety of ways. I think it’s hard. We may see six or seven having some small percentage share for some more niche markets, but I think these are the big five.
There is a big dogfight happening between Snowflake and Databricks, and we’ll watch that get fought out for the next year or two. Meanwhile, the cloud vendors will just do what the cloud vendors do, and their products will all get better. I do believe that the cloud vendor products, while clearly behind things like Snowflake, are getting better.
Google is probably the furthest along. And with Microsoft, I know a fair amount of what the team is doing. There’s actually a lot of great work happening there. I see some good stuff coming in the future.
Soma: That’s excellent. Beyond these five players, there is a whole host of other companies that are saying, “Hey, I’m building this for the modern data stack.” As you see what is happening in the world today, do you see any big gaps in terms of what needs to happen in the modern data stack to make it more complete and more robust for the next set of applications?
Bob: Yeah, I think there are a number of really major gaps. I’m fairly sure that the platforms people are using for machine learning are fairly nascent and will evolve. Spark has a lot of adoption, but I don’t think it’s the end answer to every problem, and I think we’ll see evolution in that space.
There are data types and problem characteristics that are very poorly solved today, like graph problems. These are really situations where you have a lot of relationships between things, and if you look at the data model, there is a very large number of relationships that need to be managed, more than a SQL database can handle.
In general, the graph problem is poorly solved by today’s products. Meanwhile, there are other things that are critical to business logic, like reasoning, which are still done pretty separately from the modern data stack. You have bits and pieces of code all over the place, and I think that’ll converge into more model-based things over time.
I think a lot of the future is really around the evolution of model based development. And I think we’re in the early stages of that.
Soma: You talked about SQL systems, and you talked about graph databases. Bob, I’ll share my perspective, and tell me if it makes sense or not. Historically, and even today, the world is bifurcated: you can go deal with relational database systems, or you can go deal with knowledge graph systems. Those two worlds are what I call two silos. They really haven’t come together.
Bob: You mean relational systems or procedural systems today?
Soma: Or procedural systems. Yeah.
Bob: Yeah, like you’re writing code in Python on the one side and then SQL on the other side. Is that what you mean?
Soma: Do you think they’ll come together? They should come together? Do you think there is an opportunity?
Bob: I do. And I think, as you said, that’s what a knowledge graph really can do. The idea behind a knowledge graph is that you can encode the attributes of the business, and the logic associated with the business, into the database.
The idea then is that it becomes a complete model of the organization that is actually executable, where the model is the code itself. In a way, this has been a dream of computer science since I was a kid. When I was not far out of school, I did work on early model-based things where you modeled stuff with diagrams and they spat out COBOL code at the bottom, which of course was unmaintainable, didn’t work, and had all kinds of issues.
Because it never worked, those sorts of ideas of modeling became more of a whiteboard effort. I will argue that people always model the business. Anything you’re working on, you’re modeling. But in today’s world, we do it implicitly. We might write a model of something that’s relatively thought through on a whiteboard, but then that gets implemented as bits and pieces of code all over the place, implicitly within the systems.
And I think we’ll move to a world that’s much more explicit in what we’re defining. That will happen when the knowledge graph comes about. And when we think about implementing knowledge graphs, I’m pretty clear that they will be relational, and they will leverage relational algebra and relational mathematics.
That’s partially because the industry has moved forward significantly in the last 10 years in terms of understanding algorithmic changes: new algorithms that allow you to work with large numbers of relationships efficiently and actually do things that you could never do previously with a SQL database, because we just didn’t have the sophistication of the algorithms that are now appearing. So it’s pretty exciting, actually. But it’s also early.
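Editor’s note: To make the idea of treating relationships relationally a little more concrete, here is a minimal sketch. It is not the newer class of algorithms Bob refers to, and it says nothing about how any knowledge graph product is built. It simply stores a hypothetical `ownership` relationship as an ordinary table and follows it to any depth with a recursive SQL query, using Python’s built-in `sqlite3` module; the company names are made up.

```python
import sqlite3

# Hypothetical "who owns whom" relationships, stored as a plain relation.
OWNERSHIP = [
    ("acme", "widgetco"),
    ("widgetco", "gearworks"),
    ("gearworks", "boltshop"),
    ("othercorp", "unrelated"),
]


def build_graph(rows):
    """Load the relationship rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ownership (parent TEXT, child TEXT)")
    conn.executemany("INSERT INTO ownership VALUES (?, ?)", rows)
    return conn


def all_subsidiaries(conn, company):
    """Follow the ownership relation transitively with a recursive CTE.

    UNION (rather than UNION ALL) removes duplicates, so the query also
    terminates if the data happens to contain a cycle.
    """
    query = """
        WITH RECURSIVE owned(name) AS (
            SELECT child FROM ownership WHERE parent = ?
            UNION
            SELECT o.child
            FROM ownership AS o
            JOIN owned ON o.parent = owned.name
        )
        SELECT name FROM owned ORDER BY name
    """
    return [row[0] for row in conn.execute(query, (company,))]


conn = build_graph(OWNERSHIP)
subsidiaries = all_subsidiaries(conn, "acme")
print(subsidiaries)
```

A query like this is fine for a handful of rows; the scaling problem Bob describes shows up when there are very large numbers of such relationships to traverse and join.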
Soma: That’s true but as data becomes, what I call more democratized. One of the things that you talk to enterprise , CIOs, they’ll tell you that “Hey, we are really putting a lot of energy and effort into consolidating and standardizing our data infrastructure”.
But along with all these huge volumes of data, and the question of what you can meaningfully do with the data, one issue that keeps coming up in pretty much every customer conversation today is data governance. This is also an area where, particularly in the last two years I would say, there have been a ton of new startups emerging, all addressing one part of data governance or another.
How do you broadly view the space of data governance and the kinds of companies that are coming up? Are there any specific companies that particularly catch your attention in the space?
Bob: Not really. There are some good companies, and they’re solving pieces of the problem. But when I think about the issues with the modern data stack, governance is a very real problem. It was always going to emerge as a major issue: when we took data that was scattered everywhere and put it together, that created a certain risk profile, which makes access control to that data very important.
And in particular, that’s the aspect. There are many aspects of governance: data modeling, data observability, all sorts of things. But the one that I think is at the top of people’s list is access control. And while there are products in the market that address some elements of that, I don’t think we’ve really reached the pinnacle of where we need to be.
And I don’t feel like our customers are well served here. There are different ways to solve the problem, and perhaps there are some shortcuts that people can take. But I think in the long run, the right way to solve it is by establishing a semantic model that understands what the business is, which is essentially a knowledge graph.
And then from that, you can derive the rules for the policies that you want to establish on your data and have very much a policy-based approach that’s based on the business data itself. I think we’re still some way from having a standardized platform to enable that. And that’s what we really need.
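To make the idea of deriving access policy from a semantic model a little more concrete, here is a minimal sketch in Python. It is an illustration only, not something Bob or any particular product describes: the relation names (`works_in`, `owns`, `manages`), the sample facts, and the two rules are all hypothetical. Business facts are stored as relations (sets of tuples), and the access-control relation is derived from them by joins rather than written by hand.

```python
# Hypothetical business facts, each stored as a relation (a set of tuples).
works_in = {("alice", "finance"), ("bob", "sales")}   # (user, department)
owns = {("finance", "invoices"), ("sales", "leads")}  # (department, dataset)
manages = {("carol", "alice")}                        # (manager, report)


def can_read(works_in, owns, manages):
    """Derive (user, dataset) read permissions from business relationships."""
    # Assumed rule 1: a user may read datasets owned by their own department.
    direct = {
        (user, dataset)
        for (user, dept) in works_in
        for (owner_dept, dataset) in owns
        if dept == owner_dept
    }
    # Assumed rule 2: a manager may read whatever a direct report may read.
    inherited = {
        (manager, dataset)
        for (manager, report) in manages
        for (user, dataset) in direct
        if report == user
    }
    return direct | inherited


ACCESS = can_read(works_in, owns, manages)
print(sorted(ACCESS))
```

The point of the sketch is that nobody grants carol access to invoices directly. That permission follows from the facts about who manages whom and which department owns what, so changing the business facts changes the policy.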
You know, one of the challenges we have, and I think one of the reasons why we’re not seeing as much success in governance in the modern data stack as customers might want, is that all of these tools that are coming out don’t use the modern data stack as their database. And the reason they don’t is because they can’t, because it doesn’t solve the problems for them.
So they all use some sort of operational database of their own. They take different approaches, but none of them interoperate. I think what we need is a common platform for a semantic model that will become the basis for modern data stack governance. I believe that platform will be a relational knowledge graph.
It’s still early. But that’s where I think it’ll go. In the meantime, I hope we can get customers and get some answers out there if it helps to solve their problems.
Soma: True. Let’s move up the stack a little bit. You’ve seen OpenAI and others do some great work over the last several years on large-scale machine learning models. You’ve got all kinds of face recognition and other kinds of machine learning models that are coming up at scale.
What do you think the situation is today in terms of these machine learning models? Do you feel the right amount of innovation is happening there? How do you think these models are going to evolve over time? Any perspective on where we are with models and where we are going?
Bob: It’s very exciting, I have to say. We’ve seen incredible progress in even the last five years, and I would say it’s accelerating. I recently had a conversation with Xuedong Huang, who runs the machine learning team at Microsoft and is working with OpenAI on foundation models. They’re doing a lot of work on combinatorial foundation models, where they bring multiple different types of data together into one large model.
Let’s talk about these foundation models for just a second. Sometimes they’re called large language models, which is fine, but that only speaks for one domain, the language-oriented ones, because some foundation models also apply to photos and other domains besides written language.
What they really are is world-scale machine learning models that are trained on a corpus that approaches global scale. And so what they become, essentially, is incredible concentrators of human knowledge into a model. Now, the models are statistically driven.
They’re not perfect. There are still advances that need to be made in these. But this idea of using machine learning to take the expertise of a given domain in the world and distill it into a model is fairly incredible, in my opinion.
I can’t think of a domain that it won’t affect. Honestly, I think it affects everything. I think it affects every single element of everywhere we go. So I think that’s a very exciting element of what’s happening. We see some incredible stuff. This DALL-E stuff is interesting. Now people are doing videos with it.
The model that came from OpenAI that was one of the early code-writing models, Codex, has done some amazing things. GitHub Copilot has been an incredible success for Microsoft and is really doing dramatic things to improve developer productivity.
And I’m seeing people use that for different purposes. You can take it and improve it to do things around writing SQL, as an example. And very powerful ideas can come from that. On the other end of the spectrum, I think there’s an opportunity where you’re trying to use machine learning and AI to improve the business workflow in a given organization, where the domain is actually the terminology of that organization itself.
And it’s much smaller, and there’s no global model to look at. There’s some local set of content you can look at. And in that case, the interesting thing is how do you inform the model more and more about the business? I think what we’re going to see is user-assisted, interactive training models appearing. They look like applications, working with a given domain and then leveraging machine learning to really improve business processes.
And the company I’ve been involved in that’s been working on this, that I’m pretty excited about, is Docugami, where our friend Jean Paoli, formerly of Microsoft, is CEO. They’re really focusing on taking high-value business documents and inverting them, turning them into data that can be processed by data systems.
And in order to do that, you really need to understand the semantics of that document. And that requires user assistance. That’s why this interactive development is important. So there’s a real UI kind of experience in that. Those are two different ends of the spectrum in some senses, but both examples are pretty interesting to me.
Soma: Bob, you’ve heard me talk about this quite a bit in the last many years. We are absolutely convinced that every application that gets built today and moving forward is what we refer to as an intelligent application. You like to call them data-driven applications, but it’s basically taking a corpus of data that is available to the application, building a continuous learning system using some machine learning models, and then continuing to get better and better.
You deliver a better service, you get more data, and the process iteratively makes the application, the service, get better. That’s the world we see happening today and as we move forward. (A) Do you agree with that viewpoint? And (B) what are some of the core things that you think are happening that are going to drive the world to get there?
Bob: If I didn’t agree with that viewpoint, I wouldn’t be continuing to do this work. Let’s face it, that’s the reason I continue to do the work. Look, my whole purpose in my business career has basically been to build infrastructure components that make people’s lives easier in business in one sense or another.
And data has been a huge part of that. But I worked on Windows Server for a lot of years, and I built System Center and helped with Visual Studio and all those sorts of things, which were not databases. They’re all about making it easier for people to build systems that help them be more effective in their lives.
And in particular in business; I’ve mostly focused on the business side, not the consumer side. It’s interesting because when we think about this world today, we’re seeing a world where machine learning is transforming, I think, pretty much every application category. I talked about foundation models essentially distilling the world’s knowledge into a model in some sense or another.
It’s not perfect. And even though I would say it is a great learned model, it probably isn’t what one would call fully intelligent. It doesn’t reach the point of saying this is intelligent. However, it is an incredible source of information and can be used as a base for many things.
But there are things that are missing in machine learning that you’d need for a fuller intelligence. In particular, that’s things like reasoning: the ability to reason over something and say, this is so because I know that something else is so. And these models have a very hard time with that today.
They have a very hard time with that. Sometimes, when they go off into wacko things, it’s because they haven’t got the ability to add reasoning to that. Now, I’m sure that’ll change. I’m very confident that we’ll see reasoning get added to these models in a variety of ways.
I think of this problem as: what’s the infrastructure that would actually solve this problem in a more generalized sense? I mean, give somebody a Python compiler or a C compiler and a hundred nodes, and they can do an awful lot. But to me, it’s about what sort of infrastructure you can build to make these systems more available to a larger number of companies.
And that’s why I think it all consolidates ultimately into a relational database that will take the form of a knowledge graph. And I do think ultimately these things will come together, where you can take all of the components of intelligence. Let me somewhat define that: it’s a program that can sense, reason, plan, act, and adapt.
We see these components coming together in different systems today, in different parts of intelligent systems. But the idea of them coming together in a cohesive platform, we’re still some distance away from that. Lots of smarter people than me are going to build these models that do these amazing things. But I’d like to figure out how I can help facilitate the creation of platforms that enable all these things to be created cohesively by mere mortals, not just the smartest minds on the planet.
Soma: That’s awesome, Bob. That’s great to hear. And I’m so glad you’re continuing to be fully focused on that mission, because I think the world needs that kind of an infrastructure and the kinds of innovation that the infrastructure can provide. Like you say, that makes building what I call an intelligent application something that every developer can do, not just the rocket scientists of the world.
I’m a big believer in democratizing access for all developers, so kudos to everybody who’s working on the infrastructure that’s going to enable that to happen. I know that we are coming up on time, but before we wrap up, there is one thing that keeps coming up quite a bit.
When I talk to all these startups, what I call modern data companies, there are two things they ask about. One is, “Hey, how should I think about open source?” and the other is, “How should I think about product-led growth?” These are two things that every startup founder or CEO is thinking about: when does it make sense for me to think about open source?
When do I not think about this? When do I think about this? Particularly given your experience with a variety of proprietary and open source work, and with product-led growth versus enterprise sales, are there any parting words of wisdom that you want to share with the next generation of entrepreneurs?
Bob: What I would say is, the biggest advantage of open source is the potential path to rapid adoption, particularly of a developer-focused technology: the ability to get more end users and more developers using it more quickly. It’s appropriate if the component runs in the infrastructure of the customer.
And I would say today it may even be essential if you expect a component to run as a core, integral part of an infrastructure. Kafka is a perfect example of something like this. That thing’s going to be sitting all over the place inside the customer’s infrastructure, and they just want it to be open source for their ability to choose vendors and all sorts of stuff.
Those are good reasons to do open source. The challenge with open source is that you essentially have to abandon it to build a business. I’m not going to say it’s a ruse, but you’ve got to do an extended focus at the very least, where you’ve got open source and then you have something that’s commercial, because that’s the only way to monetize.
In the old days, people monetized open source with services, and that was Red Hat’s business model. That’s gone away with the cloud. The cloud doesn’t help that. You can’t take what you just put in open source and run it in the cloud, because the cloud vendors can do the same thing, and they have infinite distribution.
And their COGS are lower than yours, so you’re screwed from day one. But if you differentiate, start with an open source integral component and then build on top of it, in some ways it can be very successful. There are certainly examples of that. But again, what you see is these companies going off and innovating in non-open-source ways right now.
Soma: Bob, fantastic to chat with you as always, fun conversation and really appreciate you taking the time to be with us here and do this podcast. Thank you.
Bob: Great. It’s good to talk to you again, Soma. Thanks.
Coral: Thanks for joining us for this week’s episode of Founded & Funded, and don’t forget to check out our Intelligent Applications Summit event page if you’re interested in these types of discussions. Thanks again for joining us, and tune in in a couple of weeks for our next episode of Founded & Funded with dbt Labs Founder and CEO Tristan Handy.