AI+Data in the Enterprise: Lessons from Mosaic to Databricks

Listen on Spotify, Apple, and Amazon | Watch on YouTube

How do AI founders actually turn cutting-edge research into real products and scale them? In this week’s episode of Founded & Funded, Madrona Partner Jon Turow sits down with Jonathan Frankle, Chief AI Scientist at Databricks, to talk about AI+Data in the enterprise, the shift from AI hype to real adoption, and what founders need to know.

Jonathan joined Databricks, a 4-time IA40 Winner, as part of that company’s $1.3 billion acquisition of MosaicML, a company he co-founded. Jonathan is a central operator at the intersection of data and AI. He leads the AI research team at Databricks, where they deploy their work as commercial products and also publish research, open-source repositories, and open-source models like DBRX and MPT. Jonathan shares his insights on the initial vision behind MosaicML, the transition to Databricks, and how production-ready AI is reshaping the industry. He and Jon explore how enterprises are moving beyond prototypes to large-scale deployments, the shifting skill sets AI founders need to succeed, and Jonathan’s take on exciting developments like test-time compute. Whether you’re a founder, builder, or curious technologist, this episode is packed with actionable advice on thriving in the fast-changing AI ecosystem.


This transcript was automatically generated and edited for clarity.

Jonathan: Thank you so much for having me. I can’t wait to take our private conversations and share them with everybody.

Jon: We always learn so much from those conversations. And so, let’s dive in. You’ve been supporting builders with AI infrastructure for years, first at Mosaic and now as part of Databricks. I’d like to go back to the beginning. Let’s start there. What was the core thesis of MosaicML, and how did you serve customers then?

Jonathan: The core thesis quite simply was making machine learning efficient for everyone. The idea is that this is not a technology that should be defined by a small number of people, that should be built to be one-size-fits-all in general, but that should be customized by everybody for their own needs based on their data. In the same way that we don’t need to rely on a handful of companies if we want to build an app or write code, we can just go and do it. Everybody has a website. Everybody can define how they want to present themselves and what they want to do with that technology. We really firmly believed in the same thing for machine learning and AI, especially as things started to get exciting in deep learning. And then, of course, LLMs became a big thing halfway through our Mosaic journey.

I think that mission matters even more today to be honest. We’re in a world where we bounce back and forth between huge fear over the fact that only a very small number of companies can participate in building these models, and huge excitement whenever a new open-source model comes out that can be customized really easily, and all the incredible things people can do with it. I firmly believe that this technology should be in everyone’s hands to define as they like for the purposes they see fit on their data in their own way.

Jon: It’s a really good point, and you and I have spoken publicly and privately about the democratizing effect of all this infrastructure. I would observe that the aperture of functionality Mosaic offered, which was especially about hyper-efficient training of really large models and putting it in the hands of a lot more companies, is now wider. Now that you’re at Databricks, you can democratize more pieces of the AI life cycle. Can you talk about how the mission has expanded?

AI+Data in the Enterprise: The Expanding Mission at Databricks

Jonathan: Yeah. I mean, it was really interesting. I was looking at notes from Matei, our CTO, for a research team meeting we had last week, and he had casually written that our mission has always been to democratize data and AI for everyone. I was like, “Well, wait a minute. That sounds very familiar.” I think we may chat at some point about this acquisition and why we chose to work together. It’s the same mission. We’re on the same journey. Databricks obviously was much further along than Mosaic was and wildly successful, but it’s great to be along for the ride.

The aperture has widened for two reasons. One is simply that you don’t need to pre-train anymore. There are awesome open-source base models that you can build off of and customize. Pre-training was the one thing that wasn’t quite for everyone, but it’s not necessary anymore. You can get straight to the fun part and customize these models through prompting or RAG or fine-tuning or RLHF these days.
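
To make the “get straight to the fun part” idea concrete, here is a minimal sketch of the lightest-weight of those options: few-shot prompting an open base model served behind an OpenAI-compatible endpoint. The endpoint URL, model name, and labeled examples are hypothetical placeholders; RAG, fine-tuning, and RLHF are progressively heavier steps along the same path.

```python
# Hedged sketch: customizing an open base model without pre-training.
# Assumes a chat model served behind an OpenAI-compatible endpoint;
# the URL, model name, and labeled examples below are hypothetical.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

FEW_SHOT_EXAMPLES = [
    {"role": "user", "content": "Ticket: 'Card declined at checkout.' Category?"},
    {"role": "assistant", "content": "payments"},
    {"role": "user", "content": "Ticket: 'Can't reset my password.' Category?"},
    {"role": "assistant", "content": "account-access"},
]

def classify_ticket(ticket: str) -> str:
    """Few-shot prompting: steer a generic base model with a few in-domain examples."""
    messages = (
        [{"role": "system", "content": "Classify support tickets into one category word."}]
        + FEW_SHOT_EXAMPLES
        + [{"role": "user", "content": f"Ticket: '{ticket}' Category?"}]
    )
    resp = client.chat.completions.create(model="open-base-model", messages=messages)
    return resp.choices[0].message.content.strip()

print(classify_ticket("I was charged twice for my subscription."))
```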

The aperture has also widened because now we’re at the world’s best company for data and data analytics, and the world’s best data platform. What is AI without data, and what is data without AI? We can now start to think much more broadly about a company’s entire process from start to finish with a problem they’re trying to solve. What data do they have, and what is unique about that data and about their company? Then from there, how can AI help them, or how can they use AI to solve problems? This is a concept we call data intelligence.

The idea is meant to be in contrast to general intelligence. General intelligence is the idea that there’s going to be one model or one system that will generally be able to solve every problem, or make significant progress on every problem, with minimal customization. At Databricks, we espouse the idea of data intelligence: every company has unique data, unique processes, and a unique view on the world that is captured within their data, how they work, and their people. AI should be shaped around that. The AI should represent the identity of the business, and the identity of that business is captured in their data. Obviously, it’s very polemical to frame it as data intelligence versus general intelligence. The answer will be somewhere in between. To me, honestly, every day at work feels like I’m doing the same thing I’ve been doing since the day Mosaic started, just now at a much bigger place with a much bigger ability to make an impact in the world.

Jon: There’s something very special about the advantage that you have that you’re seeing this parade of customers who have been on a journey from prototype to production for years now, and the most sophisticated among them are now in production. And so for that, I have two questions for you. Number one, what do you think it was that has finally unblocked and made that possible? And number two, what are those customers learning who are at the leading edge? What are they finding out that the rest of the customers are about to discover?

AI+Data in the Enterprise: Scaling from Prototype to Production

Jonathan: So, I’m going to, I guess, reveal how much less of a scientist I am these days and how much more I’ve become a business person. I’m going to use the hype cycle as the way to describe this, and it breaks my heart and makes me sound like an MBA to do it. Among enterprises, there are always the bleeding-edge, early-adopter, tech-first companies; the companies that catch on pretty quickly; and the companies that are more careful and conservative. What I’m seeing is that those companies are all in different places in the hype cycle right now. The companies that are early adopters and tech-forward hit the peak of inflated expectations two years ago, around the time ChatGPT first came out. They hit the trough of disillusionment last year, when it was really hard to get these systems to work reliably, and they are now getting productive and getting things shipped in production. They’ve learned a lot of things along the way.

They’ve learned to set their expectations properly, to be honest, and which problems make sense and which don’t. This technology is not perfect by any stretch, and I think the more important part is that we’re still learning how to harness it and how to use it. In the same way, punch cards back in the 1950s or ’60s were still Turing complete, a little bit slower, but just as capable as our computing systems today from a theoretical perspective. Fifty years of software engineering later, it’s much easier to architect a system that will be reliable, because of all these principles we’ve learned. I think those companies are furthest along in that journey, but it’s going to be a very long journey to come. We know how big of a system we can build at this point without it keeling over, where the AI is going to be unreliable, where we need to kick up to a human, which tasks make sense, and which tasks don’t.

A lot of them I’ve seen have whittled it down into very bite-sized tasks. The way that I typically frame it for people is you should use AI either in cases where it’s open-ended and there’s no right answer, or where a task is hard to perform, but simple to check, and you can have a human check. I think GitHub Copilot is a great example of this where you could imagine a situation where you ask AI to write a ton of code. Now, a human has to check all that code and understand it. Honestly, it may be as difficult as writing the code from the beginning or pretty close to it. Or you can have the AI suggest very small amounts of code that a human can almost mechanically accept or reject, and you’re getting huge productivity improvements. This is a scenario where the AI is doing something that is somewhat more laborious for the human, but the human can check it very easily.

Finding those sorts of sweet spots is where the companies who have been at this the longest have landed. They’ve also been willing to take the risk and invest in the technology. They’ve been willing to try things, and they’ve been willing to fail, to be honest. They’re willing to take that risk, be okay if the technology doesn’t work the first or second time, and keep whatever team they have doing this going and trying again. A bunch of companies are in the trough of disillusionment right now, companies that are a little less on the bleeding edge. Then a bunch of companies are still at that peak of inflated expectations, where they think that AI will solve every problem for them. Those companies are going to be very disappointed in a year and very productive in two years.

Jon: Naturally, a lot of founders who are going to be listening are asking, how do they get into these conversations? How do they identify the customers that are about to exit the trough, and how do they focus on them? What would you say to those founders?

AI+Data in the Enterprise: Landing Customers from Startups to Fortune 500s

Jonathan: I have two contradictory lessons from my time at Mosaic. The first is that VCs love enterprise customers because enterprise customers are evidence, at least if you’re doing B2B, that you’re going to be able to scale your business, and that you have traction with companies that are going to be around for a while, that have big budgets, and that invest for the long run when they invest in a technology. On the flip side, the best customers are often other startups because there’s no year-long procurement process. They’re willing to dive right in, they understand where you’re coming from, and they understand the level of service you’ll be able to provide because they’re used to it. You can get a lot more feedback much faster, but that is taken as less valuable validation. Even when I’m evaluating companies, enterprise customers are worth more to me, but startup customers are more useful for building the product and moving quickly. So, the answer is: strive for enterprise customers, don’t block on enterprise customers.

Jon: I think that’s fair, and optimizing for learning is really smart. There’s another thread that I would pull on, and this is something that I think you and I have both seen in the businesses that we’ve built, which is the storytelling. I won’t even say GTM; the storytelling around a product can be segmented even if the product is horizontal, as so many infrastructure products are. Mosaic was a horizontal product. Databricks is a horizontal family of products, but there are stories we tell that explain why Databricks and Mosaic are useful in financial services and really useful in healthcare, and there’s going to be a mini adoption flywheel in each of these segments, where you do want to find, first, the fast customers and then the big customers as you dial that story in. There may be product implications, but there may not be.

Jonathan: That’s a great point, and there are stories, I think, along multiple axes. These days, in a social media world and in a world where everybody’s paying attention to AI, there are horizontal stories you can tell that will get everyone’s attention. One of the big lessons I took away from Mosaic was to talk frequently about the work you’re doing and to have some big moments where you buckle down and do something big, without disappearing while you’re doing it. For us, that was releasing the MPT models, which sound so quaint only a year and a half later. It really was only a year and a half ago that we trained a 7 billion parameter model on 1 trillion tokens. It was the first open-source, commercially viable replication of the Llama 1 models, which sounds hilarious now that a 680 billion parameter mixture-of-experts model just came out. The most recent Meta model was a 405 billion parameter model trained on 15 trillion tokens.

It sounds quaint, but that moment was completely game-changing for Mosaic. It got attention up and down the stack, across all verticals and all sizes of companies, and led to a ton of business. Later moments, like DBRX more recently, were the same experience. Storytelling through these important moments, especially in an area where people are paying close attention, actually does resonate universally. At the same time, I totally hear you on the fact that for each vertical, and for each size of company, there is a different story to tell. My biggest lesson learned there is that getting that first customer in any industry or at any company size is incredibly hard. Somebody has to really take a risk on you before you have much evidence that you’re going to be successful in their domain.

Having that one story that you can tell leads to a ton more stories. Once you work with one bank, a bunch of other banks will be willing to talk to you. Getting that first bank to sign a deal with you and actually do something was a real battle, even for the phenomenal go-to-market team we had at Mosaic. They had to really fight and convince someone that they should even give us a shot, that it was worth a conversation.

Jon: Can you take me back to an early win at Mosaic where you didn’t have a lot of credentials to fall back on?

Jonathan: It was a collaboration we did with a company called Replit. Before we had even released the MPT models, we were chatting with Replit about the idea that we could train an LLM together, that we’d be able to support their needs there. They trained MPT before we trained MPT. They were willing to take a risk on our infrastructure, and we delayed MPT because we only had a small number of GPUs and we let Replit take the first go at it. I basically didn’t sleep that week because I was monitoring the cluster constantly. We didn’t know whether the run was going to converge. We didn’t know what was going to happen. It was all still internal code at that point in time, but Replit was willing to take a risk on us, and it paid off in a huge way. It gave us our first real customer that had trained an LLM with us, been successful, and deployed it in production. That led to probably a dozen other folks signing on right then and there. The MPT models came out after that.

Jon: How did you put yourself in a position for that lucky thing to happen?

Jonathan: We wrote a lot of blogs. We shared what we were working on. We worked on open source, we talked about our science, and we built a reputation as the people who really cared about efficiency and cost, the people who might actually be able to do this. We talked very frequently about what we were up to. That was a lesson we had learned early on, when I don’t think we talked frequently enough, but we wrote lots of blogs. When we were working on a project, we would write part one of the blog as soon as we hit a milestone; we wouldn’t wait for the project to be done. And then we’d do part two and part three. Those MPT models were actually, I think, part four of a nine-month blog series on training LLMs from scratch. And that got Replit’s attention much earlier and started the conversation.

Maybe one way of looking at it, if you want to be cynical, is selling ahead of what your product is, but I look at it the other way: show people what you’re doing and convince them that they can believe you’re going to take that next step. They want to be there right at the beginning when you first take that next step, because they want to be on the bleeding edge. I think that’s what got the conversation started with Replit and put us in that position. We were going to events all the time, talking to people, trying to find anyone in the enterprise who might be interested and had a team that was thinking about this. There were a bunch of folks we were chatting with, and we had already started contracting deals with some of them, but Replit was able to basically move right then and there. They were a startup. They could just say, “We’re going to do this,” and write the check and do it.

Jon: So being loud about what it is that you stood for and what it is that you believed.

Jonathan: And being good at it. I think we worked really hard to be good at one thing, and that was training efficiently. You can’t fake it until you make it on that. We did the work, and it was hard and we struggled a lot, but we kept pushing. Naveen and Hanlin, our co-founders, kicked my butt to keep pushing even when it was really hard and really scary and we were burning a lot of money, but we got really good at it. And I think people recognized that, and it led to customers and to the Databricks acquisition. I’m now seeing this among other small startups that I’m talking to in the context of collaboration, in the context of acquisition, anything like that.

The startups I’m talking to are the ones that are really good at something. It’s clear they’re good at something. It’s been clear through their work, I can check their work, they’ve done their homework and they show their work. Those are the folks that are getting the closest look because they’re genuinely just really good at it, and you believe in them and you know the story they’re telling is legitimate.

Jon: There’s one more point on this, which I think complements and extends what you said: you folks believed in something. This is not just about a story, and it’s not just about results either; you believed training could be and should be made more efficient. A lot of the work you were doing anticipated things like Chinchilla, which quantified how it could be done later.

Jonathan: Oh, we didn’t anticipate it. We followed in the footsteps of Chinchilla. Chinchilla was early, visionary work, and I can say this: Eric Olson, who worked on Chinchilla, is now one of my colleagues on the Databricks research team. I mean, there are a few moments, if I really want to look for the pioneers of truly visionary work that was quite early, that when I look back are just tent-pole work for LLMs.

Now, Chinchilla is one of those things. The other is EleutherAI putting together The Pile dataset, which was done in late 2020, two years before anyone was really thinking about LLMs. They put together what was still the best LLM training dataset into 2022. We did genuinely believe in it, I think, to your point. We believed in it and we believed in science; we believed that it was possible to do this through really, really rigorous research. We were very principled and had our scientific frameworks that we believed in, our way of working. We had a philosophy on how to do science and how to make progress on these problems. OpenAI believes in scale, and now everybody believes in scale. We believed in rigor, and that doing our homework and measuring carefully would allow us to make consistent, methodical progress. That remains true and remains the way we work. It’s not always the fastest way of working, but at the end of the day, at least it is consistent progress.

Jon: So here we are in 2025, and amazing innovation is happening, and there’s even more opportunity than there has been, it seems to me. Even more excitement, even more excited people. How do you think the profile and the mix of skills in a new team should be the same or different compared to when you formed Mosaic?

AI+Data in the Enterprise: How Research Shapes Business AI

Jonathan: It depends on what you’re trying to do. We hire phenomenal researchers who are rigorous scientists, who care about this problem and are aligned with our goals, who share our values, who are relentless, and honestly who are great to work with. I think the importance of culture cannot be overstated, and conviction is the most important quality. If you don’t believe that it is possible to solve a scientific problem, you will lose all your motivation and creativity to solve it, because you’re going to fail a lot, and at the first failure you’re going to give up. Beyond that, I think this is data science in its truest form. I never really understood what it meant to be a data scientist, but this feels like data science. You have to pose hypotheses about which combinations of approaches will allow you to solve a problem. It’s about measuring carefully and developing good benchmarks to understand whether you’ve solved that problem.

I don’t think that’s a skill that’s confined to people with Ph.D.s, far from it. So, the fact that Databricks was founded by a Ph.D. super team now means that more than 10,000 enterprises don’t need a Ph.D. super team when it comes to their data. I look at our Mosaic story through to our Databricks story now in the same way. We built a training platform and a bunch of technologies around that, and now we’re building a wide variety of products to make it possible for anyone to build great AI systems. It’s the same way that when you get a computer and you want to build a company, you don’t have to write an operating system, you don’t have to build a cloud, and you don’t have to invent a virtual machine. I mean, abstraction is the most important concept in computer science. Databricks has had a Ph.D. super team to build the low-level infrastructure that required it: Spark and Delta and Unity Catalog and everything on top of that.

And now it’s the same thing for AI. The future of AI isn’t in the hands of people like me. It’s in the hands of people who have problems and can imagine a solution to those problems. In the same way that, I’m sure, Tim Berners-Lee, who pioneered the web, did not exactly imagine, I don’t know, TikTok. That was not what he had in mind when he was building the World Wide Web. The startups I’m most thrilled about engaging with today are companies that are using AI to make it easier to get more out of your health insurance, making it easier for you to solve your everyday problems, making it easier for you to just get a doctor’s appointment or for a doctor to help you, or for us to spot medical challenges earlier. Those are the people who are empowered, because they don’t have to go and build an LLM from scratch to do all that. That layer has now been created.

The future is in the hands of people who have problems and care about something. For a Ph.D. super team these days, there’s still tons and tons of work to do in making AI reliable and usable, building the tools that these folks need, and building a way for anyone to build an evaluation set in an afternoon so that they can measure their system really quickly and get back to work on their problem. There’s a ton of really hard, complex, fuzzy machine learning work to do, but I think the interesting part is in the hands of the people with problems.
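
As a concrete illustration of the “evaluation set in an afternoon” idea, here is a minimal sketch of what such an eval can look like: a handful of hand-written cases and a pass rate you can re-check after every change. The task, the cases, and the my_ai_system stub are hypothetical placeholders, not anything from Databricks.

```python
# Hedged sketch of a hand-built evaluation set: a few dozen cases written by
# people who know the domain, plus a pass rate to re-check after every change.
# The task, cases, and `my_ai_system` stub are hypothetical placeholders.

EVAL_SET = [
    {"input": "Summarize: 'Q3 revenue rose 12% on cloud growth.'",
     "must_contain": ["12%", "cloud"]},
    {"input": "Summarize: 'Churn fell after the onboarding redesign.'",
     "must_contain": ["churn", "onboarding"]},
    # ...an afternoon of these goes a long way
]

def my_ai_system(prompt: str) -> str:
    """Stand-in for whatever you built: a prompt, a RAG chain, a fine-tuned model."""
    raise NotImplementedError

def run_eval() -> float:
    """Fraction of cases whose output contains every required term."""
    passed = 0
    for case in EVAL_SET:
        output = my_ai_system(case["input"]).lower()
        if all(term.lower() in output for term in case["must_contain"]):
            passed += 1
    return passed / len(EVAL_SET)

if __name__ == "__main__":
    print(f"pass rate: {run_eval():.0%}")
```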

Jon: How is your role changing as you adopt these AI technologies inside Databricks? And you try to be, I’m sure, as sophisticated as you can be about it.

Jonathan: I’m still a scientist, but I haven’t necessarily written a ton of code lately. I spend a lot of time these days connecting the dots between research and product, and research and customers, and research and the business. I come back to the research team and say, “I think we really need to do this. How can RL help us do that?” Or I go to the research team and say, “You’ve got this great idea about this cool new thing we can do with RL. Let me go back to the product team and try to blow their minds with this thing that they didn’t even think about because they didn’t know it was possible.”

Show up with something brand new and convince them, we should build a product for that because we can, and because we think people will need it. In many ways, I’m a bit of a PM these days, but I’m also a bit of a salesperson. I’m also a manager and I’m trying to continue to grow the incredible skills of this research team, both the people who have been with me for four years and the people who have just arrived out of their Ph.D.s and make them into the next generation of successful Databricks talent that stays here for a while, and maybe goes on to found more companies like a lot of my former colleagues at Mosaic have.

It’s a little bit of everything, but I have had to make a choice about whether I’m going to be really, really deep as a scientist, write code all day, and get really, really good at getting the GPUs to do my bidding, or get good at leadership, running a team, inspiring people, getting them excited, and growing them, or get good at thinking about product and customers, and what combination I wanted to have there. That combination has naturally led me away from being the world’s expert on one specific scientific topic, and towards something I think is more important for our customers, which is understanding how to use science to actually solve problems.

Jon: There’s an imaginative leap that you have to make from the technology to the persona of your customer, and the empathy with that, I imagine, involves being in a lot of customer conversations, but it’s an inversion of your thinking. It’s not, “Here’s a hard problem that we’ve solved, what can we do with it?” It’s keeping an index of important problems in your head and spotting possible solutions to them, maybe?

Jonathan: I think it’s the same skill as any good researcher. No good researcher should be saying, “I did a cool thing. Let me find a reason that I should have done it.” Sometimes, very occasionally, this leads to big scientific breakthroughs, but for the most part, I think a good, productive, everyday researcher should be taking a problem and saying, “How can I make a dent in this?” Or finding what the right questions are to ask, asking them, and coming up with a very basic solution. All of these sound like product scenarios to me. Whether you’re building an MVP, figuring out a question that hasn’t been asked before that you think is important to ask and then trying to figure out whether there’s product-market fit, or the other way around, finding a problem and then trying to build a solution to it.

I don’t think much research should really involve just saying, “I did this thing because I could.” That is very high risk, and it’s hard to make a career out of doing that all the time because you’re generally not going to come up with anything. I’m going out and trying to figure out what the important questions are to be asking, both asking new questions and then checking with my PM to see if that was the right question to ask, and talking to my customers. It’s just that now, instead of my audience being the research community and a bunch of Ph.D. students who are reviewers, and convincing them to accept my work, my audience is customers, and I’m convincing them to pay us money for it. I think that is a much more rigorous, much higher standard than getting a paper accepted. I had dinner with a customer earlier this week, and they’re doing some really cool stuff.

They have some interesting problems. I’m going to get on a plane in two weeks, go down to their office for the day, and meet with their team all day to learn more about this problem, because I want to understand it and bring it back to my team as a question worth asking. It’s not 100% of my time, but I think you should be willing to jump on a plane, go chat with an insurance company, and spend a day with their machine learning team, learning from them and what they’ve done, hearing their problems, and seeing if we can do something creative to help them. That’s good research. If you ever sent me back to academia, that’s probably still exactly what I’d do.

Jon: One of my favorite things that you and I spoke about in New York some weeks ago was the existence of a high school track at NeurIPS, the academic AI conference. I wonder if you could share a little bit about that and what you saw, and what that tells you about the next wave of thinking in AI.

Jonathan: The high school track at NeurIPS was really cool, and also controversial for a number of reasons. Is this another way for students who are incredibly well off, and have access to knowledge and resources and a parent who works for a tech company, to get further ahead, or is this an opportunity for some extraordinary people to show how extraordinary they are, and for people to learn about research much earlier than I certainly did and try out doing science? There are generational changes in the way that people are interacting with computing. This is something that my colleague Hanlin, who was one of the co-founders of Mosaic, has observed, and I’m totally stealing it from him, so thank you, Hanlin. We’re seeing companies that are founded by people who clearly came of age in an era where your interface to a computer was typing in natural language, whether it’s to Siri or, especially now, to ChatGPT.

That is the way they think about a user interface. You want to build a system? Well, just tell the AI what you want. On the back end, we’ll pick it apart, figure out what the actual process is in an AI-driven way, build the system for you, and hand it back to you. That’s a very different way of interacting with computing, but that’s the way for a lot of people who have grown up in tech over the past several years, a lot of people who are graduating from college now or have graduated in the past couple of years, and especially people who are in high school now. ChatGPT is their iPhone, their personal computer. It’s not buttons and dropdowns and dashboards and checkboxes and apps. It’s telling the computer what you want. It doesn’t work amazingly well right now. Someday it probably will, and that day may not be very far away, but that’s a very different approach and one that is worth bearing in mind.

Jon: I want to switch gears a little bit and get to a technical debate that we’ve had over the years as well, which is about the mix of techniques enterprises and app developers are going to use to apply AI to their data. And of course, RAG and in-context learning have been exciting developments for years because it’s so easy and appealing to put data in the prompt and reason about that with the best model that you can find. There has been a wave of excitement, renewed wave of excitement, I’d say, around complementary approaches like fine-tuning and test-time compute, reinforcement tuning from OpenAI and lots more. I wonder if now is the moment for that from a customer perspective, or if you think we’re far ahead of our skis. What’s the right time and mix of these techniques that enterprises and app developers are going to want to use?

Jonathan: My thinking has really evolved on this, and you’ve watched that happen. We’ve reached the point where the customer shouldn’t even know or care. I want an AI system that is good at my task; I want to define my task crisply and get an AI system out the other end. Whether you prompt, whether you do few-shot, whether you do an RL-based approach and fine-tune, whether you do LoRA or full fine-tuning, or whether you use DSPy and do some prompt optimization, that doesn’t even matter to me. Just give me a system, get me something up and running, and then improve that system. Surface some examples that may not match what I told you my intention was, and let me clarify how I want you to handle those examples as a way of improving my specification for my system and making my intention clearer to you. And now, do it again and improve my system.

Let’s have some users interact with the system and gather a lot of data. Then let’s use that data to make the system better, and make the system a better fit for this particular task. Who cares whether it’s RAG, who cares whether it’s fine-tuning? The only thing that matters is did you solve my problem and did you solve it at a cost I can live with? Can you make it cheaper and better at this over time? From a scientific perspective, that is my research agenda right now at Databricks, but you shouldn’t care how the system was built. You care about what it does and how much it costs, and you should be able to specify, “This is what I want the system to do.” In all sorts of ways, natural language examples, critiques, human feedback, natural feedback, explicit feedback, everything, and the system should just improve and become better at your task, the more feedback you collect. Your goal should be to get a system out in production even if it’s a prototype as quickly as possible. So you start getting that data and the system starts getting better.

The more it gets used, the better it should get. The rest, whether it’s long context or very short context, whether it’s RAG with a custom embedding model and a re-ranker or whether it’s fine-tuning, at that point, you don’t really care. The answer should be a bit of all of the above. Most of the successful systems I’ve seen have had a little bit of everything, or have evolved into having a little bit of everything after a few iterations.
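
A minimal sketch of the “who cares whether it’s RAG or fine-tuning” loop Jonathan describes might look like the following: treat each technique as a candidate system, score every candidate on the same task-specific eval, and keep the cheapest one that clears the quality bar. The builder functions, eval_score, cost figures, and quality bar below are all hypothetical stand-ins, not a Databricks product API.

```python
# Hedged sketch: treat prompting, RAG, and fine-tuning as interchangeable
# candidates, score each one on the same task eval, and keep the cheapest
# system that clears the quality bar. Builders, eval_score, and the cost
# numbers are hypothetical stand-ins.
from typing import Callable, Optional

def build_prompted_system() -> Callable[[str], str]: ...
def build_rag_system() -> Callable[[str], str]: ...
def build_finetuned_system() -> Callable[[str], str]: ...

def eval_score(system: Callable[[str], str]) -> float:
    """Pass rate on a task-specific eval set (see the earlier eval sketch)."""
    ...

CANDIDATES = [
    # (name, builder, rough cost per 1k requests in dollars; made-up numbers)
    ("few-shot prompt", build_prompted_system, 0.40),
    ("RAG", build_rag_system, 0.55),
    ("fine-tuned small model", build_finetuned_system, 0.15),
]

QUALITY_BAR = 0.90  # "good enough for this task," measured rather than guessed

def pick_system() -> Optional[tuple]:
    """Return (cost, score, name) of the cheapest candidate above the bar, if any."""
    viable = []
    for name, builder, cost in CANDIDATES:
        score = eval_score(builder())
        if score >= QUALITY_BAR:
            viable.append((cost, score, name))
    if not viable:
        return None  # iterate: clearer spec, better data, another technique
    return min(viable)  # cheapest acceptable system wins
```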

Jon: In previous versions of this conversation, you’ve said, “Dude, RAG is it.” That’s what people really want. There are other things you can do to extend it, but so much is possible with RAG that we don’t need to look past that horizon yet. I hear you saying something very different now. I hear you saying customers don’t care, but you care, and it sounds like you’re building a mix of things.

Jonathan: Yeah, what I’m seeing, the more experience I get, is that there is no one-size-fits-all solution; RAG works phenomenally well in some use cases and absolutely keels over in other use cases. It’s hard for me to tell you where it’s going to succeed and where it’s not. My best advice to customers right now is try it and find out. There should be a product that can do that for you or help you go through that scientific process in a guided way so you don’t have to make up your own progression. For me, it’s now about how I can meet our customers where they are. Whatever you bring to the table, tell me what you want the system to do, and right now we’ll go and build that for you and figure it out together with your team.

We can automate a lot of this and make it really simple for people to simply bring what they have, declare what they want, and get pretty close to what a good solution or at least the best possible solution will look like. It’s also part of my recognition that this isn’t a one-time deal where you just go and solve your problem. It’s a repeated engagement where you should try to iterate quickly, get something out there and get some interactions with the system. Learn whether it’s behaving the way you want it to, learn from those examples and go back and build it again and again and again and again, and do that repeatedly until you get what you want. A lot of that can be automated too. At least that’s my research thesis that we can automate or at least have a very easy guided way of going through this process to the point where anybody can get the AI system they want if they’re willing to just come to the table and describe what they want it to do.

Jon: What’s the implication for this sphere of opportunity of new model paradigms such as test-time compute, now, even open-source with DeepSeek?

Jonathan: I would consider those to be two separate categories. I was playing this game with someone on my team earlier today where he was telling me, “Yeah, DeepSeek has changed everything.” I was like, “Didn’t you say that about Falcon and Llama 2 and Mistral and Mixtral and DBRX and so on and so on and so on?” We’re living in an age where the starting point we have keeps getting better. We get to be more ambitious because we’re starting further down the journey. This is like when our friends at AWS or Azure come out with a new instance type that’s more efficient or cheaper. I don’t go and look at that and go, “Everything has changed.” I go and look at that and go, “Those people are really good at what they do and they just made life better for me and my customers.”

We get to work on cooler problems, and a lot more problems have ROI, because some new instance type came out that’s faster and cheaper. It’s the same thing with models. As for new approaches, it could be something like DPO or it could be something like test-time compute. Those are probably not comparable with each other, but these are more things to try. These are more points in the trade-off space. I think about everything in life as a Pareto frontier on the trade-off between cost and quality. Test-time compute gives you this very interesting new trade-off, possibly between the cost of creating a system, the cost of using that system, and the overall quality that you can get. Every time another one of these ideas comes out, the design space gets a little bigger, more points on this trade-off curve become available, or the curve moves further up and to the left or up and to the right, depending on how you define it.

Life gets a little better, and we get to have a little more fun. For the products and systems that we’re all building at Databricks, things get a little more interesting, and we can do a little more for our customers. So, I don’t think there’s any one thing that changes everything, but it’s constantly getting easier and faster and more fun to build products and solve problems. And I love that. A couple of years ago, I had to sit down and build the foundation model myself if I wanted to work with one. Now, I already start way ahead.
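
To make the cost-versus-quality Pareto framing concrete, here is a minimal sketch that keeps only the non-dominated options. The option names and numbers are made-up illustrations of how something like test-time compute adds new points to the curve rather than changing everything.

```python
# Hedged sketch of the cost/quality Pareto frontier: keep only the options that
# no other option beats on both axes at once. The names and numbers are
# made-up illustrations, not benchmark results.

def pareto_frontier(points: list[tuple[str, float, float]]) -> list[tuple[str, float, float]]:
    """points: (name, cost, quality); lower cost and higher quality are better."""
    frontier = []
    for name, cost, quality in points:
        dominated = any(
            c <= cost and q >= quality and (c < cost or q > quality)
            for _, c, q in points
        )
        if not dominated:
            frontier.append((name, cost, quality))
    return sorted(frontier, key=lambda p: p[1])  # order by cost

options = [
    ("small model, plain prompt", 0.10, 0.72),
    ("big model, plain prompt", 0.80, 0.86),
    ("small model + test-time compute", 0.35, 0.88),
    ("big model + test-time compute", 2.00, 0.90),
]
print(pareto_frontier(options))
# A new technique doesn't change everything; it adds points to the design
# space, and the non-dominated ones push the frontier a little further.
```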

Jon: I love that. Jonathan, I’ve got some rapid fire questions that I’d like to use to bring us home.

Jonathan: Bring it on.

Jon: Let’s do it. What’s a hard lesson you’ve learned throughout your journey? Maybe something you wish you did better, or maybe the best advice that you received that other founders would like to hear today?

Jonathan: I’ll give you an answer for both. I mean, the hardest lesson I’ve learned, honestly, has been the people aspects. It’s been how to interact productively with everyone, how to be a good manager. I don’t think I was an amazing manager four years ago, fresh out of my Ph.D., and the team members who have been with me that long, or who were with me then, will surely tell you that. I like to hope the team members who are still with me think I’m a much better manager now. The managers who have managed me that entire time, who have trained me and coached me, think I’m a much better manager now. Learning how to interact with colleagues in other disciplines or other parts of the company, learning how to handle tension or conflict in a productive way, learning how to disagree in a productive way and focus on what’s good for the company.

Learning how to interact with customers in a productive and healthy way, even when sometimes you’re not having the easiest time working with the customer and they’re not having the easiest time working with you. Those have been incredibly hard-won lessons. That’s been the hardest part of the entire journey, the part where I’ve grown the most, but also the part that has been the most challenging. The best advice I’ve received probably came from my co-founders, Naveen and Hanlin.

One piece of advice from Hanlin that sticks in my mind is something he kept telling me over and over again: a startup is a series of hypotheses that you’re testing. That kept us very disciplined in the early days of Mosaic, stating what our hypothesis was, trying to test it systematically, and finding out if we were right or wrong. That hypothesis could have been scientific, it could have been about product, it could have been about customers and what they’ll want, but turning that into a systematic scientific endeavor made it a lot easier for me to understand how to make progress when things were really hard, and they were really hard for a long time. I know that wasn’t a rapid-fire answer to a rapid-fire question, but it’s a question I feel very strongly about.

Jon: Aside from your own, what data and AI infrastructure are you most excited about and why?

Jonathan: There are two things I’m really excited about. Number one, products that help you create evaluations for your LLMs. I think these are fundamental infrastructure at this point. There are a million startups doing this, and I think all of them are actually pretty phenomenal. I could probably give you a laundry list of at least a dozen off the top of my head right here, and I bet you could give me a dozen more that I didn’t name, because we’re all seeing great pitches for this. I have a couple that I really like, a couple that I’ve personally invested in, but I think this is a problem we have to crack. It’s a hard problem, and it’s a great piece of infrastructure that is critical. The other thing that I’m excited about personally is data annotation. I think that data annotation continues to be the critical infrastructure of the AI world.

No matter how good our models get and how good we get at synthetic data, there’s always still a need for more data annotation of some kind. And revenue keeps going up for the companies that are doing it. The problem changes, what you need changes. I don’t know, I think it’s a fascinating space. In many ways, it’s a product, and in many ways, my customers these days, the data scientists at whatever companies I’m working with, are also doing data annotation or trying to get data annotation out of their teams. Building an eval is data annotation. I mentioned two things, but these are both my second favorites, and I think they’re the same at the end of the day. One is about going and buying the data you need. The other is about tools that make it easy enough to build the data you need that you don’t have to go and buy it.

I have a feeling both kinds of companies have made a lot of progress on AI augmentation of this process. When I did the math on the original Llama 3.0 models, and that’s the last time I sat down and did the math, my best guess was $50 million worth of compute and $250 million worth of data annotation. That’s the exciting secret of how we’re building these amazing models today. That’s only going to become more true with these sorts of reasoning models, where I don’t know that reasoning itself is going to generalize, but it does seem like you don’t need that many examples of reasoning in your domain to get a model to start doing decent reasoning in your domain. And that’s going to put even more weight on figuring out how to get the humans in your organization, or humans somewhere, to help you create some data for your task so that you can start to bootstrap models that reason on your task.

Jon: Beyond your core technical focus area, what are the technical or non-technical trends that you are most excited about?

Jonathan: There are two, one as a layperson and one as a specialist. As a layperson, I’m watching robotics very closely. For all of the interesting data tasks that we have in the world, there are a lot of physical tasks that it would be amazing if a robot could perform. Thank goodness for my dishwasher, thank goodness for my washing machine; I can’t imagine what my life would look like if I had to scrub every dish and every piece of clothing to keep it clean. Robotics is in many ways already in our lives; these are just very specific, single-purpose robots. I hope we can make a dent in that problem, and I don’t know if we will this decade or in three decades. Like VR, robotics is a problem that we keep feeling like we’re on the cusp of, and then we don’t quite get there, but we get some innovation.

I love my robot vacuum. That is the best investment I’ve ever made. I got my girlfriend a robot litter box for her cats a few weeks ago. I get texts every day going, “Oh my God, this is the best thing ever.” And this is just scratching the surface of the daily tasks we might not have to do. I would love something that could help people who, for whatever reason, can’t get around very easily on their own to get around more easily, even in environments where they’re not necessarily built for that.

I have a colleague who I heard say this recently, so I’m not going to take credit for it, but it’s the idea of things that make absolutely no logistical or physical sense in the world today that you could do if you had robots. In Bryant Park right now, right below our Databricks office in New York, there’s a wonderful ice skating rink all winter. If you were willing to have a bunch of robots do a bunch of work, you could literally take down the ice skating rink every night and set up a beer garden, and then swap them every day if you wanted to. Things that make no logistical sense today because they’re so labor-intensive, you could do, and suddenly they make a lot of sense. You can do things that are very labor-intensive and resource-intensive. So that gets me really excited.

Jon: From data intelligence to physical intelligence?

Jonathan: Well, somebody’s already coined the physical intelligence term, but yeah, I don’t see why not. Honestly, we’re dealing with a lot of physical intelligence situations at Databricks right now. I think data intelligence is already bringing us to physical intelligence, but there’s so much more one can do, and we’re scratching the surface of that. It cost Google, what, $30 billion to build extraordinary autonomous vehicles. The whole narrative in the past year has completely shifted from “autonomous vehicles are dead, and that was wasted money” to “Oh my gosh, Waymo might take over the world.” So, I’m excited about that future. I wish I knew whether it was going to be next year or in 30 years. I spend a lot of time in the policy world, and I think that’s maybe even a good place to wrap up.

Before I was an AI technologist, I was an AI policy practitioner. That’s why I got into this field in the first place. That’s why I decided to go back and do my Ph.D. I spend a lot of time these days chatting with people in the policy world, chatting with various offices, chatting with journalists, working with NGOs, trying to make sense of this technology and how we as a society should govern it. It’s something I do in my spare time. I don’t do it officially on behalf of Databricks or anything like that. I think it’s important that we, as the people who know the most about the technology, try to be of service. I don’t like to come with an agenda. I think that people who come from a company, highly motivated to make sure a particular policy takes place, are conflicted like crazy, will always come with motivated reasoning, and can never really be trusted.

I think the right posture is coming as a technologist and asking, how can I be of service, what questions can I answer, and can I help you think this through and figure out whether this makes sense? It’s a very fine line, and you need to be careful about it. If you come in with your heart set on figuring out how to be of service to the people whose job it is to speak on behalf of society or to think on behalf of society, you can make a real difference. You have to be careful not to come with your own agenda to push something. A lot of people have highly motivated reasoning about why we shouldn’t allow other people to possess these AI systems or work with them, so you’ve got to be careful. You’ve got to build a reputation and build trust over many years. The flip side is you can do a lot of good for the world.

Jon: That is definitely a good place to leave it. So Jonathan Frankle, Chief AI Scientist of Databricks, thank you so much for joining. This was a lot of fun.

Jonathan: Thank you for having me.

Related Insights

    AI Agents Are Stuck in First Gear, but 2025 Will Change That
    AI and the $750B Opportunity in Software Development
    DeepSeek R1 and the Rise of Expertise-Driven AI
