It was a pleasure hosting the third annual IA Summit in person on October 2, 2024. Nearly 300 founders, builders, investors, and thought leaders across the AI community and over 30 speakers dove into everything from foundational AI models to real-world applications in enterprise and productivity. The day featured a series of fireside chats with industry leaders, panels with key AI and tech innovators, and interactive discussion groups. We’re excited to share the recording and transcript of the panel “Navigating the Model Landscape” with Asha Sharma, CVP of Product for the AI Platform at Microsoft; Ali Farhadi, CEO of the Allen Institute for AI; Saurabh Baji, SVP of Engineering at Cohere — and moderated by Newcomer Founder and Author Eric Newcomer.
TLDR: (Generated with AI and edited for clarity)
At Madrona, we believe great conversations spark great innovation—and the final panel from our IA Summit last month was no exception. Industry leaders Asha Sharma (Microsoft), Saurabh Baji (Cohere), and Ali Farhadi (AI2) dove deep into the evolving role of foundation models, sharing tactical insights that founders can act on today to shape their AI strategies for tomorrow.
Open vs. Proprietary: Know Your Trade-Offs
Open-source models like Llama have made waves, but true openness requires more than releasing weights — it’s about transparency in training and architecture. Founders face a choice between cost-efficient, customizable open models and enterprise-grade, secure proprietary solutions. Understanding this trade-off is critical to designing an AI stack that balances innovation with operational needs.
From Models to Systems: Building for Impact
AI is shifting from experimental to production-ready systems. Founders should think beyond deploying a model. Winning in AI means building systems that solve real business problems end-to-end — leveraging retrieval, orchestration, and user-friendly interfaces to maximize value.
The Multi-Model Era is Here
Gone are the days of “one model to rule them all.” Founders should focus on identifying the right model for the job rather than defaulting to the largest or most popular option. Multi-model strategies not only cut costs but also unlock performance gains in specific use cases.
Eric: Hi, I’m Eric. I couldn’t be more excited about this discussion. Foundation models are foundational to everything that’s going on in AI. At the same time, I think they’re somewhat out of favor, at least in terms of business model. At the beginning of the day, we saw applications and agents, and we’ve gone through a period of excitement over RAG, so I’m excited to get back to the roots. I just wanted to start off: for each of your organizations, when you think about the models you’re interacting with, describe whether they’re proprietary models or a partnership strategy. What are the models that you’re working with most day to day?
Saurabh: Absolutely. Thanks for having me here. I’m Saurabh Baji, I run engineering at Cohere, and Cohere builds its own models from scratch. We have our own set of generative models called Command R and R+, and we have our own set of retrieval and search models, so embeddings, a pretty innovative model called Rerank, and so on. All of these are proprietary. We definitely take pride in the fact that we have our own special sauce in them, but what’s great is we actually allow researchers to use these in the form of open weights on Hugging Face, so we want to make sure that the research continues on these and others are able to build off of them. But at the same time, for commercial use, for actual enterprises, which is our focus, we absolutely give them the best in terms of what fits for them.
Eric: Asha?
Asha: Yes, I’m Asha, I lead product for the Microsoft AI platform, and we work with 1700 different models. We are a platform of open models and proprietary models, and so we are trying to help customers get the right model for the right job. We also have a team that builds small models internally called Phi, and we do a lot of customization and training for all of the products that Microsoft has to offer as well. We are still very early in learning the ecosystem along with everybody else, but we definitely believe in offering choice and then the tools to help solve different needs for different applications.
Eric: Are OpenAI’s models the most popular on your platform?
Asha: OpenAI is a very popular model partner for Microsoft.
Eric: And Cohere is on your platform?
Asha: Yes, Cohere is a great partner for us as well. Honestly, it does just come down to what is the use case for the application.
Eric: Ali, tell us about your models.
Ali: Sure. At AI2, we are after opening up the black box-
Eric: You’re the nonprofit guy on the panel.
Ali: We are a nonprofit. We are a research institute, and we’re after opening up the black box. We are building models, training models, and also releasing them fully in the open, so people can actually build on top of what we’ve learned. The way AI evolves today, the pace is unprecedented; it’s hard to catch up. Also, the list of unknowns grows exponentially. We don’t know a lot about these models as a whole community. There’s only one tool that we’re aware of that solves that problem, and that is open, communal research and engineering. AI is here today because it was practiced in the open, and we’re hoping to enable everybody to continue doing that.
Eric: How much is your goal, “We’ll build the best models if we can and that will make everybody else’s models better,” versus just going directly to helping the world build the best models? Is your job to build the single smartest model or what’s your north star?
Ali: Yeah, I don’t believe in a world in which there exists one model that serves all the needs. I think this was the story maybe three or four years ago. All we’ve learned is that the models are evolving and there are a large number of models. We just learned there are 1700 models, which is quite fascinating. We do believe in a future where you have an ecosystem of a large number of models with different characteristics serving different needs. However, we would argue and push for that future through open models, where you can actually see through the whole stack. We’ll win that space by enabling all to truly build on top of that. With a closed model, even one trained behind closed doors with the weights tossed over the fence, there’s only so much you can do. They’re great, they serve a certain purpose. We’re after building models that open up the whole stack, so we learn, so we can develop, and we would argue that you can actually build the best models this way.
Eric: Saurabh, aren’t they ruining your business model? I mean, he’s like, “Make models open source, don’t make them the foundational block.” She’s saying, “We will build a platform and you can use any of them.” How does Cohere say, “No, use our great models,” and how do you think about that?
Saurabh: Yeah, I tried to corner both of them at dinner last night, didn’t work. I think, like what you said earlier, about having the right model for the right use case, that’s exactly what we are doing. Just take GPT-4, all right? You mentioned 1700 models. Obviously, GPT-4o is still the most popular one on their platform, but it doesn’t solve everything. We routinely get really large enterprise customers who have tried GPT-4o, and it doesn’t necessarily solve the use case in the best manner possible, and so that shows that there’s still a significant amount of room for the right model for the right use case. We just announced our partnership with Fujitsu a couple of days ago, they had done the same thing-
Eric: Japanese is now maybe-
Saurabh: Exactly, yeah. You have different dimensions of strength, right? Multilinguality is a huge dimension for us. In conjunction with Fujitsu, using their proprietary data to train on top of our models, even with a 35B model, which is orders of magnitude smaller than the several trillion parameters that you have with OpenAI, for example, we were able to handily beat GPT-4o on the actual characteristics that the customer needed. I think that’s really-
Eric: What are some of those characteristics?
Saurabh: Japanese, for example, has a very specific formal structure, especially when you conduct business, and honorifics are really important. That’s just one of several hundred things that they were looking for, and so it brings up the question of good evals as well, because all we see normally in public are benchmarks. While a lot of models do very well on benchmarks, they don’t necessarily pass the test when it comes to actual use cases. That’s been the case for us: we focus solidly on the opportunity with enterprises, on what matters most to them, and then we make sure we nail that.
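To make the point about use-case-specific evals concrete, here is a minimal sketch of a harness that scores models on domain checks (such as polite Japanese register) rather than public benchmarks. The prompts, checks, and model client calls are illustrative placeholders, not Cohere’s or anyone else’s actual tooling.

```python
# Minimal sketch of a use-case-specific eval harness. The checks and prompts
# are hypothetical; the point is to score models on the criteria your business
# cares about rather than only on public benchmarks.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # domain-specific pass/fail, e.g. honorific usage

def run_eval(generate: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases a model's outputs pass."""
    passed = sum(case.check(generate(case.prompt)) for case in cases)
    return passed / len(cases)

# Example: a crude check that a Japanese business apology email
# ("please write an apology email for a delivery delay") uses polite forms.
cases = [
    EvalCase(
        prompt="取引先への納期遅延のお詫びメールを書いてください。",
        check=lambda out: any(tok in out for tok in ("ます", "です", "申し訳")),
    ),
]
# score_a = run_eval(model_a_generate, cases)  # model_a_generate: your own client call
# score_b = run_eval(model_b_generate, cases)
```

The same handful of cases can then be run against every candidate model, which is closer to the comparison an enterprise actually cares about than a leaderboard number.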
Eric: All right. Especially on this end of the panel, with this diversity of models, I think we all accept that there are a lot of different models for a lot of different use cases. Okay: smartest, most cost-efficient, and, I don’t know if you guys will participate in this, riskiest, meaning a model where you’re taking some risk by using it, and then if you have a fourth category you want to throw out, I’m happy to hear it. I’m stalling so you can think for a second, but I’m going to ask each of you: which smartest, most cost-efficient, riskiest, and grab-bag models do you think are interesting right now and people should be paying attention to, if it’s true that it’s not ChatGPT to rule them all? Ali?
Ali: Sure. Instead of naming models, let me actually talk about trends, because that’s more informative. There are just so many models out there, and naming them won’t buy us anything.
Eric: Well, people need to know what to build around.
Ali: Absolutely, and I think that’s a hard problem. I think you were alluding to the fact that evaluation is, quote, unquote, “bogus” today. It’s as non-scientific as it can get. It’s really hard to say, “My model’s better than yours,” or not, given those benchmarks. There’s a lot to be developed in that space before we can even answer which models are better. The science is missing; principled practice is completely missing. We’re seeing tables out there every other week that compare apples to oranges, and you would just laugh at those tables. One of those columns is benchmarks and datasets we developed, but used in a grossly wrong way, and there’s just not much we can do about that whole space. Given that caveat, I think naming which one is better than another is probably a lost cause. But let me talk about the trends.
I think there’s stuff we’re seeing that is phenomenal, and, maybe a couple of years ago, if somebody had told this to me, I would’ve said, “No, that’s not going to happen.” One is that the gap between closed and open is shrinking rapidly. The other is that the gap between small and big is shrinking rapidly, and it’s quite phenomenal to watch, especially as you try to connect these models to businesses and use cases as you’re thinking about them. The third pattern, a more recent one, again an old-school machine learning principle that is now coming back, is what I call “less is the new more.” We’re now seeing models being trained on datasets that are a fraction of the original datasets we trained with.
These are all natural new learnings. As we evolve as a whole industry, we’ll learn new things, but if I put all three of these things together, the trend gets to a certain point, because at the end of the day, it’s all about adoption and reducing the friction to adopt a model. If you have an enormous model that is a beast for me to fine-tune, that is a beast for me to serve, that doesn’t even fit into the memory of one GPU box, and I have to do all sorts of tricks to be able to run it, that just adds friction, lowers the volume of adoption, and all that jazz. What I’m excited about is this whole new trend evolving that way.
Eric: Asha, are you going to-
Asha: Am I going to answer? No, but I’ll talk about that as well. It’s interesting. We are starting to see most of our customers graduate from experimentation into production workloads, and most of them are using more than one model for their use case. I think it’s more than just what is the right model for the job to be done; most times, you need multiple models: a small model if you want to optimize for cost or for running on a device, or a large model that’s specific to some language or to time series, math, or whatever it is. I think we’ll continue to see new models popping up everywhere, and I think they’ll be great for certain use cases, but almost all of our customers who are in production are using multiple models, and they’re fine-tuning them or distilling them or adapting them in some way, so they can actually get the performance, quality, and speed that they want.
Eric: Closed source, smarter. Open source, cheaper. Do we accept that framework or are you seeing cases where that’s not true?
Asha: No, I mean, I think Llama just put out some new models, and I think they’re extremely performant on a lot of dimensions, so I think that’s a superficial comparison to say, “If you’re open, you’re less performant.” Certainly, there’s different pros and cons to being open, but I think we’re really early days in the open source movement, and if you reflect back on what happened in the early days of open source, that didn’t happen overnight. Open weights is a big step. I think we’ve seen that democratizing the weights has given the community a lot more advancements, and so we’ll continue to see what happens and what happens with licensing and all of that.
Eric: Are there models that you work with the most? If a customer comes to you and says, “Listen, we’re going to be multi-model,” are there models you see the most? Is it about something cheaper for some of the use cases, or how do you approach that at Cohere?
Saurabh: Great question. We have different sizes of models, and so what fits best for a particular use case varies. It might be a 35B model, it might be a 100-plus-B model. More so than just seeing what’s available in terms of different models, I think you have to go beyond the model. Like it was said in the opening session, it’s not just the model itself, and it’s a bit of what Ali said as well. Can you actually work with a given model in your environment? What does it actually mean for your specific set of constraints? Our customers, for example, often need deployment in very constrained environments, completely privately, and so a model that just doesn’t fit on premises or even in a virtual private cloud is just not useful to them, so it has to be that we bring our model-
Eric: It’s, “We can be on premise, we’ve thought about security, you’re not running wild.”
Saurabh: Exactly, and we have to bring our model to the data. I think that’s been the most overlooked part so far. It’s been either the whole one-model-to-rule-them-all or trying to get the… You mentioned “smartest,” for example. I think “smartest” has, up to now, basically meant the largest model available, but do you actually need that, and can you actually use that in the given context? You often want something that works very well with your data, so you can add your own secret sauce on top of it as a customer, rather than simply saying, “The model has to work exactly the way I want out of the box.” That’s been our focus. We do see customers using multiple models. Asha, you mentioned the fact that people had been doing way more POCs before, and now they’re transitioning to production. We definitely see the same thing.
Last year was more a case of death by POC. Customers would try tons of different models, often running the same potentially not-great set of evals or use cases on all of those models, and you still wouldn’t get anywhere, because at the end of it, you still don’t have the perfect model for your use case. I think it’s equally important to have a trusted partner, somebody who actually knows the model inside out, who can actually help you customize the model to your specific needs as well, and then all of the other constraints tend to pale in the background.
Asha: Our hypothesis is that’s not going to be static. If you think about applications in mobile, you can A/B test your way through almost any decision about what features customers are going to use the most. I don’t see a reason why that wouldn’t be true in the future, where, in production, you’re able to A/B test different models for different use cases and run a certain model based on what you’re optimizing for, without having to rewrite your application.
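A minimal sketch of what that could look like in practice: a thin routing layer between the application and the models, so different models can be A/B tested per use case without touching application code. The model functions, use-case names, and traffic splits below are illustrative assumptions, not any platform’s actual API.

```python
# Minimal sketch of a model-routing layer for per-use-case A/B tests.
# Model functions, use-case names, and traffic splits are illustrative.
import random
from typing import Callable

class ModelRouter:
    def __init__(self) -> None:
        # use case -> list of (model function, traffic share)
        self.routes: dict[str, list[tuple[Callable[[str], str], float]]] = {}

    def register(self, use_case: str, model_fn: Callable[[str], str], share: float) -> None:
        self.routes.setdefault(use_case, []).append((model_fn, share))

    def generate(self, use_case: str, prompt: str) -> str:
        fns, weights = zip(*self.routes[use_case])
        chosen = random.choices(fns, weights=weights, k=1)[0]  # weighted A/B split
        return chosen(prompt)

router = ModelRouter()
# router.register("summarization", small_model_generate, 0.5)  # cheaper model
# router.register("summarization", large_model_generate, 0.5)  # stronger model
# answer = router.generate("summarization", "Summarize this contract ...")
```

The application only ever calls the router, so shifting traffic between models, or swapping one out entirely, becomes a configuration change rather than a rewrite.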
Eric: Does Microsoft think models have a lot of pricing power? How much do you see certain models being able to really price above market because of what they offer?
Asha: I think, just generally speaking, there’s a huge movement to democratize these models. If you think about it, just yesterday, OpenAI released their Realtime API, which is also available on Azure, and they’ve decreased prices 99% over the last two years to make sure that it’s accessible. We’re seeing compute costs come down dramatically over the last 10 years. I think, just generally speaking, it’s like Moore’s Law: everything is coming down in terms of price, and everything’s going up in terms of performance right now.
Eric: On that trajectory, yeah, how much more do we think the models are going to improve over the next year, two years? Do you think there’s, I don’t know, the $10 billion for companies to put behind it? Do you think it’ll be this continued line of more and more spend? I mean, you hear fears about energy consumption, given what it would take to build some of these models. Ali, do you see the investment appetite to build those models out there, or are people going to get smarter about how they make improvements going forward?
Ali: Yeah, let’s go back a few years in time. There was a point in time when we all believed, even if we didn’t say it out loud, that models equal solutions. The fact that prompt engineering exists, the fact that, even to some extent, RAG exists, is to me an indicator that we all assumed, “I have a solution, let me just hack around it to convert this solution to do what I want to do.” Based on what we’re hearing from both Asha and Saurabh, we’re now actually moving to this space of, “Let’s build a solution.” The whole notion of a model is also getting a little more fluid than we thought. There was a point in time when a model was a thing where, “Once I’m done pre-training, I’m going to declare I have a model, and now if you want to fine-tune it or RAG it, you deal with it,” but that whole notion has also evolved over time.
We don’t have a clear boundary between pre-training, post-training, and now mid-training. With pre-training, mid-training, and post-training, with annealing in the middle of them fusing these stages together, I think we’re seeing more and more integration of these discrete phases that we had for a model. If I combine that with this notion that models are not solutions, I think we’re already in this phase of, “Let’s find a solution.” We’ll go around the same cycle. Naturally, we’re going to start hacking around a model to turn it into a solution. Maybe I actually have a large number of models and some logic that connects these models together, and eventually, we move into a more principled approach, or, “Let me actually learn a solution.”
Eric: Are you saying most of the improvement’s going to come from refining and building around smart existing models versus expecting those models to make a big leap?
Ali: No, no, I’m not even saying that. I actually don’t even know how to think about improvements in the absence of evaluation, so let’s take that with a grain of salt. But to solve a specific problem, I think the common practice for the next few months, or maybe a year, would be, “Let me just find which model performs better on my own benchmark,” which you don’t know about, and then, “Let me just put some logic around it and convert it into a, quote, unquote, ‘system.’” If I want to be a little more creative about it, call it an agent, and now I have a solution around that. Then, after that, we will learn again, “That’s great, it does this much, but I actually have all of these corner cases where my frozen logic might not actually work between these models,” and now maybe we can move to a space of learning all of those.
Eric: Asha, you guys have the money. How much more, do you think, investment there will be behind the growth of foundation models?
Asha: I think you can expect more intelligence from foundation models, and you can make them more useful without more intelligence. I think both of those things are true. The latest class of models that have been released, folks say they’re like PhD students, whereas before, they were high school students. Certainly, we have not reached AGI yet, and there is a pathway to that that I think the labs will continue to go after. But I am impressed. I build an agent or assistant every single week to do different things, and I’m super impressed with how they can now actually achieve different tasks and automate different systems of my life or my work.
Eric: Like what?
Asha: I have an assistant to help me with calendaring, for example.
Eric: Instead of buying some application, you’re building your own?
Asha: Yeah, we have a studio called Azure AI Studio, and so you can just configure an assistant there and connect it into… You can give it certain skills and superpowers, things like that. We have agents available on Workspace and GitHub where they can help you code, things like that. The interesting thing is I think we’re just starting to scratch the surface of what an agentic society or a multi-agent society actually looks like, and I think there’s an exponential amount of productivity wins there as well. Again, yes, the models will get more intelligent. Yes, they have to perform on your data and be benchmarked against what you care about. But I think we can also help program the models and agents to perform more complex tasks, and I think we’re still really early in doing that.
Eric: Saurabh, how much do you think your customers are coming to you and saying, “We believe that your technology is ready to do what we need today,” versus they’re like, “Wow, we’ve been watching this rate of improvement and we know, if that continues, then we’ll really need it, and so we should sort of get set up now”? What’s the split and what are the behaviors of your customers?
Saurabh: Great question, yeah. How long do you actually wait? Is there a specific point at which you’ll get exactly what you need? I don’t think that’s quite what we’ve seen. Customers have been trying things out already for quite some time; they’ve tried various options, different models, for example. What we see is a movement more towards systems and applications, exactly like the other panelists were saying. I think the first part is, how do you really go from just, “Here’s a model, do what you will with it,” to actually building something that helps you get to your results quicker? I’ll give an example. RAG has been the rage for the last year or so, and our models are very good at retrieval-augmented generation. What you’ve seen in the industry, to a large degree, and so what customers are told, is to just put your documents in the context.
Well, that’s okay if you have a few PDFs, or maybe a few thousand PDFs, but what do you do when you have customers like ours, who have tens of billions of documents, all of which might be relevant for a particular flow, for a particular analysis, for a particular workflow or use case? We focus equally on the retrieval side, so we are building an end-to-end solution that actually lets you just point to a data source. It might be at a very large scale, like enterprises need. How do you actually parse all of that data? Completely different data types, completely different structures, completely different domains. How do you parse those? How do you chunk, how do you actually index? All of the non-glamorous underlying work that has to happen in order to actually get really good access to that information, and then you pull only the most relevant portions of it into the context for these models.
That’s not something you get directly out of a generative model itself. Or you could, to a degree, but it’s going to be extremely expensive even with the low per-token prices. If you’re trying to throw 10 billion documents into the context, it’s not going to work.
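A rough sketch of the retrieval side Saurabh describes, assuming a generic embeddings function: parse and chunk documents, index them, and pull only the top-k relevant chunks into the prompt instead of the whole corpus. The function names are placeholders, not Cohere’s actual APIs, and a real system at this scale would use a vector database rather than an in-memory list.

```python
# Rough sketch of a retrieval pipeline: chunk, embed, index, then pull only the
# top-k relevant chunks into the prompt. `embed` stands in for any embeddings
# model; nothing here is a specific vendor API.
def chunk(document: str, max_chars: int = 1000) -> list[str]:
    return [document[i:i + max_chars] for i in range(0, len(document), max_chars)]

def build_index(documents, embed):
    """Return a list of (chunk_text, embedding) pairs."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]

def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm if norm else 0.0

def retrieve(query: str, index, embed, top_k: int = 5) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]

# context = "\n\n".join(retrieve(question, index, embed))
# prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The generative model only ever sees the handful of retrieved chunks, which is what keeps token costs manageable when the underlying corpus runs to billions of documents.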
Eric: They have an application they want to build and you want to get them all the way there.
Saurabh: Exactly. You want to get a system, something that gives them an end-to-end solution either that caters to what they need directly or to build on top of, so that could be just a middle layer, like the end-to-end solution that I just described for retrieval, or it could be something even further on top, as an end-user application. One of the most popular industries, from a customer perspective, for us is financial services. You’re trying to work with several different data sources, the kind of multi-step workflow that you have requires using several different tools, it requires analysis of what you get back, and then deciding what the next step is going to be, so the plan is quite different based on what happens for a particular instantiation of that flow. Because of that, it’s equally important to give that analyst a very easy way to create a flow, to be able to adapt that flow, so if the model comes up with a plan, how do you actually edit that plan, so that you tell it, “This is the step that you actually didn’t quite get right. Here’s exactly what I wanted”?
How do you do it in an easy-to-use manner? These are not developers, they’re not going to actually write an application from scratch, and so I think that ease-of-use factor is still missing. That’s something that we’re working towards.
Eric: I want to move to a direct open source conversation. Well, I’ll just ask: is Llama open source? Or, what is open source and what’s the point? In analyzing whether you consider Llama, which many people believe is the most powerful open-source model, to actually be open source, I want you, Ali, to really get to: what’s the point of open source, and what do we need for models to be open source?
Ali: Absolutely. This is a very dynamic topic, the whole industry is learning about what to call what, but here’s how I think about it. The term “Open source” for software when it came out, the principle was I should be able to build on top of what you did, I should be able to learn from what you did, and maybe also fork in the middle of the way, because I just want this much of what you did. The key is, can I build on top of what you did and how much do I know about what you did? Because the source should be open, so I need to know exactly what you did. When we talk about models, obviously, models or AI ecosystems are not software alone, so all of these concepts are… They don’t match one-to-one.
Eric: Underlying data, weights.
Ali: There is a lot involved, and we could fight forever about these terminologies, about what to call what, but it doesn’t matter at the end of the day. There’s a certain amount of… I think people in the media call it “open washing” that’s happening, which is actually quite interesting, because it speaks to the power of open source.
Eric: Do you embrace that term or are you reject-
Ali: No, no, I’m just quoting people in media.
Eric: Other people, they say it.
Ali: When I first learned about it, I was like, “Wow, that’s an interesting concept,” but the fact that that concept exists and people talk about it speaks to the power of open source and what it brings. The minute that we truly empower people to build on top of each other’s work, that’s the moment we can say, “Okay, we are now open.” There is some level of things I can do if you just train a model behind closed doors and give me the weights; I should be able to fine-tune it. Soon, I will learn that, beyond a certain amount, if I fine-tune it, I’ll see regression and my model starts to behave in a weird way, and therefore I have to do something about it. Recent results, not from us, from the literature, show that if you actually mix pre-training data with your fine-tuning data, a lot of interesting things happen, and your model won’t regress as much as you would expect.
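A toy sketch of the data-mixing idea Ali cites from the literature: blend a slice of pre-training-style data into the fine-tuning set so the model keeps its general abilities while it specializes. The mixing ratio and data format are illustrative assumptions, not a prescription from any specific paper.

```python
# Toy sketch of mixing pre-training-style data into a fine-tuning set so the
# model keeps general abilities while specializing. The 30% ratio is arbitrary.
import random

def mix_datasets(finetune_examples: list[dict],
                 pretrain_examples: list[dict],
                 pretrain_fraction: float = 0.3) -> list[dict]:
    n_extra = min(int(len(finetune_examples) * pretrain_fraction), len(pretrain_examples))
    mixed = finetune_examples + random.sample(pretrain_examples, n_extra)
    random.shuffle(mixed)
    return mixed
```

Note that this only works if you have access to (or a reasonable proxy for) the pre-training distribution, which is part of Ali’s point about truly open models.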
Eric: Is Llama open source?
Ali: Under the definition that I can understand it and use it? No, because there’s very little we know about Llama. It’s a powerful model, we love it. As Asha said, I think the move to open weights is a great initiative, it’s a good movement in the right direction, but it is just not open source, because I don’t know what went into it, I don’t know how it was trained. And if I actually want to just fork off of it halfway, forking being the principle of open source, I want to fork from your work halfway, how do I do that?
Also, we are basically at the mercy of model trainers, because there’s a long list of decisions that you have to make when you train a model that there’s no way back from those, and once you use those frozen models, you are at the mercy of those model trainers. If, for whatever reason, they decided not to include a certain distribution when they pre-trained, you’re pretty much doomed. Whatever you do post-training, it won’t get you what you want it to. To me, true open source means empowering people to truly understand and adjust and adapt based on your need.
Eric: Asha, I want to change the conversation. Super interesting, you building personal applications, but much bigger question. What are the applications that you’re most excited about that these models are building, that you see customers using successfully? If you just had to give us a couple applications that you think the models today are capable of building, what would they be?
Asha: Maybe I’ll talk about some scenarios, because usually there are applications, but there are also business processes that people are using AI for at scale, so there are a few that we’re seeing quite a bit. One is just code generation, the future of developing applications, and not just how you program, but how applications are generated and built. We’re seeing a lot of investment in that space, and we’re seeing the models become particularly great at that. Obviously, Microsoft is doing a lot with that when it comes to GitHub and our VS Code tools. Another one is around supply chain. If you think about the complexity and the data that go into running a supply chain, for hot and cold goods and for hardware, there’s a lot happening in that space, and so we’re pretty excited about that as well.
Also, customer support. When you think about multimodal and what that enables and being able to dynamically reduce costs and increase the customer experience on every single call or contact point, we’re seeing tremendous savings from companies that are investing in those use cases. There isn’t, at this point, a business process that’s not being touched by AI, which is pretty spectacular, and I think we’re in different parts of the curve in terms of success on that or not, but certainly, I believe almost every business process will have some AI-assisted use case that’s touched over the next couple of years.
Eric: I’m going to open up to questions after this question, so get thinking. Saurabh, I mean, data could be its own panel, but I want to ask specifically: have your models, or models generally, consumed all the data in the world and, therefore, can’t get smarter on the basis of data? Or what are you seeing in terms of data for training these models, and what does that say about possible future improvement?
Saurabh: Good question. I think that goes, to some degree, towards what Ali was saying earlier. Pre-training and post-training are now blending quite a lot, so I think the way models developed early on was certainly with a large amount of open data, or maybe publicly available data, I would say. How open exactly that was is a different story, but it started off with publicly available data. To a large degree, yes, a lot of that data has been used, but that data certainly isn’t the highest quality, or the way it was processed and used early on was certainly not the most optimal. So there’s been a lot of improvement in just using-
Eric: Taking out useless data is becoming a priority for a lot of people?
Saurabh: Yeah, so how do you actually get the most useful bits out of it? How do you process it so that it actually does the right things for the model? But then, going beyond that, there’s certainly a large part of getting the right private data: getting the right data with the right procurement process, with the right partnerships with certain entities, and with the right customer interaction as well, so that you can get the most useful data in. The other part is also synthetic data, because once you know what the model is really good at versus what it is not good at, if you understand the distribution really well for the use cases you’re interested in, what data do you need in order to augment the model? We certainly use synthetic data quite a lot. Methods for generating synthetic data have only gotten better in the last year or two, and so there’s a whole basket of ways you can use to get the right data.
I think improving the quality really makes a huge difference, especially depending on the stage at which you’re actually applying the data, and so with literally a few thousand examples, you can actually nudge the model in the right direction for the right use case.
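A hedged sketch of that “few thousand examples” point: a small, targeted fine-tuning file built from curated seed examples, optionally padded with synthetic paraphrases produced by a stronger model. The `generate` function and the JSONL prompt/completion format here are assumptions for illustration, not any vendor’s specific fine-tuning API.

```python
# Hedged sketch: build a small fine-tuning file from a few thousand curated
# examples, optionally padded with synthetic paraphrases from a stronger model.
# `generate` is a placeholder for whatever model call you use.
import json

def build_finetune_file(seed_examples: list[dict], generate,
                        variants_per_seed: int = 2,
                        path: str = "finetune.jsonl") -> None:
    with open(path, "w", encoding="utf-8") as f:
        for ex in seed_examples:
            f.write(json.dumps(ex, ensure_ascii=False) + "\n")
            for _ in range(variants_per_seed):
                # Synthetic variant: same target completion, rephrased prompt.
                rephrased = generate(f"Rewrite this request in different words: {ex['prompt']}")
                f.write(json.dumps({"prompt": rephrased, "completion": ex["completion"]},
                                   ensure_ascii=False) + "\n")
```

The value comes from curating the seeds around the gaps you have actually observed in the model, which is why a few thousand well-chosen examples can move behavior more than a much larger generic dump.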
Eric: Yeah. A question? Yeah, right here. Do we have a mic? All the way in the front? All right, yeah.
Question: Thank you all for being here today. My name is Paula Satchens. I’m the founder of a startup called Certify AI, where we’re building technology to [inaudible 00:31:28] deepfakes for women and girls, and one question that I have is: what approaches do you believe are most effective in mitigating bias in AI models, especially at scale for startups?
Eric: Approaches for mitigating bias in AI models, especially at scale? Who wants to jump on that first?
Saurabh: I can go, though I think everyone probably has great insights on this. We certainly do a lot of red teaming with our models, for example, to suss these out. Figuring out what data actually goes in, ensuring that there’s the right data provenance as well as the right data quality, certainly matters a lot. Part of it is what goes into the training itself, part of it is how you do the training, and then the last part, certainly at the point of application, is what you actually put in place in terms of guardrails for the model. You asked a little bit about risk earlier. With our models, we don’t just tamp down output from the model entirely. Our models today are primarily text; they don’t generate video, for example, or images, so the risk might be a bit different. But in terms of bias, we certainly perform testing. Depending on the use case you actually have for the model, though, if you’re trying to generate data that would be useful for detecting bias, for example, you certainly don’t want the model to stop generating the data that you need.
The right configuration, the right ability for the user to actually figure out what is the right setting on the dial for them, that’s what we focus on rather than completely stopping the model from generating any kind of data. We have certain guidelines in place already, terms and conditions and so on, but also specific things we stop. Those are on the more extreme side, but I think beyond that, we leave it to the user.
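A minimal sketch of that “setting on the dial” idea: guardrails applied at the application layer and configured per use case, with only a small set of categories blocked outright. The category names and the `classify` moderation call are illustrative assumptions, not Cohere’s actual safety stack.

```python
# Minimal sketch of configurable guardrails at the application layer.
# The category names and the `classify` moderation call are illustrative.
from dataclasses import dataclass

@dataclass
class GuardrailConfig:
    blocked_categories: set[str]   # always stopped (the "extreme side")
    flagged_categories: set[str]   # allowed, but logged for review

def apply_guardrails(text: str, classify, config: GuardrailConfig) -> str:
    labels = set(classify(text))   # e.g. a moderation model returning category labels
    if labels & config.blocked_categories:
        return "[output withheld by policy]"
    if labels & config.flagged_categories:
        print(f"flagged for review: {sorted(labels)}")  # stand-in for a real audit hook
    return text

# config = GuardrailConfig(blocked_categories={"extreme_harm"},
#                          flagged_categories={"bias_probe"})
# safe_text = apply_guardrails(model_output, moderation_classify, config)
```

Putting the dial in a config object like this is what lets a deepfake-detection or bias-research use case keep generating the data it needs while more extreme categories stay blocked by default.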
Eric: All right, that’s our time. It’s amazing how far we’ve come in getting models into production, the competitive dynamics, and delivering a full system to the customer. Thank you so much for taking the time to talk with me, I think we have a break after this. All right, thank you.