IA Summit 2024: Models Are NOT All You Need

A Conversation Between Rama Akkiraju (NVIDIA), Christian Kleinerman (Snowflake), and Chetan Kapoor (CoreWeave) — moderated by WSJ Reporter Tom Dotan.

It was a pleasure hosting the third annual IA Summit in person on October 2, 2024. Nearly 300 founders, builders, investors, and thought leaders across the AI community and over 30 speakers dove into everything from foundational AI models to real-world applications in enterprise and productivity. The day featured a series of fireside chats with industry leaders, panels with key AI and tech innovators, and interactive discussion groups. We’re excited to share the recording and transcript of the panel “Models Are NOT All You Need” with Rama Akkiraju, VP of Enterprise AI & Automation at NVIDIA; Christian Kleinerman, EVP of Product at Snowflake; and Chetan Kapoor, CPO of CoreWeave — moderated by WSJ Reporter Tom Dotan.

TLDR: (Generated with AI and edited for clarity)

During the IA Summit “Models Are NOT All You Need” panel, industry leaders Rama Akkiraju (NVIDIA), Chetan Kapoor (CoreWeave), and Christian Kleinerman (Snowflake) discussed how enterprises are leveraging large language models (LLMs) and AI to optimize operations, improve productivity, and address infrastructure challenges. Moderated by Tom Dotan of The Wall Street Journal, the conversation spanned from AI model commoditization to specific enterprise applications and infrastructure needs, underscoring the evolving landscape of AI-powered enterprise solutions.

  • AI as a Necessary but Incomplete Solution: Founders should see AI models as an integral yet partial solution for enhancing enterprise efficiency. Beyond models, building AI-powered solutions requires robust security, modularity, and workflow orchestration.
  • Infrastructure as a Differentiator: CoreWeave’s perspective highlighted the importance of optimized infrastructure for model deployment. Startups with a focus on infrastructure can differentiate by refining model performance, latency, and cost-effectiveness, which are critical for enterprise adoption.
  • Task-Specific Models with Domain Customization: Leaders emphasized the value of task-specific, fine-tuned models, which can offer more precise and cost-efficient solutions. Entrepreneurs can leverage smaller, domain-focused models for unique applications, aligning with enterprise needs without heavy reliance on large, generalized models.

Tom: Models, They’re Not All You Need. If the Beatles were still around and they sold out, that would probably be one of their hits. All right, so before we get into this, let’s go on down the line of our great panelists here and just get their names and what they do, because it’s all very impressive and very relevant to what we’re talking about here. So go ahead, Rama.

Rama: Hi, I’m Rama Akkiraju. I am a VP of Enterprise AI at NVIDIA. My mission is to transform NVIDIA’s own enterprise. I look at developer productivity, our employee productivity, and business and IT operational efficiencies. So we build chatbots and copilots for all of these things.

Chetan: Hi everybody. Chetan Kapoor here. It’s an honor to be here. I lead the product organization at CoreWeave, and many of you might not know about CoreWeave. We are taking on the seemingly silly idea of deploying a lot of raw infrastructure on the ground, so we actually build out data centers. We actually deploy compute storage and networking. Couple it up with a whole bunch of managed services to provide a platform to enable some of the largest AI labs in the world to build and fine-tune and deploy their AI models. I’ve recently joined CoreWeave. I’ve joined from AWS where I was running part of the compute business. And yeah, really happy to be here.

Christian: Good morning everyone. Christian Kleinerman. Can you hear me? Can you hear me? Now that’s better.

Tom: Much better, yeah.

Christian: Good morning. Christian Kleinerman, I lead the product team over at Snowflake. I consider myself a data junkie; I have been building database systems and data management platforms for, I don’t know, 20-plus years. And at Snowflake, I think the broader initiative is how we take the ease of use and simplicity that we delivered originally on analytics and data warehousing to a broader set of capabilities, of course including AI. How do we simplify the use and the leverage of AI for enterprises?

Tom: Fantastic. I’m Tom Dotan. I cover AI at the Wall Street Journal and my entire morning was taken up covering the OpenAI funding announcement today. All right, so we’re going to work our way from the bottom of the stack up to get really into the heart of this question here. And so we’ll start just with the foundational model and the question that came to my mind as we saw the $6.6 billion that are going into OpenAI and their foundational models and the business they’ve built on that, is the foundational model commoditized? Are we at a point now where it’s capabilities between one to another, whether it’s open source, any of the for-profit companies, is non-differentiated? And let’s go on down the list.

Rama: Yeah. Good question. Increasingly they are asymptotically reaching that point of getting commoditized. But I think there is still a lot of variability that we see even among the big models, or even if you take a family of models of a similar size: you see variability in the way they respond, in their quality, their hallucinations, and their latency. So there are still some differences. I’m sure over time they’ll even out as they are all training on similar types of data and all of that. But independent of this question of whether models are getting commoditized or not, as we use them to solve real-world problems, we’ll see that there is a lot more to solving real-world problems than these models. So over time we can take them for granted, but the model itself is only a very small part of the overall solution. And I hope through the rest of this conversation we get a chance to dig into what that is, because this panel is about going beyond AI models.

Tom: Yeah, Chetan.

Chetan: So I have a slightly contrarian point of view. I think the general theme, as we’ve talked about, is, “Yes, at some point we might see a consolidation of model types, and they get commoditized.” We work with a lot of customers day in, day out to actually optimize the performance of these models from a throughput and latency perspective. So outside of the performance you see on a lot of these tables that compare model X versus model Y across a series of benchmarks, there’s a lot that happens underneath to optimize the performance, the latency, and the cost per token of these models, and in our perspective that actually starts to differentiate a holistic offering.

I would say that, typically, the infrastructure part is underappreciated in the market right now, because there’s a lot of talk about how many trillions of parameters a particular model has and how well it is actually scoring. But fundamentally, when you’re deploying these models in production and serving the traffic that OpenAI and others in the space are seeing, it does come down to how far you’re pushing the infrastructure, how far you’re driving down costs, and whether you’re enabling this entire ecosystem to flourish, because fundamentally those two things could significantly limit adoption. So there are aspects of productionizing a model and making it accessible in the right form factor, at the right throughput and the right cost, that also differentiate the offering, and that’s where I think some of these key players will continue to differentiate.

Christian: Yeah, I’m more in the camp that it is definitely trending toward commoditization. I don’t think it’s there today, but the trend is for sure there. I would compare it to what happened with compute. You had Peter this morning talking about EC2, and you can say, “Well, compute is reasonably commoditized.” That does not mean that innovation has stopped; it’s still happening. Better economics, better efficiency, that will continue all along. But what I think is that the differences between models are starting to get smaller, especially relative to their ability to create value or deliver the promise of AI, in the enterprise at least. So basically, my take is that models are ahead of the value they’re delivering for enterprises. From that perspective, yeah, you could say it’s commoditized.

Tom: Okay, so this leads right into my next question then, which is for the enterprise. We’re several years into models being developed at large scale and sold to businesses. Are we aware yet of what the killer use case is within the enterprise for these AI models, if the value lies in their application? What are we seeing so far in terms of the most effective and highest-uptake usage of these models within the enterprise?

Rama: That’s what I do, so I can talk to several use cases that we are deploying within the enterprise. Code assistance and everything to do with engineer productivity is a huge use case, of course, and there’s a huge set of things that one can do there. It’s not just code assistance in terms of code generation; it’s code reviews, unit test generation, documentation generation. There’s a whole slew of things in overall engineering productivity, and in fact, at NVIDIA we also use large language models for our own chip design. They’re actively helping our hardware engineers and chip designers with unit tests, verification testing, and all of those kinds of things.

Tom: You’re using generative AI models for chip enhancements?

Rama: Generative AI models, yeah. So that’s one family in the engineering productivity. Then there is the bread and butter, the big class of-

Tom: Sorry, quickly, are those models that you guys have designed internally at NVIDIA, or are these licensed open-source or some other company’s models?

Rama: Our internal teams have built it. It’s called ChipNeMo, leveraging NVIDIA’s neural models that we have built, and it’s fine-tuned specifically for the chip design domain.

Tom: Interesting.

Rama: Yeah, and then there is a whole slew of other use cases. We’ve deployed them for SRE productivity, where they’re actively helping SREs diagnose an ongoing incident: summarizing what has happened in the incident up until now, and getting help along the lines of “show me the logs corresponding to this incident, show me the metrics, bring in the experts, and what are some of the previous incidents related to this incident and what were the resolution actions,” and so on. So there’s IT operational efficiency; the SRE domain is a huge area where we are actively deploying and using generative AI. And in the business category there are a lot of use cases where we are starting to apply it, specifically in talk-to-your-data types of initiatives and what-if scenario analysis. Say, for example, our supply chain teams who are planning for the overall supply chain of our chips: they need to run a lot of supply chain what-if scenarios.

So instead of interacting with spreadsheets and all, they can just interact in natural language: what if this particular manufacturer delays their production by this much? What is the impact on this customer allocation? Those types of things. Planners are starting to interactively use natural-language-based chat interactions. And, of course, there’s enterprise search and overall employee productivity at the intranet level, and we use it for bug management, where people can interact with their bugs and get a summary of what’s going on. So across the board, in supply chain, finance, IT, legal, marketing, everywhere, in most enterprise business functions, we are starting to apply generative AI.
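
For readers who want to picture the talk-to-your-data pattern Rama describes, here is a minimal sketch: a what-if question in natural language is translated into SQL over a supply-chain table and executed. The table, the question, and the llm_complete helper are illustrative placeholders, not NVIDIA’s actual pipeline.

```python
import duckdb

con = duckdb.connect()  # in-memory demo database standing in for the planning data
con.execute("CREATE TABLE shipments (supplier TEXT, customer TEXT, units INTEGER)")
con.execute("INSERT INTO shipments VALUES ('FabCo', 'CustomerA', 1000), ('FabCo', 'CustomerB', 600)")

def llm_complete(prompt: str) -> str:
    """Placeholder for a call to whatever LLM endpoint you run. In practice the
    model would generate SQL from the schema plus the question; here the
    translation is hard-coded so the sketch runs offline."""
    return """
        SELECT customer, units, CAST(units * 0.8 AS INTEGER) AS units_if_fabco_slips
        FROM shipments WHERE supplier = 'FabCo'
    """

question = "What if FabCo cuts its output by 20%? How does each customer's allocation change?"
schema = con.execute("DESCRIBE shipments").fetchall()
sql = llm_complete(f"Schema: {schema}\nQuestion: {question}\nReturn one SQL query.")
print(con.execute(sql).fetchall())  # the planner sees the impact without touching a spreadsheet
```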

Tom: Right. Chetan.

Chetan: So, for a lot of the use cases that we’ve already talked about this morning, we are seeing similar trends from our customer base. At a high level, you can break them up into optimizations companies make to their internal processes and mechanisms, and obviously many of them are also reinventing their businesses from a customer-facing standpoint. There’s one aspect I want to highlight that is unique to us operating at the infrastructure layer, which is how we are using generative AI to empower our customers. When it comes to providing access to clusters that are tens of thousands of GPUs, we all know that these GPU clusters are fragile, they’re susceptible to failures, and there’s a lot of inherent value in helping customers understand exactly what happened.

Was it something at the code level, something at the driver level, or something at the hardware level? And then being able to recover quickly from those failures and continue to make progress on pre-training or fine-tuning their specific AI models. So on the CoreWeave side, we are bringing out some GenAI-powered solutions, specifically on the observability side and the job failure detection side, that make it super, super easy for many of our customers to get down to the problem really quickly, address it, and get back up and running again. So at a high level, the enterprise applications we are seeing are a plus one to the conversations we’ve had this morning, plus the ones that Rama mentioned, but again, there’s a decent bit of innovation that is yet to happen on the infrastructure side that could benefit from GenAI.

Christian: So definitely, we see some of the use cases already mentioned; coding is front and center for many organizations. But I think the sweet spot use case, at least from the organizations and people that I talk to, and there’s a bias because people usually want to talk to Snowflake about data, is this notion of how do you democratize access to data? The conversation is how do I chat with my data, how do I make it more accessible to more people in my company? What business intelligence started doing 20, 30 years ago, AI now has the promise of taking another non-trivial step further. I was talking to a car manufacturer and they were saying, “I want to be able to have folks on the shop floor be able to ask questions, or be able to get analysis based on pictures and images, quality assessment on the spot, ask questions, how does it compare?” That is almost unheard of. So it’s opening lots of use cases, but the common theme is how do I make my data more available and effectively accessible?

Tom: How would you characterize the uptake from customers at this point in terms of paying the premiums that all of these model makers or agentic tools are charging to actually buy this stuff? I mean committing real IT budget toward this technology. We’re a year and a half into this thing. It doesn’t seem to be reaching scale across companies, but maybe we’re just at the beginning. What would you say to that? Let’s go to Christian.

Christian: Yeah, I’m happy to comment. I don’t think that the economics are yet what gets in the way. The number one aspect that is slowing down, and in some instances shutting down, projects is the trustworthiness of those solutions. I heard of someone that did what everyone did last year: they did a sample, the demo worked well, they rolled it out to a few users, and the solution didn’t honor some permissions and showed someone data that they were not supposed to see, and it literally turned into, “Okay, let’s have someone else build it. We shouldn’t just be rolling out our own.” The notion of hallucinations has gotten better with the implementation of the RAG paradigm, but security and permission honoring seem to be front and center. Economics comes second. As of right now, most of the people that are here have budgets and they have allocated project teams, because everyone has some AI initiative. But I would say that the trustworthiness and reliability of yes, this is enterprise data and it’s honoring our governance, that seems to be the number one blocker from the conversations I have.
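
One common way to address the permission problem Christian describes is to filter retrieved chunks against the requesting user’s entitlements before anything reaches the model. The sketch below assumes a hypothetical group-based ACL copied from the source system at index time; the ranking step is elided.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_id: str
    text: str
    allowed_groups: frozenset  # ACL captured from the source system at index time

def user_groups(user_id: str) -> frozenset:
    """Stand-in for a lookup against your identity provider."""
    directory = {"alice": frozenset({"eng", "all-hands"}), "bob": frozenset({"all-hands"})}
    return directory.get(user_id, frozenset())

def retrieve(query: str, index: list[Chunk], user_id: str, k: int = 5) -> list[Chunk]:
    groups = user_groups(user_id)
    # Enforce permissions before ranking and LLM synthesis, never after.
    visible = [c for c in index if c.allowed_groups & groups]
    # ...rank `visible` by embedding similarity to `query` here (omitted)...
    return visible[:k]

index = [
    Chunk("handbook", "PTO policy ...", frozenset({"all-hands"})),
    Chunk("comp-plan", "Executive compensation ...", frozenset({"hr"})),
]
print([c.doc_id for c in retrieve("compensation", index, "bob")])  # -> ['handbook'] only
```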

Tom: Yeah.

Rama: No, I can attest to that from a CIO perspective, as somebody who hit similar kinds of issues: last year we were ready to go deploy our first chatbot across the enterprise, and so with all the bruises and learnings, I can attest to that. One of the-

Tom: Tell me about the bruises, what happened?

Rama: Well, one of the biggest learnings from this is that these tools are very powerful, and suddenly a lot of data that was hidden in the enterprise behind very poor enterprise search is now visible. You’re not only able to see the documents, you’re able to summarize across all of those documents and get information right up front. So what that does is it exposes all the debt and all the overly shared sensitive data problems in the enterprise, which every enterprise has. It’s just that until ChatGPT came along, until these powerful tools came along, it was all security by obscurity. It was all obscure. There’s a veil and you think everything is fine, but now you have these powerful tools exposing all this sensitive data that should not have been overly shared. So what you now have to do before you deploy this very powerful tool is really understand the derivative risks that these tools bring and embark upon enterprise content security solutions.

Which means you may have to use the LLMs themselves to classify your enterprise documents into their sensitivity types and automatically suppress the documents that you think are sensitive from ingestion into these RAG pipelines. Once they go into RAG pipelines, they’re out of your control, in that now anyone is able to ask all kinds of questions and everybody can see that data. With this powerful tool comes the responsibility of going and fixing all of your overly shared sensitive content in the enterprise before you can deploy it at large scale. On the other hand, if you’re doing it in a very small, contained domain, for example within a particular set of data sources that is only accessible to certain types of people, then you can roll some of those out at a fast pace. But when it comes to doing it across the whole enterprise, you have to be very careful.
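
A minimal sketch of the pre-ingestion gate Rama describes: an LLM labels each document’s sensitivity, and anything above the allowed level never reaches the RAG index. The labels, the stubbed classifier, and the ingest helper are illustrative assumptions, not a specific NVIDIA tool.

```python
LABELS = ("public", "internal", "confidential", "restricted")

def classify_sensitivity(text: str) -> str:
    """Hypothetical LLM call: prompt the model to pick exactly one label.
    Stubbed with a keyword heuristic so the sketch runs offline."""
    # prompt = f"Classify this document as one of {LABELS}. Answer with the label only.\n\n{text[:4000]}"
    # label = my_llm(prompt)   # replace with a call to your own endpoint
    label = "confidential" if "valuation" in text.lower() else "internal"
    return label if label in LABELS else "restricted"  # fail closed on anything unexpected

def ingest(docs: dict[str, str], max_level: str = "internal") -> list[str]:
    """Only documents at or below max_level ever reach the vector index."""
    allowed = LABELS[: LABELS.index(max_level) + 1]
    indexed = []
    for doc_id, text in docs.items():
        if classify_sensitivity(text) in allowed:
            indexed.append(doc_id)  # embed and upsert into the RAG store here
    return indexed

print(ingest({"wiki-onboarding": "Team onboarding guide ...",
              "ma-memo": "Target valuation and deal terms ..."}))  # -> ['wiki-onboarding']
```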

Christian: I’ve seen a lot of that. They reduce the scope and start with a boundary that they know and control, and then expand over time. But a lot of people last year started with the bigger scope, and that’s where this came from.

Tom: I actually want to go into that topic more. You brought up the problems with RAG, retrieval-augmented generation, when you start using it to supplement what often are smaller models. I mean, that’s one of the promises of the small model plus RAG: you can get fairly high-quality output and not need to run an enormous AI model, paying all the bills and API costs in the cloud. Putting aside some of the issues that you mentioned with RAG and data security, are we seeing more enterprises opt toward that paradigm of a smaller model using RAG, or any other augmentation to a small model, to be able to perform tasks effectively but not cost the same amount that a large Anthropic or OpenAI model would charge?

Rama: Yeah, absolutely. We’ve done some experiments. Initially you start with RAG and the biggest model possible, because as long as you get your retrieval right and the right, relevant chunks of data are retrieved, you pass them to a very strong LLM and it’ll do its job. But then, if you are doing it across many different use cases in the enterprise, these bigger models may not be cost-effective. So you want to find out whether smaller AI models that can be fine-tuned for these specific domains would do a better job. And in our experiments, for some of the domains, I won’t generalize it across all, but in the domains that we have tried, such as code complexity classification, we’ve seen that a smaller model that is fine-tuned, even with synthetic data, outperforms a bigger model. Not only does it beat the bigger model that’s not fine-tuned for that task; a fine-tuned smaller model like Llama 3.1 8B does far better than Llama 3.1 in the 70B class.

So that is very encouraging, because what it tells you is that you can now start to leverage these smaller models for very special-purpose sub-domains. You can break up the problem and probably be more cost-effective, and also more latency-effective; both are equally important. In a real-world, real-time situation in a chatbot, if it’s taking 14 seconds, that’s really not a very good user experience. So if you can use smaller models that are lower in latency, that’s also better. We’re starting to see some encouraging trends there. Obviously we have to try it across more domains to see whether that generality holds or not.
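
The kind of side-by-side check Rama alludes to can be as simple as running the same labeled examples through both deployments and comparing accuracy and latency. In the sketch below, the endpoint, the deployment names, and the code-complexity task are assumptions for illustration; it presumes an OpenAI-compatible chat-completions server.

```python
import time
import requests

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # assumed OpenAI-compatible server

def classify(model: str, snippet: str) -> str:
    payload = {
        "model": model,
        "temperature": 0,
        "messages": [{"role": "user",
                      "content": f"Label the complexity of this code as LOW, MEDIUM, or HIGH:\n{snippet}"}],
    }
    resp = requests.post(ENDPOINT, json=payload, timeout=60)
    return resp.json()["choices"][0]["message"]["content"].strip().upper()

def evaluate(model: str, examples: list[tuple[str, str]]) -> tuple[float, float]:
    start, correct = time.perf_counter(), 0
    for snippet, label in examples:
        correct += classify(model, snippet) == label
    avg_latency = (time.perf_counter() - start) / len(examples)
    return correct / len(examples), avg_latency

examples = [("def add(a, b): return a + b", "LOW")]  # in practice, a held-out labeled set
for model in ("llama-3.1-8b-finetuned", "llama-3.1-70b-instruct"):  # hypothetical deployment names
    accuracy, latency = evaluate(model, examples)
    print(f"{model}: accuracy={accuracy:.2f}, avg latency={latency:.2f}s")
```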

Tom: And Chetan, I actually want to ask you this. One of the trends that I was interested in for a while was the ability for businesses to run a smaller model on premises, maybe even on device if possible, to not be reliant on a cloud-based large language model and still be able to do what you need to get done within the enterprise. Is that a real trend?

Chetan: It is, absolutely. A lot of this comes down to, as we were discussing, the cost of the inference, the latency, and obviously the context around the inference itself. The trend we’re seeing in the market right now, and it’s been exemplified several times over the last few months, is that if and when it makes the most sense for an LLM or foundation model to run locally, that’s what the foundation model providers are going to push for and that’s what the customers are going to request. So if there’s a text summarization capability on my phone, and Apple is going to roll out this capability with iOS 18.1 in a few weeks, it’s going to be beneficial for the end customer and also for Apple for that LLM to actually run locally, because we as consumers are paying for the device, we’re paying for the power, and Apple can provide that experience to us in a much more real-time, low-latency fashion.

So that bifurcation is already happening. You are seeing Meta do something similar with their new Llama 3 models, and Google is also doing something very similar with Gemini running locally on devices. So even on the enterprise side, it’s going to come down to what specific workload is top of mind and what the expectations are around cost and the associated infrastructure. At least on the cloud side, what we’re seeing is a tremendous amount of demand for hosting what I would call tip-of-the-spear or frontier models, the biggest, most capable models. Because again, if you just look at the simple math of how many parameters a particular AI model has and what data type they’re using, you very quickly get to a point where you need terabytes of GPU memory to host these models.

And you can only get that on a cloud infrastructure basis. So absolutely, that diversification in the market is already happening, where there will be a collection of models. We saw this, by the way, on the computer vision side, where the original computer vision models used to run in the cloud, but they were refined to a point where you can actually run them on smart devices like Ring and others. Amazon did something very similar with Alexa, where after the initial wake word detection the audio used to be streamed to the cloud; they moved the audio-to-text conversion onto the device itself, which saved them a bunch of money on network traffic and also improved the customer response time.
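
Chetan’s “simple math” is easy to reproduce as a back-of-the-envelope calculation. The sketch below counts weight memory only, at FP16, on an assumed 80 GB GPU; the KV cache, activations, and serving overhead push the real footprint well beyond this.

```python
def weight_footprint(params_billion: float, bytes_per_param: int = 2, gpu_mem_gb: int = 80):
    """Rough lower bound: weights only, no KV cache, activations, or runtime overhead."""
    weights_gb = params_billion * bytes_per_param  # 1B params at 2 bytes each is about 2 GB
    min_gpus = -(-weights_gb // gpu_mem_gb)        # ceiling division
    return weights_gb, int(min_gpus)

for size in (8, 70, 405):  # Llama-class sizes, FP16 weights, 80 GB GPUs assumed
    gb, gpus = weight_footprint(size)
    print(f"{size:>4}B params -> ~{gb:,.0f} GB of weights -> at least {gpus} x 80 GB GPUs")
```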

Tom: Got it. Let’s talk about reasoning models for a bit here, and maybe this is more your category, Chetan, than the others, but please feel free to chime in. It seems to me like these things are fairly compute-intensive if with every question they are running through multiple possibilities; these chain-of-thought processes seem like a lot of work. I know it’s a lot of work for me to think, so I can only imagine for an AI. Is this going to result in a huge new requirement for even more infrastructure to be built, even more GPU data centers being propagated across the Midwest, next to extra oil wells?

Chetan: Yeah. So to draw that trend line: LLMs have gone from fitting really well inside a single GPU to requiring eight GPUs in a server to host them. Now we’re getting to a point where the amount of HBM memory that’s available in a particular server type is actually not sufficient to host some of these higher-end models, let alone the ones that are empowered or enabled to do chain-of-thought or reasoning. So, absolutely, what we are seeing right now is that the compute infrastructure to host these reasoning models is starting to look very, very similar to the infrastructure you need for training these AI models, where a single node is not enough.

You actually need multiple nodes connected with a high-bandwidth network fabric, so that you have low-latency communication between them and can meet the latency and throughput expectations that customers have. So for sure it is, again, starting to push the boundaries of what is required, and the platform from NVIDIA that’s coming out next year, GB200, is going to be a great platform to host these models. Then we should expect other providers to follow suit with similar types of infrastructure that are optimized not only, again, for large-scale distributed training with hundreds and thousands of chips, but also for hosting the reasoning class of AI models.
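
As a concrete picture of what serving across that fabric looks like, here is a sketch using vLLM-style parallelism settings: tensor parallelism splits each layer across the GPUs in a node, and pipeline parallelism spreads layers across nodes. The model id and the 8x2 split are placeholders; check your own serving stack’s flags.

```python
from vllm import LLM, SamplingParams  # assumes a vLLM-style serving stack is installed

# Shard weights across 8 GPUs per node (tensor parallel) and across 2 nodes
# (pipeline parallel). The interconnect between them is the low-latency fabric
# Chetan refers to; without it, cross-GPU communication dominates latency.
llm = LLM(
    model="some-org/very-large-reasoning-model",  # placeholder model id
    tensor_parallel_size=8,
    pipeline_parallel_size=2,
    dtype="bfloat16",
)

outputs = llm.generate(
    ["Work through the problem step by step before giving a final answer: ..."],
    SamplingParams(max_tokens=1024, temperature=0.2),
)
print(outputs[0].outputs[0].text)
```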

Tom: And Christian, are you yet seeing value from reasoning models for customer data, when it comes to, again, whatever it is that customers decide they need this technology for? I mean, is there something reasoning models can do that typical transformer models weren’t able to do effectively?

Christian: I think it’s early on, but the desire to explore some of the advanced use cases is totally there. Back to the use case of having my own personal analyst: in the same way that Brad was talking earlier about wanting a personal assistant, the vision in the enterprise world is, “Hey, I have my own analyst and they can answer questions.” And if you think of the engagement with one such analyst, usually it’s a more complex set of questions and interactions. I’ve talked to a couple of organizations that are on the forefront, trying to handle more complex questions: how do you unpack the question, create a workflow, and then not only answer the different parts of the workflow but, in many instances, start to take action. So I would say it’s early on, but you see people starting to push the boundary of what’s possible.

Tom: Yeah, and I know none of you guys are individually in the business of building these AI models, but if you had to look maybe two to three years down the line, would you say the predominant large language model will be a reasoning model, or are these dual tracks, with the transformer models on one and reasoning on another? Will there be a merging of the two? Where is this road going with what seem to be different paradigms?

Christian: I’ll say we have built some small models at Snowflake.

Tom: Okay. That’s right. You guys have an open-source model. A lot of ex-Microsoft people are part of your team now. And, of course, NVIDIA has done quite a bit.

Christian: Correct. But I don’t know where the architectures converge. There are state-space models and all sorts of interesting different architectures. What I do think is that the notion of task-specific models, models that are focused on one specific use case, is a lot more applicable in the enterprise. In the consumer world it’s harder to say, “Hey, here are 10 different flavors of ChatGPT. Which one do you want?” No. But in the enterprise, because most of the value delivery is happening through applications, it’s easier to say, “Here’s a model, it’s tuned for this specific use case.” Oftentimes it’s a combination of models. So I do think that you’ll see combinations of models, oftentimes smaller models, and within those, combinations of architectures.

Rama: I think one of the things is that, as a deep learning community, we tend to only focus on these bigger AI models that we are training, which are getting better and better at exhibiting the properties of reasoning and of understanding natural language at an in-depth level. But I think it’s important to keep in mind that in parallel, mostly in the academic and startup world, there’s a lot of work also happening in neuro-symbolic AI, where they’re trying to bring symbolic reasoning from the more classical AI field to work with deep learning, which we don’t fully understand how it reasons; it is starting to show the properties of reasoning, but not quite. So if you capture the neuro-symbolic aspects of how the world works, that there is gravity, there is air, that there are some basic tenets we are all wired to know when we operate and navigate the world, how do you teach that to these models? I think it’s something to keep an eye out for as that work starts to merge with the deep learning work to take reasoning to the next level.

Chetan: What I would say, Tom, is that there are different attributes along which foundation models have evolved over time. When GPT-2 and GPT-3 came out, they were mostly text-based models, and then you very quickly saw multimodal models come to the forefront. Reasoning is going to be just one additional attribute of these AI models, where again, different models are going to be good at different aspects of these capabilities, and there are going to be additional capabilities added over time.

Tom: While I have you up here, a slight divergence. How much more infrastructure do we really need when it comes to AI? I feel like every week I’m seeing more. Put aside the billions, tens of billions, up to hundreds of billions of dollars that the tech giants are investing from their CapEx in building out their cloud infrastructure; we have BlackRock trying to raise $100 billion to build out infrastructure; Sam Altman, I believe, wants, is it seven trillion now? I haven’t checked the latest, or he’s actually backed off that number. In any event, when is it going to be enough? How much longer will this build-out go on to meet the requirements of this demand?

Chetan: So I think scaling laws are still holding, right? We have seen this with GPT-4. And at least our personal take is that at some point one particular aspect of a particular model type will actually be good enough, and you might not be stressing or pushing the boundaries of that particular attribute significantly, generation over generation. But there’ll be other aspects where you will want the AI model to be capable of doing more, right? So again, I talked about text capabilities and multimodality, with reasoning being another one; there are going to be aspects of different model types that will actually benefit from more data, more training, and more refinement on the model architecture side. And we still believe it’s the early stages of evolving that tech.

Tom: Crazy. One last question, then I’ll turn it to the audience. Agents: earlier this year that was the keyword a lot of tech companies were pushing as the new iteration, or not necessarily the new iteration, but the most appealing way to manifest this technology. Where do you guys stand right now on the capabilities of agents? Are they good enough? Is it something that you see enterprises adopting at scale, or are we still maybe one or two generations away from these things being able to do what they’re actually promising they can do?

Rama: I can speak from the perspective of solving real-world problems with LLMs, and AI models are clearly not enough. They’re just the beginning. They’re a tool for solving a real-world problem, and that always has been the case. You have to have proper security, you have to have proper modularization, you have to have task and workflow automation. There are so many things that have to come together to solve a problem for any given use case and user problem. So agents are really a way to orchestrate and bring together all of these aspects that need to come together to solve a problem.

So in that sense, to me, agents are really a good software engineering architectural principle. And yes, you have to write software that way: you have to be able to, in a modular way, connect to task automation after you understand the intent, break down the problem, do some planning, and probably orchestrate, but eventually you’re trying to solve a problem for the user. Agents are a mechanism, a software pattern, for really orchestrating all the different parts that you can bring together to solve a problem. So that will become a necessary ingredient in building and solving real-world problems with LLMs, for sure.
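
Read as a software pattern, Rama’s description maps to a small orchestration loop: understand the intent, plan, route each step to a registered tool, and synthesize an answer, with the LLM as one component among several. The tools and the hard-coded plan below are illustrative stand-ins for LLM-driven planning.

```python
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "search_tickets": lambda q: f"3 open tickets match '{q}'",
    "summarize": lambda text: f"summary of: {text[:40]}...",
}

def plan(request: str) -> list[tuple[str, str]]:
    """Hypothetical planning step. In practice an LLM decomposes the request
    into tool calls; here the plan is hard-coded so the sketch runs offline."""
    return [("search_tickets", request), ("summarize", request)]

def run_agent(request: str) -> str:
    # In a real system, security and permission checks wrap every tool call.
    results = []
    for tool_name, arg in plan(request):
        results.append(TOOLS[tool_name](arg))  # modular task automation
    return " | ".join(results)  # final synthesis is often another LLM call

print(run_agent("outstanding GPU driver incidents this week"))
```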

Christian: Super aligned with that. Back to the analogy of your assistant or your analyst: you need it to be able to take action, and that’s where agents fit in. And I think that’s when the even bigger disruptive power of the technology kicks in. But aligned with what Rama said and what I mentioned earlier, the high-order bit right now is things like security and privacy, and actions raise the stakes even more. So the interest is there. There are many enterprises starting to play with it, but I don’t think that it’s mainstream yet.

Tom: Yeah, play with it, I guess, is the key phrase there. I think we have time for a couple of questions. I know we’re a little over, but I talk too much. Does anyone out there have something they’d like to ask? There’s one there.

Question: You talked about the reasoning models requiring different types of infrastructure, meaning more like the training infrastructure. So do you think the Groq-like infrastructure that’s been developed for inference may not work as well for reasoning models?

Chetan: Yeah, I can take that. So it will come down to the type and the size of the model. There are some specialized solutions out there from a silicon standpoint that, instead of having tons of HBM with a lot of bandwidth between the compute and the memory, leverage techniques such as having a lot of SRAM on the actual chip itself. That makes it super, super effective and high throughput to host these models in the compute architecture itself. So there’s going to be a place and an optimization for those types of workloads, and you’ll see that some of those applications tend to stand out. If you look at some of the benchmarking for certain model types on silicon that is using SRAM instead of HBM, you’ll see a ginormous delta in the throughput you’re able to get on these bespoke architectures.

But the challenge with that approach is, again, that it is going to be bound to a certain class and category of models, and generally speaking, based on how quickly the landscape is evolving in terms of model architectures and sizes, the general-purpose AI models will still have a preference to run on HBM-based architectures that are generalizable, easy to program, and able to evolve. But for applications that do hit scale, where there are, again, millions of customers hitting a particular service endpoint, there’s definitely headroom for optimization similar to what Groq and others are working on in the industry.

Tom: Right. Maybe one more. All right, I’ll end with maybe one last question for you all. If you were talking to any enterprise client out there, and I know you touch enterprises in different ways, would you say right now that AI models as they exist are an essential component of running your operations, that they should be a critical part of your IT budget?

Rama: Yeah, I would say they’re necessary but not sufficient.

Chetan: Plus one. Yeah.

Christian: And same here. I would tell customers, you should figure out how to improve productivity and improve business outcomes based on models. And I agree that AI models are an important part, but they’re only part of the solution.

Tom: Great. All right. Thank you, guys.

Related Insights

    The Rise of AI Agent Infrastructure
    IA Summit 2024: CIO Enterprise AI Strategy
    Highlights From the 2024 IA Summit
