Retrieval-Augmented Generation (RAG) and in-context learning have been exciting developments in AI since about 2020. These techniques promised to revolutionize how enterprises and app developers leverage customer data, letting them tap into powerful models without retraining or fine-tuning. By simply “feeding” the model relevant data in the prompt, companies could instantly apply AI to their own data, which makes it faster and easier for a customer to get started. Today, enterprises, app developers, and startups are heavily focused on the RAG pattern.
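To make the pattern concrete, here is a minimal sketch of the RAG loop in Python. The documents, prompts, and model names (`text-embedding-3-small`, `gpt-4o-mini`) are illustrative assumptions rather than any particular vendor's architecture, and a production system would replace the in-memory array with a vector database.

```python
# Minimal RAG loop: embed documents once, retrieve by similarity at query
# time, and pass the top matches to the model inside the prompt.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Enterprise invoices are due net-60 from the billing date.",
    "Support tickets are triaged within four business hours.",
]

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vectors = embed(documents)  # in production, store these in a vector database

def answer(question: str, top_k: int = 2) -> str:
    query_vec = embed([question])[0]
    # Dot product equals cosine similarity here because these embeddings
    # are returned unit-normalized.
    scores = doc_vectors @ query_vec
    context = "\n".join(documents[i] for i in np.argsort(scores)[::-1][:top_k])
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

print(answer("When are enterprise invoices due?"))
```

The key point is that the model itself never changes; all customer-specific knowledge arrives through the prompt at request time.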
But what got us started won’t get us where we need to go.
While technologists have long understood RAG’s limitations, many enterprise and app developer CTOs have embraced it as potentially the entirety of their AI+data architecture. This is understandable: RAG offers an effective path to applying AI to enterprise data, and it’s a logical place to start. But 2025 will reveal its limits. AI’s ability to reason about a customer’s data is only as good as the data the models were originally trained on, and here’s the catch: if your data doesn’t resemble the training set, even the most advanced off-the-shelf models fall short. The gap grows wider still as companies grapple with more diverse and rapidly changing data and seek cost-effective, smaller models that sacrifice generality for speed and efficiency.
To be sure, RAG will remain essential for several critical reasons. It’s the best technique for handling new and rapidly changing data, as no amount of training can anticipate tomorrow’s documents. It provides more reliable grounding of model outputs in source documents than attempting to encode all knowledge in model weights. And perhaps most importantly, enterprise security and privacy requirements place hard limits on what data can be used for training – models cannot easily forget what they’ve learned (as evidenced by various model jailbreaks), making RAG the safer choice for handling sensitive enterprise data.
To truly unlock AI’s potential, businesses will need to build on RAG while integrating a broader spectrum of approaches: pre-training builds the foundation on broad datasets, mid-training introduces specialized data during base model development, post-training applies techniques like reinforcement learning, fine-tuning adapts models for specific domains, and test-time compute enhances reasoning capabilities with longer inference cycles. Each approach offers different tradeoffs between generalization, specialization, resource requirements, and processing time.
Of course, these approaches aren’t new — training and tuning have been powerful techniques since 2018. RAG provided a simpler starting point, helping organizations build initial AI architectures. Now that these foundations are in place, enterprises and app developers are ready for more sophisticated approaches. Leaders like Unstructured.io are making this possible by transforming complex enterprise documents into high-quality data these systems can understand.
The shift is already underway. Mastercard is fine-tuning models to understand its financial data schemas. Glean and Read AI are building custom models for each customer organization. Even Contextual AI, co-founded by one of RAG’s creators, is extending the architecture with so-called specialized RAG agents. Some customers are moving beyond testing all the way to training: Ello built a world-class child speech perception model by creating a data flywheel around its app, with 60% of its users opting in to sharing data to improve the AI.
For founders, here’s the good news:
First, as compute costs continue to fall and tools like OpenAI’s Reinforcement Fine-Tuning democratize advanced training techniques, sophisticated AI architectures are becoming accessible to a broader set of practitioners. The success of companies like Glean, Ello, and Read AI shows that startups of many sizes can effectively train and tune their own models, especially when focused on specific domains, and then deploy those models as part of a RAG architecture, as sketched below.
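As a concrete illustration, here is a minimal sketch of launching a managed fine-tuning job through the OpenAI Python SDK. Note that this shows plain supervised fine-tuning rather than the Reinforcement Fine-Tuning product mentioned above, and the file name and model snapshot are assumptions for illustration.

```python
# Sketch: upload chat-formatted examples and launch a supervised
# fine-tuning job. "training.jsonl" is a hypothetical file with one JSON
# object per line, e.g.:
# {"messages": [{"role": "user", "content": "..."},
#               {"role": "assistant", "content": "..."}]}
from openai import OpenAI

client = OpenAI()

train_file = client.files.create(
    file=open("training.jsonl", "rb"),
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    training_file=train_file.id,
    model="gpt-4o-mini-2024-07-18",  # a snapshot that supports fine-tuning
)

# The completed job exposes the new model id
# (something like "ft:gpt-4o-mini-2024-07-18:org::abc123").
print(client.fine_tuning.jobs.retrieve(job.id).status)
```

The resulting `ft:...` model id can then replace the base model in the RAG sketch above, so retrieval supplies fresh facts while the tuned weights supply domain fluency.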
Second, advances in test-time compute create a powerful flywheel effect. These techniques enhance model reasoning by spending more compute at inference when deeper analysis is needed. That makes the returns from specialized training and domain optimization even more valuable: enhanced reasoning means a better understanding of domain-specific data and contexts. As inference costs keep falling, this virtuous cycle becomes increasingly practical for production deployments.
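One simple, widely used example of test-time compute is self-consistency: sample several reasoning paths at nonzero temperature and take a majority vote over the final answers. The sketch below assumes an OpenAI-style chat API and an illustrative model name; the answer-extraction convention is a toy one.

```python
# Self-consistency: draw n samples at nonzero temperature and majority-vote
# over the extracted final answers.
from collections import Counter
from openai import OpenAI

client = OpenAI()

def self_consistent_answer(question: str, n: int = 5) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        n=n,              # n independent completions in one call
        temperature=0.8,  # diversity across reasoning paths
        messages=[
            {"role": "system",
             "content": "Reason step by step, then end with 'Answer: <value>'."},
            {"role": "user", "content": question},
        ],
    )
    # Toy extraction: take whatever follows the last "Answer:" marker.
    finals = [c.message.content.rsplit("Answer:", 1)[-1].strip()
              for c in resp.choices]
    return Counter(finals).most_common(1)[0][0]

print(self_consistent_answer(
    "A train travels 120 km in 1.5 hours. What is its average speed in km/h?"))
```

Spending n samples instead of one trades latency and cost for accuracy, which is exactly the dial that falling compute prices make cheaper to turn.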
Third, the shift toward open-source and (often) smaller models creates its own reinforcing cycle. Open-source models such as DeepSeek make it possible for companies to train and tune on their own customer data and apply their own domain expertise. Many customers choose to deploy smaller ~7B-parameter models in production for performance and cost reasons. For those smaller models, more of a customer’s data naturally falls “out of domain,” since smaller models simply can’t maintain the broad knowledge of their larger counterparts. This increases the returns from fine-tuning and specialization, making domain-specific optimization even more valuable.
The convergence of these trends means no single approach will dominate. Instead, we’re entering an era where RAG becomes one tool in a broader toolkit that combines specialized training, sophisticated retrieval, and test-time compute optimization. The companies that enable and capitalize on this shift, while deeply understanding how these approaches work together, will be the ones that best apply AI to their customers’ data, helping enterprises and app developers deliver for their customers and make the future happen faster.
Madrona is actively investing in AI+data architecture, from infrastructure to applications. We have backed multiple companies in this area and will continue to do so. We would love to meet you if you’re building in this space. You can reach out to us directly at: [email protected]