The world runs on documents. From research reports and memos to quarterly filings and plans of action, documents are the unit of information that organizations depend on. Yet over 80% of enterprise data is trapped within those documents. Organizations have long struggled to unlock this unstructured data, leading to information silos, inefficient decision-making, and repetitive work.
With the advent of large language models (LLMs), we believe enterprises can finally realize the untapped potential of document data. That’s why, today, we are thrilled to announce that Madrona is leading Unstructured.io’s Series A round alongside our friends at Bain Capital Ventures, Mango Capital, M12, Shield Capital, MongoDB, Harrison Chase (CEO of LangChain), Bob van Luijt (CEO of Weaviate), and others.
Applications like ChatGPT have demonstrated that LLMs are capable of human-like reasoning. However, that reasoning is limited to the facts on which the models were trained. That limitation poses a challenge for users who want to use LLMs to make decisions based on data specific to them. Much of this valuable data is trapped behind firewalls, inside private repositories, and in file types LLMs cannot access. Until today!
Unstructured.io is building an enterprise-grade extract, transform, load (ETL) pipeline that brings unstructured data, most of it trapped within documents, to LLMs. First, with enterprise-grade data connectors, Unstructured enables organizations to safely and securely “extract” data from systems of record, including local file systems, object stores, and data lakes. Second, Unstructured allows developers to “transform” document data into an AI-friendly form using AI-powered open-source building blocks. The company today also released a single API that handles the transformation of over 20 file types. Third, developers can “load” data into a growing number of vector databases or directly into the context window so that LLMs can use it. With Unstructured, organizations can harness the power of their data like never before, empowering decision-makers and fueling innovation.
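To make the three stages concrete, here is a minimal sketch of an extract, transform, load flow in plain Python. It is an illustration under stated assumptions, not Unstructured's API: the names `Element`, `extract`, `transform`, `load`, and `run_pipeline` are hypothetical, the only "connector" is a local folder of `.txt` files, the labeling rule is a toy heuristic, and a Python list stands in for a vector database.

```python
from dataclasses import dataclass
from pathlib import Path
import tempfile


@dataclass
class Element:
    """One labeled piece of a document (hypothetical structure)."""
    source: str  # file the text came from
    kind: str    # illustrative label: "Title" or "NarrativeText"
    text: str


def extract(root: Path) -> list[tuple[str, str]]:
    """Extract: read raw text from every .txt file under root."""
    return [
        (path.name, path.read_text(encoding="utf-8"))
        for path in sorted(root.rglob("*.txt"))
    ]


def transform(name: str, raw: str) -> list[Element]:
    """Transform: split on blank lines and label each block.

    Toy heuristic (an assumption): a short block with no closing
    punctuation is treated as a title, everything else as body text.
    """
    elements = []
    for block in raw.split("\n\n"):
        text = " ".join(block.split())  # collapse internal whitespace
        if not text:
            continue
        is_title = len(text) < 60 and not text.endswith((".", "!", "?"))
        elements.append(Element(name, "Title" if is_title else "NarrativeText", text))
    return elements


def load(elements: list[Element], store: list[Element]) -> None:
    """Load: append to an in-memory list standing in for a vector database."""
    store.extend(elements)


def run_pipeline(root: Path) -> list[Element]:
    """Run extract, transform, and load over every file under root."""
    store: list[Element] = []
    for name, raw in extract(root):
        load(transform(name, raw), store)
    return store


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "memo.txt"
        sample.write_text("Quarterly Plan\n\nRevenue grew this quarter.\n", encoding="utf-8")
        for element in run_pipeline(Path(tmp)):
            print(element.kind, "|", element.text)
```

In a real deployment each stage would be far richer: connectors for object stores and data lakes in place of the folder walk, model-based parsing of many file types in place of the blank-line split, and an embedding step before writing to a vector store. Keeping the stages as separate functions is what lets each one be swapped out independently.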
Developers are already using Unstructured in production to build a range of applications, from chat-your-docs to large-scale enterprise search systems and personalized content generation tools. Unstructured has been downloaded over 700,000 times since April and has become an integral part of over 2,400 GitHub repositories.
It is rare in our business to find a founder who combines clarity of vision, deep technical insight, and commercial instinct, but Brian is that rare breed of founder with all these qualities. We have been amazed by Brian’s ability to think from first principles and navigate the rapidly evolving Generative AI landscape while working closely with customers in large organizations, the public sector, and the developer community.
At Madrona, we believe LLMs are creating a generational platform shift powered by a rapidly developing Generative AI stack. We also recognize the froth and hyperbole that come with a new wave of technology, especially one like AI that holds so much potential. Unstructured addresses a critical need in the market: without a way to bring document data to LLMs, we will never unlock the true potential of AI. If the company executes well, it will help usher in a new era of intelligent, AI-first applications. We are committed to investing in companies that enable this technological wave and use AI to solve real problems for their customers.
We are thrilled to lead Unstructured’s Series A round, announce our partnership with Brian and the rest of the Unstructured team, and welcome our friends from Bain, Mango, Shield, M12, and others to join us on this journey.
Thank you, Brian, Crag, Matt, and team, for giving us this opportunity. The best is yet to come!