LanceDB Partners with Midjourney to Develop Advanced Databases for Multimodal AI Applications

Chang She, a former VP of Engineering at Tubi and an accomplished Cloudera alum, brings extensive experience in developing data tools and infrastructure. However, upon transitioning into the AI sector, She encountered significant challenges with traditional data infrastructure that hindered the deployment of AI models.

“Machine learning engineers and AI researchers often struggle with a subpar development experience,” She explained in an interview. “Data infrastructure companies don’t grasp the fundamental issues surrounding machine learning data.”

To address these challenges, Chang—one of the co-creators of Pandas, the influential Python data science library—partnered with software engineer Lei Xu to co-launch LanceDB.

LanceDB is building open-source database software for multimodal AI models, which can process and generate images and videos in addition to text. Backed by Y Combinator, LanceDB recently closed an $8 million seed funding round led by CRV, Essence VC, and Swift Ventures, bringing its total funding to $11 million.

“If multimodal AI is vital for your company's future success, it's essential for your AI team to concentrate on model development and aligning AI with business value,” She noted. “Tragically, many AI teams spend excessive time grappling with granular data infrastructure issues. LanceDB establishes a robust foundation for AI teams, allowing them to focus on what truly drives enterprise value and expedites AI product launches.”

At its core, LanceDB operates as a vector database, which organizes series of numbers—known as “vectors”—that represent the meaning of unstructured data such as images and text.
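To make the idea of vectors representing meaning concrete, here is a minimal sketch of a similarity search in plain Python. It does not use LanceDB or reflect its API; the item names, the three-dimensional vectors, and the helper functions are all hypothetical, and real embeddings typically have hundreds of dimensions or more.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings: items with similar meaning get similar vectors.
items = {
    "photo of a cat": [0.9, 0.1, 0.0],
    "article about kittens": [0.8, 0.3, 0.1],
    "video of a car race": [0.0, 0.2, 0.9],
}

def nearest(query, items, k=2):
    """Rank item names by similarity to the query vector and keep the top k."""
    ranked = sorted(
        items,
        key=lambda name: cosine_similarity(query, items[name]),
        reverse=True,
    )
    return ranked[:k]

# A query vector that points in roughly the same direction as the cat content.
query = [1.0, 0.1, 0.0]
results = nearest(query, items)
print(results)
```

A production vector database avoids comparing the query against every stored vector, usually by building an approximate index, but the ranking idea is the same.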

As highlighted by my colleague Paul Sawers, the rising prominence of vector databases coincides with the peak of the AI hype cycle. Their versatility is key for various AI applications, from enhancing content recommendations in e-commerce and social media to minimizing hallucinations in machine learning models.

The competition among vector database solutions is intense, with notable players such as Qdrant, Vespa, Weaviate, Pinecone, and Chroma leading the charge, alongside established tech giants. So, what sets LanceDB apart? Chang cites improved flexibility, performance, and scalability as distinguishing features.

Chang further explains that LanceDB, built on Apache Arrow, uses a custom data format known as Lance Format, which is tailored for multimodal AI training and analytics. The format lets LanceDB handle billions of vectors along with large volumes of text, images, and videos, and lets engineers manage the metadata associated with that data.

“Historically, there hasn't been a system capable of unifying training, exploration, search, and large-scale data processing,” Chang stated. “Lance Format provides AI researchers and engineers with a singular source of truth and allows for lightning-fast performance across their entire AI pipeline. It goes beyond mere vector storage.”

LanceDB generates revenue by selling fully managed versions of its open-source software with added features such as hardware acceleration and governance controls, and business appears to be growing. Its clients include text-to-image platform Midjourney, AI chatbot company Character.ai, autonomous vehicle startup WeRide, and Airtable.

Chang emphasized that recent venture capital backing wouldn’t detract from their commitment to the open-source initiative, which currently enjoys approximately 600,000 downloads per month.

“We aimed to create a solution that makes things ten times easier for AI teams dealing with large-scale multimodal data,” he said. “LanceDB will continue to provide a rich array of ecosystem integrations to ease adoption.”
