Rockset, a leading real-time database vendor, is enhancing its database capabilities with advanced vector search and improved scalability.
Built on the open-source RocksDB key-value store developed at Meta (formerly Facebook), Rockset relies on that storage engine for its real-time indexing features. The company has raised a total of $105 million in funding, including a $44 million round disclosed in August.
With the latest update, Rockset is making vector search generally available within its real-time database platform. The capability was first previewed in April and has been significantly improved over the past few months. Early adopters such as discount airline JetBlue have already reported successful implementations of Rockset's technology. Alongside the vector search update, Rockset is also integrating with the popular LangChain tool for AI orchestration and the LlamaIndex data framework.
“Our vector search capability is now generally available and highly sophisticated. You can create similarity indexes using approximate nearest neighbor (ANN) technology at massive scale, with real-time updates on vector embeddings and metadata,” said Venkat Venkataramani, co-founder and CEO of Rockset.
Rockset's Real-Time Indexing for Vector Search
Competition in the vector search market has intensified in 2023. Vectors, which are numerical representations of data, are crucial for powering large language models (LLMs). Numerous specialized vector databases, such as Pinecone and Milvus, have emerged, while established database vendors such as DataStax, MongoDB, and Neo4j have added vector capabilities of their own.
Rockset aims to stand out in the market by delivering real-time updates to vector search. As new data enters a Rockset database, both the database index and the vector embeddings are refreshed in real time, with latencies in the single-digit-millisecond range. This efficiency stems from a compute-compute separation model that isolates the resources used for index building from those used for query execution.
“With most vector databases, real-time updates aren’t possible; they require periodic index rebuilding,” Venkataramani explained.
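To make the contrast with periodic rebuilding concrete, here is a minimal Python sketch of an index that accepts single-vector upserts and makes them searchable on the next query. It is an illustration only, not Rockset's implementation: the `IncrementalIndex` class, its fixed centroids, and the `nprobe` parameter are all hypothetical, and it does not model the compute-compute separation described above.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class IncrementalIndex:
    """Toy partitioned index (hypothetical) that takes upserts without a rebuild."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.vectors = {}                       # doc id -> embedding
        self.partition_of = {}                  # doc id -> partition number
        self.members = [set() for _ in centroids]

    def upsert(self, doc_id, embedding):
        # Drop any previous placement so an updated embedding can change partition.
        if doc_id in self.partition_of:
            self.members[self.partition_of[doc_id]].discard(doc_id)
        p = min(range(len(self.centroids)),
                key=lambda c: l2(embedding, self.centroids[c]))
        self.vectors[doc_id] = embedding
        self.partition_of[doc_id] = p
        self.members[p].add(doc_id)

    def search(self, query, k, nprobe=1):
        # Look only at the nprobe partitions whose centroids are nearest the query.
        probe = sorted(range(len(self.centroids)),
                       key=lambda c: l2(query, self.centroids[c]))[:nprobe]
        candidates = [d for c in probe for d in self.members[c]]
        candidates.sort(key=lambda d: l2(self.vectors[d], query))
        return candidates[:k]

index = IncrementalIndex(centroids=[[0.0, 0.0], [1.0, 1.0]])
index.upsert("a", [0.1, 0.1])
index.upsert("b", [0.9, 0.8])
first = index.search([1.0, 1.0], k=1)    # "b" is visible right after the write
index.upsert("a", [0.95, 0.95])          # the update moves "a" to the other partition
second = index.search([1.0, 1.0], k=1)
print(first, second)
```

A production system would also have to keep partition quality from degrading as data drifts, which this sketch ignores.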
Accelerating ANN Vector Similarity Search
Vector search can be executed with several methods, including approximate nearest neighbor (ANN) search and exact k-nearest neighbor (KNN) search. ANN trades a small amount of accuracy for speed, while KNN computes the exact top matches by comparing the query against every candidate vector, which can be resource-intensive on large datasets.
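The trade-off can be sketched in a few lines of Python. The example below compares a brute-force exact search with a simple partition-based approximate search (an IVF-style scheme chosen here for illustration; the article does not say which ANN algorithm Rockset uses). All function names, the random data, and the `nprobe` parameter are assumptions made for the sketch.

```python
import math
import random

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def exact_knn(vectors, query, k):
    """Exact KNN: score every vector and return the ids of the k closest."""
    order = sorted(range(len(vectors)), key=lambda i: l2(vectors[i], query))
    return order[:k]

def build_partitions(vectors, centroids):
    """Assign each vector id to its nearest centroid (a one-pass, IVF-style index)."""
    parts = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        parts[nearest].append(i)
    return parts

def approx_knn(vectors, centroids, parts, query, k, nprobe=1):
    """ANN: search only the nprobe partitions whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)),
                   key=lambda c: l2(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in parts[c]]
    candidates.sort(key=lambda i: l2(vectors[i], query))
    return candidates[:k]

random.seed(0)
vectors = [[random.random() for _ in range(8)] for _ in range(2000)]
centroids = random.sample(vectors, 16)   # crude centroid choice, enough for a demo
parts = build_partitions(vectors, centroids)
query = [random.random() for _ in range(8)]

exact = exact_knn(vectors, query, 10)
approx = approx_knn(vectors, centroids, parts, query, 10, nprobe=2)
recall = len(set(exact) & set(approx)) / len(exact)
print(f"recall@10 with nprobe=2: {recall:.2f}")
```

Raising `nprobe` scans more partitions, which moves the result toward the exact answer at the cost of more distance computations; probing every partition reproduces the exact search.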
Rockset employs both KNN and ANN strategies based on the specific query and dataset context. The SQL interface allows users to combine vector searches with metadata filters, with Rockset’s optimizer automatically selecting the best method for speed.
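One common way such a choice can be made is by filter selectivity: if a metadata filter leaves only a small fraction of rows, an exact scan over the survivors is cheap, and otherwise an ANN index is the better bet. The Python sketch below illustrates that heuristic. It is an assumption made for illustration, not a description of Rockset's optimizer, and the `choose_strategy` function, its 5% threshold, and the sample rows are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def choose_strategy(matching_rows, total_rows, threshold=0.05):
    """Hypothetical rule: exact KNN when the filter keeps a small fraction of rows."""
    if total_rows == 0:
        return "knn"
    return "knn" if matching_rows / total_rows <= threshold else "ann"

def filtered_exact_search(rows, predicate, query, k):
    """Exact KNN restricted to rows that pass the metadata predicate."""
    survivors = [r for r in rows if predicate(r)]
    survivors.sort(key=lambda r: cosine(r["embedding"], query), reverse=True)
    return [r["id"] for r in survivors[:k]]

rows = [
    {"id": 1, "category": "flights", "embedding": [0.9, 0.1, 0.0]},
    {"id": 2, "category": "hotels",  "embedding": [0.8, 0.2, 0.1]},
    {"id": 3, "category": "flights", "embedding": [0.1, 0.9, 0.2]},
    {"id": 4, "category": "flights", "embedding": [0.7, 0.3, 0.1]},
]
query = [1.0, 0.0, 0.0]

def is_flight(row):
    return row["category"] == "flights"

matching = sum(1 for r in rows if is_flight(r))
strategy = choose_strategy(matching, len(rows))          # 3 of 4 rows match
top_two = filtered_exact_search(rows, is_flight, query, k=2)
print(strategy, top_two)
```

In a SQL engine the same decision would be made by the query planner from table statistics rather than by application code.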
Due to its real-time update capability, Rockset’s ANN indexes reflect the latest data within mere milliseconds.
The Endurance of Vector Databases
At OpenAI’s recent DevDay event, the company announced new services that have the potential to reshape the generative AI landscape. OpenAI's GPT Builder and Assistants API have sparked discussions about the future of vector database technologies.
Despite industry speculation, Venkataramani remains confident in the ongoing demand for vector databases. He argues that large organizations with high security and compliance requirements cannot solely depend on third-party services for their AI initiatives.
"The need for vector databases won’t diminish, especially for complex datasets that drive Retrieval Augmented Generation (RAG) use cases,” Venkataramani stated. He emphasized that as AI applications evolve, the underlying infrastructure—vector databases—will continue to play a crucial role.
“I believe vector databases are here to stay, supported by a variety of emerging use cases beyond just chatbots,” he concluded.