The rise of large language models (LLMs) has fueled interest in embedding models—deep learning systems that convert various data types into numerical representations.
Embedding models are essential for retrieval-augmented generation (RAG), a key application of LLMs in enterprise settings. However, their potential extends beyond RAG. The past year has witnessed significant advances in embedding applications, and 2024 is expected to bring even more innovations.
How Embeddings Work
Embeddings transform data—such as images or text documents—into lists of numbers that represent their most significant features. Trained on extensive datasets, embedding models learn to differentiate between various types of data.
In computer vision, embeddings may highlight features like objects, shapes, and colors. In text applications, they capture semantic information related to concepts, locations, people, organizations, and more.
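To make this concrete, the snippet below embeds a few sentences and compares them, using the open-source sentence-transformers library; the checkpoint and example texts are illustrative choices of ours, not anything referenced in this article.

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

# Illustrative small checkpoint, not a model named in this article.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The company reported record quarterly revenue.",
    "Quarterly earnings exceeded all expectations.",
    "The hiking trail closes at sunset.",
]

# Each sentence becomes a fixed-length vector of floats.
embeddings = model.encode(sentences)
print(embeddings.shape)  # e.g. (3, 384)

# Semantically related sentences get higher cosine similarity.
print(util.cos_sim(embeddings[0], embeddings[1]))  # relatively high
print(util.cos_sim(embeddings[0], embeddings[2]))  # relatively low
```

Related sentences land close together in the vector space, and that single property is what every application discussed below builds on.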
In RAG applications, embedding models encode the features of a company's documents, storing each document's embedding in a vector store, a specialized database built for comparing embeddings. When a new prompt arrives, the system computes its embedding and retrieves the documents whose embeddings are most similar. The content of those documents is then incorporated into the prompt, guiding the LLM to generate contextually informed responses.
This streamlined process customizes LLMs to provide insights based on proprietary information not included in their training data, addressing challenges like hallucinations, where an LLM fabricates plausible-sounding but inaccurate statements because it lacks the relevant information.
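As a minimal sketch of that retrieve-then-prompt loop, with an in-memory array standing in for a real vector store and illustrative documents of our own:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative choice

# Stand-ins for a company's documents; production systems keep these
# in a dedicated vector database rather than an in-memory matrix.
documents = [
    "Refunds are processed within 14 days of the return request.",
    "Enterprise plans include 24/7 phone support.",
    "All data is encrypted at rest with AES-256.",
]
doc_embeddings = model.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Embed the query and return the k most similar documents."""
    q = model.encode([query], normalize_embeddings=True)
    scores = doc_embeddings @ q[0]       # cosine similarity (normalized vectors)
    top = np.argsort(-scores)[:k]
    return [documents[i] for i in top]

query = "How long do refunds take?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# The prompt is then sent to the LLM, which answers grounded in the
# retrieved text instead of relying on its training data alone.
```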
Beyond Basic RAG
While RAG has significantly enhanced LLM functionality, the benefits of retrieval and embeddings extend far beyond simple document matching.
“Embeddings are primarily used for retrieval, and often for enhancing visualizations of concepts,” says Jerry Liu, CEO of LlamaIndex. “However, retrieval is much broader and can support various enterprise applications.”
According to Liu, retrieval is a fundamental component in any LLM use case. LlamaIndex is developing tools and frameworks to connect LLM prompts with diverse tasks, such as interfacing with SQL databases and automating workflows.
“Retrieval is crucial for enriching LLMs with pertinent context, and I expect most enterprise applications will require some form of retrieval,” Liu adds.
Embeddings also find utility in applications beyond document retrieval. Researchers at the University of Illinois and Tsinghua University have developed techniques that leverage embeddings to select the most relevant and diverse subsets of training data for coding LLMs, significantly reducing training costs while maintaining high quality.
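The paper's exact method isn't reproduced here; as a generic illustration of diversity-based selection over embeddings, a k-center-style greedy pass looks like this:

```python
import numpy as np

def greedy_diverse_subset(embeddings: np.ndarray, k: int) -> list[int]:
    """k-center-style greedy selection: repeatedly pick the point
    farthest from everything chosen so far, yielding a diverse subset.
    A generic illustration, not the specific algorithm from the paper."""
    selected = [0]  # seed with an arbitrary first point
    # Distance of every point to its nearest selected point.
    dists = np.linalg.norm(embeddings - embeddings[0], axis=1)
    while len(selected) < k:
        idx = int(np.argmax(dists))       # farthest from the current subset
        selected.append(idx)
        new_d = np.linalg.norm(embeddings - embeddings[idx], axis=1)
        dists = np.minimum(dists, new_d)  # refresh nearest-selected distances
    return selected

# Usage: embed the candidate training examples, then keep a diverse 10%.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 384))          # stand-in for real embeddings
subset = greedy_diverse_subset(X, k=100)
```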
Embeddings in Enterprise Applications
“Vector embeddings enable work with any unstructured or semi-structured data. Semantic search—and RAG is a form of semantic search—is just one application,” states Andre Zayarni, CEO of Qdrant. “Expanding beyond textual data to include images, audio, and video is crucial, and new multimodal transformers will facilitate this.”
Qdrant is already implementing embedding models across various applications, including anomaly detection, recommendation systems, and time-series analysis.
“With many untapped use cases, the number of applications is expected to rise as new embedding models emerge,” Zayarni notes.
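To illustrate the anomaly-detection case, one simple approach scores each item by how far its embedding sits from its nearest neighbors; this sketch uses synthetic vectors and is not Qdrant's implementation:

```python
import numpy as np

def anomaly_scores(embeddings: np.ndarray, k: int = 5) -> np.ndarray:
    """Score each item by its mean distance to its k nearest neighbors:
    points far from any cluster of normal data score high."""
    # Pairwise Euclidean distances (fine at small scale; a vector
    # database handles the nearest-neighbor search in production).
    d = np.linalg.norm(embeddings[:, None, :] - embeddings[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # ignore self-distance
    nearest = np.sort(d, axis=1)[:, :k]
    return nearest.mean(axis=1)

rng = np.random.default_rng(1)
normal = rng.normal(0, 1, size=(200, 64))    # stand-in embeddings
outlier = rng.normal(6, 1, size=(1, 64))     # one far-away item
scores = anomaly_scores(np.vstack([normal, outlier]))
print(int(np.argmax(scores)))                # -> 200, the outlier
```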
More enterprises are using embedding models to sift through vast amounts of unstructured data, categorizing customer feedback and social media posts to identify trends and sentiment shifts.
“Embeddings are ideal for enterprises aiming to analyze large datasets for trends and insights,” explains Nils Reimers, Embeddings Lead at Cohere.
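A minimal sketch of that kind of analysis, with illustrative feedback texts of our own: embed the comments, then cluster them so they group by meaning rather than by shared keywords.

```python
# pip install sentence-transformers scikit-learn
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative choice

feedback = [
    "The app crashes every time I upload a photo.",
    "Uploading images makes the app freeze.",
    "Love the new dark mode!",
    "Dark theme looks fantastic.",
    "Shipping took three weeks, far too long.",
    "My order arrived almost a month late.",
]

# Group feedback by meaning rather than by keyword overlap.
embeddings = model.encode(feedback)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(embeddings)

for label, text in sorted(zip(labels, feedback)):
    print(label, text)  # crash reports, praise, and shipping complaints separate
```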
Fine-Tuning Embeddings
In 2023, progress was made in fine-tuning LLMs on custom datasets; however, the process remains challenging, and few companies have the data and expertise needed to do it effectively.
“There will likely be a flow from RAG to fine-tuning—initially utilizing RAG for accessibility, and then optimizing through fine-tuning,” Liu anticipates. “Although more companies are expected to fine-tune their LLMs and embeddings as open-source models improve, the number will likely remain smaller than those utilizing RAG unless fine-tuning becomes significantly easier.”
Fine-tuning embeddings presents its own difficulties, including sensitivity to data shifts. Training on short queries may hinder performance on longer ones, and vice versa. If trained on “what” questions, embeddings may struggle with “why” questions.
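For teams that do attempt it, one common mitigation is to mix query styles in the training pairs. The sketch below uses the sentence-transformers fine-tuning API with an in-batch contrastive loss; the model, data, and hyperparameters are illustrative assumptions, not a recipe from anyone quoted here.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative base model

# Deliberately mix short/long and "what"/"why" queries so the tuned
# model does not drift toward one style (the pitfall described above).
pairs = [
    InputExample(texts=["refund window?",
                        "Refunds are processed within 14 days."]),
    InputExample(texts=["Why was my refund delayed past the stated window?",
                        "Refunds can take longer during peak season."]),
    InputExample(texts=["support hours?",
                        "Phone support is available 24/7."]),
    InputExample(texts=["Why does phone support ask for an account PIN?",
                        "The PIN verifies the caller owns the account."]),
]

loader = DataLoader(pairs, shuffle=True, batch_size=4)
# Contrastive loss: each query is pulled toward its own answer and
# pushed away from the other answers in the batch.
loss = losses.MultipleNegativesRankingLoss(model)
model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
```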
“Enterprises need robust in-house ML teams for effective embedding fine-tuning, making out-of-the-box solutions more practical in many cases,” Reimers advises.
Nevertheless, there have been strides in streamlining the training process for embedding models. A study by Microsoft suggests that pre-trained models, such as Mistral-7B, can be fine-tuned for embedding tasks using a compact dataset generated by a powerful LLM, simplifying traditional, resource-intensive methods.
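Extracting an embedding from a decoder-only model like Mistral-7B is typically done with last-token pooling: run the text through the network and keep the hidden state at the final non-padding position. The sketch below shows that step; the checkpoint name is our assumption about the model publicly released alongside that work, and the paper's instruction formatting is omitted.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Checkpoint name is an assumption: the model released alongside the
# Microsoft study is commonly distributed under this Hugging Face repo.
name = "intfloat/e5-mistral-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, torch_dtype=torch.float16)
model.eval()

batch = tokenizer(["how long do refunds take?"],
                  padding=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state       # (batch, seq_len, dim)

# Last-token pooling: the hidden state at the final non-padding
# position serves as the embedding of the whole input.
last = batch["attention_mask"].sum(dim=1) - 1
embedding = hidden[torch.arange(hidden.size(0)), last]
embedding = torch.nn.functional.normalize(embedding, dim=-1)
```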
Given the rapid advancements in LLMs and embedding models, we can expect even more exciting developments in the coming months.