San Francisco-based Datasaur, a startup specializing in text and audio labeling for AI projects, has launched LLM Lab, a platform designed to help teams build and train custom large language model applications similar to ChatGPT.
LLM Lab offers both cloud and on-premise deployment options, enabling enterprises to build internal generative AI applications while mitigating the business and data privacy risks often associated with third-party services. This gives teams greater control over their projects.
“We’ve created a tool that addresses common pain points, supports evolving best practices, and embodies our design philosophy to simplify the process,” said Ivan Lee, CEO and founder of Datasaur. “Drawing from our experience building custom models for internal use and clients, we developed a scalable, user-friendly LLM product.”
Key Features of Datasaur LLM Lab
Since its inception in 2019, Datasaur has been advancing a robust data annotation platform for AI and NLP. The launch of LLM Lab marks a significant evolution of these offerings.
“This tool goes beyond our traditional Natural Language Processing (NLP) focus, which includes methods like entity recognition and text classification,” Lee explained. “LLMs represent the next generation of language technology, and we aim to be the industry’s go-to solution for text, document, and audio AI applications.”
Currently, LLM Lab provides a unified interface for the main stages of LLM application development, including internal data ingestion, data preparation, retrieval-augmented generation (RAG), embedding model selection, and LLM response optimization. The product is designed around the principles of modularity, composability, simplicity, and maintainability.
“This approach efficiently manages different text embeddings, vector databases, and foundation models. The dynamic nature of the LLM space necessitates a technology-agnostic platform, allowing users to interchange technologies for optimal solutions,” Lee added.
To get started with LLM Lab, users select a foundation model and adjust its settings, such as temperature and maximum response length. Supported models include Meta’s Llama 2, Falcon from Abu Dhabi’s Technology Innovation Institute, and Anthropic’s Claude. Pinecone is supported as a vector database.
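For readers unfamiliar with these settings, the following is a minimal sketch of what such a configuration might look like in code. It is not Datasaur's API: the class name, the model identifiers, the default values, and the 0.0 to 2.0 temperature range are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical identifiers for the models named in the article;
# these are NOT LLM Lab's actual model names.
SUPPORTED_MODELS = {"llama-2", "falcon", "claude"}


@dataclass(frozen=True)
class GenerationSettings:
    """Illustrative container for the settings a user would adjust."""

    model: str
    temperature: float = 0.7          # assumed default; lower is more deterministic
    max_response_tokens: int = 512    # assumed default cap on response length

    def __post_init__(self):
        # Reject values that would not make sense for any provider.
        if self.model not in SUPPORTED_MODELS:
            raise ValueError(f"unsupported model: {self.model}")
        if not 0.0 <= self.temperature <= 2.0:  # assumed range
            raise ValueError("temperature must be between 0.0 and 2.0")
        if self.max_response_tokens <= 0:
            raise ValueError("max_response_tokens must be positive")


settings = GenerationSettings(model="llama-2", temperature=0.2, max_response_tokens=256)
```

A low temperature such as 0.2 tends to suit factual internal applications, while higher values trade consistency for variety.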
Next, users can select prompt templates to test their effectiveness and upload documents for RAG. After these configurations, they can finalize settings for quality and performance and deploy the application. Users can then rate prompt/completion pairs and feed those ratings back into model fine-tuning through reinforcement learning from human feedback (RLHF).
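The idea behind combining a prompt template with uploaded documents can be shown with a toy example. The sketch below is a simplified illustration, not how LLM Lab works internally: it ranks documents by word-count cosine similarity instead of a learned embedding model and a vector database, and the template text, function names, and sample documents are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical prompt template; a real application would test several variants.
PROMPT_TEMPLATE = (
    "Answer using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Fill the template with the retrieved context and the question."""
    context = "\n".join(retrieve(question, documents, k))
    return PROMPT_TEMPLATE.format(context=context, question=question)


# Made-up sample documents standing in for uploaded internal data.
docs = [
    "Refunds are issued within 14 days of a return request.",
    "Support is available by email on weekdays.",
]
prompt = build_prompt("How long do refunds take?", docs)
```

The resulting prompt, containing only the refund document as context, is what would be sent to the selected foundation model.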
Overcoming Technical Challenges
Although Lee did not disclose the number of companies currently testing LLM Lab, he reported positive feedback from early users.
Michell Handaka, founder and CEO of GLAIR.ai, a user of the platform, highlighted that the Lab facilitates better communication between engineering and non-engineering teams, effectively breaking down barriers to LLM application development.
Datasaur has already supported key industries, including finance, law, and healthcare, in transforming unstructured data into valuable machine learning datasets. Notable partnerships include Qualtrics, Ontra, Consensus, LegalTech, and Von Wobeser y Sierra.
“We are backing forward-thinking industry leaders and projecting a fivefold revenue increase in 2024,” Lee noted.
Future Developments for Datasaur and LLM Lab
In the upcoming year, Datasaur plans to enhance LLM Lab and invest further in enterprise-level LLM development. Users will be able to save successful configurations and share insights with colleagues. The Lab will also incorporate new and emerging foundation models.
Given the rising demand for custom, privacy-focused LLM applications, LLM Lab is poised to make a notable impact. According to the 2023 LLM Survey Report, nearly 62% of respondents are utilizing LLM applications such as ChatGPT and GitHub Copilot for functions like chatbots, customer support, and coding.
In light of growing privacy concerns, many companies are transitioning from general-purpose models to custom internal solutions that adhere to security, privacy, and regulatory standards.