Giskard: Pioneering Open Source Testing for Large Language Models
Giskard, a French startup, is developing an open-source testing framework for large language models (LLMs). The framework alerts developers to potential biases, security vulnerabilities, and a model's propensity to generate harmful or toxic content.
As excitement around AI models grows, so does the importance of systems for testing machine learning models. With the EU's AI Act set to take effect and similar regulations emerging in other countries, companies building AI products will have to prove compliance with established rules or face substantial fines.
Giskard is leaning into these regulatory requirements, offering one of the first developer tools focused on testing ML models efficiently. "During my time at Dataiku, focusing on NLP model integration, I noticed significant challenges when testing practical applications and comparing suppliers' performances," recalled Giskard co-founder and CEO Alex Combessie.
Three Key Components of Giskard's Framework
Giskard's testing framework is built around three main components. First, the startup has released an open-source Python library that integrates into LLM projects, and retrieval-augmented generation (RAG) projects in particular. The library is gaining traction on GitHub and works alongside other machine learning tools such as Hugging Face, MLflow, Weights & Biases, PyTorch, TensorFlow, and LangChain.
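To give a sense of the developer experience, here is a minimal sketch of wrapping an LLM app for Giskard, following the library's documented scan workflow; the `my_llm_app` function is an illustrative stand-in, and exact arguments may differ across versions:

```python
import pandas as pd
import giskard  # pip install "giskard[llm]"

def my_llm_app(question: str) -> str:
    # Stand-in for your real LLM call (e.g. an OpenAI chat completion).
    return "An answer grounded in the latest IPCC report."

# Giskard expects a prediction function that maps a DataFrame of
# inputs to one generated answer per row.
def predict(df: pd.DataFrame) -> list[str]:
    return [my_llm_app(q) for q in df["question"]]

# The name and description help Giskard generate targeted test inputs.
model = giskard.Model(
    model=predict,
    model_type="text_generation",
    name="Climate QA bot",
    description="Answers questions about climate change using IPCC reports.",
    feature_names=["question"],
)
```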
Once set up, Giskard generates a test suite covering a wide range of issues: model performance, hallucination, misinformation, bias, data leakage, harmful content generation, and prompt injection. "Performance is crucial for data scientists, but ethical considerations are gaining traction due to brand reputation and regulatory demands," Combessie explained.
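Running the scan and turning its findings into a reusable suite then takes a couple of calls; a sketch following the library's documented API (verify names against your installed version):

```python
# Probe the wrapped model for hallucination, harmful content,
# prompt injection, bias, and other issues.
report = giskard.scan(model)
report.to_html("scan_report.html")  # shareable summary of what was found

# Persist the findings as a test suite that can be re-run on demand.
suite = report.generate_test_suite("Climate QA checks")
suite.run()
```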
Developers can integrate these tests into their continuous integration and continuous delivery (CI/CD) pipeline so that they run every time the codebase is updated. When issues are detected, developers receive a detailed scan report in their GitHub repository.
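Concretely, the CI job can just execute a small test file on every push. This sketch assumes pytest, a hypothetical `my_app` module exposing the wrapped model from above, and that the suite result exposes a `passed` flag, as in recent library versions:

```python
# test_llm_quality.py -- executed by CI (e.g. `pytest`) on every push.
import giskard
from my_app import model  # hypothetical module exporting the giskard.Model wrapper

def test_llm_quality_suite():
    report = giskard.scan(model)
    results = report.generate_test_suite("CI checks").run()
    assert results.passed, "Giskard detected LLM regressions; see the scan report"
```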
The tests are tailored to each model's use case. Companies working on RAG, for instance, can give Giskard access to their vector databases and knowledge repositories to make the test suite more relevant. If you're building a chatbot that answers questions about climate change based on the latest IPCC report and an OpenAI LLM, Giskard's tests will check whether the model avoids misinformation and contradicting itself.
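A hedged sketch of that customization: feeding the scan a dataset of domain questions so its probes match the chatbot's use case (the question loader below is a hypothetical stand-in):

```python
import pandas as pd
import giskard

# Questions representative of your domain, e.g. drawn from the topics
# the IPCC report covers (this loader is a hypothetical stand-in).
def load_ipcc_questions() -> list[str]:
    return ["What drives sea-level rise?", "How fast is the Arctic warming?"]

dataset = giskard.Dataset(pd.DataFrame({"question": load_ipcc_questions()}), target=None)

# With a domain dataset, the generated probes -- including the
# misinformation and self-contradiction checks -- stay on-topic.
report = giskard.scan(model, dataset)
```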
AI Quality Hub and Real-Time Monitoring Tools
Giskard’s second offering is the AI Quality Hub, designed to assist in debugging LLMs while facilitating comparisons with other models. This premium feature is already attracting clients such as Banque de France and L’Oréal, helping them identify errors and streamline their compliance processes. "Our future vision is to integrate all regulatory features into the AI Quality Hub," Combessie added.
The third product, LLMon, is a real-time monitoring tool that evaluates LLM answers for common issues such as toxicity, hallucination, and factual errors before the response is delivered to the user. LLMon currently works with companies building on OpenAI's APIs and LLMs, while Giskard is actively pursuing integrations with Hugging Face and Anthropic.
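LLMon's interface isn't public, so the sketch below only illustrates the general gate pattern such a tool implements: screen each answer, block the flagged ones. Every name in it is hypothetical, not LLMon's actual API:

```python
from typing import Callable

def looks_toxic(answer: str) -> bool:
    return False  # stand-in for a real toxicity classifier

def contradicts_context(answer: str, context: str) -> bool:
    return False  # stand-in for a hallucination / fact-checking model

def deliver(question: str, context: str, call_llm: Callable[[str], str]) -> str:
    draft = call_llm(question)
    # Block flagged answers before the user ever sees them.
    if looks_toxic(draft) or contradicts_context(draft, context):
        return "Sorry, I can't provide a reliable answer to that."
    return draft
```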
Navigating AI Regulation and Market Opportunities
The regulation of AI models is still taking shape, with ongoing debate over whether the AI Act will apply to foundation models, like those from OpenAI and Anthropic, or only to specific applications. In this context, Giskard is well placed to guide developers on the potential misuses of LLMs, particularly ones augmented with external data sources.
With a dedicated team of 20, Giskard aims to expand significantly. "We have identified a clear market need for effective LLM testing, and we plan to double our team size to become the leading LLM security solution," stated Combessie.
This growth would put Giskard at the forefront of AI compliance and testing, helping developers navigate the complex landscape of emerging regulation while maintaining the integrity of their models.