To maximize the benefits of large language models (LLMs), enterprises must fine-tune them using domain-specific data. This process enhances the model's ability to generate relevant outputs.
However, fine-tuning pre-trained models introduces a critical challenge: adjusting weights for different data distributions can lead to “catastrophic forgetting,” where the model loses previously acquired knowledge. This degradation negatively impacts the LLM's performance and reasoning skills.
Voice AI company Tenyx has announced a fine-tuning solution designed to combat this issue. Their platform allows businesses to tailor LLMs to their specific needs without sacrificing foundational knowledge or safety measures.
"Catastrophic forgetting is a long-standing issue in the machine learning community," said Itamar Arel, CEO and founder of Tenyx. "Traditionally, it was assumed that models could continuously train on new data while retaining old information."
The Risks of Fine-Tuning
Arel highlights that fine-tuning is becoming increasingly vital for enterprise applications of LLMs. Yet, data scientists often lack complete access to the original training datasets, and traditional fine-tuning methods fail to mitigate the forgetting effect. This can result in the loss of essential capabilities and expose organizations to harmful or biased content.
For instance, using LLaMA 7B as a customer service chatbot—a common off-the-shelf application—requires fine-tuning it with typical customer interactions. Standard techniques, like Low-Rank Adaptation (LoRA), may inadvertently lead to a loss of valuable knowledge, such as accurately answering, "What’s the distance from the hotel to the airport?" or inferring context from statements like, "I’ll be arriving on Dec. 7 for four nights."
"The fine-tuned model may excel in specific tasks but could generate incorrect or biased responses regarding broader knowledge and reasoning," Arel noted.
Limitations of Low-Rank Adaptation
Although LoRA is popular for its computational efficiency, Arel explains it was not designed to address catastrophic forgetting. When fine-tuning shifts the data distribution away from the original, unpredictable distortions occur.
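For context, LoRA's efficiency comes from freezing the pretrained weights and training only a small low-rank additive update. In the notation of the original LoRA paper, the adapted weight matrix is

```latex
W' = W_0 + \Delta W = W_0 + B A,
\qquad B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

Only B and A are trained, which keeps the number of updated parameters small, but the resulting update still shifts the layer's outputs for every input, so nothing in the construction itself guards against overwriting what the base model learned.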
"Our findings indicate that despite LoRA's advantages, it carries the same risks of knowledge and reasoning loss," Arel stated. Model complexity also complicates the identification and rectification of these distortions. Furthermore, traditional fine-tuning methods can weaken safety protocols established via reinforcement learning from human feedback (RLHF), which are crucial in preventing biased outputs.
"RLHF is also a training process and is therefore impacted during fine-tuning," Arel emphasized.
Inefficiencies in Current Mitigation Strategies
Currently, enterprises attempt to manage catastrophic forgetting by relying on numerous machine learning engineers to limit fine-tuning and utilize prompt engineering for optimal outcomes. However, this approach is inconsistent, costly, and lacks a clear understanding of when and why it works. Additionally, evaluating knowledge and reasoning during fine-tuning typically requires manual review, which further complicates the process and is difficult to automate.
Tenyx’s Approach to Fine-Tuning
Tenyx's innovative fine-tuning method identifies which model parameters can be updated to learn from new data while preserving most prior input-output mappings. Their platform ensures that updates during fine-tuning do not disrupt the model's ability to process original data.
"By analyzing a trained LLM, our method determines the optimal weights to update, enabling new data learning while minimizing catastrophic forgetting," Arel explained. Tenyx’s approach employs a novel mathematical interpretation of the geometric representations formulated during initial LLM training, effectively retaining previously learned information while accommodating changes.
Crucially, Tenyx's method preserves RLHF protections and aligns with regulatory guidelines, including the White House Executive Order on Safe, Secure, and Trustworthy AI.
Results of Tenyx's Fine-Tuning Method
In a pilot study evaluating both popular enterprise and open-source fine-tuning algorithms, Tenyx demonstrated notable advantages in safety, proficiency, and knowledge retention:
- Safety: Tenyx's fine-tuning led to only an 11% reduction in safety, compared with drops of 66% for OpenAI, 94% for Together AI, and 91% for LoRA.
- Proficiency: While OpenAI's GPT-3.5 Turbo showed superior initial proficiency due to its larger parameter count, Tenyx's Llama-2 7B came out ahead after fine-tuning.
- Knowledge: Tenyx recorded only a 3% loss in catastrophic forgetting, compared to OpenAI’s 10%, Together AI’s 40%, and LoRA’s 43%.
"Catastrophic forgetting remains a recognized hurdle in deep learning, impacting even the most advanced models," noted Noah Goodman, associate professor at Stanford University. "As models fine-tune on new domain data, they typically enhance performance in that area but at the risk of altering established abilities."
Goodman added, "Tenyx has a strong research team exploring innovative solutions to tackle this complex challenge."