Researchers from Microsoft and Beihang University have developed a technique that significantly reduces the cost of fine-tuning large language models (LLMs).
Named MoRA, this novel parameter-efficient fine-tuning (PEFT) method addresses limitations commonly associated with existing techniques like low-rank adaptation (LoRA). MoRA is particularly advantageous for fine-tuning models on tasks that require them to assimilate new knowledge. As PEFT strategies gain traction in enterprise settings, MoRA represents a valuable tool for LLM application developers.
Understanding PEFT and LoRA
Traditional fine-tuning adjusts all of an LLM's parameters, which can be prohibitively expensive and time-consuming given that these models often contain billions of parameters. PEFT techniques reduce this cost by training only a small set of parameters needed to adapt the model to a specific task, leaving the rest of the model frozen.
LoRA has become a popular PEFT method because it expresses the weight update as the product of two small low-rank matrices rather than modifying the full weight matrix, reducing memory requirements and making fine-tuned models easy to store and deploy. However, LoRA tends to falter on more complex tasks, such as mathematical reasoning and continual pre-training, because its low-rank updates restrict the model's capacity to acquire and retain new information.
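To make the idea concrete, here is a minimal sketch of a LoRA-style layer in PyTorch. This is not the researchers' code; the class name and hyperparameters are illustrative. The pretrained weight stays frozen, and only the two small matrices A and B are trained.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper: the frozen base weight W is effectively
    updated as W + (alpha / r) * B @ A, where A and B are low-rank."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.requires_grad_(False)  # freeze the pretrained layer
        # Low-rank factors: A projects down to rank r, B projects back up.
        # B starts at zero so fine-tuning begins from the base model.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base output plus the low-rank update; only A and B receive gradients.
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)
```

Because A and B together hold far fewer values than the full weight matrix, only a tiny fraction of the model's parameters ever changes, which is where the memory and storage savings come from.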
According to the researchers, “this limitation restricts capacity to store new information via fine-tuning.”
Introducing MoRA
MoRA improves upon LoRA by replacing the pair of low-rank matrices with a single square matrix. The key idea behind MoRA is to use the same budget of trainable parameters to achieve the highest possible rank in the weight update.
Unlike a LoRA adapter, MoRA’s square matrix does not share the input and output dimensions of the original model’s layers, so it cannot be applied through straightforward matrix multiplication. To resolve this, the researchers devised compression and decompression functions that shrink the layer’s input to the square matrix’s dimension and expand the result back to the model’s dimension, allowing MoRA to be plugged into LLMs of various sizes. The square weight matrix gives MoRA a stronger ability to learn and memorize new knowledge than a LoRA adapter with the same number of parameters.
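Here is a minimal sketch of that idea, assuming square layers and one simple reshape-based compression scheme; the paper explores several compression/decompression choices, so treat this as an illustration rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class MoRAStyleLinear(nn.Module):
    """Illustrative MoRA-style update: a single trainable r x r square
    matrix M is applied block-wise to the input, producing a high-rank
    (block-diagonal) update from only r**2 trainable parameters."""

    def __init__(self, base: nn.Linear, r: int = 256):
        super().__init__()
        assert base.in_features == base.out_features, "sketch assumes square layers"
        assert base.in_features % r == 0, "sketch assumes width divisible by r"
        self.base = base
        self.base.requires_grad_(False)  # freeze the pretrained layer
        self.r = r
        # Start at zero so fine-tuning begins from the base model.
        self.M = nn.Parameter(torch.zeros(r, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        d = x.shape[-1]
        # "Compress": view the d-dim input as d/r chunks of size r.
        chunks = x.reshape(*x.shape[:-1], d // self.r, self.r)
        # Apply the shared square matrix to every chunk, then "decompress"
        # by flattening back to d dimensions.
        update = (chunks @ self.M.T).reshape(*x.shape[:-1], d)
        return self.base(x) + update
```

Note the contrast with LoRA: if M is full rank, the block-diagonal update can reach rank d, whereas a LoRA update with a comparable parameter budget is capped at a small rank r.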
MoRA's Performance
In comparative studies, MoRA consistently outperformed LoRA on memorization tasks, approaching the performance of fully fine-tuned models while utilizing fewer parameters and training steps. The researchers observed that MoRA’s loss curve closely aligns with full fine-tuning for knowledge memorization tasks, indicating its efficiency.
“Our method shows significant improvements over LoRA with the same number of trainable parameters, benefiting from high-rank updating,” they stated.
In tasks involving instruction tuning and mathematical reasoning, MoRA's performance was nearly on par with LoRA. However, in continual pre-training scenarios within biomedical and financial contexts, MoRA excelled due to its high-rank updating capacity, which facilitates the memorization of new information.
Researchers also noted that increasing the MoRA adapter's rank could close the performance gap between PEFT and full fine-tuning in mathematical reasoning tasks, albeit with heightened training and storage demands.
The Role of PEFT in Enterprises
Fine-tuning is crucial for enterprise applications of LLMs. It enhances the capabilities and accuracy of LLMs, allowing organizations to utilize smaller models for tasks that might otherwise necessitate more costly advanced models.
Currently, LoRA and its variants are considered the benchmarks for parameter-efficient fine-tuning, supported by a robust ecosystem of tools and platforms for creating LoRA adapters. For instance, S-LoRA enables developers to serve many LoRA adapters on a single GPU, making it practical to build applications that require a separate fine-tuned model for each user.
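For a sense of how lightweight this workflow is, attaching a LoRA adapter with the open-source Hugging Face peft library looks roughly like the following; the model name and hyperparameter values are placeholders.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

# Placeholder model; swap in whichever base LLM you are adapting.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

config = LoraConfig(
    r=16,                                 # rank of the low-rank update
    lora_alpha=32,                        # scaling factor for the update
    target_modules=["q_proj", "v_proj"],  # which layers receive adapters
    lora_dropout=0.05,
    task_type=TaskType.CAUSAL_LM,
)

model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```

The resulting adapter is a small file that can be stored, swapped, or served alongside other adapters, which is what makes per-user and per-task fine-tuning economical.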
The researchers have made MoRA available as an open-source implementation compatible with LoRA, positioning it as a significant resource for enterprises aiming to enrich base models with new knowledge.