On June 20, Bilibili announced the open-sourcing of its lightweight Index-1.9B model series, which includes several versions: a base model, a control model, a chat model, and a character role-playing model.
Official Overview:
- Index-1.9B Base: This foundational model has 1.9 billion parameters (excluding word embeddings) and is pre-trained on roughly 2.8 trillion tokens of predominantly Chinese and English data. It leads models of comparable size across multiple evaluation benchmarks.
- Index-1.9B Pure: This control model matches the base version in parameter count and training strategy, but filters all instruction-related data out of its corpus in order to measure the effect of instruction data on benchmark results.
- Index-1.9B Chat: Built on the Index-1.9B base, this chat model is fine-tuned with Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). Because its training corpus includes a large amount of internet community data, it delivers a livelier chat experience (see the loading sketch after this list).
- Index-1.9B Character: Building on SFT and DPO, this model incorporates Retrieval-Augmented Generation (RAG) to enable customizable few-shot role-playing. It shares the series' 2.8-trillion-token pre-training corpus, which has a 4:5 Chinese-to-English ratio and about 6% code, and ships with a built-in character named "San San"; users can also create their own characters.
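For readers who want to try the released checkpoints, the sketch below shows one way the chat model might be loaded and queried with Hugging Face transformers. The repository ID `IndexTeam/Index-1.9B-Chat`, the `trust_remote_code` flag, the chat-template support, and the generation settings are assumptions for illustration, not details taken from the announcement.

```python
# Minimal sketch: loading a small chat checkpoint with Hugging Face transformers.
# The repo ID and remote-code requirement below are assumptions, not confirmed details.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "IndexTeam/Index-1.9B-Chat"  # assumed Hugging Face repository ID

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps a 1.9B model light on memory
    device_map="auto",
    trust_remote_code=True,
)

# Build a chat-style prompt via the tokenizer's chat template (if the repo provides one).
messages = [{"role": "user", "content": "Introduce yourself in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate a short reply from the SFT+DPO-aligned chat model.
outputs = model.generate(inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The same pattern should carry over to the other checkpoints in the series by swapping in the corresponding repository ID.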
This new model series strengthens Bilibili's AI capabilities, enhancing user interaction and engagement with its advanced customizable features.