Mistral AI is making its debut in the multimodal space with the release of Pixtral 12B, the first model from this French AI startup that integrates both language and vision processing capabilities. This development positions Mistral to compete with major players like OpenAI and Anthropic.
Pixtral 12B is not yet available through a public web interface, but developers can download the model from Hugging Face or GitHub to test it themselves. Breaking from typical AI release protocols, Mistral first shared a torrent link for downloading the model files.
Sophia Yang, Mistral's head of developer relations, announced in an X post that the model will soon be available through the company's web chatbot, allowing developers to experiment with its features. It will also be added to La Plateforme, the Mistral service that provides API endpoints for the company's models.
What Does Pixtral 12B Offer?
While details of its training data remain undisclosed, Pixtral 12B is designed to analyze images in combination with text prompts. Users should be able to upload images or provide links to them and then ask questions about their contents.
This is Mistral's first multimodal model, but competitors like OpenAI and Anthropic already offer models with similar capabilities. When asked what sets Pixtral apart, Yang said that it can natively handle an arbitrary number of images of various sizes.
Initial testers on X have reported that the 24GB model's architecture includes 40 layers, a hidden dimension of 14,336, and 32 attention heads. The dedicated vision encoder supports images at resolutions up to 1024×1024 and has 24 hidden layers for image analysis.
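The 24GB figure is consistent with simple arithmetic on the parameter count. The sketch below is only an illustration under two assumptions that the article does not state: that the model has roughly 12 billion parameters (inferred from the "12B" in its name) and that its weights are stored at 16-bit precision, or 2 bytes per parameter. The function name weight_size_gb is hypothetical.

```python
# Rough estimate of model weight size from parameter count.
# Assumptions (not stated in the article): about 12e9 parameters, inferred
# from the "12B" in the model name, and 2 bytes per parameter (16-bit weights).

def weight_size_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Return the approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


size_16bit = weight_size_gb(12e9)     # 16-bit weights
size_8bit = weight_size_gb(12e9, 1)   # hypothetical 8-bit quantized copy

print(f"16-bit: {size_16bit:.0f} GB, 8-bit: {size_8bit:.0f} GB")
```

Under these assumptions the 16-bit estimate matches the size testers reported, and the same calculation suggests a quantized copy would need correspondingly less storage.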
As Mistral prepares to release the model via API, its potential for vision applications such as content and data analysis will become clearer. How well this open model actually performs is yet to be determined, but its release signals Mistral's ambitions in the AI landscape.
Since its inception last year, Mistral has rapidly developed a pipeline of models to challenge industry leaders like OpenAI. It has also forged strategic partnerships with major companies such as Microsoft, AWS, and Snowflake to extend the reach of its technologies. Recently, Mistral raised $640 million at a valuation of $6 billion and introduced Mistral Large 2, a GPT-4 level model featuring advanced multilingual capabilities and improved reasoning, code generation, and mathematical performance.
Moreover, the company has launched Mixtral, a mixture-of-experts model, and Codestral, an open-weight coding model with 22 billion parameters, alongside a model tailored for mathematical reasoning and scientific discovery.