Mistral has introduced Codestral Embed, a code embedding model that outperforms OpenAI and Cohere

French company Mistral has released Codestral Embed, its first code embedding model, which the company says outperforms competitors on popular benchmarks, including SWE-Bench. The new model is designed for information retrieval, especially over real-world code.
Codestral Embed offers higher quality and flexible customization
Codestral Embed is part of the Codestral family of models. It converts code and related data into numeric vectors for retrieval-augmented generation (RAG) systems and other code-processing scenarios. Developers can choose the dimensionality and precision of the embeddings, which, according to the company, cuts storage costs while keeping retrieval quality high.
The Mistral blog notes that even with 256 dimensions and int8 precision, Codestral Embed outperforms competitors including Voyage Code 3, Cohere Embed v4.0 and OpenAI Text Embedding 3 Large. The model is priced at $0.15 per million tokens.
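The storage trade-off behind choosing dimensionality and precision can be illustrated with a minimal local sketch. It assumes Matryoshka-style embeddings, where keeping the leading components and renormalizing still yields a usable vector, plus simple symmetric per-vector int8 quantization. Neither is confirmed as Codestral Embed's actual scheme; the 1536-dimension input vector and all function names are hypothetical.

```python
import math

def truncate_and_normalize(vec, dims):
    """Keep the first `dims` components and rescale to unit length.
    Assumes (hypothetically) Matryoshka-style training, where leading dimensions carry most of the signal."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else list(head)

def quantize_int8(vec):
    """Symmetric int8 quantization: integer codes in [-127, 127] plus one float scale per vector."""
    max_abs = max((abs(x) for x in vec), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(x / scale))) for x in vec]
    return codes, scale

def dequantize(codes, scale):
    """Approximate the original floats from int8 codes."""
    return [c * scale for c in codes]

def storage_bytes(num_vectors, dims, bytes_per_value):
    """Raw storage for the vectors alone, ignoring index overhead."""
    return num_vectors * dims * bytes_per_value

# Hypothetical 1536-dim float vector standing in for a real model output.
full = [math.sin(i + 1) for i in range(1536)]
small = truncate_and_normalize(full, 256)
codes, scale = quantize_int8(small)
restored = dequantize(codes, scale)
max_err = max(abs(a - b) for a, b in zip(small, restored))
print(f"max reconstruction error: {max_err:.5f} (rounding bound: {scale / 2:.5f})")

# One million vectors: float32 at 1536 dims vs int8 at 256 dims.
big_gb = storage_bytes(1_000_000, 1536, 4) / 1e9
small_gb = storage_bytes(1_000_000, 256, 1) / 1e9
print(f"{big_gb} GB vs {small_gb} GB")
```

At 256 dimensions and one byte per value, each vector needs 256 bytes instead of 6,144 bytes for a 1536-dimension float32 vector, a 24x reduction. Whether retrieval quality survives that reduction is what Mistral's 256-dimension int8 comparison claims to show.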

Codestral Embed is designed for RAG and code analysis
- RAG: creating embeddings for fast retrieval of relevant code context.
- Semantic code search: finding code snippets from natural-language queries, which is useful for developer platforms, documentation and code assistants.
- Duplicate code detection: identifying repeated or similar code fragments, which matters for complying with corporate code reuse policies.
- Semantic code clustering: grouping fragments by functionality and structure to analyze repositories and project architecture.
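Semantic code search and duplicate detection both come down to comparing embedding vectors, typically with cosine similarity. The sketch below is a minimal illustration under stated assumptions: hand-written 4-dimension vectors stand in for real Codestral Embed outputs, ranking is brute force rather than a vector database, and the 0.99 duplicate threshold is arbitrary. All snippet names, vectors and function names are hypothetical.

```python
import math

# Hypothetical toy vectors standing in for model embeddings of code snippets;
# real embeddings would have hundreds or thousands of dimensions.
SNIPPETS = {
    "read_json": [0.9, 0.1, 0.0, 0.1],
    "load_json_file": [0.88, 0.12, 0.02, 0.1],
    "quick_sort": [0.05, 0.9, 0.3, 0.0],
    "http_get": [0.1, 0.0, 0.2, 0.95],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(query_vec, index, k=2):
    """Semantic search: rank snippets by similarity to the query embedding, return top k names."""
    ranked = sorted(index.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

def near_duplicates(index, threshold=0.99):
    """Duplicate detection: return name pairs whose similarity meets the (hypothetical) threshold."""
    names = sorted(index)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(index[a], index[b]) >= threshold:
                pairs.append((a, b))
    return pairs

# Hypothetical embedding of a natural-language query such as "parse a JSON file".
query = [0.85, 0.15, 0.05, 0.05]
print(search(query, SNIPPETS))
print(near_duplicates(SNIPPETS))
```

The same similarity function also underlies the clustering use case: grouping vectors whose pairwise similarity is high, for example with k-means over normalized embeddings.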
The model has been tested on several benchmarks, including SWE-Bench and GitHub Text2Code, where it performed better than competitors, according to the company.


The embedding market is becoming increasingly competitive
Codestral Embed arrives amid growing interest in RAG and a growing number of embedding models on the market. Earlier, Mistral introduced Mistral Medium 3, a mid-sized version of its large language model, as well as an API for building multi-agent systems that carry out real-world tasks.
Although Codestral Embed performs well on benchmarks, its effectiveness on real-world tasks has yet to be confirmed. Competition comes from proprietary models from OpenAI and Cohere, as well as open-source options such as Qodo-Embed-1-1.5B.
The article Mistral unveils Codestral Embed, a code-embedding model that outperforms OpenAI and Cohere was first published on ITZine.ru.