Mistral has introduced Codestral Embed, a code embedding model that outperforms OpenAI and Cohere

French company Mistral has released Codestral Embed, its first code embedding model, which the company says outperforms competitors on popular benchmarks, including SWE-Bench. The new model is designed for information retrieval tasks, especially on real-world code.

Codestral Embed offers higher quality and flexible customization

Codestral Embed is part of the Codestral family of models and is designed to convert code and data into numeric vectors for use in retrieval-augmented generation (RAG) systems and other code processing scenarios. The model lets users choose the dimensionality and precision of the embeddings, which helps reduce storage costs while limiting the loss in retrieval quality.

The Mistral blog notes that even at 256 dimensions and int8 precision, Codestral Embed outperforms competitors including Voyage Code 3, Cohere Embed v4.0 and OpenAI Text Embedding 3 Large. The model is priced at $0.15 per million tokens.
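
To make the storage trade-off concrete, here is a minimal Python sketch of shrinking an embedding to 256 dimensions and int8 precision. It rests on several assumptions that are not from Mistral: the 1536-dimensional starting vector is a random stand-in rather than real model output, truncation simply keeps the leading components, and the symmetric scaling in `quantize_int8` is one common illustrative scheme, not necessarily what the service does. All function names are hypothetical.

```python
import math
import random


def truncate_and_normalize(vec, dims):
    """Keep the first `dims` components and rescale them to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)
    return [x / norm for x in head]


def quantize_int8(vec):
    """Symmetric linear quantization to integer codes in [-127, 127].

    Illustrative scheme only; returns the codes and the scale needed
    to map them back to approximate float values.
    """
    max_abs = max((abs(x) for x in vec), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(x / scale))) for x in vec]
    return codes, scale


def dequantize(codes, scale):
    """Map int8 codes back to approximate float values."""
    return [c * scale for c in codes]


def storage_bytes(dims, bytes_per_value):
    """Raw size of one stored vector, ignoring index overhead."""
    return dims * bytes_per_value


random.seed(0)
# Stand-in vector with an assumed full size of 1536 dimensions.
full = [random.gauss(0.0, 1.0) for _ in range(1536)]

small = truncate_and_normalize(full, 256)
codes, scale = quantize_int8(small)

print(storage_bytes(1536, 4))  # float32 at the assumed full size: 6144 bytes
print(storage_bytes(256, 1))   # int8 at 256 dimensions: 256 bytes
```

Under these assumptions each stored vector shrinks from 6144 bytes to 256 bytes, a 24x reduction, and the rounding error per component stays within half of `scale`. Whether retrieval quality holds up at that size depends on the model; the sketch shows only the storage arithmetic.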

Codestral Embed is designed for RAG and code analysis

RAG: creating embeddings for faster retrieval of relevant code.
Semantic code search: finding code snippets from natural-language queries, which is useful for developer platforms, documentation and code assistants.
Duplicate code search: identifying repeated or similar code fragments, which is important for complying with corporate code reuse policies.
Semantic code clustering: grouping fragments by functionality and structure to analyze repositories and project architecture.
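
Several of the use cases above come down to comparing embedding vectors. The following Python sketch shows one common way to do that, using cosine similarity for both search and duplicate detection. The three-dimensional vectors, the snippet names and the 0.95 threshold are made up for illustration; real embeddings would come from the model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec, index, top_k=2):
    """Return the names of the `top_k` snippets most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


def near_duplicates(index, threshold=0.95):
    """Return pairs of snippet names whose similarity meets the threshold."""
    names = sorted(index)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if cosine_similarity(index[first], index[second]) >= threshold:
                pairs.append((first, second))
    return pairs


# Hypothetical toy index: snippet name -> made-up embedding vector.
index = {
    "parse_json": [0.9, 0.1, 0.0],
    "load_json_file": [0.88, 0.15, 0.02],
    "send_email": [0.0, 0.2, 0.95],
}
# Made-up embedding of a natural-language query such as "read a JSON document".
query = [0.85, 0.2, 0.05]

print(search(query, index))
print(near_duplicates(index))
```

In this toy data the two JSON-related snippets rank highest for the query and are also flagged as a near-duplicate pair, while `send_email` is not. A production system would typically replace the pairwise loop with a vector index, since comparing every pair becomes expensive on large repositories.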

The model has been tested on several benchmarks, including SWE-Bench and GitHub Text2Code, where it performed better than competitors, according to the company.