Get hands-on with Elasticsearch: Dive into our sample notebooks in the Elasticsearch Labs repo, start a free cloud trial, or try Elastic on your local machine now.
Jina AI and Elastic are releasing jina-embeddings-v5-text, a family of new, high-performance, compact text embedding models with state-of-the-art performance for models of comparable size across all major task types.
The family includes two models:
- jina-embeddings-v5-text-small
- jina-embeddings-v5-text-nano
These models are the result of an innovative new training recipe for embedding models. Both outperform models many times their size, saving memory and computing resources and responding to requests faster.
The jina-embeddings-v5-text-small model has 677M parameters, supports a 32768-token input context window, and produces 1024-dimension embeddings by default.
jina-embeddings-v5-text-nano weighs in at roughly a third of its sibling's size, with 239M parameters and an 8192-token input context window, yielding slender 768-dimension embeddings.
| Model name | Total size | Input context window size | Embedding size |
|---|---|---|---|
| jina-embeddings-v5-text-small | 677M params | 32768 tokens | 1024 dims |
| jina-embeddings-v5-text-nano | 239M params | 8192 tokens | 768 dims |
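To make these sizes concrete, here is a quick back-of-the-envelope calculation of per-vector storage, assuming each dimension is stored as a 16-bit (2-byte) float; the 10-million-document corpus is an arbitrary illustration, not a benchmark figure:

```python
# Per-vector storage for each model, assuming 16-bit (2-byte) floats
# per dimension. The corpus size below is a hypothetical example.
BYTES_PER_DIM = 2  # float16

models = {
    "jina-embeddings-v5-text-small": 1024,  # default embedding dims
    "jina-embeddings-v5-text-nano": 768,
}

num_docs = 10_000_000  # hypothetical corpus size

for name, dims in models.items():
    per_vector = dims * BYTES_PER_DIM
    total_gib = per_vector * num_docs / 1024**3
    print(f"{name}: {per_vector} B/vector, "
          f"~{total_gib:.1f} GiB for {num_docs:,} docs")
```

At these sizes, the nano model's smaller vectors cut raw vector storage by a quarter relative to the small model before any truncation or quantization.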
These two models are best in class for overall MMTEB (Multilingual MTEB) benchmark performance. jina-embeddings-v5-text-nano is the top performer among models with under 500M parameters, despite having fewer than 250M itself, and jina-embeddings-v5-text-small leads among multilingual embedding models with under 750M parameters.

These models are available via the Elastic Inference Service (EIS), via an online API, and for local hosting. For instructions on how to access jina-embeddings-v5-text models, see the “Getting started” section, below.
Embedding models and semantic indexing dramatically increase the accuracy of search algorithms but also have a variety of other uses for tasks involving semantic similarity and meaning extraction, for example:
- Finding duplicate texts.
- Recognizing paraphrases and translations.
- Topic discovery.
- Recommendation engines.
- Sentiment and intention analysis.
- Spam filtering.
- And many others.
Features
This new model family has a number of features designed to improve relevance and reduce costs.
Task optimization
We’ve optimized the jina-embeddings-v5-text models for four broad task types:
| Task | Example use cases |
|---|---|
| Retrieval | Searching with natural language queries and retrieving the most relevant matches in a collection of documents. |
| Text matching | Semantic similarity, deduplication, paraphrase and translation alignment, and more. |
| Clustering | Topic discovery, automatic organization of document collections. |
| Classification | Document categorization, sentiment and intent detection, similar tasks. |
Optimizing for one task usually means compromising on another, so most embedding models have competitive performance on only one kind of task. But jina-embeddings-v5-text models specialize in all four areas without compromise, thanks to task-specific Low-Rank Adaptation (LoRA) adapters trained for each task type.
LoRA adapters are a kind of plugin for an AI model that changes its behavior dramatically while only adding slightly to the total size. Instead of having an entire model for each task, each one with hundreds of millions of parameters, the jina-embeddings-v5-text model family lets you use just one model with a compact LoRA adapter for each task. This saves memory, storage space, and inference costs.
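The arithmetic behind that saving is easy to see. In LoRA, a frozen weight matrix `W` (shape d×k) is adjusted by a low-rank product `B @ A`, where `B` is d×r and `A` is r×k, with the rank r much smaller than d and k. A minimal numpy sketch of the idea (the layer shape and rank here are chosen for illustration, not taken from the actual models):

```python
import numpy as np

d, k, r = 1024, 1024, 8  # illustrative layer shape and LoRA rank

rng = np.random.default_rng(0)
W = rng.normal(size=(d, k))          # frozen base weights
A = rng.normal(size=(r, k)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                 # zero-initialized, so the adapter
                                     # starts as a no-op

# The adapted layer uses W + B @ A in place of W.
W_adapted = W + B @ A

full_params = d * k                  # a whole task-specific copy of the layer
lora_params = d * r + r * k          # only the adapter's parameters
print(f"full copy: {full_params:,} params; "
      f"LoRA adapter: {lora_params:,} params "
      f"({lora_params / full_params:.1%} of the full layer)")
```

For this layer, the adapter is under 2% of the size of a full task-specific copy, which is why shipping one base model plus four adapters is so much cheaper than shipping four models.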
Truncating embeddings
We’ve trained the jina-embeddings-v5-text models using Matryoshka Representation Learning, which lets you cut your embeddings down to smaller sizes at a minimal cost to their quality.
By default, jina-embeddings-v5-text-small generates 1024-dimension embedding vectors, each represented by a 16-bit number, making every embedding 2KB in size. For a large collection of documents, this can be a lot of data to store, and searching in a vector database full of embeddings is proportional both to the size of the database and to the number of dimensions each stored vector has.
But you can simply halve the size of the embeddings (discarding 512 of the 1024 dimensions) to take up half the space and roughly double search speed. This comes at a cost: throwing away information reduces precision. But as the graph below shows, even discarding half of the embedding only reduces performance slightly:

As long as your embeddings are at least 256 dimensions, the loss in precision should remain fairly small. Below that level, however, relevance and accuracy deteriorate quickly.
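In practice, truncating a Matryoshka-style embedding is just keeping the leading dimensions and re-normalizing. A minimal numpy sketch, using a random stand-in vector (a real embedding would come from the model):

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` dimensions and re-normalize to unit length,
    as Matryoshka-style embeddings expect."""
    truncated = vec[:dims]
    return truncated / np.linalg.norm(truncated)

rng = np.random.default_rng(42)
full = rng.normal(size=1024)          # stand-in for a 1024-dim embedding
full = full / np.linalg.norm(full)

half = truncate_embedding(full, 512)     # half the storage, ~2x search speed
quarter = truncate_embedding(full, 256)  # about the floor suggested above

print(half.shape, quarter.shape)  # (512,) (256,)
```

The same slicing works on stored vectors, so you can experiment with smaller dimensions without re-embedding your corpus.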
Truncating embeddings like this empowers users to manage their own trade-offs between accuracy and computing costs. It gives you the tools to get big efficiency gains and large cost savings out of your search AI.
Robust quantization
Quantization is another way of reducing the size of embeddings. Instead of throwing away part of each embedding, quantization reduces the precision of the numbers in the embedding. The jina-embeddings-v5-text models generate embeddings with 16-bit numbers, but we can round those numbers off, reducing their precision and the number of bits needed to store them. In the most extreme case, we can reduce each number to one bit (0 or 1), compressing jina-embeddings-v5-text’s default 1024 dimension embeddings from 2 kilobytes to 128 bytes, a 94% reduction from binary quantization alone. Just like for truncation, this produces large savings in memory and computing costs. However, also like truncation, quantization makes embeddings less accurate.
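The mechanics of binary quantization are simple: each dimension is reduced to a single bit, commonly by thresholding at zero (the models' actual quantization scheme may differ; this sketch just shows the size arithmetic):

```python
import numpy as np

def binarize(vec: np.ndarray) -> np.ndarray:
    """Quantize a float vector to one bit per dimension
    (1 if >= 0, else 0), packed eight bits to a byte."""
    bits = (vec >= 0).astype(np.uint8)
    return np.packbits(bits)

rng = np.random.default_rng(0)
embedding = rng.normal(size=1024).astype(np.float16)  # stand-in embedding

packed = binarize(embedding)
print(embedding.nbytes, "->", packed.nbytes, "bytes")  # 2048 -> 128 bytes
```

That is the 94% reduction described above: 1024 dimensions at 16 bits each (2048 bytes) become 1024 bits (128 bytes).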
We’ve trained the jina-embeddings-v5-text models to work with Elasticsearch’s Better Binary Quantization by minimizing that loss of accuracy, and benchmark tests of binarized embeddings from these models show performance almost equal to their non-binarized equivalents. Consult the technical report for detailed ablation studies of binarization performance.
Multilingual performance
Many embedding models are multilingual because they’ve been trained on materials that include large numbers of languages. But that doesn’t mean that they all perform equally well in all supported languages.
We identified 211 languages in the MMTEB multilingual benchmark and separated them so we could compare our models to similar models on a language-by-language basis. The image below summarizes our results as a heat map. Each patch is a language (identified by its ISO-639 code), and the greener it is, the better the model performed compared to the average of similar models:

Although accuracy varies between languages, the jina-embeddings-v5-text models are state-of-the-art or nearly so across most of the world’s languages.
For details about multilingual performance, see the jina-embeddings-v5-text technical report.
Jina in Elastic: State-of-the-art native AI for search
With jina-embeddings-v5-text models on EIS, you can run high-performance multilingual embedding models natively in Elasticsearch with fully managed, GPU-accelerated inference and no infrastructure to provision or scale. jina-embeddings-v5-text models extend the growing EIS model catalog with compact, multilingual models powered by the latest developments in AI. These models have state-of-the-art performance on information retrieval and standard data analysis benchmarks, and they offer unequaled, globe-spanning multilingual support.
With two models of vastly different sizes, users can determine which one is best suited for their applications and budgets. Furthermore, with robust embeddings that remain performant when truncated to smaller sizes or quantized to lower precision, jina-embeddings-v5-text models provide opportunities for further concrete savings in storage and computing costs as well as in processing latency.
With the jina-embeddings-v5-text family, Jina Reranker, and Elastic’s fast vector and BM25 search, users now have access to end-to-end, state-of-the-art hybrid search from Elastic. When you need the most relevant results, whether for retrieval augmented generation (RAG) pipelines, search applications, or data analysis, Elastic with Jina search AI models provides solid and cost-effective quality.
Getting started
The jina-embeddings-v5-text models are fully integrated into EIS, and you can use them by setting the type field to semantic_text when creating your index and specifying the model (jina-embeddings-v5-text-small or jina-embeddings-v5-text-nano) in the inference_id field, as in this example:
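An index definition might look like the following sketch (the field and index names are illustrative, and the exact inference_id strings for these models on EIS are assumptions; check the Elasticsearch documentation for the identifiers available in your deployment):

```
PUT my-index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": "jina-embeddings-v5-text-small"
      }
    }
  }
}
```

Documents indexed into the content field are then embedded automatically at ingest time, and semantic queries against the field use the same model.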
Elasticsearch automatically selects the appropriate LoRA adapter during indexing and retrieval. The embedding dimensions (see the “Truncating embeddings” section, above) can be set when creating a custom inference endpoint.
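Creating such a custom inference endpoint with reduced dimensions might look like the following sketch (the service name and service_settings fields here are assumptions based on the general shape of the Elasticsearch inference API; consult the documentation for the exact parameters EIS accepts):

```
PUT _inference/text_embedding/my-jina-small-512
{
  "service": "elastic",
  "service_settings": {
    "model_id": "jina-embeddings-v5-text-small",
    "dimensions": 512
  }
}
```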
See the Elasticsearch documentation for more information on using jina-embeddings-v5-text models.
More information
To learn more about jina-embeddings-v5-text models, read the release notes on the Jina AI blog and the technical report, with more detailed technical information about performance and Jina AI’s innovative new training procedure. For information about downloading and running these models locally, visit the jina-embeddings-v5-text collection page on Hugging Face.
Jina AI models are available under a CC-BY-NC-4.0 license, so you are free to download them and try them out, but for commercial use, please contact Elastic sales.