NOXTUA VOYAGE EMBED is an embedding model fine-tuned on legal documents that Xayn shared with Voyage. It improves retrieval quality by 25.3% on average over OpenAI's text-embedding-3-large.
NOXTUA VOYAGE EMBED is customized for retrieval tasks on legal documents. The model is based on voyage-multilingual-2, Voyage's latest and most powerful embedding model tailored to multilingual retrieval, and is fine-tuned on EU_GER_Xayn_DeJure_Laws_Decisions, a proprietary dataset provided by Xayn. The context length is 32K tokens and the embedding dimension is 1024. The rest of this document describes the evaluation results of NOXTUA VOYAGE EMBED against baseline models, including voyage-multilingual-2 and OpenAI's text-embedding-3-large.
Evaluation Datasets
EU_GER_Xayn_DeJure_Laws_Decisions is an extensive collection of German and EU law books and case law, comprising decisions from various German courts and from the EU courts. Each case document contains essential metadata such as the date of decision, the court hierarchy, and a summary of the key facts and outcomes. The cases cover a broad range of legal domains, including civil, criminal, administrative, and labor law, so the dataset represents the German and EU legal systems comprehensively.
The dataset contains more than 20B tokens in total. Voyage generated 47k queries from the documents in the dataset to form an evaluation dataset named xayn-syn-pairs-eval.
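Example of a pair in xayn-syn-pairs:
Query
"Welche rechtlichen Konsequenzen hat das Verschweigen von Krankheitshistorien bei der Lebensversicherung?"
(English: "What are the legal consequences of concealing medical history when taking out life insurance?")
Relevant Doc
"==STAMMDATEN==
LG Hamburg, 04.10.1990 - 327 O 125/90
==LEITSATZ==
1. Verschweigt der Versicherte bei Antragstellung einen chronischen und über Jahre medikamentös behandelten Bluthochdruck, ist der Versicherer nach § 123 BGB zur Anfechtung des Versicherungsvertrags berechtigt (hier: 52 Jahre alter Mann mit langjährigem Verdacht auf Koronarinsuffizienz, verbunden mit Hypertonie und Fettstoffwechselstörungen).
2. War der Versicherte wegen dieser Krankheit in einem Zeitraum von neun Jahren mehr als 50mal in ärztlicher Behandlung, läßt dies den Schluß zu, daß er die Angaben über die Vorerkrankungen deshalb unterließ, weil er befürchtete, der Versicherer werde anderenfalls den beantragten Lebensversicherungsvertrag nicht abschließen."
(English: "==MASTER DATA== Hamburg Regional Court, 04.10.1990 - 327 O 125/90 ==HEADNOTE== 1. If the insured person conceals, when applying, chronic high blood pressure that has been treated with medication for years, the insurer is entitled under § 123 BGB to rescind the insurance contract (here: a 52-year-old man with a long-standing suspicion of coronary insufficiency, combined with hypertension and lipid metabolism disorders). 2. If the insured person received medical treatment for this illness more than 50 times over a period of nine years, this supports the conclusion that he omitted the information about his pre-existing conditions because he feared that the insurer would otherwise not conclude the life insurance contract he had applied for.")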
Evaluation Results
We compare NOXTUA VOYAGE EMBED against the following embedding models on xayn-syn-pairs-eval:
text-embedding-3-large, OpenAI embedding model
voyage-law-2, Voyage AI embedding model optimized for legal retrieval quality
voyage-multilingual-2, Voyage AI embedding model optimized for multilingual retrieval quality
Given a query, we retrieve the top-100 documents based on cosine similarity. We report NDCG@10 and Recall@100. Both are standard metrics for retrieval quality, and higher is better. The table below presents the results.
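To make the evaluation procedure concrete, here is a minimal Python sketch of cosine-similarity retrieval followed by Recall@k and NDCG@k with binary relevance. It is an illustration only, not the evaluation code used for this report: the function names, the toy two-dimensional vectors, and the document ids are all hypothetical, and the sketch assumes non-zero embedding vectors and binary relevance labels.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vec, doc_vecs, k=100):
    """Return up to k document ids, ranked by descending cosine similarity."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]


def recall_at_k(ranked_ids, relevant_ids, k=100):
    """Fraction of the relevant documents that appear in the top k."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """NDCG@k with binary relevance (gain 1 for relevant, 0 otherwise)."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical toy data: real embeddings would have 1024 dimensions.
query = [1.0, 0.0]
docs = {"d1": [0.9, 0.1], "d2": [0.0, 1.0], "d3": [0.5, 0.5]}
relevant = {"d3"}

ranked = retrieve_top_k(query, docs, k=100)  # ['d1', 'd3', 'd2']
print(ranked)
print(recall_at_k(ranked, relevant, k=100))
print(ndcg_at_k(ranked, relevant, k=10))
```

In this toy case the single relevant document is ranked second, so Recall@100 is 1.0 while NDCG@10 is 1/log2(3), roughly 0.63. The two metrics are complementary: recall only asks whether relevant documents are retrieved at all, whereas NDCG also rewards placing them near the top.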

NOXTUA VOYAGE EMBED significantly outperforms the other embedding models, achieving a substantial average improvement of 25.3% over OpenAI text-embedding-3-large. Compared with voyage-multilingual-2 and voyage-law-2, NOXTUA VOYAGE EMBED achieves a 10.7% improvement in Recall@100, which validates the effectiveness of fine-tuning.
We also evaluate NOXTUA VOYAGE EMBED and the baselines on several common public legal retrieval benchmarks, such as legal_summarization, legalbench_consumer_contracts_qa, GerDaLIRSmall, and LegalQuAD. NOXTUA VOYAGE EMBED significantly outperforms text-embedding-3-large and the other Voyage models on these datasets as well.














