Extreme compression of sentence-transformer ranker models

Faster inference, longer battery life, and less storage on edge devices

2022 || Amit Chaulwar, Lukas Malik, Maciej Krajewski, Felix Reichel, Leif-Nissen Lundbæk, Michael Huth, Bartlomiej Matejczyk || arXiv

Modern search systems use several large ranker models with transformer architectures. These models require substantial computational resources and are therefore not suitable for use on devices with limited compute, memory, and energy budgets. Knowledge distillation, in which a large teacher model transfers its knowledge to a small student model, is a popular compression technique that can reduce the resource needs of such models. To drastically reduce memory requirements and energy consumption, we propose two extensions to a popular sentence-transformer distillation procedure: generating a vocabulary of optimal size and reducing the dimensionality of the teacher's embeddings prior to distillation. We evaluate these extensions on two different types of ranker models. The result is extremely compressed student models whose evaluation on a test dataset demonstrates the significance and utility of our proposed extensions.
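To make the two extensions more concrete, the sketch below shows how they might be wired into a standard sentence-transformers distillation setup: the teacher's sentence embeddings are projected to a lower dimension with PCA before a small student is trained to reproduce them with an MSE loss. This is a minimal illustration under assumptions, not the paper's implementation: the model checkpoints, the tiny corpus, the target dimension, and the hyperparameters are placeholders, and the vocabulary-reduction step is only indicated in a comment.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, models
from sklearn.decomposition import PCA
import numpy as np
import torch

# Teacher: a full-size sentence-transformer ranker (placeholder checkpoint,
# not the ranker models evaluated in the paper).
teacher = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")

# Stand-in corpus; in practice the PCA and the distillation would use a large
# in-domain training set.
train_sentences = [
    "how do I extend my phone's battery life",
    "lightweight search models for mobile devices",
    "knowledge distillation compresses large transformers",
]

# --- Extension: reduce the teacher's embedding dimension before distillation ---
target_dim = 2  # illustrative only; a realistic setup would use a larger corpus and e.g. 128 dims
teacher_embeddings = teacher.encode(train_sentences, convert_to_numpy=True)
pca = PCA(n_components=target_dim).fit(teacher_embeddings)

# Append a linear (Dense) layer initialised with the PCA components so the
# teacher now emits `target_dim`-dimensional sentence embeddings.
dense = models.Dense(
    in_features=teacher.get_sentence_embedding_dimension(),
    out_features=target_dim,
    bias=False,
    activation_function=torch.nn.Identity(),
)
dense.linear.weight = torch.nn.Parameter(
    torch.tensor(np.asarray(pca.components_, dtype=np.float32))
)
teacher.add_module("dense", dense)

# --- Student: a tiny transformer with a trainable projection to target_dim ---
# The paper additionally rebuilds the student's tokenizer with an optimally
# sized (reduced) vocabulary; that step is not reproduced in this sketch.
word_embedding = models.Transformer("google/bert_uncased_L-2_H-128_A-2", max_seq_length=128)
pooling = models.Pooling(word_embedding.get_word_embedding_dimension())
projection = models.Dense(
    in_features=pooling.get_sentence_embedding_dimension(),
    out_features=target_dim,
    bias=False,
    activation_function=torch.nn.Identity(),
)
student = SentenceTransformer(modules=[word_embedding, pooling, projection])

# Distillation: train the student to reproduce the teacher's reduced embeddings.
train_examples = [
    InputExample(texts=[s], label=teacher.encode(s).tolist()) for s in train_sentences
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.MSELoss(model=student)

student.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=0)
```

After training, the student encodes sentences directly into the reduced embedding space, so it can replace the teacher at inference time with a much smaller memory and energy footprint.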

Read the entire research paper here.

Press contact

Are you looking for an artificial intelligence expert, or would you like to join our press mailing list?

For all press enquiries, please contact:

Dr Clara Herdeanu

Chief Communications Officer

Phone: +49 174 4758 847
