This release covers Finetuner version 0.8.0, including dependency finetuner-core 0.13.9.
This release contains 1 new feature and 1 refactoring.
🆕 Features
Add Jina embeddings suite (#757)
We have released three pre-trained embedding models to the open-source community:
- jina-embedding-s-en-v1: 35 million parameter compact embedding model.
- jina-embedding-b-en-v1: 110 million parameter standard-sized embedding model.
- jina-embedding-l-en-v1: 330 million parameter large embedding model.
We trained all three models on Jina AI's Linnaeus-Clean dataset, which consists of 380 million query-document sentence pairs. These pairs were curated from a variety of domains in the Linnaeus-Full dataset, which contains 1.6 billion sentence pairs, through a thorough cleaning process.
If you wish to use these embeddings with Finetuner, follow the instructions below:
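# Notebook cell syntax; drop the leading "!" when running in a shell
!pip install finetuner

import finetuner

# Load the compact pre-trained model
model = finetuner.build_model('jinaai/jina-embedding-s-en-v1')

# Encode two sentences into embedding vectors
embeddings = finetuner.encode(
    model=model,
    data=['how is the weather today', 'What is the current weather like today?']
)

# Compare the two embeddings by cosine similarity
print(finetuner.cos_sim(embeddings[0], embeddings[1]))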
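For intuition about the final line of the snippet above, here is a minimal pure-Python sketch of cosine similarity. It assumes that finetuner.cos_sim computes the standard cosine similarity between two vectors. The function name cosine_similarity and the toy vectors are hypothetical and only stand in for real embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Standard cosine similarity: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional vectors (real embeddings have many more dimensions)
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]   # same direction as u
w = [-3.0, 0.0, 1.0]  # orthogonal to u

print(cosine_similarity(u, v))  # close to 1: vectors point the same way
print(cosine_similarity(u, w))  # close to 0: vectors are orthogonal
```

Scores near 1 indicate that two sentences are embedded close together, which is what you would expect for the two weather questions in the example.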
⚙ Refactoring
Change installation behavior (#757)
With the launch of Finetuner 0.8.0, installing it using pip install finetuner will automatically include the necessary torch-related dependencies. This enables Finetuner to serve embedding models out of the box. If you intend to fine-tune an embedding model, make sure that you install Finetuner with all the additional dependencies by using the command pip install "finetuner[full]".
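If you are unsure what a given environment already contains, a small check like the following can help. It is only a sketch using the Python standard library. The helper name is_installed is hypothetical, and the check covers only whether the top-level finetuner and torch modules can be found. It cannot tell whether the extras from "finetuner[full]" are present.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Report whether the base package and its torch dependency are available
for name in ("finetuner", "torch"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```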
🤟 Contributors
We would like to thank all contributors to this release:
- Wang Bo (@bwanglzu)
- Louis Milliken (@LMMilliken)
- Michael Günther (@guenthermi)
- George Mastrapas (@gmastrapas)
- Scott Martens (@scott-martens)
- Jonathan Geuter (@j-geuter)