Software updates
Finetuner 0.7.7 Update

Engineering Group
May 24, 2023 • 3-minute read

Finetuner makes neural network fine-tuning easier and faster by streamlining the workflow and handling all the complexity and infrastructure requirements in the cloud. With Finetuner, one can easily enhance the performance of pre-trained models and make them production-ready without expensive hardware.

This release covers Finetuner version 0.7.7, including dependencies finetuner-api 0.5.9 and finetuner-core 0.13.5.

This release contains 2 new features, 2 refactorings, 3 bug fixes, and 1 documentation improvement.

🆕 Features

Training data synthesis (#715)

In this release of Finetuner, we have introduced a training data synthesis feature. This feature is particularly useful for users in the e-commerce domain, who may have difficulty obtaining enough labeled training data.

This feature lets you use historical queries collected from your search system, together with the documents in your corpus, to generate training data:

import finetuner
from finetuner.model import synthesis_model_en

synthesis_run = finetuner.synthesize(
    query_data='finetuner/xmarket_queries_da',
    corpus_data='finetuner/xmarket_corpus_da',
    models=synthesis_model_en,
)

Once the synthesis job is done, you can get the training data with:

train_data_name = synthesis_run.train_data

And then, you can continue fine-tuning your embedding model with the generated training data:

training_run = finetuner.fit(
    model='bert-base-en',
    train_data=train_data_name,
    loss='MarginMSELoss',
    ...,
)

Evaluation on multiple datasets in EvaluationCallback

To facilitate the training and evaluation of large language models (LLMs) with Finetuner, we have made significant changes to EvaluationCallback.

These changes enable evaluation on multiple datasets in a single run. You can now pass the caption parameter to each EvaluationCallback to label which dataset each evaluation result corresponds to:

import finetuner
from finetuner.callback import EvaluationCallback

finetuner.fit(
    ...,
    callbacks=[
        EvaluationCallback(
            query_data='query-1',
            index_data='index-1',
            caption='dataset-1',
        ),
        EvaluationCallback(
            query_data='query-2',
            index_data='index-2',
            caption='dataset-2',
        ),
    ]
)

⚙ Refactoring

Display small loss values with higher precision.

To avoid displaying "0.000" for very small loss values, the display precision has been increased.
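To see why this matters, here is a minimal sketch (not Finetuner's actual logging code) of how a small but non-zero loss disappears at low precision:

loss = 4.2e-4
print(f'{loss:.3f}')  # prints '0.000' -- training looks stalled
print(f'{loss:.6f}')  # prints '0.000420' -- progress stays visible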

Filter PIL debugging messages from logging stack.

To improve the readability of the logs, we now exclude debug messages generated by the PIL package.
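For the curious, this kind of filtering is typically done by raising the log level of PIL's logger; here is a minimal sketch using Python's standard logging module (Finetuner's internal implementation may differ):

import logging

# Drop PIL's DEBUG chatter while keeping warnings and errors.
logging.getLogger('PIL').setLevel(logging.WARNING)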

🐞 Bug Fixes

No longer overestimate the batch_size for text models.

This release fixes a bug where the batch size finder would overestimate the maximum usable batch size for text models like BERT. This was most likely to happen when fine-tuning the bert-base-en model without specifying batch_size.
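If you are still on an older version, a practical workaround is to set batch_size explicitly, which bypasses the automatic finder; a minimal sketch (the dataset name my-train-data is hypothetical):

import finetuner

run = finetuner.fit(
    model='bert-base-en',
    train_data='my-train-data',  # hypothetical dataset name
    batch_size=32,  # an explicit value bypasses the batch size finder
)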

Fix division by None error in EvaluationCallback.

Runs configured with automatic batch-size selection and an automatic evaluation callback previously passed None to EvaluationCallback as its batch_size, resulting in a division-by-None error.

Filter out queries that do not have any matches in EvaluationCallback.

Previously, when the evaluation data contained queries without any matches, Finetuner was unable to calculate metrics and raised division-by-zero errors. Such queries are now filtered out before evaluation.
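To make the failure mode concrete, here is a hypothetical sketch of how averaging a per-query metric breaks when a query has no matches:

# Hypothetical data: 'q2' has no relevant matches in the index.
matches_per_query = {'q1': 2, 'q2': 0}
hits_per_query = {'q1': 1, 'q2': 0}

for query, num_matches in matches_per_query.items():
    # Raises ZeroDivisionError for 'q2' before this release's filtering.
    recall = hits_per_query[query] / num_matches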

📗 Documentation Improvements

Add a tutorial for data synthesis (#745)

We have provided a tutorial for the new data synthesis module.

🤟 Contributors

We would like to thank all contributors to this release:

  • Wang Bo (@bwanglzu)
  • Louis Milliken (@LMMilliken)
  • Michael Günther (@guenthermi)
  • George Mastrapas (@gmastrapas)
  • Scott Martens (@scott-martens)
