{ "cells": [ { "cell_type": "markdown", "id": "168470ff", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "
\n", "\n", "\n", "A Human-in-the-Loop? workflow for creating HD images from text" ] }, { "cell_type": "markdown", "id": "85339e6c", "metadata": {}, "source": [ "[](https://github.com/jina-ai/dalle-flow) [](https://slack.jina.ai) [](https://colab.research.google.com/github/jina-ai/dalle-flow/blob/main/client.ipynb)" ] }, { "cell_type": "markdown", "id": "3d76e649", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "\n", "\n", "🚧 If you find your request fails, it is possible the server is occasionally down for maintaince. Please give it a try in 5 minutes.\n", "\n", "\n", "- 🌟 2022/5/7 DALL·E Flow just got updated!\n", " - New DALL·E Mega checkpoint\n", " - Improved GLID3 memory-efficiency and parameters\n", "- 🌟 2022/5/6 DALL·E Flow just got updated!\n", " - The first step will generate 16 candidates: **8 from DALL·E Mega, 8 from GLID3-XL**; ranked by CLIP-as-service.\n", " - Optimized the flow efficiency, diffusion and upscaling is much faster now!\n", "- ~~⚠️ 2022/5/3 **The number of images is restrict to 9 for DALL·E Mega, and 16 for GLID3-XL**~~\n", "- ⚠️ 2022/5/2 **Due to the massive requests now, the server is super busy.** You can deploy your own server by following [the instruction here](https://github.com/jina-ai/dalle-flow#server).\n" ] }, { "cell_type": "markdown", "id": "d1304c98", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "Using client is super easy. The following steps are best run in Jupyter notebook or [Google Colab](https://colab.research.google.com/github/jina-ai/dalle-flow/blob/main/client.ipynb). \n", "\n", "The only dependency you will need are [DocArray](https://github.com/jina-ai/docarray) and [Jina](https://github.com/jina-ai/jina):" ] }, { "cell_type": "code", "execution_count": null, "id": "7cb1cd20", "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "!pip install \"docarray[common]>=0.13.10\" jina" ] }, { "cell_type": "markdown", "id": "66a750e1", "metadata": { "pycharm": { "name": "#%%\n" } }, "source": [ "We have provided a demo server for you to play:" ] }, { "cell_type": "code", "execution_count": null, "id": "ee768ad6", "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "server_url = 'grpc://dalle-flow.jina.ai:51005'" ] }, { "cell_type": "markdown", "id": "eaaac9dc", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### Step 1: Generate via DALL·E Mega\n", "\n", "Now let's define the prompt:" ] }, { "cell_type": "code", "execution_count": null, "id": "4f60975f", "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "prompt = 'an oil painting of a humanoid robot playing chess in the style of Matisse'" ] }, { "cell_type": "markdown", "id": "8168d6c6", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "Let's submit it to the server and visualize the results:" ] }, { "cell_type": "code", "execution_count": null, "id": "b280af05", "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "%%time\n", "\n", "from docarray import Document\n", "\n", "da = Document(text=prompt).post(server_url, parameters={'num_images': 8}).matches\n", "\n", "da.plot_image_sprites(fig_size=(10,10), show_index=True)" ] }, { "cell_type": "markdown", "id": "a86369e0", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "Here we generate 16 candidates, 8 from DALLE-mega and 8 from GLID3 XL, this is as defined in `num_images`, which takes about ~2 minutes. You can use a smaller value if it is too long for you. The results are sorted by [CLIP-as-service](https://github.com/jina-ai/clip-as-service), with index-`0` as the best candidate judged by CLIP. " ] }, { "cell_type": "markdown", "id": "8fd9bc33", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### Step 2: Select and refinement via GLID3 XL\n", "\n", "Of course, you may think differently. Notice the number in the top-left corner? Select the one you like the most and get a better view:" ] }, { "cell_type": "code", "execution_count": null, "id": "4b57eef8", "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "fav_id = 9\n", "\n", "fav = da[fav_id]\n", "\n", "fav.display()" ] }, { "cell_type": "markdown", "id": "385c28b2", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "Now let's submit the selected candidates to the server for diffusion." ] }, { "cell_type": "code", "execution_count": null, "id": "4629f9c6", "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "%%time\n", "\n", "diffused = fav.post(f'{server_url}', parameters={'skip_rate': 0.5, 'num_images': 9}, target_executor='diffusion').matches\n", "\n", "diffused.plot_image_sprites(fig_size=(10,10), show_index=True)" ] }, { "cell_type": "markdown", "id": "5c7ccbb7", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "This will give 9 images based on the given image. You may allow the model to improvise more by giving `skip_rate` a near-zero value, or a near-one value to force its closeness to the given image. The whole procedure takes about ~1 minutes.\n" ] }, { "cell_type": "markdown", "id": "e421f371", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### Step 3: Select and upscale via SwinIR\n", "\n", "Select the image you like the most, and give it a closer look:\n" ] }, { "cell_type": "code", "execution_count": null, "id": "f83734fa", "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "dfav_id = 2\n", "\n", "fav = diffused[dfav_id]\n", "\n", "fav.display()" ] }, { "cell_type": "markdown", "id": "9c807dcb", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "Finally, submit to the server for the last step: upscaling to 1024 x 1024px." ] }, { "cell_type": "code", "execution_count": null, "id": "102fbcda", "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "%%time\n", "\n", "fav = fav.post(f'{server_url}/upscale', target_executor='upscaler')\n", "fav.display()" ] }, { "cell_type": "markdown", "id": "ed93a102", "metadata": { "pycharm": { "name": "#%%\n" } }, "source": [ "> 💁♂️ On Google colab this image may render exactly the same size as before. But it is in 1024x1024 already. Right click on the image and copy/save it. You will see.\n", "\n", "That's it! It is _the one_. If not satisfied, please repeat the procedure. Btw, DocArray is a powerful and easy-to-use data structure for unstructured data. It is super productive for data scientists who work in cross-/multi-modal domain. To learn more about DocArray, [please check out the docs](https://docarray.jina.ai)." ] }, { "cell_type": "code", "execution_count": null, "id": "dcd6d387", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.9" } }, "nbformat": 4, "nbformat_minor": 5 }