We're starting to run low on phrases like "we're happy to announce x", or "announcing x" every time we push out a new feature in SceneXplain. So let's just jump to the meat: SceneXplain has a new JSON Schema Store, where you can discover and share reusable JSON schemas. This builds on our recent "extract JSON from image" feature, which lets you specify a JSON Schema when uploading an image, and get information back in JSON that adheres to that schema.
What is a JSON Schema?
According to the official website, JSON Schema is a vocabulary that you can use to annotate and validate JSON documents. With SceneXplain, you can upload a schema like:
{
"type": "object",
"properties": {
"alt_tag": {
"type": "string",
"description": "the most concise description possible of the image's content and function. Do not describe elements that are purely decorative (e.g. part of the website's design, not content). Do not include text like 'this image contains' or 'image depicts'"
}
}
}
Along with an image...
And get the following output:
{
"alt_tag": "Close-up view of a tall tree covered in moss with a clear blue sky in the background"
}
The JSON schema above is pretty straightforward, with only one field. But crafting more complicated JSON schemas can take a lot of effort, with no guarantee that they'll work well. And sharing them can get messy, with code blocks shuffled around Slack or other messaging clients (believe me, I've been there!).
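To make "adheres to that schema" a little more concrete, here is a minimal sketch that compares an output like the one above against the schema's declared properties. The helper `undeclared_or_mistyped` is hypothetical and deliberately tiny: it is not a full JSON Schema validator, and it is stricter than JSON Schema itself, which permits undeclared properties by default.

```python
import json

# Schema from the example above (description shortened for brevity).
schema = {
    "type": "object",
    "properties": {
        "alt_tag": {
            "type": "string",
            "description": "the most concise description possible of the image",
        },
    },
}


def undeclared_or_mistyped(output: dict, schema: dict) -> list[str]:
    """Return keys that the schema does not declare, or that it declares
    as strings but that hold a non-string value.

    Hypothetical helper for illustration only, not part of SceneXplain.
    """
    props = schema.get("properties", {})
    problems = []
    for key, value in output.items():
        declared = props.get(key)
        if declared is None:
            problems.append(key)  # key is not listed under "properties"
        elif declared.get("type") == "string" and not isinstance(value, str):
            problems.append(key)  # declared as a string, but isn't one
    return problems


output = json.loads('{"alt_tag": "Close-up view of a tall tree covered in moss"}')
problems = undeclared_or_mistyped(output, schema)
print(problems)
```

An empty list means every returned key is declared in the schema with a matching type. A misspelled key would show up in the list.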
How does the Schema Store work?
The Schema Store solves these headaches. Now you can copy and use someone else's schema without having to go through the bother of crafting your own. Or you can easily share your schemas with colleagues for more efficient image processing.
Use community schemas
To get started, create an account on SceneXplain and follow these steps:
Create your own schema
Examples
Here are a few more examples of what you can do with the schemas from the Store:
Bulk-generating alt-text via SceneXplain's API
Now let's get our hands dirty by calling SceneXplain's API to bulk process images using a schema from the Store. In our use case, we'll perform the simple task of generating alt-text.
Alt text is used in HTML to describe the appearance and function of an image on a webpage. It is crucial for accessibility, since it provides a text alternative for screen readers used by visually impaired users. It also aids search engine optimization (SEO) by helping search engines better understand the content of the images.
We won't go through each individual step in depth, since the notebook handles that. We'll just give an overview of each step. Please refer to the notebook for the real code.
Choose the schema
We'll use the alt-tagger schema that I created earlier. Be sure to note its ID! In our case that's qTcJ1uVh5d7y3HLDCn0Q.
Test the API
We can quickly test the API with a code snippet by clicking the API tab:
You can access the API in Python, cURL, or JavaScript. Right now we'll use cURL since it's nice and short. We'll send the URL of the following image, along with our SceneXplain key:
curl "https://api.scenex.jina.ai/v1/describe" \
-H "x-api-key: token $YOUR_GENERATED_SECRET" \
-H "content-type: application/json" \
--data '{"data":[
{"image": "https://images.pexels.com/photos/18822188/pexels-photo-18822188/free-photo-of-heron-by-the-sea.jpeg",
"features": ["json"],
"json_schema_id": "qTcJ1uVh5d7y3HLDCn0Q"}
]}'
After a few seconds (and some prettification via jq), we get the following JSON, which includes our alt-tag:
{
"code": 200,
"status": 20000,
"result": [
{
"id": "BiU7me3ytaKn4KaF0v84",
"image": "https://storage.googleapis.com/causal-diffusion.appspot.com/imagePrompts%2Fe4710bdc-73de-469c-9202-c2c0fe1073af%2Foriginal.png",
"features": [
"json"
],
"json_schema_id": "qTcJ1uVh5d7y3HLDCn0Q",
"algorithm": "jelly",
"uid": "NIDud1AA3NMTBFYZ4MEpNZy5om62",
"optOut": false,
"fullyOptOut": false,
"__developer_options": null,
"text": "{\"alt_tag\":\"White-faced heron standing in shallow shoreline water\"}",
"i18n": {
"en": "{\"alt_tag\":\"White-faced heron standing in shallow shoreline water\"}"
},
"userId": "foo",
"createdAt": 1700737281840,
"languages": []
}
]
}
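Note that the `text` field in this response is itself a JSON-encoded string, so it needs a second round of decoding before the alt-tag can be read. A minimal sketch in Python, using a trimmed copy of the response above (the helper name `extract_alt_tag` is our own, not part of the API):

```python
import json

# Trimmed copy of the response above, keeping only the fields we need.
response = {
    "code": 200,
    "result": [
        {
            "id": "BiU7me3ytaKn4KaF0v84",
            "text": "{\"alt_tag\":\"White-faced heron standing in shallow shoreline water\"}",
        }
    ],
}


def extract_alt_tag(result: dict) -> str:
    """Decode the JSON string stored in `text` and return its `alt_tag` value."""
    return json.loads(result["text"])["alt_tag"]


alt_tags = [extract_alt_tag(result) for result in response["result"]]
print(alt_tags)
```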
Collect your data
Moving forwards, we'll be using Python for our code. Assuming we have a folder of images, for each image we'll need to send:
- The image file converted to a base64-encoded data URI
- The SceneXplain features we want to use. In our case that's just ['json'].
- The ID of the JSON Schema: qTcJ1uVh5d7y3HLDCn0Q.
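The list above can be sketched roughly as follows. This is not the notebook's code: the function names `to_data_uri` and `build_payload` are our own, the set of accepted file extensions is an assumption, and the outer `{"data": [...]}` wrapper simply mirrors the cURL request shown earlier.

```python
import base64
import mimetypes
from pathlib import Path

SCHEMA_ID = "qTcJ1uVh5d7y3HLDCn0Q"  # the alt-tagger schema ID from the Store
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumption: the formats in our folder


def to_data_uri(path: Path) -> str:
    """Read an image file and return it as a base64-encoded data URI."""
    mime, _ = mimetypes.guess_type(path.name)
    mime = mime or "application/octet-stream"  # fallback for unknown types
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_payload(folder: Path, schema_id: str = SCHEMA_ID) -> dict:
    """Build one dict per image and collect them in a list under "data"."""
    items = []
    for path in sorted(folder.iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip anything that isn't an image
        items.append(
            {
                "image": to_data_uri(path),
                "features": ["json"],
                "json_schema_id": schema_id,
            }
        )
    return {"data": items}
```

Calling `build_payload(Path("images"))` on a folder of images (the folder name here is hypothetical) returns a dict ready to be serialized as the request body.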
We throw all of these into a dict, then throw each image's dict into a list.
Send the data to SceneXplain
That's really just a case of making an HTTP request and sending over our data. We've wrapped it into a function in the notebook.
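As a rough idea of what such a function could look like, here is a sketch using only Python's standard library. It reuses the endpoint and headers from the cURL example. The function names and the timeout value are our own assumptions, and the notebook's version may differ.

```python
import json
import urllib.request

API_URL = "https://api.scenex.jina.ai/v1/describe"


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "x-api-key": f"token {api_key}",
            "content-type": "application/json",
        },
        method="POST",
    )


def describe(payload: dict, api_key: str, timeout: float = 120.0) -> dict:
    """Send the payload to SceneXplain and return the decoded JSON response.

    The timeout is an assumed value; bulk requests may need longer.
    """
    request = build_request(payload, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Splitting the request construction from the network call makes it possible to inspect the headers and body before anything is sent.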
Process the output data
This is just a case of extracting the alt-tag from the output JSON and, in our case, writing it to a CSV file, then zipping that file up along with all the images. Our alt-text.csv looks like:
filename,alt-tag
/tmp/tmpexs68in3/free-photo-of-leaves-on-the-branch.jpeg,"Close-up of a branch with mix of green and yellow leaves, portraying the onset of autumn, set against the blurred background of a serene forest."
/tmp/tmpexs68in3/free-photo-of-holida-christmas-party-drinks-ornaments.jpeg,"Vividly colored table setting with red tablecloth, two empty crystal glasses, Christmas decorations including candy canes, gold ornaments, a small Christmas tree, on a backdrop of a green curtain and pink walls."
/tmp/tmpexs68in3/free-photo-of-red.jpeg,Close-up of a red classic Malibu car's rear end.
/tmp/tmpexs68in3/free-photo-of-pose-woman-dress-in-the-desert-gold-light-curly-hair.jpeg,"Curly-haired woman in brown coat standing on beach, with the sun beaming light onto her."
/tmp/tmpexs68in3/free-photo-of-a-bowl-of-granola-with-fruit-and-nuts-on-a-wooden-cutting-board.jpeg,Bowl of granola with strawberries and pomegranate seeds on a wooden board on a dark brown table
/tmp/tmpexs68in3/free-photo-of-alexandrine-parakeet-in-side-view.png,"Close-up of a green parrot with a red beak and yellow eyes on a branch, looking to the right; with a blurry green background"
/tmp/tmpexs68in3/pexels-photo-12015253.png,"Central, old gas pump with a red and white color scheme labeled 'Benzin' with an attached black hose against a street scene backdrop"
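A minimal sketch of this step might look like the following. It assumes that results come back in the same order as the images were sent, which is worth verifying against the real response. The function names are ours, not the notebook's.

```python
import csv
import io
import json
import zipfile
from pathlib import Path


def alt_text_csv(filenames: list[str], results: list[dict]) -> str:
    """Return CSV text with one row per image: filename, alt-tag.

    Assumption: `results` is in the same order as `filenames`.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["filename", "alt-tag"])
    for filename, result in zip(filenames, results):
        # `text` holds a JSON-encoded string, so decode it first.
        alt_tag = json.loads(result["text"])["alt_tag"]
        writer.writerow([filename, alt_tag])
    return buffer.getvalue()


def write_zip(zip_path: str, image_paths: list[str], csv_text: str) -> None:
    """Bundle alt-text.csv and the images into a single zip archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("alt-text.csv", csv_text)
        for path in image_paths:
            # Store each image under its bare file name.
            archive.write(path, arcname=Path(path).name)
```

The `csv` module quotes only the fields that need it, such as alt-tags containing commas, which matches the mix of quoted and unquoted rows above.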
You can view the process in full in the notebook.
Get started with SceneXplain and the Schema Store
Like what you see? Go to https://scenex.jina.ai to sign up and get started, and head on over to our Discord to join the conversation!