We've been putting a lot of work into SceneXplain recently, especially its visual question answering feature. If you're not familiar with it, SceneXplain uses AI to generate textual descriptions from images. It has a user-friendly interface and an API, so it's easy to integrate into your own applications. Instead of simply providing captions, it aims to produce more descriptive narratives about image content.
In this post, we'll explore how we can use the feature to perform image sentiment analysis for brand social listening.
Due to pesky copyrights and trademarks, we'll avoid any real brands, and use a fictional brand, Duff Beer (from The Simpsons), as our first example.
What is sentiment analysis?
I'm more of a geek than a marketer, so I'll let Wikipedia do the talking:
Sentiment analysis (also known as opinion mining or emotion AI) is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice of the customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine. With the rise of deep language models, such as RoBERTa, also more difficult data domains can be analyzed, e.g., news texts where authors typically express their opinion/sentiment less explicitly.
One of the key challenges in image sentiment analysis is filtering out noise from the image we're looking at. If you're looking at a picture of happy people drinking beer at an outdoor festival, you generally don't care about the shape of the clouds or whether the bartender is wearing a T-shirt or a polo shirt. You care about the people (are they happy? Are they miserable?) and the brand of the beer (is it your brand or a competitor's?).
Visual question answering for understanding images
Let's start by asking a simple question about an image: Is there beer in this picture?
And the output from SceneXplain is:
Yes, there are bottles and cans of Duff beer visible in the image.
Homer Simpson is holding a bottle of Duff beer, and a can of
Duff beer is resting on the grass in front of an open field.
That's nice but not super useful if we want to do batch analysis on images. In a case like that, we'd want a standardized output so we can perform deeper analysis later on. We'd want something like True or False. So let's change the prompt to: Return "True" if there is beer in the picture. Otherwise, return "False". Now SceneXplain simply returns True for the image above:
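For batch analysis, those "True"/"False" strings still need to become real booleans. Here is a minimal sketch of how that might look; parse_bool_answer is a hypothetical helper name, not part of SceneXplain's API, and it assumes the answer arrives as a plain string:

```python
def parse_bool_answer(answer: str) -> bool:
    """Convert a "True"/"False" style answer string into a Python bool.

    Raises ValueError for anything else, so unexpected answers are not
    silently counted as False.
    """
    # Tolerate surrounding whitespace, quotes, and a trailing period.
    cleaned = answer.strip().strip("\"'.").strip().lower()
    if cleaned == "true":
        return True
    if cleaned == "false":
        return False
    raise ValueError(f"Unexpected answer: {answer!r}")


print(parse_bool_answer("True"))
print(parse_bool_answer(' "False". '))
```

Raising an error on anything unexpected (rather than defaulting to False) makes it easier to spot images where the model ignored the requested format.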
We can also do things like getting a detailed description of the image:
Please identify and describe the beer brand, location and atmosphere
from the photo. Your answer should provide enough detail and
description to allow us to understand the scenario accurately.
You can provide any relevant information such as backstory,
scene analysis, participant interactions, etc.
That gives us:
The beer brand in the image is the iconic Duff beer, which is a
fictional beer brand featured in the animated television series,
The Simpsons. The location is an outdoor setting with a whimsical
blue sky, fluffy clouds, and the Simpsons family gathered in a
grassy field within the woods. The atmosphere is lively and
entertaining, with Homer Simpson enthusiastically holding a bottle
of Duff beer in front of an entertained crowd. The lush greenery
of the scene contrasts with the eccentrically colorful animation
scheme, capturing the spirit of The Simpsons. In the background,
the iconic Simpsons logo assumes the word "Duff," emphasizing
the refreshing energy of the moment. The family, including
Marge, Bart, Lisa, and Maggie, stands together, collectively
enjoying the outdoors.
As you can see, there's a little bit of hallucination going on (as is normal with these kinds of AI models). I can't see any members of the Simpson family other than Homer, and the logo isn't "assuming" anything. But the general gist is there, with the details that are important for sentiment analysis.
But enough with the cartoons - let's test this on some real photos in a real-world use case, using the API and working with JSON.
Analyzing brand sentiment via API
Alas, we do not live in Springfield like the Simpsons. No dodgy nuclear reactors or Duff Beer for us. But we can make do with some real-world brands to quench our thirst.
We'll use SceneXplain's API to easily analyze some photos in bulk. In our case we'll use Python, though JavaScript and cURL access are also available. Here are the images we'll analyze:
To follow along with this code, you'll need a secret key, which you can generate from SceneXplain's API page:
Now for some code. First, let's set our question:
question = """
Identify the brands, scene attributes and sentiment categories (positive, negative, neutral),
and provide a response in the following JSON format:
---format begin---
{"brands": [brand], "scene": scene, "sentiment": sentiment}
--format end---
"""
Next up, we pass along our image URLs, setting the question_answer flag and question:
data = {
    "data": [
        {"image": "https://images.pexels.com/photos/1267305/pexels-photo-1267305.jpeg", "features": ["question_answer"], "question": question},
        {"image": "https://images.pexels.com/photos/2076755/pexels-photo-2076755.jpeg", "features": ["question_answer"], "question": question},
        {"image": "https://images.pexels.com/photos/3922817/pexels-photo-3922817.jpeg", "features": ["question_answer"], "question": question},
        {"image": "https://images.pexels.com/photos/3922818/pexels-photo-3922818.jpeg", "features": ["question_answer"], "question": question},
        {"image": "https://images.pexels.com/photos/5935229/pexels-photo-5935229.jpeg", "features": ["question_answer"], "question": question}
    ]
}
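Since every entry repeats the same feature list and question, a longer list of images may be easier to handle by building the request body from a list of URLs. A minimal sketch, where build_payload is a hypothetical helper name and the example question is shortened for illustration:

```python
def build_payload(image_urls, question):
    """Build a request body with one question_answer entry per image URL."""
    return {
        "data": [
            {"image": url, "features": ["question_answer"], "question": question}
            for url in image_urls
        ]
    }


# Two of the image URLs from this post, with a shortened stand-in question.
example_urls = [
    "https://images.pexels.com/photos/1267305/pexels-photo-1267305.jpeg",
    "https://images.pexels.com/photos/2076755/pexels-photo-2076755.jpeg",
]
example_question = "Identify the brands, scene attributes and sentiment categories."
example_payload = build_payload(example_urls, example_question)
print(len(example_payload["data"]))
```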
And now we pass all of that to SceneXplain:
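import http.client
import json

# Placeholder: replace with the secret key generated on SceneXplain's API page.
YOUR_GENERATED_SECRET = "<your secret key>"

headers = {
    "x-api-key": f"token {YOUR_GENERATED_SECRET}",
    "content-type": "application/json",
}

# Send all images and the question in a single POST request.
connection = http.client.HTTPSConnection('us-central1-causal-diffusion.cloudfunctions.net')
connection.request("POST", "/describe", json.dumps(data), headers)
response = connection.getresponse()
response_data = response.read().decode("utf-8")
# The 'result' field holds the list of scenes we loop over below.
scenes = json.loads(response_data)['result']
connection.close()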
While our response object is JSON, within that response is SceneXplain's answer to our question. This is always in a plain string, so let's cast that to JSON to make it easier to work with:
for scene in scenes:
    scene['answer'] = json.loads(scene['answer'])
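Because the answer is generated text, it is not guaranteed to be valid JSON every time. Here is a minimal sketch of a more defensive parse; parse_answer is a hypothetical helper, and the sample strings are made up for illustration rather than real SceneXplain output:

```python
import json


def parse_answer(raw_answer):
    """Parse an answer string as JSON, returning None if it is not valid JSON.

    Falls back to the text between the first "{" and the last "}" in case
    the model wraps the JSON object in extra prose.
    """
    try:
        return json.loads(raw_answer)
    except json.JSONDecodeError:
        pass
    start, end = raw_answer.find("{"), raw_answer.rfind("}")
    if start != -1 and end > start:
        try:
            return json.loads(raw_answer[start:end + 1])
        except json.JSONDecodeError:
            pass
    return None


# Hypothetical answer strings, for illustration only.
print(parse_answer('{"brands": [], "scene": "rooftop party", "sentiment": "positive"}'))
print(parse_answer('Here you go: {"brands": ["Heineken"], "scene": "street", "sentiment": "neutral"}'))
print(parse_answer("Sorry, I cannot tell."))
```

Returning None instead of raising lets a batch run continue, with the failed images collected for a retry or a manual look.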
Last but not least, let's see what we've got:
from pprint import pprint
pprint([scene['answer'] for scene in scenes])
Here's the output:
[{'brands': ['Hot Lizard IPA'],
  'scene': 'upscale pub with patrons enjoying drinks and ambiance, featuring '
           'neon beer signs and a bartender pouring beer into a glass with '
           'precision',
  'sentiment': 'positive'},
 {'brands': ['Heineken'],
  'scene': 'urban street scene with a young woman holding a beer bottle and a '
           'skateboarder moving behind her, drinking Heineken beer',
  'sentiment': 'neutral'},
 {'brands': ['Heineken'],
  'scene': 'friends celebrating with Heineken beer and wine glasses, creating '
           'a warm and joyful atmosphere',
  'sentiment': 'positive'},
 {'brands': ['Budweiser'],
  'scene': 'social gathering in a cozy, dimly lit lounge with friends raising '
           'glasses of frothy golden beer in joyful celebration',
  'sentiment': 'positive'},
 {'brands': [],
  'scene': "rooftop party with friends enjoying each other's company and city "
           'view',
  'sentiment': 'positive'}]
As you can see, it's pretty good:
- It usually picks up the brand and sentiment, though there is the occasional hallucination when it comes to the scene descriptions.
- In cases where there is no visible beer brand (like in the crowd image of people raising their bottles), it returns an empty list for brands.
- It did screw up the 'Hopadillo' brand, consistently calling it 'Hot Lizard' during my testing. This may be down to it being a niche brand (with limited training data), with the image model making the best of a bad job by guessing "Hot Lizard", words it has likely seen in its training data.
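If a misreading like this turns out to be consistent, one possible workaround is to map it back to the real brand name before doing any analysis. A minimal sketch; the BRAND_ALIASES table and normalize_brands helper are hypothetical and would need to be maintained by hand for your own brands:

```python
# Hypothetical alias table: map labels the model is known to produce
# onto the brand names you actually want to track.
BRAND_ALIASES = {
    "hot lizard": "Hopadillo",
    "hot lizard ipa": "Hopadillo",
}


def normalize_brands(brands):
    """Replace known misreadings with canonical brand names, dropping duplicates."""
    normalized = []
    for brand in brands:
        canonical = BRAND_ALIASES.get(brand.strip().lower(), brand.strip())
        if canonical not in normalized:
            normalized.append(canonical)
    return normalized


print(normalize_brands(["Hot Lizard IPA", "Heineken"]))
```

This only helps with misreadings you have already spotted, so it is worth reviewing a sample of results by eye first.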
Detecting brand variants and localization
Out of interest, I looked into how well this works with branding edge cases - for example a Chinese Coca-Cola can, or a customized Coca-Cola can:
For those, we got the results:
[{'brands': ['Coca-Cola'],
  'scene': 'a display of Coca-Cola cans adorned with Chinese writing featuring '
           'the iconic Coca-Cola logo on a sleek black surface',
  'sentiment': 'positive'},
 {'brands': ['Coca-Cola'],
  'scene': 'A coffee cup and a can of Coca-Cola on a blue surface with a red '
           "background and 'My Hero' sign in the back.",
  'sentiment': 'neutral'}]
As we can see, brand detection is on point. We also got good results when testing images with negative sentiment.
With code like the above and SceneXplain's API, you can easily do bulk analysis of images to get deeper insights into how consumers see your brand. This allows you to identify prevalent patterns, trends, or themes that may be influencing consumer perception and engagement. By understanding this, you can tailor your marketing strategy to enhance positive aspects and address any identified issues, ultimately optimizing your brand's reach and impact.
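As a starting point for that kind of bulk analysis, here is a minimal sketch that counts sentiment labels per brand. It reuses the brands and sentiments from the beer results earlier in this post (scene descriptions omitted); sentiment_by_brand is a hypothetical helper, and grouping brandless images under None is just one possible choice:

```python
from collections import Counter, defaultdict

# Brands and sentiments from the results earlier in this post.
answers = [
    {"brands": ["Hot Lizard IPA"], "sentiment": "positive"},
    {"brands": ["Heineken"], "sentiment": "neutral"},
    {"brands": ["Heineken"], "sentiment": "positive"},
    {"brands": ["Budweiser"], "sentiment": "positive"},
    {"brands": [], "sentiment": "positive"},
]


def sentiment_by_brand(answers):
    """Count sentiment labels per brand; images with no brand go under None."""
    counts = defaultdict(Counter)
    for answer in answers:
        # An empty brand list is grouped under the key None.
        for brand in answer["brands"] or [None]:
            counts[brand][answer["sentiment"]] += 1
    return counts


for brand, sentiments in sentiment_by_brand(answers).items():
    print(brand, dict(sentiments))
```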
What else can SceneXplain do?
Dive into some of our prior blog posts to find out!
Get started with SceneXplain and sentiment analysis
To analyze those sentiments and optimize your brand strategy, sign up on SceneXplain now, and join the discussion on our Discord server.