When AI meets fashion: Azure Video Indexer’s new model for detecting Featured Clothing in videos

This post has been republished via RSS; it originally appeared at: Microsoft Tech Community - Latest Blogs.

What happens when a data scientist combines complex neural networks with classic ML (machine learning) models? Spoiler: the results are super interesting and fashionable! But let's not jump ahead. First, let's talk about clothes in videos.

Videos are complex creatures. They include thousands of images (frames), audio with voice and special events such as clapping and laughter, and text (the transcript). Extracting key insights from videos is a super complex task that requires smart AI (artificial intelligence) models.

Clothing plays a key role in many kinds of videos, in terms of fashion, content, advertisement and more. To derive the clothing featured in a video, we first need to detect the people in the video, which is a complex task in AI. However, not all clothing in a video is equally influential. Since many people can be observed in a video, how can we decide which clothing is the most influential? The main characters’ clothing will probably be in the focus of the camera and be more relevant to the video’s content than the clothing of background characters. However, secondary characters can also have major influence if they take part in a key moment or are a celebrity guest.

This leads to the key idea of ranking clothing in a video by the importance of the characters wearing it. How can we create an algorithm that decides who the main characters are, in terms of clothing?

When watching a movie or a TV series, we can understand from the content of the video who the main characters are. For example, they can appear for a long time in the video, be in the focus of the camera, wear branded clothing or even be celebrities. These are only a few examples of features that define the main characters. Our new model relies on these features and more to identify the main characters in the video and rank their clothing by the importance of the characters.

How does it work? Classic machine learning on top of advanced AI models

Featured Clothing is based on advanced AI models developed in Video Indexer over the past five years. The foundation of this new algorithm is the model that detects and tracks people in videos. One might assume that time of appearance is a key feature for finding the main characters, but it’s not enough. Being a celebrity guest, or appearing in a key moment in the video, can also affect the importance of a person and their clothing. Therefore, we leverage the insights created by other AI models in Video Indexer. For example, celebrity recognition is a key feature based on a complex AI pipeline. To know that a person is a celebrity, we use the face pipeline in Video Indexer, which detects faces, groups them by person, and applies a celebrity recognition model. We use a recently developed algorithm to match each face group with the observed person, and from that deduce whether the observed person is indeed a celebrity.
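The post does not describe how the face group is matched with the observed person, so the following is only an illustrative sketch of one simple way such a match could work: assign a face to the person whose bounding box contains most of the face box. The Box layout (x, y, width, height), the function names and the 0.8 threshold are all assumptions made for this example, not Video Indexer's actual algorithm.

```python
from typing import Dict, NamedTuple, Optional


class Box(NamedTuple):
    """Axis-aligned box; the (x, y, width, height) layout is an assumption."""
    x: float
    y: float
    width: float
    height: float


def overlap_fraction(face: Box, person: Box) -> float:
    """Return the fraction of the face box area that lies inside the person box."""
    left = max(face.x, person.x)
    top = max(face.y, person.y)
    right = min(face.x + face.width, person.x + person.width)
    bottom = min(face.y + face.height, person.y + person.height)
    intersection = max(0.0, right - left) * max(0.0, bottom - top)
    face_area = face.width * face.height
    return intersection / face_area if face_area > 0 else 0.0


def match_face_to_person(
    face: Box, people: Dict[str, Box], min_fraction: float = 0.8
) -> Optional[str]:
    """Pick the person whose box covers the largest share of the face box.

    Returns None when no person covers at least min_fraction of the face
    (the threshold is a hypothetical value chosen for illustration).
    """
    if not people:
        return None
    best_id = max(people, key=lambda pid: overlap_fraction(face, people[pid]))
    if overlap_fraction(face, people[best_id]) >= min_fraction:
        return best_id
    return None


# Hypothetical detections in a single frame.
people_in_frame = {
    "person_1": Box(100, 50, 200, 400),
    "person_2": Box(400, 60, 180, 380),
}
celebrity_face = Box(170, 60, 60, 60)
matched_person = match_face_to_person(celebrity_face, people_in_frame)
```

In this toy frame the face box lies entirely inside the box of person_1, so that person would inherit the celebrity label of the face group.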

How can we detect key moments? For that we leverage two more AI models developed by our team. We use Audio Events Detection, a complex neural network that detects special events such as laughter and crowd reaction. We combine these results with the output of another neural network that detects the emotions expressed in the video.

So now that we have a bucket full of amazing features, all based on clever AI models, how can we combine them to smartly detect the main characters and their clothing? This is where classic ML comes into play. We use a classic regression model to combine all the features into a single score! The results are amazing, showing the main characters and their best frames, and enabling a cascade of applications from video summarization to tailored commercials.
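To make the idea of combining features into one score concrete, here is a minimal sketch that uses a logistic-regression-style scoring function. The post does not say which regression model is used or which features and weights it has, so the feature names, weights and bias below are hypothetical values picked by hand; a real model would learn them from labeled videos.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical weights; in practice these would be learned from training data.
WEIGHTS = {
    "screen_time_ratio": 2.0,   # share of the video in which the person appears
    "is_celebrity": 1.5,        # 1.0 if celebrity recognition matched, else 0.0
    "key_moment_overlap": 1.0,  # share of appearances that fall in key moments
}
BIAS = -2.0


def importance_score(features: Dict[str, float]) -> float:
    """Combine the features linearly and squash the result into (0, 1)."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_people(people: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Return (person_id, score) pairs sorted from most to least important."""
    scored = [(pid, importance_score(feats)) for pid, feats in people.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy input: a host with long screen time versus a celebrity guest.
candidates = {
    "host": {"screen_time_ratio": 0.6, "is_celebrity": 0.0, "key_moment_overlap": 0.1},
    "guest": {"screen_time_ratio": 0.2, "is_celebrity": 1.0, "key_moment_overlap": 0.5},
}
ranking = rank_people(candidates)
```

With these made-up weights the celebrity guest outranks the host despite having less screen time, which illustrates why time of appearance alone is not enough.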

scheme 2nd option.png

Usage

We would like to detect the featured clothing in a video and rank it by importance [1]. First, upload your video to Video Indexer and index it according to the guidelines. Use the option for advanced video and audio indexing.

Next, download the artifacts of the video using the download buttons on Video Indexer’s portal, as marked in the following screenshot:

VI screenshot with download.png

The artifacts hold the results of Featured Clothing in the zip named featuredclothing.zip. The results contain two objects:

featuredclothing.map.json. This file contains the instances of each featured clothing, with the following fields:
- id – ranking index (id=1 is the most important clothing)
- confidence – the score of the featured clothing
- frameIndex – the best frame of the clothing
- timestamp – corresponding to the frameIndex
- opBoundingBox – bounding box of the person
- faceBoundingBox – bounding box of the person’s face, if detected
- fileName – where the best frame of the clothing is saved
featuredclothing.frames.map. This folder contains images of the best frames that the featured clothing appeared in, corresponding to the field fileName in each instance in featuredclothing.map.json.
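As a rough illustration of how the downloaded artifact could be consumed, the sketch below opens featuredclothing.zip, reads featuredclothing.map.json and lists the top-ranked clothing. It assumes the JSON file holds a plain list of instance objects with the fields described above; the exact layout inside the archive is not documented here, and the function names are hypothetical, so adjust the code if the file wraps the list in another object.

```python
import json
import zipfile
from typing import Any, Dict, List, Tuple


def load_instances(zip_path: str) -> List[Dict[str, Any]]:
    """Read the featured clothing instances from the zip, sorted by rank (id).

    Assumes featuredclothing.map.json contains a JSON list of instances.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # Locate the map file wherever it sits inside the archive.
        member = next(
            name for name in archive.namelist()
            if name.endswith("featuredclothing.map.json")
        )
        with archive.open(member) as handle:
            instances = json.load(handle)
    return sorted(instances, key=lambda inst: inst["id"])


def top_clothing(
    instances: List[Dict[str, Any]], n: int = 3
) -> List[Tuple[int, float, str]]:
    """Return (id, confidence, fileName) for the n highest-ranked instances."""
    return [
        (inst["id"], inst["confidence"], inst["fileName"])
        for inst in instances[:n]
    ]


# Example usage (the path is a placeholder for your downloaded artifact):
# for rank, confidence, file_name in top_clothing(load_instances("featuredclothing.zip")):
#     print(rank, confidence, file_name)
```

Each returned fileName can then be looked up in the featuredclothing.frames.map folder to show the best frame next to its rank and confidence.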

zip content.png

In this example, the algorithm ranks the clothing in the video by the importance of the characters, as expected. For example, celebrities like Kendrick Lamar and Rihanna are given high scores, but the host of the show is given a lower score even though she appears for a long time in the video, since her appearance is in the interest of the video and a part of the main content.

Kendrick Lamar. Rank 1, confidence 0.93

Rihanna. Rank 2, confidence 0.9

The host. Rank 18, confidence 0.44

[1] 10 BEST Dressed Celebs At 2018 Grammys, Clevver News, YouTube.
