Here, Ted Mann, CEO of Slyce, shares some retail advantages of visual search. Slyce specializes in visual product search. Avid readers will remember Ted’s feature as part of our ‘Conversations’ series last year. Previously, he founded the SnipSnap coupon app and the InJersey hyperlocal network, and was a writer and editor at Gannett.
The idiom goes: “A picture is worth a thousand words.” In the world of retail merchandising, buyers are now taking so many mobile photographs to aid in their decision-making process that they are essentially creating entire libraries. When combined with the right technologies, these images hold immense potential to provide valuable intelligence.
The technology of visual search (identifying products through pictures and photos) has seen rapid adoption in consumer apps like Pinterest and Houzz, as well as at retailers like Home Depot and Macy’s. But it’s no longer just consumers who are embracing this tech. Merchandising teams at retailers such as Nordstrom and Neiman Marcus are now using visual product lookup to make smarter purchasing and merchandising decisions.
Here’s how it works: by simply snapping a photo of a dress or shirt, a retailer can instantly call up historical sales data and trends on how similar-looking products have sold in the past. So, if a buyer is at a vendor or trade show, they can open their internal app and snap a picture instead of having to trawl through an armful of binders and spreadsheets from past seasons.
Training the Visual Search Model
Modern approaches to visual search typically use deep learning to measure image similarity, and then map the best matches back to a retailer’s product catalog, along with the associated product descriptions and metadata. While there are several pre-trained models that can be used for basic image recognition exercises (identifying a cat or dog, say), reaching the level of specificity and accuracy needed for retail, especially fashion, requires custom training data.
What sort of training data? Images of the retail products, tagged and annotated with the relevant descriptors. Lots of images. This data is used to teach the model how to recognize and look up products with the specific features that would be required to find that item in the retailer’s catalog. Take, for example, fashion: you need thousands of examples of the different shirt collars or jeans cuts to properly customize the model.
Another key factor in the training of the model is what type of images are used for training. That is, are they production images created in a studio with blank backgrounds? Or are they real-world user-generated images, the kind of messy photos you and I take on our phones every day, complete with cluttered backgrounds and partial obstructions? Both types of images are extremely useful for training a visual-search engine, but the former will help the system accurately identify catalog-type images, while the latter is likely to perform well in a camera-search environment.
Merchandising teams at retailers may find a use for models trained on both catalog imagery and user-generated content (UGC). A catalog-trained model can be used to deliver a find-similar search on known e-commerce images, allowing employees or customers to search and refine. A model trained on UGC will enable people to take photographs, or wild images, in the physical world (say on a fashion runway, at a trade show, or on the street) and match the items back to the retailer’s catalog of products.
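To make the matching step concrete, here is a minimal sketch of a find-similar lookup. It assumes a trained model has already turned each image into a numeric feature vector (an embedding); the four-number vectors, the SKU names, and the function names are all made up for illustration, and cosine similarity is just one common way to compare such vectors, not necessarily what any particular vendor uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


def find_similar(query_embedding, catalog, top_k=3):
    """Return the top_k (sku, score) pairs, most similar first."""
    scored = [
        (sku, cosine_similarity(query_embedding, embedding))
        for sku, embedding in catalog.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical catalog: SKU -> embedding produced by the (assumed) trained model.
CATALOG = {
    "DRESS-001": [0.9, 0.1, 0.0, 0.2],
    "DRESS-002": [0.8, 0.2, 0.1, 0.3],
    "SHIRT-101": [0.1, 0.9, 0.3, 0.0],
    "JEANS-201": [0.0, 0.2, 0.9, 0.4],
}

# Hypothetical embedding of a photo a buyer just took of a dress.
query = [0.85, 0.15, 0.05, 0.25]
matches = find_similar(query, CATALOG, top_k=2)
for sku, score in matches:
    print(sku, round(score, 3))
```

In a real system the catalog would hold many thousands of vectors and the search would typically run against an approximate nearest-neighbor index rather than a full scan, but the idea is the same: the photo and the catalog images live in one feature space, and the closest vectors are the best matches.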
Mobile Form Factor
The rise of visual search in the merchandising world was born out of the need to quickly find and identify historical products a retailer has sold. With more than 50 major retailers live with mobile camera search in their apps (Amazon, Macy’s, ASOS, and JCPenney, to name a few), the technology was already seeing massive customer adoption.
At many retailers, store associates and employees became the biggest power-users of visual search. At The Home Depot, more than 70% of all visual searches typically come from a store associate, using camera-search to help customers find and locate hardware in the aisle. It wasn’t long before the buyers at department stores followed suit.
The same visual search models being used to power the camera feature in a consumer app can be used for an internal product lookup. The key difference for internal apps is typically where that search takes you. In the case of Neiman Marcus, a visual search query in their internal merchandising app takes the user to a view where they can explore historical sales data and trends about a product: for example, how many units were sold in the past two months, and what was the average margin? This data, especially trends and predictive analytics, is being used by the merchandising team to make smarter decisions about what to buy, and how much inventory to float.
This type of internal mobile application may soon replace the massive binders and spreadsheets that buyers have depended on for decades. Think of it as merchandising AI and visual search in your pocket.
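As a rough illustration of that internal lookup, the sketch below takes a SKU returned by a visual match and summarizes its recent sales. The record layout, field names, and figures are invented for the example, and "average margin" is computed here as a revenue-weighted gross margin, which is only one of several reasonable definitions.

```python
from datetime import date, timedelta

# Hypothetical sales history; a real app would query the retailer's own systems.
SALES = [
    {"sku": "DRESS-001", "sold_on": date(2017, 11, 1), "units": 25, "price": 120.0, "cost": 60.0},
    {"sku": "DRESS-001", "sold_on": date(2018, 3, 10), "units": 40, "price": 120.0, "cost": 60.0},
    {"sku": "DRESS-001", "sold_on": date(2018, 4, 2), "units": 10, "price": 100.0, "cost": 60.0},
    {"sku": "SHIRT-101", "sold_on": date(2018, 4, 5), "units": 30, "price": 50.0, "cost": 20.0},
]


def sales_summary(sku, sales, as_of, window_days=60):
    """Summarize units sold and gross margin for one SKU over a trailing window."""
    start = as_of - timedelta(days=window_days)
    rows = [r for r in sales if r["sku"] == sku and start <= r["sold_on"] <= as_of]
    units = sum(r["units"] for r in rows)
    revenue = sum(r["units"] * r["price"] for r in rows)
    cost = sum(r["units"] * r["cost"] for r in rows)
    # Margin is undefined when nothing was sold in the window.
    margin = (revenue - cost) / revenue if revenue else None
    return {"sku": sku, "units_sold": units, "avg_margin": margin}


# Suppose the visual search matched the buyer's photo to DRESS-001.
summary = sales_summary("DRESS-001", SALES, as_of=date(2018, 4, 15))
print(summary)
```

The visual match only supplies the SKU; everything after that is ordinary reporting, which is why the same search model can sit behind both a consumer app and a merchandising tool.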
Metadata Enrichment and AI Exhaust
Once a visual search model is created, it can yield byproducts that can be used to solve other problems for retailers. One such problem is incomplete product metadata. Visual search can aid in translating an image into a series of keywords that can be used to essentially plug the holes. This process of filling out the metadata can even be done in combination with the buyer taking that very first photograph of an item.
Imagine a retailer chooses to pull the trigger on an order. That initial photograph they took can instantly spawn a new product listing. Through metadata enrichment, the photograph can even help populate Product Information Management data and kick-start the listing.
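One way to picture this hole-plugging is the sketch below. It assumes the visual model returns a predicted value and a confidence score for each attribute; the field names, predictions, and the 0.8 threshold are hypothetical. The rule shown is deliberately conservative: only empty fields are filled, and only when the prediction is confident.

```python
def enrich_metadata(product, predicted_tags, min_confidence=0.8):
    """Fill missing fields from confident predictions; never overwrite existing values.

    predicted_tags maps a field name to a (value, confidence) pair.
    Returns the enriched copy and the list of fields that were filled.
    """
    enriched = dict(product)  # leave the caller's record untouched
    filled = []
    for field, (value, confidence) in predicted_tags.items():
        is_missing = enriched.get(field) in (None, "")
        if is_missing and confidence >= min_confidence:
            enriched[field] = value
            filled.append(field)
    return enriched, filled


# Hypothetical listing with gaps in its metadata.
product = {"sku": "SHIRT-101", "category": "shirt", "color": None, "collar": "", "pattern": None}

# Hypothetical output of a visual model for the buyer's photograph.
predicted = {
    "category": ("blouse", 0.95),   # ignored: category is already set
    "color": ("navy", 0.93),
    "collar": ("spread", 0.88),
    "pattern": ("striped", 0.55),   # ignored: below the confidence threshold
}

enriched, filled = enrich_metadata(product, predicted)
print(enriched)
print("filled:", filled)
```

Low-confidence fields are left blank here so that a person can review them rather than having a guess written into the product record.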
But Does It Work?
Is image recognition for fashion and apparel all that accurate? Using the latest deep-learning techniques, companies like Amazon and Slyce have managed to produce results that are 95% accurate at recognizing not just a product’s category but its entire taxonomic structure.
This level of precision is a key reason why customer adoption of visual search technology has seen massive growth in recent years. Slyce has seen usage across its retailers growing by 20% month over month. Pinterest reported in February 2018 that they are seeing 600 million visual searches every month. With the latest Samsung and LG smartphones integrating product recognition into their operating-system camera, it’s clear that visual search, like voice search before it, is here to stay.
Perhaps most exciting for retailers and the merchandising teams coming to depend on visual search to help do their jobs, the accuracy of the technology is reaching new heights. The latest techniques around ordering and ranking image deep features have made it possible to filter and sort product sets not only by their text attributes, but also by their visually similar features … those visual cues that even a thousand words, or the best trained retail buyer, aren’t always able to describe.