Metadata, Data and Big Data; how XMP can bring it all together


In his first guest blog for 2017, Dan Hudson – President of E-Spec and long-time contributor to WhichPLM – breaks down the various types of data in our industry. For those looking to wade through this territory, this is a great guide.

It wasn’t long ago that a discussion of ‘metadata’ would require an introduction and definition of the term. Since the NSA has made us all familiar with metadata, the introduction is no longer necessary, but the definition is still required. The distinction between the data about phone calls and the actual contents of the call captures the common definition: metadata is data about data. In this case, attributes of the phone call might include the duration, the phone numbers involved, the locations, etc., but not the actual call.

PLM Data

Data in your PLM (Product Lifecycle Management) system could also be considered metadata; it is data about a product while the product may still be abstract data. Your system also contains additional data about the first set of data. If the color of your product is metadata then the CMYK representation is data about that data. So this definition leads to almost all data being considered metadata.

It seems that in most discussions regarding metadata, the definition is stricter; not all data is considered metadata. In many cases what is referred to as metadata is really embedded metadata. The most familiar embedded metadata is the data that applications embed in a file when it is created or saved. Besides the obvious file name, path and date/time stamp, most applications also embed the user’s name and the version of the application being used. Cameras (including smartphones) embed metadata in their files: camera model, focal length, exposure, orientation, resolution, flash, etc. Smartphones and cameras with GPS capability now also embed the location where the picture was taken. Embedded metadata is unique in that it follows the image/file/asset through the workflow. If the image is copied, so is the embedded metadata. If the image is shared via applications similar to Dropbox, the embedded metadata is also shared. As the image travels through the workflow steps, additional metadata can be embedded as the data is defined.

Standardized Embedded Data

Several industries have created standard schemas for embedded metadata; photography, printing/publishing, medical and other industries created metadata standards in proprietary formats early on. Adobe released its XMP (Extensible Metadata Platform) in 2001, and by 2005 these proprietary schemas had begun embracing XMP by creating XMP versions of their particular standards. The use of XMP by cameras and smartphones has made it the de facto standard for embedded metadata. The extensibility of XMP makes it a powerful tool for maintaining various types of data about the file. The syntax of XMP is based on XML, making it familiar to developers and furthering its broad acceptance as a standard.
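Because XMP is serialized as XML (specifically RDF/XML), ordinary XML tooling can read it. The following is a minimal sketch in Python, assuming the XMP packet has already been extracted from the file as a string; the sample packet and the helper names `read_keywords` and `read_creator_tool` are illustrative, not part of any Adobe tool.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Namespace URIs commonly seen in XMP packets (RDF, Dublin Core, XMP basic).
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
}

# Illustrative packet; a real one is usually wrapped in <?xpacket ...?> markers.
SAMPLE_PACKET = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:xmp="http://ns.adobe.com/xap/1.0/">
      <xmp:CreatorTool>Example Editor 1.0</xmp:CreatorTool>
      <dc:subject>
        <rdf:Bag>
          <rdf:li>dog</rdf:li>
          <rdf:li>beach</rdf:li>
        </rdf:Bag>
      </dc:subject>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def read_keywords(packet: str) -> List[str]:
    """Return the keywords stored in dc:subject (an unordered rdf:Bag)."""
    root = ET.fromstring(packet)
    items = root.findall(".//dc:subject/rdf:Bag/rdf:li", NS)
    return [item.text for item in items if item.text]


def read_creator_tool(packet: str) -> Optional[str]:
    """Return the name of the application that wrote the file, if present."""
    root = ET.fromstring(packet)
    node = root.find(".//xmp:CreatorTool", NS)
    return node.text if node is not None else None


if __name__ == "__main__":
    print(read_keywords(SAMPLE_PACKET))
    print(read_creator_tool(SAMPLE_PACKET))
```

Extracting the packet from a JPG or PDF in the first place is a separate step that this sketch leaves out.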

Social Media Data

Social media has popularized “tagging”; tagging is applying metadata of sorts – this metadata is typically stored in the application or website with the image rather than embedded in the image. Tags are similar to keywords (which can be embedded or not); they can be used for searching and filtering of images. In both cases – tagging or keywords – the information is usually “visible” as the data can be identified by looking at the image.

Organization Structural Data

Metadata that cannot be determined by examining the image or contents of the file must be collected from the author or person processing the file. When this metadata relates to a business process or organization, it is sometimes referred to as structural metadata. Company, division, brand, channel, customer, season, product type and pricing are all examples of structural metadata. This data indicates to whom and where the file belongs as it progresses through the product’s workflow or lifecycle. It is the data that ties business processes and systems together.

In order for “structural metadata” to be useful it must be consistent. When collected properly this metadata can be used as an index to a database or business system to link the file to a particular record or product. This means some structural metadata must be “required” at a certain point in the workflow. Its value must match the dataset used by the database or business system. Abbreviations, spelling and sometimes capitalization must be consistent among the databases and systems if the structural metadata is to be useful in integration between systems and organizations. Providing tools to help users enter structural metadata in a consistent manner becomes crucial.
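One way to picture such a tool is a small validation step that maps whatever the user typed onto the exact spelling the business system expects. This is only a sketch: the field names, the allowed values and the `normalize` helper are hypothetical and stand in for whatever dataset your own systems use.

```python
from typing import Dict, List, Tuple

# Hypothetical controlled vocabularies; in practice these would come from
# the database or business system the metadata must match.
ALLOWED: Dict[str, set] = {
    "brand": {"Acme Outdoor", "Acme Kids"},
    "season": {"SS17", "FW17"},
}

# Fields that must be present at this (hypothetical) workflow step.
REQUIRED = ("brand", "season")


def _key(value: str) -> str:
    """Collapse whitespace and ignore case so near-matches compare equal."""
    return " ".join(value.split()).casefold()


def normalize(record: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Return (cleaned values in canonical spelling, list of problems)."""
    cleaned: Dict[str, str] = {}
    errors: List[str] = []
    for field in REQUIRED:
        raw = record.get(field, "")
        if not raw.strip():
            errors.append(f"{field}: required value is missing")
            continue
        lookup = {_key(v): v for v in ALLOWED[field]}
        canonical = lookup.get(_key(raw))
        if canonical is None:
            errors.append(f"{field}: {raw!r} is not an allowed value")
        else:
            cleaned[field] = canonical
    return cleaned, errors


if __name__ == "__main__":
    print(normalize({"brand": " acme  outdoor ", "season": "ss17"}))
    print(normalize({"brand": "Acme", "season": ""}))
```

The design choice here is to repair harmless differences (case, stray spaces) automatically and reject everything else, so that only values the target system recognizes ever reach it.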

Contextual Data

If you want to search for contextual information – maybe looking for an image of a dog on a beach – then tags or keywords are appropriate. Even then the tags and keywords must be managed. If an image is tagged with “canine”, will a search for “dog” retrieve the image? In this case the user can determine if the search is working by browsing the previews in the search result. If you are building a website that needs to retrieve the approved image to display for a particular product, the structural metadata needs to find one and only one image. The approval status of the image is structural metadata. Product ID or SKU may be part of the structural metadata used to retrieve the image as well.
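The difference between the two kinds of lookup can be sketched in a few lines of Python. The `Asset` record, the synonym table and both functions are assumptions made for illustration: the keyword search tolerates synonyms and may return many results, while the structural lookup insists on exactly one approved image per SKU.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical synonym table mapping alternative tags to one managed keyword.
SYNONYMS = {"canine": "dog", "puppy": "dog", "seaside": "beach"}


@dataclass
class Asset:
    filename: str
    sku: str
    approved: bool
    keywords: Set[str] = field(default_factory=set)


def canonical(term: str) -> str:
    """Map a tag to its managed keyword, ignoring case and outer spaces."""
    cleaned = term.strip().lower()
    return SYNONYMS.get(cleaned, cleaned)


def search_by_keywords(assets: List[Asset], *terms: str) -> List[Asset]:
    """Contextual search: return every asset carrying all requested keywords."""
    wanted = {canonical(t) for t in terms}
    return [a for a in assets if wanted <= {canonical(k) for k in a.keywords}]


def approved_image_for(assets: List[Asset], sku: str) -> Asset:
    """Structural lookup: exactly one approved image must exist for the SKU."""
    matches = [a for a in assets if a.sku == sku and a.approved]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one approved image for {sku}, found {len(matches)}"
        )
    return matches[0]


if __name__ == "__main__":
    library = [
        Asset("img_001.jpg", "SKU-100", True, {"canine", "beach"}),
        Asset("img_002.jpg", "SKU-100", False, {"dog", "park"}),
    ]
    print([a.filename for a in search_by_keywords(library, "dog", "beach")])
    print(approved_image_for(library, "SKU-100").filename)
```

Raising an error when zero or several approved images exist is deliberate: a website should not guess which image to show.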

Big Data Analysis

We now have concepts of “big data” and “analytics”. If we think about all the pieces of data being collected by websites and business applications, they seem to represent the other side of metadata. The number of “hits” or point of sale data relates to products in a different manner than metadata. The analysis of all of this data requires ways to filter and sort the large data set. Many of these filter values will be metadata or even structural metadata. Having a well-defined taxonomy and nomenclature allows analytics to be implemented in a straightforward manner. When defining your taxonomy, consideration of how you want to report on your data is critical. Consistent values for report data need to be captured in your metadata.
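To see why consistent values matter for reporting, consider a small sketch that totals point-of-sale units by a metadata field. The data, the field name `season` and the `units_by` helper are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def units_by(
    field: str,
    sales: Iterable[Tuple[str, int]],
    product_metadata: Dict[str, Dict[str, str]],
) -> Dict[str, int]:
    """Total units sold, grouped by one metadata field of each product."""
    totals: Dict[str, int] = defaultdict(int)
    for sku, units in sales:
        value = product_metadata.get(sku, {}).get(field, "(unknown)")
        totals[value] += units
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical metadata; note the inconsistent spelling for product B2.
    metadata = {
        "A1": {"season": "SS17"},
        "B2": {"season": "ss17"},
        "C3": {"season": "SS17"},
    }
    sales = [("A1", 10), ("B2", 5), ("C3", 7), ("A1", 3)]
    print(units_by("season", sales, metadata))
```

With the inconsistent value, the report splits one season into two rows ("SS17" and "ss17"), which is exactly the kind of distortion a managed taxonomy is meant to prevent.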

Why Adobe XMP metadata?

Adobe’s XMP metadata standard (with which I have no affiliation, before you ask) is ideally suited to define this structural metadata. Your organization can define its own namespace, identifying the metadata as belonging to your organization. Fields are defined within the namespace with an XMP “tag”. The field type, its display label and the allowed values are also defined. This creates a nomenclature and taxonomy for your organization. You can create an enterprise taxonomy but then break down the metadata into subsets, collecting and displaying only values that the current user knows for a particular step in the workflow. By limiting the amount of metadata that any one user has to enter, as well as limiting the number of keystrokes to collect the data, user adoption of XMP is enhanced. By using validation lists and default values, the user can complete the metadata entry with just a few keystrokes.
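The idea of a namespace with typed, labelled fields, split into per-step subsets, can be modelled roughly as follows. Everything named here is hypothetical: the namespace URI, the `acme` prefix, the field list and the workflow steps are placeholders, and this is a plain Python model of the concept rather than the format of any Adobe configuration file.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical organization namespace and prefix.
NAMESPACE_URI = "http://ns.example.com/plm/1.0/"
PREFIX = "acme"


@dataclass(frozen=True)
class FieldDef:
    tag: str                       # XMP tag within the namespace
    label: str                     # display label shown to the user
    kind: str = "Text"             # field type
    allowed: Tuple[str, ...] = ()  # validation list (empty = free text)
    default: Optional[str] = None  # value used when the user enters nothing


# Enterprise taxonomy: every field the organization defines.
SCHEMA: Dict[str, FieldDef] = {
    "brand": FieldDef(f"{PREFIX}:Brand", "Brand",
                      allowed=("Acme Outdoor", "Acme Kids"),
                      default="Acme Outdoor"),
    "season": FieldDef(f"{PREFIX}:Season", "Season",
                       allowed=("SS17", "FW17")),
    "status": FieldDef(f"{PREFIX}:ApprovalStatus", "Approval status",
                       allowed=("Draft", "Approved"), default="Draft"),
}

# Subsets: each workflow step only asks for what its user actually knows.
STEP_FIELDS: Dict[str, Tuple[str, ...]] = {
    "design": ("brand", "season"),
    "review": ("status",),
}


def collect(step: str, entered: Dict[str, str]) -> Dict[str, str]:
    """Apply defaults and validation lists for one step; return tag -> value."""
    result: Dict[str, str] = {}
    for name in STEP_FIELDS[step]:
        definition = SCHEMA[name]
        value = entered.get(name, definition.default)
        if value is None:
            raise ValueError(f"{definition.label} is required at step {step!r}")
        if definition.allowed and value not in definition.allowed:
            raise ValueError(f"{definition.label}: {value!r} is not an allowed value")
        result[definition.tag] = value
    return result


if __name__ == "__main__":
    print(collect("design", {"season": "SS17"}))
    print(collect("review", {}))
```

In this sketch a designer who accepts the default brand only has to pick a season, which is the "few keystrokes" effect described above.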

Adobe XMP is currently supported by many file types: besides all native Adobe file formats, these include JPG, PNG, PDF, TIFF, GIF, MP3 and MP4. Many business systems support importing XMP from files being added to the system; this feature has become a de facto standard among Digital Asset Management (DAM) systems as well as many graphics-oriented product systems. Adobe provides a software development kit (SDK) allowing developers to easily add support for reading and writing XMP data in their applications. A common method for using these capabilities is sending XMP data as XML to RESTful APIs tied to the target business systems.
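As a rough illustration of that last pattern, the sketch below builds a small XMP document and wraps it in an HTTP POST request using only the Python standard library. The endpoint URL, the `acme` namespace and the `application/xml` content type are assumptions for the example; a real business system will define its own URL, authentication and expected payload. The request is constructed but not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Dict

X_NS = "adobe:ns:meta/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ACME_NS = "http://ns.example.com/plm/1.0/"  # hypothetical organization namespace


def build_xmp(fields: Dict[str, str]) -> bytes:
    """Serialize simple text fields as an XMP (RDF/XML) document."""
    # Register prefixes so the output uses x:, rdf: and acme: names.
    ET.register_namespace("x", X_NS)
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("acme", ACME_NS)

    root = ET.Element(f"{{{X_NS}}}xmpmeta")
    rdf = ET.SubElement(root, f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": ""}
    )
    for name, value in fields.items():
        ET.SubElement(description, f"{{{ACME_NS}}}{name}").text = value
    return ET.tostring(root, encoding="utf-8")


def build_request(url: str, fields: Dict[str, str]) -> urllib.request.Request:
    """Prepare (but do not send) a POST carrying the XMP as the request body."""
    return urllib.request.Request(
        url,
        data=build_xmp(fields),
        method="POST",
        headers={"Content-Type": "application/xml"},  # assumed content type
    )


if __name__ == "__main__":
    # Hypothetical endpoint; call urllib.request.urlopen(request) to send it.
    request = build_request(
        "https://plm.example.com/api/assets/123/metadata",
        {"Season": "SS17", "ApprovalStatus": "Approved"},
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```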

What’s coming in metadata?

The future of XMP may be influenced by the new partnership between Adobe and Microsoft. The new relationship will provide integration between the companies’ cloud environments; one piece of the partnership is a common data model, Experience Data Model (XDM). While the exact role of XMP within XDM is still uncertain, XMP will provide the metadata integration between Adobe’s Creative Cloud and its marketing cloud, Adobe Experience Manager (AEM), making it a critical component of getting metadata from the creative users to the rest of the enterprise.

XMP Data to the rescue

If your organization is drowning in graphics, images, artwork, other assets and Big Data; if your systems have trouble translating data or integrating between departments and systems; or if you track status and workflows with spreadsheets – XMP metadata can help organize your data and facilitate communication. Then, when you analyze your Big Data, the data will have been right from the very beginning, and you will be able to get a true picture from your analytics in the end.
