
Multi-label triplet embeddings for image annotation from user-generated tags

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

This work studies the representational embedding of images and their corresponding annotations - in the form of tag metadata - such that, given raw data in one modality, the corresponding semantic description can be retrieved in terms of the raw data in the other. While convolutional neural networks (CNNs) have been widely and successfully applied in this domain with regard to detecting semantically simple scenes or categories (even though many such objects may be simultaneously present in an image), this work approaches the task of image annotation in the context of noisy, user-generated, and semantically complex multi-labels, widely available from social media sites. In this setting, the labels for an image are diverse, noisy, and often not specifically related to an object, but rather descriptive or user-specific. Furthermore, the existing deep image annotation literature using this type of data typically relies on the so-called CNN-RNN framework, combining convolutional and recurrent neural networks. We offer a discussion of why RNNs may not be the best choice in this case, even though they have been shown to perform well on similar captioning tasks. Our model exploits the latent image-text space through a triplet loss framework to learn a joint embedding space for images and their tags, in the presence of multiple, potentially positive exemplar classes. We present state-of-the-art results on the representational properties of these embeddings on several image annotation datasets, demonstrating the promise of this approach.
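The triplet-loss formulation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the margin value, and the strategy of averaging over all (positive, negative) pairs are illustrative assumptions. The key idea is that, with noisy multi-label tags, any of an image's tags may serve as a positive exemplar for the anchor embedding:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss on L2 distances:
    max(0, d(a, p) - d(a, n) + margin)."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def multilabel_triplet_loss(anchor, positives, negatives, margin=0.2):
    """Hypothetical multi-label extension: treat every tag embedding of
    the image as a potential positive and average the triplet loss over
    all (positive, negative) combinations."""
    losses = [triplet_loss(anchor, p, n, margin)
              for p in positives for n in negatives]
    return float(np.mean(losses))
```

In practice such a loss would be applied to CNN image embeddings and learned tag embeddings in the shared space; here plain NumPy vectors stand in for both.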

Original language: English
Title of host publication: ICMR 2018 - Proceedings of the 2018 ACM International Conference on Multimedia Retrieval
Publisher: Association for Computing Machinery, Inc
Pages: 249-256
Number of pages: 8
ISBN (Print): 9781450350464
DOIs
State: Published - Jun 5 2018
Event: 8th ACM International Conference on Multimedia Retrieval, ICMR 2018 - Yokohama, Japan
Duration: Jun 11 2018 - Jun 14 2018

Publication series

Name: ICMR 2018 - Proceedings of the 2018 ACM International Conference on Multimedia Retrieval

Conference

Conference: 8th ACM International Conference on Multimedia Retrieval, ICMR 2018
Country/Territory: Japan
City: Yokohama
Period: 06/11/18 - 06/14/18

Keywords

  • Convolutional neural networks
  • Image annotation
  • Triplet embeddings

