Boosting cross-media retrieval via visual-auditory feature analysis and relevance feedback

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

21 Scopus citations

Abstract

Different types of multimedia data express high-level semantics from different aspects. How to learn comprehensive high-level semantics from different types of data and enable efficient cross-media retrieval has become a pressing research issue. Although heterogeneous low-level media content exhibits abundant correlations, this heterogeneity makes it challenging to query cross-media data effectively. In this paper, we propose a new cross-media retrieval method based on short-term and long-term relevance feedback. Our method mainly focuses on two typical types of media data, i.e., image and audio. First, we build a multimodal representation via statistical canonical correlation between image and audio feature matrices, and define a cross-media distance metric for similarity measurement; then we propose an optimization strategy based on relevance feedback, which fuses short-term learning results and long-term accumulated knowledge into the objective function. Experiments on an image-audio dataset demonstrate the superiority of our method over several existing algorithms.
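The abstract does not give the paper's exact formulation, but the first step it describes, building a shared representation via statistical canonical correlation between image and audio feature matrices and measuring cross-media distance in that space, can be sketched with plain NumPy. Everything below (feature dimensions, the toy latent-variable data, the Euclidean distance choice) is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

def cca_projections(X, Y, k, reg=1e-6):
    """Fit CCA between paired feature matrices X (n, dx) and Y (n, dy)
    via SVD of the whitened cross-covariance; return Wx, Wy mapping
    each modality into a shared k-dimensional subspace."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])  # regularized covariances
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):
        # symmetric inverse square root via eigendecomposition
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, _, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    return Kx @ U[:, :k], Ky @ Vt.T[:, :k]

# Toy paired data: a shared 3-d latent "semantic" drives both modalities,
# standing in for real image and audio feature vectors.
rng = np.random.default_rng(0)
Z = rng.standard_normal((200, 3))
X = Z @ rng.standard_normal((3, 10)) + 0.1 * rng.standard_normal((200, 10))
Y = Z @ rng.standard_normal((3, 8)) + 0.1 * rng.standard_normal((200, 8))

Wx, Wy = cca_projections(X, Y, k=3)

# Cross-media distance: Euclidean distance between the two modalities'
# projections (of centered features) in the shared subspace.
Xp = (X - X.mean(axis=0)) @ Wx
Yp = (Y - Y.mean(axis=0)) @ Wy
paired = np.linalg.norm(Xp - Yp, axis=1).mean()           # matched image-audio pairs
mismatched = np.linalg.norm(Xp - Yp[::-1], axis=1).mean() # shuffled pairs
```

Because the two modalities share a latent source, matched pairs end up much closer in the CCA subspace than mismatched ones, which is the property a cross-media retrieval ranking exploits; the paper's relevance-feedback stage would then refine such a ranking, but its objective function is not reproduced here.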

Original language: English
Title of host publication: MM 2014 - Proceedings of the 2014 ACM Conference on Multimedia
Publisher: Association for Computing Machinery
Pages: 953-956
Number of pages: 4
ISBN (Electronic): 9781450330633
State: Published - Nov 3, 2014
Event: 2014 ACM Conference on Multimedia, MM 2014 - Orlando, United States
Duration: Nov 3, 2014 - Nov 7, 2014

Publication series

Name: MM 2014 - Proceedings of the 2014 ACM Conference on Multimedia

Conference

Conference: 2014 ACM Conference on Multimedia, MM 2014
Country/Territory: United States
City: Orlando
Period: 11/3/14 - 11/7/14

Keywords

  • Cross-media retrieval
  • Feature analysis
  • Relevance feedback

