
DISCO: Describing images using scene contexts and objects

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

2 Scopus citations

Abstract

In this paper, we propose a bottom-up approach to generating short descriptive sentences from images, to enhance scene understanding. We demonstrate automatic methods for mapping the visual content in an image to natural spoken or written language. We also introduce a human-in-the-loop evaluation strategy that quantitatively captures the meaningfulness of the generated sentences. We recorded a correctness rate of 60.34% when human users were asked to judge the meaningfulness of the sentences generated from relatively challenging images. Also, our automatic methods compared well with the state-of-the-art techniques for the related computer vision tasks.

Original language: English
Title of host publication: AAAI-11 / IAAI-11 - Proceedings of the 25th AAAI Conference on Artificial Intelligence and the 23rd Innovative Applications of Artificial Intelligence Conference
Pages: 1487-1493
Number of pages: 7
State: Published - 2011
Event: 25th AAAI Conference on Artificial Intelligence and the 23rd Innovative Applications of Artificial Intelligence Conference, AAAI-11 / IAAI-11 - San Francisco, CA, United States
Duration: Aug 7 2011 to Aug 11 2011

Publication series

Name: Proceedings of the National Conference on Artificial Intelligence
Volume: 2

Conference

Conference: 25th AAAI Conference on Artificial Intelligence and the 23rd Innovative Applications of Artificial Intelligence Conference, AAAI-11 / IAAI-11
Country/Territory: United States
City: San Francisco, CA
Period: 08/7/11 to 08/11/11
