Abstract
For the TRECVID 2011 MED task, the GENIE system incorporated two late-fusion approaches in which multiple discriminative base classifiers are built per feature and then combined through discriminative fusion techniques. All of our fusion and base classifiers are formulated as one-vs-all detectors per event class, with thresholds estimated during cross-validation. A total of five types of features were extracted from the data, covering both audio and visual features: HOG3D, Object Bank, Gist, MFCC, and acoustic segment models (ASMs). Features such as HOG3D and MFCC are low-level, while Object Bank and ASMs are more semantic. Event-specific feature adaptations and manual annotations were deliberately avoided in order to establish strong baseline results. Overall, the results were competitive in the MED11 evaluation and show that standard machine learning techniques can yield fairly good results even on a challenging dataset.
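The late-fusion idea in the abstract can be illustrated with a minimal sketch. It is not the GENIE implementation: the abstract does not name the fusion techniques, so logistic regression is assumed here as a stand-in discriminative fuser, and F1 is assumed as the criterion for choosing the detection threshold on held-out data. All function names and all scores are hypothetical toy values; each row stands for one clip and holds one base-classifier score per feature type for a single event class (one-vs-all).

```python
import math

# Feature order used for every score row (labels only; names from the abstract).
FEATURES = ["HOG3D", "ObjectBank", "Gist", "MFCC", "ASM"]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_fusion(score_rows, labels, lr=0.5, epochs=500):
    """Fit a logistic-regression fuser on base-classifier scores.

    Assumed stand-in for the paper's unspecified fusion technique.
    labels: 1 if the clip contains the target event, else 0.
    """
    n = len(score_rows)
    weights = [0.0] * len(score_rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, y in zip(score_rows, labels):
            err = fuse(weights, bias, row) - y
            for j, s in enumerate(row):
                grad_w[j] += err * s
            grad_b += err
        # Batch gradient-descent step on the mean log-loss gradient.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def fuse(weights, bias, row):
    """Combine one clip's per-feature scores into a single detection score."""
    return sigmoid(sum(w * s for w, s in zip(weights, row)) + bias)


def pick_threshold(fused_scores, labels):
    """Return (threshold, F1) for the candidate with the best held-out F1."""
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(fused_scores)):
        tp = sum(1 for s, y in zip(fused_scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(fused_scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(fused_scores, labels) if s < t and y == 1)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy base-classifier scores (hypothetical, not taken from MED11 data).
train_scores = [
    [0.9, 0.8, 0.7, 0.6, 0.8],
    [0.8, 0.7, 0.9, 0.7, 0.6],
    [0.7, 0.9, 0.6, 0.8, 0.7],
    [0.2, 0.3, 0.1, 0.4, 0.2],
    [0.3, 0.1, 0.2, 0.3, 0.4],
    [0.1, 0.2, 0.3, 0.2, 0.3],
]
train_labels = [1, 1, 1, 0, 0, 0]

# Held-out fold used only for threshold estimation.
val_scores = [
    [0.8, 0.9, 0.8, 0.7, 0.7],
    [0.7, 0.6, 0.8, 0.9, 0.8],
    [0.2, 0.2, 0.3, 0.1, 0.3],
    [0.3, 0.4, 0.2, 0.2, 0.1],
]
val_labels = [1, 1, 0, 0]

weights, bias = train_fusion(train_scores, train_labels)
fused_val = [fuse(weights, bias, row) for row in val_scores]
threshold, best_f1 = pick_threshold(fused_val, val_labels)

for name, w in zip(FEATURES, weights):
    print(f"{name}: fusion weight {w:.3f}")
print(f"threshold {threshold:.3f}, held-out F1 {best_f1:.2f}")
```

In a full cross-validation setup, the fuser would be refit and the threshold re-estimated on each fold, with one such detector per event class.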
| Original language | English |
|---|---|
| State | Published - 2011 |
| Event | TREC Video Retrieval Evaluation, TRECVID 2011 - Gaithersburg, MD, United States. Duration: Dec 5 2011 → Dec 7 2011 |
Conference
| Conference | TREC Video Retrieval Evaluation, TRECVID 2011 |
|---|---|
| Country/Territory | United States |
| City | Gaithersburg, MD |
| Period | 12/5/11 → 12/7/11 |

Title: GENIE TRECVID2011 multimedia event detection: Late-fusion approaches to combine multiple audio-visual features