TY - GEN
T1 - Cross-media semantic representation via bi-directional learning to rank
AU - Wu, Fei
AU - Lu, Xinyan
AU - Zhang, Zhongfei
AU - Yan, Shuicheng
AU - Rui, Yong
AU - Zhuang, Yueting
PY - 2013
Y1 - 2013
N2 - In multimedia information retrieval, most classic approaches tend to represent different modalities of media in the same feature space. Existing approaches take either one-to-one paired data or uni-directional ranking examples (i.e., only text-query-image ranking examples or only image-query-text ranking examples) as training examples, and therefore do not make full use of bi-directional ranking examples (i.e., both text-query-image and image-query-text ranking examples used together during training) to achieve better performance. In this paper, we consider learning a cross-media representation model from the perspective of optimizing a listwise ranking problem while taking advantage of bi-directional ranking examples. We propose a general cross-media ranking algorithm that optimizes the bi-directional listwise ranking loss with a latent space embedding, which we call the Bi-directional Cross-Media Semantic Representation Model (Bi-CMSRM). The latent space embedding is discriminatively learned by structural large-margin learning, which directly optimizes a chosen ranking criterion (mean average precision in this paper). We evaluate Bi-CMSRM on the Wikipedia and NUS-WIDE datasets and show that using bi-directional ranking examples achieves much better performance than using only uni-directional ranking examples.
AB - In multimedia information retrieval, most classic approaches tend to represent different modalities of media in the same feature space. Existing approaches take either one-to-one paired data or uni-directional ranking examples (i.e., only text-query-image ranking examples or only image-query-text ranking examples) as training examples, and therefore do not make full use of bi-directional ranking examples (i.e., both text-query-image and image-query-text ranking examples used together during training) to achieve better performance. In this paper, we consider learning a cross-media representation model from the perspective of optimizing a listwise ranking problem while taking advantage of bi-directional ranking examples. We propose a general cross-media ranking algorithm that optimizes the bi-directional listwise ranking loss with a latent space embedding, which we call the Bi-directional Cross-Media Semantic Representation Model (Bi-CMSRM). The latent space embedding is discriminatively learned by structural large-margin learning, which directly optimizes a chosen ranking criterion (mean average precision in this paper). We evaluate Bi-CMSRM on the Wikipedia and NUS-WIDE datasets and show that using bi-directional ranking examples achieves much better performance than using only uni-directional ranking examples.
KW - Bi-directional learning to rank
KW - Cross-media representation
KW - Latent space embedding
UR - https://www.scopus.com/pages/publications/84887500308
U2 - 10.1145/2502081.2502097
DO - 10.1145/2502081.2502097
M3 - Conference contribution
SN - 9781450324045
T3 - MM 2013 - Proceedings of the 2013 ACM Multimedia Conference
SP - 877
EP - 886
BT - MM 2013 - Proceedings of the 2013 ACM Multimedia Conference
T2 - 21st ACM International Conference on Multimedia, MM 2013
Y2 - 21 October 2013 through 25 October 2013
ER -