Abstract
Video summarization is a challenging task, mainly because of the difficulty of learning the complicated semantic structural relations between videos and their summaries. In this paper, we present a novel supervised video summarization scheme based on three-stage deep neural networks. The scheme adopts a divide-and-conquer strategy that decomposes the complicated task of 3D video summarization into a set of simpler, flexible computational subtasks, and then applies 2D CNNs, 1D CNNs, and long short-term memory (LSTM) networks in sequence to address these subtasks hierarchically. This hierarchical modeling of spatio-temporal structure yields both high performance and efficiency. In addition, we propose a simple but effective user-ranking method to cope with the labeling subjectivity of user-created video summaries, which refines label quality for robust supervised learning. Experimental results show that our approach outperforms state-of-the-art video summarization methods on two benchmark datasets.
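The abstract names the three stages only by layer type. The following plain-Python sketch assumes one concrete reading of that hierarchy: a per-frame 2D convolution with ReLU and global average pooling (spatial stage), a kernel-size-3 1D convolution over the sequence of frame features (local temporal stage), and a single-layer LSTM with a sigmoid head that emits one importance score per frame (long-range temporal stage). The sizes, the function names `frame_features`, `temporal_conv`, `lstm_scores` and `summarize`, and the random, untrained weights are all hypothetical. They only illustrate how shapes flow from frames to per-frame scores; this is not the authors' implementation.

```python
import math
import random

random.seed(0)  # hypothetical: fixed seed so the untrained weights are reproducible


def rand_matrix(rows, cols, scale=0.1):
    return [[random.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Stage 1 (spatial): each 3x3 filter + ReLU + global average pooling, applied to
# one grayscale frame (H x W list of lists) independently of the other frames.
def frame_features(frame, filters):
    h, w = len(frame), len(frame[0])
    feats = []
    for k in filters:
        total, count = 0.0, 0
        for i in range(h - 2):
            for j in range(w - 2):
                s = sum(k[a][b] * frame[i + a][j + b] for a in range(3) for b in range(3))
                total += max(0.0, s)  # ReLU
                count += 1
        feats.append(total / count)  # global average pooling -> one value per filter
    return feats


# Stage 2 (local temporal): 1D convolution over time, kernel size 3, zero padding.
# weights[o][tap][d] maps input channel d at offset tap to output channel o.
def temporal_conv(seq, weights):
    dim = len(seq[0])
    zero = [0.0] * dim
    padded = [zero] + seq + [zero]
    out = []
    for t in range(len(seq)):
        window = padded[t:t + 3]  # frames t-1, t, t+1
        out.append([
            max(0.0, sum(w_o[tap][d] * window[tap][d] for tap in range(3) for d in range(dim)))
            for w_o in weights
        ])
    return out


# Stage 3 (long-range temporal): standard LSTM cell, then a sigmoid score per step.
def lstm_scores(seq, hidden, params):
    wx, wh, b, w_out = params
    h, c = [0.0] * hidden, [0.0] * hidden
    scores = []
    for x in seq:
        z = [p + q + r for p, q, r in zip(matvec(wx, x), matvec(wh, h), b)]
        i = [sigmoid(v) for v in z[:hidden]]                   # input gate
        f = [sigmoid(v) for v in z[hidden:2 * hidden]]         # forget gate
        g = [math.tanh(v) for v in z[2 * hidden:3 * hidden]]   # candidate cell
        o = [sigmoid(v) for v in z[3 * hidden:]]               # output gate
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        scores.append(sigmoid(sum(wv * hv for wv, hv in zip(w_out, h))))
    return scores


def summarize(video, d_spatial=4, d_temporal=3, hidden=5):
    filters = [rand_matrix(3, 3) for _ in range(d_spatial)]
    conv_w = [rand_matrix(3, d_spatial) for _ in range(d_temporal)]
    params = (rand_matrix(4 * hidden, d_temporal), rand_matrix(4 * hidden, hidden),
              [0.0] * (4 * hidden), [random.uniform(-0.1, 0.1) for _ in range(hidden)])
    spatial = [frame_features(f, filters) for f in video]  # T x d_spatial
    local = temporal_conv(spatial, conv_w)                  # T x d_temporal
    return lstm_scores(local, hidden, params)               # T scores in (0, 1)


video = [[[random.random() for _ in range(8)] for _ in range(8)] for _ in range(6)]
print([round(s, 3) for s in summarize(video)])
```

A summary would then be assembled from the highest-scoring frames or shots; the abstract does not state how the paper converts scores into a summary, so that step is left out.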
| Original language | English |
|---|---|
| Article number | 8585041 |
| Pages (from-to) | 2654-2664 |
| Number of pages | 11 |
| Journal | IEEE Transactions on Image Processing |
| Volume | 28 |
| Issue number | 6 |
| State | Published - Jun 2019 |
Keywords
- Video summarization
- convolutional neural network
- multi-user inconsistency
- recurrent neural network
- user ranking
Title: User-Ranking Video Summarization with Multi-Stage Spatio-Temporal Representation