TY - GEN
T1 - Error Detection in Egocentric Procedural Task Videos
AU - Lee, Shih-Po
AU - Lu, Zijia
AU - Zhang, Zekun
AU - Hoai, Minh
AU - Elhamifar, Ehsan
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - We present a new egocentric procedural error dataset containing videos with various types of errors as well as normal videos, and propose a new framework for procedural error detection that uses error-free training videos only. Our framework consists of an action segmentation model and a contrastive step prototype learning module to segment actions and learn useful features for error detection. Based on the observation that interactions between hands and objects often inform action and error understanding, we propose to combine holistic frame features with relation features, which we learn by building a graph using active object detection followed by a Graph Convolutional Network. To handle errors unseen during training, we use our contrastive step prototype learning to learn multiple prototypes for each step, capturing variations of error-free step executions. At inference time, we use feature-prototype similarities for error detection. Through experiments on three datasets, we show that our proposed framework outperforms state-of-the-art video anomaly detection methods for error detection and provides smooth action and error predictions. Code and data are available at https://github.com/robert80203/EgoPER_official.
AB - We present a new egocentric procedural error dataset containing videos with various types of errors as well as normal videos, and propose a new framework for procedural error detection that uses error-free training videos only. Our framework consists of an action segmentation model and a contrastive step prototype learning module to segment actions and learn useful features for error detection. Based on the observation that interactions between hands and objects often inform action and error understanding, we propose to combine holistic frame features with relation features, which we learn by building a graph using active object detection followed by a Graph Convolutional Network. To handle errors unseen during training, we use our contrastive step prototype learning to learn multiple prototypes for each step, capturing variations of error-free step executions. At inference time, we use feature-prototype similarities for error detection. Through experiments on three datasets, we show that our proposed framework outperforms state-of-the-art video anomaly detection methods for error detection and provides smooth action and error predictions. Code and data are available at https://github.com/robert80203/EgoPER_official.
KW - action segmentation
KW - egocentric videos
KW - error detection
KW - procedural learning
UR - https://www.scopus.com/pages/publications/85205976764
U2 - 10.1109/CVPR52733.2024.01765
DO - 10.1109/CVPR52733.2024.01765
M3 - Conference contribution
SN - 9798350353006
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 18655
EP - 18666
BT - Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
PB - IEEE Computer Society
T2 - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Y2 - 16 June 2024 through 22 June 2024
ER -