
A Comparative Study of Video-Based Human Representations for American Sign Language Alphabet Generation

  • SUNY Buffalo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

Sign language is a complex visual language, and automatic interpretation of sign language can facilitate communication involving deaf individuals. As an essential component of sign language, fingerspelling connects natural spoken languages to sign language and expands the scale of the sign language vocabulary. In practice, fingerspelled alphabets are challenging to analyze because of their fast signing speed and small motion range. Synthetic data has the potential to further improve fingerspelling analysis at scale. In this paper, we evaluate how different video-based human representations perform in a framework for alphabet generation for American Sign Language (ASL). We tested three mainstream video-based human representations: the two-stream inflated 3D ConvNet, 3D landmarks of body joints, and rotation matrices of body joints. We also evaluated the effect of different skeleton graphs and of the selected body joints. The ASL fingerspelling generation process used a transformer-based Conditional Variational Autoencoder. To train the model, we collected ASL alphabet signing videos from 17 signers with dynamic alphabet signing. The generated alphabets were evaluated with automatic quality metrics such as the Fréchet Inception Distance (FID), and with supervised metrics obtained by recognizing the generated entries using Spatio-Temporal Graph Convolutional Networks. Our experiments show that using the rotation matrices of the upper-body joints and the signing hand gives the best results for ASL alphabet signing generation. Going forward, our goal is to produce articulated fingerspelled words by combining the individual alphabets learned in this work.
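The abstract evaluates generated alphabets with "automatic quality metrics such as FID". As a minimal, hedged sketch (not the authors' implementation), the Fréchet Inception Distance between two sets of feature vectors can be computed as follows; the feature-extraction step and all variable names here are our own illustrative assumptions:

```python
# Minimal FID sketch: compares two Gaussian fits of feature sets.
# FID = ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2 * sqrt(C_r @ C_g))
import numpy as np
from scipy.linalg import sqrtm

def fid(real_feats: np.ndarray, gen_feats: np.ndarray) -> float:
    """real_feats, gen_feats: (n_samples, feat_dim) feature arrays,
    e.g. embeddings of real vs. generated signing sequences."""
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_g = np.cov(gen_feats, rowvar=False)
    covmean = sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):      # discard tiny imaginary parts
        covmean = covmean.real        # from numerical error in sqrtm
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
```

A lower FID indicates that the generated distribution is closer to the real one; identical feature sets score approximately zero.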

Original language: English
Title of host publication: 2024 IEEE 18th International Conference on Automatic Face and Gesture Recognition, FG 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350394948
DOIs
State: Published - 2024
Event: 18th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2024 - Istanbul, Turkey
Duration: May 27, 2024 – May 31, 2024

Publication series

Name: 2024 IEEE 18th International Conference on Automatic Face and Gesture Recognition, FG 2024

Conference

Conference: 18th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2024
Country/Territory: Turkey
City: Istanbul
Period: 05/27/24 – 05/31/24

