Skeletal Graph Self-Attention: Embedding a Skeleton Inductive Bias into Sign Language Production (2112.05277v1)

Published 6 Dec 2021 in cs.CV and cs.CL

Abstract: Recent approaches to Sign Language Production (SLP) have adopted spoken language Neural Machine Translation (NMT) architectures, applied without sign-specific modifications. In addition, these works represent sign language as a sequence of skeleton pose vectors, projected to an abstract representation with no inherent skeletal structure. In this paper, we represent sign language sequences as a skeletal graph structure, with joints as nodes and both spatial and temporal connections as edges. To operate on this graphical structure, we propose Skeletal Graph Self-Attention (SGSA), a novel graphical attention layer that embeds a skeleton inductive bias into the SLP model. Retaining the skeletal feature representation throughout, we directly apply a spatio-temporal adjacency matrix into the self-attention formulation. This provides structure and context to each skeletal joint that is not possible when using a non-graphical abstract representation, enabling fluid and expressive sign language production. We evaluate our Skeletal Graph Self-Attention architecture on the challenging RWTH-PHOENIX-Weather-2014T(PHOENIX14T) dataset, achieving state-of-the-art back translation performance with an 8% and 7% improvement over competing methods for the dev and test sets.

Citations (13)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Skeletal Graph Self-Attention: Embedding a Skeleton Inductive Bias into Sign Language Production (2112.05277v1)

Summary

Related Papers