Speaker-independent Lipreading

This is the code repository for the work published in

Matteo Riva, Michael Wand, and Jürgen Schmidhuber. Motion dynamics improve speaker-independent lipreading. In Proc. ICASSP, 2020, pp. 4407-4411.

Figure: example of content and motion frames

Abstract

We present a novel lipreading system that improves on the task of speaker-independent word recognition by decoupling motion and content dynamics. We achieve this by implementing a deep learning architecture that uses two distinct pipelines to process motion and content and subsequently merges them, yielding an end-to-end trainable system that performs fusion of independently learned representations. We obtain an average relative word accuracy improvement of ≈ 6.8% on unseen speakers and of ≈ 3.3% on known speakers, with respect to a baseline which uses a standard architecture.
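Neither the abstract nor this README spells out how the motion frames shown in the figure above are produced. Purely as an illustrative assumption (the preprocessing used in the paper may well differ), a simple way to split a motion stream from the content stream is temporal frame differencing on the cropped mouth region:

```python
# Illustrative only: one simple way to derive "motion" frames from grayscale
# mouth-region crops is temporal differencing. This is an assumption made for
# the sake of the example, not necessarily the preprocessing used in the paper.
import numpy as np

def split_content_motion(frames: np.ndarray):
    """frames: array of shape (T, H, W), pixel values in [0, 1]."""
    content = frames                        # appearance ("content") stream
    motion = np.zeros_like(frames)
    motion[1:] = frames[1:] - frames[:-1]   # per-pixel change between frames
    return content, motion
```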

Model

We implement two distinct pipelines that follow the same architecture as the baseline, excluding the final LSTM layer: one takes the content frames as input, the other the motion frames. The hidden representations learned by the two pipelines are concatenated into a single joint representation that is used as input to a final LSTM layer, yielding an end-to-end trainable system that implements the fusion of separately processed input sequences. A sketch of this architecture is given after the figures below.

Figures: baseline architecture; dual-pipeline Motion-Content architecture
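The following is a minimal sketch of the dual-pipeline fusion described above, written with tf.keras. The frame shape, layer sizes, sequence length, and vocabulary size are placeholder assumptions rather than the published hyperparameters, and the per-frame encoder is only a stand-in for the baseline's frame-level processing.

```python
# Sketch of the dual-pipeline Motion-Content model: two separate per-frame
# encoders, per-time-step fusion by concatenation, and a single final LSTM.
# All shapes and layer sizes below are assumptions for illustration.
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_FRAMES = 38            # assumed sequence length
FRAME_SHAPE = (40, 60, 1)  # assumed mouth-region crop size
NUM_WORDS = 51             # assumed vocabulary size

def frame_encoder(name):
    # Per-frame encoder applied to every time step; stands in for the
    # baseline's frame-level processing (layers/sizes are assumptions).
    return tf.keras.Sequential([
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dense(256, activation="relu"),
    ], name=name)

content_in = layers.Input((NUM_FRAMES,) + FRAME_SHAPE, name="content_frames")
motion_in = layers.Input((NUM_FRAMES,) + FRAME_SHAPE, name="motion_frames")

# Two distinct pipelines, one per input stream.
content_seq = layers.TimeDistributed(frame_encoder("content_enc"))(content_in)
motion_seq = layers.TimeDistributed(frame_encoder("motion_enc"))(motion_in)

# Fuse the independently learned representations per time step...
joint = layers.Concatenate(axis=-1)([content_seq, motion_seq])

# ...and model the joint sequence with the single final LSTM layer.
features = layers.LSTM(512)(joint)
word_probs = layers.Dense(NUM_WORDS, activation="softmax")(features)

model = Model([content_in, motion_in], word_probs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Because both encoders feed a single loss through the shared LSTM, the whole system is trained end-to-end while the two input streams are still processed independently up to the fusion point.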

Results

Figure: baseline results
