Time-Frequency-Based Attention Cache Memory Model for Online Speech Separation

Guo Chen*, Kai Li*, Runxuan Yang, Xiaolin Hu
Tsinghua University
*These authors contributed equally to the article.


Abstract

Existing causal speech separation models show a significant performance gap relative to non-causal models because retaining historical information is difficult. To address this issue, we introduce a causal speech separation method called the Time-Frequency Attention Cache Memory model (TFACM). It models the spatio-temporal relationships between the time and frequency dimensions using an attention mechanism. Specifically, we use an LSTM layer to capture the relative spatial positions in the frequency dimension, while causal modeling is performed in the time dimension using both local and global representations, with a cache memory (CM) module introduced to store historical information. Additionally, we introduce a causal attention refinement (CAR) module to refine the representation in the time dimension, thereby achieving finer-grained feature representations. Experimental results on public datasets demonstrate that TFACM significantly outperforms existing methods in speech separation performance, showcasing its robustness in complex environments.
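To make the idea of storing historical information for causal attention more concrete, the following is a minimal sketch of a generic streaming attention step with a bounded cache. It is not the TFACM CM or CAR module: the class name `CachedCausalAttention`, the `cache_size` parameter, and the use of raw frames as queries, keys, and values (no learned projections) are all assumptions made for illustration.

```python
import math
from collections import deque


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class CachedCausalAttention:
    """Hypothetical streaming attention over the current frame and cached past frames."""

    def __init__(self, cache_size):
        # A deque with maxlen drops the oldest frame once the cache is full.
        self.cache = deque(maxlen=cache_size)

    def step(self, frame):
        # The context holds only cached past frames plus the current frame,
        # so no future frame is ever visible (the causal constraint).
        context = list(self.cache) + [frame]
        scale = math.sqrt(len(frame))
        scores = [
            sum(q * k for q, k in zip(frame, key)) / scale for key in context
        ]
        weights = softmax(scores)
        output = [
            sum(w * value[d] for w, value in zip(weights, context))
            for d in range(len(frame))
        ]
        # Store the current frame so that later steps can attend to it.
        self.cache.append(frame)
        return output
```

Calling `step` once per incoming frame keeps memory bounded by `cache_size`, which is the property that matters for online processing in this simplified setting.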

Figure: The pipeline of the TFACM separator and its modules. Here, CConv denotes the causal convolutional layer, CDeconv denotes the causal transposed convolutional layer, and PW/DW-Conv denotes the point-wise/depth-wise convolutional layer.
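As a rough illustration of what "causal" means for a convolutional layer such as CConv, the sketch below implements a one-dimensional causal convolution by left-padding the input with zeros. The function name `causal_conv1d` and the plain-list interface are assumptions for this example; the actual TFACM layers operate on multi-channel time-frequency features and are not reproduced here.

```python
def causal_conv1d(signal, kernel):
    """Causal 1-D convolution: output[t] depends only on signal[t - k + 1 .. t]."""
    k = len(kernel)
    # Left-pad with k - 1 zeros so that no output sample sees future input.
    padded = [0.0] * (k - 1) + list(signal)
    # As in most deep learning frameworks, the kernel is applied without flipping.
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(signal))
    ]
```

Because the padding is applied only on the left, changing a later input sample leaves all earlier outputs unchanged, which is what allows frame-by-frame online processing.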

Comparison with State-of-the-art Models

TFACM outperformed previous SOTA causal models, including SKiM and ReSepFormer, with SDRi gains of 1.6 dB, 1.8 dB, and 1.9 dB on the WHAM!, WHAMR!, and LibriMix datasets, respectively.
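For readers unfamiliar with the metric, SDRi is the improvement in signal-to-distortion ratio of the separated estimate over the unprocessed mixture. The sketch below uses the basic energy-ratio definition of SDR; the function names are hypothetical, and the evaluation behind the numbers above may use a different SDR variant or toolkit.

```python
import math


def sdr_db(reference, estimate):
    """SDR in dB: reference energy over the energy of the estimation error."""
    signal_power = sum(r * r for r in reference)
    error_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    # Assumes the estimate is not identical to the reference (error_power > 0).
    return 10.0 * math.log10(signal_power / error_power)


def sdr_improvement_db(reference, estimate, mixture):
    """SDRi: SDR of the separated estimate minus SDR of the raw mixture."""
    return sdr_db(reference, estimate) - sdr_db(reference, mixture)


# Toy example: the mixture adds 0.5 of interference per sample,
# while the estimate is off by only 0.05 per sample.
reference = [1.0, -1.0, 1.0, -1.0]
mixture = [1.5, -0.5, 1.5, -0.5]
estimate = [1.05, -0.95, 1.05, -0.95]
improvement = sdr_improvement_db(reference, estimate, mixture)
```

In this toy case the error energy drops by a factor of 100, so `improvement` is approximately 20 dB.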

Audio Demo

[Audio samples: Demo One through Demo Five. Each demo has two rows (SPK A and SPK B) and one column per system: Ground Truth, TFACM, TF-GridNet-Causal, DPRNN, SKiM, ReSepFormer.]


Acknowledgements

Website template was borrowed from Colorful Image Colorization and Nerfies; the code can be found here and here. Thank you (.❛ ᴗ ❛.).