Speech Samples


The model is evaluated on Libri2mix from LibriMix: an open-source dataset for generalizable speech separation [1].

Systems compared (audio samples): mixture input, DPCL [2], uPIT [3], Conv-TasNet [4], SuDoRM-RF 1.0x [5], Dual-Path RNN [6], A-FRCNN-16, and ground truth.

References

[1] Cosentino J, Pariente M, Cornell S, et al. LibriMix: An Open-Source Dataset for Generalizable Speech Separation[J]. arXiv preprint arXiv:2005.11262, 2020.

[2] Hershey J R, Chen Z, Le Roux J, et al. Deep clustering: Discriminative embeddings for segmentation and separation[C]//2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016: 31-35.

[3] Kolbæk M, Yu D, Tan Z H, et al. Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017, 25(10): 1901-1913.

[4] Luo Y, Mesgarani N. Conv-TasNet: Surpassing ideal time–frequency magnitude masking for speech separation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, 27(8): 1256-1266.

[5] Tzinis E, Wang Z, Smaragdis P. Sudo rm -rf: Efficient networks for universal audio source separation[C]//2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP). IEEE, 2020: 1-6.

[6] Luo Y, Chen Z, Yoshioka T. Dual-Path RNN: Efficient long sequence modeling for time-domain single-channel speech separation[C]//ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020: 46-50.