Speech Samples


The model is evaluated on WSJ0-2mix [1], which is built from
a machine-readable corpus of Wall Street Journal news text.


Here are some examples of pure speech separation. Hover the mouse over
the lips of a speaker to hear the separated sound.

Here are some repositories for pure speech separation.
Click a link to visit the repository.
If these repositories help you, consider giving them a star or forking them.

Mixture input | Deep Clustering [1] | PIT & uPIT [2, 3] | Conv-TasNet [4] | Dual-Path RNN [5] | Ground truth

References

[1] Hershey J R, Chen Z, Le Roux J, et al. Deep clustering: Discriminative embeddings for segmentation and separation[C]// 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016: 31-35.

[2] Yu D, Kolbæk M, Tan Z H, et al. Permutation invariant training of deep models for speaker-independent multi-talker speech separation[C]// 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2017: 241-245.

[3] Kolbæk M, Yu D, Tan Z H, et al. Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017, 25(10): 1901-1913.

[4] Luo Y, Mesgarani N. Conv-TasNet: Surpassing ideal time–frequency magnitude masking for speech separation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, 27(8): 1256-1266.

[5] Luo Y, Chen Z, Yoshioka T. Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation[J]. arXiv preprint arXiv:1910.06379, 2019.