Coming Soon
This dataset is designed for research in speech separation and speech enhancement. It features realistic environments simulated with SoundSpaces 2.0. Microphones, speech sources, and noise sources are placed at random positions within these environments to create dynamic and challenging audio scenarios. The dataset includes:

- Speech data derived from the LibriSpeech dataset.
- Noise data from the Freesound Dataset 50k (FSD50K) and the Free Music Archive (FMA).
- Preprocessed music data from the FMA, with vocals removed using a pre-trained BSRNN music separation model.

All audio samples are provided at a 16 kHz sample rate, and each sample is 60 seconds long.

Speech data:
- Source: LibriSpeech dataset.
- Subset: LibriSpeech-360, containing approximately 360 hours of English speech data.

Noise data:
- Sources: WHAM! dataset and DnR data.
- Details: Includes a variety of noise types and background sound effects.

Music data:
- Source: Cleaned DnR dataset.
- Preprocessing: Music tracks from the FMA have been preprocessed to remove vocals using a BSRNN model.
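
The description above states that every sample is 16 kHz and 60 seconds long, which works out to 960,000 samples per channel. The following is a minimal sketch for checking a downloaded clip against those two numbers. It assumes the clips are stored as PCM WAV files, which the description does not confirm; `check_clip` and its `tolerance_s` parameter are hypothetical names introduced for illustration.

```python
import wave

EXPECTED_RATE = 16_000   # Hz, as stated in the dataset description
EXPECTED_SECONDS = 60    # clip length, as stated in the dataset description


def check_clip(path, tolerance_s=0.01):
    """Return (sample_rate, duration_s, ok) for a PCM WAV file.

    Assumption: the file is PCM WAV; the standard-library `wave` module
    cannot read compressed formats such as FLAC or MP3.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    ok = rate == EXPECTED_RATE and abs(duration - EXPECTED_SECONDS) <= tolerance_s
    return rate, duration, ok
```

If the released files turn out to use another container, the same check applies with a different reader; only the two expected constants come from the description.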
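
We compare state-of-the-art (SOTA) speech separation and speech enhancement methods on the LibriSpace dataset. The result tables report SI-SNR, SDR, NB-PESQ, WB-PESQ, STOI, and four MOS scores, where higher is better, and word error rate (WER), where lower is better. The final two tables report model size and computational cost.
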
Model | SI-SNR | SDR | NB-PESQ | WB-PESQ | STOI | MOS_NOISE | MOS_REVERB | MOS_SIG | MOS_OVRL | WER (%) |
---|---|---|---|---|---|---|---|---|---|---|
Conv-TasNet | 4.81 | 7.13 | 2.00 | 1.46 | 0.73 | 2.45 | 3.04 | 2.30 | 2.10 | 53.82 |
DPRNN | 4.87 | 6.65 | 2.17 | 1.63 | 0.77 | 2.54 | 3.28 | 2.47 | 2.11 | 47.81 |
DPTNet | 11.51 | 13.00 | 2.82 | 2.35 | 0.87 | 3.00 | 3.15 | 2.68 | 2.32 | 28.13 |
SuDoRM-RF | 8.01 | 9.70 | 2.47 | 1.98 | 0.81 | 2.95 | 3.26 | 2.63 | 2.25 | 35.61 |
A-FRCNN | 9.17 | 10.63 | 2.70 | 2.16 | 0.84 | 2.98 | 3.24 | 2.72 | 2.32 | 35.44 |
TDANet | 9.27 | 11.00 | 2.72 | 2.22 | 0.85 | 3.05 | 3.22 | 2.74 | 2.36 | 30.46 |
SKIM | 7.23 | 8.78 | 2.34 | 1.86 | 0.79 | 2.65 | 3.23 | 2.47 | 2.11 | 38.92 |
BSRNN | 9.10 | 10.86 | 2.82 | 2.26 | 0.85 | 2.93 | 3.11 | 2.84 | 2.45 | 29.86 |
TF-GridNet | 15.38 | 16.81 | 3.58 | 3.08 | 0.93 | 3.11 | 3.10 | 2.91 | 2.49 | 12.04 |
Mossformer | 14.72 | 15.97 | 3.02 | 2.67 | 0.91 | 3.11 | 3.24 | 2.76 | 2.39 | 21.10 |
Mossformer2 | 14.84 | 16.09 | 3.17 | 2.83 | 0.91 | 3.20 | 3.21 | 2.78 | 2.40 | 19.51 |
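
For readers unfamiliar with the SI-SNR column, the sketch below shows the commonly used definition of scale-invariant signal-to-noise ratio in dB, written with NumPy. It is an illustration only: the evaluation code behind the tables is not given here, so details such as the `eps` constant and whether the tables report absolute values or improvements over the mixture are assumptions left open.

```python
import numpy as np


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D signals of equal length."""
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    # Remove the mean so a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    # Everything not explained by the scaled reference counts as noise.
    noise = estimate - target
    return 10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))
```

Because the estimate is projected onto the reference before the ratio is taken, multiplying the estimate by any nonzero gain leaves the score unchanged, which is what distinguishes SI-SNR from a plain SNR.
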
Model | SI-SNR | SDR | NB-PESQ | WB-PESQ | STOI | MOS_NOISE | MOS_REVERB | MOS_SIG | MOS_OVRL | WER (%) |
---|---|---|---|---|---|---|---|---|---|---|
Conv-TasNet | 4.12 | 5.38 | 1.84 | 1.42 | 0.65 | 1.98 | 3.53 | 2.21 | 1.81 | 63.21 |
DPRNN | 4.37 | 5.73 | 1.98 | 1.50 | 0.73 | 2.47 | 3.28 | 2.45 | 2.07 | 51.33 |
DPTNet | 11.69 | 12.80 | 2.67 | 2.13 | 0.84 | 2.91 | 3.14 | 2.54 | 2.23 | 29.05 |
SuDoRM-RF | 6.84 | 8.34 | 2.15 | 1.66 | 0.77 | 2.80 | 3.28 | 2.48 | 2.12 | 41.37 |
A-FRCNN | 7.59 | 9.32 | 2.52 | 2.00 | 0.82 | 2.94 | 3.24 | 2.67 | 2.29 | 33.82 |
TDANet | 7.00 | 8.68 | 2.26 | 1.71 | 0.79 | 2.71 | 3.25 | 2.58 | 2.19 | 37.16 |
SKIM | 6.00 | 7.42 | 2.23 | 1.75 | 0.77 | 2.63 | 3.29 | 2.44 | 2.10 | 42.82 |
BSRNN | 6.96 | 8.66 | 2.36 | 1.76 | 0.79 | 2.54 | 3.13 | 2.79 | 2.32 | 41.73 |
TF-GridNet | 14.37 | 15.69 | 3.45 | 2.84 | 0.91 | 3.31 | 3.15 | 2.96 | 2.58 | 14.43 |
Mossformer | 11.80 | 13.17 | 2.82 | 2.26 | 0.86 | 3.05 | 3.28 | 2.61 | 2.25 | 26.64 |
Mossformer2 | 11.12 | 12.34 | 2.62 | 2.09 | 0.83 | 2.87 | 3.31 | 2.55 | 2.20 | 32.65 |
Model | SI-SNR | SDR | NB-PESQ | WB-PESQ | STOI | MOS_NOISE | MOS_REVERB | MOS_SIG | MOS_OVRL | WER (%) |
---|---|---|---|---|---|---|---|---|---|---|
DCCRN | 8.41 | 11.29 | 2.81 | 2.17 | 0.87 | 2.94 | 3.01 | 2.80 | 2.39 | 21.78 |
Fullband | 7.82 | 8.34 | 3.05 | 2.34 | 0.89 | 3.30 | 3.04 | 2.95 | 2.54 | 22.04 |
FullSubNet | 9.48 | 11.92 | 3.19 | 2.48 | 0.90 | 3.24 | 3.05 | 2.98 | 2.54 | 20.01 |
Fast-FullSubNet | 8.14 | 8.71 | 3.13 | 2.41 | 0.90 | 3.31 | 3.05 | 2.99 | 2.58 | 21.13 |
FullSubNet+ | 8.93 | 11.07 | 3.06 | 2.35 | 0.89 | 3.12 | 2.97 | 2.91 | 2.47 | 20.73 |
TaylorSENet | 10.11 | 12.67 | 3.07 | 2.45 | 0.89 | 2.72 | 3.01 | 2.65 | 2.22 | 21.61 |
GaGNet | 10.01 | 12.78 | 3.12 | 2.48 | 0.89 | 2.77 | 3.05 | 2.64 | 2.23 | 21.40 |
G2Net | 9.82 | 12.22 | 3.03 | 2.39 | 0.89 | 2.78 | 3.00 | 2.64 | 2.22 | 22.02 |
Inter-SubNet | 10.34 | 12.87 | 3.32 | 2.61 | 0.91 | 3.39 | 3.10 | 3.05 | 2.62 | 18.83 |
SuDoRM-RF | 11.28 | 13.35 | 2.75 | 2.20 | 0.87 | 3.64 | 2.88 | 2.80 | 1.88 | 93.54 |
Model | SI-SNR | SDR | NB-PESQ | WB-PESQ | STOI | MOS_NOISE | MOS_REVERB | MOS_SIG | MOS_OVRL | WER (%) |
---|---|---|---|---|---|---|---|---|---|---|
DCCRN | 11.56 | 11.98 | 2.72 | 2.00 | 0.85 | 3.30 | 3.51 | 2.94 | 2.59 | 25.13 |
Fullband | 10.07 | 11.098 | 2.80 | 2.02 | 0.86 | 3.13 | 2.99 | 2.88 | 2.46 | 25.27 |
FullSubNet | 11.60 | 12.31 | 3.10 | 2.22 | 0.88 | 3.34 | 3.08 | 3.05 | 2.63 | 20.82 |
Fast-FullSubNet | 10.36 | 11.24 | 2.93 | 2.08 | 0.87 | 3.22 | 3.03 | 2.93 | 2.51 | 24.98 |
FullSubNet+ | 10.64 | 11.50 | 2.80 | 1.99 | 0.86 | 3.02 | 2.93 | 2.82 | 2.38 | 24.11 |
TaylorSENet | 12.18 | 13.04 | 3.06 | 2.33 | 0.88 | 2.76 | 2.92 | 2.65 | 2.24 | 23.46 |
GaGNet | 12.20 | 13.17 | 2.95 | 2.27 | 0.87 | 2.78 | 2.86 | 2.64 | 2.21 | 23.36 |
G2Net | 12.14 | 13.13 | 3.00 | 2.32 | 0.88 | 2.80 | 2.88 | 2.64 | 2.23 | 22.96 |
Inter-SubNet | 12.07 | 13.01 | 3.15 | 2.28 | 0.88 | 3.34 | 3.11 | 3.04 | 2.64 | 20.07 |
SuDoRM-RF | 12.99 | 13.86 | 2.61 | 2.01 | 0.85 | 3.91 | 2.80 | 2.98 | 1.93 | 88.72 |
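
The WER (%) column in the tables above is a word error rate. As a rough illustration of how such a number is usually obtained, the sketch below computes the word-level edit distance between a reference transcript and a recognizer output and divides it by the reference length. The speech recognizer and any text normalization used for the reported numbers are not specified here, so this is the standard definition rather than the exact evaluation pipeline; `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference, hypothesis):
    """WER in percent: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)
```

Since insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.
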
Model | Params (M) | MACs (G/s) | CPU Inference (1s, ms) | GPU Inference (1s, ms) | Inference GPU Memory (1s, MB) | Backward GPU (1s, ms) | Backward GPU Memory (1s, MB) |
---|---|---|---|---|---|---|---|
Conv-TasNet | 5.62 | 10.23 | 71.67 | 8.59 | 134.34 | 42.34 | 647.22 |
DPRNN | 2.72 | 43.79 | 379.49 | 15.88 | 285.49 | 38.57 | 1757.00 |
DPTNet | 2.80 | 53.37 | 481.37 | 20.04 | 20.67 | 58.28 | 3120.22 |
SuDoRM-RF | 2.72 | 4.60 | 87.81 | 17.83 | 138.94 | 68.40 | 1058.76 |
A-FRCNN | 6.13 | 81.20 | 102.22 | 36.19 | 157.20 | 128.40 | 1141.86 |
TDANet | 2.33 | 9.13 | 169.47 | 32.88 | 145.56 | 89.62 | 3064.75 |
SKIM | 5.92 | 21.92 | 245.98 | 10.54 | 273.07 | 38.62 | 1083.77 |
BSRNN | 25.97 | 123.10 | 577.11 | 59.78 | 135.48 | 184.26 | 2349.62 |
TF-GridNet | 14.43 | 525.68 | 1525.98 | 64.59 | 615.04 | 165.55 | 6687.60 |
Mossformer | 42.10 | 85.54 | 473.74 | 49.71 | 163.68 | 153.84 | 4385.91 |
Mossformer2 | 55.74 | 112.67 | 830.66 | 93.33 | 163.52 | 297.07 | 5617.39 |
Model | Params (M) | MACs (G/s) | CPU Inference (1s, ms) | GPU Inference (1s, ms) | Inference GPU Memory (1s, MB) | Backward GPU (1s, ms) | Backward GPU Memory (1s, MB) |
---|---|---|---|---|---|---|---|
DCCRN | 3.67 | 14.38 | 98.42 | 5.81 | 30.42 | 35.42 | 124.66 |
Fullband | 6.05 | 0.39 | 5.98 | 1.99 | 23.01 | 10.21 | 73.39 |
FullSubNet | 5.64 | 30.87 | 58.46 | 3.66 | 144.21 | 15.25 | 491.20 |
Fast-FullSubNet | 6.84 | 4.14 | 12.33 | 4.63 | 26.75 | 20.12 | 111.45 |
FullSubNet+ | 8.66 | 31.11 | 110.44 | 9.50 | 147.02 | 37.40 | 521.49 |
TaylorSENet | 5.40 | 6.15 | 70.96 | 26.84 | 139.33 | 76.63 | 329.40 |
GaGNet | 5.95 | 1.66 | 66.72 | 29.72 | 129.59 | 84.05 | 226.49 |
G2Net | 7.39 | 2.85 | 98.29 | 47.56 | 130.33 | 162.51 | 291.98 |
Inter-SubNet | 2.29 | 36.71 | 78.81 | 4.40 | 216.91 | 14.59 | 725.93 |
SuDoRM-RF | 2.70 | 2.12 | 42.43 | 11.42 | 8.52 | 52.59 | 293.44 |
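
One way to read the inference-time columns is as a real-time factor (RTF): processing time divided by audio duration. The sketch below assumes that "(1s, ms)" in the column headers means milliseconds needed to process one second of audio, which the tables do not state explicitly. `real_time_factor` is a hypothetical helper, and the three CPU timings are copied from the efficiency tables above.

```python
def real_time_factor(processing_ms, audio_seconds=1.0):
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return (processing_ms / 1000.0) / audio_seconds


# CPU inference times in ms, taken from the efficiency tables
# (assumed to be measured on 1 s of input audio).
cpu_ms = {"Conv-TasNet": 71.67, "TF-GridNet": 1525.98, "Fullband": 5.98}

for name, ms in cpu_ms.items():
    rtf = real_time_factor(ms)
    verdict = "faster" if rtf < 1.0 else "slower"
    print(f"{name}: RTF = {rtf:.3f} ({verdict} than real time)")
```

Under that reading, TF-GridNet on CPU has an RTF of about 1.5 and would not keep up with a live stream, while its GPU timing of 64.59 ms corresponds to an RTF well below 1.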