Audio Examples for 'PERCEIVE AND PREDICT: SELF-SUPERVISED SPEECH REPRESENTATION BASED LOSS FUNCTIONS FOR SPEECH ENHANCEMENT'

Sourced from the VoiceBank-DEMAND test set.

p257_006

Noise type: cafe
Mixing SNR: 17.5dB

Clean

Noisy

Baseline Spectrogram MSE Loss

Baseline STOI Loss

Baseline SI-SDR Loss

Proposed HuBERT Encoder MSE Loss

HuBERT Output Layer MSE Loss

Proposed XLSR Encoder MSE Loss

XLSR Output Layer MSE Loss

p257_007

Noise type: cafe
Mixing SNR: 12.5dB

Clean

Noisy

Baseline Spectrogram MSE Loss

Baseline STOI Loss

Baseline SI-SDR Loss

Proposed HuBERT Encoder MSE Loss

HuBERT Output Layer MSE Loss

Proposed XLSR Encoder MSE Loss

XLSR Output Layer MSE Loss

p257_008

Noise type: cafe
Mixing SNR: 7.5dB

Clean

Noisy

Baseline Spectrogram MSE Loss

Baseline STOI Loss

Baseline SI-SDR Loss

Proposed HuBERT Encoder MSE Loss

HuBERT Output Layer MSE Loss

Proposed XLSR Encoder MSE Loss

XLSR Output Layer MSE Loss

p257_009

Noise type: cafe
Mixing SNR: 2.5dB

Clean

Noisy

Baseline Spectrogram MSE Loss

Baseline STOI Loss

Baseline SI-SDR Loss

Proposed HuBERT Encoder MSE Loss

HuBERT Output Layer MSE Loss

Proposed XLSR Encoder MSE Loss

XLSR Output Layer MSE Loss

The following audio examples have been 'remixed' to show enhancement at more challenging SNR values than those present in the original VoiceBank-DEMAND test set.
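This page does not say how the remixing was carried out. As a minimal sketch, assuming the noise is simply rescaled so that the clean-to-noise power ratio matches the target mixing SNR, the operation could look like the following. The function names `mix_at_snr` and `measured_snr_db` are hypothetical and not taken from the paper.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Hypothetical helper: add `noise` to `clean` at a target SNR in dB.

    Assumes both signals have the same length and sample rate.
    """
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if clean.shape != noise.shape:
        raise ValueError("clean and noise must have the same shape")

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        raise ValueError("noise signal is silent")

    # Solve SNR_dB = 10 * log10(clean_power / (gain**2 * noise_power)) for gain.
    gain = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise


def measured_snr_db(clean, noisy):
    """Return the SNR in dB of `noisy`, treating (noisy - clean) as the noise."""
    residual = np.asarray(noisy, dtype=np.float64) - np.asarray(clean, dtype=np.float64)
    return 10.0 * np.log10(np.mean(np.square(clean)) / np.mean(residual ** 2))
```

Under this assumption, lower target SNRs such as -10.0dB only raise the gain applied to the noise; the clean signal is left unchanged, so the same clean reference applies at every SNR.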


p257_008

Noise type: cafe_remix
Mixing SNR: -10.0dB

Clean

Noisy

Baseline Spectrogram MSE Loss

Baseline STOI Loss

Baseline SI-SDR Loss

Proposed HuBERT Encoder MSE Loss

HuBERT Output Layer MSE Loss

Proposed XLSR Encoder MSE Loss

XLSR Output Layer MSE Loss

p257_008

Noise type: cafe_remix
Mixing SNR: -5.0dB

Clean

Noisy

Baseline Spectrogram MSE Loss

Baseline STOI Loss

Baseline SI-SDR Loss

Proposed HuBERT Encoder MSE Loss

HuBERT Output Layer MSE Loss

Proposed XLSR Encoder MSE Loss

XLSR Output Layer MSE Loss

p257_008

Noise type: cafe_remix
Mixing SNR: 0.0dB

Clean

Noisy

Baseline Spectrogram MSE Loss

Baseline STOI Loss

Baseline SI-SDR Loss

Proposed HuBERT Encoder MSE Loss

HuBERT Output Layer MSE Loss

Proposed XLSR Encoder MSE Loss

XLSR Output Layer MSE Loss

p257_008

Noise type: cafe_remix
Mixing SNR: 5.0dB

Clean

Noisy

Baseline Spectrogram MSE Loss

Baseline STOI Loss

Baseline SI-SDR Loss

Proposed HuBERT Encoder MSE Loss

HuBERT Output Layer MSE Loss

Proposed XLSR Encoder MSE Loss

XLSR Output Layer MSE Loss

p257_008

Noise type: cafe_remix
Mixing SNR: 10.0dB

Clean

Noisy

Baseline Spectrogram MSE Loss

Baseline STOI Loss

Baseline SI-SDR Loss

Proposed HuBERT Encoder MSE Loss

HuBERT Output Layer MSE Loss

Proposed XLSR Encoder MSE Loss

XLSR Output Layer MSE Loss

p257_008

Noise type: cafe_remix
Mixing SNR: 15.0dB

Clean

Noisy

Baseline Spectrogram MSE Loss

Baseline STOI Loss

Baseline SI-SDR Loss

Proposed HuBERT Encoder MSE Loss

HuBERT Output Layer MSE Loss

Proposed XLSR Encoder MSE Loss

XLSR Output Layer MSE Loss

p257_008

Noise type: cafe_remix
Mixing SNR: 20.0dB

Clean

Noisy

Baseline Spectrogram MSE Loss

Baseline STOI Loss

Baseline SI-SDR Loss

Proposed HuBERT Encoder MSE Loss

HuBERT Output Layer MSE Loss

Proposed XLSR Encoder MSE Loss

XLSR Output Layer MSE Loss

p257_008

Noise type: cafe_remix
Mixing SNR: 25.0dB

Clean

Noisy

Baseline Spectrogram MSE Loss

Baseline STOI Loss

Baseline SI-SDR Loss

Proposed HuBERT Encoder MSE Loss

HuBERT Output Layer MSE Loss

Proposed XLSR Encoder MSE Loss

XLSR Output Layer MSE Loss