Unofficial Parallel WaveGAN implementation demo
This is the demonstration page for the UNOFFICIAL implementations of the models compared below.
GitHub: https://github.com/kan-bayashi/ParallelWaveGAN
Audio samples (English)
Here is a comparison in the analysis-synthesis condition using the LJSpeech dataset.
Note that we limit the frequency range to 80-7600 Hz in the Mel spectrogram calculation.
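To illustrate what the 80-7600 Hz limit means for the Mel filterbank, here is a minimal Python sketch. It assumes the HTK-style mel formula and 80 mel bands; both are illustrative assumptions rather than settings confirmed on this page, and the helper names are hypothetical. It computes only the band edge frequencies, not the spectrogram itself.

```python
import math


def hz_to_mel(hz):
    """HTK-style Hz -> mel conversion (one common convention; assumed here)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(fmin=80.0, fmax=7600.0, n_mels=80):
    """Return n_mels + 2 edge frequencies (Hz), equally spaced on the mel scale.

    Triangular filter i spans edges[i] .. edges[i + 2], so no filter covers
    energy below fmin or above fmax. n_mels=80 is an assumed value.
    """
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]


edges = mel_band_edges()
print(f"lowest edge: {edges[0]:.1f} Hz, highest edge: {edges[-1]:.1f} Hz")
```

Because the lowest and highest edges sit at the chosen limits, content outside that range does not contribute to the features the vocoder is conditioned on.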
- Groundtruth: Target speech.
- Parallel WaveGAN (official): Official samples provided on the official demo page.
- Parallel WaveGAN (ours): Our samples based on this config.
- MelGAN + STFT-loss (ours): Our samples based on this config.
- FB-MelGAN (ours): Our samples based on this config.
- MB-MelGAN (ours): Our samples based on this config.
- HiFi-GAN (ours): Our samples based on this config.
- StyleMelGAN (ours): Our samples based on this config.
Groundtruth | Parallel WaveGAN (official) |
Parallel WaveGAN (ours) | MelGAN + STFT-loss (ours) |
FB-MelGAN (ours) | MB-MelGAN (ours) |
HiFi-GAN (ours) | StyleMelGAN (ours) |
Audio samples (Japanese)
Audio samples from a model trained on the JSUT dataset.
Note that the groundtruth samples are 48 kHz; we downsampled them to 24 kHz and limited the frequency range to 80-7600 Hz in the Mel spectrogram calculation.
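The 48 kHz to 24 kHz downsampling mentioned above can be sketched in plain Python as low-pass filtering followed by keeping every second sample. This is only an illustration of the general technique: the page does not say which resampling tool was used, and the function names, filter length, and cutoff below are assumptions.

```python
import math


def lowpass_taps(num_taps=63, cutoff=0.23):
    """Hamming-windowed sinc low-pass filter.

    cutoff is in cycles per input sample; 0.23 sits slightly below the
    new Nyquist limit (0.25, i.e. 12 kHz at a 48 kHz input rate).
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            sinc = 2 * cutoff
        else:
            sinc = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalize to unit gain at DC


def downsample_by_2(signal, taps):
    """Apply the anti-aliasing filter, then keep every second sample."""
    half = len(taps) // 2
    out = []
    for i in range(0, len(signal), 2):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + k - half
            if 0 <= j < len(signal):  # treat samples outside the signal as zero
                acc += t * signal[j]
        out.append(acc)
    return out


sr = 48000
tone = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(480)]
resampled = downsample_by_2(tone, lowpass_taps())
print(len(tone), "->", len(resampled))
```

Filtering before discarding samples matters because content above 12 kHz would otherwise alias into the 24 kHz signal.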
- Groundtruth: Target speech.
- Parallel WaveGAN (ours): Our samples based on this config.
Groundtruth | Parallel WaveGAN (ours) |
Audio samples (Mandarin)
Audio samples from a model trained on the CSMSC dataset.
Note that the groundtruth samples are 48 kHz; we downsampled them to 24 kHz and limited the frequency range to 80-7600 Hz in the Mel spectrogram calculation.
- Groundtruth: Target speech.
- Parallel WaveGAN (ours): Our samples based on this config.
Groundtruth | Parallel WaveGAN (ours) |
Author
Tomoki Hayashi
e-mail: hayashi.tomoki@g.sp.m.is.nagoya-u.ac.jp