espnet.transform package
Initialize main package.
espnet.transform.spec_augment
SpecAugment module for preprocessing, i.e., data augmentation.
class espnet.transform.spec_augment.FreqMask(**kwargs)
Bases: espnet.transform.functional.FuncTrans
Frequency mask for SpecAugment.
- Parameters:
x (numpy.ndarray) – (time, freq)
F (int) – maximum width of each frequency mask
n_mask (int) – the number of masks
inplace (bool) – overwrite
replace_with_zero (bool) – pad zero on mask if true else use mean
class espnet.transform.spec_augment.SpecAugment(**kwargs)
Bases: espnet.transform.functional.FuncTrans
SpecAugment.
Apply random time warping and time/frequency masking. The default setting is based on LD (LibriSpeech double) in Table 2.
- Parameters:
x (numpy.ndarray) – (time, freq)
resize_mode (str) – “PIL” (fast, nondifferentiable) or “sparse_image_warp” (slow, differentiable)
max_time_warp (int) – maximum frames to warp the center frame in spectrogram (W)
max_freq_width (int) – maximum width of the random freq mask (F)
n_freq_mask (int) – the number of the random freq mask (m_F)
max_time_width (int) – maximum width of the random time mask (T)
n_time_mask (int) – the number of the random time mask (m_T)
inplace (bool) – overwrite intermediate array
replace_with_zero (bool) – pad zero on mask if true else use mean
class espnet.transform.spec_augment.TimeMask(**kwargs)
Bases: espnet.transform.functional.FuncTrans
Time mask for SpecAugment.
- Parameters:
spec (numpy.ndarray) – (time, freq)
T (int) – maximum width of each time mask
n_mask (int) – the number of masks
inplace (bool) – overwrite
replace_with_zero (bool) – pad zero on mask if true else use mean
class espnet.transform.spec_augment.TimeWarp(**kwargs)
Bases: espnet.transform.functional.FuncTrans
Time warp for SpecAugment.
Move a random center frame by a random width ~ uniform(-window, window).
- Parameters:
x (numpy.ndarray) – spectrogram (time, freq)
max_time_warp (int) – maximum time frames to warp
inplace (bool) – overwrite x with the result
mode (str) – “PIL” (default, fast, not differentiable) or “sparse_image_warp” (slow, differentiable)
- Returns numpy.ndarray:
time warped spectrogram (time, freq)
espnet.transform.spec_augment.freq_mask(x, F=30, n_mask=2, replace_with_zero=True, inplace=False)
Frequency mask for SpecAugment.
- Parameters:
x (numpy.ndarray) – (time, freq)
F (int) – maximum width of each frequency mask
n_mask (int) – the number of masks
inplace (bool) – overwrite
replace_with_zero (bool) – pad zero on mask if true else use mean
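The masking described above can be sketched with NumPy alone. This is an illustration, not ESPnet's implementation: the helper name freq_mask_sketch, the rng argument, and the exact way band positions are sampled are assumptions made for this example.

```python
import numpy as np


def freq_mask_sketch(x, F=30, n_mask=2, replace_with_zero=True,
                     inplace=False, rng=None):
    """Mask random frequency bands of a (time, freq) array (illustrative only)."""
    # rng is a hypothetical argument, added so the example is reproducible.
    rng = np.random.default_rng() if rng is None else rng
    out = x if inplace else x.copy()
    n_freq = out.shape[1]
    # Value written into masked bands: zero, or the mean of the input.
    fill = 0.0 if replace_with_zero else x.mean()
    for _ in range(n_mask):
        width = min(int(rng.integers(0, F)), n_freq)      # width in [0, F)
        start = int(rng.integers(0, n_freq - width + 1))  # band fits in the array
        out[:, start:start + width] = fill
    return out


spec = np.ones((100, 80), dtype=np.float32)
masked = freq_mask_sketch(spec, F=30, n_mask=2, rng=np.random.default_rng(0))
```

Each mask covers every time frame of the chosen band, so the masked array differs from the input only in whole columns.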
espnet.transform.spec_augment.spec_augment(x, resize_mode='PIL', max_time_warp=80, max_freq_width=27, n_freq_mask=2, max_time_width=100, n_time_mask=2, inplace=True, replace_with_zero=True)
SpecAugment.
Apply random time warping and time/frequency masking. The default setting is based on LD (LibriSpeech double) in Table 2.
- Parameters:
x (numpy.ndarray) – (time, freq)
resize_mode (str) – “PIL” (fast, nondifferentiable) or “sparse_image_warp” (slow, differentiable)
max_time_warp (int) – maximum frames to warp the center frame in spectrogram (W)
max_freq_width (int) – maximum width of the random freq mask (F)
n_freq_mask (int) – the number of the random freq mask (m_F)
max_time_width (int) – maximum width of the random time mask (T)
n_time_mask (int) – the number of the random time mask (m_T)
inplace (bool) – overwrite intermediate array
replace_with_zero (bool) – pad zero on mask if true else use mean
espnet.transform.spec_augment.time_mask(spec, T=40, n_mask=2, replace_with_zero=True, inplace=False)
Time mask for SpecAugment.
- Parameters:
spec (numpy.ndarray) – (time, freq)
T (int) – maximum width of each time mask
n_mask (int) – the number of masks
inplace (bool) – overwrite
replace_with_zero (bool) – pad zero on mask if true else use mean
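A minimal NumPy sketch of time masking with replace_with_zero=False, where masked frames are filled with the mean of the input. The helper name time_mask_sketch, the rng argument, and the sampling of mask positions are assumptions for illustration and may differ from ESPnet's code.

```python
import numpy as np


def time_mask_sketch(spec, T=40, n_mask=2, replace_with_zero=True,
                     inplace=False, rng=None):
    """Mask random runs of frames in a (time, freq) array (illustrative only)."""
    # rng is a hypothetical argument, added so the example is reproducible.
    rng = np.random.default_rng() if rng is None else rng
    out = spec if inplace else spec.copy()
    n_time = out.shape[0]
    # Compute the fill value once, before any frame is overwritten.
    fill = 0.0 if replace_with_zero else spec.mean()
    for _ in range(n_mask):
        width = min(int(rng.integers(0, T)), n_time)      # width in [0, T)
        start = int(rng.integers(0, n_time - width + 1))
        out[start:start + width, :] = fill
    return out


spec = np.arange(100 * 8, dtype=np.float32).reshape(100, 8)
masked = time_mask_sketch(spec, T=40, n_mask=2, replace_with_zero=False,
                          rng=np.random.default_rng(0))
```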
espnet.transform.spec_augment.time_warp(x, max_time_warp=80, inplace=False, mode='PIL')
Time warp for SpecAugment.
Move a random center frame by a random width ~ uniform(-window, window).
- Parameters:
x (numpy.ndarray) – spectrogram (time, freq)
max_time_warp (int) – maximum time frames to warp
inplace (bool) – overwrite x with the result
mode (str) – “PIL” (default, fast, not differentiable) or “sparse_image_warp” (slow, differentiable)
- Returns numpy.ndarray:
time warped spectrogram (time, freq)
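The idea of moving a random center frame can be sketched as follows. Note the substitution: this example stretches the two halves with plain linear interpolation (numpy.interp) instead of the “PIL” or “sparse_image_warp” modes, and the names time_warp_sketch, _resize_time and rng are hypothetical.

```python
import numpy as np


def _resize_time(seg, new_len):
    """Linearly interpolate a (time, freq) segment to new_len frames."""
    old_len = seg.shape[0]
    src = np.linspace(0.0, old_len - 1, num=new_len)  # positions in the old segment
    grid = np.arange(old_len)
    cols = [np.interp(src, grid, seg[:, k]) for k in range(seg.shape[1])]
    return np.stack(cols, axis=1)


def time_warp_sketch(x, max_time_warp=80, rng=None):
    """Shift a random center frame by less than max_time_warp frames."""
    rng = np.random.default_rng() if rng is None else rng
    t = x.shape[0]
    window = max_time_warp
    if window <= 0 or t - window <= window:
        return x  # too short to warp
    center = int(rng.integers(window, t - window))
    warped = center + int(rng.integers(-window + 1, window))
    # Stretch or squeeze each side so the center frame lands at `warped`.
    left = _resize_time(x[:center], warped)
    right = _resize_time(x[center:], t - warped)
    return np.concatenate([left, right], axis=0).astype(x.dtype)


x = np.arange(200 * 4, dtype=np.float32).reshape(200, 4)
y = time_warp_sketch(x, max_time_warp=20, rng=np.random.default_rng(0))
```

The output keeps the input shape; only the time axis is locally stretched on one side of the center frame and compressed on the other.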
espnet.transform.transformation
Transformation module.
class espnet.transform.transformation.Transformation(conffile=None)
Bases: object
Apply some functions to the mini-batch.
Examples
>>> import numpy as np
>>> from espnet.transform.transformation import Transformation
>>> kwargs = {"process": [{"type": "fbank",
...                        "n_mels": 80,
...                        "fs": 16000},
...                       {"type": "cmvn",
...                        "stats": "data/train/cmvn.ark",
...                        "norm_vars": True},
...                       {"type": "delta", "window": 2, "order": 2}]}
>>> transform = Transformation(kwargs)
>>> bs = 10
>>> xs = [np.random.randn(100, 80).astype(np.float32)
...       for _ in range(bs)]
>>> xs = transform(xs)
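To illustrate how a "process" list like the one above can drive a pipeline, here is a small stand-alone sketch. It does not use ESPnet: the class PipelineSketch, the REGISTRY table, and the "scale" and "offset" step types are all hypothetical and exist only for this example.

```python
import numpy as np


def scale(x, factor=1.0):
    return x * factor


def add_offset(x, value=0.0):
    return x + value


# Hypothetical mapping from a step's "type" to the function that runs it.
REGISTRY = {"scale": scale, "offset": add_offset}


class PipelineSketch:
    """Apply the configured steps, in order, to every array of a mini-batch."""

    def __init__(self, conf):
        self.steps = []
        for step in conf.get("process", []):
            opts = dict(step)        # copy so the config is not modified
            name = opts.pop("type")  # remaining keys are keyword arguments
            self.steps.append((REGISTRY[name], opts))

    def __call__(self, xs):
        for func, opts in self.steps:
            xs = [func(x, **opts) for x in xs]
        return xs


conf = {"process": [{"type": "scale", "factor": 2.0},
                    {"type": "offset", "value": 1.0}]}
batch = [np.ones((3, 2), dtype=np.float32) for _ in range(4)]
out = PipelineSketch(conf)(batch)
```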
espnet.transform.perturb
class espnet.transform.perturb.BandpassPerturbation(lower=0.0, upper=0.75, seed=None, axes=(-1, ))
Bases: object
Randomly drop out along the frequency axis.
The original idea comes from the following:
“randomly-selected frequency band was cut off under the constraint of leaving at least 1,000 Hz band within the range of less than 4,000Hz.”
(The Hitachi/JHU CHiME-5 system: Advances in speech recognition for everyday home environments using multiple microphone arrays; http://spandh.dcs.shef.ac.uk/chime_workshop/papers/CHiME_2018_paper_kanda.pdf)
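One way to read “randomly drop out along the frequency axis” is sketched below: a drop ratio is drawn from [lower, upper), and each frequency bin is zeroed with that probability across all frames. This interpretation, the function name bandpass_perturb_sketch, and the rng argument are assumptions; the class itself may behave differently.

```python
import numpy as np


def bandpass_perturb_sketch(x_stft, lower=0.0, upper=0.75, axes=(-1,), rng=None):
    """Zero random frequency bins of a (time, freq) array (illustrative only)."""
    rng = np.random.default_rng() if rng is None else rng
    ratio = rng.uniform(lower, upper)  # fraction of bins dropped on average
    ndim = x_stft.ndim
    masked_axes = {a % ndim for a in axes}
    # The mask has full size on the selected axes and broadcasts over the rest.
    shape = tuple(s if i in masked_axes else 1
                  for i, s in enumerate(x_stft.shape))
    keep = rng.random(shape) >= ratio
    return x_stft * keep


x = np.ones((50, 257))
y = bandpass_perturb_sketch(x, rng=np.random.default_rng(0))
```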
class espnet.transform.perturb.NoiseInjection(utt2noise=None, lower=-20, upper=-5, utt2ratio=None, filetype='list', dbunit=True, seed=None)
Bases: object
Add isotropic noise.
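A sketch of noise injection at a level given in decibels, assuming the ratio is the noise RMS relative to the signal RMS and that dB values convert to amplitude as 10 ** (dB / 20). The function inject_noise_sketch and its ratio_db and rng arguments are hypothetical and not part of the class above.

```python
import numpy as np


def rms(x):
    """Root-mean-square amplitude."""
    return float(np.sqrt(np.mean(np.square(x))))


def inject_noise_sketch(x, noise, lower=-20, upper=-5, ratio_db=None, rng=None):
    """Add noise scaled to a level relative to the signal (illustrative only)."""
    rng = np.random.default_rng() if rng is None else rng
    if ratio_db is None:
        ratio_db = rng.uniform(lower, upper)  # random level in dB
    ratio = 10.0 ** (ratio_db / 20.0)         # dB to amplitude ratio
    # Normalize the noise to unit RMS, then scale it relative to the signal.
    return x + noise / rms(noise) * (ratio * rms(x))


rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noise = rng.normal(size=16000)
noisy = inject_noise_sketch(x, noise, ratio_db=-20)
```

With ratio_db=-20 the added noise has one tenth of the signal's RMS amplitude.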
class espnet.transform.perturb.SpeedPerturbation(lower=0.9, upper=1.1, utt2ratio=None, keep_length=True, res_type='kaiser_best', seed=None)
Bases: object
Speed perturbation in Kaldi uses sox-speed instead of sox-tempo; sox-speed simply resamples the input, i.e., both pitch and tempo are changed.
“Why use speed option instead of tempo -s in SoX for speed perturbation” https://groups.google.com/forum/#!topic/kaldi-help/8OOG7eE4sZ8
Warning
This function is very slow because of resampling. I recommend applying speed perturbation outside of training using sox.
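Resampling-based speed change can be sketched as below. As a stand-in for a proper resampler such as the 'kaiser_best' filter named in the signature, this example uses linear interpolation (numpy.interp), which is cruder. The function name and the end-padding used for keep_length are assumptions.

```python
import numpy as np


def speed_perturb_sketch(x, ratio, keep_length=True):
    """Resample a 1-D waveform so it plays `ratio` times faster (illustrative)."""
    n = len(x)
    new_len = int(round(n / ratio))         # faster speech means fewer samples
    positions = np.arange(new_len) * ratio  # where to read the original signal
    y = np.interp(positions, np.arange(n), x)
    if keep_length:
        if len(y) < n:
            y = np.pad(y, (0, n - len(y)))  # zero-pad at the end (assumption)
        else:
            y = y[:n]                       # truncate the slowed-down signal
    return y.astype(x.dtype)


x = np.sin(2 * np.pi * 5 * np.arange(1000) / 1000)
fast = speed_perturb_sketch(x, ratio=1.1, keep_length=False)
padded = speed_perturb_sketch(x, ratio=1.1, keep_length=True)
```

Because the samples are only re-read at a different rate, pitch and tempo change together, as with sox-speed.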
class espnet.transform.perturb.VolumePerturbation(lower=-1.6, upper=1.6, utt2ratio=None, dbunit=True, seed=None)
Bases: object
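A sketch of what a volume perturbation with these defaults could look like, assuming that with dbunit=True the drawn ratio is a gain in decibels converted to amplitude as 10 ** (dB / 20). The function name and the ratio and rng arguments are hypothetical.

```python
import numpy as np


def volume_perturb_sketch(x, lower=-1.6, upper=1.6, dbunit=True,
                          ratio=None, rng=None):
    """Scale a waveform by a random gain (illustrative only)."""
    rng = np.random.default_rng() if rng is None else rng
    if ratio is None:
        ratio = rng.uniform(lower, upper)
    if dbunit:
        ratio = 10.0 ** (ratio / 20.0)  # dB to linear amplitude gain
    return x * ratio


x = np.array([0.1, -0.2, 0.3])
louder = volume_perturb_sketch(x, ratio=20.0)  # +20 dB is a tenfold gain
random_gain = volume_perturb_sketch(x, rng=np.random.default_rng(0))
```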
espnet.transform.add_deltas
espnet.transform.wpe
espnet.transform.channel_selector
espnet.transform.functional
class espnet.transform.functional.FuncTrans(**kwargs)
Bases: espnet.transform.transform_interface.TransformInterface
Functional Transformation
Warning
Built-in or C/C++ functions may not work properly because this class heavily depends on the inspect module.
Usage:
>>> def foo_bar(x, a=1, b=2):
...     '''Foo bar
...     :param x: input
...     :param int a: default 1
...     :param int b: default 2
...     '''
...     return x + a - b
>>> class FooBar(FuncTrans):
...     _func = foo_bar
...     __doc__ = foo_bar.__doc__
property func
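The pattern in the usage above, a class that wraps a plain function and takes its keyword defaults from the function signature, can be sketched with the inspect module. This is a simplified stand-in, not the FuncTrans source: FuncTransSketch and its error handling are assumptions.

```python
import inspect


def foo_bar(x, a=1, b=2):
    return x + a - b


class FuncTransSketch:
    """Wrap a function and bind its keyword arguments (illustrative only)."""

    _func = None

    def __init__(self, **kwargs):
        # Collect the defaults declared in the wrapped function's signature.
        params = inspect.signature(self.func).parameters
        defaults = {name: p.default for name, p in params.items()
                    if p.default is not inspect.Parameter.empty}
        unknown = set(kwargs) - set(defaults)
        if unknown:
            raise TypeError(f"unexpected arguments: {sorted(unknown)}")
        self.kwargs = {**defaults, **kwargs}

    @property
    def func(self):
        # Read from the class so the function is not bound as a method.
        return type(self)._func

    def __call__(self, x):
        return self.func(x, **self.kwargs)


class FooBar(FuncTransSketch):
    _func = foo_bar


default_result = FooBar()(10)
custom_result = FooBar(a=5)(10)
```

Signature introspection is also why built-in or C/C++ functions can be a problem: inspect.signature may be unable to report their parameters.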
espnet.transform.transform_interface
class espnet.transform.transform_interface.Identity
Bases: espnet.transform.transform_interface.TransformInterface
Identity function.
espnet.transform.__init__
Initialize main package.
espnet.transform.cmvn
espnet.transform.spectrogram
class espnet.transform.spectrogram.IStft(n_shift, win_length=None, window='hann', center=True)
Bases: object
class espnet.transform.spectrogram.LogMelSpectrogram(fs, n_mels, n_fft, n_shift, win_length=None, window='hann', fmin=None, fmax=None, eps=1e-10)
Bases: object
class espnet.transform.spectrogram.Spectrogram(n_fft, n_shift, win_length=None, window='hann')
Bases: object
class espnet.transform.spectrogram.Stft(n_fft, n_shift, win_length=None, window='hann', center=True, pad_mode='reflect')
Bases: object
class espnet.transform.spectrogram.Stft2LogMelSpectrogram(fs, n_mels, n_fft, fmin=None, fmax=None, eps=1e-10)
Bases: object
espnet.transform.spectrogram.istft(x, n_shift, win_length=None, window='hann', center=True)
espnet.transform.spectrogram.logmelspectrogram(x, fs, n_mels, n_fft, n_shift, win_length=None, window='hann', fmin=None, fmax=None, eps=1e-10, pad_mode='reflect')
espnet.transform.spectrogram.spectrogram(x, n_fft, n_shift, win_length=None, window='hann')
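For orientation, a magnitude spectrogram with the n_fft and n_shift parameters can be sketched in NumPy as below. This is not the module's implementation: the sketch uses numpy.hanning (a symmetric Hann window), does no centering or padding, and assumes win_length equals n_fft. The name spectrogram_sketch is hypothetical.

```python
import numpy as np


def spectrogram_sketch(x, n_fft, n_shift):
    """Return a (time, freq) magnitude spectrogram of a 1-D signal."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // n_shift  # requires len(x) >= n_fft
    # Slice overlapping frames, each shifted by n_shift samples, and window them.
    frames = np.stack([x[i * n_shift:i * n_shift + n_fft] * window
                       for i in range(n_frames)])
    # The real FFT keeps the n_fft // 2 + 1 non-negative frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1))


fs = 16000
tone = np.sin(2 * np.pi * 1000 * np.arange(fs) / fs)  # 1 kHz tone, 1 second
spec = spectrogram_sketch(tone, n_fft=512, n_shift=128)
```

Each frequency bin spans fs / n_fft = 31.25 Hz, so the 1 kHz tone falls in bin 32.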