espnet2.layers package
espnet2.layers.stft
class espnet2.layers.stft.Stft(n_fft: int = 512, win_length: int = None, hop_length: int = 128, window: Optional[str] = 'hann', center: bool = True, normalized: bool = False, onesided: bool = True)
Bases: torch.nn.modules.module.Module, espnet2.layers.inversible_interface.InversibleInterface
extra_repr()
Set the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
forward(input: torch.Tensor, ilens: torch.Tensor = None) → Tuple[torch.Tensor, Optional[torch.Tensor]]
STFT forward function.
- Parameters:
input – (Batch, Nsamples) or (Batch, Nsamples, Channels)
ilens – (Batch)
- Returns:
(Batch, Frames, Freq, 2) or (Batch, Frames, Channels, Freq, 2)
- Return type:
output
inverse(input: Union[torch.Tensor, torch_complex.tensor.ComplexTensor], ilens: torch.Tensor = None) → Tuple[torch.Tensor, Optional[torch.Tensor]]
Inverse STFT.
- Parameters:
input – Tensor(batch, T, F, 2) or ComplexTensor(batch, T, F)
ilens – (batch,)
- Returns:
(batch, samples), ilens: (batch,)
- Return type:
wavs
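A minimal usage sketch based on the shapes documented above; the round trip through inverse is illustrative, and ilens is optional in both calls:

```python
import torch
from espnet2.layers.stft import Stft

stft = Stft(n_fft=512, hop_length=128)
wav = torch.randn(2, 16000)              # (Batch, Nsamples)
ilens = torch.tensor([16000, 12000])
spec, spec_lens = stft(wav, ilens)       # (Batch, Frames, Freq, 2), frame lengths
wav_rec, _ = stft.inverse(spec)          # back to (batch, samples)
```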
espnet2.layers.time_warp
Time warp module.
class espnet2.layers.time_warp.TimeWarp(window: int = 80, mode: str = 'bicubic')
Bases: torch.nn.modules.module.Module
Time warping using torch.nn.functional.interpolate.
- Parameters:
window – time warp parameter (W in SpecAugment)
mode – interpolation mode (see torch.nn.functional.interpolate)
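A short sketch of applying TimeWarp to a batch of padded spectrograms; the (tensor, lengths) calling convention is assumed to match the other modules on this page:

```python
import torch
from espnet2.layers.time_warp import TimeWarp

warp = TimeWarp(window=80, mode="bicubic")
spec = torch.randn(4, 300, 80)                # (Batch, Frames, Freq)
lengths = torch.tensor([300, 260, 220, 180])  # all longer than 2 * window
warped, olens = warp(spec, lengths)           # shapes are preserved
```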
espnet2.layers.sinc_conv
Sinc convolutions.
class espnet2.layers.sinc_conv.BarkScale
Bases: object
Bark frequency scale.
Has wider bandwidths at lower frequencies; see Zwicker and Terhardt (1980) on the critical bandwidth (Bark).
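For orientation, the critical-band rate from the cited paper is often written as below; this mirrors Zwicker and Terhardt (1980), not necessarily the exact conversion inside BarkScale:

```python
import math

def hz_to_bark(f_hz: float) -> float:
    # Critical-band rate (Bark) after Zwicker & Terhardt (1980)
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)
```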
class espnet2.layers.sinc_conv.LogCompression
Bases: torch.nn.modules.module.Module
Log Compression Activation.
Activation function log(abs(x) + 1).
Initialize.
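The documented formula is simple enough to state directly; an equivalent standalone sketch:

```python
import torch

def log_compression(x: torch.Tensor) -> torch.Tensor:
    # The activation documented above: log(|x| + 1)
    return torch.log(torch.abs(x) + 1.0)
```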
class espnet2.layers.sinc_conv.MelScale
Bases: object
Mel frequency scale.
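As a point of reference, the widely used HTK-style mel mapping is shown below; the constants inside MelScale are not spelled out on this page, so treat this only as an illustration of a mel-scale conversion:

```python
import math

def hz_to_mel(f_hz: float) -> float:
    # HTK-style mel mapping (illustrative; may differ from MelScale's internals)
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)
```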
class espnet2.layers.sinc_conv.SincConv(in_channels: int, out_channels: int, kernel_size: int, stride: int = 1, padding: int = 0, dilation: int = 1, window_func: str = 'hamming', scale_type: str = 'mel', fs: Union[int, float] = 16000)
Bases: torch.nn.modules.module.Module
Sinc Convolution.
This module performs a convolution using sinc filters in the time domain as kernels. Sinc filters act as band passes in the spectral domain; the filtering is done as a convolution in the time domain, so no transformation to the spectral domain is necessary.
This implementation of the sinc convolution is heavily inspired by Ravanelli et al., https://github.com/mravanelli/SincNet, and adapted for the ESPnet toolkit. Combine sinc convolutions with a log compression activation function, as in: https://arxiv.org/abs/2010.07597
Notes: Currently, the same filters are applied to all input channels. The windowing function is applied to the kernel to obtain a smoother filter, not to the input values, which differs from traditional ASR.
Initialize Sinc convolutions.
- Parameters:
in_channels – Number of input channels.
out_channels – Number of output channels.
kernel_size – Sinc filter kernel size (needs to be an odd number).
stride – See torch.nn.functional.conv1d.
padding – See torch.nn.functional.conv1d.
dilation – See torch.nn.functional.conv1d.
window_func – Window function on the filter, one of ["hamming", "none"].
scale_type – Scale of the filter-bank frequencies, 'mel' by default (see the MelScale and BarkScale classes above).
fs – Sample rate of the input data.
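A construction sketch; the (Batch, Channels, Nsamples) input layout is assumed from the torch.nn.functional.conv1d references above:

```python
import torch
from espnet2.layers.sinc_conv import SincConv

conv = SincConv(in_channels=1, out_channels=128, kernel_size=101, fs=16000)
x = torch.randn(8, 1, 16000)   # (Batch, Channels, Nsamples), assumed conv1d layout
y = conv(x)                    # (Batch, out_channels, Frames)
```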
espnet2.layers.utterance_mvn
class espnet2.layers.utterance_mvn.UtteranceMVN(norm_means: bool = True, norm_vars: bool = False, eps: float = 1e-20)
espnet2.layers.utterance_mvn.utterance_mvn(x: torch.Tensor, ilens: torch.Tensor = None, norm_means: bool = True, norm_vars: bool = False, eps: float = 1e-20) → Tuple[torch.Tensor, torch.Tensor]
Apply utterance mean and variance normalization.
- Parameters:
x – (B, T, D), assumed zero padded
ilens – (B,)
norm_means – Apply mean normalization
norm_vars – Apply variance normalization
eps – Small constant added for numerical stability
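Usage follows directly from the signature above:

```python
import torch
from espnet2.layers.utterance_mvn import utterance_mvn

x = torch.randn(4, 100, 80)              # (B, T, D), zero padded beyond ilens
ilens = torch.tensor([100, 90, 75, 60])
x_norm, olens = utterance_mvn(x, ilens, norm_means=True, norm_vars=False)
```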
espnet2.layers.mask_along_axis
class espnet2.layers.mask_along_axis.MaskAlongAxis(mask_width_range: Union[int, Sequence[int]] = (0, 30), num_mask: int = 2, dim: Union[int, str] = 'time', replace_with_zero: bool = True)
Bases: torch.nn.modules.module.Module
class espnet2.layers.mask_along_axis.MaskAlongAxisVariableMaxWidth(mask_width_ratio_range: Union[float, Sequence[float]] = (0.0, 0.05), num_mask: int = 2, dim: Union[int, str] = 'time', replace_with_zero: bool = True)
Bases: torch.nn.modules.module.Module
Mask input spec along a specified axis with variable maximum width.
- Formula:
max_width = max_width_ratio * seq_len
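For example, with mask_width_ratio_range=(0.0, 0.05) and an utterance of seq_len = 1000 frames, the maximum mask width is 0.05 * 1000 = 50 frames.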
espnet2.layers.mask_along_axis.mask_along_axis(spec: torch.Tensor, spec_lengths: torch.Tensor, mask_width_range: Sequence[int] = (0, 30), dim: int = 1, num_mask: int = 2, replace_with_zero: bool = True)
Apply mask along the specified direction.
- Parameters:
spec – (Batch, Length, Freq)
spec_lengths – (Length): lengths are not used in this implementation
mask_width_range – Select the mask width randomly from this range
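A sketch of calling the function; the assumption that it returns a (spec, spec_lengths) pair mirrors the conventions of the other callables on this page:

```python
import torch
from espnet2.layers.mask_along_axis import mask_along_axis

spec = torch.randn(4, 300, 80)                # (Batch, Length, Freq)
spec_lengths = torch.tensor([300, 280, 250, 200])
masked, out_lengths = mask_along_axis(
    spec, spec_lengths, mask_width_range=(0, 30), dim=1, num_mask=2
)
```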
espnet2.layers.log_mel
class espnet2.layers.log_mel.LogMel(fs: int = 16000, n_fft: int = 512, n_mels: int = 80, fmin: float = None, fmax: float = None, htk: bool = False, log_base: float = None)
Bases: torch.nn.modules.module.Module
Convert STFT output to fbank features.
The arguments are the same as librosa.filters.mel.
- Parameters:
fs – number > 0 [scalar] sampling rate of the incoming signal
n_fft – int > 0 [scalar] number of FFT components
n_mels – int > 0 [scalar] number of Mel bands to generate
fmin – float >= 0 [scalar] lowest frequency (in Hz)
fmax – float >= 0 [scalar] highest frequency (in Hz). If None, use fmax = fs / 2.0
htk – use HTK formula instead of Slaney
log_base – base of the logarithm (natural logarithm if None)
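A usage sketch; feat is assumed to be an STFT-derived spectral feature with n_fft // 2 + 1 bins, matching the Stft output above:

```python
import torch
from espnet2.layers.log_mel import LogMel

logmel = LogMel(fs=16000, n_fft=512, n_mels=80)
feat = torch.rand(2, 150, 257)        # (Batch, Frames, n_fft // 2 + 1)
ilens = torch.tensor([150, 120])
mel, olens = logmel(feat, ilens)      # (Batch, Frames, n_mels)
```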
extra_repr()
Set the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
forward(feat: torch.Tensor, ilens: torch.Tensor = None) → Tuple[torch.Tensor, torch.Tensor]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
espnet2.layers.abs_normalize
class espnet2.layers.abs_normalize.AbsNormalize
Bases: torch.nn.modules.module.Module, abc.ABC
Initializes internal Module state, shared by both nn.Module and ScriptModule.
abstract forward(input: torch.Tensor, input_lengths: torch.Tensor = None) → Tuple[torch.Tensor, torch.Tensor]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
espnet2.layers.inversible_interface
espnet2.layers.__init__
espnet2.layers.global_mvn
class espnet2.layers.global_mvn.GlobalMVN(stats_file: Union[pathlib.Path, str], norm_means: bool = True, norm_vars: bool = True, eps: float = 1e-20)
Bases: espnet2.layers.abs_normalize.AbsNormalize, espnet2.layers.inversible_interface.InversibleInterface
Apply global mean and variance normalization.
TODO(kamo): Make this class portable somehow
- Parameters:
stats_file – npy file
norm_means – Apply mean normalization
norm_vars – Apply var normalization
eps – Small constant added for numerical stability
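A construction sketch; "feats_stats.npy" is a placeholder path for the statistics file produced during training, whose exact layout is not described on this page:

```python
import torch
from espnet2.layers.global_mvn import GlobalMVN

# "feats_stats.npy" is a hypothetical path; supply your own stats file.
normalize = GlobalMVN("feats_stats.npy", norm_means=True, norm_vars=True)
x = torch.randn(2, 120, 80)           # (B, T, D)
ilens = torch.tensor([120, 90])
x_norm, olens = normalize(x, ilens)
```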
extra_repr()
Set the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
espnet2.layers.label_aggregation
class espnet2.layers.label_aggregation.LabelAggregate(win_length: int = 512, hop_length: int = 128, center: bool = True)
Bases: torch.nn.modules.module.Module
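No docstring is given for this class; the sketch below assumes a sample-level label input of shape (Batch, Nsamples, Label_dim) that is aggregated to frame resolution using the Stft-like windowing parameters above. Both the layout and the return pair are assumptions:

```python
import torch
from espnet2.layers.label_aggregation import LabelAggregate

agg = LabelAggregate(win_length=512, hop_length=128, center=True)
labels = torch.randint(0, 2, (2, 16000, 3)).float()  # (Batch, Nsamples, Label_dim), assumed
ilens = torch.tensor([16000, 12000])
frame_labels, olens = agg(labels, ilens)             # assumed frame-level output
```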