espnet2.diar package¶
espnet2.diar.abs_diar¶
-
class
espnet2.diar.abs_diar.
AbsDiarization
[source]¶ Bases:
torch.nn.modules.module.Module
,abc.ABC
Initializes internal Module state, shared by both nn.Module and ScriptModule.
-
abstract
forward
(input: torch.Tensor, ilens: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor, collections.OrderedDict][source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
-
abstract
forward_rawwav
(input: torch.Tensor, ilens: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor, collections.OrderedDict][source]¶
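The note on calling the Module instance rather than forward can be illustrated with a minimal sketch. TinyModule below is a hypothetical, pure-Python stand-in that mimics only the hook dispatch of torch.nn.Module (it is not the real class), and EchoDiarization with its return values is made up for illustration. It shows only why hooks run when the instance is called and are skipped when forward is called directly.

```python
from abc import ABC, abstractmethod
from collections import OrderedDict


class TinyModule(ABC):
    """Hypothetical stand-in that mimics only the hook dispatch of nn.Module."""

    def __init__(self):
        self._forward_hooks = []

    def register_forward_hook(self, hook):
        self._forward_hooks.append(hook)

    @abstractmethod
    def forward(self, input, ilens):
        raise NotImplementedError

    def __call__(self, *args):
        output = self.forward(*args)
        # Hooks fire here, which is why the instance should be called.
        for hook in self._forward_hooks:
            hook(self, args, output)
        return output


class EchoDiarization(TinyModule):
    """Toy subclass: returns its input, the lengths, and an empty dict."""

    def forward(self, input, ilens):
        return input, ilens, OrderedDict()


calls = []
model = EchoDiarization()
model.register_forward_hook(lambda module, args, output: calls.append("hook"))

model([[0.1, 0.2]], [2])          # instance call: the hook runs
model.forward([[0.1, 0.2]], [2])  # direct call: the hook is skipped
print(len(calls))
```

Only the instance call reaches the hook, so calls holds a single entry after both invocations.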
espnet2.diar.espnet_model¶
-
class
espnet2.diar.espnet_model.
ESPnetDiarizationModel
(frontend: Optional[espnet2.asr.frontend.abs_frontend.AbsFrontend], specaug: Optional[espnet2.asr.specaug.abs_specaug.AbsSpecAug], normalize: Optional[espnet2.layers.abs_normalize.AbsNormalize], label_aggregator: torch.nn.modules.module.Module, encoder: espnet2.asr.encoder.abs_encoder.AbsEncoder, decoder: espnet2.diar.decoder.abs_decoder.AbsDecoder, attractor: Optional[espnet2.diar.attractor.abs_attractor.AbsAttractor], diar_weight: float = 1.0, attractor_weight: float = 1.0)[source]¶ Bases:
espnet2.train.abs_espnet_model.AbsESPnetModel
Speaker Diarization model
If “attractor” is None, SA-EEND is used; otherwise, EEND-EDA is used. For details on SA-EEND and EEND-EDA, refer to the following papers: SA-EEND: https://arxiv.org/pdf/1909.06247.pdf; EEND-EDA: https://arxiv.org/pdf/2005.09921.pdf, https://arxiv.org/pdf/2106.10654.pdf
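The choice between the two variants, and the role of the diar_weight and attractor_weight arguments, can be sketched as follows. The helper names are hypothetical, and the weighted-sum form of the loss is an assumption made for illustration; the reference above does not state how the two weights are applied.

```python
def select_variant(attractor):
    """Return the model variant implied by the attractor argument."""
    return "SA-EEND" if attractor is None else "EEND-EDA"


def combine_losses(loss_diar, loss_att=0.0, diar_weight=1.0, attractor_weight=1.0):
    """Weighted sum of the two loss terms (assumed form, not from the reference)."""
    return diar_weight * loss_diar + attractor_weight * loss_att


print(select_variant(None))
print(combine_losses(2.0, loss_att=0.5, attractor_weight=0.1))
```

With no attractor, only the diarization term contributes, so the total reduces to diar_weight times the diarization loss.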
-
collect_feats
(speech: torch.Tensor, speech_lengths: torch.Tensor, spk_labels: torch.Tensor = None, spk_labels_lengths: torch.Tensor = None, **kwargs) → Dict[str, torch.Tensor][source]¶
-
encode
(speech: torch.Tensor, speech_lengths: torch.Tensor, bottleneck_feats: torch.Tensor, bottleneck_feats_lengths: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]¶ Frontend + Encoder
- Parameters:
speech – (Batch, Length, …)
speech_lengths – (Batch,)
bottleneck_feats – (Batch, Length, …): used for enh + diar
-
forward
(speech: torch.Tensor, speech_lengths: torch.Tensor = None, spk_labels: torch.Tensor = None, spk_labels_lengths: torch.Tensor = None, **kwargs) → Tuple[torch.Tensor, Dict[str, torch.Tensor], torch.Tensor][source]¶ Frontend + Encoder + Decoder + Calc loss
- Parameters:
speech – (Batch, samples)
speech_lengths – (Batch,). Defaults to None for the chunk iterator, because the chunk iterator does not return speech_lengths. See espnet2/iterators/chunk_iter_factory.py
spk_labels – (Batch, )
kwargs – additional inputs; “utt_id” is among them.
-
espnet2.diar.label_processor¶
espnet2.diar.__init__¶
espnet2.diar.attractor.abs_attractor¶
-
class
espnet2.diar.attractor.abs_attractor.
AbsAttractor
[source]¶ Bases:
torch.nn.modules.module.Module
,abc.ABC
Initializes internal Module state, shared by both nn.Module and ScriptModule.
-
abstract
forward
(enc_input: torch.Tensor, ilens: torch.Tensor, dec_input: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
espnet2.diar.attractor.rnn_attractor¶
-
class
espnet2.diar.attractor.rnn_attractor.
RnnAttractor
(encoder_output_size: int, layer: int = 1, unit: int = 512, dropout: float = 0.1, attractor_grad: bool = True)[source]¶ Bases:
espnet2.diar.attractor.abs_attractor.AbsAttractor
Encoder-decoder attractor for speaker diarization
-
forward
(enc_input: torch.Tensor, ilens: torch.Tensor, dec_input: torch.Tensor)[source]¶ Forward.
- Parameters:
enc_input (torch.Tensor) – hidden_space [Batch, T, F]
ilens (torch.Tensor) – input lengths [Batch]
dec_input (torch.Tensor) – decoder input (zeros) [Batch, num_spk + 1, F]
- Returns:
attractor – [Batch, num_spk + 1, F]
att_prob – [Batch, num_spk + 1, 1]
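The shapes above imply one existence probability per attractor, with num_spk + 1 attractors per utterance. A minimal sketch of how such probabilities might be turned into a speaker count is shown below. The function names, the 0.5 threshold, and the stop-at-first-low-probability rule are assumptions for illustration, and plain nested lists stand in for tensors.

```python
def make_dec_input(batch, num_spk, feat_dim):
    """Zero-filled decoder input of shape [Batch, num_spk + 1, F] as nested lists."""
    return [[[0.0] * feat_dim for _ in range(num_spk + 1)] for _ in range(batch)]


def estimate_num_spk(att_prob, threshold=0.5):
    """Count leading attractors whose probability stays at or above the threshold."""
    num_spk = 0
    for prob in att_prob:
        if prob < threshold:
            break  # later attractors are ignored once one falls below
        num_spk += 1
    return num_spk


dec_input = make_dec_input(batch=2, num_spk=3, feat_dim=4)
print(len(dec_input), len(dec_input[0]), len(dec_input[0][0]))
print(estimate_num_spk([0.9, 0.8, 0.2, 0.7]))
```

The extra attractor slot (the "+ 1") gives the model a position where the probability is expected to drop, which is what lets the count stop.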
-
espnet2.diar.attractor.__init__¶
espnet2.diar.layers.abs_mask¶
-
class
espnet2.diar.layers.abs_mask.
AbsMask
[source]¶ Bases:
torch.nn.modules.module.Module
,abc.ABC
Initializes internal Module state, shared by both nn.Module and ScriptModule.
-
abstract
forward
(input, ilens, bottleneck_feat, num_spk) → Tuple[Tuple[torch.Tensor], torch.Tensor, collections.OrderedDict][source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
-
abstract property
max_num_spk
¶
espnet2.diar.layers.multi_mask¶
-
class
espnet2.diar.layers.multi_mask.
MultiMask
(input_dim: int, bottleneck_dim: int = 128, max_num_spk: int = 3, mask_nonlinear='relu')[source]¶ Bases:
espnet2.diar.layers.abs_mask.AbsMask
Multiple 1x1 convolution layer Module.
This module corresponds to the final 1x1 conv block and non-linear function in TCNSeparator. This module has multiple 1x1 conv blocks. One of them is selected according to the given num_spk to handle flexible num_spk.
- Parameters:
input_dim – Number of filters in autoencoder
bottleneck_dim – Number of channels in bottleneck 1 * 1-conv block
max_num_spk – Number of mask_conv1x1 modules (>= Max number of speakers in the dataset)
mask_nonlinear – which non-linear function to use to generate the mask
-
forward
(input: Union[torch.Tensor, torch_complex.tensor.ComplexTensor], ilens: torch.Tensor, bottleneck_feat: torch.Tensor, num_spk: int) → Tuple[List[Union[torch.Tensor, torch_complex.tensor.ComplexTensor]], torch.Tensor, collections.OrderedDict][source]¶ Keep this API the same as TasNet.
- Parameters:
input – [M, K, N], M is batch size
ilens (torch.Tensor) – (M,)
bottleneck_feat – [M, K, B]
num_spk – number of speakers (training: oracle; inference: estimated by another module, e.g. EEND-EDA)
- Returns:
masked (List[Union(torch.Tensor, ComplexTensor)]) – [(M, K, N), …]
ilens (torch.Tensor) – (M,)
others – predicted data, e.g. masks: OrderedDict[
'mask_spk1': torch.Tensor(Batch, Frames, Freq),
'mask_spk2': torch.Tensor(Batch, Frames, Freq),
…
'mask_spkn': torch.Tensor(Batch, Frames, Freq),
]
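The selection of one 1x1 conv block by num_spk and the layout of the returned mask dictionary can be sketched without tensors. The helper names are hypothetical, and the assumption that the head for z speakers emits z * input_dim channels is for illustration only; integers and strings stand in for the real modules and masks.

```python
from collections import OrderedDict


def head_out_channels(input_dim, max_num_spk):
    """One head per speaker count; head z is assumed to emit z * input_dim channels."""
    return [z * input_dim for z in range(1, max_num_spk + 1)]


def select_head(heads, num_spk):
    """Pick the head that matches num_spk (1-based speaker count)."""
    if not 1 <= num_spk <= len(heads):
        raise ValueError(f"num_spk must be in [1, {len(heads)}], got {num_spk}")
    return heads[num_spk - 1]


def build_others(masks):
    """Name the masks 'mask_spk1', 'mask_spk2', ... in speaker order."""
    return OrderedDict((f"mask_spk{i + 1}", mask) for i, mask in enumerate(masks))


heads = head_out_channels(input_dim=256, max_num_spk=3)
print(select_head(heads, num_spk=2))
print(list(build_others(["m1", "m2"]).keys()))
```

Keeping max_num_spk at or above the largest speaker count in the dataset guarantees that select_head never runs out of heads.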
-
property
max_num_spk
¶
espnet2.diar.layers.tcn_nomask¶
-
class
espnet2.diar.layers.tcn_nomask.
ChannelwiseLayerNorm
(channel_size)[source]¶ Bases:
torch.nn.modules.module.Module
Channel-wise Layer Normalization (cLN).
-
class
espnet2.diar.layers.tcn_nomask.
Chomp1d
(chomp_size)[source]¶ Bases:
torch.nn.modules.module.Module
To ensure the output length is the same as the input.
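The length bookkeeping behind this can be checked with plain arithmetic. The sketch below uses the standard output-length formula for a 1-D convolution and assumes a causal setup in which (kernel_size - 1) * dilation frames of padding are added on both sides, so the output is longer than the input by exactly that amount; the function names are hypothetical.

```python
def conv1d_out_len(length, kernel_size, dilation, padding, stride=1):
    """Output length of a 1-D convolution (standard Conv1d length formula)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def chomp(sequence, chomp_size):
    """Drop the last chomp_size frames so the length matches the input again."""
    return sequence[:-chomp_size] if chomp_size > 0 else sequence


frames = list(range(100))
kernel_size, dilation = 3, 4
padding = (kernel_size - 1) * dilation  # assumed causal padding

padded_len = conv1d_out_len(len(frames), kernel_size, dilation, padding)
restored = chomp(list(range(padded_len)), padding)
print(padded_len, len(restored))
```

In the non-causal case, a padding of (kernel_size - 1) * dilation // 2 already preserves the length, so nothing needs to be trimmed.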
-
class
espnet2.diar.layers.tcn_nomask.
DepthwiseSeparableConv
(in_channels, out_channels, kernel_size, stride, padding, dilation, norm_type='gLN', causal=False)[source]¶ Bases:
torch.nn.modules.module.Module
-
class
espnet2.diar.layers.tcn_nomask.
GlobalLayerNorm
(channel_size)[source]¶ Bases:
torch.nn.modules.module.Module
Global Layer Normalization (gLN).
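The difference between cLN and gLN lies in which axes the statistics are taken over: cLN normalizes each time step over the channels, while gLN normalizes over channels and time together. The sketch below illustrates this on nested lists laid out as [channels][time]. That layout, the function names, and the epsilon value are assumptions, and the learnable gain and bias are omitted.

```python
import math

EPS = 1e-8  # assumed small constant for numerical stability


def _normalize(values):
    """Zero-mean, unit-variance normalization of a flat list."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + EPS) for v in values]


def cln(x):
    """Channel-wise: statistics per time step, taken over channels."""
    n_ch, n_t = len(x), len(x[0])
    out = [[0.0] * n_t for _ in range(n_ch)]
    for t in range(n_t):
        column = _normalize([x[c][t] for c in range(n_ch)])
        for c in range(n_ch):
            out[c][t] = column[c]
    return out


def gln(x):
    """Global: one set of statistics over all channels and time steps."""
    n_t = len(x[0])
    flat = _normalize([v for row in x for v in row])
    return [flat[c * n_t:(c + 1) * n_t] for c in range(len(x))]


x = [[1.0, 2.0], [3.0, 6.0]]
print(cln(x))
print(gln(x))
```

Because cLN never looks at other time steps, it suits causal processing, whereas gLN needs the whole sequence.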
-
class
espnet2.diar.layers.tcn_nomask.
TemporalBlock
(in_channels, out_channels, kernel_size, stride, padding, dilation, norm_type='gLN', causal=False)[source]¶ Bases:
torch.nn.modules.module.Module
-
class
espnet2.diar.layers.tcn_nomask.
TemporalConvNet
(N, B, H, P, X, R, norm_type='gLN', causal=False)[source]¶ Bases:
torch.nn.modules.module.Module
Basic module of TasNet.
- Parameters:
N – Number of filters in autoencoder
B – Number of channels in bottleneck 1 * 1-conv block
H – Number of channels in convolutional blocks
P – Kernel size in convolutional blocks
X – Number of convolutional blocks in each repeat
R – Number of repeats
norm_type – BN, gLN, cLN
causal – causal or non-causal
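How X, R, and P interact can be sketched with a small calculation. It assumes the Conv-TasNet convention that the dilation doubles with each block inside a repeat (1, 2, 4, ...) and that every block uses stride 1; the function names are hypothetical and the parameter letters follow the list above.

```python
def dilation_schedule(X, R):
    """Dilations of all blocks: 2**x for x in range(X), repeated R times."""
    return [2 ** x for _ in range(R) for x in range(X)]


def receptive_field(P, X, R):
    """Frames seen by one output frame; each block adds (P - 1) * dilation."""
    return 1 + sum((P - 1) * d for d in dilation_schedule(X, R))


schedule = dilation_schedule(X=8, R=3)
print(schedule[:8])
print(len(schedule), receptive_field(P=3, X=8, R=3))
```

Doubling the dilation lets the receptive field grow exponentially with X while the number of parameters grows only linearly.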
espnet2.diar.layers.__init__¶
espnet2.diar.separator.__init__¶
espnet2.diar.separator.tcn_separator_nomask¶
-
class
espnet2.diar.separator.tcn_separator_nomask.
TCNSeparatorNomask
(input_dim: int, layer: int = 8, stack: int = 3, bottleneck_dim: int = 128, hidden_dim: int = 512, kernel: int = 3, causal: bool = False, norm_type: str = 'gLN')[source]¶ Bases:
espnet2.enh.separator.abs_separator.AbsSeparator
Temporal Convolution Separator
Note that this separator is equivalent to TCNSeparator except that it has no mask estimation part. It outputs the intermediate bottleneck feats, which are used as the input to the diarization branch in the enh_diar task. This separator is followed by the MultiMask module, which estimates the masks.
- Parameters:
input_dim – input feature dimension
layer – int, number of layers in each stack.
stack – int, number of stacks
bottleneck_dim – bottleneck dimension
hidden_dim – number of convolution channel
kernel – int, kernel size.
causal – bool, default False.
norm_type – str, choose from ‘BN’, ‘gLN’, ‘cLN’
-
forward
(input: Union[torch.Tensor, torch_complex.tensor.ComplexTensor], ilens: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]¶ Forward.
- Parameters:
input (torch.Tensor or ComplexTensor) – Encoded feature [B, T, N]
ilens (torch.Tensor) – input lengths [Batch]
- Returns:
feats (torch.Tensor) – [B, T, bottleneck_dim]
ilens (torch.Tensor) – (B,)
-
property
num_spk
¶
-
property
output_dim
¶
espnet2.diar.decoder.abs_decoder¶
-
class
espnet2.diar.decoder.abs_decoder.
AbsDecoder
[source]¶ Bases:
torch.nn.modules.module.Module
,abc.ABC
Initializes internal Module state, shared by both nn.Module and ScriptModule.
-
abstract
forward
(input: torch.Tensor, ilens: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
-
abstract property
num_spk
¶
espnet2.diar.decoder.__init__¶
espnet2.diar.decoder.linear_decoder¶
-
class
espnet2.diar.decoder.linear_decoder.
LinearDecoder
(encoder_output_size: int, num_spk: int = 2)[source]¶ Bases:
espnet2.diar.decoder.abs_decoder.AbsDecoder
Linear decoder for speaker diarization
-
forward
(input: torch.Tensor, ilens: torch.Tensor)[source]¶ Forward.
- Parameters:
input (torch.Tensor) – hidden_space [Batch, T, F]
ilens (torch.Tensor) – input lengths [Batch]
-
property
num_spk
¶