espnet2.slu package
espnet2.slu.espnet_model
class espnet2.slu.espnet_model.ESPnetSLUModel(vocab_size: int, token_list: Union[Tuple[str, ...], List[str]], frontend: Optional[espnet2.asr.frontend.abs_frontend.AbsFrontend], specaug: Optional[espnet2.asr.specaug.abs_specaug.AbsSpecAug], normalize: Optional[espnet2.layers.abs_normalize.AbsNormalize], preencoder: Optional[espnet2.asr.preencoder.abs_preencoder.AbsPreEncoder], encoder: espnet2.asr.encoder.abs_encoder.AbsEncoder, postencoder: Optional[espnet2.asr.postencoder.abs_postencoder.AbsPostEncoder], decoder: espnet2.asr.decoder.abs_decoder.AbsDecoder, ctc: espnet2.asr.ctc.CTC, joint_network: Optional[torch.nn.modules.module.Module], postdecoder: Optional[espnet2.slu.postdecoder.abs_postdecoder.AbsPostDecoder] = None, deliberationencoder: Optional[espnet2.asr.postencoder.abs_postencoder.AbsPostEncoder] = None, transcript_token_list: Union[Tuple[str, ...], List[str]] = None, ctc_weight: float = 0.5, interctc_weight: float = 0.0, ignore_id: int = -1, lsm_weight: float = 0.0, length_normalized_loss: bool = False, report_cer: bool = True, report_wer: bool = True, sym_space: str = '<space>', sym_blank: str = '<blank>', extract_feats_in_collect_stats: bool = True, two_pass: bool = False, pre_postencoder_norm: bool = False)
Bases: espnet2.asr.espnet_model.ESPnetASRModel
CTC-attention hybrid encoder-decoder model for spoken language understanding (SLU).
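The `ctc_weight` and `interctc_weight` arguments control how the CTC and attention-decoder losses are mixed. The stdlib sketch below assumes the SLU model keeps the weighting of its parent `ESPnetASRModel` (a convex combination of the two losses, with intermediate CTC folded into the CTC branch); `hybrid_loss` is a hypothetical helper operating on plain floats rather than tensors.

```python
from typing import Optional


def hybrid_loss(
    loss_ctc: float,
    loss_att: float,
    ctc_weight: float = 0.5,
    loss_interctc: Optional[float] = None,
    interctc_weight: float = 0.0,
) -> float:
    """Weighted sum of CTC and attention losses (illustrative sketch, not ESPnet's code)."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    # Assumed: an intermediate-layer CTC loss is mixed into the CTC branch first.
    if loss_interctc is not None and interctc_weight > 0.0:
        loss_ctc = (1.0 - interctc_weight) * loss_ctc + interctc_weight * loss_interctc
    # ctc_weight = 1.0 -> pure CTC; ctc_weight = 0.0 -> pure attention decoder.
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att


print(hybrid_loss(2.0, 4.0))  # 0.5 * 2.0 + 0.5 * 4.0 = 3.0
```

With the default `ctc_weight=0.5` both branches contribute equally; setting it to 1.0 or 0.0 reduces the objective to a single branch.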
collect_feats(speech: torch.Tensor, speech_lengths: torch.Tensor, text: torch.Tensor, text_lengths: torch.Tensor, transcript: torch.Tensor = None, transcript_lengths: torch.Tensor = None, **kwargs) → Dict[str, torch.Tensor]
encode(speech: torch.Tensor, speech_lengths: torch.Tensor, transcript_pad: torch.Tensor = None, transcript_pad_lens: torch.Tensor = None) → Tuple[torch.Tensor, torch.Tensor]
Frontend + Encoder. Note that this method is used by asr_inference.py.
- Parameters:
speech – (Batch, Length, …)
speech_lengths – (Batch, )
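The (Batch, Length, …) and (Batch,) shapes pair a padded batch with the true length of each utterance. The stdlib sketch below shows how such a pair might be formed from raw waveforms of different lengths; `pad_batch` is a hypothetical stand-in for the data loader's collate step, and the real inputs are `torch.Tensor` objects rather than lists.

```python
from typing import List, Sequence, Tuple


def pad_batch(
    seqs: Sequence[Sequence[float]], pad_value: float = 0.0
) -> Tuple[List[List[float]], List[int]]:
    """Right-pad variable-length sequences to (Batch, Length) and return their lengths."""
    lengths = [len(s) for s in seqs]
    max_len = max(lengths, default=0)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in seqs]
    return padded, lengths


speech, speech_lengths = pad_batch([[0.1, 0.2, 0.3], [0.4]])
print(speech)          # [[0.1, 0.2, 0.3], [0.4, 0.0, 0.0]]
print(speech_lengths)  # [3, 1]
```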
forward(speech: torch.Tensor, speech_lengths: torch.Tensor, text: torch.Tensor, text_lengths: torch.Tensor, transcript: torch.Tensor = None, transcript_lengths: torch.Tensor = None, **kwargs) → Tuple[torch.Tensor, Dict[str, torch.Tensor], torch.Tensor]
Frontend + Encoder + Decoder + loss calculation.
- Parameters:
speech – (Batch, Length, …)
speech_lengths – (Batch, )
text – (Batch, Length)
text_lengths – (Batch,)
kwargs – additional inputs; “utt_id” is among them.
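`text` is a padded (Batch, Length) tensor, and the model's `ignore_id` (default -1) is the value that marks target positions to skip. Assuming targets are padded with `ignore_id`, this stdlib sketch shows how such positions are left out of a token-accuracy metric; `masked_accuracy` is a hypothetical helper, not part of ESPnet's API.

```python
from typing import Sequence


def masked_accuracy(
    pred_ids: Sequence[Sequence[int]],
    target_ids: Sequence[Sequence[int]],
    ignore_id: int = -1,
) -> float:
    """Token accuracy over positions whose target is not ignore_id."""
    correct = total = 0
    for pred_row, target_row in zip(pred_ids, target_ids):
        for p, t in zip(pred_row, target_row):
            if t == ignore_id:
                continue  # padded position: excluded from numerator and denominator
            total += 1
            correct += int(p == t)
    return correct / total if total else 0.0


# Batch of 2; the second target is padded with ignore_id = -1.
text = [[5, 7, 9], [3, -1, -1]]
pred = [[5, 7, 2], [3, 4, 4]]
print(masked_accuracy(pred, text))  # 3 correct out of 4 real tokens -> 0.75
```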
espnet2.slu.__init__
espnet2.slu.postencoder.conformer_postencoder
Conformer PostEncoder.
class espnet2.slu.postencoder.conformer_postencoder.ConformerPostEncoder(input_size: int, output_size: int = 256, attention_heads: int = 4, linear_units: int = 2048, num_blocks: int = 6, dropout_rate: float = 0.1, positional_dropout_rate: float = 0.1, attention_dropout_rate: float = 0.0, input_layer: str = 'linear', normalize_before: bool = True, concat_after: bool = False, positionwise_layer_type: str = 'linear', positionwise_conv_kernel_size: int = 3, macaron_style: bool = False, rel_pos_type: str = 'legacy', pos_enc_layer_type: str = 'rel_pos', selfattention_layer_type: str = 'rel_selfattn', activation_type: str = 'swish', use_cnn_module: bool = True, zero_triu: bool = False, cnn_module_kernel: int = 31, padding_idx: int = -1)
Bases: espnet2.asr.postencoder.abs_postencoder.AbsPostEncoder
Conformer PostEncoder.
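`cnn_module_kernel` (default 31) sets the kernel size of the Conformer convolution module. Assuming that module uses a stride-1 depthwise convolution with symmetric padding of (k - 1) / 2, as is common in Conformer implementations, the sequence length is preserved. This worked arithmetic checks it using the standard 1-D convolution output-length formula; both helpers are hypothetical.

```python
def conv1d_output_length(length: int, kernel_size: int, padding: int, stride: int = 1) -> int:
    """Standard 1-D convolution output length: floor((L + 2p - k) / s) + 1."""
    return (length + 2 * padding - kernel_size) // stride + 1


def same_padding(kernel_size: int) -> int:
    """Symmetric padding that keeps the length unchanged for an odd kernel (assumed convention)."""
    if kernel_size % 2 == 0:
        raise ValueError("an odd kernel size is needed for symmetric 'same' padding")
    return (kernel_size - 1) // 2


k = 31  # default cnn_module_kernel
p = same_padding(k)
print(p, conv1d_output_length(100, k, p))  # 15 100
```

An even kernel cannot be padded symmetrically to keep the length, which is why the sketch rejects it.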
espnet2.slu.postencoder.__init__
espnet2.slu.postencoder.transformer_postencoder
Encoder definition.
class espnet2.slu.postencoder.transformer_postencoder.TransformerPostEncoder(input_size: int, output_size: int = 256, attention_heads: int = 4, linear_units: int = 2048, num_blocks: int = 6, dropout_rate: float = 0.1, positional_dropout_rate: float = 0.1, attention_dropout_rate: float = 0.0, input_layer: Optional[str] = 'linear', pos_enc_class=<class 'espnet.nets.pytorch_backend.transformer.embedding.PositionalEncoding'>, normalize_before: bool = True, concat_after: bool = False, positionwise_layer_type: str = 'linear', positionwise_conv_kernel_size: int = 1, padding_idx: int = -1)
Bases: espnet2.asr.postencoder.abs_postencoder.AbsPostEncoder
Transformer encoder module.
- Parameters:
input_size – input dim
output_size – dimension of attention
attention_heads – the number of heads of multi head attention
linear_units – the number of units of position-wise feed forward
num_blocks – the number of encoder blocks
dropout_rate – dropout rate
attention_dropout_rate – dropout rate in attention
positional_dropout_rate – dropout rate after adding positional encoding
input_layer – input layer type
pos_enc_class – PositionalEncoding or ScaledPositionalEncoding
normalize_before – whether to use layer_norm before the first block
concat_after – whether to concatenate the attention layer’s input and output. If True, an additional linear layer is applied, i.e. x -> x + linear(concat(x, att(x))). If False, no additional linear layer is applied, i.e. x -> x + att(x).
positionwise_layer_type – linear or conv1d
positionwise_conv_kernel_size – kernel size of positionwise conv1d layer
padding_idx – padding_idx for input_layer=embed
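The `concat_after` option changes the residual connection around each attention sub-layer. This sketch implements the two update rules quoted above on plain Python lists; `toy_att` and `toy_linear` are hypothetical stand-ins for the attention layer and for the extra linear layer, which maps the 2·d concatenated features back to d.

```python
from typing import Callable, List

Vector = List[float]


def residual_attention(
    x: Vector,
    att: Callable[[Vector], Vector],
    linear: Callable[[Vector], Vector],
    concat_after: bool,
) -> Vector:
    """Apply x + linear(concat(x, att(x))) if concat_after, else x + att(x)."""
    a = att(x)
    if concat_after:
        concat = x + a  # list concatenation: length 2 * d
        update = linear(concat)  # projected back to length d
    else:
        update = a
    return [xi + ui for xi, ui in zip(x, update)]


def toy_att(x: Vector) -> Vector:
    return [2.0 * v for v in x]  # hypothetical stand-in for self-attention


def toy_linear(z: Vector) -> Vector:
    d = len(z) // 2
    return [z[i] + z[d + i] for i in range(d)]  # hypothetical 2d -> d projection


x = [1.0, 2.0]
print(residual_attention(x, toy_att, toy_linear, concat_after=False))  # [3.0, 6.0]
print(residual_attention(x, toy_att, toy_linear, concat_after=True))   # [4.0, 8.0]
```

The extra linear layer lets the block learn how to mix the input and the attention output, at the cost of a 2·d × d weight matrix per layer.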
forward(xs_pad: torch.Tensor, ilens: torch.Tensor, prev_states: torch.Tensor = None) → Tuple[torch.Tensor, torch.Tensor, Optional[torch.Tensor]]
Embed positions in tensor.
- Parameters:
xs_pad – input tensor (B, L, D)
ilens – input length (B)
prev_states – not used at present.
- Returns:
position embedded tensor and mask
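The mask records which of the L positions in `xs_pad` are real frames, given `ilens`. Assuming the common ESPnet convention of a (B, 1, L) mask that is True at non-padded positions, this stdlib sketch builds such a mask from `ilens` and recovers the lengths by summing it; `non_pad_mask` is a hypothetical helper.

```python
from typing import List, Sequence


def non_pad_mask(ilens: Sequence[int], max_len: int = 0) -> List[List[List[bool]]]:
    """(B, 1, L) boolean mask: True at valid frames, False at padding."""
    max_len = max(max_len, max(ilens, default=0))
    return [[[t < n for t in range(max_len)]] for n in ilens]


mask = non_pad_mask([3, 1])
print(mask)   # [[[True, True, True]], [[True, False, False]]]
olens = [sum(row[0]) for row in mask]
print(olens)  # [3, 1]
```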
espnet2.slu.postdecoder.abs_postdecoder
class espnet2.slu.postdecoder.abs_postdecoder.AbsPostDecoder
Bases: torch.nn.modules.module.Module, abc.ABC
Initializes internal Module state, shared by both nn.Module and ScriptModule.
abstract forward(transcript_input_ids: torch.LongTensor, transcript_attention_mask: torch.LongTensor, transcript_token_type_ids: torch.LongTensor, transcript_position_ids: torch.LongTensor) → torch.Tensor
Defines the computation performed at every call.
Should be overridden by all subclasses.
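Because `forward` is abstract, a concrete post-decoder must override it before it can be instantiated. This stdlib sketch shows that contract using `abc`; `AbsPostDecoderSketch` and `MaskedIdPostDecoder` are hypothetical names, and `torch.nn.Module` is deliberately left out so the example runs without PyTorch (the real base class also inherits from `torch.nn.Module`).

```python
import abc
from typing import List, Sequence


class AbsPostDecoderSketch(abc.ABC):
    """Stdlib stand-in for an abstract post-decoder interface (no torch)."""

    @abc.abstractmethod
    def forward(
        self,
        transcript_input_ids: Sequence[Sequence[int]],
        transcript_attention_mask: Sequence[Sequence[int]],
        transcript_token_type_ids: Sequence[Sequence[int]],
        transcript_position_ids: Sequence[Sequence[int]],
    ) -> List[List[float]]:
        """Map transcript token inputs to per-token features."""

    def __call__(self, *args, **kwargs):
        # torch.nn.Module.__call__ would additionally run registered hooks here.
        return self.forward(*args, **kwargs)


class MaskedIdPostDecoder(AbsPostDecoderSketch):
    """Hypothetical subclass: one scalar feature per token, zeroed where masked."""

    def forward(self, transcript_input_ids, transcript_attention_mask,
                transcript_token_type_ids, transcript_position_ids):
        return [
            [float(tok * m) for tok, m in zip(ids, mask)]
            for ids, mask in zip(transcript_input_ids, transcript_attention_mask)
        ]


dec = MaskedIdPostDecoder()
print(dec([[101, 7, 0]], [[1, 1, 0]], [[0, 0, 0]], [[0, 1, 2]]))  # [[101.0, 7.0, 0.0]]
try:
    AbsPostDecoderSketch()
except TypeError:
    print("abstract class cannot be instantiated")
```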
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
espnet2.slu.postdecoder.__init__
espnet2.slu.postdecoder.hugging_face_transformers_postdecoder
Hugging Face Transformers PostDecoder.
class espnet2.slu.postdecoder.hugging_face_transformers_postdecoder.HuggingFaceTransformersPostDecoder(model_name_or_path: str, output_size=256)
Bases: espnet2.slu.postdecoder.abs_postdecoder.AbsPostDecoder
Hugging Face Transformers PostDecoder.
Initialize the module.
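The four `transcript_*` arguments of `AbsPostDecoder.forward` mirror the input names used by BERT-style Hugging Face models. Assuming that convention (pad id 0, a single segment, positions counted from 0), this stdlib sketch prepares a padded batch of those inputs. `build_transcript_inputs` and the example token ids are hypothetical; in practice the tokenizer matching `model_name_or_path` would produce the ids.

```python
from typing import Dict, List, Sequence


def build_transcript_inputs(
    token_ids: Sequence[Sequence[int]], pad_id: int = 0
) -> Dict[str, List[List[int]]]:
    """Pad token-id lists and derive attention mask, token type ids and position ids."""
    max_len = max((len(ids) for ids in token_ids), default=0)
    input_ids, attention_mask, token_type_ids, position_ids = [], [], [], []
    for ids in token_ids:
        n_pad = max_len - len(ids)
        input_ids.append(list(ids) + [pad_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)  # 1 = real token, 0 = padding
        token_type_ids.append([0] * max_len)  # single segment
        position_ids.append(list(range(max_len)))
    return {
        "transcript_input_ids": input_ids,
        "transcript_attention_mask": attention_mask,
        "transcript_token_type_ids": token_type_ids,
        "transcript_position_ids": position_ids,
    }


batch = build_transcript_inputs([[101, 2023, 102], [101, 102]])
print(batch["transcript_attention_mask"])  # [[1, 1, 1], [1, 1, 0]]
```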