// Tomoki Hayashi

About Me

Name:
- Tomoki Hayashi (Ph. D)
Affiliation:
- COO @ Human Dataware Lab. Co., Ltd., Japan
- Postdoctoral researcher @ Nagoya University, Japan
- Researcher @ TARVO Inc., Japan
Research Interests:
- Speech processing
  - Speech synthesis
  - Speech recognition
  - Voice conversion
- Environmental sound processing
  - Sound event detection
  - Anomalous sound detection
- Time series processing
  - Demand forecasting
  - Anomaly detection

Bio

Short Bio

Tomoki Hayashi received the B.E. degree in engineering and the M.E. and Ph.D. degrees in information science from Nagoya University, Aichi, Japan, in 2014, 2016, and 2019, respectively. His research interests include statistical speech and audio signal processing. He is currently a postdoctoral researcher at Nagoya University and the chief operating officer of Human Dataware Lab. Co., Ltd. He is one of the main developers of the end-to-end speech processing toolkit ESPnet. He received the IEEE SPS Japan 2020 Young Author Best Paper Award and the Itakura Award from the Acoustical Society of Japan.

Education

Apr. 2010 - Mar. 2014
- School of Engineering, Nagoya University, Japan
  - B.E. degree in Engineering, 2014
  - Supervisors: Kazuya Takeda, Norihide Kitaoka
Apr. 2014 - Mar. 2019
- Graduate School of Information Science, Nagoya University, Japan
  - Master's degree in Information Science, 2016
  - Doctoral degree in Information Science, 2019
  - Supervisors: Kazuya Takeda, Tomoki Toda

Work / Research experience

Aug. 2014 - Sep. 2014
- NTT Communication Science Laboratories (Keihanna, Japan)
  - Research internship
  - Supervisor: Shoko Araki
Aug. 2016 - Nov. 2016
- Mitsubishi Electric Research Laboratories (Boston, USA)
  - Research internship
  - Supervisor: Shinji Watanabe
Oct. 2017 - Dec. 2017
- NEC Corporation (Kanagawa, Japan)
  - Research internship
  - Supervisor: Tatsuya Komatsu
Nov. 2015 - Now
- Human Dataware Lab. Co., Ltd. (Nagoya, Japan)
  - Chief operating officer
Apr. 2019 - Now
- Nagoya University (Nagoya, Japan)
  - Postdoctoral researcher
Feb. 2020 - Now
- TARVO Inc. (Nagoya, Japan)
  - Researcher

Memberships

The Institute of Electrical and Electronics Engineers, Inc. (IEEE), Member
International Speech Communication Association (ISCA), Member
The Acoustical Society of Japan (ASJ), Member

Publications

Google Scholar

Awards

DCASE2022 Challenge Task 2 Judge’s award, Nov. 2022.
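Acoustical Society of Japan Itakura Prize Innovative Young Researcher Award, Mar. 2021.
IEEE Signal Processing Society Japan Young Author Best Paper Award, Dec. 2020.
DCASE2020 Challenge Task 2 Judge’s award, Nov. 2020.
Acoustical Society of Japan Tokai Chapter Best Presentation Award, Jul. 2015.
Student Excellent Presentation Award, Acoustical Society of Japan Autumn Meeting, Sep. 2014.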

Tutorials / Invited talks

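T. Hayashi, R. Yamamoto, K. Inoue, T. Yoshimura, K. Takeda, T. Toda, S. Watanabe, “ESPnet-TTS: an open-source toolkit to accelerate research on end-to-end speech synthesis,” ASJ Spring Meeting, Special Session “End-to-end speech synthesis and related topics,” Mar. 2020 (invited talk).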
T. Hori, T. Hayashi, S. Karita, and S. Watanabe, “Advanced Methods for Neural End-to-End Speech Processing - Unification, Integration, and Implementation,” Interspeech Tutorial, Sep. 2019.
T. Toda, K. Kobayashi, and T. Hayashi, “Statistical voice conversion with direct waveform modeling,” Interspeech Tutorial, Sep. 2019.

Review papers

T. Hayashi, “Overview of end-to-end speech processing and its practice using ESPnet2,” Journal of the Acoustical Society of Japan, Vol. 76, No. 12, pp. 720-729, Dec. 2020.
T. Hayashi, T. Toda, “Acoustic event detection based on statistical methods,” Journal of the Acoustical Society of Japan, Vol. 75, No. 9, pp. 532-537, Sep. 2019.
K. Miyazaki, T. Toda, T. Hayashi, K. Takeda, “Environmental sound processing and its applications,” IEEJ Transactions on Electronics, Information and Systems, Vol. 14, No. 3, pp. 340-351, Mar. 2019.

Journal papers

W.-C. Huang, T. Hayashi, Y.-C. Wu, H. Kameoka, T. Toda, “A comparative study of self-supervised speech representation based voice conversion,” IEEE Journal of Selected Topics in Signal Processing, Vol. 16, No. 6, pp. 1308-1318, 2022.
Y.-C. Wu, T. Hayashi, P.L. Tobing, K. Kobayashi, T. Toda, “Quasi-periodic WaveNet: an autoregressive raw waveform generative model with pitch-dependent dilated convolution neural network,” IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 29, pp. 1134-1148, 2021.
Y.-C. Wu, T. Hayashi, T. Okamoto, H. Kawai, T. Toda, “Quasi-periodic parallel WaveGAN: a non-autoregressive raw waveform generative model with pitch-dependent dilated convolution neural network,” IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 29, pp. 792-806, 2021.
W.-C. Huang, T. Hayashi, Y.-C. Wu, H. Kameoka, T. Toda, “Pretraining techniques for sequence-to-sequence voice conversion,” IEEE/ACM Transactions on Audio, Speech and Language Processing, pp. 745-755, 2021.
P.L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, T. Toda, “An evaluation of voice conversion with neural network spectral mapping models and WaveNet vocoder,” APSIPA Transactions on Signal and Information Processing, Vol. 9, e26, pp. 1-14, Nov. 2020.
Y.-C. Wu, P.L. Tobing, T. Hayashi, K. Kobayashi, T. Toda, “Non-parallel voice conversion system with WaveNet vocoder and collapsed speech suppression,” IEEE Access, Vol. 8, No. 1, pp. 62094-62106, Apr. 2020.
P.L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, T. Toda, “Voice conversion with CycleRNN-based spectral mapping and finely tuned WaveNet vocoder,” IEEE Access, Vol. 7, No. 1, pp. 171114-171125, Dec. 2019.
A. Tamamori, T. Hayashi, T. Toda, K. Takeda, “Daily activity recognition based on recurrent neural network using multi-modal signals,” APSIPA Transactions on Signal and Information Processing, Vol. 7, e21, pp. 1-11, Dec. 2018.
T. Hayashi, M. Nishida, N. Kitaoka, T. Toda, K. Takeda, “Daily activity recognition with large-scaled real-life recording datasets based on deep neural network using multi-modal signals,” IEICE Transactions on Fundamentals, Vol. E101-A, No. 1, pp. 199-210, Jan. 2018.
S. Watanabe, T. Hori, S. Kim, J. R. Hershey, T. Hayashi, “Hybrid CTC/Attention architecture for end-to-end speech recognition,” IEEE Journal of Selected Topics in Signal Processing, Vol. 11, No. 8, pp. 1240-1253, Dec. 2017.
T. Hayashi, S. Watanabe, T. Toda, T. Hori, J. Le Roux, K. Takeda, “Duration-controlled LSTM for polyphonic sound event detection,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 25, No. 11, pp. 2059-2070, Nov. 2017. [IEEE Signal Processing Society Japan 2020 Young Author Best Paper Award]

International conference

M. Someki, Y. Higuchi, T. Hayashi, S. Watanabe, “ESPnet-ONNX: Bridging a Gap Between Research and Production,” Proc. APSIPA, pp. 420-427, Nov. 2022.
J. Shi, S. Guo, T. Qian, N. Huo, T. Hayashi, Y. Wu, F. Xu, X. Chang, H. Li, P. Wu, S. Watanabe, Q. Jin, “Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis,” Proc. INTERSPEECH, pp. 4277-4281, Sep. 2022.
S. Kim, T. Hayashi, T. Toda, “Note-level automatic guitar transcription using attention mechanism,” Proc. EUSIPCO, pp. 229-233, Sep. 2022.
I. Kuroyanagi, T. Hayashi, K. Takeda, T. Toda, “Improvement of serial approach to anomalous sound detection by incorporating two binary cross-entropies for outlier exposure,” Proc. EUSIPCO, pp. 294-298, Sep. 2022.
W.-C. Huang, S.-W. Yang, T. Hayashi, H.-Y. Lee, S. Watanabe, T. Toda, “S3PRL-VC: open-source voice conversion framework with self-supervised speech representations,” Proc. ICASSP, pp. 6552-6556, May 2022.
T. Hayashi, K. Kobayashi, T. Toda, “An investigation of streaming non-autoregressive sequence-to-sequence voice conversion,” Proc. ICASSP, pp. 6802-6806, May 2022.
W.-C. Huang, T. Hayashi, X. Li, S. Watanabe, T. Toda, “On prosody modeling for ASR+TTS based voice conversion,” Proc. IEEE ASRU, pp. 642-649, Dec. 2021.
I. Kuroyanagi, T. Hayashi, Y. Adachi, T. Yoshimura, K. Takeda, T. Toda, “An ensemble approach to anomalous sound detection based on conformer-based autoencoder and binary classifier incorporated with metric learning,” Proc. DCASE 2021 Workshop, pp. 110-114, Nov. 2021.
T. Komatsu, S. Watanabe, K. Miyazaki, T. Hayashi, “Acoustic Event Detection with Classifier Chains,” Proc. INTERSPEECH, pp. 601-605, 2021.
I. Kuroyanagi, T. Hayashi, K. Takeda, T. Toda, “Anomalous sound detection using a binary classification model and class centroids,” Proc. EUSIPCO, pp. 1995-1999, 2021.
T. Hayashi, T. Yoshimura, M. Inuzuka, I. Kuroyanagi, O. Segawa, “Spontaneous speech summarization: Transformers all the way through,” Proc. EUSIPCO, pp. 456-460, 2021.
T. Hayashi, W.-C. Huang, K. Kobayashi, T. Toda, “Non-autoregressive sequence-to-sequence voice conversion,” Proc. ICASSP, pp. 7068-7072, 2021.
P. Guo, F. Boyer, X. Chang, T. Hayashi, Y. Higuchi, H. Inaguma, N. Kamo, C. Li, D. G. Romero, J. Shi, J. Shi, S. Watanabe, K. Wei, W. Zhang, Y. Zhang, “Recent Developments on ESPnet Toolkit Boosted by Conformer,” Proc. ICASSP, pp. 5874-5878, 2021.
K. Kobayashi, W.-C. Huang, Y.-C. Wu, P.L. Tobing, T. Hayashi, T. Toda, “Crank: an open-source software for nonparallel voice conversion based on vector-quantized variational autoencoder,” Proc. ICASSP, pp. 5934-5938, 2021.
W.-C. Huang, Y.-C. Wu, T. Hayashi, T. Toda, “Any-to-one sequence-to-sequence voice conversion using self-supervised discrete speech representations,” Proc. ICASSP, pp. 5944-5948, 2021.
C. Li, J. Shi, W. Zhang, A. S. Subramanian, X. Chang, N. Kamo, M. Hira, T. Hayashi, C. Boeddeker, Z. Chen, S. Watanabe, “ESPnet-SE: End-to-end speech enhancement and separation toolkit designed for ASR integration,” Proc. IEEE SLT, pp. 785-792, Dec. 2020.
K. Miyazaki, T. Komatsu, T. Hayashi, S. Watanabe, T. Toda, K. Takeda, “Conformer-based sound event detection with semi-supervised learning and data augmentation,” Proc. DCASE 2020 Workshop, pp. 100-104, Nov. 2020.
W.-C. Huang, T. Hayashi, S. Watanabe, T. Toda, “The sequence-to-sequence baseline for the Voice Conversion Challenge 2020: cascading ASR and TTS,” Proc. Joint workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, pp. 160-164, Oct. 2020.
W.-C. Huang, T. Hayashi, Y.-C. Wu, H. Kameoka, T. Toda, “Voice transformer network: sequence-to-sequence voice conversion using transformer with text-to-speech pretraining,” Proc. INTERSPEECH, pp. 4675-4680, Oct. 2020.
Y.-C. Wu, T. Hayashi, T. Okamoto, H. Kawai, T. Toda, “Quasi-periodic parallel WaveGAN vocoder: a non-autoregressive pitch-dependent dilated convolution model for parametric speech generation,” Proc. INTERSPEECH, pp. 3535-3539, Oct. 2020.
S. Hikosaka, S. Seki, T. Hayashi, K. Kobayashi, K. Takeda, H. Banno, T. Toda, “Intelligibility enhancement based on speech waveform modification using hearing impairment simulator,” Proc. INTERSPEECH, pp. 4059-4063, Oct. 2020.
P.L. Tobing, T. Hayashi, Y.-C. Wu, K. Kobayashi, T. Toda, “Cyclic spectral modeling for unsupervised unit discovery into voice conversion with excitation and waveform modeling,” Proc. INTERSPEECH, pp. 3540-3544, Oct. 2020.
H. Inaguma, S. Kiyono, K. Duh, S. Karita, N. E. Yalta Soplin, T. Hayashi, S. Watanabe, “ESPnet-ST: All-in-One Speech Translation Toolkit,” Proc. the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 302-311, Full virtual, Jul. 2020.
T. Yoshimura, T. Hayashi, K. Takeda, S. Watanabe, “End-to-end automatic speech recognition integrated with CTC-based voice activity detection,” Proc. IEEE ICASSP, pp. 6999-7003, Full virtual, May 2020.
K. Inoue, S. Hara, M. Abe, T. Hayashi, R. Yamamoto, S. Watanabe, “Semi-Supervised Speaker Adaptation for End-to-End Speech Synthesis with Pretrained Models,” Proc. IEEE ICASSP, pp. 7634-7638, Full virtual, May 2020.
K. Miyazaki, T. Komatsu, T. Hayashi, S. Watanabe, T. Toda, K. Takeda, “Weakly-supervised sound event detection with self-attention,” Proc. IEEE ICASSP, pp. 66-70, Full virtual, May 2020.
P.L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, T. Toda, “Efficient shallow WaveNet vocoder using multiple samples output based on Laplacian distribution and linear prediction,” Proc. IEEE ICASSP, pp. 7204-7208, Full virtual, May 2020.
T. Hayashi, R. Yamamoto, K. Inoue, T. Yoshimura, S. Watanabe, T. Toda, K. Takeda, Y. Zhang, X. Tan, “ESPnet-TTS: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit,” Proc. IEEE ICASSP, pp. 7654-7658, Full virtual, May 2020.
S. Karita, N. Chen, T. Hayashi, T. Hori, H. Inaguma, Z. Jiang, M. Someki, N. E. Yalta Soplin, R. Yamamoto, X. Wang, S. Watanabe, T. Yoshimura, and W. Zhang, “A comparative study on Transformer vs RNN in speech applications,” Proc. IEEE ASRU, pp. 449-456, Sentosa, Singapore, Dec. 2019.
P.L. Tobing, T. Hayashi, T. Toda, “Investigation of shallow WaveNet vocoder with Laplacian distribution output,” Proc. IEEE ASRU, pp. 176-183, Sentosa, Singapore, Dec. 2019.
O. Segawa, T. Hayashi, K. Takeda, “Attention-Based Speech Recognition Using Gaze Information,” Proc. IEEE ASRU, pp. 465-470, Sentosa, Singapore, Dec. 2019.
Y.-C. Wu, P.L. Tobing, T. Hayashi, K. Kobayashi, T. Toda, “Statistical voice conversion with quasi-periodic WaveNet vocoder,” Proc. 10th ISCA Speech Synthesis Workshop (SSW10), pp. 63-68, Vienna, Austria, Sep. 2019.
Y.-C. Wu, T. Hayashi, P.L. Tobing, K. Kobayashi, T. Toda, “Quasi-periodic WaveNet vocoder: a pitch dependent dilated convolution model for parametric speech generation,” Proc. INTERSPEECH, pp. 196-200, Graz, Austria, Sep. 2019.
P.L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, T. Toda, “Non-parallel voice conversion with cyclic variational autoencoder,” Proc. INTERSPEECH, pp. 674-678, Graz, Austria, Sep. 2019.
W.-C. Huang, Y.-C. Wu, C.-C. Lo, P.L. Tobing, T. Hayashi, K. Kobayashi, T. Toda, Y. Tsao, H.-M. Wang, “Investigation of F0 conditioning and fully convolutional networks in variational autoencoder based voice conversion,” Proc. INTERSPEECH, pp. 709-713, Graz, Austria, Sep. 2019.
T. Hayashi, S. Watanabe, T. Toda, K. Takeda, S. Toshniwal, K. Livescu, “Pre-trained text embeddings for enhanced text-to-speech synthesis,” Proc. INTERSPEECH, pp. 4430-4434, Graz, Austria, Sep. 2019.
W.-C. Huang, Y.-C. Wu, H.-T. Hwang, P.L. Tobing, T. Hayashi, K. Kobayashi, T. Toda, Y. Tsao, H.-M. Wang, “Refined WaveNet vocoder for variational autoencoder based voice conversion,” Proc. EUSIPCO, 5 pages, A Coruna, Spain, Sep. 2019.
T. Hori, R. Astudillo, T. Hayashi, Y. Zhang, S. Watanabe, and J. Le Roux, “Cycle-consistency training for end-to-end speech recognition,” Proc. IEEE ICASSP, pp. 6271-6275, Brighton, UK, May 2019.
T. Komatsu, T. Hayashi, R. Kondo, T. Toda, K. Takeda, “Scene-dependent anomalous acoustic-event detection based on conditional WaveNet and i-Vector,” Proc. IEEE ICASSP, pp. 870-874, Brighton, UK, May 2019.
P.L. Tobing, Y. Wu, T. Hayashi, K. Kobayashi, T. Toda, “Voice conversion with cyclic recurrent neural network and fine-tuned WaveNet vocoder,” Proc. IEEE ICASSP, pp. 6815-6819, Brighton, UK, May 2019.
T. Hayashi, S. Watanabe, Y. Zhang, T. Toda, T. Hori, R. Astudillo, K. Takeda, “Back-translation-style data augmentation for end-to-end ASR,” Proc. IEEE SLT, pp. 426-433, Dec. 2018.
P. L. Tobing, T. Hayashi, Y. Wu, K. Kobayashi, T. Toda, “An evaluation of deep spectral mappings and WaveNet vocoder for voice conversion,” Proc. IEEE SLT, pp. 297-303, Dec. 2018.
K. Miyazaki, T. Hayashi, T. Toda, K. Takeda, “Connectionist temporal classification-based sound event encoder for converting sound events into onomatopoeia representations,” Proc. EUSIPCO, pp. 857-861, Sep. 2018.
T. Hayashi, T. Komatsu, R. Kondo, T. Toda, K. Takeda, “Anomalous sound event detection based on WaveNet,” Proc. EUSIPCO, pp. 2508-2512, Sep. 2018.
T. Hayashi, S. Watanabe, T. Toda, K. Takeda, “Multi-head decoder for end-to-end speech recognition,” Proc. INTERSPEECH, pp. 801-805, Sep. 2018.
Y. Wu, K. Kobayashi, T. Hayashi, P. L. Tobing, T. Toda, “Collapsed segment detection and reduction for WaveNet vocoder,” Proc. INTERSPEECH, pp. 1988-1992, Sep. 2018.
Y. Wu, P. L. Tobing, T. Hayashi, K. Kobayashi, T. Toda, “The NU non-parallel voice conversion system for the voice conversion challenge 2018,” Proc. Odyssey 2018, pp. 211-218, June 2018.
P. L. Tobing, Y. Wu, T. Hayashi, K. Kobayashi, T. Toda, “NU voice conversion system for the voice conversion challenge 2018,” Proc. Odyssey 2018, pp. 219-226, June 2018.
T. Hayashi, A. Tamamori, K. Kobayashi, K. Takeda, T. Toda, “An investigation of multi-speaker training for WaveNet vocoder,” Proc. ASRU, pp. 712-718, Dec. 2017.
A. Tamamori, T. Hayashi, T. Toda, K. Takeda, “Investigation of effectiveness on recurrent neural network for daily activity recognition using multi-modal signals,” Proc. APSIPA, 7 pages, Kuala Lumpur, Malaysia, Dec. 2017.
A. Tamamori, T. Hayashi, K. Kobayashi, K. Takeda, T. Toda, “Speaker-dependent WaveNet vocoder,” Proc. INTERSPEECH, pp. 1118-1122, Aug. 2017.
K. Kobayashi, T. Hayashi, A. Tamamori, T. Toda, “Statistical voice conversion with WaveNet-based waveform generation,” Proc. INTERSPEECH, pp. 1138-1142, Aug. 2017.
T. Hayashi, S. Watanabe, T. Toda, T. Hori, J. Le Roux, K. Takeda, “BLSTM-HMM hybrid system combined with sound activity detection network for polyphonic sound event detection,” Proc. ICASSP, pp. 766-770, Mar. 2017.
A. Tamamori, T. Hayashi, T. Toda, K. Takeda, “Investigation on recurrent neural network architectures for daily activity recognition,” Proc. UV2016, Oct. 2016.
T. Hayashi, S. Watanabe, T. Toda, T. Hori, J. Le Roux, K. Takeda, “Bidirectional LSTM-HMM hybrid system for polyphonic sound event detection,” Proc. DCASE2016 workshop, 5 pages, Sep. 2016.
S. Araki, T. Hayashi, M. Delcroix, M. Fujimoto, K. Takeda, T. Nakatani, “Exploring multi-channel features for denoising-autoencoder-based speech enhancement,” Proc. ICASSP, pp. 116-120, Apr. 2015.
T. Hayashi, S. Watanabe, T. Toda, T. Hori, J. Le Roux, K. Takeda, “Convolutional bidirectional long short-term memory hidden Markov model hybrid system for polyphonic sound event detection,” Proc. 5th Joint Meeting of the Acoustical Society of America and Acoustical Society of Japan, Dec. 2016.
H. Erdogan, T. Hayashi, J. R. Hershey, T. Hori, C. Hori, W. Hsu, S. Kim, J. Le Roux, Z. Meng, S. Watanabe, “Multi-channel speech recognition: LSTMs all the way through,” Proc. CHiME4 workshop, 2016.
T. Hayashi, M. Nishida, N. Kitaoka, K. Takeda, “Daily activity recognition based on DNN using environmental sound and acceleration signals,” Proc. EUSIPCO, pp. 2351-2355, Sep. 2015.
N. Kitaoka, T. Hayashi, K. Takeda, “Noisy speech recognition using blind spatial subtraction array technique and deep bottleneck features,” Proc. APSIPA, 4 pages, Dec. 2014.
T. Hayashi, N. Kitaoka, C. Miyajima, K. Takeda, “Investigating the robustness of deep bottleneck features for recognizing speech of speakers of various ages,” Proc. FORUM ACUSTICUM, 4 pages, Sep. 2014.

Domestic conference

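I. Kuroyanagi, T. Hayashi, K. Takeda, T. Toda, “Anomalous sound detection using a binary classification model that considers class centroids in the feature space,” IEICE Technical Report, Vol. 120, No. 397, EA2020-79, pp. 114-121, Mar. 2021.
K. Miyazaki, T. Komatsu, T. Hayashi, S. Watanabe, T. Toda, K. Takeda, “Weakly supervised acoustic event detection using self-attention,” Proc. ASJ Meeting, 1-1-5, pp. 181-182, Mar. 2020.
S. Hikosaka, K. Kobayashi, T. Hayashi, S. Seki, K. Takeda, H. Banno, T. Toda, “Hearing aid filter design using a hearing impairment simulator,” Proc. ASJ Meeting, 1-6-6, pp. 567-568, Sep. 2019.
K. Yasuhara, T. Hayashi, T. Toda, “An investigation of WaveNet vocoder training for end-to-end text-to-speech synthesis,” Proc. ASJ Meeting, 1-4-9, pp. 951-952, Sep. 2019.
S. Hikosaka, K. Kobayashi, T. Hayashi, S. Seki, K. Takeda, H. Banno, T. Toda, “Intelligibility improvement based on speech waveform modification using a hearing impairment simulator,” IEICE Technical Report, Vol. 119, No. 188, SP2019-13, pp. 25-29, Aug. 2019.
K. Yasuhara, T. Hayashi, T. Toda, “A study on WaveNet vocoder training for end-to-end text-to-speech synthesis,” IEICE Technical Report, Vol. 119, No. 188, SP2019-14, pp. 31-36, Aug. 2019.
T. Hayashi, S. Watanabe, T. Toda, K. Takeda, “Multi-head decoder network for end-to-end speech recognition,” Proc. ASJ Meeting, pp. 925-926, Sep. 2018.
S. Seki, T. Hayashi, K. Takeda, T. Toda, “WaveNet-based waveform generation from amplitude spectrograms,” Proc. ASJ Meeting, pp. 281-282, Sep. 2018.
K. Miyazaki, T. Hayashi, T. Toda, K. Takeda, “Symbolization of sound events into onomatopoeia representations based on an end-to-end approach,” IEICE Technical Report, Vol. 118, No. 198, SP2018-30, pp. 37-42, Aug. 2018.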
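Y. Wu, P.L. Tobing, T. Hayashi, K. Kobayashi, T. Toda, “Development of NU non-parallel voice conversion system for Voice Conversion Challenge 2018,” Proc. ASJ Meeting, pp. 217-218, Mar. 2018.
P. L. Tobing, Y. Wu, T. Hayashi, K. Kobayashi, T. Toda, “Development of NU voice conversion system for Voice Conversion Challenge 2018,” Proc. ASJ Meeting, pp. 215-216, Mar. 2018.
T. Hayashi, K. Kobayashi, A. Tamamori, K. Takeda, T. Toda, “An investigation of the effect of training data size on WaveNet vocoders,” Proc. ASJ Meeting, pp. 249-250, Mar. 2018.
T. Hayashi, K. Kobayashi, A. Tamamori, K. Takeda, T. Toda, “An investigation of multi-speaker WaveNet vocoders,” IEICE Technical Report, Vol. 117, No. 393, SP2017-81, pp. 81-86, Jan. 2018.
K. Kobayashi, T. Hayashi, A. Tamamori, T. Toda, “Statistical voice conversion using a WaveNet vocoder,” IEICE Technical Report, Vol. 117, No. 393, SP2017-82, pp. 87-92, Jan. 2018.
S. Noda, T. Hayashi, T. Toda, K. Takeda, “Construction of speaker- and environment-dependent acoustic models for non-audible murmur recognition based on DNN adaptation,” IEICE Technical Report, Vol. 117, No. 368, EA2017-56, pp. 7-10, Dec. 2017.
K. Miyazaki, T. Hayashi, T. Toda, K. Takeda, “CTC-based conversion from acoustic events to onomatopoeia representations,” Proc. ASJ Meeting, pp. 19-20, Sep. 2017.
T. Hayashi, A. Tamamori, K. Kobayashi, K. Takeda, T. Toda, “A study on the use of multi-speaker speech data for WaveNet vocoder training,” Proc. ASJ Meeting, pp. 285-286, Sep. 2017.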
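S. Watanabe, T. Hori, T. Hayashi, S. Kim, “End-to-end Japanese speech recognition without morphological analysis, dictionaries, or language models,” Proc. ASJ Meeting, pp. 89-90, Mar. 2017.
S. Noda, T. Hayashi, T. Toda, K. Takeda, “DNN acoustic model training using normal speech for non-audible murmur recognition,” Proc. ASJ Meeting, pp. 89-90, Mar. 2017.
T. Hayashi, S. Watanabe, T. Toda, T. Hori, J. Le Roux, K. Takeda, “Polyphonic acoustic event detection using a BLSTM-HMM hybrid model integrated with event segment detection,” Proc. ASJ Meeting, pp. 45-46, Mar. 2017.
T. Hayashi, S. Watanabe, T. Toda, T. Hori, J. Le Roux, K. Takeda, “Polyphonic acoustic event detection using a BLSTM-HSMM hybrid model with explicit event duration control,” IEICE Technical Report, Vol. 117, No. 138, EA2017-2, pp. 9-14, Jul. 2017.
A. Tamamori, T. Hayashi, T. Toda, K. Takeda, “WaveNet-based speech waveform synthesis considering the speech production process,” IEICE Technical Report, Vol. 116, No. 477, SP2016-77, pp. 1-6, Mar. 2017.
A. Tamamori, T. Hayashi, T. Toda, K. Takeda, “An investigation of recurrent neural network architectures for daily activity recognition,” Proc. ASJ Meeting, pp. 1-2, Sep. 2016.
A. Tamamori, T. Hayashi, T. Toda, K. Takeda, “Daily activity recognition based on deep recurrent neural networks,” IEICE Technical Report, Vol. 116, No. 189, SP2016-28, pp. 7-12, Aug. 2016.
T. Hayashi, N. Kitaoka, T. Toda, K. Takeda, “Adaptation methods for daily activity recognition based on deep neural networks,” IEICE Technical Report, Vol. 116, No. 189, SP2016-27, pp. 1-6, Aug. 2016.
T. Hayashi, K. Ohtani, K. Takeda, “A study on DNN-based quality enhancement of lossy-compressed audio,” ASJ Autumn Meeting, pp. 537-538, Sep. 2015.
S. Araki, T. Hayashi, et al., “Speech enhancement using a denoising autoencoder with multi-channel features,” Proc. ASJ Meeting, pp. 685-686, Mar. 2015.
T. Hayashi, M. Nishida, N. Kitaoka, K. Takeda, “DNN-based daily activity recognition using environmental sound and acceleration signals,” Proc. ASJ Meeting, pp. 83-86, Mar. 2015.
T. Hayashi, N. Kitaoka, K. Takeda, “An investigation of the robustness of deep-learning-based speech features against age variation,” Proc. ASJ Meeting, pp. 77-80, Sep. 2014.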

Others

I. Kuroyanagi, T. Hayashi, K. Takeda, T. Toda, “Two-stage anomalous sound detection systems using domain generalization and specialization techniques,” DCASE2022 Challenge technical report, Jul. 2022.
T. Hayashi, R. Yamamoto, T. Yoshimura, P. Wu, J. Shi, T. Saeki, Y. Ju, Y. Yasuda, S. Takamichi, S. Watanabe, “ESPnet2-TTS: Extending the edge of TTS research,” arXiv preprint arXiv:2110.07840, 2021.
J. Shi, X. Chang, T. Hayashi, Y. J. Lu, S. Watanabe, “Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem,” arXiv preprint arXiv:2112.09382, 2021.
I. Kuroyanagi, T. Hayashi, Y. Adachi, T. Yoshimura, K. Takeda, T. Toda, “Anomalous Sound Detection with Ensemble of Autoencoder and Binary Classification Approaches,” DCASE2021 Challenge technical report, Jul. 2021.
C. Narisetty, T. Hayashi, R. Ishizaki, S. Watanabe, K. Takeda, “Leveraging State-of-the-art ASR Techniques to Audio Captioning,” DCASE2021 Challenge technical report, Jul. 2021.
T. Hayashi, S. Watanabe, “DiscreTalk: Text-to-speech as a machine translation problem,” arXiv preprint arXiv:2005.05525, 2020.
T. Hayashi, T. Yoshimura, Y. Adachi, “Conformer-Based ID-Aware Autoencoder for Unsupervised Anomalous Sound Detection,” DCASE2020 Challenge technical report, Jul. 2020.

Software

Please do not ask me about the codebases by e-mail; use GitHub issues instead.

ESPnet: End-to-end speech processing toolkit.
ParallelWaveGAN: Unofficial ParallelWaveGAN (+ various GAN vocoders) implementation.
PytorchWaveNetVocoder: WaveNet vocoder with noise shaping implementation.