Authors
Daiki Takeuchi, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol. 41, no. 5, pp. 769-775, 2020-09-01 (Released: 2020-09-01)
Number of references
39

In recent single-channel speech enhancement, deep neural networks (DNNs) have played an important role in achieving high performance. One standard use of a DNN is to construct a mask-generating function for time-frequency (T-F) masking. To apply a mask in the T-F domain, the short-time Fourier transform (STFT) is usually used because it is well understood and invertible. While the mask-generating regression function has been studied for a long time, there has been less research on the T-F transform from the viewpoint of speech enhancement. Since the performance of speech enhancement depends on both the T-F mask estimator and the T-F transform, investigating the T-F transform should be beneficial for designing a better enhancement system. In this paper, as a step toward an optimal T-F transform for speech enhancement, we experimentally investigated the effect of the STFT parameter settings on a DNN-based mask estimator. We conducted the experiments using three types of DNN architectures with three types of loss functions, and the results suggested that U-Net is robust to the parameter setting, whereas fully connected and BLSTM networks are not.
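The masking pipeline described in the abstract (STFT, multiply by a T-F mask, invert) can be illustrated with a minimal NumPy sketch. It assumes a periodic Hann window, a weighted overlap-add inverse, and a hypothetical oracle Wiener-like mask computed from the true clean and noise signals as a stand-in for the DNN mask estimator; the paper's networks, loss functions, and exact STFT settings are not reproduced here, and the function names and the two (window length, hop) pairs are illustrative assumptions.

```python
import numpy as np

def periodic_hann(win_len):
    # Periodic Hann window: sample win_len + 1 points of the symmetric window, drop the last.
    return np.hanning(win_len + 1)[:-1]

def stft(x, win_len, hop):
    """Return a (frames, win_len // 2 + 1) complex spectrogram of x."""
    window = periodic_hann(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop:i * hop + win_len] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(X, win_len, hop, length):
    """Weighted overlap-add inverse; exact for an unmodified STFT wherever frames cover x."""
    window = periodic_hann(win_len)
    frames = np.fft.irfft(X, n=win_len, axis=1)
    y = np.zeros(length)
    wsum = np.zeros(length)
    for i, frame in enumerate(frames):
        start = i * hop
        y[start:start + win_len] += frame * window
        wsum[start:start + win_len] += window ** 2
    covered = wsum > 1e-8
    y[covered] /= wsum[covered]
    return y

def oracle_wiener_mask(clean, noise, win_len, hop):
    # Hypothetical stand-in for a DNN mask estimator: uses the true clean and noise signals.
    S = np.abs(stft(clean, win_len, hop)) ** 2
    N = np.abs(stft(noise, win_len, hop)) ** 2
    return S / (S + N + 1e-12)

def enhance(noisy, mask, win_len, hop):
    # T-F masking: multiply the noisy STFT by a real-valued mask, then invert.
    return istft(mask * stft(noisy, win_len, hop), win_len, hop, len(noisy))

def snr_db(reference, estimate):
    return 10 * np.log10(np.sum(reference ** 2) / np.sum((reference - estimate) ** 2))

fs = 16000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * t)      # toy "speech": a 440 Hz tone
noise = 0.5 * rng.standard_normal(fs)    # white noise
noisy = clean + noise

results = {}
for win_len, hop in [(256, 64), (1024, 256)]:
    mask = oracle_wiener_mask(clean, noise, win_len, hop)
    enhanced = enhance(noisy, mask, win_len, hop)
    inner = slice(1024, fs - 1024)  # skip edges that the frames do not fully cover
    results[win_len] = snr_db(clean[inner], enhanced[inner])
    print(f"win={win_len} ({1000 * win_len / fs:.0f} ms, {fs / win_len:.1f} Hz/bin), "
          f"hop={hop}: SNR {snr_db(clean[inner], noisy[inner]):.1f} -> {results[win_len]:.1f} dB")
```

The window length sets the trade-off the paper studies: a longer window gives finer frequency resolution (fs / win_len Hz per bin) but coarser time resolution, and the hop controls the frame rate. In a real system the mask would be predicted by a network from the noisy spectrogram alone, so its accuracy, and not only the transform, depends on these settings.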

Mentions

Twitter (2 users, 2 posts, 6 favorites)

A paper on the effect of spectrogram resolution in DNN-based speech enhancement has also been published! Both are open access, so you can read them for free! D. Takeuchi, K. Yatabe, Y. Koizumi et al.: Effect of spectrogram resolution on deep-neural-network-based speech enhancement https://t.co/780T6x6jGA