Publication list: Arata KAWAMURA (author)

An Efficient Image to Sound Mapping Method Preserving Speech Spectral Envelope

Authors: Yuya HOSODA, Arata KAWAMURA, Youji IIGUNI
Publisher: The Institute of Electronics, Information and Communication Engineers
Journal: IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences (ISSN:09168508)
Volume, issue, pages, publication date: vol.E103.A, no.3, pp.629-630, 2020-03-01 (Released:2020-03-01)
Number of references: 6
Number of citations: 1

In this paper, we propose an image-to-sound mapping method. The technique treats an image as a spectrogram and maps it to a sound by taking the inverse FFT of the spectrogram. Amplitude spectra of a speech signal are embedded in the spectrogram to make the mapped sound intelligible as speech. Specifically, we retain the amplitude spectra of the speech signal where its power is strong and embed the image brightness in the other frequency bands. Retaining the strong-power amplitude spectra preserves the speech spectral envelope and improves the speech quality of the mapped sound. The weak-power amplitude spectra of the mapped sound represent the image brightness, so the image is successfully reconstructed from the mapped sound. Simulation results show that the proposed method achieves sufficient speech quality.
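The idea in this abstract can be illustrated for a single spectrogram column with a minimal sketch. It is not the authors' implementation: the threshold rule, the `gain` factor, the zero-phase spectrum, and all function names are assumptions made for illustration, and a naive DFT stands in for the FFT so the block needs only the standard library.

```python
import cmath
import math


def mix_amplitudes(speech_amp, brightness, threshold, gain):
    """Keep the speech amplitude where it is strong (>= threshold);
    elsewhere embed the image brightness, scaled down by `gain`.
    The threshold rule and the gain are illustrative assumptions."""
    return [s if s >= threshold else gain * b
            for s, b in zip(speech_amp, brightness)]


def inverse_dft_real(half_spectrum):
    """Inverse DFT of a one-sided spectrum (bins 0..N/2) into N real samples."""
    half = len(half_spectrum) - 1
    n = 2 * half
    # Mirror bins N/2-1 .. 1 with conjugate symmetry so the output is real.
    mirrored = [half_spectrum[k].conjugate() for k in range(half - 1, 0, -1)]
    full = list(half_spectrum) + mirrored
    return [
        sum(full[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def dft_magnitudes(frame):
    """One-sided amplitude spectrum (bins 0..N/2) of a real frame."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


# One spectrogram column: hypothetical speech amplitudes and image brightness.
speech_amp = [0.0, 0.9, 0.1, 0.8, 0.0]
brightness = [0.5, 0.2, 1.0, 0.3, 0.5]

mixed = mix_amplitudes(speech_amp, brightness, threshold=0.5, gain=0.1)
# Zero phase is assumed here purely to keep the sketch short.
frame = inverse_dft_real([complex(a, 0.0) for a in mixed])
recovered = dft_magnitudes(frame)
```

In this sketch, `recovered` matches `mixed` up to rounding, so dividing the weak-power bins by `gain` gives back the embedded brightness values, while the strong-power bins carry the speech amplitudes unchanged.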


An Efficient Image to Sound Mapping Method Using Speech Spectral Phase and Multi-Column Image

Authors: Arata KAWAMURA, Hiro IGARASHI, Youji IIGUNI
Publisher: The Institute of Electronics, Information and Communication Engineers
Journal: IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences (ISSN:09168508)
Volume, issue, pages, publication date: vol.E100.A, no.3, pp.893-895, 2017-03-01 (Released:2017-03-01)
Number of references: 11
Number of citations: 2

Image-to-sound mapping is a technique that transforms an image into a sound signal by treating the image as a sound spectrogram. In general, the transformed sound differs from a human speech signal. This paper proposes an efficient image-to-sound mapping method that produces an understandable speech signal without any training. To synthesize such a speech signal, the proposed method uses a multi-column image and a speech spectral phase obtained from a long-time observation of the speech. The original image can be retrieved from the sound spectrogram of the synthesized speech signal. The quality of the synthesized speech and of the reconstructed image is evaluated using objective tests.
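The role of the speech spectral phase can be sketched for one spectrogram column as follows. This is only an illustration under stated assumptions: the phase values are made up, the function names are hypothetical, the multi-column image arrangement described in the paper is not modeled, and a naive DFT replaces the FFT so that only the standard library is required.

```python
import cmath
import math


def spectrum_from_image_and_phase(brightness, speech_phase):
    """Pair image brightness (used as amplitude) with a speech-derived
    phase for each frequency bin. Hypothetical helper for illustration."""
    return [cmath.rect(a, p) for a, p in zip(brightness, speech_phase)]


def synthesize_frame(half_spectrum):
    """Inverse DFT of bins 0..N/2, mirrored with conjugate symmetry,
    into N real samples. Bins 0 and N/2 should be real-valued;
    otherwise taking the real part alters their amplitude."""
    half = len(half_spectrum) - 1
    n = 2 * half
    mirrored = [half_spectrum[k].conjugate() for k in range(half - 1, 0, -1)]
    full = list(half_spectrum) + mirrored
    return [
        sum(full[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def amplitude_of_frame(frame):
    """One-sided amplitude spectrum (bins 0..N/2) of a real frame."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


# One image column and a made-up phase; phase is zero at DC and Nyquist.
column = [0.2, 0.8, 0.5, 1.0, 0.3]
phase = [0.0, 0.7, -1.2, 2.0, 0.0]

frame = synthesize_frame(spectrum_from_image_and_phase(column, phase))
retrieved = amplitude_of_frame(frame)
```

Because the phase changes only the waveform and not the amplitude spectrum, `retrieved` equals `column` up to rounding. This mirrors the abstract's statement that the original image can be retrieved from the sound spectrogram of the synthesized signal.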
