Authors
Katsuhiko Yamamoto, Toshio Irino, Toshie Matsui, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.40, no.2, pp.84-92, 2019-03-01 (Released:2019-03-01)
Number of references
27
Number of citations
2

The speech-based envelope power spectrum model (sEPSM) was developed to predict the speech intelligibility of sounds produced by nonlinear speech enhancement algorithms such as spectral subtraction. It is a linear model with a linear, level-independent gammatone (GT) filterbank as the front-end. Therefore, it seems difficult to evaluate speech sounds with low and high sound pressure levels (SPLs) consistently, because the intelligibility of speech depends on the SPL as well as on the signal-to-noise ratio. In this study, the sEPSM was extended with the dynamic compressive gammachirp (dcGC) auditory filterbank and a "common" normalization factor of the modulation power spectrum component to improve the predictability of the model. To evaluate the proposed model, we performed subjective experiments on the intelligibility of speech sounds enhanced by spectral subtraction and a Wiener filter algorithm. We compared the subjective speech intelligibility scores with the objective scores predicted by the proposed dcGC-sEPSM, the original GT-sEPSM, and other well-known conventional methods such as the short-time objective intelligibility measure (STOI), the coherence speech intelligibility index (CSII), and the hearing aid speech perception index (HASPI). The results show that the proposed dcGC-sEPSM predicted the subjective results better than the other methods did.
Authors
Zhi Zhu, Katsuhiko Yamamoto, Masashi Unoki, Naofumi Aoki
Publisher
信号処理学会
Journal
Journal of Signal Processing (ISSN:13426230)
Volume, issue, pages, and publication date
vol.18, no.6, pp.303-307, 2014-11-25 (Released:2014-11-25)
Number of references
7

Speech scrambling aims to eliminate the intelligibility of the original speech in order to prevent eavesdropping and copyright infringement. There is, however, a problem in that conventional methods cannot completely recover the original speech from the scrambled speech. In this paper, we propose a speech scrambling method that uses the random-bit-shift of quantization bits with common keys. We evaluated the confidentiality and efficiency of the proposed method by using two objective measures, SER and PESQ. As a result, we confirmed that speech signals can be scrambled into completely unintelligible sounds with the proposed method. Moreover, it is possible to restore a scrambled speech signal completely into the original one. In addition, we confirmed that the scrambled speech signal could not be descrambled correctly with the wrong key.
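The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes 16-bit PCM samples, interprets the "random-bit-shift" as a circular bit rotation, and draws the rotation amounts from a pseudo-random generator seeded with the shared key. The function names (`scramble`, `descramble`, `ser_db`) are hypothetical, and `ser_db` uses the common energy-ratio definition of a signal-to-error ratio, which is also an assumption here.

```python
import numpy as np

BITS = 16  # assumed quantization depth (16-bit PCM)


def _shifts(key: int, shape) -> np.ndarray:
    """Key-seeded pseudo-random rotation amounts in [1, BITS - 1]."""
    rng = np.random.default_rng(key)
    return rng.integers(1, BITS, size=shape, dtype=np.uint32)


def scramble(samples: np.ndarray, key: int) -> np.ndarray:
    """Rotate each 16-bit sample left by a key-dependent number of bits."""
    u = samples.astype(np.int16).view(np.uint16).astype(np.uint32)
    s = _shifts(key, u.shape)
    rotated = ((u << s) | (u >> (BITS - s))) & 0xFFFF
    return rotated.astype(np.uint16).view(np.int16)


def descramble(samples: np.ndarray, key: int) -> np.ndarray:
    """Rotate each sample right by the same key-dependent amounts."""
    u = samples.astype(np.int16).view(np.uint16).astype(np.uint32)
    s = _shifts(key, u.shape)
    rotated = ((u >> s) | (u << (BITS - s))) & 0xFFFF
    return rotated.astype(np.uint16).view(np.int16)


def ser_db(original: np.ndarray, processed: np.ndarray) -> float:
    """Signal-to-error ratio in dB; infinite when the two signals match."""
    x = original.astype(np.float64)
    err = x - processed.astype(np.float64)
    err_energy = float(np.sum(err ** 2))
    if err_energy == 0.0:
        return float("inf")
    return 10.0 * np.log10(float(np.sum(x ** 2)) / err_energy)
```

Because a rotation only permutes bit positions, descrambling with the same key restores every sample exactly, while a different key produces different rotation amounts and leaves the signal scrambled. This mirrors the lossless-recovery and wrong-key properties the abstract reports, under the assumptions stated above.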
Authors
Katsuhiko Yamamoto, Zhi Zhu, Masashi Unoki, Naofumi Aoki
Publisher
信号処理学会
Journal
Journal of Signal Processing (ISSN:13426230)
Volume, issue, pages, and publication date
vol.18, no.4, pp.205-208, 2014-07-30 (Released:2014-07-30)
Number of references
10

Speech scrambling methods are widely used for copyright protection and for encrypting digital speech signals in order to guarantee the confidentiality of the original signals. They are very important methods for preventing eavesdropping and unauthorized copying. However, it seems to be impossible to completely recover the original signal from a scrambled speech signal. Moreover, listeners cannot comprehend even part of the speech content from signals scrambled with these methods. In this paper, we propose a semi-scramble method for speech signals based on phonemic restoration. By using a speech scrambling method based on the random-bit shift of quantization bits, speech signals are converted to scrambled signals in partial intervals. We evaluated the confidentiality and efficiency of the proposed method by using two objective measures, signal-to-error ratio (SER) and perceptual evaluation of speech quality (PESQ). As a result, we confirmed that the proposed method can play a role in copyright protection for an original signal and can recover the original signal from a semi-scrambled speech signal. Finally, we showed that the acoustic characteristics of signals semi-scrambled with the proposed method enable the listener to understand the speech information.
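As a rough illustration of scrambling "in partial intervals", the sketch below scrambles every second fixed-length block and leaves the blocks in between untouched. The abstract does not specify how the intervals are chosen, so the alternating pattern, the `block` length, the 16-bit PCM format, the key-seeded circular rotation, and the names `interval_mask` and `semi_scramble` are all assumptions made for illustration rather than the authors' design.

```python
import numpy as np

BITS = 16  # assumed quantization depth (16-bit PCM)


def interval_mask(n: int, block: int) -> np.ndarray:
    """True for samples that fall in every second block of `block` samples."""
    return (np.arange(n) // block) % 2 == 1


def semi_scramble(samples: np.ndarray, key: int, block: int,
                  inverse: bool = False) -> np.ndarray:
    """Rotate the bits of samples inside the masked intervals only.

    With inverse=True the same key undoes the rotation.
    """
    u = samples.astype(np.int16).view(np.uint16).astype(np.uint32)
    rng = np.random.default_rng(key)
    s = rng.integers(1, BITS, size=u.size, dtype=np.uint32)
    if inverse:
        # Rotating left by BITS - s undoes a left rotation by s.
        s = BITS - s
    rotated = ((u << s) | (u >> (BITS - s))) & 0xFFFF
    out = np.where(interval_mask(u.size, block), rotated, u)
    return out.astype(np.uint16).view(np.int16)
```

In this sketch the unmasked blocks pass through unchanged, which is the part a listener could still rely on, while the masked blocks are recoverable only with the key.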