Authors
César D. Salvador, Shuichi Sakamoto, Jorge Treviño, Yôiti Suzuki
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.38, no.1, pp.1-13, 2017-01-01 (Released:2017-01-01)
Number of references
26
Number of citing articles
3 5

This paper derives a continuous-space model that describes variations in the magnitude of complex head-related transfer functions (HRTFs) across angles and radial distances throughout the horizontal plane. The radial part of this model defines a set of horizontal-plane distance-varying filters (HP-DVFs) that synthesize HRTFs for arbitrary sound source positions on the horizontal plane from initial HRTFs obtained for positions on a circular boundary at a single distance from the listener's head. The HP-DVFs are formulated in terms of horizontal-plane solutions to the three-dimensional acoustic wave equation, derived by assuming invariance along elevation angles in spherical coordinates. This avoids the inaccurate free-field distance decay observed when invariance along height is assumed in cylindrical coordinates. It also avoids the discontinuities along the interaural axis (the axis connecting the ears) that appear when invariance along the polar angle is assumed in interaural coordinates. This paper also presents a magnitude-dependent band-limiting threshold (MBT) that restricts the action of the filters to a limited angular bandwidth; such band-limiting is necessary in practice for discrete-space models that consider a finite number of sources distributed on the initial circle. Numerical experiments using a model of a human head show that the overall synthesis accuracy achieved with the proposed MBT exceeds that achieved with the existing frequency-dependent threshold, especially at low frequencies and at close distances to the head.
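The distance-decay argument can be checked numerically. The sketch below does not implement the HP-DVFs; it only compares the free-field 3D point-source Green's function with the large-argument asymptote of the 2D Green's function (i/4)H0^(1)(kr), which is what a height-invariant (cylindrical) solution amounts to. The function names and the assumed speed of sound (343 m/s) are illustrative choices, not taken from the paper.

```python
import cmath
import math

def point_source(k, r):
    """3D free-field Green's function exp(ikr) / (4*pi*r): spherical spreading."""
    return cmath.exp(1j * k * r) / (4 * math.pi * r)

def line_source_far(k, r):
    """Large-kr asymptote of the 2D Green's function (i/4) * H0^(1)(kr).

    A height-invariant (cylindrical) solution acts like an infinite line
    source, so its magnitude decays as 1/sqrt(r) rather than 1/r.
    """
    return 0.25j * math.sqrt(2 / (math.pi * k * r)) * cmath.exp(1j * (k * r - math.pi / 4))

def level_change_db(green, k, r_ref, r):
    """Level at distance r relative to distance r_ref, in dB."""
    return 20 * math.log10(abs(green(k, r)) / abs(green(k, r_ref)))

k = 2 * math.pi * 1000 / 343.0  # wavenumber at 1 kHz, assuming c = 343 m/s
print(round(level_change_db(point_source, k, 1.0, 2.0), 2))     # -6.02
print(round(level_change_db(line_source_far, k, 1.0, 2.0), 2))  # -3.01
```

Doubling the distance lowers the point-source level by about 6 dB but the height-invariant solution by only about 3 dB, which is the mismatch a spherical-coordinate formulation avoids.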
Authors
Atsuto Inoue, Yusuke Ikeda, Kohei Yatabe, Yasuhiro Oikawa
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.40, no.1, pp.1-11, 2019-01-01 (Released:2019-01-01)
Number of references
28
Number of citing articles
2 16

A widely used method for visualizing a sound field is to superimpose sound information onto a camera view. Although this effectively conveys the relationship between space and sound, a planar display cannot convey depth information in a straightforward manner. In contrast, a see-through head-mounted display (STHMD) can present three-dimensional (3D) vision and natural augmented reality (AR) or mixed reality (MR). In this paper, we propose a system for measuring and visualizing a sound field with an STHMD. We created two visualization systems, using different types of STHMDs and different technologies for realizing AR/MR, as well as a measurement system for 3D sound intensity maps that can be used together with the visualization systems. Through three visualization experiments, we found empirically that the stereoscopic viewing and convenient viewpoint movement afforded by the STHMD enable users to understand the sound field in a short time.
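The abstract does not say how the 3D intensity map is measured. As a hedged illustration only, the sketch below uses the common two-microphone (p-p) technique, which is an assumption and not necessarily the authors' method: particle velocity is obtained by integrating Euler's equation with a finite-difference pressure gradient, and active intensity is the time average of pressure times velocity. The function name, air density, and microphone spacing are hypothetical.

```python
import math

def pp_intensity(p1, p2, spacing, fs, rho=1.2):
    """Time-averaged active intensity (W/m^2) along the axis from mic 1 to mic 2.

    p1, p2: pressure samples (Pa) from two closely spaced microphones.
    Assumes the record spans a whole number of signal periods, so the
    integration constant in the velocity estimate averages out.
    """
    dt = 1.0 / fs
    u = 0.0                                   # particle velocity estimate (m/s)
    prev_grad = (p2[0] - p1[0]) / spacing
    acc = 0.0
    for n in range(len(p1)):
        grad = (p2[n] - p1[n]) / spacing      # finite-difference pressure gradient
        if n > 0:
            # Euler's equation rho * du/dt = -dp/dx, trapezoidal integration.
            u -= 0.5 * (grad + prev_grad) * dt / rho
        prev_grad = grad
        acc += 0.5 * (p1[n] + p2[n]) * u      # pressure at probe centre times velocity
    return acc / len(p1)

# Hypothetical test signal: a 500 Hz plane wave travelling from mic 1 to mic 2.
fs, f, c, rho, d = 48000, 500.0, 343.0, 1.2, 0.012
k = 2 * math.pi * f / c
p1 = [math.cos(2 * math.pi * f * n / fs) for n in range(9600)]
p2 = [math.cos(2 * math.pi * f * n / fs - k * d) for n in range(9600)]
intensity = pp_intensity(p1, p2, d, fs, rho)
```

For a plane wave the estimate approaches p0²/(2ρc), with a small low bias of roughly sin(kd)/(kd) that grows with microphone spacing d; repeating the measurement along three axes would give one vector of a 3D intensity map.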
Authors
Masayuki Nishiguchi
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.27, no.6, pp.375-383, 2006 (Released:2006-11-01)
Number of references
19

A speech coding algorithm called harmonic vector excitation coding (HVXC) has been developed that encodes speech at very low bit rates (2.0–4.0 kbit/s). It divides speech signals into two types of segments: voiced segments, which use a parametric representation of the harmonic spectral magnitudes of LPC residual signals, and unvoiced segments, which use the CELP coding algorithm. This combination provides near toll-quality speech at 4.0 kbit/s and communication-quality speech at 2.0 kbit/s, outperforming the 4.8 kbit/s FS1016 CELP coder. This paper discusses the HVXC encoder and decoder algorithms, including fast harmonic synthesis, time-scale modification, and pitch-change decoding. Owing to its high coding efficiency and new functionalities, HVXC has been adopted in the ISO/IEC MPEG-4 Audio international standard.
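The abstract does not specify HVXC's fast harmonic synthesis. The sketch below is only the direct, slow form of harmonic synthesis that such a method would accelerate: a voiced frame is rebuilt as a sum of cosines at multiples of the pitch f0, weighted by harmonic magnitudes. The function name, the zero default phases, and the Nyquist cutoff are assumptions for illustration.

```python
import math

def synthesize_voiced(magnitudes, f0, fs, n_samples, phases=None):
    """Direct harmonic synthesis: sum of cosines at k*f0 for k = 1..K.

    magnitudes: harmonic amplitudes (hypothetical decoded values).
    phases: optional per-harmonic phases in radians (zero by default, an assumption).
    Harmonics at or above the Nyquist frequency fs/2 are skipped.
    """
    if phases is None:
        phases = [0.0] * len(magnitudes)
    out = []
    for n in range(n_samples):
        t = n / fs
        s = 0.0
        for k, (a, ph) in enumerate(zip(magnitudes, phases), start=1):
            if k * f0 < fs / 2:
                s += a * math.cos(2 * math.pi * k * f0 * t + ph)
        out.append(s)
    return out

# A 20 ms frame at 8 kHz with a 100 Hz pitch and three harmonics.
frame = synthesize_voiced([1.0, 0.5, 0.25], f0=100.0, fs=8000.0, n_samples=160)
```

Because the frame is regenerated from parameters, its length (n_samples) and pitch (f0) can be chosen independently, which is the basic intuition behind parametric time-scale modification and pitch change; the standard's actual procedures are not reproduced here.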
Authors
Nozomiko Yasui, Masanobu Miura, Akitoshi Kataoka
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.33, no.3, pp.160-169, 2012-05-01 (Released:2012-04-29)
Number of references
10
Number of citing articles
2 4

A tremolo produced by irregular plucking of a mandolin is characterized by its average plucking rate as well as by onset and amplitude deviations. The fluctuation of a tremolo elicited by the average plucking rate alone is called the “1st fluctuation,” and that elicited by onset and amplitude deviations is called the “2nd fluctuation.” A procedure has been developed for estimating fluctuation strength, which represents the perceived sensation of fluctuation, for sounds with only the 1st fluctuation, such as amplitude-modulated or frequency-modulated sounds. However, a procedure for a tremolo with both 1st and 2nd fluctuations has not been investigated. We therefore developed a procedure for estimating fluctuation strength from a tremolo produced by irregular plucking of a mandolin: we calculated feature parameters of the tremolo and estimated its fluctuation strength from those parameters. We found that this procedure, which is based on both the 1st and the 2nd fluctuation, approximately represents the sensation of hearing fluctuation (adjusted R² = 0.76) and performs better than a procedure based on conventional methods (R² = 0.58).
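The abstract does not define the feature parameters. The sketch below uses hypothetical definitions chosen only to make the two kinds of fluctuation concrete: the mean plucking rate (1st fluctuation) and the relative deviations of inter-onset intervals and pluck amplitudes (2nd fluctuation). It also includes the standard adjusted R² formula, where n is the number of observations and p the number of predictors; the example values are made up.

```python
import statistics

def tremolo_features(onsets, amplitudes):
    """Hypothetical features: mean plucking rate, onset deviation, amplitude deviation.

    onsets: pluck onset times in seconds; amplitudes: peak level of each pluck.
    """
    iois = [b - a for a, b in zip(onsets, onsets[1:])]    # inter-onset intervals (s)
    mean_ioi = statistics.mean(iois)
    rate = 1.0 / mean_ioi                                  # plucks per second (1st fluctuation)
    onset_dev = statistics.pstdev(iois) / mean_ioi         # relative timing jitter (2nd fluctuation)
    amp_dev = statistics.pstdev(amplitudes) / statistics.mean(amplitudes)
    return rate, onset_dev, amp_dev

def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

rate, onset_dev, amp_dev = tremolo_features([0.0, 0.1, 0.25, 0.35], [1.0, 0.8, 1.1, 0.9])
```

A perfectly regular tremolo yields zero for both deviation features, so any nonzero values isolate the 2nd fluctuation from the plucking rate.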
Authors
Stefan Bilbao, Michele Ducceschi
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.44, no.3, pp.194-209, 2023-05-01 (Released:2023-05-01)
Number of references
100

Musical string vibration has been the subject of scientific study for centuries. Recent increases in computational power have allowed the exploration of increasingly detailed features of perceptual significance through simulation approaches. The starting point for any simulation is a well-defined model, usually framed as a system of differential equations, with parameters determined by measurement and experiment. This review article is intended to take the reader through models of string vibration progressively, beginning with well-known and well-studied linear models, and then introducing new features that form the basis for the modern study of realistic musical string vibration. These include, first, nonlinear excitation mechanisms, such as the hammer-string and bow-string interaction, and then the collision mechanism, both for pointwise obstructions and over a distributed region. Finally, the linear model of string vibration is generalized to include geometric nonlinear effects, leading to typical nonlinear behaviour such as pitch glides and the appearance of so-called phantom partials due to nonlinear mixing of modes. The article concludes with a general overview of numerical simulation techniques for string vibration.
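As a concrete starting point for the linear models the review begins with, the sketch below is a minimal explicit finite-difference scheme for the ideal string u_tt = c²u_xx with fixed ends and zero initial velocity; it is a textbook scheme, not code from the article. The Courant number λ = cΔt/Δx must satisfy λ ≤ 1 for stability, and at λ = 1 the scheme reproduces the exact d'Alembert solution at the grid points. The function name is illustrative.

```python
def simulate_string(u0, steps, courant=1.0):
    """Explicit finite-difference scheme for u_tt = c^2 u_xx with fixed ends.

    u0: initial displacement on N+1 grid points (end values must be 0),
        with zero initial velocity.
    courant: lambda = c*dt/dx; the scheme is stable only for lambda <= 1.
    Returns the displacement after `steps` time steps.
    """
    if courant > 1.0:
        raise ValueError("unstable: Courant number must be <= 1")
    lam2 = courant ** 2
    n = len(u0)
    prev = list(u0)
    if steps == 0:
        return prev
    # First step from zero initial velocity (second-order Taylor start).
    cur = [0.0] * n
    for l in range(1, n - 1):
        cur[l] = u0[l] + 0.5 * lam2 * (u0[l + 1] - 2 * u0[l] + u0[l - 1])
    # Leapfrog update: u^{n+1} = 2u^n - u^{n-1} + lambda^2 * (discrete Laplacian of u^n).
    for _ in range(steps - 1):
        nxt = [0.0] * n
        for l in range(1, n - 1):
            nxt[l] = 2 * cur[l] - prev[l] + lam2 * (cur[l + 1] - 2 * cur[l] + cur[l - 1])
        prev, cur = cur, nxt
    return cur

N = 40
pluck = [l / 10 if l <= 10 else (N - l) / 30 for l in range(N + 1)]  # triangular pluck
after_period = simulate_string(pluck, 2 * N)
```

After one full period (2N steps for N grid intervals at λ = 1) the plucked shape returns, and after half a period it is inverted and mirrored. Stiffness, loss, excitation and collision mechanisms, and geometric nonlinearity are deliberately omitted.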
Authors
Hironori Takemoto, Seiji Adachi, Natsuki Toda
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.44, no.1, pp.9-16, 2023-01-01 (Released:2023-01-01)
Number of references
15

The vocal tract can be modeled as an acoustic tube in the low-frequency region because plane-wave propagation is dominant there. Further, it can be considered static for short periods during running speech, such as vowels. Thus, its acoustic properties have mainly been examined using the transmission line model (TLM), that is, a one-dimensional static model in the frequency domain. In the present paper, we propose a one-dimensional static model in the time domain based on the finite-difference time-domain method. In this model, the vocal tract is represented by cascaded acoustic tubes with different cross-sectional areas. The pressure and wall vibration effects are simulated at the center of each tube, whereas the volume velocity is calculated at the labial end. The pressure and volume velocity are computed alternately according to the leapfrog algorithm. Using this model, the impulse responses of the vocal tract for the five Japanese vowels were calculated, and the corresponding transfer functions agreed well with those calculated by the TLM in the low-frequency region. The mean absolute percentage difference of the lowest four peaks across the five vowels was 2.3%.
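A minimal sketch of a one-dimensional time-domain leapfrog scheme for a tube cascade, under simplifying assumptions that are ours and not the paper's: no wall vibration, an ideal open lip end (p = 0) instead of a radiation load, a rigid glottis excited by a single volume-velocity impulse, interface areas taken as the average of neighboring sections, and assumed values for air density and sound speed. Pressure lives at section centers and volume velocity at every section boundary (read out at the lip end), and the two are updated alternately. All names are hypothetical.

```python
import cmath
import math

RHO = 1.2    # air density (kg/m^3), assumed
C = 350.0    # speed of sound (m/s), assumed

def tract_impulse_response(areas, dx, steps, courant=1.0):
    """Lossless staggered-grid leapfrog model of a cascade of tubes.

    areas: cross-sectional area (m^2) of each section, glottis first.
    Returns (lip-end volume velocity per time step, time step dt).
    """
    m = len(areas)
    dt = courant * dx / C
    p = [0.0] * m                  # pressure at section centres
    u = [0.0] * (m + 1)            # volume velocity at boundaries; u[0] glottis, u[m] lips
    a_if = [areas[0]] + [0.5 * (areas[j - 1] + areas[j]) for j in range(1, m)] + [areas[-1]]
    out = []
    for n in range(steps):
        u[0] = 1.0 if n == 0 else 0.0          # impulse, then a rigid closed glottis
        for j in range(1, m):                  # momentum: dU/dt = -(A/rho) dp/dx
            u[j] -= dt * a_if[j] / RHO * (p[j] - p[j - 1]) / dx
        # Lip end: mirrored ghost pressure -p[m-1] puts p = 0 at the boundary.
        u[m] -= dt * a_if[m] / RHO * (-2.0 * p[m - 1]) / dx
        for i in range(m):                     # continuity: dp/dt = -(rho c^2 / A) dU/dx
            p[i] -= RHO * C ** 2 * dt / (areas[i] * dx) * (u[i + 1] - u[i])
        out.append(u[m])
    return out, dt

def spectral_peak(signal, dt, f_lo, f_hi, f_step):
    """Integer frequency (Hz) in [f_lo, f_hi] where |DTFT| of signal is largest."""
    best_f, best_mag = f_lo, -1.0
    for f in range(f_lo, f_hi + 1, f_step):
        s = sum(x * cmath.exp(-2j * math.pi * f * n * dt) for n, x in enumerate(signal))
        if abs(s) > best_mag:
            best_f, best_mag = f, abs(s)
    return best_f

areas = [3e-4] * 17                            # uniform 17 cm tube (hypothetical)
lip_u, dt = tract_impulse_response(areas, dx=0.01, steps=7000)
peak = spectral_peak(lip_u, dt, 400, 650, 5)   # expected near C / (4 * 0.17), about 515 Hz
```

For a uniform tube the first spectral peak should fall near the quarter-wavelength resonance c/(4L), a quick sanity check before substituting measured vowel area functions.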
Authors
Jungsoon Kim, Moojoon Kim
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.44, no.1, pp.1-8, 2023-01-01 (Released:2023-01-01)
Number of references
25

The vibration modes of a bolt-clamped ultrasonic transducer were analyzed experimentally. In the experiment, a purpose-designed and manufactured semicircular wedge-shaped jig was used to apply constant pressure to a narrow band-shaped area on the lateral surface of the transducer. A constant force was applied to the jig with a vise and a torque wrench. As the position at which the jig applied pressure was moved along the length of the transducer, the change in the input admittance characteristics of the transducer was observed. Each vibration mode was analyzed from the changes in the magnitude and the resonant frequency of the input admittance. The proposed method made it possible to determine the node positions of each vibration mode in practice, and it is expected to provide useful information for utilizing the harmonic modes.
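One common way to interpret input admittance measurements, not stated in the abstract and assumed here, is the Butterworth-Van Dyke equivalent circuit: a clamped capacitance C0 in parallel with a series R-L-C motional branch. The sketch sweeps |Y| and locates its peak near the series resonance 1/(2π√(LC)). All component values are hypothetical; in this simplified picture, extra loss from a pressing jig could be represented as a larger R, which lowers the admittance peak.

```python
import math

def bvd_admittance(f, r, l, c, c0):
    """Input admittance (siemens) of a Butterworth-Van Dyke equivalent circuit."""
    w = 2 * math.pi * f
    z_motional = complex(r, w * l - 1 / (w * c))   # series R-L-C motional branch
    return 1j * w * c0 + 1 / z_motional            # in parallel with clamped capacitance C0

def admittance_peak(r, l, c, c0, f_lo, f_hi, f_step):
    """(frequency, |Y|) of the largest admittance magnitude on an integer-Hz sweep."""
    best = (f_lo, 0.0)
    for f in range(f_lo, f_hi + 1, f_step):
        mag = abs(bvd_admittance(f, r, l, c, c0))
        if mag > best[1]:
            best = (f, mag)
    return best

# Hypothetical motional parameters giving a series resonance near 28 kHz.
L_M, C_M, C0 = 0.0323, 1e-9, 5e-9
f_series = 1 / (2 * math.pi * math.sqrt(L_M * C_M))
f_free, y_free = admittance_peak(50.0, L_M, C_M, C0, 20000, 36000, 10)
f_loaded, y_loaded = admittance_peak(200.0, L_M, C_M, C0, 20000, 36000, 10)
```

Comparing the peak frequency and magnitude of such a sweep at each jig position is one way to quantify the changes the abstract describes, although the paper's own analysis may differ.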
Authors
Jihyeon Yun, Takayuki Arai
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.41, no.2, pp.501-512, 2020-03-01 (Released:2020-03-01)
Number of references
25
Number of citing articles
1

Previous research reported that Korean nasal consonants can be denasalized in word-initial position. This study examined how native Korean listeners perceive the word-initial nasal onset /n/, using synthesized /Ca/ stimuli generated with a Klatt synthesizer. We tested the effects of consonant duration, consonant nasality, and vowel nasalization on perception. In a rating experiment, listeners rated the goodness of the stimuli as /na/ on a seven-point scale. The participants generally gave favorable ratings to stimuli with nasalized vowels. Two-thirds of the participants judged stimuli with no nasality to be good exemplars of /na/, whereas the other listeners did not. In a yes-no experiment, participants judged whether or not each stimulus was /na/; their responses resembled those in the rating experiment. Many listeners responded /na/ even to stimuli with zero voice onset time, yet stimuli with longer prevoicing or nasal murmur were more likely to be perceived as /na/. Vowel nasality affected the perception of /na/, although some listeners preferred oral vowels over nasalized vowels when evaluating /na/-likeness.
Authors
Yuki Saito, Taiki Nakamura, Yusuke Ijima, Kyosuke Nishida, Shinnosuke Takamichi
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.42, no.1, pp.1-11, 2021-01-01 (Released:2021-01-01)
Number of references
34
Number of citing articles
1

We propose non-parallel, many-to-many voice conversion (VC) using variational autoencoders (VAEs), which constructs VC models that convert the characteristics of arbitrary speakers into those of other arbitrary speakers without requiring parallel speech corpora for training. Although VAEs conditioned on one-hot speaker codes can achieve non-parallel VC, the phonetic content of the converted speech tends to vanish, degrading speech quality. Another issue is that they cannot handle unseen speakers, i.e., speakers not included in the training corpora. To overcome these issues, we incorporate deep-neural-network-based automatic speech recognition (ASR) and automatic speaker verification (ASV) into VAE-based VC. Because the phonetic content is given as phonetic posteriorgrams predicted by the ASR models, the proposed VC can overcome the quality degradation. Our VC uses d-vectors extracted from the ASV models as continuous speaker representations that can handle unseen speakers. Experimental results demonstrate that our VC outperforms conventional VAE-based VC in terms of mel-cepstral distortion and converted speech quality. We also investigate the effects of hyperparameters in our VC and reveal that 1) a larger d-vector dimensionality, which gives better ASV performance, does not necessarily improve converted speech quality, and 2) a larger number of pre-stored speakers improves the quality.
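Mel-cepstral distortion (MCD), the objective measure named above, is commonly computed per frame as (10/ln 10)·sqrt(2·Σ_d (c_d − c'_d)²) and averaged over time-aligned frames. The sketch assumes the sequences are already aligned and excludes the 0th (energy) coefficient; both are common conventions, but the paper's exact settings are not given in the abstract, and the function name is illustrative.

```python
import math

def mel_cepstral_distortion(ref_frames, conv_frames):
    """Frame-averaged MCD in dB between time-aligned mel-cepstral sequences.

    Each frame is a list of mel-cepstral coefficients; coefficient 0
    (energy) is excluded, a common but not universal convention.
    """
    if len(ref_frames) != len(conv_frames):
        raise ValueError("sequences must be time-aligned to equal length")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref, conv in zip(ref_frames, conv_frames):
        sq = sum((a - b) ** 2 for a, b in zip(ref[1:], conv[1:]))
        total += scale * math.sqrt(2.0 * sq)
    return total / len(ref_frames)
```

Identical sequences give 0 dB, and a unit difference in a single non-energy coefficient gives (10/ln 10)·√2, about 6.14 dB, so lower values indicate converted spectra closer to the target.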
Authors
Daiki Takeuchi, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.41, no.5, pp.769-775, 2020-09-01 (Released:2020-09-01)
Number of references
39
Number of citing articles
6

In recent single-channel speech enhancement, deep neural networks (DNNs) have played an important role in achieving high performance. One standard use of a DNN is to construct a mask-generating function for time-frequency (T-F) masking. To apply a mask in the T-F domain, the short-time Fourier transform (STFT) is usually used because it is well understood and invertible. While the mask-generating regression function has been studied for a long time, the T-F transform has received less attention from the viewpoint of speech enhancement. Since the performance of speech enhancement depends on both the T-F mask estimator and the T-F transform, investigating the T-F transform should help in designing better enhancement systems. In this paper, as a step toward a T-F transform that is optimal for speech enhancement, we experimentally investigated the effect of STFT parameter settings on a DNN-based mask estimator. We conducted experiments using three types of DNN architectures with three types of loss functions, and the results suggested that U-Net is robust to the parameter settings, whereas fully connected and BLSTM networks are not.
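A small sketch of the STFT masking pipeline whose parameters are at issue: a periodic Hann window of length win_len advanced by hop samples, a real-valued gain applied to every bin, and resynthesis by weighted overlap-add (dividing by the summed squared window). The naive DFT and the mask_fn placeholder, a stand-in for a DNN mask estimator, are assumptions for illustration; the function names are hypothetical.

```python
import cmath
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def dft(x, inverse=False):
    """Naive O(n^2) DFT (or inverse DFT); a practical system would use an FFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[i] * cmath.exp(sign * 2j * math.pi * k * i / n) for i in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def stft(x, win_len, hop):
    """Full complex spectra of Hann-windowed frames."""
    w = hann(win_len)
    return [dft([x[s + i] * w[i] for i in range(win_len)])
            for s in range(0, len(x) - win_len + 1, hop)]

def istft(frames, win_len, hop, length):
    """Weighted overlap-add: sum(w * frame) / sum(w**2) at each sample."""
    w = hann(win_len)
    out = [0.0] * length
    norm = [0.0] * length
    for f, spec in enumerate(frames):
        seg = dft(spec, inverse=True)
        for i in range(win_len):
            out[f * hop + i] += seg[i].real * w[i]
            norm[f * hop + i] += w[i] ** 2
    return [o / nv if nv > 1e-12 else 0.0 for o, nv in zip(out, norm)]

def mask_and_resynthesize(x, mask_fn, win_len, hop):
    """Apply a real gain mask_fn(|X|) to every T-F bin and invert.

    Assumes hop <= win_len // 2 so every output sample has nonzero window weight.
    """
    pad = [0.0] * win_len
    padded = pad + list(x) + pad
    frames = stft(padded, win_len, hop)
    masked = [[v * mask_fn(abs(v)) for v in spec] for spec in frames]
    y = istft(masked, win_len, hop, len(padded))
    return y[win_len:win_len + len(x)]

signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(100)]
same = mask_and_resynthesize(signal, lambda mag: 1.0, win_len=32, hop=8)
```

With an all-ones mask the signal is reconstructed to rounding error, reflecting the invertibility mentioned above; changing win_len and hop trades frequency resolution against time resolution, which is the kind of setting the study varies.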
Authors
Jeff Moore, Jason Shaw, Shigeto Kawahara, Takayuki Arai
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.39, no.2, pp.75-83, 2018-03-01 (Released:2018-03-01)
Number of references
27
Number of citing articles
1

This study examines the tongue shapes used by Japanese speakers to produce the English liquids /ɹ/ and /l/. Four native Japanese speakers with varying levels of English acquisition and one North American English speaker were recorded both acoustically and with electromagnetic articulography. Seven distinct articulation strategies were identified. Results indicate that the least advanced speaker used a single articulation strategy for both sounds, the intermediate speakers used a wide range of articulations, and the most advanced non-native speaker relied on a single strategy for each sound.