Authors
Jihyeon Yun, Takayuki Arai
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.41, no.2, pp.501-512, 2020-03-01 (Released:2020-03-01)
Number of references
25
Number of citations
1

Previous research reported that Korean nasal consonants can be denasalized in word-initial position. This study examined the perception of the word-initial nasal onset /n/ by native Korean listeners using /Ca/ stimuli synthesized with a Klatt synthesizer. We tested the effects of consonant duration, consonant nasality, and vowel nasalization on perception. In a rating experiment, listeners evaluated the goodness of the stimuli as /na/ on a seven-point scale. The participants generally gave favorable ratings to the stimuli with nasalized vowels. Two-thirds of the participants responded that the stimuli with no nasality were good exemplars of /na/, whereas the other listeners did not. In a yes-no experiment, participants judged whether the stimuli were /na/ or not. Their responses were similar to those in the rating experiment. Many listeners gave positive /na/ responses even to the stimuli with 0 voice onset time, yet the stimuli with longer prevoicing or nasal murmur were more likely to be perceived as /na/. Vowel nasality affected the perception of /na/, although some listeners preferred oral vowels over nasalized vowels when they evaluated /na/-likeness.
Authors
Yuki Saito, Taiki Nakamura, Yusuke Ijima, Kyosuke Nishida, Shinnosuke Takamichi
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.42, no.1, pp.1-11, 2021-01-01 (Released:2021-01-01)
Number of references
34
Number of citations
1

We propose non-parallel and many-to-many voice conversion (VC) using variational autoencoders (VAEs) that constructs VC models for converting arbitrary speakers' characteristics into those of other arbitrary speakers without parallel speech corpora for training the models. Although VAEs conditioned on one-hot coded speaker codes can achieve non-parallel VC, the phonetic contents of the converted speech tend to vanish, resulting in degraded speech quality. Another issue is that they cannot deal with unseen speakers not included in the training corpora. To overcome these issues, we incorporate deep-neural-network-based automatic speech recognition (ASR) and automatic speaker verification (ASV) into the VAE-based VC. Since phonetic contents are given as phonetic posteriorgrams predicted by the ASR models, the proposed VC can overcome the quality degradation. Our VC utilizes d-vectors extracted from the ASV models as continuous speaker representations that can deal with unseen speakers. Experimental results demonstrate that our VC outperforms the conventional VAE-based VC in terms of mel-cepstral distortion and converted speech quality. We also investigate the effects of hyperparameters in our VC and reveal that 1) a large d-vector dimensionality that gives better ASV performance does not necessarily improve converted speech quality, and 2) a large number of pre-stored speakers improves the quality.
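
The abstract names mel-cepstral distortion (MCD) as an objective measure but does not spell out the formula. The sketch below uses one commonly cited definition, MCD = (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2), averaged over frames. The function name, the assumption that frames are already time-aligned, and the exclusion of the 0th coefficient by the caller are illustrative choices, not details taken from the paper.

```python
import math


def mel_cepstral_distortion(ref_frames, conv_frames):
    """Average MCD in dB over time-aligned frames.

    Each frame is a list of mel-cepstral coefficients. The 0th (energy)
    coefficient is assumed to have been removed by the caller.
    """
    if not ref_frames or len(ref_frames) != len(conv_frames):
        raise ValueError("frame sequences must be non-empty and equally long")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref, conv in zip(ref_frames, conv_frames):
        if len(ref) != len(conv):
            raise ValueError("frames must have the same dimensionality")
        squared_error = sum((r - c) ** 2 for r, c in zip(ref, conv))
        total += scale * math.sqrt(2.0 * squared_error)
    return total / len(ref_frames)
```

A lower value means the converted coefficients are closer to the reference; identical sequences give 0 dB.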
Authors
Eberhard Zwicker, Hugo Fastl, Ulrich Widmann, Kenji Kurakata, Sonoko Kuwano, Seiichiro Namba
Publisher
Acoustical Society of Japan
Journal
Journal of the Acoustical Society of Japan (E) (ISSN:03882861)
Volume, issue, pages, and publication date
vol.12, no.1, pp.39-42, 1991 (Released:2011-02-17)
Number of references
12
Number of citations
46 87

The method for calculating loudness level proposed by Zwicker is standardized in ISO 532B. Because it is a graphical procedure, calculating loudness level with it can be tedious. Recently, DIN 45631 has been revised to include a computer program in BASIC for calculating loudness level, which runs on IBM-compatible PCs. Since the NEC PC-9801 series computers are popular in Japan, the program has been modified for the NEC PC-9801 series computers and is introduced in this paper.
Authors
Daiki Takeuchi, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.41, no.5, pp.769-775, 2020-09-01 (Released:2020-09-01)
Number of references
39
Number of citations
6

In recent single-channel speech enhancement, deep neural networks (DNNs) have played an important role in achieving high performance. One standard use of a DNN is to construct a mask-generating function for time-frequency (T-F) masking. For applying a mask in the T-F domain, the short-time Fourier transform (STFT) is usually utilized because of its well-understood and invertible nature. While the mask-generating regression function has been studied for a long time, there is less research on the T-F transform from the viewpoint of speech enhancement. Since the performance of speech enhancement depends on both the T-F mask estimator and the T-F transform, investigating the T-F transform should be beneficial for designing a better enhancement system. In this paper, as a step toward an optimal T-F transform in terms of speech enhancement, we experimentally investigated the effect of the parameter settings of the STFT on a DNN-based mask estimator. We conducted the experiments using three types of DNN architectures with three types of loss functions, and the results suggested that U-Net is robust to the parameter setting, whereas fully connected and BLSTM networks are not.
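
To make the T-F masking pipeline described above concrete, here is a minimal pure-Python sketch of STFT analysis, element-wise masking, and overlap-add resynthesis. It is a toy under stated assumptions: the window (periodic Hann), the hop of half the window length, the naive DFT, and all function names are illustrative, and the mask is supplied by the caller rather than estimated by a DNN as in the paper. The window length and hop are exactly the kind of STFT parameters whose effect the study investigates.

```python
import cmath
import math


def hann(n):
    # Periodic Hann window; with hop n // 2 the shifted windows sum to 1.
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def dft(x, inverse=False):
    # Naive O(N^2) DFT, adequate for a toy frame length.
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def stft(signal, win_len, hop):
    # One spectrum (list of win_len complex bins) per analysis frame.
    w = hann(win_len)
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        frames.append(dft([signal[start + i] * w[i] for i in range(win_len)]))
    return frames


def apply_mask(frames, mask):
    # Element-wise T-F masking: each bin is scaled by its gain.
    return [[g * s for g, s in zip(mask_row, frame)]
            for mask_row, frame in zip(mask, frames)]


def istft(frames, win_len, hop, length):
    # Overlap-add with a rectangular synthesis window.
    out = [0.0] * length
    for m, spec in enumerate(frames):
        frame = dft(spec, inverse=True)
        for i in range(win_len):
            out[m * hop + i] += frame[i].real
    return out
```

With an all-ones mask and a hop of half the window length, the samples covered by two overlapping frames are reconstructed exactly (up to rounding), which is the invertibility property the abstract refers to.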
Authors
Jeff Moore, Jason Shaw, Shigeto Kawahara, Takayuki Arai
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.39, no.2, pp.75-83, 2018-03-01 (Released:2018-03-01)
Number of references
27
Number of citations
1

This study examines the tongue shapes used by Japanese speakers to produce the English liquids /ɹ/ and /l/. Four native Japanese speakers of varying levels of English acquisition and one North American English speaker were recorded both acoustically and with electromagnetic articulography. Seven distinct articulation strategies were identified. Results indicate that the least advanced speaker used a single articulation strategy for both sounds. Intermediate speakers used a wide range of articulations, while the most advanced non-native speaker relied on a single strategy for each sound.
Authors
Taguti Tomoyasu, Ohtsuki Katsuya, Yamasaki Teruo, Kuwano Sonoko, Namba Seiichiro
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.23, no.5, pp.244-251, 2002 (Released:2002-09-01)
Number of references
17
Number of citations
2

Tone stopping is the act of ending the vibration of a piano string by the contact of the damper. This paper studies the perceptual effect of tone stopping. Five performances of a music passage were synthesized with piano tones of simulated tone stopping at sound level, i.e., with the tones that were obtained by processing the waveform of a single, sustained tone of a real piano to induce a desired ending profile with the onset portion kept intact. These performances were rated by ten musically trained subjects with the method of paired comparisons on twenty adjectives. The result indicated that: (1) a short plateau followed by a slow decay made the tone reverberating, lustrous, and beautiful, (2) a long plateau followed by a fast decay made the tone sticky, immature, and blunt, and (3) a short plateau followed by a fast decay made the tone tight, sharp, and nimble.
Authors
Akinori Ito
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.41, no.1, pp.166-169, 2020-01-01 (Released:2020-01-06)
Number of references
40

This article briefly reviews the research works related to metacommunication. Metacommunication is a term meaning "communication on communication," which is related to marginal communication such as conveying recognition, comprehension, and evaluation of an interlocutor's words. Herein, several research works are reviewed from the metacommunication point of view.
Authors
Makoto Morinaga, Junichi Mori, Takanori Matsui, Yasuaki Kawase, Kazuyuki Hanaka
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.40, no.6, pp.391-398, 2019-11-01 (Released:2019-11-01)
Number of references
15
Number of citations
6

We have been developing an aircraft model identification system that uses a convolutional neural network (CNN). The assumption is that this identification system would be used to estimate the number of flights to create noise maps. In our previous study, we used the CNN model to classify five aircraft comprising three rotorcraft, one turboprop, and one jet aircraft, and the accuracy reached 99%. In the present study, to examine whether this method is also effective for identifying the sound sources of jet aircraft, we conducted two case studies using frequency characteristics of aircraft noise obtained from field measurements around Osaka International Airport and Narita International Airport. Targeting 7 and 18 types of sound source at Osaka and Narita, respectively, an identification rate of 98% was obtained in both cases. This suggests that the present system can estimate the number of jet aircraft flights for each engine type or each aircraft model with very high accuracy.
Authors
Cohen David, Rossing Thomas D.
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.24, no.1, pp.1-6, 2003 (Released:2003-01-01)
Number of references
14
Number of citations
4

Using electronic TV holography, we have studied the vibrational modes of four mandolins and a mandola. The lowest (0,0) modes may appear either as a triplet (as in a guitar) or as a doublet. The modal frequencies correlate well with the frequency response curves. Sound spectra indicate that sound radiation is quite uniform over the 0–5 kHz range with some rolloff above 2.5 kHz.
Authors
Tsubasa Kusano, Kohei Yatabe, Yasuhiro Oikawa
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.39, no.3, pp.215-225, 2018-05-01 (Released:2018-05-01)
Number of references
47
Number of citations
2

In marine seismic surveys to explore seafloor resources, the structure below the seafloor is estimated from the obtained sound waves, which are emitted by a marine seismic sound source and reflected or refracted between the layers below the seafloor. To estimate the structure below the seafloor from the returned waves, information on the sound source position and the sound speed is needed. Marine seismic vibrators, one type of marine seismic sound source, have advantages such as high controllability of the frequency and phase of the sound, and the ability to oscillate at great depth. However, when the sound source is far from the sea surface, it becomes difficult to specify its exact position. In this paper, we propose a method to estimate the position of a marine seismic vibrator and the sound speed from obtained seismic data by formulating an optimization problem via the hyperbolic Radon transform. Numerical simulations confirmed that the proposed method almost achieves the theoretical lower bounds for the variances of the estimates.
Authors
Yôiti Suzuki, Hisashi Takeshima, Kenji Kurakata
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
pp.e23.66, (Released:2023-10-25)
Number of references
14

As significant errors were reported in 1985 for the international standard related to equal-loudness-level contours (ELLCs) for pure tones, the earlier international standard was fully revised in 2003 as ISO 226:2003, after 18 years of revision work. Twenty years later, the standard has been revised again as ISO 226:2023. One motivation for the revision was to reflect the lowering of the threshold of hearing at 20 Hz by 0.4 dB in ISO 389-7:2019. In addition, the following two points of substance were revised: (1) implementation of the power exponent relating loudness perception to physical intensity formulated in an academic paper published in 2004, which describes the derivation of ELLCs relating to ISO 226:2003, and (2) adoption of mathematical expressions that preserve the appropriate number of significant digits. In this review, the process of the revision and the technical details of the changes are described. The differences from the 2003 edition are only 0.6 dB at most, and the 2023 standard can be regarded as the same as the 2003 edition in terms of practical use.
Authors
Jürgen Herre, Schuyler R. Quackenbush
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.43, no.2, pp.143-148, 2022-03-01 (Released:2022-03-01)
Number of references
14
Number of citations
2

The term "Immersive Audio" is frequently used to describe an audio experience that gives the listener the sensation of being fully immersed or "present" in a sound scene. This can be achieved via different presentation modes, such as surround sound (several loudspeakers horizontally arranged around the listener), 3D audio (with loudspeakers at, above, and below listener ear level), and binaural audio over headphones. This article provides an overview of the recent MPEG standard, MPEG-H 3D Audio, which is a versatile standard that supports multiple immersive sound signal formats (channels, objects, higher order ambisonics), and is now being adopted in broadcast and streaming applications.
Authors
Sungyoung Kim
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.41, no.1, pp.129-133, 2020-01-01 (Released:2020-01-06)
Number of references
20
Number of citations
1

This paper reviews previous experimental studies on the relationship between a listener's cultural framework and auditory perception of an enclosed space. Cultural influence on auditory perception of noise and music has been assessed through a range of studies. Is it the same for spatial hearing? When we enter a space, does a particular cultural framework influence our understanding of the corresponding auditory environment? As physical buildings and enclosures reflect architectural and visual heritage, the auditory environment of an enclosed space also represents a unique and distinct heritage with which people have interacted and through which they have shaped their culture. When two listener groups (East-Asian and North-American) compared a reproduced field, previous findings show that (1) the semantic value of the same descriptor was distinctly different for the two groups, and (2) there was an inverse relationship between the area of a personal space and the size of a desired (preferred) auditory environment. With the advance of virtual reality (VR) technology, listeners can enter any auditory environment from anywhere. Therefore, researchers and developers in the field should consider multiple user groups and the role of cultural framework in virtual environments.
Authors
Takao Tsuchiya, Yusuke Makino, Yu Teshima, Shizuko Hiryu
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.44, no.2, pp.101-109, 2023-03-01 (Released:2023-03-01)
Number of references
28
Number of citations
2

This paper reports on the implementation of a moving sound source and receiver with directivity in the two-dimensional finite-difference time-domain (FDTD) method. A two-dimensional fundamental solution of a moving monopole source is theoretically derived. Then, a fundamental solution of a moving dipole source is obtained by differentiating the fundamental solution of a monopole source in space. Finally, the directivity of moving monopole, dipole, and cardioid sources is theoretically derived. Numerical experiments performed on the two-dimensional sound field showed that the effect of moving velocity on amplitude differs for the monopole and dipole sources. Furthermore, it was found that directivity characteristics of dipole and cardioid sources vary depending on the beam steering angle and moving direction. The present method can be accurately applied to the moving sound source and receiver with directivity.
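
As a rough illustration of what a moving source in a two-dimensional FDTD grid can look like, the sketch below advances the scalar wave equation with a standard second-order leapfrog update and injects a sinusoidal monopole whose position changes every time step. It does not reproduce the paper's formulation: the bilinear spreading of the source onto the four surrounding nodes, the additive "soft" source without physical scaling, the fixed zero-pressure boundaries, and every parameter name and default value are assumptions made for this example. Directivity (dipole, cardioid) and the moving receiver are not covered.

```python
import math


def simulate_moving_monopole(nx=60, ny=60, steps=80, c=340.0, dx=0.05,
                             courant=0.5, speed=34.0, freq=500.0):
    """Return the pressure field after `steps` updates of a 2-D scalar FDTD."""
    dt = courant * dx / c  # courant <= 1/sqrt(2) keeps the 2-D scheme stable
    prev = [[0.0] * ny for _ in range(nx)]
    curr = [[0.0] * ny for _ in range(nx)]
    x0, y0 = 0.3 * nx * dx, 0.5 * ny * dx  # initial source position in metres
    for n in range(steps):
        t = n * dt
        nxt = [[0.0] * ny for _ in range(nx)]
        # Leapfrog update on interior nodes; boundary nodes stay at p = 0.
        for i in range(1, nx - 1):
            for j in range(1, ny - 1):
                lap = (curr[i + 1][j] + curr[i - 1][j] + curr[i][j + 1]
                       + curr[i][j - 1] - 4.0 * curr[i][j])
                nxt[i][j] = 2.0 * curr[i][j] - prev[i][j] + courant ** 2 * lap
        # Source moves along +x at constant speed; position in grid units.
        sx, sy = (x0 + speed * t) / dx, y0 / dx
        i0, j0 = int(sx), int(sy)
        if not (1 <= i0 < nx - 2 and 1 <= j0 < ny - 2):
            raise ValueError("source left the interior of the grid")
        fx, fy = sx - i0, sy - j0
        amp = math.sin(2.0 * math.pi * freq * t)
        # Bilinear weights spread the point source over the 4 nearest nodes.
        for di, wx in ((0, 1.0 - fx), (1, fx)):
            for dj, wy in ((0, 1.0 - fy), (1, fy)):
                nxt[i0 + di][j0 + dj] += wx * wy * amp
        prev, curr = curr, nxt
    return curr
```

Interpolating the source between nodes avoids the abrupt jumps that snapping to the nearest node would cause as the source crosses cell boundaries.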
Authors
Valbona Berisha, Shukri Klinaku
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.44, no.1, pp.24-28, 2023-01-01 (Released:2023-01-01)
Number of references
17
Number of citations
2

The principle of relativity requires that laws be invariant in all inertial reference frames. The laws of mechanics are invariant under Galilean relativity. The acoustic wave equation is a mechanical law. So why does the acoustic wave equation turn out to be noninvariant under the Galilean transformation? What does this mean? Why do the principle of relativity, the wave equation, and the current Galilean transformation not agree with one another? Indeed, to secure the invariance of the wave equation, the Galilean transformation must itself be transformed. The transformed Galilean transformation is supported by a wide range of arguments.
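
The noninvariance mentioned above can be seen in a short one-dimensional calculation. The following is the standard textbook argument, not the authors' modified transformation; the symbols $p$ (acoustic pressure), $c$ (sound speed), and $v$ (relative frame velocity) are notation assumed for this sketch.

```latex
\begin{align}
  x' &= x - vt, \qquad t' = t, \\
  \frac{\partial}{\partial x} &= \frac{\partial}{\partial x'}, \qquad
  \frac{\partial}{\partial t} = \frac{\partial}{\partial t'}
      - v\,\frac{\partial}{\partial x'}, \\
  0 &= \frac{\partial^2 p}{\partial t^2}
      - c^2\,\frac{\partial^2 p}{\partial x^2}
     = \frac{\partial^2 p}{\partial t'^2}
      - 2v\,\frac{\partial^2 p}{\partial x'\,\partial t'}
      - \left(c^2 - v^2\right)\frac{\partial^2 p}{\partial x'^2}.
\end{align}
```

The mixed-derivative term and the factor $c^2 - v^2$ show that the equation keeps its original form only when $v = 0$.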