Authors
Yasufumi Uezu Sadao Hiroya Takemi Mochida
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.41, no.5, pp.720-728, 2020-09-01 (Released:2020-09-01)
Number of references
30
Number of citations
2

Auditory feedback plays a crucial role in the stable control of speaking and singing. Formant-transformed auditory feedback (TAF) is used to investigate the relationship between perturbation of the formant frequency and the compensatory response, in order to clarify the mechanism of auditory-speech motor control. Although previous formant TAF studies applied linear predictive coding (LPC) to estimate formant frequencies, LPC estimates false formants for high-pitched voices. In this paper, we investigate how different vocal-tract spectrum estimation methods in real-time formant TAF affect the compensatory response of formant frequencies to perturbations. A phase equalization-based autoregressive exogenous model (PEAR) is applied to the TAF system as a formant estimation method that can estimate the formant frequency more accurately and robustly than LPC can. Fifteen native Japanese speakers were asked to repeat the Japanese syllables /he/ or /hi/ while receiving feedback sounds whose formants F1 and F2 were transformed. In the /he/ condition, the F1 compensatory response for PEAR was significantly larger than that for LPC, and the compensation error in the F1–F2 plane for PEAR was smaller than that for LPC. Our results suggest that PEAR can increase both the accuracy of formant frequency estimation and the naturalness of the transformed speech sound.
Authors
Hiroki Oohashi Sadao Hiroya Takemi Mochida
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.36, no.6, pp.478-488, 2015 (Released:2015-11-01)
Number of references
22
Number of citations
1 4

This paper presents a real-time robust formant tracking system for speech using a real-time phase equalization-based autoregressive exogenous model (PEAR) with electroglottography (EGG). Although linear predictive coding (LPC) analysis is a popular method for estimating formant frequencies, its estimation accuracy is known to degrade for speech with a high fundamental frequency (F0), because the harmonic structure of the glottal source spectrum deviates further from the Gaussian noise assumption in LPC as F0 increases. In contrast, PEAR, which employs phase equalization and LPC with an impulse train as the glottal source signal, estimates formant frequencies robustly even for speech with high F0. However, PEAR has higher computational complexity than LPC. In this study, to reduce this computational complexity, a novel formulation of PEAR was derived, which enabled us to implement PEAR in a real-time robust formant tracking system. In addition, since PEAR requires the timings of glottal closures, a stable detection method using EGG was devised. We developed the real-time system on a digital signal processor and showed that, for both synthesized and natural vowels, the proposed method can estimate formant frequencies more robustly than LPC over a wider range of F0.
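As a point of reference for the LPC baseline mentioned in this abstract, the following is a minimal sketch of autocorrelation-method LPC in plain Python. It is not the PEAR formulation of the paper. The function names, the 8 kHz sampling rate, and the two resonances (500 Hz and 1500 Hz) are assumptions chosen for illustration. The test signal is the impulse response of a known all-pole filter, which is the ideal case where LPC recovers the filter almost exactly; voiced speech with high F0 departs from this ideal.

```python
import cmath
import math


def resonator(freq_hz, bw_hz, fs):
    """Denominator [1, a1, a2] of a two-pole resonator (one formant)."""
    r = math.exp(-math.pi * bw_hz / fs)      # pole radius from bandwidth
    theta = 2.0 * math.pi * freq_hz / fs     # pole angle from frequency
    return [1.0, -2.0 * r * math.cos(theta), r * r]


def poly_mul(p, q):
    """Multiply two polynomials in z^-1 (cascade two filters)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pv in enumerate(p):
        for j, qv in enumerate(q):
            out[i + j] += pv * qv
    return out


def impulse_response(a, n):
    """First n samples of the impulse response of the all-pole filter 1/A(z)."""
    h = []
    for k in range(n):
        acc = 1.0 if k == 0 else 0.0
        for i in range(1, min(k, len(a) - 1) + 1):
            acc -= a[i] * h[k - i]
        h.append(acc)
    return h


def autocorrelation(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns [1, a1, ..., a_order]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                       # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def spectral_peaks(a, fs, step_hz=1.0):
    """Frequencies (Hz) of local maxima of the LPC envelope 1/|A(e^{jw})|."""
    n_bins = int(fs / 2 / step_hz) + 1
    mags = []
    for b in range(n_bins):
        w = 2.0 * math.pi * b * step_hz / fs
        big_a = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        mags.append(1.0 / abs(big_a))
    return [b * step_hz for b in range(1, n_bins - 1)
            if mags[b - 1] < mags[b] > mags[b + 1]]


FS = 8000.0  # hypothetical sampling rate
true_a = poly_mul(resonator(500.0, 80.0, FS), resonator(1500.0, 100.0, FS))
signal = impulse_response(true_a, 1024)
est_a = levinson_durbin(autocorrelation(signal, 4), 4)
formants = spectral_peaks(est_a, FS)
```

Peak picking on the envelope is used here instead of polynomial root finding only to keep the sketch dependency-free; `formants` holds the frequencies of the envelope peaks in Hz.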
Authors
Yasufumi Uezu Tokihiko Kaburagi
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.37, no.6, pp.267-276, 2016-11-01 (Released:2016-11-01)
Number of references
17
Number of citations
1 4

When one of the dominant harmonics (the fundamental frequency and its harmonic components) is close to the first formant frequency, the effect of the source-filter interaction can induce voice register transition, in which the vocal-fold vibration becomes unstable and the pitch jumps abruptly. We investigated the relationship between the dominant harmonics, the first formant frequency, and the pitch jump width in the modal-falsetto transition to examine the effect of source-filter interaction. We measured temporal patterns of the fundamental frequency and the first formant when subjects performed rising glissandi with /a/ and /i/ vowels. For the /a/ vowel, there were weak proximity relationships between the dominant harmonics and first formant during the transition, indicating that source-induced transition occurred. For the /i/ vowel, in contrast, the fundamental frequency was regularly close to the first formant in the transition, indicating that the acoustically induced transition was caused by the source-filter interaction. Additionally, it was found that the difference between these two mechanisms had little influence on the pitch jump width. Finally, we concluded that the source-filter interaction is a contributory factor of the modal-falsetto transition, in agreement with foregoing studies.
Authors
Masato Akagi Taro Ienaga
Publisher
Acoustical Society of Japan
Journal
Journal of the Acoustical Society of Japan (E) (ISSN:03882861)
Volume, issue, pages, and publication date
vol.18, no.2, pp.73-80, 1997 (Released:2011-02-17)
Number of references
7
Number of citations
2 9

Speaker individualities in fundamental frequency (F0) contours are investigated through analyses of several speakers' uttered speech and psychoacoustic experiments. The analyses are performed to extract significant physical characteristics of F0 by using Fujisaki and Hirose's analysis method and the F-ratio of each physical characteristic. The experiments are performed to clarify the relationship between these physical characteristics and the perception of a speaker's speech. The stimuli used in the experiments are re-synthesized with manipulated F0 contours and spectral envelopes averaged over all speakers by using the Log Magnitude Approximation analysis-synthesis system. The analysis and experimental results indicate that (1) there is speaker individuality in the F0 contours, (2) some specific parameters related to the dynamics of F0 contours carry many speaker individuality features, and speaker individuality can be controlled by manipulating these parameters, and (3) although there are speaker individuality features in the time-averaged F0, they help improve speaker identification less than the dynamics of the F0 contours.
Authors
Toshio Irino Masashi Unoki
Publisher
Acoustical Society of Japan
Journal
Journal of the Acoustical Society of Japan (E) (ISSN:03882861)
Volume, issue, pages, and publication date
vol.20, no.6, pp.397-406, 1999 (Released:2011-02-17)
Number of references
26
Number of citations
8 21

This paper proposes a new auditory filterbank that enables signal resynthesis from dynamic representations produced by a level-dependent auditory filterbank. The filterbank is based on a new IIR implementation of the gammachirp, which has been shown to be an excellent candidate for asymmetric, level-dependent auditory filters. Initially, the gammachirp filter is shown to be decomposed into a combination of a gammatone filter and an asymmetric function. The asymmetric function is excellently simulated with a minimum-phase IIR filter, named the “asymmetric compensation filter”. Then, two filterbank structures are presented each based on the combination of a gammatone filterbank and a bank of asymmetric compensation filters controlled by a signal level estimation mechanism. The inverse filter of the asymmetric compensation filter is always stable because the minimum-phase condition is satisfied. When a bank of inverse filters is utilized after the gammachirp analysis filterbank and the idea of wavelet transform is applied, it is possible to resynthesize signals with small time-invariant errors and achieve a guaranteed precision. This feature has never been accomplished by conventional active auditory filterbanks. The proposed analysis/synthesis gammachirp filterbank is expected to be useful in various applications where human auditory filtering has to be modeled.
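For orientation, the sketch below evaluates the commonly cited time-domain form of the gammachirp, g(t) = a t^(n-1) exp(-2π b ERB(f_r) t) cos(2π f_r t + c ln t + φ), which reduces to a gammatone when the chirp parameter c is zero. It does not implement the IIR asymmetric compensation filter or the analysis/synthesis filterbank proposed in the paper. The parameter values (n = 4, b = 1.019, c = -2, a 1 kHz centre frequency, 16 kHz sampling) are illustrative assumptions, not values taken from the abstract.

```python
import math


def erb(freq_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore formula)."""
    return 24.7 * (4.37 * freq_hz / 1000.0 + 1.0)


def gammachirp(t, fr, c, n=4, b=1.019, amp=1.0, phase=0.0):
    """Gammachirp impulse response at time t (seconds); c = 0 gives a gammatone."""
    if t <= 0.0:
        return 0.0                      # causal response; also avoids log(0)
    envelope = amp * t ** (n - 1) * math.exp(-2.0 * math.pi * b * erb(fr) * t)
    return envelope * math.cos(2.0 * math.pi * fr * t + c * math.log(t) + phase)


def gammatone(t, fr, n=4, b=1.019, amp=1.0, phase=0.0):
    """Gammatone impulse response: the same envelope with no chirp term."""
    if t <= 0.0:
        return 0.0
    envelope = amp * t ** (n - 1) * math.exp(-2.0 * math.pi * b * erb(fr) * t)
    return envelope * math.cos(2.0 * math.pi * fr * t + phase)


FS = 16000.0                            # hypothetical sampling rate
times = [k / FS for k in range(400)]
chirp = [gammachirp(t, 1000.0, -2.0) for t in times]   # hypothetical c = -2
tone = [gammachirp(t, 1000.0, 0.0) for t in times]     # c = 0: gammatone
```

The `c * log(t)` term only modulates the phase, so the gammachirp and gammatone share the same gamma envelope in time; the asymmetry appears in the frequency response.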
Authors
Chia-huei Tseng Ya-Ting Wang Satoshi Shioiri
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.41, no.1, pp.2-5, 2020-01-01 (Released:2020-01-06)
Number of references
9
Number of citations
1

"Ma" is a Japanese word with very rich meanings. It is commonly used by Japanese speakers to refer to space, time, and the things in between. Mutual understanding of and agreement on this concept among the members of a group is key to sustaining social harmony. In the past, this concept has been discussed primarily in the literature and humanities fields, and little in the scientific and engineering communities. In this presentation, I will try to offer a few examples (e.g., music appreciation of silence, and Rakugo, Japanese comic story-telling) to demonstrate that it is possible to use an interdisciplinary approach to investigate the concept of "Ma" scientifically. Furthermore, this may provide a starting point for designers and engineers to delve into interpersonal communication about other abstract concepts.
Authors
Maria Chait
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.41, no.1, pp.48-53, 2020-01-01 (Released:2020-01-06)
Number of references
25
Number of citations
11

Sensitivity to patterns is fundamental to sensory processing, in particular in the auditory system, and a major component of the influential 'predictive coding' theory of brain function. Supported by growing experimental evidence, the 'predictive coding' framework suggests that perception is driven by a mechanism of inference, based on an internal model of the signal source. However, a key element of this theory (the process through which the brain acquires this model, and its neural underpinnings) remains poorly understood. Here I review recent brain imaging and behavioural work which focuses on this missing link. Together these emerging results paint a picture of the brain as a regularity seeker, rapidly extracting and maintaining representations of acoustic structure on multiple time scales and even when these are not relevant to behaviour.
Authors
Alexei Kochetov
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.39, no.2, pp.84-91, 2018-03-01 (Released:2018-03-01)
Number of references
20
Number of citations
1

This study employed electropalatography (EPG) to explore place and manner of articulation differences in Japanese consonants. Linguopalatal contact data were collected from 5 native speakers using custom-made artificial palates. The materials included words with 10 word-initial consonants and a word-final moraic nasal. Quantitative analyses of the data revealed some consistent differences among consonants in constriction location and constriction degree, even within the same-place classes. Certain differences among dorsal consonants, as well as among consonants with no active lingual constriction, were also observed. The results for Japanese coronal consonants were further compared to previous quantitative findings for English and Spanish with the goal of establishing common manner-specific patterns of linguopalatal contact across languages.
Authors
Fabin Acquaticci Juan F. Guarracino Sergio N. Gwirc Sergio E. Lew
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.40, no.2, pp.116-126, 2019-03-01 (Released:2019-03-01)
Number of references
24
Number of citations
6

In this work, we built an ultrasonic disc-shaped transducer for targeted neuromodulation, with the addition of a solid axicon lens based on a polydimethylsiloxane (PDMS) interface. We characterized its acoustic field numerically and experimentally. The motor cortex of CF-1 mice was stimulated, through the skin and skull into the intact brain, with low-intensity pulsed ultrasound. Evoked muscle responses in different body segments were clearly observed, including the hindlimb, forelimb, and tail. An axicon lens affixed to the face of the transducer makes possible a targeted modulation of the motor cortex by pulsed ultrasound, inducing muscle contraction in a specific body segment. In this approach, the lateral and axial spatial resolution is comparable to that of spherical segment ultrasound transducers, but with a shorter focal length. Thus, the ultrasound axicon looks attractive for investigating the functional contributions of fine-grained spatial structures in the brain.
Authors
Brian C. J. Moore
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.40, no.2, pp.61-83, 2019-03-01 (Released:2019-03-01)
Number of references
159
Number of citations
29

Within the cochlea, broadband sounds like speech and music are filtered into a series of narrowband signals, each with a relatively slowly varying envelope (ENV) imposed on a rapidly oscillating carrier (the temporal fine structure, TFS). Information about ENV and TFS is conveyed in the timing and short-term rate of action potentials in the auditory nerve. This paper describes the role of ENV and TFS information in pitch perception, binaural processing, and the perception of speech in the presence of background sounds. The paper also describes the effects of hearing loss and age on the processing of TFS and ENV information. The monaural and binaural processing of TFS information is adversely affected by both hearing loss and increasing age. The monaural processing of ENV information is little affected by hearing loss or by increasing age. The binaural processing of ENV information deteriorates somewhat with increasing age but is not markedly affected by hearing loss. The reduced TFS processing abilities found for older/hearing-impaired subjects may partially account for the difficulties that such subjects experience in complex listening situations.
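The ENV/TFS split described in this abstract is often illustrated with the Hilbert transform of a single narrowband signal. The sketch below takes that route as an assumption; the abstract itself does not name a decomposition method. The function names, the 256-sample length, and the carrier and modulator bins are hypothetical. A naive DFT keeps the example free of third-party dependencies.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (adequate for short signals)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def analytic_signal(x):
    """Analytic signal of a real sequence via the DFT; len(x) must be even."""
    n = len(x)
    spectrum = dft(x)
    for k in range(1, n // 2):
        spectrum[k] *= 2.0              # double the positive frequencies
    for k in range(n // 2 + 1, n):
        spectrum[k] = 0.0               # discard the negative frequencies
    return dft(spectrum, inverse=True)


def env_tfs(x):
    """Return (ENV, TFS): Hilbert envelope and cosine of the analytic phase."""
    z = analytic_signal(x)
    env = [abs(v) for v in z]
    tfs = [math.cos(cmath.phase(v)) for v in z]
    return env, tfs


N = 256
CARRIER_BIN, MOD_BIN = 32, 2            # hypothetical carrier and modulator
true_env = [1.0 + 0.5 * math.cos(2.0 * math.pi * MOD_BIN * t / N)
            for t in range(N)]
carrier = [math.cos(2.0 * math.pi * CARRIER_BIN * t / N) for t in range(N)]
narrowband = [e * c for e, c in zip(true_env, carrier)]
env, tfs = env_tfs(narrowband)
```

For this amplitude-modulated tone, `env` recovers the slow modulator and `tfs` recovers the unit-amplitude carrier, mirroring the slowly varying envelope and rapidly oscillating carrier described above.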
Authors
岩宮 眞一郎 小杉 経平 北村 音一
Publisher
Acoustical Society of Japan
Journal
Journal of the Acoustical Society of Japan (E) (ISSN:03882861)
Volume, issue, pages, and publication date
vol.4, no.2, pp.73-82, 1983 (Released:2011-02-17)
Number of references
17
Number of citations
8 9

When we listen to vibrato tones, we perceive not only frequency fluctuation but also a somewhat steady pitch. This pitch sensation of vibrato tones is defined as “principal pitch.” Using a method of adjustment, the principal pitch of vibrato tones is located around the carrier frequency. Because the located principal pitch has a spread proportional to the extent of the frequency change, it cannot be located more precisely. The principal pitch of symmetrical trill tones is also located around the carrier frequency. The principal pitch of asymmetrical trill tones is shifted toward whichever of the high-frequency and low-frequency intervals is longer. This means that the location of the principal pitch of frequency-modulated tones is determined not only by the upper and lower extremes of the modulation extent but also by the whole course of the frequency change. The shift is larger than that of the simple time average of the frequency change. From these experimental results, we posit a pitch-averaging mechanism in the auditory process for perceiving the principal pitch of frequency-modulated tones. This mechanism continuously samples the changing pitch, registers it in a sensory store, and averages it at certain intervals. When the pitch distribution of a frequency-modulated tone is asymmetrical, the mechanism operates after emphasizing the larger part of the distribution.
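The "simple time average" that this abstract uses as a baseline can be made concrete with a small calculation. The sketch below computes the duration-weighted mean frequency of a two-level trill; the frequencies and durations are hypothetical values chosen for illustration, not data from the paper. According to the abstract, the perceived principal pitch of an asymmetrical trill lies further toward the longer interval than this average.

```python
def time_average_frequency(f_high, t_high, f_low, t_low):
    """Duration-weighted mean frequency (Hz) over one trill period."""
    return (f_high * t_high + f_low * t_low) / (t_high + t_low)


# Hypothetical trill alternating between 1050 Hz and 950 Hz (midpoint 1000 Hz).
symmetric = time_average_frequency(1050.0, 0.05, 950.0, 0.05)
# Asymmetrical trill: the high-frequency interval lasts longer (70 ms vs 30 ms).
asymmetric = time_average_frequency(1050.0, 0.07, 950.0, 0.03)
```

With these values the symmetric trill averages to the 1000 Hz midpoint, while the asymmetric one averages to about 1020 Hz, already pulled toward the longer high-frequency interval before any additional perceptual emphasis.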
Authors
William F. Katz Sonya Mehta Matthew Wood
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.39, no.2, pp.130-137, 2018-03-01 (Released:2018-03-01)
Number of references
26
Number of citations
5

In order to investigate the articulatory processes involved in producing Japanese /r/, we obtained speech recordings for native talkers of standard Japanese using an electromagnetic articulography (EMA) system. Each talker produced repetitions of /r/ in a carrier phrase designed to contrast syllable (CV and VCV) and vowel (/a/, /i/, /u/, /e/, and /o/) contexts. Kinematic recordings were made using tongue (tip, TT; dorsum, TD; body, TB; left lateral, TLL; and right lateral, TRL) and lower lip/jaw (LL) sensors. We measured TT vertical displacement, TT duration at maximum position, and tongue blade width for the consonant gestures. In a perceptual experiment, American English listeners decided whether these consonants consisted of 'l,' 'r,' or 'd.' The kinematic results indicate that Japanese talkers produced CV consonants with greater stricture and longer closures than consonants in intervocalic positions. CV productions also had narrower tongue blade widths than VCV productions, especially in /i/ and /u/ contexts. The data were modeled with Dirichlet regression in order to determine how strongly tongue width and context (syllable and vowel) factors predict listeners' judgments. The results showed a significant fit for 'r' judgments, with the tongue width fit successively increased by the addition of syllable and vowel context information.
Authors
Masahiro Ishibashi Tohru Idogawa
Publisher
Acoustical Society of Japan
Journal
Journal of the Acoustical Society of Japan (E) (ISSN:03882861)
Volume, issue, pages, and publication date
vol.8, no.4, pp.139-144, 1987 (Released:2011-02-17)
Number of references
5
Number of citations
2

Input impulse responses of several bassoons are presented. The impulse responses are obtained by the inverse Fourier transform of measured input impedances of the bassoons as seen from the bocal entrance. Methods of input impedance determination are outlined. Comparison of the impulse responses shows differences between a particular bassoon and the others of a kind that can otherwise be found only by test blowing.
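The step from a measured input impedance to an impulse response can be sketched as an inverse discrete Fourier transform of a one-sided spectrum. The example below is a generic illustration under stated assumptions, not the authors' procedure: it assumes the impedance is sampled on a uniform frequency grid from 0 Hz to the Nyquist frequency, and it uses a synthetic decaying oscillation as a stand-in for measured bassoon data. All names and values are hypothetical.

```python
import cmath
import math


def half_spectrum(h):
    """DFT bins 0..N/2 of a real sequence of even length N."""
    n = len(h)
    return [sum(h[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def impulse_response(z_half, n):
    """Inverse DFT of a one-sided spectrum, assuming a real time response."""
    # Rebuild the negative-frequency bins by Hermitian symmetry.
    full = list(z_half) + [z_half[n - k].conjugate()
                           for k in range(n // 2 + 1, n)]
    return [sum(full[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


N = 64
# Synthetic stand-in for a measurement: a decaying oscillation.
reference = [math.exp(-0.1 * t) * math.cos(0.6 * t) for t in range(N)]
z_measured = half_spectrum(reference)   # plays the role of impedance samples
recovered = impulse_response(z_measured, N)
```

Because the one-sided spectrum here comes from a known real sequence, `recovered` reproduces `reference` up to rounding error, which checks the Hermitian extension and the scaling of the inverse transform.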
Authors
Minoru Nagata
Publisher
Acoustical Society of Japan
Journal
Journal of the Acoustical Society of Japan (E) (ISSN:03882861)
Volume, issue, pages, and publication date
vol.10, no.2, pp.59-72, 1989 (Released:2011-02-17)
Number of references
15
Number of citations
1 2