Authors
Yuki Saito Kohei Yatabe Shogun
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
pp.e23.67 (Released:2023-12-02)
Number of references
11

Understanding gameplay can enhance the experience and entertainment value of video games. In this study, we propose utilizing the sound generated by a controller to analyze gameplay information. Controller sound is a user-friendly feature related to gameplay because it can be recorded very easily. As a first step of this research, we identified characters in Super Smash Bros. Ultimate from controller sound alone, as an example task for examining whether controller sound contains valuable information. The results showed that our model achieved 79% accuracy in identifying five characters using only the controller sound.
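As a rough illustration of the kind of pipeline such a task involves, the following is a minimal sketch of a baseline classifier over controller-sound clips. It is not the authors' model; the file names, labels, and feature choice (MFCC statistics fed to an SVM) are assumptions made here for illustration only.

```python
# Hypothetical baseline for character identification from controller sound.
# Not the authors' model; file paths, labels, and features are assumed.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def mfcc_stats(path, sr=16000, n_mfcc=20):
    """Summarize a controller-sound clip by the mean and std of its MFCCs."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([m.mean(axis=1), m.std(axis=1)])

# clips: (wav_path, character_label) pairs prepared beforehand
# (assumed data; in practice many clips per each of the five characters).
clips = [("clip_0001.wav", "char_A"), ("clip_0002.wav", "char_B")]  # ...

X = np.stack([mfcc_stats(p) for p, _ in clips])
y = np.array([label for _, label in clips])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```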
Authors
Kenta Ofuji Naomi Ogasawara
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.39, no.2, pp.56-65, 2018-03-01 (Released:2018-03-01)
Number of references
23
Number of citations
5 5

In this paper, we study the effects of the acoustic characteristics of spoken disaster warnings in Japanese on listeners' perceived intelligibility, reliability, and urgency. Our findings are threefold: (a) For both speaking speed and fo, a normal setting (compared with slow/fast (±20%) for speed, and low/high (± up to 36 Hz) for fo) improved the average evaluations of intelligibility and reliability. (b) For urgency only, increasing the speed (from slow to normal and from normal to fast) or raising fo (from low to normal and from normal to high) improved the average evaluation. (c) For intelligibility, reliability, and urgency alike, the main effect of speaking speed was the most dominant. In particular, urgency can be influenced by the speed factor alone by up to 39%: by setting the speed to fast (+20%), all other things being equal, the average perceived urgency rose from 3.2 at normal speed to 4.0 on the 1–5 scale. Based on these results, we argue that the speech rate may effectively be varied depending on whether an evacuation call prioritizes urgency or intelligibility and reliability. Care should be taken regarding the possibility that respondent-specific variation and experimental conditions interact with these results.
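For concreteness, the following is a minimal sketch of how stimuli with modified speaking speed (±20%) and fo (raised by up to 36 Hz) could be prepared. It assumes the librosa and pyworld packages and a hypothetical input file, and is not the authors' stimulus-generation procedure.

```python
# Hypothetical preparation of "fast" and "high-fo" variants of a spoken warning.
# Not the paper's procedure; packages, parameters, and file names are assumed.
import numpy as np
import soundfile as sf
import librosa
import pyworld as pw

x, fs = sf.read("warning_normal.wav")
if x.ndim > 1:                       # mix down to mono if needed
    x = x.mean(axis=1)
x = x.astype(np.float64)

# Speed: +20% faster (rate > 1 shortens the signal without changing pitch).
x_fast = librosa.effects.time_stretch(x.astype(np.float32), rate=1.2)

# fo: raise by 36 Hz in voiced frames using WORLD vocoder analysis/synthesis.
f0, sp, ap = pw.wav2world(x, fs)
f0_high = np.where(f0 > 0.0, f0 + 36.0, 0.0)
x_high = pw.synthesize(f0_high, sp, ap, fs)

sf.write("warning_fast.wav", x_fast, fs)
sf.write("warning_high.wav", x_high, fs)
```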
Authors
Yoshiki Masuyama Tsubasa Kusano Kohei Yatabe Yasuhiro Oikawa
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.40, no.3, pp.186-197, 2019-05-01 (Released:2019-05-01)
Number of references
29
Number of citations
3

For musical instrument sounds containing partials, which are referred to as modes, the decaying processes of the modes significantly affect the timbre of the instrument and characterize the sound. However, accurately decomposing the modes around the onset is not an easy task, especially when the sound has a sharp onset and contains non-modal percussive components such as the attack. This is because the sharp onsets of the modes produce peaky but broad spectra, which makes it difficult to remove the attack component. In this paper, an optimization-based method of modal decomposition is proposed to overcome this difficulty. The proposed method is formulated as a constrained optimization problem that enforces the perfect reconstruction property, which is important for accurate decomposition, and the causality of the modes. Three numerical simulations and an application to real piano sounds confirm the performance of the proposed method.
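The abstract does not state the formulation itself; as a hedged illustration only, a constrained problem of the kind it describes could be written as

```latex
% Illustrative form only; the penalties R_k and S are placeholders, not the paper's objective.
\min_{\{m_k\},\, a} \ \sum_{k=1}^{K} \mathcal{R}_k(m_k) + \mathcal{S}(a)
\quad \text{s.t.} \quad
\sum_{k=1}^{K} m_k + a = y \ \ \text{(perfect reconstruction)},
\qquad
m_k[n] = 0 \ \ \text{for } n < n_{\mathrm{on}} \ \ \text{(causality)},
```

where y is the observed signal, m_k are the modes, a is the non-modal attack/residual component, and n_on is the onset time; the actual penalties and constraints used in the paper may differ.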
Authors
Kei Sawada Kei Hashimoto Keiichiro Oura Yoshihiko Nankaku Keiichi Tokuda
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.39, no.2, pp.119-129, 2018-03-01 (Released:2018-03-01)
Number of references
35

This paper proposes a method for constructing text-to-speech (TTS) systems for languages with unknown pronunciations. One goal of speech synthesis research is to establish a framework that can be used to construct TTS systems for any written language. Generally, language-specific knowledge is required to construct a TTS system for a new language. However, such knowledge is difficult to acquire for each new language, so constructing a TTS system for a new language entails a huge cost. To address this problem, we investigate a framework for automatically constructing a TTS system from a target-language database consisting only of speech data and corresponding Unicode texts. In the proposed method, pseudo phonetic information for the target language with unknown pronunciation is obtained by a speech recognizer of a rich-resource proxy language. Then, a grapheme-to-phoneme converter and a statistical parametric speech synthesizer are constructed based on the obtained pseudo phonetic information. The proposed method was applied to Japanese and evaluated in terms of objective and subjective measures. Additionally, we attempted to construct TTS systems for nine Indian languages using the proposed method, and these systems were evaluated in the Blizzard Challenges 2014 and 2015.
Authors
Itsuki Ogawa Masanori Morise
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.42, no.3, pp.140-145, 2021-05-01 (Released:2021-05-01)
Number of references
39
Number of citations
10

We have built a singing database that can be used for research purposes. Since recent songs are protected by copyright law, researchers typically use songs that are free of copyright restrictions. Following the 2019 revision of the copyright law in Japan, we can now release, under several restrictions, a singing database consisting of songs protected by the law. Our database mainly consists of Japanese pop songs performed by a professional singer. We collected a total of 50 songs with around 57 minutes of vocals recorded in a studio. After recording, we labeled the phoneme boundaries and converted the songs into the MusicXML format required for the study of statistical parametric singing synthesis. Statistical analysis of the database was then carried out. First, we counted the number of phonemes to clarify their distribution. Second, we performed acoustic analysis of the distributions of pitch, the interval between notes, and duration. The results showed that although the information is biased, the amount of singing is sufficient in light of the findings of a prior study on singing synthesis. The corpus is freely available at our website, https://zunko.jp/kiridev/login.php [1].
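As an illustration of the kind of statistics mentioned above (pitch, note intervals, and durations extracted from MusicXML scores), the following sketch uses the music21 package; the package choice and the file name are assumptions and are not part of the corpus release.

```python
# Hypothetical analysis of pitch, note-interval, and duration distributions
# from one MusicXML score; music21 and the file name are assumed here.
import numpy as np
from music21 import converter

score = converter.parse("song_01.musicxml")
notes = [n for n in score.flatten().notes if n.isNote]

pitches = np.array([n.pitch.midi for n in notes])                     # MIDI note numbers
intervals = np.diff(pitches)                                          # semitone steps between notes
durations = np.array([float(n.duration.quarterLength) for n in notes])

print("pitch range (MIDI):", pitches.min(), "-", pitches.max())
print("mean |interval| (semitones):", np.abs(intervals).mean())
print("median duration (quarter lengths):", np.median(durations))
```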
Authors
Shinnosuke Takamichi Ryosuke Sonobe Kentaro Mitsui Yuki Saito Tomoki Koriyama Naoko Tanji Hiroshi Saruwatari
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.41, no.5, pp.761-768, 2020-09-01 (Released:2020-09-01)
Number of references
50
Number of citations
17

In this paper, we develop two corpora for speech synthesis research. Thanks to improvements in machine learning techniques, including deep learning, speech synthesis is becoming a machine learning task. To accelerate speech synthesis research, we aim to develop Japanese voice corpora that are reasonably accessible not only to academic institutions but also to commercial companies. In this paper, we construct the JSUT and JVS corpora, designed mainly for text-to-speech synthesis and voice conversion, respectively. The JSUT corpus contains 10 hours of reading-style speech uttered by a single speaker, and the JVS corpus contains 30 hours covering three styles of speech uttered by 100 speakers. This paper describes how we designed the corpora and summarizes their specifications. The corpora are available at our project pages.
Authors
Kohei Yatabe Yoshiki Masuyama Tsubasa Kusano Yasuhiro Oikawa
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.40, no.3, pp.170-177, 2019-05-01 (Released:2019-05-01)
Number of references
48
Number of citations
21

As the importance of the phase of the complex spectrogram has become widely recognized, many techniques have been proposed for handling it. However, several definitions and terminologies for the same concept can be found in the literature, which can confuse beginners. In this paper, two major definitions of the short-time Fourier transform and their phase conventions are summarized to alleviate this confusion. A phase-aware signal-processing scheme based on phase conversion is also introduced with a set of executable MATLAB functions (https://doi.org/10/c3qb).
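For reference, the two conventions most often encountered can be sketched as follows (generic notation that may differ from the paper's): one slides the window along the fixed signal, while the other shifts the signal into a fixed window, and their phases differ by a frequency-dependent linear term.

```latex
% Two common discrete STFT conventions (hop size a, DFT length N, window w):
X_1[m,k] = \sum_{n} x[n]\, w[n - m a]\, e^{-\mathrm{j} 2\pi k n / N},
\qquad
X_2[m,k] = \sum_{n} x[n + m a]\, w[n]\, e^{-\mathrm{j} 2\pi k n / N},
% which share the same magnitude but differ in phase by a linear term:
X_2[m,k] = e^{\,\mathrm{j} 2\pi k m a / N}\, X_1[m,k].
```

Converting between the two therefore only changes the phase of each time-frequency bin, which is why both definitions yield the same spectrogram magnitude while their phases must not be mixed up.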
Authors
Daichi Kitamura
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.40, no.3, pp.155-161, 2019-05-01 (Released:2019-05-01)
Number of references
35
Number of citations
7

Nonnegative matrix factorization (NMF) is a powerful technique for extracting meaningful patterns from an observed matrix and has been used in many applications in the audio signal processing field. In this article, the principle of NMF and some extensions based on a complex generative model are reviewed. Their application to audio source separation is also presented.
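As a concrete reminder of the basic model the article builds on, the following is a minimal sketch of NMF with the standard Euclidean-distance multiplicative updates applied to a magnitude spectrogram; it illustrates plain NMF only, not the complex generative-model extensions reviewed in the article.

```python
# Minimal NMF sketch (standard Euclidean multiplicative updates).
# Textbook formulation; not the complex-model extensions reviewed in the article.
import numpy as np

def nmf(V, K, n_iter=200, eps=1e-12, seed=0):
    """Factorize a nonnegative matrix V (F x T) into W (F x K) and H (K x T)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, K)) + eps
    H = rng.random((K, T)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update basis spectra
    return W, H

# Example: decompose a random nonnegative "spectrogram" into K = 4 bases.
V = np.random.default_rng(1).random((257, 100))
W, H = nmf(V, K=4)
print("relative approximation error:", np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```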
Authors
Masahiro Harazono Daichi Kitamura Masashi Nakayama
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.33, no.5, pp.301-309, 2012-05-01 (Released:2012-09-01)
Number of references
7
Number of citations
1

Among the factors characterizing the sound of an electric guitar, the characteristics of the pickup are thought to contribute the most. The pickups most often used are classified roughly into single-coil models and humbucking models. A single-coil pickup is made by winding several thousand turns of thin wire around six polarizing pole pieces, each corresponding to one string of the guitar; the change in magnetic reluctance caused by the string vibration changes the magnetic flux, which is transformed into an electrical signal. A humbucking pickup is composed of one magnetic circuit with two single-coil pickups, made to be in phase electrically and out of phase magnetically in order to cancel ambient magnetic noise. In this paper, the response of a humbucking pickup excited by string vibration on a real commercial solid-body electric guitar is analyzed, and the simulation results are shown to agree with measured values with sufficient precision. In addition, the response of a humbucking pickup imitated with two single-coil pickups is compared with that of a single-coil pickup, and further insight into the characteristics is gained through the analysis.
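To make the noise-cancelling principle explicit, a simplified signal model (illustrative only, not the paper's simulation model) is

```latex
% s_1, s_2: string-induced voltages in the two coils; n: ambient hum common to both.
v_1 = s_1 + n, \qquad v_2 = s_2 - n, \qquad
v_{\mathrm{out}} = v_1 + v_2 = s_1 + s_2,
```

so the string-induced voltages add while the common hum appears with opposite signs in the two coils and cancels.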
Authors
Shuhei Okada Tatsuya Hirahara
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.36, no.5, pp.449-452, 2015 (Released:2015-09-01)
Number of references
6
Number of citations
1 1
Authors
Akira Nishimura Nobuo Koizumi
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.31, no.2, pp.172-180, 2010-03-01 (Released:2010-03-01)
Number of references
15
Number of citations
1 1

A method of sampling jitter measurement based on time-domain analytic signals is proposed. Computer simulations and actual measurements were performed to compare the proposed method with the conventional method, in which jitter is evaluated from the amplitudes of sideband spectra of observed signals in the frequency domain. The results show that the proposed method is effective in that it 1) provides high temporal resolution as a result of the direct derivation of the jitter waveform, 2) achieves higher accuracy in the measurement of jitter amplitude, and 3) can separate the phase modulation that originates in sampling jitter from the amplitude modulation that originates in the digital-to-analog and analog-to-digital conversion processes. Suitable measurement conditions, as well as measurements to separate the effects of jitter in a digital-to-analog converter and an analog-to-digital converter, are described.
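As a rough illustration of the analytic-signal idea (a simplified sketch, not the authors' exact measurement procedure), the jitter waveform of a recorded test tone can be estimated from the instantaneous phase of its Hilbert-transform analytic signal:

```python
# Illustrative estimate of a jitter (timing-deviation) waveform from a recorded
# sinusoid via the time-domain analytic signal; simplified, with assumed values.
import numpy as np
from scipy.signal import hilbert

fs = 48000            # sampling rate of the measurement (assumed)
f0 = 997.0            # test-tone frequency (assumed)
t = np.arange(fs) / fs

# Simulated observation: a sinusoid whose sampling instants are phase-modulated.
jitter_true = 2e-7 * np.sin(2 * np.pi * 3.0 * t)          # 0.2 us peak, 3 Hz jitter
x = np.sin(2 * np.pi * f0 * (t + jitter_true))

# Analytic signal -> unwrapped instantaneous phase -> timing deviation.
phase = np.unwrap(np.angle(hilbert(x)))
phase_dev = phase - 2 * np.pi * f0 * t - phase[0]
jitter_est = phase_dev / (2 * np.pi * f0)                   # seconds

print("estimated peak jitter [s]:", np.max(np.abs(jitter_est[1000:-1000])))
```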
Authors
M. Charles Liberman
Publisher
ACOUSTICAL SOCIETY OF JAPAN
Journal
Acoustical Science and Technology (ISSN:13463969)
Volume, issue, pages, and publication date
vol.41, no.1, pp.59-62, 2020-01-01 (Released:2020-01-06)
Number of references
42
Number of citations
4 8

In acquired sensorineural hearing loss, the hearing impairment arises mainly from damage to cochlear hair cells or the sensory fibers of the auditory nerve that innervate them. Hair cell loss or damage is well captured by changes in the threshold audiogram, but the degree of neural damage is not. We have recently shown, in animal models of noise damage and aging, and in autopsy specimens from aging humans, that the synapses connecting inner hair cells and auditory nerve fibers are the first to degenerate. This primary neural degeneration, or cochlear synaptopathy, leaves many surviving inner hair cells permanently disconnected from their sensory innervation, and many spiral ganglion cells surviving with only their central projections to the brainstem intact. This pathology represents a kind of "hidden hearing loss." This review summarizes current speculations as to the functional consequences of this primary neural degeneration and the prospects for a therapeutic rescue based on local delivery of neurotrophins to elicit neurite extension and synaptogenesis in the adult ear.