Authors
Masanori MORISE Fumiya YOKOMORI Kenji OZAWA
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
IEICE Transactions on Information and Systems (ISSN:09168532)
Volume, issue, pages, and publication date
vol.E99.D, no.7, pp.1877-1884, 2016-07-01 (Released:2016-07-01)
Number of references
38
Number of citations
91 571

A vocoder-based speech synthesis system, named WORLD, was developed in an effort to improve the sound quality of real-time applications using speech. Speech analysis, manipulation, and synthesis on the basis of vocoders are used in various kinds of speech research. Although several high-quality speech synthesis systems have been developed, real-time processing has been difficult with them because of their high computational costs. This new speech synthesis system offers not only high sound quality but also quick processing. It consists of three analysis algorithms and one synthesis algorithm proposed in our previous research. The effectiveness of the system was evaluated by comparing its output against natural speech including consonants. Its processing speed was also compared with those of conventional systems. The results showed that WORLD was superior to the other systems in terms of both sound quality and processing speed. In particular, it was over ten times faster than the conventional systems, and the real-time factor (RTF) indicated that it was fast enough for real-time processing.
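The real-time factor mentioned in the abstract is commonly defined as processing time divided by the duration of the processed audio, with values below 1 meaning the system keeps up with real time. The following is a minimal sketch under that common definition; the function names and the timing values in the usage note are hypothetical and are not taken from the paper.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (assumed RTF definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def is_real_time(rtf: float) -> bool:
    """A system is treated as real-time capable when its RTF is below 1."""
    return rtf < 1.0
```

For example, a hypothetical system that needs 0.5 s to process 10 s of speech has `real_time_factor(0.5, 10.0)` well below 1, so `is_real_time` returns `True`.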
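Authors
Tadachika OKI Satoshi TAOKA Toshiya MASHIMA Toshimasa WATANABE
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
IEICE Transactions on Information and Systems (ISSN:09168532)
Volume, issue, pages, and publication date
vol.E95.D, no.3, pp.769-777, 2012-03-01 (Released:2012-03-01)
Number of references
15
Number of citations
1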

The k-edge-connectivity augmentation problem with bipartition constraints (kECABP, for short) is defined by “Given an undirected graph G=(V,E) and a bipartition π={V_B,V_W} of V with V_B∩V_W=∅, find an edge set E_f of minimum cardinality, consisting of edges that connect V_B and V_W, such that G'=(V,E∪E_f) is k-edge-connected.” The problem has applications for security of statistical data stored in a cross tabulated table, and so on. In this paper we propose a fast algorithm for finding an optimal solution to (σ+1)ECABP in O(|V||E|+|V|² log |V|) time when G is σ-edge-connected (σ > 0), and show that the problem can be solved in linear time if σ ∈ {1,2}.
Authors
Hiroki TAMARU Yuki SAITO Shinnosuke TAKAMICHI Tomoki KORIYAMA Hiroshi SARUWATARI
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
IEICE Transactions on Information and Systems (ISSN:09168532)
Volume, issue, pages, and publication date
vol.E103.D, no.3, pp.639-647, 2020-03-01 (Released:2020-03-01)
Number of references
32
Number of citations
3

This paper proposes a generative moment matching network (GMMN)-based post-filtering method for providing inter-utterance pitch variation to singing voices and discusses its application to our developed mixing method called neural double-tracking (NDT). When a human singer sings and records the same song twice, there is a difference between the two recordings. The difference, which is called inter-utterance variation, enriches the performer's musical expression and the audience's experience. For example, it makes every concert special because it never recurs in exactly the same manner. Inter-utterance variation enables a mixing method called double-tracking (DT). With DT, the same phrase is recorded twice, then the two recordings are mixed to give richness to singing voices. However, in synthesized singing voices, which are commonly used to create music, there is no inter-utterance variation because the synthesis process is deterministic. There is also no inter-utterance variation when only one voice is recorded. Although there is a signal processing-based method called artificial DT (ADT) to layer singing voices, the signal processing results in unnatural sound artifacts. To solve these problems, we propose a post-filtering method for randomly modulating synthesized or natural singing voices as if the singer sang again. The post-filter built with our method models the inter-utterance pitch variation of human singing voices using a conditional GMMN. Evaluation results indicate that 1) the proposed method provides perceptible and natural inter-utterance variation to synthesized singing voices and that 2) our NDT exhibits higher double-trackedness than ADT when applied to both synthesized and natural singing voices.
Authors
Yuki SAITO Kei AKUZAWA Kentaro TACHIBANA
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
IEICE TRANSACTIONS on Information and Systems (ISSN:09168532)
Volume, issue, pages, and publication date
vol.E103-D, no.9, pp.1978-1987, 2020-09-01

This paper presents a method for many-to-one voice conversion using phonetic posteriorgrams (PPGs) based on an adversarial training of deep neural networks (DNNs). A conventional method for many-to-one VC can learn a mapping function from input acoustic features to target acoustic features through separately trained DNN-based speech recognition and synthesis models. However, 1) the differences among speakers observed in PPGs and 2) an over-smoothing effect of generated acoustic features degrade the converted speech quality. Our method performs a domain-adversarial training of the recognition model for reducing the PPG differences. In addition, it incorporates a generative adversarial network into the training of the synthesis model for alleviating the over-smoothing effect. Unlike the conventional method, ours jointly trains the recognition and synthesis models so that they are optimized for many-to-one VC. Experimental evaluation demonstrates that the proposed method significantly improves the converted speech quality compared with conventional VC methods.
Authors
Sho ENDO Jun SONODA Motoyuki SATO Takafumi AOKI
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
IEICE TRANSACTIONS on Information and Systems (ISSN:09168532)
Volume, issue, pages, and publication date
vol.E94-D, no.12, pp.2338-2344, 2011-12-01

The finite-difference time-domain (FDTD) method has been accelerated on the Cell Broadband Engine (Cell B.E.). However, a problem has arisen: in large-scale analysis, the speedup is limited by the bandwidth of the main memory. In this paper, we propose a novel algorithm and implement FDTD using it. We compared the novel algorithm with region segmentation and demonstrated that the proposed algorithm has a shorter calculation time than region segmentation.
Authors
Akira TAMAMORI
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
IEICE Transactions on Information and Systems (ISSN:09168532)
Volume, issue, pages, and publication date
vol.E106.D, no.7, pp.1244-1248, 2023-07-01 (Released:2023-07-01)
Number of references
14
Number of citations
1

This paper proposes an enhanced model of Random Projection Outlyingness (RPO) for unsupervised outlier detection. When datasets have multiple modalities, RPO makes frequent detection errors. The proposed model deals with this problem via unsupervised clustering and a local score weighting. The experimental results demonstrate that the proposed model outperforms RPO and is comparable with other existing unsupervised models on benchmark datasets in terms of the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC).
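The evaluation metric named in the abstract, the area under the ROC curve, can be computed directly from outlier scores and ground-truth labels. The sketch below uses the standard pairwise-ranking interpretation of AUC (the probability that a randomly chosen outlier scores higher than a randomly chosen inlier, with ties counted as one half). The function name and the sample data are illustrative assumptions, and nothing here reproduces the paper's model or its results.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of outliers and inliers.

    scores: higher means more outlying; labels: 1 for outlier, 0 for inlier.
    """
    outliers = [s for s, y in zip(scores, labels) if y == 1]
    inliers = [s for s, y in zip(scores, labels) if y == 0]
    if not outliers or not inliers:
        raise ValueError("both classes must be present")
    wins = 0.0
    for o in outliers:
        for i in inliers:
            if o > i:
                wins += 1.0   # outlier correctly ranked above inlier
            elif o == i:
                wins += 0.5   # ties count as half
    return wins / (len(outliers) * len(inliers))
```

With the made-up scores `[0.1, 0.4, 0.35, 0.8]` and labels `[0, 0, 1, 1]`, three of the four outlier/inlier pairs are ranked correctly, so the AUC is 0.75.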
Authors
Yosuke MUKASA Tomoya WAKAIZUMI Shu TANAKA Nozomu TOGAWA
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
IEICE TRANSACTIONS on Information and Systems (ISSN:09168532)
Volume, issue, pages, and publication date
vol.E104-D, no.10, pp.1592-1600, 2021-10-01
Number of citations
5

In an amusement park, an attraction-visiting route considering the waiting time and traveling time improves visitors' satisfaction and experience. To solve the problem, we focus on Ising machines, which have recently been expected to solve combinatorial optimization problems at high speed by mapping the problems to Ising models or quadratic unconstrained binary optimization (QUBO) models. We propose a mapping of the visiting-route recommendation problem in amusement parks to a QUBO model for solving it using Ising machines. By using an actual Ising machine, we could obtain feasible solutions one order of magnitude faster with almost the same accuracy as the simulated annealing method for the visiting-route recommendation problem.
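To illustrate what mapping a problem to a QUBO model can look like, here is a deliberately tiny sketch. It is not the authors' formulation: it only encodes the constraint "pick exactly one attraction" as a quadratic penalty added to hypothetical per-attraction costs, and it uses brute-force enumeration as a stand-in for an Ising machine. The costs, the penalty weight, and all function names are assumptions.

```python
from itertools import product


def build_one_hot_qubo(costs, penalty):
    """Build an upper-triangular QUBO matrix for: minimize cost, pick exactly one.

    The penalty term penalty * (sum(x) - 1)**2 expands, for binary x (x*x == x),
    to -penalty on each diagonal entry and 2*penalty on each pair (i < j),
    plus a constant that does not affect the minimizer.
    """
    n = len(costs)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Q[i][i] = costs[i] - penalty
        for j in range(i + 1, n):
            Q[i][j] = 2.0 * penalty
    return Q


def qubo_energy(Q, x):
    """Evaluate x^T Q x for a binary assignment x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def brute_force_minimum(Q):
    """Enumerate all binary assignments; only feasible for very small problems."""
    return min(product((0, 1), repeat=len(Q)), key=lambda x: qubo_energy(Q, x))
```

With hypothetical costs `[5.0, 3.0, 4.0]` and a penalty of `10.0`, the minimizer selects only the cheapest option. The penalty must be large enough that violating the constraint is never cheaper than satisfying it; a real route-recommendation model would need further variables and constraints for visiting order and timing.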
Authors
Kunihiro OGATA Tomoki MITA Takeshi SHIMIZU Nobuya YAMASAKI
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
IEICE Transactions on Information and Systems (ISSN:09168532)
Volume, issue, pages, and publication date
vol.E98.D, no.11, pp.1916-1922, 2015-11-01 (Released:2015-11-01)
Number of references
19
Number of citations
5

Through continued exertion, some unilateral lower-limb amputees increase the foot reaction force of the sound leg. The asymmetric gait with a prosthetic leg may thus negatively affect the musculoskeletal health of the leg on the healthy side. Therefore, it is important for these amputees to learn how to adjust the balance of each foot load in training. The aim of this study is to develop a training support system visualizing floor reaction forces using a color-depth sensor. The pose of the entire body of the amputee is measured by the depth sensor, and the floor reaction force is estimated based on the Zero Moment Point (ZMP), which is calculated using the center of mass of the amputee. Evaluation experiments of the proposed method were performed, and they confirmed the effectiveness of the estimation method and of the training with the visualization of the reaction force.
Authors
Yuki SAITO Kei AKUZAWA Kentaro TACHIBANA
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
IEICE Transactions on Information and Systems (ISSN:09168532)
Volume, issue, pages, and publication date
vol.E103.D, no.9, pp.1978-1987, 2020-09-01 (Released:2020-09-01)
Number of references
53

This paper presents a method for many-to-one voice conversion using phonetic posteriorgrams (PPGs) based on an adversarial training of deep neural networks (DNNs). A conventional method for many-to-one VC can learn a mapping function from input acoustic features to target acoustic features through separately trained DNN-based speech recognition and synthesis models. However, 1) the differences among speakers observed in PPGs and 2) an over-smoothing effect of generated acoustic features degrade the converted speech quality. Our method performs a domain-adversarial training of the recognition model for reducing the PPG differences. In addition, it incorporates a generative adversarial network into the training of the synthesis model for alleviating the over-smoothing effect. Unlike the conventional method, ours jointly trains the recognition and synthesis models so that they are optimized for many-to-one VC. Experimental evaluation demonstrates that the proposed method significantly improves the converted speech quality compared with conventional VC methods.
Authors
Junya KOGUCHI Shinnosuke TAKAMICHI Masanori MORISE Hiroshi SARUWATARI Shigeki SAGAYAMA
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
IEICE TRANSACTIONS on Information and Systems (ISSN:09168532)
Volume, issue, pages, and publication date
vol.E103-D, no.12, pp.2673-2681, 2020-12-01
Number of citations
2

We propose a speech analysis-synthesis and deep neural network (DNN)-based text-to-speech (TTS) synthesis framework using Gaussian mixture model (GMM)-based approximation of full-band spectral envelopes. GMMs have excellent properties as acoustic features in statistical parametric speech synthesis. Each Gaussian function of a GMM fits the local resonance of the spectrum. The GMM retains the fine spectral envelope and achieves high controllability of the structure. However, since conventional speech analysis methods (i.e., GMM parameter estimation) have been formulated for narrow-band speech, they degrade the quality of synthetic speech. Moreover, a DNN-based TTS synthesis method using GMM-based approximation has not been formulated in spite of its excellent expressive ability. Therefore, we employ peak-picking-based initialization for full-band speech analysis to provide better initialization for iterative estimation of the GMM parameters. We introduce not only the prediction error of GMM parameters but also the reconstruction error of the spectral envelopes as objective criteria for training the DNN. Furthermore, we propose a method for multi-task learning based on minimizing these errors simultaneously. We also propose a post-filter based on variance scaling of the GMM for our framework to enhance synthetic speech. Experimental results from evaluating our framework indicated that 1) the initialization method of our framework outperformed the conventional one in the quality of analysis-synthesized speech; 2) introducing the reconstruction error in DNN training significantly improved the synthetic speech; and 3) our variance-scaling-based post-filter further improved the synthetic speech.
Authors
Mohammed Salah AL-RADHI Tamás Gábor CSAPÓ Géza NÉMETH
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
IEICE Transactions on Information and Systems (ISSN:09168532)
Volume, issue, pages, and publication date
vol.E103.D, no.5, pp.1099-1107, 2020-05-01 (Released:2020-05-01)
Number of references
36
Number of citations
2

In this article, we propose a method called “continuous noise masking (cNM)” that eliminates residual buzziness in a continuous vocoder, i.e., a vocoder in which all parameters are continuous and which offers a simple and flexible speech analysis and synthesis system. Traditional parametric vocoders generally show a perceptible deterioration in the quality of the synthesized speech due to different processing algorithms. Furthermore, inaccurate noise resynthesis (e.g., in breathiness or hoarseness) is also considered to be one of the main underlying causes of performance degradation, leading to noisy transients and temporal discontinuity in the synthesized speech. To overcome these issues, a new cNM is developed based on the phase distortion deviation in order to reduce the perceptual effect of the residual noise, allow a proper reconstruction of noise characteristics, and better model the creaky voice segments that may occur in natural speech. To this end, the cNM is designed to keep only voice components under a condition of the cNM threshold while discarding others. We evaluate the proposed approach and compare it with state-of-the-art vocoders using objective and subjective listening tests. Experimental results show that the proposed method can reduce the effect of residual noise and can reach the quality of other sophisticated approaches like STRAIGHT and log domain pulse model (PML).
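Authors
Yuki SAITO Shinnosuke TAKAMICHI Hiroshi SARUWATARI
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
IEICE Transactions on Information and Systems (ISSN:09168532)
Volume, issue, pages, and publication date
vol.E100.D, no.8, pp.1925-1928, 2017-08-01 (Released:2017-08-01)
Number of references
20
Number of citations
19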

This paper proposes Deep Neural Network (DNN)-based Voice Conversion (VC) using input-to-output highway networks. VC is a speech synthesis technique that converts input features into output speech parameters, and DNN-based acoustic models for VC are used to estimate the output speech parameters from the input speech parameters. Given that the input and output are often in the same domain (e.g., cepstrum) in VC, this paper proposes a VC using highway networks connected from the input to output. The acoustic models predict the weighted spectral differentials between the input and output spectral parameters. The architecture not only alleviates over-smoothing effects that degrade speech quality, but also effectively represents the characteristics of spectral parameters. The experimental results demonstrate that the proposed architecture outperforms Feed-Forward neural networks in terms of the speech quality and speaker individuality of the converted speech.
Authors
CHOI Yunja
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
IEICE transactions on information and systems (ISSN:09168532)
Volume, issue, pages, and publication date
vol.96, no.3, pp.735-738, 2013-03-01
Number of citations
1

An automotive operating system is typical safety-critical software and therefore requires extensive analysis with respect to its effect on system safety. Our earlier work [1] reported a systematic model checking approach for checking the safety properties of the OSEK/VDX-based operating system Trampoline. This article reports further performance improvement using embeddedC constructs for efficient verification of the Trampoline model developed in the earlier work. Experiments show that the use of embeddedC constructs greatly reduces verification costs.
Authors
Osama HALABI Fatma AL-MESAIFRI Mariam AL-ANSARI Roqaya AL-SHAABI Kazunori MIYATA
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
IEICE TRANSACTIONS on Information and Systems (ISSN:09168532)
Volume, issue, pages, and publication date
vol.E97-D, no.8, pp.2048-2052, 2014-08-01

This paper proposes a novel multimodal interactive surgical simulator that incorporates haptic and olfactory feedback as well as traditional visual feedback. A scent diffuser was developed to produce odors when errors occur. A haptic device was used to provide the sense of touch to the user. The preliminary results show that adding smell as an aid to the simulation enhanced memory retention, which led to better performance.
Authors
Akira TAMAMORI Yoshihiko NANKAKU Keiichi TOKUDA
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
IEICE TRANSACTIONS on Information and Systems (ISSN:09168532)
Volume, issue, pages, and publication date
vol.E97-D, no.7, pp.1842-1854, 2014-07-01

In this paper, a novel statistical model based on 2-D HMMs for image recognition is proposed. Recently, separable lattice 2-D HMMs (SL2D-HMMs) were proposed to model invariance to size and location deformation. However, their modeling accuracy is still insufficient because of the following two assumptions, which are inherited from 1-D HMMs: i) the stationary statistics within each state and ii) the conditional independence assumption of state output probabilities. To overcome these shortcomings in 1-D HMMs, trajectory HMMs were proposed and successfully applied to speech recognition and speech synthesis. This paper derives 2-D trajectory HMMs by reformulating the likelihood of SL2D-HMMs through the imposition of explicit relationships between static and dynamic features. The proposed model can efficiently capture dependencies between adjacent observations without increasing the number of model parameters. The effectiveness of the proposed model was evaluated in face recognition experiments on the XM2VTS database.
Authors
Nagaoka Chika Komori Masashi
Publisher
Institute of Electronics, Information and Communication Engineers
Journal
IEICE Transactions on Information and Systems (ISSN:09168532)
Volume, issue, pages, and publication date
vol.E91-D, no.6, pp.1634-1640, 2008-06
Number of citations
41

Body movement synchrony (i.e., rhythmic synchronization between the body movements of interacting partners) has been described by subjective impressions of skilled counselors and has been considered to reflect the depth of the client-counselor relationship. This study analyzed temporal changes in body movement synchrony through a video analysis of client-counselor dialogues in counseling sessions. Four 50-minute psychotherapeutic counseling sessions were analyzed, including two negatively evaluated sessions (low evaluation group) and two positively evaluated sessions (high evaluation group). In addition, two 50-minute ordinary advice sessions between two high school teachers and the clients in the high evaluation group were analyzed. All sessions represent role-playing. The intensity of the participants' body movement was measured using a video-based system. Temporal change of body movement synchrony was analyzed using moving correlations of the intensity between the two time series. The results revealed (1) a consistent temporal pattern among the four counseling cases, though the moving correlation coefficients were higher for the high evaluation group than for the low evaluation group, and (2) different temporal patterns for the counseling and advice sessions even when the clients were the same. These results were discussed from the perspective of the quality of the client-counselor relationship.
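The moving-correlation analysis described in the abstract can be sketched as a Pearson correlation computed over a sliding window of two equally sampled intensity series. The sketch below is an illustration only: the window length, the step of one sample, and the function names are assumptions, and the abstract does not state the parameters actually used in the study.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


def moving_correlation(x, y, window):
    """Correlation of x and y inside each window, sliding one sample at a time."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    if window < 2 or window > len(x):
        raise ValueError("window must be between 2 and the series length")
    return [
        pearson(x[i:i + window], y[i:i + window])
        for i in range(len(x) - window + 1)
    ]
```

Plotting the returned list against time gives the temporal profile of synchrony: values near 1 indicate that the two participants' movement intensities rise and fall together within that window.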
Authors
Masashi KOMORI Hiroko KAMIDE Satoru KAWAMURA Chika NAGAOKA
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
IEICE TRANSACTIONS on Information and Systems (ISSN:09168532)
Volume, issue, pages, and publication date
vol.E96-D, no.3, pp.507-513, 2013-03-01

This study investigated the relationship between social skills and facial asymmetry in facial expressions. Three-dimensional facial landmark data of facial expressions (neutral, happy, and angry) were obtained from Japanese participants (n = 62). Following a facial expression task, each participant completed KiSS-18 (Kikuchi's Scale of Social Skills; Kikuchi, 2007). Using a generalized Procrustes analysis, faces and their mirror-reversed versions were represented as points on a hyperplane. The asymmetry of each individual face was defined as the Euclidean distance between the face and its mirror-reversed face on this plane. Subtraction of the asymmetry level of a neutral face of each individual from the asymmetry level of a target emotion face was defined as the index of “expression asymmetry” given by a particular emotion. Correlation coefficients of KiSS-18 scores and expression asymmetry scores were computed for both happy and angry expressions. Significant negative correlations between KiSS-18 scores and expression asymmetries were found for both expressions. Results indicate that the symmetry in facial expressions increases with a higher level of social skills.
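The two definitions in the abstract (asymmetry as a Euclidean distance, and expression asymmetry as a difference from the neutral face) can be written out as a short sketch. It assumes each face and its mirror-reversed version have already been reduced to coordinate vectors in a common space; the generalized Procrustes analysis itself is not implemented here, and the function names and sample coordinates are hypothetical.

```python
import math


def asymmetry(face, mirrored_face):
    """Euclidean distance between a face and its mirror-reversed version.

    Both arguments are coordinate vectors in the same (already aligned) space.
    """
    return math.dist(face, mirrored_face)


def expression_asymmetry(neutral, neutral_mirrored, target, target_mirrored):
    """Asymmetry of the target-emotion face minus asymmetry of the neutral face."""
    return asymmetry(target, target_mirrored) - asymmetry(neutral, neutral_mirrored)
```

A positive value means the expression made the face more asymmetric than it is at rest, and a negative value means the expression made it more symmetric.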
Authors
Shuhei ENOMOTO Hiroki KUZUNO Hiroshi YAMADA
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
IEICE Transactions on Information and Systems (ISSN:09168532)
Volume, issue, pages, and publication date
vol.E105.D, no.11, pp.1890-1899, 2022-11-01 (Released:2022-11-01)
Number of references
32
Number of citations
3

CPU flush instruction-based cache side-channel attacks (cache instruction attacks) target a wide range of machines. For instance, Meltdown / Spectre combined with FLUSH+RELOAD gain read access to arbitrary data in the operating system kernel and user processes, which work on cloud virtual machines, laptops, desktops, and mobile devices. Additionally, fault injection attacks use a CPU cache. For instance, Rowhammer is a cache instruction attack that attempts to obtain write access to arbitrary data in physical memory, and affects machines that have DDR3. To protect against existing cache instruction attacks, various mechanisms have been proposed that modify hardware and software aspects; however, when new cache instruction attacks are disclosed, these mechanisms cannot prevent them. Moreover, designing and developing additional countermeasures takes a long time. This paper proposes a novel mechanism termed FlushBlocker to protect against all types of cache instruction attacks and to mitigate cache instruction attacks that employ the latest side-channel vulnerabilities until additional countermeasures are released. FlushBlocker employs an approach that restricts the issuing of cache flush instructions, causing the attacks to fail by limiting control of the CPU cache. To demonstrate the effectiveness of this study, FlushBlocker was implemented in the latest Linux kernel, and its security and performance were evaluated. Results show that FlushBlocker successfully prevents existing cache instruction attacks (e.g., Meltdown, Spectre, and Rowhammer), the performance overhead was zero, and it was transparent in real-world applications.
Authors
Shiling SHI Stefan HOLST Xiaoqing WEN
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
IEICE TRANSACTIONS on Information and Systems (ISSN:09168532)
Volume, issue, pages, and publication date
vol.E106-D, no.10, pp.1694-1704, 2023-10-01
Number of citations
1

High power dissipation during scan test often causes undue yield loss, especially for low-power circuits. One major reason is that the resulting IR-drop in shift mode may corrupt test data. A common approach to solving this problem is partial-shift, in which multiple scan chains are formed and only one group of scan chains is shifted at a time. However, existing partial-shift based methods suffer from two major problems: (1) their IR-drop estimation is either not accurate enough or too computationally expensive to be done for each shift cycle; (2) partial-shift is hence applied to all shift cycles, resulting in long test time. This paper addresses these two problems with a novel IR-drop-aware scan shift method, featuring: (1) Cycle-based IR-Drop Estimation (CIDE) supported by a GPU-accelerated dynamic power simulator to quickly find potential shift cycles with excessive peak IR-drop; (2) a scan shift scheduling method that generates a scan chain grouping targeted for each considered shift cycle to reduce the impact on test time. Experiments on ITC'99 benchmark circuits show that: (1) the CIDE is computationally feasible; (2) the proposed scan shift schedule can achieve a global peak IR-drop reduction of up to 47%. Its scheduling efficiency is 58.4% higher on average than that of an existing typical method, which means our method requires less test time.