著者
Yamato OHTANI Masatsune TAMURA Masahiro MORITA Masami AKAMINE
出版者
一般社団法人 電子情報通信学会
雑誌
IEICE Transactions on Information and Systems (ISSN:09168532)
巻号頁・発行日
vol.E99.D, no.10, pp.2481-2489, 2016-10-01 (Released:2016-10-01)
参考文献数
36
被引用文献数
1

This paper describes a novel statistical bandwidth extension (BWE) technique based on a Gaussian mixture model (GMM) and a sub-band basis spectrum model (SBM), in which each dimensional component represents a specific acoustic space in the frequency domain. The proposed method can achieve the BWE from speech data with an arbitrary frequency bandwidth whereas the conventional methods perform the conversion from fixed narrow-band data. In the proposed method, we train a GMM with SBM parameters extracted from full-band spectra in advance. According to the bandwidth of input signal, the trained GMM is reconstructed to the GMM of the joint probability density between low-band SBM and high-band SBM components. Then high-band SBM components are estimated from low-band SBM components of the input signal based on the reconstructed GMM. Finally, BWE is achieved by adding the spectra decoded from estimated high-band SBM components to the ones of the input signal. To construct the full-band signal from the narrow-band one, we apply this method to log-amplitude spectra and aperiodic components. Objective and subjective evaluation results show that the proposed method extends the bandwidth of speech data robustly for the log-amplitude spectra. Experimental results also indicate that the aperiodic component extracted from the upsampled narrow-band signal realizes the same performance as the restored and the full-band aperiodic components in the proposed method.
著者
Yuki SAITO Shinnosuke TAKAMICHI Hiroshi SARUWATARI
出版者
The Institute of Electronics, Information and Communication Engineers
雑誌
IEICE TRANSACTIONS on Information and Systems (ISSN:09168532)
巻号頁・発行日
vol.E100-D, no.8, pp.1925-1928, 2017-08-01

This paper proposes Deep Neural Network (DNN)-based Voice Conversion (VC) using input-to-output highway networks. VC is a speech synthesis technique that converts input features into output speech parameters, and DNN-based acoustic models for VC are used to estimate the output speech parameters from the input speech parameters. Given that the input and output are often in the same domain (e.g., cepstrum) in VC, this paper proposes a VC using highway networks connected from the input to output. The acoustic models predict the weighted spectral differentials between the input and output spectral parameters. The architecture not only alleviates over-smoothing effects that degrade speech quality, but also effectively represents the characteristics of spectral parameters. The experimental results demonstrate that the proposed architecture outperforms Feed-Forward neural networks in terms of the speech quality and speaker individuality of the converted speech.
著者
Changki LEE
出版者
一般社団法人 電子情報通信学会
雑誌
IEICE Transactions on Information and Systems (ISSN:09168532)
巻号頁・発行日
vol.E100.D, no.4, pp.882-887, 2017-04-01 (Released:2017-04-01)
参考文献数
14
被引用文献数
10

Recurrent neural networks (RNNs) are a powerful model for sequential data. RNNs that use long short-term memory (LSTM) cells have proven effective in handwriting recognition, language modeling, speech recognition, and language comprehension tasks. In this study, we propose LSTM conditional random fields (LSTM-CRF); it is an LSTM-based RNN model that uses output-label dependencies with transition features and a CRF-like sequence-level objective function. We also propose variations to the LSTM-CRF model using a gate recurrent unit (GRU) and structurally constrained recurrent network (SCRN). Empirical results reveal that our proposed models attain state-of-the-art performance for named entity recognition.
著者
Hiroshi IWATA Nanami KATAYAMA Ken'ichi YAMAGUCHI
出版者
The Institute of Electronics, Information and Communication Engineers
雑誌
IEICE TRANSACTIONS on Information and Systems (ISSN:09168532)
巻号頁・発行日
vol.E100-D, no.6, pp.1182-1189, 2017-06-01

In accordance with Moore's law, recent design issues include shortening of time-to-market and detection of delay faults. Several studies with respect to formal techniques have examined the first issue. Using the equivalence checking, it is possible to identify whether large circuits are equivalent or not in a practical time frame. With respect to the latter issue, it is difficult to achieve 100% fault efficiency even for transition faults in full scan designs. This study involved proposing a redundant transition fault identification method using equivalence checking. The main concept of the proposed algorithm involved combining the following two known techniques, 1. modeling of a transition fault as a stuck-at fault with temporal expansion and 2. detection of a stuck-at fault by using equivalence checking tools. The experimental results indicated that the proposed redundant identification method using a formal approach achieved 100% fault efficiency for all benchmark circuits in a practical time even if a commercial ATPG tool was unable to achieve 100% fault efficiency for several circuits.
著者
Tadashi MATSUO Nobutaka SHIMADA
出版者
一般社団法人 電子情報通信学会
雑誌
IEICE Transactions on Information and Systems (ISSN:09168532)
巻号頁・発行日
vol.E100.D, no.6, pp.1350-1359, 2017-06-01 (Released:2017-06-01)
参考文献数
27
被引用文献数
3

Appearance-based generic object recognition is a challenging problem because all possible appearances of objects cannot be registered, especially as new objects are produced every day. Function of objects, however, has a comparatively small number of prototypes. Therefore, function-based classification of new objects could be a valuable tool for generic object recognition. Object functions are closely related to hand-object interactions during handling of a functional object; i.e., how the hand approaches the object, which parts of the object and contact the hand, and the shape of the hand during interaction. Hand-object interactions are helpful for modeling object functions. However, it is difficult to assign discrete labels to interactions because an object shape and grasping hand-postures intrinsically have continuous variations. To describe these interactions, we propose the interaction descriptor space which is acquired from unlabeled appearances of human hand-object interactions. By using interaction descriptors, we can numerically describe the relation between an object's appearance and its possible interaction with the hand. The model infers the quantitative state of the interaction from the object image alone. It also identifies the parts of objects designed for hand interactions such as grips and handles. We demonstrate that the proposed method can unsupervisedly generate interaction descriptors that make clusters corresponding to interaction types. And also we demonstrate that the model can infer possible hand-object interactions.
著者
Shigeki MATSUDA Teruaki HAYASHI Yutaka ASHIKARI Yoshinori SHIGA Hidenori KASHIOKA Keiji YASUDA Hideo OKUMA Masao UCHIYAMA Eiichiro SUMITA Hisashi KAWAI Satoshi NAKAMURA
出版者
The Institute of Electronics, Information and Communication Engineers
雑誌
IEICE TRANSACTIONS on Information and Systems (ISSN:09168532)
巻号頁・発行日
vol.E100-D, no.4, pp.621-632, 2017-04-01

This study introduces large-scale field experiments of VoiceTra, which is the world's first speech-to-speech multilingual translation application for smart phones. In the study, approximately 10 million input utterances were collected since the experiments commenced. The usage of collected data was analyzed and discussed. The study has several important contributions. First, it explains system configuration, communication protocol between clients and servers, and details of multilingual automatic speech recognition, multilingual machine translation, and multilingual speech synthesis subsystems. Second, it demonstrates the effects of mid-term system updates using collected data to improve an acoustic model, a language model, and a dictionary. Third, it analyzes system usage.
著者
Nobuaki MINEMATSU Ryuji KITA Keikichi HIROSE
出版者
The Institute of Electronics, Information and Communication Engineers
雑誌
IEICE TRANSACTIONS on Information and Systems (ISSN:09168532)
巻号頁・発行日
vol.E86-D, no.3, pp.550-557, 2003-03-01

Accurate estimation of accentual attribute values of words, which is required to apply rules of Japanese word accent sandhi to prosody generation, is an important factor to realize high-quality text-to-speech (TTS) conversion. The rules were already formulated by Sagisaka et al. and are widely used in Japanese TTS conversion systems. Application of these rules, however, requires values of a few accentual attributes of each constituent word of input text. The attribute values cannot be found in any public database or any accent dictionaries of Japanese. Further, these values are difficult even for native speakers of Japanese to estimate only with their introspective consideration of properties of their mother tongue. In this paper, an algorithm was proposed, where these values were automatically estimated from a large amount of data of accent types of accentual phrases, which were collected through a long series of listening experiments. In the proposed algorithm, inter-speaker differences of knowledge of accent sandhi were well considered. To improve the coverage of the estimated values over the obtained data, the rules were tentatively modified. Evaluation experiments using two-mora accentual phrases showed the high validity of the estimated values and the modified rules and also some defects caused by varieties of linguistic expressions of Japanese.
著者
Yugo HAYASHI
出版者
The Institute of Electronics, Information and Communication Engineers
雑誌
IEICE TRANSACTIONS on Information and Systems (ISSN:09168532)
巻号頁・発行日
vol.E99-D, no.6, pp.1455-1461, 2016-06-01

The present study investigated the performance of text-based explanation for a large number of learners in an online tutoring task guided by a Pedagogical Conversational Agent (PCA). In the study, a lexical network analysis that focused on the co-occurrence of keywords in learner's explanation text, which were used as dependent variables, was performed. This method was used to investigate how the variables, which consisted of expressions of emotion, embodied characteristics of the PCA, and personal characteristics of the learner, influenced the performance of the explanation text. The learners (participants) were students enrolled in a psychology class. The learners provided explanations to a PCA one-on-one as an after-school activity. In this activity, the PCA, portraying the role of a questioner, asked the learners to explain a key concept taught in their class. The students were randomly assigned one key term out of 30 and were asked to formulate explanations by answering different types of questions. The task consisted of 17 trials. More than 300 text-based explanation dialogues were collected from learners using a web-based explanation system, and the factors influencing learner performance were investigated. Machine learning results showed that during the explanation activity, the expressions used and the gender of the PCA influenced learner performance. Results showed that (1) learners performed better when a male PCA expressed negative emotions as opposed to when a female PCA expressed negative emotions, and (2) learners performed better when a female PCA expressed positive expressions as opposed to when a female PCA expressed negative expressions. This paper provides insight into capturing the behavior of humans performing online tasks, and it puts forward suggestions related to the design of an efficient online tutoring system using PCA.
著者
Sutee SUDPRASERT Asanee KAWTRAKUL Christian BOITET Vincent BERMENT
出版者
The Institute of Electronics, Information and Communication Engineers
雑誌
IEICE TRANSACTIONS on Information and Systems (ISSN:09168532)
巻号頁・発行日
vol.E92-D, no.10, pp.2122-2136, 2009-10-01

In this paper, we present a new dependency parsing method for languages which have very small annotated corpus and for which methods of segmentation and morphological analysis producing a unique (automatically disambiguated) result are very unreliable. Our method works on a morphosyntactic lattice factorizing all possible segmentation and part-of-speech tagging results. The quality of the input to syntactic analysis is hence much better than that of an unreliable unique sequence of lemmatized and tagged words. We propose an adaptation of Eisner's algorithm for finding the k-best dependency trees in a morphosyntactic lattice structure encoding multiple results of morphosyntactic analysis. Moreover, we present how to use Dependency Insertion Grammar in order to adjust the scores and filter out invalid trees, the use of language model to rescore the parse trees and the k-best extension of our parsing model. The highest parsing accuracy reported in this paper is 74.32% which represents a 6.31% improvement compared to the model taking the input from the unreliable morphosyntactic analysis tools.
著者
Marie KATSURAI Ikki OHMUKAI Hideaki TAKEDA
出版者
The Institute of Electronics, Information and Communication Engineers
雑誌
IEICE TRANSACTIONS on Information and Systems (ISSN:09168532)
巻号頁・発行日
vol.E99-D, no.4, pp.1010-1018, 2016-04-01

It is crucial to promote interdisciplinary research and recommend collaborators from different research fields via academic database analysis. This paper addresses a problem to characterize researchers' interests with a set of diverse research topics found in a large-scale academic database. Specifically, we first use latent Dirichlet allocation to extract topics as distributions over words from a training dataset. Then, we convert the textual features of a researcher's publications to topic vectors, and calculate the centroid of these vectors to summarize the researcher's interest as a single vector. In experiments conducted on CiNii Articles, which is the largest academic database in Japan, we show that the extracted topics reflect the diversity of the research fields in the database. The experiment results also indicate the applicability of the proposed topic representation to the author disambiguation problem.
著者
Akio DOI Akio KOIDE
出版者
The Institute of Electronics, Information and Communication Engineers
雑誌
IEICE TRANSACTIONS on Information and Systems (ISSN:09168532)
巻号頁・発行日
vol.E74-D, no.1, pp.214-224, 1991-01-25

A rapid, stable, and simple triangulation method is presented for approximating equi-valued surfaces by polyhedra. It is applicable to various scientific and engineering fields, because equi-valued surfaces can be closed or open, and because spatial functions can be given by a parameter set of defining functions or by the numerical values of functions at grid points. The method is tested numerically to compare its accuracy, computation time, and data volume with those of other methods.
著者
Kunihiro OGATA Tomoki MITA Takeshi SHIMIZU Nobuya YAMASAKI
出版者
The Institute of Electronics, Information and Communication Engineers
雑誌
IEICE TRANSACTIONS on Information and Systems (ISSN:09168532)
巻号頁・発行日
vol.E98-D, no.11, pp.1916-1922, 2015-11-01

Some unilateral lower-limb amputees, have through continued exertion, increase the foot reaction force of the sound leg. The asymmetric gait with a prosthetic leg may thus negatively affect the musculoskeletal health of the leg on the healthy side. Therefore, it is important for these amputees to learn how to adjust the balance of each foot load in training. The aim of this study is to develop a training support system visualizing floor-reaction forces using a color-depth sensor. The pose of the entire body of the amputee is measured by the depth sensor, and the floor reaction force is estimated based on Zero Moment Point (ZMP), which is calculated using the center of mass of the amputee. Evaluation experiments of the proposed method were performed and they confirmed the effectiveness of the estimation method and the training with the visualization of reaction force.
著者
Shinya TAKAMAEDA-YAMAZAKI Hiroshi NAKATSUKA Yuichiro TANAKA Kenji KISE
出版者
The Institute of Electronics, Information and Communication Engineers
雑誌
IEICE TRANSACTIONS on Information and Systems (ISSN:09168532)
巻号頁・発行日
vol.E98-D, no.12, pp.2150-2158, 2015-12-01

Soft processors are widely used in FPGA-based embedded computing systems. For such purposes, efficiency in resource utilization is as important as high performance. This paper proposes Ultrasmall, a new soft processor architecture for FPGAs. Ultrasmall supports a subset of the MIPS-I instruction set architecture and employs an area efficient microarchitecture to reduce the use of FPGA resources. While supporting the original 32-bit ISA, Ultrasmall uses a 2-bit serial ALU for all of its operations. This approach significantly reduces the resource utilization instead of increasing the performance overheads. In addition to these device-independent optimizations, we applied several device-dependent optimizations for Xilinx Spartan-3E FPGAs using 4-input lookup tables (LUTs). Optimizations using specific primitives aggressively reduce the number of occupied slices. Our evaluation result shows that Ultrasmall occupies only 84% of the previous small soft processor. In addition to the utilized resource reduction, Ultrasmall achieves 2.9 times higher performance than the previous approach.
著者
MinKyu KIM SunHo KI YoungDuke SEO JinHong PARK ChuShik JHON
出版者
The Institute of Electronics, Information and Communication Engineers
雑誌
IEICE TRANSACTIONS on Information and Systems (ISSN:09168532)
巻号頁・発行日
vol.E98-D, no.12, pp.2353-2357, 2015-12-01

Recently in the mobile graphic industry, ultra-realistic visual qualities with 60fps and limited power budget for GPU have been required. For graphics-heavy applications that run at 30 fps, we easily observed very noticeable flickering artifacts. Further, the workload imposed by high resolutions at high frame rates directly decreases the battery life. Unlike the recent frame rate up sampling algorithms which remedy the flickering but cause inevitable significant overheads to reconstruct intermediate frames, we propose a dynamic rendering quality scaling (DRQS) that includes dynamic rendering based on resolution changes and quality scaling to increase the frame rate with negligible overhead using a transform matrix. Further DRQS reduces the workload up to 32% without human visual-perceptual changes for graphics-light applications.
著者
Manuel CEDILLO HERNANDEZ Antonio CEDILLO HERNANDEZ Francisco GARCIA UGALDE Mariko NAKANO MIYATAKE Hector PEREZ MEANA
出版者
一般社団法人 電子情報通信学会
雑誌
IEICE Transactions on Information and Systems (ISSN:09168532)
巻号頁・発行日
vol.E98.D, no.9, pp.1702-1705, 2015-09-01 (Released:2015-09-01)
参考文献数
8
被引用文献数
2

In this letter we present an imperceptible and robust watermarking algorithm that uses a cryptographic hash function in the authentication application of digital medical imaging. In the proposed scheme we combine discrete Fourier transform (DFT) and local image masking to detect the watermark after a geometrical distortion and improve its imperceptibility. The image quality is measured by metrics currently used in digital image processing, such as VSNR, SSIM and PSNR.
著者
Shuji SAKAI Koichi ITO Takafumi AOKI Takafumi WATANABE Hiroki UNTEN
出版者
The Institute of Electronics, Information and Communication Engineers
雑誌
IEICE TRANSACTIONS on Information and Systems (ISSN:09168532)
巻号頁・発行日
vol.E98-D, no.10, pp.1818-1828, 2015-10-01

Methods of window matching to estimate 3D points are the most serious factors affecting the accuracy, robustness, and computational cost of Multi-View Stereo (MVS) algorithms. Most existing MVS algorithms employ window matching based on Normalized Cross-Correlation (NCC) to estimate the depth of a 3D point. NCC-based window matching estimates the displacement between matching windows with sub-pixel accuracy by linear/cubic interpolation, which does not represent accurate sub-pixel values of matching windows. This paper proposes a technique of window matching that is very accurate using Phase-Only Correlation (POC) with geometric correction for MVS. The accurate sub-pixel displacement between two matching windows can be estimated by fitting the analytical correlation peak model of the POC function. The proposed method also corrects the geometric transformations of matching windows by taking into consideration the 3D shape of a target object. The use of the proposed geometric correction approach makes it possible to achieve accurate 3D reconstruction from multi-view images even for images with large transformations. The proposed method demonstrates more accurate 3D reconstruction from multi-view images than the conventional methods in a set of experiments.
著者
Junichi HORI Koji SAKANO Yoshiaki SAITOH
出版者
The Institute of Electronics, Information and Communication Engineers
雑誌
IEICE TRANSACTIONS on Information and Systems (ISSN:09168532)
巻号頁・発行日
vol.E89-D, no.6, pp.1790-1797, 2006-06-01

A communication support interface controlled by eye movements and voluntary eye blink has been developed for disabled individuals with motor paralysis who cannot speak. Horizontal and vertical electro-oculograms were measured using two surface electrodes attached above and beside the dominant eye and referring to an earlobe electrode and amplified with AC-coupling in order to reduce the unnecessary drift. Four directional cursor movements --up, down, right, and left-- and one selected operation were realized by logically combining the two detected channel signals based on threshold settings specific to the individual. Letter input experiments were conducted on a virtual screen keyboard. The method's usability was enhanced by minimizing the number of electrodes and applying training to both the subject and the device. As a result, an accuracy of 90.1 3.6% and a processing speed of 7.7 1.9 letters/min. were obtained using our method.
著者
Masashi SUGIYAMA Ichiro TAKEUCHI Taiji SUZUKI Takafumi KANAMORI Hirotaka HACHIYA Daisuke OKANOHARA
出版者
The Institute of Electronics, Information and Communication Engineers
雑誌
IEICE TRANSACTIONS on Information and Systems (ISSN:09168532)
巻号頁・発行日
vol.E93-D, no.3, pp.583-594, 2010-03-01

Estimating the conditional mean of an input-output relation is the goal of regression. However, regression analysis is not sufficiently informative if the conditional distribution has multi-modality, is highly asymmetric, or contains heteroscedastic noise. In such scenarios, estimating the conditional distribution itself would be more useful. In this paper, we propose a novel method of conditional density estimation that is suitable for multi-dimensional continuous variables. The basic idea of the proposed method is to express the conditional density in terms of the density ratio and the ratio is directly estimated without going through density estimation. Experiments using benchmark and robot transition datasets illustrate the usefulness of the proposed approach.
著者
Akisato KIMURA Kevin DUH Tsutomu HIRAO Katsuhiko ISHIGURO Tomoharu IWATA Albert AU YEUNG
出版者
The Institute of Electronics, Information and Communication Engineers
雑誌
IEICE TRANSACTIONS on Information and Systems (ISSN:09168532)
巻号頁・発行日
vol.E97-D, no.6, pp.1557-1566, 2014-06-01

Social media such as microblogs have become so pervasive such that it is now possible to use them as sensors for real-world events and memes. While much recent research has focused on developing automatic methods for filtering and summarizing these data streams, we explore a different trend called social curation. In contrast to automatic methods, social curation is characterized as a human-in-the-loop and sometimes crowd-sourced mechanism for exploiting social media as sensors. Although social curation web services like Togetter, Naver Matome and Storify are gaining popularity, little academic research has studied the phenomenon. In this paper, our goal is to investigate the phenomenon and potential of this new field of social curation. First, we perform an in-depth analysis of a large corpus of curated microblog data. We seek to understand why and how people participate in this laborious curation process. We then explore new ways in which information retrieval and machine learning technologies can be used to assist curators. In particular, we propose a novel method based on a learning-to-rank framework that increases the curator's productivity and breadth of perspective by suggesting which novel microblogs should be added to the curated content.
著者
Donghai TIAN Jingfeng XUE Changzhen HU Xuanya LI
出版者
The Institute of Electronics, Information and Communication Engineers
雑誌
IEICE TRANSACTIONS on Information and Systems (ISSN:09168532)
巻号頁・発行日
vol.E97-D, no.6, pp.1648-1651, 2014-06-01

A whitelisting approach is a promising solution to prevent unwanted processes (e.g., malware) getting executed. However, previous solutions suffer from limitations in that: 1) Most methods place the whitelist information in the kernel space, which could be tempered by attackers; 2) Most methods cannot prevent the execution of kernel processes. In this paper, we present VAW, a novel application whitelisting system by using the virtualization technology. Our system is able to block the execution of unauthorized user and kernel processes. Compared with the previous solutions, our approach can achieve stronger security guarantees. The experiments show that VAW can deny the execution of unwanted processes effectively with a little performance overhead.