Authors
Yugo Sato Tsukasa Fukusato Shigeo Morishima
Publisher
The Institute of Image Information and Television Engineers
Journal
ITE Transactions on Media Technology and Applications (ISSN:21867364)
Volume/Issue/Pages/Date
vol.7, no.2, pp.68-79, 2019 (Released:2019-04-01)
References
60
Citations
1

This paper presents an interactive face retrieval framework for clarifying an image representation envisioned by a user. Our system is designed for a situation in which the user wishes to find a person but has only a visual memory of that person. We address the critical challenge of image retrieval driven only by the user's interactive inputs. Instead of providing target-specific information, the user selects several images that are similar to his or her impression of the target person. Based on the user's selections, our proposed system automatically updates a deep convolutional neural network. By interactively repeating this process, the system reduces the gap between human-based similarities and computer-based similarities and estimates the target image representation. We ran user studies with 10 participants on a public database and confirmed that the proposed framework lets users clarify the envisioned image representation easily and quickly.
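As an illustration of the interactive update loop described above, the following is a minimal sketch in PyTorch. The triplet-style loss, the tiny embedding network, and all identifiers are assumptions made for illustration; the paper does not specify this exact formulation.

```python
# A minimal sketch of one interactive retrieval round, assuming a
# triplet-style objective; the paper fine-tunes a deep CNN from user
# selections, but its exact loss and architecture may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingNet(nn.Module):
    """Tiny stand-in for the deep CNN mapping face images to features."""
    def __init__(self, dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, dim)

    def forward(self, x):
        # Unit-length embeddings so distances are comparable across rounds.
        return F.normalize(self.fc(self.conv(x).flatten(1)), dim=1)

def update_round(net, opt, selected, rejected, margin=0.2):
    """One round: pull the user's selected images toward their mean
    embedding (the current 'impression') and push rejected ones away."""
    emb_sel = net(selected)
    anchor = emb_sel.mean(0, keepdim=True)
    d_pos = (emb_sel - anchor).pow(2).sum(1).mean()
    d_neg = (net(rejected) - anchor).pow(2).sum(1).mean()
    loss = F.relu(d_pos - d_neg + margin)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

net = EmbeddingNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
selected = torch.randn(3, 3, 64, 64)   # images the user marked as similar
rejected = torch.randn(5, 3, 64, 64)   # candidates the user passed over
print(update_round(net, opt, selected, rejected))
```

After each round, re-ranking the candidate pool by distance to the anchor would yield the next set of images to show the user, which is how such a loop narrows in on the envisioned face.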
Authors
Tatsunori Hirai Hironori Doi Shigeo Morishima
Journal
Transactions of Information Processing Society of Japan (ISSN:18827764)
Volume/Issue/Pages/Date
vol.59, no.3, 2018-03-15

This paper presents a topic modeling method for retrieving similar music fragments and its application, MusicMixer, a computer-aided DJ system that supports DJ performance by automatically mixing songs in a seamless manner. MusicMixer mixes songs based on audio similarity calculated via beat analysis and latent topic analysis of the chromatic signal in the audio. The topics represent latent semantics of how chromatic sounds are generated. Given a list of songs, a DJ selects a song with beats and sounds similar to a specific point of the currently playing song to seamlessly transition between songs. By calculating similarities between all song sections that can be naturally mixed, MusicMixer retrieves the best mixing point from a myriad of possibilities and enables seamless song transitions. Although it is comparatively easy to calculate beat similarity from audio signals, capturing the semantics of songs from the viewpoint of a human DJ has proven difficult. Therefore, we propose a method of representing audio signals for constructing topic models that acquire the latent semantics of audio. The results of a subjective experiment demonstrate the effectiveness of the proposed latent semantic analysis method. MusicMixer achieves automatic song mixing using an audio signal processing approach; thus, users can perform DJ mixing simply by selecting a song from a list of songs suggested by the system.

This is a preprint of an article intended for publication in the Journal of Information Processing (JIP). This preprint should not be cited. This article should be cited as: Journal of Information Processing Vol.26 (2018) (online). DOI: http://dx.doi.org/10.2197/ipsjjip.26.276
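One plausible realization of the described pipeline is sketched below: beat-synchronous chroma features are quantized into discrete "words" and a topic model is fit over the resulting bags of words. The quantization scheme, cluster and topic counts, and file names are assumptions; the paper's actual signal representation and topic model may differ. The sketch uses librosa and scikit-learn.

```python
# Minimal sketch of a beat-synchronous chroma + LDA pipeline in the spirit
# of MusicMixer; feature quantization details here are assumptions.
import numpy as np
import librosa
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

def beat_chroma(path):
    """Beat-track a song and aggregate chroma features per beat."""
    y, sr = librosa.load(path)
    _, beats = librosa.beat.beat_track(y=y, sr=sr)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)   # (12, n_frames)
    return librosa.util.sync(chroma, beats).T          # (n_beats, 12)

def topic_profiles(songs, n_words=64, n_topics=16):
    """Quantize beat-level chroma into discrete 'words', then fit LDA
    so each song gets a latent-topic distribution."""
    feats = [beat_chroma(p) for p in songs]
    km = KMeans(n_clusters=n_words, n_init=10).fit(np.vstack(feats))
    docs = np.zeros((len(feats), n_words))
    for i, f in enumerate(feats):                      # bag of chroma words
        words, counts = np.unique(km.predict(f), return_counts=True)
        docs[i, words] = counts
    lda = LatentDirichletAllocation(n_components=n_topics).fit(docs)
    return lda.transform(docs)                         # (n_songs, n_topics)

# profiles = topic_profiles(["a.mp3", "b.mp3"])  # placeholder file names
# Comparing rows (e.g., by cosine similarity) would surface sections that
# sound semantically alike and are thus candidates for a seamless mix.
```

Combining such a topic similarity with a separate beat-alignment score reflects the paper's two-part criterion: rhythmic compatibility plus latent semantic compatibility.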
Authors
Masahide Kawai Tomoyori Iwao Daisuke Mima Akinobu Maejima Shigeo Morishima
Publisher
Information Processing Society of Japan
Journal
Journal of Information Processing (ISSN:18826652)
Volume/Issue/Pages/Date
vol.22, no.2, pp.401-409, 2014 (Released:2014-04-15)
References
27
Citations
2 / 10

Speech animation synthesis remains a challenging topic in the field of computer graphics. Despite much prior work, existing methods have not reproduced the detailed appearance of the inner mouth, such as the tip of the tongue nipped between the teeth or the back of the tongue. To solve this problem, we propose a data-driven speech animation synthesis method that focuses on the inside of the mouth. First, we classify inner-mouth images into teeth, labeled by the opening distance of the teeth, and tongue, labeled by phoneme information. We then insert these images into an existing speech animation according to the opening distance of the teeth and the phoneme information. Finally, we apply a patch-based texture synthesis technique, using a database of 2,213 images captured from 7 subjects, to the resulting animation. The proposed method can automatically generate speech animation with a realistic inner mouth from existing speech animations created by previous methods.
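The insertion step lends itself to a simple lookup: given the current phoneme and teeth-opening distance, pick the closest pre-labeled inner-mouth patch. The database schema, field names, and matching rule below are illustrative assumptions; the paper's patch-based texture synthesis itself is more involved than this nearest-match sketch.

```python
# Hypothetical sketch of the patch lookup step: choose an inner-mouth patch
# by phoneme label and nearest teeth-opening distance. Schema is assumed.
from dataclasses import dataclass

@dataclass
class MouthPatch:
    phoneme: str        # phoneme label governing tongue appearance
    opening: float      # teeth-opening distance (pixels)
    image_path: str     # path to the captured inner-mouth patch

def best_patch(db, phoneme, opening):
    """Among patches matching the phoneme (falling back to the whole
    database), return the one with the nearest opening distance."""
    candidates = [p for p in db if p.phoneme == phoneme] or db
    return min(candidates, key=lambda p: abs(p.opening - opening))

db = [
    MouthPatch("a", 18.0, "patches/a_open.png"),
    MouthPatch("a", 6.0, "patches/a_near_closed.png"),
    MouthPatch("th", 4.0, "patches/th_tongue_tip.png"),
]
print(best_patch(db, "a", 7.5).image_path)   # -> patches/a_near_closed.png
```

In the full method, the selected patches would then be blended into the animation frames by the patch-based texture synthesis, rather than pasted verbatim as this lookup alone suggests.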