著者
KIM Jinyoung LEE Joohun SHIRAI Katsuhiko
出版者
一般社団法人電子情報通信学会
雑誌
IEICE transactions on fundamentals of electronics, communications and computer sciences (ISSN:09168508)
巻号頁・発行日
vol.85, no.9, pp.2164-2168, 2002-09-01
参考文献数
9
被引用文献数
13

In this paper, for real-time automatic image transform based lip-reading under illumination variations, an efficient (smaller feature data size) and robust (better recognition under different lighting conditions) method is proposed. Image transform based approach obtains a compressed representation of image pixel values of speaker's mouth and is reported to show superior lip-reading performance. However, this approach inevitably produces large feature vectors relevant to lip information to require much computation time for lip-reading even when principal component analysis (PCA) is applied. To reduce the necessary dimension of feature vectors, the proposed method folded the lip image based on its symmetry in a frame image. This method also compensates the unbalanced illumination between the left and the right lip areas. Additionally, to filter out the inter-frame time-domain spectral distortion of each pixel contaminated by illumination noise, our method adapted the hi-pass filtering on the variations of pixel values between consecutive frames. In the experimental results performed on database recorded at various lighting conditions, the proposed lip-folding or/and inter-frame filtering reduced much the necessary number of feature data, principal components in this work, and showed superior recognition rate compared to the conventional method.