著者
Masayuki Takeda Yusuke Shibata Tetsuya Matsumoto Takuya Kida Ayumi Shinohara Shuichi Fukamachi Takeshi Shinohara Setsuo Arikawa
雑誌
情報処理学会論文誌 (ISSN:18827764)
巻号頁・発行日
vol.42, no.3, pp.370-384, 2001-03-15

This paper describes our recent studies onstring pattern matching in compressed textsmainly from practical viewpoints.The aim is to speed up the string pattern matching task in comparison with an ordinary search over the original texts.We have successfully developed (1) an AC type algorithmfor searching in Huffman encoded files and (2) a KMP typealgorithm and (3) a BM type algorithm for searchingin files compressed by the so-called byte pair encoding (BPE).Each of the algorithms reduces the search timeat nearly the same rate as the compression ratio.Surprisingly the BM type algorithm runs over BPE compressed filesabout $1.2$--$3.0$ times faster thanthe exact match routines of the software package {?tt agrep} which is known as the fastest pattern matching tool.
著者
Satoshi Yoshida Takashi Uemura Takuya Kida Tatsuya Asai Seishi Okamoto
出版者
Information and Media Technologies 編集運営会議
雑誌
Information and Media Technologies (ISSN:18810896)
巻号頁・発行日
vol.7, no.1, pp.129-140, 2012 (Released:2012-03-15)
参考文献数
32

We address the problem of improving variable-length-to-fixed-length codes (VF codes). A VF code that we deal here with is an encoding scheme that parses an input text into variable length substrings and then assigns a fixed length codeword to each parsed substring. VF codes have favourable properties for fast decoding and fast compressed pattern matching, but they are worse in compression ratio than the latest compression methods. The compression ratio of a VF code depends on the parse tree used as a dictionary. To gain a better compression ratio we present several improvement methods for constructing parse trees. All of them are heuristical solutions since it is intractable to construct the optimal parse tree. We compared our methods with the previous VF codes, and showed experimentally that their compression ratios reach to the level of state-of-the-art compression methods.