- Authors
- 阪口 哲男, 于 家富
- Publisher
- 情報知識学会
- Journal
- 情報知識学会誌 (ISSN:09171436)
- Volume, issue, pages, and publication date
- vol.15, no.2, pp.53-56, 2005
- Number of references
- 4
The growth of unsolicited bulk e-mail (spam) is a serious problem for Internet e-mail. Many anti-spam tools, such as Bayesian filters, rely on automatic classification by machine learning. These tools depend on the language of the e-mail because they use a lexical analyzer to extract words from the message. However, spam is written in many languages, including English, Japanese, and Chinese. This paper proposes a language-independent method for filtering spam. The method classifies e-mails as spam or non-spam with a support vector machine (SVM) that uses the frequencies of substrings extracted from the e-mails. The paper also reports the results of testing the method on sample e-mails written in English, Japanese, Chinese, and several other languages, and discusses those results and future work.
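
The idea of classifying on substring frequencies rather than on words can be sketched in a few lines. The abstract does not say how the substrings are extracted or which SVM implementation is used, so everything below is an assumption made for illustration: fixed-length character 3-grams as the substrings, a linear SVM without a bias term trained by plain sub-gradient descent on the hinge loss, the function names (`substring_frequencies`, `train_linear_svm`, `classify`), the hyperparameters, and the four toy messages.

```python
from collections import Counter


def substring_frequencies(text, n=3):
    """Relative frequency of every overlapping n-character substring.

    No tokenizer is involved, so the same code handles any language.
    The choice n=3 is an assumption, not taken from the paper.
    """
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def score(weights, features):
    """Linear decision value: dot product of the weights and the features."""
    return sum(weights.get(gram, 0.0) * value for gram, value in features.items())


def train_linear_svm(texts, labels, n=3, epochs=50, rate=1.0, reg=0.01):
    """Sub-gradient descent on the L2-regularized hinge loss (no bias term).

    labels use +1 for spam and -1 for non-spam. This is a simplified
    stand-in for a real SVM solver, chosen only to keep the sketch short.
    """
    weights = {}
    vectors = [substring_frequencies(t, n) for t in texts]
    for _ in range(epochs):
        for features, label in zip(vectors, labels):
            violated = label * score(weights, features) < 1
            # Regularization step: shrink every weight slightly.
            for gram in weights:
                weights[gram] *= 1 - rate * reg
            # Hinge-loss step: move toward the sample if its margin is below 1.
            if violated:
                for gram, value in features.items():
                    weights[gram] = weights.get(gram, 0.0) + rate * label * value
    return weights


def classify(weights, text, n=3):
    """Label a message; a score of exactly zero falls back to non-spam."""
    is_spam = score(weights, substring_frequencies(text, n)) > 0
    return "spam" if is_spam else "non-spam"


# Toy training data, invented for this sketch (not the paper's test set).
TRAIN_TEXTS = [
    "buy cheap pills now",
    "激安 商品 今すぐ 購入",
    "meeting moved to monday",
    "明日の会議は十時からです",
]
TRAIN_LABELS = [1, 1, -1, -1]

model = train_linear_svm(TRAIN_TEXTS, TRAIN_LABELS)

if __name__ == "__main__":
    for message in ["cheap pills", "今すぐ 購入", "会議は十時から"]:
        print(message, "->", classify(model, message))
```

Because the features are raw character substrings, English and Japanese messages pass through exactly the same code path. A realistic experiment would need a much larger corpus and an established SVM library in place of this hand-written trainer.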