Synonym Acquisition Using Web Search Query Logs and Click-Through Logs

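Authors: Kei Uchiumi (内海慶), Mamoru Komachi (小町守)
Publisher: Information Processing Society of Japan (情報処理学会)
Journal: IPSJ Transactions on Databases (TOD) (ISSN: 18827799)
Volume, issue, pages, and publication date: vol.6, no.1, pp.16-28, 2013-01-23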

Many recent web search engines provide query expansion and query rewriting. These functions rely on thesauri and synonym dictionaries, but building such dictionaries by hand is costly. For this reason, synonym acquisition from web search logs and click-through logs has been studied. Previously proposed methods model synonym acquisition with the Noisy Channel Model, a generative model, which does not allow flexible feature design; for example, it is difficult to add the edit distance between the surface forms of a query and a synonym candidate as a feature. To address this problem, we propose a method that uses a discriminative model for synonym acquisition. In a synonym dictionary for query rewriting, a single synonym considered most appropriate is registered for each query. A synonym acquisition method must therefore rank the best candidate first when several candidates exist. The proposed method accordingly uses ListNet with features based on the surface forms of the query and the synonym candidate to directly maximize 1-best accuracy. Moreover, whereas conventional discriminative models require feature engineering, such as adding effective combination features, we introduce hidden layers into ListNet so that effective combination features are generated and weighted without feature engineering. As a result, the proposed method acquired synonyms with higher accuracy than the previous method based on the Noisy Channel Model.
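
The ranking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes two hypothetical surface features (normalized edit distance and normalized length difference), a single tanh hidden layer, and hand-picked weights, and it uses the standard ListNet top-one cross-entropy loss. All function names, feature choices, and numbers below are illustrative assumptions.

```python
import math

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two surface strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def features(query: str, candidate: str) -> list:
    """Hypothetical surface features; the paper's feature set may differ."""
    longest = max(len(query), len(candidate), 1)
    return [edit_distance(query, candidate) / longest,
            abs(len(query) - len(candidate)) / longest]

def score(x, W1, b1, w2) -> float:
    """One-hidden-layer scorer: tanh hidden units feed a linear output.
    The hidden units can combine input features nonlinearly."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(w2, hidden))

def softmax(values):
    """Top-one probability distribution over a list of scores."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def listnet_top1_loss(scores, labels) -> float:
    """Cross entropy between the top-one distributions of labels and scores."""
    target = softmax(labels)
    predicted = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))

# Illustrative data and weights (assumptions, not values from the paper).
query = "color"
candidates = ["colour", "collar", "paint"]
labels = [1.0, 0.0, 0.0]           # the first candidate is the reference synonym
W1 = [[-1.0, 0.5], [0.3, -0.8]]    # hidden-layer weights, one row per hidden unit
b1 = [0.0, 0.1]                    # hidden-layer biases
w2 = [1.0, 0.5]                    # output weights

scores = [score(features(query, c), W1, b1, w2) for c in candidates]
loss = listnet_top1_loss(scores, labels)
best = candidates[max(range(len(scores)), key=scores.__getitem__)]
```

Training would adjust `W1`, `b1`, and `w2` by gradient descent to reduce `loss` over many queries. At prediction time only the top-scoring candidate (`best`) is kept, which matches the one-synonym-per-query dictionary setting described above.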
