Authors
Manabu Okumura, Kiyoaki Shirai, Kanako Komiya, Hikaru Yokono
Publisher
The Association for Natural Language Processing
Journal
自然言語処理 (Journal of Natural Language Processing; ISSN 1340-7619)
Volume, issue, pages, and publication date
vol. 18, no. 3, pp. 293–307, 2011 (released 2011-10-04)
Number of references
12
Number of citing works
4

This paper presents an overview of the SemEval-2 Japanese WSD task. The task has two new characteristics: (1) it uses the first balanced Japanese sense-tagged corpus, and (2) it takes into account not only instances whose sense appears in the given sense inventory but also instances whose sense cannot be found in that inventory. It is a lexical sample task, and word senses are defined according to a Japanese dictionary, the Iwanami Kokugo Jiten. This dictionary and a training corpus were distributed to participants. There were 50 target words: 22 nouns, 23 verbs, and 5 adjectives. Fifty instances of each target word were provided, for a total of 2,500 evaluation instances. Nine systems from four organizations participated in the task.
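The abstract notes that the correct sense of an instance may lie outside the dictionary's sense inventory. As an illustration only, the Python sketch below computes plain accuracy when a gold label can be either a dictionary sense ID or a single out-of-inventory marker. The marker `NEW_SENSE`, the sense IDs `s1`/`s2`, and the choice of plain accuracy are assumptions; the task's actual labels and evaluation measures may differ.

```python
from __future__ import annotations

# Hypothetical label for an instance whose sense is not in the dictionary's
# sense inventory; the label actually used in the task may differ.
NEW_SENSE = "NEW"


def accuracy(gold: list[str], predicted: list[str]) -> float:
    """Return the fraction of instances whose predicted label equals the gold label.

    A gold label is either a dictionary sense ID or NEW_SENSE, so a system
    gets credit on an out-of-inventory instance only by predicting NEW_SENSE.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0  # no instances to score
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Hypothetical sense IDs for a single target word.
gold = ["s1", "s2", NEW_SENSE, "s1"]
predicted = ["s1", "s1", NEW_SENSE, "s1"]
print(accuracy(gold, predicted))  # 0.75
```

Here three of the four predictions match, including the out-of-inventory instance, so the sketch reports 0.75.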
Authors
Piyoros Tungthamthiti, Kiyoaki Shirai, Masnizah Mohd
Publisher
Information and Media Technologies Editorial Committee
Journal
Information and Media Technologies (ISSN 1881-0896)
Volume, pages, and publication date
vol. 12, pp. 80–102, 2017 (released 2017-06-15)
Number of references
29

Recognizing sarcasm in microblogs is important for a range of NLP applications, such as opinion mining. However, it is a challenging task, because the real meaning of a sarcastic sentence is the opposite of its literal meaning. Furthermore, microblog messages are short and usually written in a free style that may include misspellings, grammatical errors, and complex sentence structures. This paper proposes a novel method for identifying sarcasm in tweets. It combines two supervised classifiers: a Support Vector Machine (SVM) using N-gram features and an SVM using our proposed features. These features, derived by sentiment analysis, represent the intensity of sentiment in a tweet and contradictions within it. The sentiment contradiction feature also considers coherence among multiple sentences in the tweet, which our proposed method identifies automatically using unsupervised clustering and an adaptive genetic algorithm. In addition, a method for identifying the concepts of unknown sentiment words compensates for gaps in the sentiment lexicon. Our method also considers punctuation and the special symbols frequently used in Twitter messages. Experiments on two datasets showed that the proposed system outperformed baseline systems on one dataset and produced comparable results on the other. Sarcasm identification accuracies of 82% and 76% were achieved on the two datasets.
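The abstract does not state how the two SVMs are combined. One common option is late fusion of their decision scores; the Python sketch below shows a weighted average of two signed margins, with a positive fused score read as "sarcastic". The class name `FusedClassifier`, the `weight_ngram` parameter, and the linear rule are hypothetical and are not the paper's method.

```python
from dataclasses import dataclass


@dataclass
class FusedClassifier:
    """Hypothetical late fusion of two binary classifiers' decision scores.

    Each input is a signed SVM-style margin where a positive value means
    "sarcastic". The weighted-average rule and its weight are illustrative
    assumptions, not the combination described in the paper.
    """

    weight_ngram: float = 0.5  # weight on the N-gram SVM; 1 - weight goes to the feature SVM

    def __post_init__(self) -> None:
        if not 0.0 <= self.weight_ngram <= 1.0:
            raise ValueError("weight_ngram must be in [0, 1]")

    def score(self, ngram_score: float, feature_score: float) -> float:
        """Weighted average of the two margins."""
        w = self.weight_ngram
        return w * ngram_score + (1.0 - w) * feature_score

    def predict(self, ngram_score: float, feature_score: float) -> bool:
        """Label a tweet sarcastic when the fused score is positive."""
        return self.score(ngram_score, feature_score) > 0.0
```

For example, with equal weights, `FusedClassifier().predict(0.4, -0.2)` returns `True` (fused score 0.1), while shifting weight toward the sentiment-feature model with `weight_ngram=0.25` gives a fused score of -0.05 and returns `False`.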