Authors
Akisato KIMURA, Kevin DUH, Tsutomu HIRAO, Katsuhiko ISHIGURO, Tomoharu IWATA, Albert AU YEUNG
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
IEICE TRANSACTIONS on Information and Systems (ISSN: 0916-8532)
Volume, issue, pages, publication date
vol.E97-D, no.6, pp.1557-1566, 2014-06-01

Social media such as microblogs have become so pervasive that it is now possible to use them as sensors for real-world events and memes. While much recent research has focused on automatic methods for filtering and summarizing these data streams, we explore a different trend called social curation. In contrast to automatic methods, social curation is a human-in-the-loop, sometimes crowd-sourced mechanism for exploiting social media as sensors. Although social curation web services such as Togetter, Naver Matome, and Storify are gaining popularity, little academic research has studied the phenomenon. In this paper, we investigate the phenomenon and potential of social curation as a new field. First, we perform an in-depth analysis of a large corpus of curated microblog data, seeking to understand why and how people take part in this laborious curation process. We then explore new ways in which information retrieval and machine learning technologies can assist curators. In particular, we propose a novel method based on a learning-to-rank framework that suggests which new microblog posts should be added to the curated content, increasing the curator's productivity and breadth of perspective.
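The abstract does not specify the ranking model. Purely as an illustration of the learning-to-rank idea, here is a minimal NumPy sketch of a linear pairwise ranker (a RankNet/RankSVM-style logistic loss on feature differences) that scores candidate posts for a curated story. The feature vectors, the function names `train_pairwise_ranker` and `rank_candidates`, and all hyperparameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def train_pairwise_ranker(X, y, story_ids, lr=0.1, epochs=200, reg=1e-3):
    """Hypothetical sketch: learn a linear scoring vector w with a pairwise logistic loss.

    X: (n, d) features of candidate posts (hypothetical, e.g. text similarity
       to the curated story, retweet count, time gap).
    y: (n,) labels, 1 if the curator included the post, 0 otherwise.
    story_ids: (n,) id of the curated story each candidate belongs to.
    """
    # Collect difference vectors x_i - x_j for pairs within the same story
    # where post i should rank above post j.
    diffs = []
    for s in np.unique(story_ids):
        idx = np.flatnonzero(story_ids == s)
        for i in idx:
            for j in idx:
                if y[i] > y[j]:
                    diffs.append(X[i] - X[j])
    w = np.zeros(X.shape[1])
    if not diffs:
        return w  # no preference pairs, nothing to learn
    D = np.asarray(diffs)
    for _ in range(epochs):
        margins = D @ w
        # d/dw log(1 + exp(-m)) = -sigmoid(-m) * (x_i - x_j)
        p = 1.0 / (1.0 + np.exp(margins))
        grad = -(D * p[:, None]).mean(axis=0) + reg * w
        w -= lr * grad
    return w


def rank_candidates(w, X):
    """Return candidate indices sorted from highest to lowest score."""
    return np.argsort(-(X @ w))
```

In this sketch, a pairwise loss only asks that posts the curator kept score above posts from the same story that were left out, so scores never need to be comparable across stories. To make suggestions, one would score only the posts not yet in the curated list and show the top few to the curator.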
Authors
Akisato Kimura, Masashi Sugiyama, Takuho Nakano, Hirokazu Kameoka, Hitoshi Sakano, Eisaku Maeda, Katsuhiko Ishiguro
Journal
IPSJ Transactions on Mathematical Modeling and Its Applications (TOM) (ISSN: 1882-7780)
Volume, issue, pages, publication date
vol.6, no.1, pp.128-135, 2013-03-12

Canonical correlation analysis (CCA) is a powerful tool for analyzing multi-dimensional paired data. However, CCA tends to perform poorly when the number of paired samples is limited, which is often the case in practice. To cope with this problem, we propose a semi-supervised variant of CCA named SemiCCA, which incorporates additional unpaired samples to mitigate overfitting. Compared with previously proposed methods, SemiCCA is computationally efficient and intuitive to operate: it smoothly bridges the generalized eigenvalue problems of CCA and principal component analysis (PCA), so its solution can be computed efficiently by solving a single generalized eigenvalue problem, just as in the original CCA.
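The abstract states only that SemiCCA bridges the generalized eigenvalue problems of CCA and PCA. A minimal NumPy sketch of one such bridge, assumed here rather than taken from the paper, mixes the two problems with a weight `beta`: the paired samples enter the CCA terms, and all samples (paired plus unpaired) enter the PCA covariances. With `beta = 1` it reduces to CCA, and with `beta = 0` to PCA on each view. The function name `semicca`, the parameters `beta`, `k`, and `reg`, and the exact weighting are illustrative assumptions.

```python
import numpy as np


def semicca(X_paired, Y_paired, X_extra, Y_extra, beta=0.5, k=1, reg=1e-6):
    """Sketch of a semi-supervised CCA (assumed formulation, not the paper's code).

    Solves B w = lam C w with
      B = beta * [[0, Cxy], [Cyx, 0]] + (1 - beta) * [[Sxx, 0], [0, Syy]]
      C = beta * [[Cxx, 0], [0, Cyy]] + (1 - beta) * I
    where C** are covariances of the paired samples and S** are covariances
    of all samples (paired + unpaired). beta=1 gives CCA, beta=0 gives PCA per view.
    """
    n, dx = X_paired.shape
    dy = Y_paired.shape[1]
    Xp = X_paired - X_paired.mean(axis=0)
    Yp = Y_paired - Y_paired.mean(axis=0)
    Cxx = Xp.T @ Xp / n + reg * np.eye(dx)  # small ridge keeps C positive definite
    Cyy = Yp.T @ Yp / n + reg * np.eye(dy)
    Cxy = Xp.T @ Yp / n

    # Covariances over all available samples for the PCA term
    Xa = np.vstack([X_paired, X_extra])
    Ya = np.vstack([Y_paired, Y_extra])
    Xa = Xa - Xa.mean(axis=0)
    Ya = Ya - Ya.mean(axis=0)
    Sxx = Xa.T @ Xa / len(Xa)
    Syy = Ya.T @ Ya / len(Ya)

    Zxx, Zyy, Zxy = np.zeros((dx, dx)), np.zeros((dy, dy)), np.zeros((dx, dy))
    B = (beta * np.block([[Zxx, Cxy], [Cxy.T, Zyy]])
         + (1 - beta) * np.block([[Sxx, Zxy], [Zxy.T, Syy]]))
    C = (beta * np.block([[Cxx, Zxy], [Zxy.T, Cyy]])
         + (1 - beta) * np.eye(dx + dy))

    # Reduce the generalized problem to a standard symmetric one via C = L L^T:
    # (L^-1 B L^-T) u = lam u, with w = L^-T u.
    L = np.linalg.cholesky(C)
    Linv = np.linalg.inv(L)
    M = Linv @ B @ Linv.T
    M = (M + M.T) / 2  # remove round-off asymmetry
    vals, U = np.linalg.eigh(M)
    top = np.argsort(vals)[::-1][:k]
    W = Linv.T @ U[:, top]
    return W[:dx], W[dx:], vals[top]
```

In this sketch, `beta` trades off the two problems: values near 1 rely on the paired data alone, while values near 0 lean on the variance structure of all samples, which is where the unpaired data can help when pairs are scarce. Either way, only one eigenvalue problem of size `dx + dy` is solved.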