Ryosuke Yamanishi, Hoshu Takemoto, Yoko Nishihara, Mitsuo Yoshida, Tomoko Ohsuga, Keizo Oyama: Applying Existing Datasets as a Pseudo Corpus for Sentiment Representation on Social Media, Proc. of the 26th International Conference on Knowledge Based and Intelligent Information and Engineering Systems, September 2022

This paper proposes a method for representing the sentiment characteristics of opinions on social media. The method uses existing datasets as a pseudo corpus and requires no additional annotation. The spread of social media lets people easily share their opinions on the Web and communicate with each other. As the amount of data on social media grows, so does the demand for analyzing it, e.g., through text classification and sentiment analysis. Existing text classification based on supervised machine learning usually requires annotated data. However, the criteria behind annotation labels can reasonably be expected to differ by period, culture, and individual sensibility. Effective classification under such varying criteria would need a separate set of annotations for each criterion, which costs considerable time and human effort. In the proposed method, a pseudo corpus is composed of multiple existing datasets with different characteristics. A classification model over these datasets is obtained by learning from the pseudo corpus. The sentiment of an input text whose domain differs from that of the learned datasets is then represented as a likelihood distribution over the datasets in the pseudo corpus. This paper discusses the potential and limitations of this idea through experiments.
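A minimal sketch of the pseudo-corpus idea follows. It assumes that each dataset's name acts as a class label, so no sentiment annotation is needed. A classifier is trained to tell the datasets apart, and an out-of-domain text is then described by its normalized distribution over those datasets. The abstract does not name the classifier, so this sketch uses multinomial Naive Bayes with add-one smoothing as a stand-in. The dataset names and sentences in `PSEUDO_CORPUS` are hypothetical toy data, and the whitespace tokenizer is a simplifying assumption. None of this is the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical toy "datasets". In the proposed method these would be existing
# corpora with different characteristics. Labels are dataset names, not sentiment tags.
PSEUDO_CORPUS = {
    "movie_reviews": ["a wonderful moving film", "the plot was dull and slow"],
    "product_reviews": ["battery died fast", "great value cheap price"],
    "news": ["the minister announced a new policy", "markets fell today"],
}

def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()

def train(corpus):
    """Fit a multinomial Naive Bayes model whose classes are the datasets."""
    vocab, counts, totals, priors = set(), {}, {}, {}
    n_docs = sum(len(docs) for docs in corpus.values())
    for name, docs in corpus.items():
        c = Counter(tok for doc in docs for tok in tokenize(doc))
        counts[name] = c
        totals[name] = sum(c.values())
        priors[name] = len(docs) / n_docs
        vocab.update(c)
    return {"vocab": vocab, "counts": counts, "totals": totals, "priors": priors}

def likelihood_distribution(model, text):
    """Return a normalized distribution over the datasets for an input text."""
    v = len(model["vocab"])
    log_scores = {}
    for name, prior in model["priors"].items():
        score = math.log(prior)
        for tok in tokenize(text):
            # Add-one smoothing so an unseen word does not zero out a dataset.
            p = (model["counts"][name][tok] + 1) / (model["totals"][name] + v)
            score += math.log(p)
        log_scores[name] = score
    # Subtract the max log score before exponentiating for numerical stability,
    # then normalize so the values sum to 1.
    m = max(log_scores.values())
    exp_scores = {k: math.exp(s - m) for k, s in log_scores.items()}
    z = sum(exp_scores.values())
    return {k: e / z for k, e in exp_scores.items()}

if __name__ == "__main__":
    model = train(PSEUDO_CORPUS)
    # A social-media-style post from a domain not in the pseudo corpus.
    dist = likelihood_distribution(model, "what a wonderful day so great")
    for name, p in sorted(dist.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {p:.3f}")
```

The output vector, rather than a single predicted label, serves as the text's representation. This mirrors the abstract's point that the datasets act as reference points in place of hand-annotated sentiment labels.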