The ultimate goal of this research is to build a system that supports the appropriate understanding and learning of linguistic expressions. Native speakers use the same expression with different intended meanings depending on the context. As a preliminary step toward this goal, we aim to develop a language model that appropriately processes context-dependent linguistic expressions. Several key factors are involved in understanding a linguistic expression: the speaker’s emotion, the situation in which the utterance occurs, and the content of the utterance. Accordingly, this study trained a large language model (LLM) on a dataset that combines these three factors with the text of each utterance. The resulting model is expected to interpret linguistic expressions appropriately according to context. To capture the speaker’s emotion, we trained on the Multimodal EmotionLines Dataset (MELD), which includes video dialogues annotated with emotion labels and dialogue situations. In the future, we will apply this model to build a role-playing language learning support system that evaluates the appropriateness of a learner’s utterances in a given situation and generates contextualized responses.
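
The abstract describes combining the three contextual factors with the utterance text, but not the input format. As a minimal sketch, assuming a MELD-style record with hypothetical `utterance`, `emotion`, and `situation` fields, one way to serialize the combined factors into a single training example might look like the following; the template and field names are illustrative assumptions, not the study’s actual format.

```python
from dataclasses import dataclass

@dataclass
class DialogueTurn:
    # Hypothetical field names; MELD's actual schema differs.
    utterance: str   # text of the utterance
    emotion: str     # emotion label, e.g. "joy", "anger"
    situation: str   # short description of the dialogue situation

def to_training_text(turn: DialogueTurn) -> str:
    """Combine the three contextual factors with the utterance text
    into one training example (illustrative template only)."""
    return (
        f"Situation: {turn.situation}\n"
        f"Emotion: {turn.emotion}\n"
        f"Utterance: {turn.utterance}"
    )

if __name__ == "__main__":
    turn = DialogueTurn(
        utterance="Oh, that's just great.",
        emotion="anger",  # same words, different intent than literal praise
        situation="A colleague announces another last-minute deadline.",
    )
    print(to_training_text(turn))
```

Keeping the factors as labeled fields in a fixed order would let the model condition on situation and emotion before the utterance itself, which matches the paper’s premise that identical expressions receive different interpretations in different contexts.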