This study examined modality-specific processing in impression formation by comparing environmental-sound and text representations of identical urban environments. A total of 796 participants evaluated five urban scenes across fifteen dimensions, including Russell's valence-arousal model and the ISO 12913 soundscape scales. A MANOVA with modality (environmental sound: n = 455; text: n = 314) and scene (five locations) as factors revealed significant main effects and interactions. Modality main effects were largest for valence (ηp² = .130) and pleasantness (ηp² = .167), while scene effects peaked for the chaotic dimension (ηp² = .162). Valence-arousal correlations showed modality dependence: environmental sounds (r = .335) versus text. Modality × scene interactions remained small, and modality effects persisted after controlling for imagery vividness and confidence. These findings indicate that multimodal AI systems require modality-specific architectures that account for differential dimensional independence and integration.