Environment and speaker related emotion recognition in conversations
Z Zhong, S Yang, G Becigneul - The 2nd International Conference on Computing and Data Science, 2021 - dl.acm.org
ERC (Emotion Recognition in Conversations) is the basis for computers to understand speakers' utterances and perform downstream tasks, such as chatting or playing music matched to the speaker's emotion. However, most approaches either rely on general-purpose language models, which fail to exploit crucial information about the environment and the speakers, or resort to complex entanglements of neural network architectures, resulting in less stable training procedures and slower inference. To bridge this gap, we propose a simple and efficient plug-and-play solution based on segment embeddings, leveraging architectural characteristics of recent causal transformers. We obtained near state-of-the-art results in ERC on MELD at a fraction of the usual inference time simply by incorporating information about the speakers and their environments, obtained via LDA (Latent Dirichlet Allocation), into DialoGPT, thus without using any additional labels. Empirically, our method yields an F1 score of 63.4, against the current state of the art of 65.2, while being 1.5x faster at inference time and leveraging a completely different information edge.
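The segment-embedding mechanism the abstract describes can be sketched as follows: each input token's vector is the sum of its token embedding and a segment embedding indexed by speaker/environment information (here, a speaker's dominant LDA topic). A minimal numpy sketch, assuming topic assignments are already computed; the array sizes and the `embed` helper are illustrative assumptions, not the authors' actual code:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, n_segments, d = 100, 4, 8

# Learned lookup tables in the real model; random here for illustration.
token_emb = rng.normal(size=(vocab_size, d))
segment_emb = rng.normal(size=(n_segments, d))  # one vector per LDA-derived speaker/environment topic

def embed(token_ids, segment_ids):
    # Input representation = token embedding + segment embedding, added
    # element-wise per position, before feeding the causal transformer.
    return token_emb[token_ids] + segment_emb[segment_ids]

utterance = np.array([5, 17, 42])              # token ids of one utterance (hypothetical)
speaker_topic = 2                              # dominant LDA topic for this speaker (assumed precomputed)
x = embed(utterance, np.full(len(utterance), speaker_topic))
```

Because the extra information enters only through an additive embedding lookup, no additional labels or architectural changes are needed, which is consistent with the "plug-and-play" framing and the reported inference-time advantage.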