Benchmarking Cross-Domain Audio-Visual Deception Detection
Automated deception detection is crucial for assisting humans in accurately assessing truthfulness and identifying deceptive behavior. Conventional contact-based techniques, like polygraph devices, rely on physiological signals to determine the authenticity of an individual's statements. Nevertheless, recent developments in automated deception detection have demonstrated that multimodal features derived from both audio and video modalities may outperform human observers on publicly available datasets. Despite these positive findings, the generalizability of existing audio-visual deception detection approaches across different scenarios remains largely unexplored. To close this gap, we present the first cross-domain audio-visual deception detection benchmark, which enables us to assess how well these methods generalize to real-world scenarios. We benchmark widely adopted audio and visual features across different architectures, comparing single-to-single and multi-to-single domain generalization performance. To further examine the impact of training on data from multiple source domains, we investigate three domain sampling strategies (domain-simultaneous, domain-alternating, and domain-by-domain) for multi-to-single domain generalization evaluation. Furthermore, we propose the Attention-Mixer fusion method to improve performance. We believe this new cross-domain benchmark will facilitate future research in audio-visual deception detection. Protocols and source code are available at \href{https://github.com/Redaimao/cross_domain_DD}{https://github.com/Redaimao/cross\_domain\_DD}.
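The three multi-source domain sampling strategies named above can be illustrated with minimal batch generators. This is a hedged sketch only: the function names, the dict-of-lists data layout, and the round-robin details are illustrative assumptions, not the authors' implementation.

```python
import itertools
import random

def domain_simultaneous(domains, batch_size):
    """Each batch mixes samples drawn from all source domains at once."""
    pool = [(name, x) for name, samples in domains.items() for x in samples]
    random.shuffle(pool)  # domains are interleaved within every batch
    for i in range(0, len(pool), batch_size):
        yield pool[i:i + batch_size]

def domain_alternating(domains, batch_size, n_batches):
    """Each batch comes from a single domain; batches cycle round-robin."""
    iters = {name: itertools.cycle(samples) for name, samples in domains.items()}
    order = itertools.cycle(domains)  # iterates domain names in insertion order
    for name in itertools.islice(order, n_batches):
        yield [(name, next(iters[name])) for _ in range(batch_size)]

def domain_by_domain(domains, batch_size):
    """Exhaust one source domain completely before moving to the next."""
    for name, samples in domains.items():
        for i in range(0, len(samples), batch_size):
            yield [(name, x) for x in samples[i:i + batch_size]]
```

For example, with two source domains, `domain_by_domain` emits all batches from the first domain before any from the second, whereas `domain_alternating` switches domains every batch.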