Entity-level Interaction via Heterogeneous Graph for Multimodal Named Entity Recognition

Abstract

Multimodal Named Entity Recognition (MNER) faces two specific challenges: 1) How to capture useful entity-related visual information; 2) How to alleviate the interference of visual noise. Previous works have gained progress by improving interacting mechanisms or seeking for better visual features. However, existing methods neglect the integrity of entity semantics and conduct cross-modal interaction at token-level, which cuts apart the semantics of entities and makes non-entity tokens easily interfered with by irrelevant visual noise. Thus in this paper, we propose an end-to-end heterogeneous Graph-based Entity-level Interacting model (GEI) for MNER. GEI first utilizes a span detection subtask to obtain entity representations, which serve as the bridge between two modalities.Then, the heterogeneous graph interacting network interacts entity with object nodes to capture entity-related visual information, and fuses it into only entity-associated tokens to rid non-entity tokens of the visual noise. Experiments on two widely used datasets demonstrate the effectiveness of our method.

Publication
EMNLP 2022 Findings
Guanting Dong
Guanting Dong
Postgraduate Student

Spoken Language Understading and related applications

Weiran Xu
Weiran Xu
Associate Professor, Master Supervisor, Ph.D Supervisor

Information Retrieval, Pattern Recognition, Machine Learning