Propagation and pitfalls: Reasoning-based assessment of knowledge editing through counterfactual tasks
Current approaches to knowledge editing struggle to effectively propagate updates to interconnected facts. In this work, we delve into the barriers that hinder the appropriate propagation of updated knowledge within these models for accurate reasoning. To support our analysis, we introduce a novel reasoning-based benchmark -- ReCoE (Reasoning-based Counterfactual Editing dataset) -- which covers six common reasoning schemes in the real world. We conduct a thorough analysis of existing knowledge editing techniques, including input augmentation, finetuning, and locate-and-edit. We find that all model editing methods show notably low performance on this dataset, especially in certain reasoning schemes. Our analysis of the chain-of-thought generation of edited models further uncovers key reasons behind the inadequacy of existing knowledge editing methods from a reasoning standpoint, involving aspects of fact-wise editing, fact recall ability, and coherence in generation. We will make our benchmark publicly available.
arxiv.org