On Reference (In-)Determinacy in Natural Language Inference

Sihao Chen, Chaitanya Malaviya, Alex Fabrikant, Hagai Taitelbaum, Tal Schuster, Senaka Buthpitiya, Dan Roth


Abstract
We revisit the reference determinacy (RD) assumption in the task of natural language inference (NLI), i.e., the assumption that the premise and hypothesis refer to the same context when human raters annotate a label. While RD is a practical assumption for constructing a new NLI dataset, we observe that current NLI models, which are typically trained solely on hypothesis-premise pairs created with the RD assumption, fail in downstream applications such as fact verification, where the input premise and hypothesis may refer to different contexts. To highlight the impact of this phenomenon in real-world use cases, we introduce RefNLI, a diagnostic benchmark for identifying reference ambiguity in NLI examples. In RefNLI, the premise is retrieved from a knowledge source (i.e., Wikipedia) and does not necessarily refer to the same context as the hypothesis. With RefNLI, we demonstrate that finetuned NLI models and few-shot prompted LLMs both fail to recognize context mismatch, leading to >80% false contradiction and >50% entailment predictions. We discover that the existence of reference ambiguity in NLI examples can in part explain the inherent human disagreements in NLI, and we provide insight into how the RD assumption impacts the NLI dataset creation process.
Anthology ID:
2025.findings-naacl.450
Volume:
Findings of the Association for Computational Linguistics: NAACL 2025
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
8066–8078
URL:
https://aclanthology.org/2025.findings-naacl.450/
Cite (ACL):
Sihao Chen, Chaitanya Malaviya, Alex Fabrikant, Hagai Taitelbaum, Tal Schuster, Senaka Buthpitiya, and Dan Roth. 2025. On Reference (In-)Determinacy in Natural Language Inference. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 8066–8078, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
On Reference (In-)Determinacy in Natural Language Inference (Chen et al., Findings 2025)
PDF:
https://aclanthology.org/2025.findings-naacl.450.pdf