Yejin Cho


2025

pdf bib
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
Seungone Kim | Juyoung Suk | Ji Yong Cho | Shayne Longpre | Chaeeun Kim | Dongkeun Yoon | Guijin Son | Yejin Cho | Sheikh Shafayat | Jinheon Baek | Sue Hyun Park | Hyeonbin Hwang | Jinkyung Jo | Hyowon Cho | Haebin Shin | Seongyun Lee | Hanseok Oh | Noah Lee | Namgyu Ho | Se June Joo | Miyoung Ko | Yoonjoo Lee | Hyungjoo Chae | Jamin Shin | Joel Jang | Seonghyeon Ye | Bill Yuchen Lin | Sean Welleck | Graham Neubig | Moontae Lee | Kyungjae Lee | Minjoon Seo
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

As language models (LMs) become capable of handling a wide range of tasks, their evaluation is becoming as challenging as their development. Most generation benchmarks currently assess LMs using abstract evaluation criteria-like helpfulness and harmlessness-which often lack the flexibility and granularity of human assessment. Additionally, these benchmarks tend to focus disproportionately on specific capabilities such as instruction following, leading to coverage bias. To overcome these limitations, we introduce the BiGGen Bench, a principled generation benchmark designed to thoroughly evaluate nine distinct capabilities of LMs across 77 diverse tasks. A key feature of the BiGGen Bench is its use of instance-specific evaluation criteria, closely mirroring the nuanced discernment of human evaluation. We apply this benchmark to assess 100 frontier LMs using five evaluator LMs. Our code, data, and evaluation results are all publicly available at https://github.com/prometheus-eval/prometheus-eval.

2023

pdf bib
Disentangling Structure and Style: Political Bias Detection in News by Inducing Document Hierarchy
Jiwoo Hong | Yejin Cho | Jiyoung Han | Jaemin Jung | James Thorne
Findings of the Association for Computational Linguistics: EMNLP 2023

We address an important gap in detecting political bias in news articles. Previous works that perform document classification can be influenced by the writing style of each news outlet, leading to overfitting and limited generalizability. Our approach overcomes this limitation by considering both the sentence-level semantics and the document-level rhetorical structure, resulting in a more robust and style-agnostic approach to detecting political bias in news articles. We introduce a novel multi-head hierarchical attention model that effectively encodes the structure of long documents through a diverse ensemble of attention heads. While journalism follows a formalized rhetorical structure, the writing style may vary by news outlet. We demonstrate that our method overcomes this domain dependency and outperforms previous approaches for robustness and accuracy. Further analysis and human evaluation demonstrate the ability of our model to capture common discourse structures in journalism.

2020

pdf bib
Leveraging WordNet Paths for Neural Hypernym Prediction
Yejin Cho | Juan Diego Rodriguez | Yifan Gao | Katrin Erk
Proceedings of the 28th International Conference on Computational Linguistics

We formulate the problem of hypernym prediction as a sequence generation task, where the sequences are taxonomy paths in WordNet. Our experiments with encoder-decoder models show that training to generate taxonomy paths can improve the performance of direct hypernym prediction. As a simple but powerful model, the hypo2path model achieves state-of-the-art performance, outperforming the best benchmark by 4.11 points in hit-at-one (H@1).
OSZAR »