David Broneske


2025

AutoML Meets Hugging Face: Domain-Aware Pretrained Model Selection for Text Classification
Parisa Safikhani | David Broneske
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)

The effectiveness of embedding methods is crucial for optimizing text classification performance in Automated Machine Learning (AutoML). However, selecting the most suitable pre-trained model for a given task remains challenging. This study introduces the Corpus-Driven Domain Mapping (CDDM) pipeline, which utilizes a domain-annotated corpus of pre-fine-tuned models from the Hugging Face Model Hub to improve model selection. Integrating these models into AutoML systems significantly boosts classification performance across multiple datasets compared to baseline methods. Despite some domain recognition inaccuracies, results demonstrate CDDM’s potential to enhance model selection, streamline AutoML workflows, and reduce computational costs.

VerbCraft: Morphologically-Aware Armenian Text Generation Using LLMs in Low-Resource Settings
Hayastan Avetisyan | David Broneske
Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025)

Understanding and generating morphologically complex verb forms is a critical challenge in Natural Language Processing (NLP), particularly for low-resource languages like Armenian. Armenian’s verb morphology encodes multiple layers of grammatical information, such as tense, aspect, mood, voice, person, and number, requiring nuanced computational modeling. We introduce VerbCraft, a novel neural model that integrates explicit morphological classifiers into the mBART-50 architecture. VerbCraft achieves a BLEU score of 0.4899 on test data, compared to the baseline’s 0.9975, reflecting its focus on prioritizing morphological precision over fluency. With over 99% accuracy in aspect and voice predictions and robust performance on rare and irregular verb forms, VerbCraft addresses data scarcity through synthetic data generation with human-in-the-loop validation. Beyond Armenian, it offers a scalable framework for morphologically rich, low-resource languages, paving the way for linguistically informed NLP systems and advancing language preservation efforts.

2023

Large Language Models and Low-Resource Languages: An Examination of Armenian NLP
Hayastan Avetisyan | David Broneske
Findings of the Association for Computational Linguistics: IJCNLP-AACL 2023 (Findings)

2021

Identifying and Understanding Game-Framing in Online News: BERT and Fine-Grained Linguistic Features
Hayastan Avetisyan | David Broneske
Proceedings of the 4th International Conference on Natural Language and Speech Processing (ICNLSP 2021)
