2025
The Impact of Visual Information in Chinese Characters: Evaluating Large Models’ Ability to Recognize and Utilize Radicals
Xiaofeng Wu | Karl Stratos | Wei Xu
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
The glyphic writing system of Chinese incorporates information-rich visual features in each character, such as radicals that provide hints about meaning or pronunciation. However, there has been no investigation into whether contemporary Large Language Models (LLMs) and Vision-Language Models (VLMs) can harness these sub-character features in Chinese through prompting. In this study, we establish a benchmark to evaluate LLMs’ and VLMs’ understanding of visual elements in Chinese characters, including radicals, composition structures, strokes, and stroke counts. Our results reveal that models surprisingly exhibit some, but still limited, knowledge of this visual information, regardless of whether images of characters are provided. To elicit models’ ability to use radicals, we further experiment with incorporating radicals into the prompts for Chinese language processing (CLP) tasks. We observe consistent improvement in Part-Of-Speech (POS) tagging when providing additional information about radicals, suggesting the potential to enhance CLP by integrating sub-character information.
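As a rough, hypothetical illustration of the prompting idea in this abstract, the Python sketch below appends radical hints to a POS-tagging prompt. The RADICALS table, the prompt wording, and the example sentence are illustrative assumptions, not the paper’s actual benchmark or prompts.

```python
# Toy character-to-radical table (illustrative only); a real system
# would draw on a complete resource such as the Kangxi radical inventory.
RADICALS = {"河": "氵", "湖": "氵", "说": "讠", "跑": "足"}

def radical_augmented_prompt(sentence: str) -> str:
    """Build a POS-tagging prompt that appends radical hints for any
    characters found in the toy radical table."""
    hints = ", ".join(
        f"{ch} has radical {RADICALS[ch]}" for ch in sentence if ch in RADICALS
    )
    return (
        "Tag each word in the following Chinese sentence with its part of speech.\n"
        f"Sentence: {sentence}\n"
        f"Radical hints: {hints or 'none'}"
    )

print(radical_augmented_prompt("他跑到湖边"))
```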
2016
Fast Gated Neural Domain Adaptation: Language Model as a Case Study
Jian Zhang | Xiaofeng Wu | Andy Way | Qun Liu
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Neural network training has been shown to be advantageous in many natural language processing applications, such as language modelling or machine translation. In this paper, we describe in detail a novel domain adaptation mechanism in neural network training. Instead of learning and adapting the neural network on millions of training sentences – which can be very time-consuming or even infeasible in some cases – we design a domain adaptation gating mechanism which can be used in recurrent neural networks and quickly learn the out-of-domain knowledge directly from the word vector representations with little speed overhead. In our experiments, we use the recurrent neural network language model (LM) as a case study. We show that the neural LM perplexity can be reduced by 7.395 and 12.011 using the proposed domain adaptation mechanism on the Penn Treebank and News data, respectively. Furthermore, we show that using the domain-adapted neural LM to re-rank the statistical machine translation n-best list on the French-to-English language pair can significantly improve translation quality.
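The gating idea can be pictured with a minimal sketch. The PyTorch module below mixes a frozen pre-trained embedding with a small trainable out-of-domain correction through a learned sigmoid gate; this is one plausible reading of the abstract, with invented dimensions, and the paper’s exact architecture may differ.

```python
import torch
import torch.nn as nn

class DomainGatedEmbedding(nn.Module):
    """Sketch of a domain-adaptation gate over word vectors: a learned
    sigmoid gate interpolates between a frozen in-domain embedding and
    a trainable out-of-domain correction."""

    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.base = nn.Embedding(vocab_size, dim)   # pre-trained, kept frozen
        self.base.weight.requires_grad = False
        self.delta = nn.Embedding(vocab_size, dim)  # out-of-domain adjustment
        self.gate = nn.Linear(dim, dim)             # per-dimension gate

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        e = self.base(tokens)
        g = torch.sigmoid(self.gate(e))             # gate values in (0, 1)
        return g * e + (1.0 - g) * self.delta(tokens)

emb = DomainGatedEmbedding(vocab_size=10_000, dim=128)
print(emb(torch.tensor([[1, 5, 42]])).shape)  # torch.Size([1, 3, 128])
```

Because only the gate and the correction embedding receive gradients, adaptation touches far fewer parameters than retraining the full network, matching the speed motivation stated in the abstract.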
ProphetMT: A Tree-based SMT-driven Controlled Language Authoring/Post-Editing Tool
Xiaofeng Wu | Jinhua Du | Qun Liu | Andy Way
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
This paper presents ProphetMT, a tree-based SMT-driven Controlled Language (CL) authoring and post-editing tool. ProphetMT employs the source-side rules in a translation model and provides them as auto-suggestions to users; in effect, users write in a Controlled Language that the computer understands. ProphetMT also allows users to easily attach structural information as they compose content. When a specific rule is selected, a partial translation is promptly generated on-the-fly with the help of the structural information. Our experiments on English-to-Chinese show that ProphetMT not only better regularises an author’s writing behaviour, but also significantly improves translation fluency, which is vital for reducing post-editing time. Additionally, once writing and translation are complete, ProphetMT provides an effective colour scheme that further improves post-editors’ productivity by explicitly highlighting the relations between source and target rules.
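A toy sketch of the auto-suggestion step described in this abstract: match the author’s partial input against the source sides of SMT rules and return the matching rules together with their target sides. The rule table and prefix-matching strategy are invented for illustration and are not ProphetMT’s actual rule format.

```python
# Invented (source_rule, target_rule) pairs standing in for an SMT rule table.
RULES = [
    ("the [X] of [Y]", "[Y] 的 [X]"),
    ("the impact of [X]", "[X] 的 影响"),
]

def suggest(prefix: str) -> list[tuple[str, str]]:
    """Return rules whose source side starts with the text typed so far."""
    return [(src, tgt) for src, tgt in RULES if src.startswith(prefix)]

print(suggest("the impact"))  # [('the impact of [X]', '[X] 的 影响')]
```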
2015
CASICT-DCU Participation in WMT2015 Metrics Task
Hui Yu | Qingsong Ma | Xiaofeng Wu | Qun Liu
Proceedings of the Tenth Workshop on Statistical Machine Translation
2014
RED: A Reference Dependency Based MT Evaluation Metric
Hui Yu | Xiaofeng Wu | Jun Xie | Wenbin Jiang | Qun Liu | Shouxun Lin
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers
The DCU-ICTCAS MT system at WMT 2014 on German-English Translation Task
Liangyou Li | Xiaofeng Wu | Santiago Cortés Vaíllo | Jun Xie | Andy Way | Qun Liu
Proceedings of the Ninth Workshop on Statistical Machine Translation
DCU-Lingo24 Participation in WMT 2014 Hindi-English Translation task
Xiaofeng Wu | Rejwanul Haque | Tsuyoshi Okita | Piyush Arora | Andy Way | Qun Liu
Proceedings of the Ninth Workshop on Statistical Machine Translation
RED, The DCU-CASICT Submission of Metrics Tasks
Xiaofeng Wu | Hui Yu | Qun Liu
Proceedings of the Ninth Workshop on Statistical Machine Translation
2013
The CNGL-DCU-Prompsit Translation Systems for WMT13
Raphael Rubino | Antonio Toral | Santiago Cortés Vaíllo | Jun Xie | Xiaofeng Wu | Stephen Doherty | Qun Liu
Proceedings of the Eighth Workshop on Statistical Machine Translation
DCU Participation in WMT2013 Metrics Task
Xiaofeng Wu | Hui Yu | Qun Liu
Proceedings of the Eighth Workshop on Statistical Machine Translation
2012
System Combination with Extra Alignment Information
Xiaofeng Wu | Tsuyoshi Okita | Josef van Genabith | Qun Liu
Proceedings of the Second Workshop on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT
2008
A New Approach to Automatic Document Summarization
Xiaofeng Wu | Chengqing Zong
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I