Xuanchang Zhang

I am Xuanchang Zhang (章轩畅), a graduate of Shanghai Jiao Tong University and a member of the ACM Honors Class. My research focuses on improving large language models, with particular emphasis on model evaluation and alignment.

During my undergraduate studies, I worked on prompt evaluation and optimization at SJTU, advised by Prof. Zhuosheng Zhang and Prof. Hai Zhao; this work was accepted to EMNLP 2024. More recently, I studied format biases in reinforcement learning from human feedback (RLHF) at UIUC, advised by Prof. Tong Zhang; this work was accepted to ACL 2025.

News

Jun 25, 2025	Paper “From Lists to Emojis: How Format Bias Affects Model Alignment” has been accepted to ACL 2025!
Sep 20, 2024	Paper “GLaPE: Gold Label-agnostic Prompt Evaluation for Large Language Models” has been accepted to EMNLP 2024!

Publications

ACL

From Lists to Emojis: How Format Bias Affects Model Alignment

Xuanchang Zhang^*, Wei Xiong^*, Lichang Chen^*, and 3 more authors

In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul 2025

In this paper, we study format biases in reinforcement learning from human feedback (RLHF). We observe that many widely-used preference models, including human evaluators, GPT-4, and top-ranking models on the RewardBench benchmark, exhibit strong biases towards specific format patterns, such as lists, links, bold text, and emojis. Furthermore, large language models (LLMs) can exploit these biases to achieve higher rankings on popular benchmarks like AlpacaEval and LMSYS Chatbot Arena. One notable example is verbosity bias, where current preference models favor longer responses that appear more comprehensive, even when their quality is equal to or lower than shorter responses. However, format biases beyond verbosity remain largely underexplored. In this work, we extend the study of biases in preference learning beyond the commonly recognized length bias, offering a comprehensive analysis of a wider range of format biases. Additionally, we show that with a small amount of biased data (less than 1%), we can inject significant bias into the reward model. Moreover, these format biases can also be easily exploited by downstream alignment algorithms, such as *best-of-n sampling* and online iterative *DPO*, as it is usually easier to manipulate the format than to improve the quality of responses. Our findings emphasize the need to disentangle format and content both for designing alignment algorithms and evaluating models.

EMNLP

GLaPE: Gold Label-agnostic Prompt Evaluation for Large Language Models

Xuanchang Zhang^*, Zhuosheng Zhang, and Hai Zhao

In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Nov 2024

Despite the rapid progress of large language models (LLMs), their task performance remains sensitive to prompt design. Recent studies have explored leveraging the LLM itself as an optimizer to identify optimal prompts that maximize task accuracy. However, when evaluating prompts, such approaches heavily rely on elusive manually annotated gold labels to calculate task accuracy for each candidate prompt, which hinders its generality. To overcome the limitation, this work proposes GLaPE, a gold label-agnostic prompt evaluation method to alleviate dependence on gold labels. GLaPE is composed of two critical aspects: self-consistency evaluation of a single prompt and mutual-consistency refinement across multiple prompts. Experimental results on 8 widely-recognized reasoning tasks demonstrate that GLaPE can produce more effective prompts, achieving performance comparable to those derived from manually annotated gold labels. Analysis shows that GLaPE provides reliable evaluations aligned with accuracy, even in the absence of gold labels. Code is publicly available at **Anonymous**.