Yuyang Jiang

江宇阳

MS student
Department of Statistics, University of Chicago
Email: yuyang2001@uchicago.edu

I'm currently a research intern (2026) at Vector Institute, focusing on agent safety audits. I earned my B.Econ. in Economics and Mathematics from the China Economics and Management Academy at CUFE (Beijing, China, 2023; Playground), where I studied how to model human behavior at both the micro and macro levels. To deepen my expertise in data-driven methods, I then joined the Department of Statistics at UChicago for an M.S. in Statistics (Chicago, IL, 2025; Playground), where I have been strengthening my theoretical foundations in machine learning and deep learning.

In my spare time, I enjoy yoga 🧘‍♀️, tennis 🎾, arts 🖼️ (painting, visual symbolism, exhibitions, etc.), and traveling 🗺️.

Research Interest: AI Evaluation

Robust evaluation is essential for guiding the training of reliable systems. It not only measures system performance, but also seeks to understand system behavior and, most importantly, continuously refine alignment rubrics so they reflect rational human intentions and can be distilled into the evaluation process.

Static Evaluation: Design granular yet scalable metrics that capture richer task-specific properties and better match real design goals.
Human-in-the-Loop Evaluation: (1) Build representative feedback loops under limited budgets; (2) monitor bidirectional risks in human-AI interaction (e.g., humans: over-reliance, manipulation; AI: over-alignment, sycophancy) and develop collaboration paradigms that preserve rationality on both sides.
Interactive (Agentic) Evaluation: (1) Study the strengths and limits of foundational structures that emerge in agentic systems; (2) test the robustness of cooperative behaviors under adversarial conditions.

I'm especially interested in applying these ideas to AI safety and healthcare.

🤖 AI conference | 🩻 Healthcare conference | *Equal first co-authors; †Equal second co-authors.

Publications

Resources and Evaluation

CLEAR: A Clinically Grounded Tabular Framework for Radiology Report Evaluation
Yuyang Jiang, Chacha Chen, Shengyuan Wang, Feng Li, Zecong Tang, Benjamin M. Mervak, Lydia Chelala, Christopher M. Straus, Reve Chahine, Samuel G. Armato III*, Chenhao Tan*.
EMNLP 2025 🤖, AAR 2026 (Poster) 🩻. [Paper] [Poster] [Slides] [Dataset submitted to PhysioNet (under review)] [Code]

GPT-4V Cannot Generate Radiology Reports Yet
Yuyang Jiang*, Chacha Chen*, Dang Nguyen, Benjamin M. Mervak, Chenhao Tan.
ML4H 2024 (Poster) 🩻, NAACL 2025 🤖. [Paper] [Poster] [Slides] [Code]

Safety and Alignment

Beyond One-Way Influence: Bidirectional Opinion Dynamics in Multi-Turn Human-LLM Interactions
Yuyang Jiang, Longjie Guo†, Yuchen Wu†, Aylin Caliskan, Tanu Mitra and Hua Shen.
In Revision, CHI 2026 BiAlign Workshop 🤖. [Preprint]

Collaborative Disagreement Resolution for Scalable Oversight
Yuyang Jiang*, Chacha Chen*, Teng Wu†, Liwen Sun†, Han Liu, Shi Feng and Chenhao Tan.
Submitted. [Manuscript] [Preprint coming in one week ❤️‍🔥]

Academic Service

Presentations: AAR 2026 🩻 (Poster), EMNLP 2025 🤖 (Poster), NAACL 2025 🤖 (Poster), ML4H 2024 🩻 (Poster), TTIC Multimodal AI Workshop 2024 🤖 (Lightning Talk)

Reviewer: CHIL 2026 🩻, Sage Digital Health 🩻, CHI 2026 🤖

Teaching Experience

BUSN 32200: Artificial Intelligence MBA Course (Course Design Team Member) | Winter 2025 | Instructor: Dacheng Xiu

BUSN 32810: Artificial Intelligence EMBA Course (Course Design Team Member) | Summer 2024 | Instructor: Dacheng Xiu

BUSN 20800: Big Data Undergraduate Course | Winter 2024 | Instructor: Dacheng Xiu