Chaoyou Fu

Researcher & Assistant Professor & PhD Supervisor
Nanjing University

Google Scholar  |  GitHub


Biography

I currently work at Nanjing University with Prof. Tieniu Tan and Prof. Caifeng Shan. Before that, from 2022 to 2024, I was a Senior Researcher at Tencent Youtu Lab, where I worked on both academic research and engineering deployment as a Technology & Project Leader. I obtained my Ph.D. from NLPR-CASIA in 2022 under the supervision of Prof. Ran He.


My current research focuses mainly on multimodal LLMs, LLMs, and biometrics.


We are looking for self-motivated PhD and Master's candidates! If you are interested, please feel free to contact me. I am also open to discussion and collaboration.

Selected Publications

MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
Yi-Fan Zhang, Tao Yu, Haochen Tian, Chaoyou Fu [Corresponding Author], Peiyan Li, et al.
arXiv 2025, Code

Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy
Yunhang Shen, Chaoyou Fu [Corresponding Author], Shaoqi Dong, Xiong Wang, Yi-Fan Zhang, et al.
arXiv 2025, Code

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Chaoyou Fu [Project Leader], Haojia Lin, Xiong Wang, Yi-Fan Zhang, et al.
arXiv 2025, Code [2k+ Stars 🌟]

MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs
Chaoyou Fu [Project Leader], Yi-Fan Zhang, Shukang Yin, Bo Li, et al.
arXiv 2024, Project

VITA: Towards Open-Source Interactive Omni Multimodal LLM
Chaoyou Fu [Project Leader], Haojia Lin, Zuwei Long, Yunhang Shen, et al.
arXiv 2024, Project

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Chaoyou Fu [Project Leader], Yuhan Dai, Yongdong Luo, Lei Li, Shuhuai Ren, Renrui Zhang, et al.
CVPR 2025, Project

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Chaoyou Fu [Project Leader], Peixian Chen, Yunhang Shen, Yulei Qin, Mengdan Zhang, Xu Lin, et al.
arXiv 2023, Leaderboard [with 50+ MLLMs 🌟], Citation

A Survey on Multimodal Large Language Models
Shukang Yin, Chaoyou Fu [Project Leader], Sirui Zhao, Ke Li, Xing Sun, Tong Xu, and Enhong Chen
National Science Review 2024, Project [10k+ Stars 🌟], Citation

Woodpecker: Hallucination Correction for Multimodal Large Language Models
Shukang Yin, Chaoyou Fu [Project Leader], Sirui Zhao, Tong Xu, et al.
arXiv 2023, Code [500+ Stars 🌟]

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise
Chaoyou Fu [Project Leader], Renrui Zhang, Zihan Wang, Yubo Huang, Zhengye Zhang, et al.
arXiv 2023, Project

APE: Aligning and Prompting Everything All at Once for Universal Visual Perception
Yunhang Shen, Chaoyou Fu, Peixian Chen, Mengdan Zhang, Ke Li, Xing Sun, Yunsheng Wu, et al.
CVPR 2024, Code [500+ Stars 🌟]

DVG-Face: Dual Variational Generation for Heterogeneous Face Recognition
Chaoyou Fu, Xiang Wu, Yibo Hu, Huaibo Huang, and Ran He
TPAMI 2022, Code

Towards Lightweight Pixel-Wise Hallucination for Heterogeneous Face Recognition
Chaoyou Fu, Xiaoqiang Zhou, Weizan He, and Ran He
TPAMI 2023

High Fidelity Face Manipulation with Extreme Poses and Expressions
Chaoyou Fu, Yibo Hu, Xiang Wu, Guoli Wang, Qian Zhang, and Ran He
TIFS 2021, Dataset

CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared Person Re-Identification
Chaoyou Fu, Yibo Hu, Xiang Wu, Hailin Shi, Tao Mei, and Ran He
ICCV 2021, Code

Academic Services

Honors and Awards