Chaoyou Fu

Researcher & Assistant Professor & PhD Supervisor
Nanjing University

Google Scholar  |  GitHub


Biography

I currently work at Nanjing University with Prof. Tieniu Tan and Prof. Caifeng Shan, where I lead NJU-MiG (Multimodal intelligence Group). My research focuses mainly on multimodal LLMs and LLMs. From 2022 to 2024, I was a Senior Researcher at Tencent Youtu Lab, where I worked on both academic research and product deployment as a Technology & Project Leader. I obtained my Ph.D. degree from NLPR-CASIA in 2022, under the supervision of Prof. Tieniu Tan and Prof. Ran He.


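Chaoyou Fu is a Researcher, Assistant Professor, and PhD Supervisor at the School of Intelligence Science and Technology, Nanjing University, and was selected for the Young Elite Scientists Sponsorship Program of the China Association for Science and Technology. He received his Ph.D. in 2022 from the Institute of Automation, Chinese Academy of Sciences, in the group of Prof. Tieniu Tan and Prof. Ran He. His research focuses on multimodal intelligence. His work has received more than 5,600 citations on Google Scholar, including a first-author paper with more than 1,000 citations, and his open-source projects have accumulated more than 20,000 GitHub Stars. Representative work includes the VITA series of multimodal large models (VITA-1.0/-1.5, Long-VITA, VITA-Audio, VITA-VLA, VITA-E), the MME series of multimodal evaluation benchmarks (MME, Video-MME, MME-RealWorld), and the Awesome-MLLM community. He serves as an editorial board member of the journal Pattern Recognition, an Area Chair of ICLR, a member of the CSIG Youth Working Committee, and an executive member of the CCF-AI and CCF-CV technical committees. His honors include the Chinese Academy of Sciences President's Special Award, the IEEE Biometrics Council Best Doctoral Dissertation Award, the World Artificial Intelligence Conference Yunfan Award, the Xiaomi Young Scholar - Technology Innovation Award, the Beijing Outstanding Doctoral Dissertation Award, the Chinese Academy of Sciences Outstanding Doctoral Dissertation Award, and CVPR 2023 Outstanding Reviewer.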


We are looking for self-motivated PhD and Master's candidates! If you are interested, please feel free to contact me. I am also open to discussion and collaboration.


Selected Publications

VITA-E: Natural Embodied Interaction with Concurrent Seeing, Hearing, Speaking, and Acting
Xiaoyu Liu, Chaoyou Fu [Corresponding Author], Chi Yan, Chu Wu, et al.
arXiv 2025, Project

VITA-VLA: Efficiently Teaching Vision-Language Models to Act via Action Expert Distillation
Shaoqi Dong, Chaoyou Fu [Corresponding Author], Haihan Gao, Yi-Fan Zhang, et al.
arXiv 2025, Project

VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
Zuwei Long, Yunhang Shen, Chaoyou Fu [Corresponding Author], Heting Gao, et al.
NeurIPS 2025, Code

Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy
Yunhang Shen, Chaoyou Fu [Corresponding Author], Shaoqi Dong, Xiong Wang, Yi-Fan Zhang, et al.
arXiv 2025, Code

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Chaoyou Fu [Project Leader], Haojia Lin, Xiong Wang, Yi-Fan Zhang, et al.
NeurIPS 2025 [Spotlight], Code [2k+ Stars 🌟]

VITA: Towards Open-Source Interactive Omni Multimodal LLM
Chaoyou Fu [Project Leader], Haojia Lin, Zuwei Long, Yunhang Shen, et al.
arXiv 2024, Project

MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs
Chaoyou Fu [Project Leader], Yi-Fan Zhang, Shukang Yin, Bo Li, et al.
arXiv 2024, Project

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Chaoyou Fu [Project Leader], Yuhan Dai, Yongdong Luo, Lei Li, Shuhuai Ren, Renrui Zhang, et al.
CVPR 2025, Project

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Chaoyou Fu [Project Leader], Peixian Chen, Yunhang Shen, Yulei Qin, Mengdan Zhang, Xu Lin, et al.
NeurIPS DB 2025 [Spotlight], Leaderboard [with 50+ MLLMs 🌟], Citation

A Survey on Multimodal Large Language Models
Shukang Yin, Chaoyou Fu [Project Leader], Sirui Zhao, Ke Li, Xing Sun, Tong Xu, and Enhong Chen
National Science Review 2024, Project [10k+ Stars 🌟], Citation

Woodpecker: Hallucination Correction for Multimodal Large Language Models
Shukang Yin, Chaoyou Fu [Project Leader], Sirui Zhao, Tong Xu, et al.
arXiv 2023, Code [500+ Stars 🌟]

APE: Aligning and Prompting Everything All at Once for Universal Visual Perception
Yunhang Shen, Chaoyou Fu, Peixian Chen, Mengdan Zhang, Ke Li, Xing Sun, Yunsheng Wu, et al.
CVPR 2024, Code [500+ Stars 🌟]

DVG-Face: Dual Variational Generation for Heterogeneous Face Recognition
Chaoyou Fu, Xiang Wu, Yibo Hu, Huaibo Huang, and Ran He
TPAMI 2022, Code

Towards Lightweight Pixel-Wise Hallucination for Heterogeneous Face Recognition
Chaoyou Fu, Xiaoqiang Zhou, Weizan He, and Ran He
TPAMI 2023

High Fidelity Face Manipulation with Extreme Poses and Expressions
Chaoyou Fu, Yibo Hu, Xiang Wu, Guoli Wang, Qian Zhang, and Ran He
TIFS 2021, Dataset

CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared Person Re-Identification
Chaoyou Fu, Yibo Hu, Xiang Wu, Hailin Shi, Tao Mei, and Ran He
ICCV 2021, Code

Academic Services

Honors and Awards