Chaoyou Fu

Researcher & Assistant Professor & PhD Supervisor
Nanjing University

Google Scholar  |  GitHub
Email:


Biography

Chaoyou Fu is a Researcher, Assistant Professor, and PhD Supervisor at the School of Intelligence Science and Technology, Nanjing University, and a recipient of the Young Elite Scientists Sponsorship Program of the China Association for Science and Technology (CAST). He received his PhD from the Institute of Automation, Chinese Academy of Sciences (CAS) in 2022. His research focuses on multimodal intelligence. His work has received over 8,500 citations on Google Scholar, with two first-authored papers cited over 1,000 times each and six first-authored papers cited over 100 times each, and his open-source projects have accumulated over 20,000 GitHub Stars. Representative work includes the VITA multimodal large model series (VITA-1.0/-1.5, Long-VITA, VITA-Audio), the MME multimodal evaluation benchmark series (MME, Video-MME, MME-RealWorld), and the Awesome-MLLM community. He serves as an Associate Editor of Pattern Recognition and IEEE T-BIOM, an Area Chair for ICLR and ICML, a member of the CSIG Youth Working Committee, and an Executive Committee member of the CCF-AI and CCF-CV technical committees. His honors include the Xiaomi Young Scholar Award for Scientific and Technological Innovation, Huawei Zijin Scholar, the Yunfan Award of the World Artificial Intelligence Conference (WAIC), the CAS President's Special Award, the IEEE Biometrics Council Best Doctoral Dissertation Award, the Beijing Outstanding Doctoral Dissertation Award, the CAS Outstanding Doctoral Dissertation Award, and CVPR 2023 Outstanding Reviewer.


Selected Publications

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
Chaoyou Fu [Project Leader], Haozhi Yuan, Yuhao Dong, Yi-Fan Zhang, et al.
arXiv 2026, Project

VITA-E: Natural Embodied Interaction with Concurrent Seeing, Hearing, Speaking, and Acting
Xiaoyu Liu, Chaoyou Fu [Corresponding Author], Chi Yan, Chu Wu, et al.
arXiv 2025, Project

VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
Zuwei Long, Yunhang Shen, Chaoyou Fu [Corresponding Author], Heting Gao, et al.
NeurIPS 2025, Code

Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy
Yunhang Shen, Chaoyou Fu [Corresponding Author], Shaoqi Dong, Xiong Wang, Yi-Fan Zhang, et al.
arXiv 2025, Code

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Chaoyou Fu [Project Leader], Haojia Lin, Xiong Wang, Yi-Fan Zhang, et al.
NeurIPS 2025 [Spotlight], Code [2k+ Stars 🌟]

VITA: Towards Open-Source Interactive Omni Multimodal LLM
Chaoyou Fu [Project Leader], Haojia Lin, Zuwei Long, Yunhang Shen, et al.
arXiv 2024, Project

MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs
Chaoyou Fu [Project Leader], Yi-Fan Zhang, Shukang Yin, Bo Li, et al.
arXiv 2024, Project

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Chaoyou Fu [Project Leader], Yuhan Dai, Yongdong Luo, Lei Li, Shuhuai Ren, Renrui Zhang, et al.
CVPR 2025, Project

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Chaoyou Fu [Project Leader], Peixian Chen, Yunhang Shen, Yulei Qin, Mengdan Zhang, Xu Lin, et al.
NeurIPS DB 2025 [Spotlight], Leaderboard [with 50+ MLLMs 🌟], Citation

A Survey on Multimodal Large Language Models
Shukang Yin, Chaoyou Fu [Project Leader], Sirui Zhao, Ke Li, Xing Sun, Tong Xu, and Enhong Chen
National Science Review 2024, Project [10k+ Stars 🌟], Citation

Woodpecker: Hallucination Correction for Multimodal Large Language Models
Shukang Yin, Chaoyou Fu [Project Leader], Sirui Zhao, Tong Xu, et al.
arXiv 2023, Code [500+ Stars 🌟]

APE: Aligning and Prompting Everything All at Once for Universal Visual Perception
Yunhang Shen, Chaoyou Fu, Peixian Chen, Mengdan Zhang, Ke Li, Xing Sun, Yunsheng Wu, et al.
CVPR 2024, Code [500+ Stars 🌟]

DVG-Face: Dual Variational Generation for Heterogeneous Face Recognition
Chaoyou Fu, Xiang Wu, Yibo Hu, Huaibo Huang, and Ran He
TPAMI 2022, Code

CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared Person Re-Identification
Chaoyou Fu, Yibo Hu, Xiang Wu, Hailin Shi, Tao Mei, and Ran He
ICCV 2021, Code

Academic Services

Honors and Awards