Seamless Interaction: Dyadic Audiovisual Motion Modeling and Large-Scale Dataset
Technical Report
I am an AI Research Scientist at Meta, working on multimodal generative models and multimodal large language models. I obtained my PhD in AI at HKUST in 2025, advised by Prof. Qifeng Chen. I previously obtained my Bachelor's degree with highest honor from Computer Science program at Hongyi Honor College, Wuhan University (WHU) in China.
I was fortunate to have two Research Scientist Internships at Meta in 2025 and 2024, focusing on Social AI Agents and Multimodal Large Language Models (MLLMs). In Fall 2023, I visited Prof. Xiaolong Wang's group at UCSD. Before that, I was a Research Intern at the International Digital Economy Academy (IDEA), mentored by Dr. Yu Li, where I worked on digital humans and motion generation. Before joining HKUST, I was fortunate to work with Prof. Qi Dou, Prof. Junsong Yuan, and Prof. Tomizuka.
My recent research interests include:
Image and video generation, embodied motion generation, world models & world action models.
Pre-training and post-training of MLLMs, unified visual understanding and generation.
Self-improving AI Agent, AI Autoresearch Scientist Agent, Computer-Use Agent, Social Intelligent Agent, RL for Agents.
Developing realistic digital humans and humanoid robots with expressive physically-grounded motions, and embodied intelligence.
For a complete list, please visit my Google Scholar profile.
Technical Report
CVPR 2024 (Acceptance rate: 23.6%)
WACV 2023 (First round acceptance rate: 22.23%)
ACM Multimedia 2022 (Acceptance rate: 27.9%)
Medical Image Analysis 2021
Fall 2022, HKUST
Deep Learning in Computer Vision
Spring 2022, HKUST
Advanced Deep Learning Architectures
CVPR, ECCV, NeurIPS, Siggraph, AAAI, ACM Multimedia, and WACV
I have been playing the violin since 8, and I love singing also.
I love table tennis and served on the table tennis team of the CS department at WHU.