Jiaming Han

be simple, be happy

Biography

I am a PhD student at MMLab@CUHK, advised by Prof. Xiangyu Yue. My recent research focuses on efficient and unified multimodal LLMs, such as LLaMA-Adapter, OneLLM and Tar. I received my Master's and Bachelor's degrees from Wuhan University and Central South University, respectively. I have interned at Bytedance Seed, Shanghai AI Lab and Tencent YouTu Lab.

News

  • 09/2025: Tar is accepted by NeurIPS 2025.
  • 08/2025: Reflective Planning is accepted by CoRL 2025.
  • 02/2025: RAP is accepted by CVPR 2025.
  • 02/2024: OneLLM is accepted by CVPR 2024.
  • 01/2024: LLaMA-Adapter is accepted by ICLR 2024!
  • 12/2023: We release OneLLM which aligns eight modalities to language using a unified framework.
  • 09/2023: ImageBind-LLM is released on arXiv.
  • 05/2023: We release ImageBind-LLM: an LLM that connects Image, Video, Audio, Point Cloud and more! Check our demo.
  • 04/2023: We release the multi-modal instruction model LLaMA-Adapter V2. Check our demo at OpenGVLab.
  • 03/2023: We release the paper and code of LLaMA-Adapter.
  • 11/2022: One paper on Few-Shot Object Detection is accepted by AAAI 2023.
  • 03/2022: We release the paper and code of OpenDet.
  • 03/2022: One paper on Open-Set Object Detection is accepted by CVPR 2022.
  • 02/2022: Our works S2A-Net and ReDet are included in OpenMMLab's mmrotate.
  • 08/2021: Third-party implementations of S2A-Net in Jittor and PaddlePaddle are released.
  • 03/2021: We release the paper and code of ReDet.
  • 02/2021: One paper is accepted by CVPR 2021.

Selected Publications

Bridge: Growing Visual Generative Capacity for Pre-Trained MLLMs
Hanyu Wang*, Jiaming Han*, Ziyan Yang, Qi Zhao, Shanchuan Lin, Xiangyu Yue, Abhinav Shrivastava, Zhenheng Yang, Hao Chen
Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
Jiaming Han, Hao Chen, Yang Zhao, Hanyu Wang, Qi Zhao, Ziyan Yang, Hao He, Xiangyu Yue, Lu Jiang
CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms
Shilin Yan, Jiaming Han, Joey Tsai, Hongwei Xue, Rongyao Fang, Lingyi Hong, Ziyu Guo, Ray Zhang
Multimodal Long Video Modeling Based on Temporal Dynamic Context
Haoran Hao*, Jiaming Han*, Yiyuan Zhang, Xiangyu Yue
Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation
Yunhai Feng, Jiaming Han, Zhuoran Yang, Xiangyu Yue, Sergey Levine, Jianlan Luo
Retrieval-Augmented Personalization for Multimodal Large Language Models
Haoran Hao*, Jiaming Han*, Changsheng Li, Yu-Feng Li, Xiangyu Yue
OneLLM: One Framework to Align All Modalities with Language
Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang, Dahua Lin, Yu Qiao, Peng Gao, Xiangyu Yue
ImageBind-LLM: Multi-modality Instruction Tuning
Jiaming Han*, Renrui Zhang*, Wenqi Shao*, Peng Gao*, Peng Xu*, Han Xiao*, Kaipeng Zhang, Chris Liu, Song Wen, Ziyu Guo, Xudong Lu, Shuai Ren, Yafei Wen, Xiaoxin Chen, Xiangyu Yue, Hongsheng Li, Yu Qiao
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
Peng Gao*, Jiaming Han*, Renrui Zhang*, Ziyi Lin*, Shijie Geng, Aojun Zhou, Wei Zhang, Pan Lu, Conghui He, Xiangyu Yue, Hongsheng Li, Yu Qiao
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
Renrui Zhang*, Jiaming Han*, Aojun Zhou, Xiangfei Hu, Shilin Yan, Pan Lu, Hongsheng Li, Peng Gao, Yu Qiao
Few-Shot Object Detection via Variational Feature Aggregation
Jiaming Han, Yuqiang Ren, Jian Ding, Ke Yan, Gui-Song Xia
Expanding Low-Density Latent Regions for Open-Set Object Detection
Jiaming Han, Yuqiang Ren, Jian Ding, Xingjia Pan, Ke Yan, Gui-Song Xia
ReDet: A Rotation-Equivariant Detector for Aerial Object Detection
Jiaming Han, Jian Ding, Nan Xue, Gui-Song Xia
Align Deep Features for Oriented Object Detection
Jiaming Han, Jian Ding, Jie Li, Gui-Song Xia

Experience

06/2024 - Present
Bytedance Seed

Unified MLLM with Dr. Lu Jiang

10/2022 - 02/2024
Shanghai AI Lab

Multimodal LLM with Dr. Peng Gao

05/2021 - 05/2022
Tencent YouTu Lab

Object Detection with Dr. Yuqiang Ren

Education

09/2023 - Present
Ph.D. The Chinese University of Hong Kong
09/2019 - 06/2022
M.E. Wuhan University
09/2015 - 06/2019
B.E. Central South University

Activities

  • Reviewer of CVPR’21-24, ICCV’21-23, NeurIPS’23, ICLR’24, WACV’22, ECCV’22-24, AAAI’23-24
  • Reviewer of TPAMI, IJCV, TIP, ISPRS, TGRS, TNNLS