
Jiaming Han

be simple, be happy

Biography

I am a PhD student at MMLab@CUHK, advised by Prof. Xiangyu Yue. My recent work focuses on efficient and unified multimodal LLMs, such as LLaMA-Adapter, OneLLM and Tar. I received my Master's and Bachelor's degrees from Wuhan University and Central South University, respectively. I have interned at Bytedance Seed, Tencent AI Lab, Shanghai AI Lab and Tencent YouTu Lab.

More about me: Email | Google Scholar | GitHub | Curriculum Vitae

News

  • 08/2025: Reflective Planning is accepted by CoRL 2025.
  • 02/2025: RAP is accepted by CVPR 2025.
  • 02/2024: OneLLM is accepted by CVPR 2024.
  • 01/2024: LLaMA-Adapter is accepted by ICLR 2024!
  • 12/2023: We release OneLLM, which aligns eight modalities to language using a unified framework.
  • 09/2023: ImageBind-LLM is released on arXiv.
  • 05/2023: We release ImageBind-LLM: an LLM that connects image, video, audio, point cloud and more! Check our demo.
  • 04/2023: We release the multi-modal instruction model LLaMA-Adapter V2. Check our demo at OpenGVLab.
  • 03/2023: We release the paper and code of LLaMA-Adapter.
  • 11/2022: One paper on Few-Shot Object Detection is accepted by AAAI 2023.
  • 03/2022: We release the paper and code of OpenDet.
  • 03/2022: One paper on Open-Set Object Detection is accepted by CVPR 2022.
  • 02/2022: Our works S2A-Net and ReDet are included in OpenMMLab’s mmrotate.
  • 08/2021: Third-party implementations of S2A-Net are available in Jittor and PaddlePaddle.
  • 03/2021: We release the paper and code of ReDet.
  • 02/2021: One paper is accepted by CVPR 2021.

Publications

  1. Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
    Jiaming Han, Hao Chen, Yang Zhao, Hanyu Wang, Qi Zhao, Ziyan Yang, Hao He, Xiangyu Yue, Lu Jiang
    Project Page | Paper | Code | Models | Demo
  2. CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms
    Shilin Yan, Jiaming Han, Joey Tsai, Hongwei Xue, Rongyao Fang, Lingyi Hong, Ziyu Guo, Ray Zhang
    Paper | Code
  3. Multimodal Long Video Modeling Based on Temporal Dynamic Context
    Haoran Hao*, Jiaming Han*, Yiyuan Zhang, Xiangyu Yue
    Project Page | Paper | Code
  4. Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation
    Yunhai Feng, Jiaming Han, Zhuoran Yang, Xiangyu Yue, Sergey Levine, Jianlan Luo
    CoRL 2025 | Project Page | Paper | Code
  5. Retrieval-Augmented Personalization for Multimodal Large Language Models
    Haoran Hao*, Jiaming Han*, Changsheng Li, Yu-Feng Li, Xiangyu Yue
    CVPR 2025 | Project Page | Paper | Code
  6. OneLLM: One Framework to Align All Modalities with Language
    Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang
    Dahua Lin, Yu Qiao, Peng Gao, Xiangyu Yue.
    CVPR 2024 | Project Page | Paper | Code | Demo
  7. ImageBind-LLM: Multi-modality Instruction Tuning
    Jiaming Han*, Renrui Zhang*, Wenqi Shao*, Peng Gao*, Peng Xu*, Han Xiao*
    Kaipeng Zhang, Chris Liu, Song Wen, Ziyu Guo, Xudong Lu, Shuai Ren, Yafei Wen
    Xiaoxin Chen, Xiangyu Yue, Hongsheng Li, Yu Qiao.
    Paper | Code | Demo
  8. LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
    Peng Gao*, Jiaming Han*, Renrui Zhang*, Ziyi Lin*, Shijie Geng, Aojun Zhou, Wei Zhang
    Pan Lu, Conghui He, Xiangyu Yue, Hongsheng Li, Yu Qiao.
    Paper | Code | Demo
  9. LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
    Renrui Zhang*, Jiaming Han*, Aojun Zhou, Xiangfei Hu, Shilin Yan, Pan Lu, Hongsheng Li, Peng Gao, Yu Qiao.
    ICLR 2024 | Paper | Code | Demo
  10. Few-Shot Object Detection via Variational Feature Aggregation
    Jiaming Han, Yuqiang Ren, Jian Ding, Ke Yan, Gui-Song Xia.
    AAAI 2023 | Paper | Code
  11. Expanding Low-Density Latent Regions for Open-Set Object Detection
    Jiaming Han, Yuqiang Ren, Jian Ding, Xingjia Pan, Ke Yan, Gui-Song Xia.
    CVPR 2022 | Paper | Code
  12. ReDet: A Rotation-Equivariant Detector for Aerial Object Detection
    Jiaming Han, Jian Ding, Nan Xue, Gui-Song Xia.
    CVPR 2021 | Project | Paper | Code | Poster | Slide
  13. Align Deep Features for Oriented Object Detection
    Jiaming Han, Jian Ding, Jie Li, Gui-Song Xia.
    TGRS 2021 | Paper | Code

Experience

  • Bytedance Seed 06/2024 - Present
    Unified MLLM with Dr. Lu Jiang
  • Shanghai AI Lab 10/2022 - 02/2024
    Multimodal LLM with Dr. Peng Gao
  • Tencent YouTu Lab 05/2021 - 05/2022
    Object Detection with Dr. Yuqiang Ren

Activities

  • Reviewer for CVPR’21-24, ICCV’21-23, NIPS’23, ICLR’24, WACV’22, ECCV’22-24, AAAI’23-24
  • Reviewer for TPAMI, IJCV, TIP, ISPRS, TGRS, TNNLS

Education

  • PhD. The Chinese University of Hong Kong. 09/2023 - Present
  • M.E. Wuhan University. 09/2019 - 06/2022
  • B.E. Central South University. 09/2015 - 06/2019