
Jiaming Han

be simple, be happy

Biography

I am a PhD student at MMLab@CUHK, advised by Prof. Xiangyu Yue. My recent work focuses on efficient and unified multimodal LLMs, such as LLaMA-Adapter, OneLLM and Tar. I received my Master's and Bachelor's degrees from Wuhan University and Central South University, respectively. I have interned at Bytedance Seed, Tencent AI Lab, Shanghai AI Lab and Tencent YouTu Lab.

More about me: Email | Google Scholar | Github | Curriculum Vitae

News

  • 02/2025: RAP is accepted by CVPR 2025.
  • 02/2024: OneLLM is accepted by CVPR 2024.
  • 01/2024: LLaMA-Adapter is accepted by ICLR 2024!
  • 12/2023: We release OneLLM which aligns eight modalities to language using a unified framework.
  • 09/2023: ImageBind-LLM is released on arXiv.
  • 05/2023: We release ImageBind-LLM: an LLM that connects image, video, audio, point cloud and more! Check out our demo.
  • 04/2023: We release the multi-modal instruction model LLaMA-Adapter V2. Check out our demo at OpenGVLab.
  • 03/2023: We release the paper and code of LLaMA-Adapter.
  • 11/2022: One paper on Few-Shot Object Detection is accepted by AAAI 2023.
  • 03/2022: We release the paper and code of OpenDet.
  • 03/2022: One paper on Open-Set Object Detection is accepted by CVPR 2022.
  • 02/2022: Our works S2A-Net and ReDet are included in OpenMMLab’s mmrotate.
  • 08/2021: Third-party implementation of S2A-Net with Jittor and PaddlePaddle.
  • 05/2021: Research intern in computer vision at Tencent YouTu Lab.
  • 03/2021: We release the paper and code of ReDet.
  • 02/2021: One paper is accepted by CVPR 2021.

Publications

  1. Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
    Jiaming Han, Hao Chen, Yang Zhao, Hanyu Wang, Qi Zhao, Ziyan Yang, Hao He, Xiangyu Yue, Lu Jiang
    Project Page | Paper | Code | Models | Demo

  2. CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms
    Shilin Yan, Jiaming Han, Joey Tsai, Hongwei Xue, Rongyao Fang, Lingyi Hong, Ziyu Guo, Ray Zhang
    Paper | Code

  3. Multimodal Long Video Modeling Based on Temporal Dynamic Context
    Haoran Hao*, Jiaming Han*, Yiyuan Zhang, Xiangyu Yue
    Project Page | Paper | Code

  4. Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation
    Yunhai Feng, Jiaming Han, Zhuoran Yang, Xiangyu Yue, Sergey Levine, Jianlan Luo
    Project Page | Paper | Code

  5. Retrieval-Augmented Personalization for Multimodal Large Language Models
    Haoran Hao*, Jiaming Han*, Changsheng Li, Yu-Feng Li, Xiangyu Yue
    Project Page | Paper | Code

  6. OneLLM: One Framework to Align All Modalities with Language
    Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang
    Dahua Lin, Yu Qiao, Peng Gao, Xiangyu Yue. CVPR 2024.
    Project Page | Paper | Code | Demo | Star: 450

  7. ImageBind-LLM: Multi-modality Instruction Tuning
    Jiaming Han*, Renrui Zhang*, Wenqi Shao*, Peng Gao*, Peng Xu*, Han Xiao*
    Kaipeng Zhang, Chris Liu, Song Wen, Ziyu Guo, Xudong Lu, Shuai Ren, Yafei Wen
    Xiaoxin Chen, Xiangyu Yue, Hongsheng Li, Yu Qiao. arXiv preprint arXiv:2309.03905
    Paper | Code | Demo | Star: 5.3K | Cite: 13

  8. LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
    Peng Gao*, Jiaming Han*, Renrui Zhang*, Ziyi Lin*, Shijie Geng, Aojun Zhou, Wei Zhang
    Pan Lu, Conghui He, Xiangyu Yue, Hongsheng Li, Yu Qiao. arXiv preprint arXiv:2303.16199
    Paper | Code | Demo | Star: 5.3K | Cite: 144

  9. LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
    Renrui Zhang*, Jiaming Han*, Aojun Zhou, Xiangfei Hu, Shilin Yan, Pan Lu
    Hongsheng Li, Peng Gao, Yu Qiao. ICLR 2024.
    Paper | Code | Demo | Star: 5.3K | Cite: 191

  10. Few-Shot Object Detection via Variational Feature Aggregation
    Jiaming Han, Yuqiang Ren, Jian Ding, Ke Yan, Gui-Song Xia.
    Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI), 2023.
    Paper | Code | Star: 60 | Oral Presentation

  11. Expanding Low-Density Latent Regions for Open-Set Object Detection
    Jiaming Han, Yuqiang Ren, Jian Ding, Xingjia Pan, Ke Yan, Gui-Song Xia.
    Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2022.
    Paper | Code | Star: 90 | Cite: 49

  12. ReDet: A Rotation-Equivariant Detector for Aerial Object Detection
    Jiaming Han, Jian Ding, Nan Xue, Gui-Song Xia.
    Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2021.
    Project | Paper | Code | Poster | Slide | Star: 366 | Cite: 374

  13. Align Deep Features for Oriented Object Detection
    Jiaming Han, Jian Ding, Jie Li, Gui-Song Xia.
    IEEE Trans. on Geoscience and Remote Sensing (TGRS), 2021.
    Paper | Code | Star: 356 | Cite: 443 | Top journal in Remote Sensing

Experience

  • Bytedance Seed 06/2024 - Present
    Mentor: Dr. Lu Jiang
    Unified MLLM for Understanding and Generation

  • Shanghai AI Lab 10/2022 - 02/2024
    Research Intern in Computer Vision
    Mentor: Dr. Peng Gao
    Vision-Language Models

  • Tencent YouTu Lab 05/2021 - 05/2022
    Research Intern in Computer Vision
    Open-set/Few-shot object detection

Activities

  • Reviewer for CVPR’21-24, ICCV’21-23, NeurIPS’23, ICLR’24, WACV’22, ECCV’22-24, AAAI’23-24
  • Reviewer for TPAMI, IJCV, TIP, ISPRS, TGRS, TNNLS

Education

  • PhD. The Chinese University of Hong Kong. 09/2023 - Present
  • M.E. Wuhan University. 09/2019 - 06/2022
  • B.E. Central South University. 09/2015 - 06/2019

More

🍰My favorite things
👯‍♂️Friendship links