Bio
I am a final-year CS Ph.D. student at Brown University working with Prof. Chen Sun. Previously, I was a research intern at Google and Meta. I obtained my bachelor's degree in software engineering from Tsinghua University in 2021.
My research interests involve building physically grounded, reasoning-capable vision-language models and exploring their effective integration into the physical world. Feel free to contact me for collaborations and casual chats.
I'm actively looking for full-time industry opportunities in 2026.
Education
- 09/2021 - Present Ph.D., Department of Computer Science, Brown University
- 08/2016 - 06/2021 B.S., School of Software, Tsinghua University. (Outstanding Undergrad)
Selected Publications
-
Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding
[paper]
[website]
Ziyang Wang, Honglu Zhou, Shijie Wang, Junnan Li, Caiming Xiong, Silvio Savarese, Mohit Bansal, Michael S. Ryoo, Juan Carlos Niebles
CVPR Findings 2026
-
MotiF: Making Text Count in Image Animation with Motion Focal Loss
[paper]
[website]
[benchmark]
Shijie Wang, Samaneh Azadi, Rohit Girdhar, Sai Saketh Rambhatla, Chen Sun, and Xi Yin
CVPR 2025
-
How Can Objects Help Video-Language Understanding?
[paper]
Zitian Tang, Shijie Wang, Junho Cho, Jaewook Yoo, and Chen Sun
ICCV 2025
-
Learning Visual Grounding from Generative Vision and Language Model
[paper]
Shijie Wang, Dahun Kim, Ali Taalimi, Chen Sun, and Weicheng Kuo
WACV 2025
-
Vamos: Versatile Action Models for Video Understanding
[paper]
[website]
[code]
Shijie Wang, Qi Zhao, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, and Chen Sun
ECCV 2024
-
Do Pre-trained Vision-Language Models Encode Object States?
[paper]
Kaleb Newman, Shijie Wang, Yuan Zang, David Heffren, and Chen Sun
ECCV 2024 Workshop EVAL-FoMo
-
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?
[paper]
[website]
[code]
Qi Zhao*, Shijie Wang*, Ce Zhang, Changcheng Fu, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, and Chen Sun
ICLR 2024
-
Object-centric Video Representation for Long-term Action Anticipation
[paper]
[code]
Ce Zhang*, Changcheng Fu*, Shijie Wang, Nakul Agarwal, Kwonjoon Lee, Chiho Choi, and Chen Sun
WACV 2024
-
Goal-Conditioned Predictive Coding as an Implicit Planner for Offline Reinforcement Learning
[paper]
[website]
[code]
Zilai Zeng, Ce Zhang, Shijie Wang, and Chen Sun
NeurIPS 2023
-
Pose Recognition with Cascade Transformers
[paper]
[code]
Ke Li*, Shijie Wang*, Xiang Zhang*, Yifan Xu, Weijian Xu, and Zhuowen Tu
CVPR 2021
Experience
Awards
- 2022, 3rd Prize in the Ego4D Object State Change Classification Challenge, ECCV 2022.
- 2021, Outstanding Undergrad Award, Tsinghua University.
- 2018 & 2019 & 2020, Scholarship for Academic Excellence, Tsinghua University.
- 2019, First Prize in Student Research Training Program, Tsinghua University.
- 2019, Member of Tsinghua University Initiative Scientific Research Program (funding: 30,000¥).
- 2018, Champion of Yuehan Ma Campus Football Cup, Tsinghua University.
Service
Reviewer:
- IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
- International Journal of Computer Vision (IJCV)
- The International Conference on Learning Representations (ICLR) 2024, 2025
- The International Conference on Machine Learning (ICML) 2024
- The Conference on Neural Information Processing Systems (NeurIPS) 2023, 2024, 2025
- The Conference on Computer Vision and Pattern Recognition (CVPR) 2022, 2023, 2025
- The International Conference on Computer Vision (ICCV) 2023
- The European Conference on Computer Vision (ECCV) 2022, 2024
- AAAI Conference on Artificial Intelligence (AAAI) 2023, 2024
- Winter Conference on Applications of Computer Vision (WACV) 2023, 2024