Publications

2026

ICML Does Reasoning Improve Seeing? Understanding When Vision-Language Models Benefit from Thinking
Jing Bi, Luchuan Song, Dingxin Zhang, Pinxin Liu, Guangyu Sun, Lianggong Bruce Wen, Weidong Cai, Chen Chen, and Chenliang Xu
International Conference on Machine Learning, 2026

ICML AdvEvo-MARL: Shaping Internalized Safety through Adversarial Co-Evolution in Multi-Agent Reinforcement Learning
Zhenyu Pan, Yiting Zhang, Zhuo Liu, Yolo Y. Tang, Zeliang Zhang, Haozheng Luo, Chenwei Xu, Yuwei Han, Jianshu Zhang, Dennis Wu, Hong-Yu Chen, Haoran Lu, Haoyang Fang, Manling Li, Chenliang Xu, Philip S. Yu, and Han Liu
International Conference on Machine Learning, 2026

ACL Findings DRIFT: Transferring Reasoning Priors for Efficient MLLM Fine-Tuning
Chao Huang, Zeliang Zhang, Jiang Liu, Ximeng Sun, Jialian Wu, Xiaodong Yu, Ze Wang, Chenliang Xu, Emad Barsoum, and Zicheng Liu
Findings of the Association for Computational Linguistics, 2026 [ project ]

DIS Tuning the Face: Modulating Facial Expressions for Realistic Self-Avatars in Virtual Reality
Yang Lu, Jiamu Tang, Jiankun Yang, Shijian Luo, Chenliang Xu, and Yukang Yan
ACM Conference on Designing Interactive Systems, 2026

CVPR Asynchronous Temporal Modeling with Two-Agent Framework for Streaming Dense Video Captioning
Yolo Yunlong Tang, Chao Huang, Susan Liang, Jing Bi, Yicheng Wang, Daiki Shimada, and Chenliang Xu
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2026

CVPR TDMM-LM: Bridging Facial Understanding and Animation via Language Models
Luchuan Song, Pinxin Liu, Haiyang Liu, Zhenchao Jin, Yolo Yunlong Tang, Zichong Xu, Susan Liang, Jing Bi, Jason J. Corso, and Chenliang Xu
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2026 [ project ]

CVPR Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination
Yolo Yunlong Tang, Daiki Shimada, Hang Hua, Chao Huang, Jing Bi, Rogerio Feris, and Chenliang Xu
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2026 (Findings) [ project ]

CVPR When to Think and When to Look: Uncertainty-Guided Lookback
Jing Bi, Filippos Bellos, Junjia Guo, Yayuan Li, Chao Huang, Yolo Yunlong Tang, Luchuan Song, Susan Liang, Zhongfei Zhang, Jason J. Corso, and Chenliang Xu
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2026

ICASSP Multimodal Room Impulse Response Generation Through Latent Rectified Flow Matching
Ali Vosoughi, Yongyi Zang, Qihui Yang, Nathan Paek, Randal Leistikow, and Chenliang Xu
IEEE International Conference on Acoustics, Speech and Signal Processing, 2026 [ project ]

ICLR SEMA: Simple yet Effective Learning for Multi-Turn Jailbreak Attacks
Mingqian Feng, Xiaodong Liu, Weiwei Yang, Jialin Song, Xuekai Zhu, Chenliang Xu, and Jianfeng Gao
International Conference on Learning Representations, 2026

ICLR Training Large Reasoning Models Efficiently via Progressive Thought Encoding
Zeliang Zhang, Xiaodong Liu, Hao Cheng, Hao Sun, Chenliang Xu, and Jianfeng Gao
International Conference on Learning Representations, 2026

AAAI Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting
Yunlong Tang, Jing Bi, Chao Huang, Susan Liang, Daiki Shimada, Hang Hua, Yunzhong Xiao, Yizhi Song, Pinxin Liu, Mingqian Feng, Junjia Guo, Zhuo Liu, Luchuan Song, Ali Vosoughi, Jinxi He, Liu He, Zeliang Zhang, Jiebo Luo, and Chenliang Xu
AAAI Conference on Artificial Intelligence, 2026 (Demo) [ project ]
★ Best Demo Award Runner-up

Digital Discovery OPENXRD: A Comprehensive Benchmark Framework for LLM/MLLM XRD Question Answering
Ali Vosoughi, Ayoub Shahnazari, Yufeng Xi, Zeliang Zhang, Griffin Hess, Chenliang Xu, and Niaz Abdolrahim
Digital Discovery, Royal Society of Chemistry, 2026

J. Appl. Cryst. Machine Learning Approaches for Crystallographic Classification from Synthetic 2D X-ray Diffraction Data
Ayoub Shahnazari, Zeliang Zhang, Sachith E. Dissanayake, Chenliang Xu, and Niaz Abdolrahim
Journal of Applied Crystallography, Vol. 59, No. 1, 2026

TMLR MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models
Hang Hua, Yunlong Tang, Ziyun Zeng, Liangliang Cao, Zhengyuan Yang, Hangfeng He, Chenliang Xu, and Jiebo Luo
Transactions on Machine Learning Research, 2026

IJCV High-Quality Sound Separation Across Diverse Categories via Visually-Guided Generative Modeling
Chao Huang, Susan Liang, Yapeng Tian, Anurag Kumar, and Chenliang Xu
International Journal of Computer Vision, 2026

Workshop $I^2$: Generating Instructional Illustrations via Text-Conditioned Diffusion
Jing Bi, Yolo Y. Tang, Pinxin Liu, Chao Huang, Ali Vosoughi, Jiarui Wu, Jinxi He, and Chenliang Xu
CVPR 2026 Workshop on Generative AI for Storytelling (AISTORY), 2026

Workshop What Happens Next? Memory-Driven Action Planning for Egocentric Storytelling Characters
Jing Bi, Yolo Y. Tang, Chao Huang, and Chenliang Xu
CVPR 2026 Workshop on Generative AI for Storytelling (AISTORY), 2026

2025

NeurIPS Harnessing the Computation Redundancy in ViTs to Boost Adversarial Transferability
Jiani Liu, Zhiyuan Wang, Zeliang Zhang, Chao Huang, Susan Liang, Yunlong Tang, and Chenliang Xu
Neural Information Processing Systems, 2025

NeurIPS ZeroSep: Separate Anything in Audio with Zero Training
Chao Huang, Yuesheng Ma, Junxuan Huang, Susan Liang, Yunlong Tang, Jing Bi, Wenqiang Liu, Nima Mesgarani, and Chenliang Xu
Neural Information Processing Systems, 2025 [ project ]

NeurIPS MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness
Yunlong Tang, Pinxin Liu, Mingqian Feng, Zhangyun Tan, Rui Mao, Chao Huang, Jing Bi, Yunzhong Xiao, Susan Liang, Hang Hua, Ali Vosoughi, Luchuan Song, Zeliang Zhang, and Chenliang Xu
Neural Information Processing Systems, 2025 [ project ]

ICCV π-AVAS: Can Physics-Integrated Audio-Visual Modeling Boost Neural Acoustic Synthesis?
Susan Liang, Chao Huang, Yunlong Tang, Zeliang Zhang, and Chenliang Xu
IEEE/CVF International Conference on Computer Vision, 2025 [ project ]

ICCV GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling
Pinxin Liu, Luchuan Song, Junhua Huang, Haiyang Liu, and Chenliang Xu
IEEE/CVF International Conference on Computer Vision, 2025

ACL Findings Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts
Zeliang Zhang, Xiaodong Liu, Hao Cheng, Chenliang Xu, and Jianfeng Gao
Findings of the Association for Computational Linguistics, 2025

SIGGRAPH StreamME: Simplify 3D Gaussian Avatar within Live Stream
Luchuan Song, Yang Zhou, Zhan Xu, Yi Zhou, Deepali Aneja, and Chenliang Xu
SIGGRAPH, 2025 [ project ]

ICML BinauralFlow: A Causal and Streamable Approach for High-Quality Binaural Speech Synthesis with Flow Matching Models
Susan Liang, Dejan Markovic, Israel D. Gebru, Steven Krenn, Todd Keebler, Jacob Sandakly, Frank Yu, Samuel Hassel, Chenliang Xu, and Alexander Richard
International Conference on Machine Learning, 2025 [ project ]

TCSVT Video Understanding with Large Language Models: A Survey
Yunlong Tang, Jing Bi, Siting Xu, Luchuan Song, Susan Liang, Teng Wang, Daoan Zhang, Jie An, Jingyang Lin, Rongyi Zhu, Ali Vosoughi, Chao Huang, Zeliang Zhang, Feng Zheng, Jianguo Zhang, Ping Luo, Jiebo Luo, and Chenliang Xu
IEEE Transactions on Circuits and Systems for Video Technology, 2025 [ project ]

CVPR Targeted Forgetting of Image Subgroups in CLIP Models
Zeliang Zhang, Gaowen Liu, Charles Fleming, Ramana Rao Kompella, and Chenliang Xu
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

CVPR VidComposition: Can LLMs Analyze Compositions in Compiled Videos?
Yunlong Tang, Junjia Guo, Hang Hua, Susan Liang, Mingqian Feng, Xinyang Li, Rui Mao, Chao Huang, Jing Bi, Zeliang Zhang, Pooyan Fazli, and Chenliang Xu
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025 [ project ]

CVPR Learning to Highlight Audio by Watching Movies
Chao Huang, Ruohan Gao, J. M. F. Tsang, Jan Kurcius, Cagdas Bilen, Chenliang Xu, Anurag Kumar, and Sanjeel Parekh
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025 [ project ]

CVPR Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach
Jing Bi, Lianggong Bruce Wen, Zhang Liu, Junjia Guo, Yunlong Tang, Bingjie Wang, and Chenliang Xu
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

ICLR Rethinking Audio-Visual Adversarial Vulnerability from Temporal and Modality Perspectives
Zeliang Zhang, Susan Liang, Daiki Shimada, and Chenliang Xu
International Conference on Learning Representations, 2025

AAAI Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding
Yunlong Tang, Daiki Shimada, Jing Bi, Mingqian Feng, Hang Hua, and Chenliang Xu
AAAI Conference on Artificial Intelligence, 2025

AAAI CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion
Yunlong Tang, Zhan Gen, Li Yang, Yiting Liao, and Chenliang Xu
AAAI Conference on Artificial Intelligence, 2025

AAAI V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
Hang Hua, Yunlong Tang, Chenliang Xu, and Jiebo Luo
AAAI Conference on Artificial Intelligence, 2025 [ project ]

3DV GaussianStyle: Gaussian Head Avatar via StyleGAN
Pinxin Liu, Luchuan Song, Daoan Zhang, Hang Hua, Yunlong Tang, Huaijin Tu, Jiebo Luo, and Chenliang Xu
International Conference on 3D Vision, 2025 [ project ]

Workshop Generative AI for Cel-Animation: A Survey
Yunlong Tang, Junjia Guo, Pinxin Liu, Zhiyuan Wang, Hang Hua, Jia-Xing Zhong, Yunzhong Xiao, Chao Huang, Luchuan Song, Susan Liang, Yizhi Song, Liu He, Jing Bi, Mingqian Feng, Xinyang Li, Zeliang Zhang, and Chenliang Xu
ICCV 2025 Workshop on Generative AI for Storytelling, 2025

2024

ACCV High-Quality Visually-Guided Sound Separation from Diverse Categories
Chao Huang, Susan Liang, Yapeng Tian, Anurag Kumar, and Chenliang Xu
Asian Conference on Computer Vision, 2024 [ project ]
★ Oral Presentation ★ Best Paper Award Runner-Up

ACCV Language-Guided Audio-Visual Editing via One-Shot Adaptation
Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, and Chenliang Xu
Asian Conference on Computer Vision, 2024 [ project ]

SIGGRAPH Asia TextToon: Real-Time Text Toonify Head Avatar from Single Video
Luchuan Song, Lele Chen, Celong Liu, Pinxin Liu, and Chenliang Xu
SIGGRAPH Asia, 2024 [ project ]

EMNLP Findings Can CLIP Count Stars? An Empirical Study on Quantity Bias in CLIP
Zeliang Zhang, Zhuo Liu, Mingqian Feng, and Chenliang Xu
Findings of the Conference on Empirical Methods in Natural Language Processing, 2024

ACM MM EAGLE: Egocentric AGgregated Language-video Engine
Jing Bi, Yunlong Tang, Luchuan Song, Ali Vosoughi, Nguyen Nguyen, and Chenliang Xu
ACM International Conference on Multimedia, 2024

ECCV Tri²-plane: Thinking Head Avatar via Feature Pyramid
Luchuan Song, Pinxin Liu, Lele Chen, Guojun Yin, and Chenliang Xu
European Conference on Computer Vision, 2024 [ project ]

ECCV Modeling and Driving Human Body Soundfields through Acoustic Primitives
Chao Huang, Dejan Markovic, Chenliang Xu, and Alexander Richard
European Conference on Computer Vision, 2024 [ project ]

NAACL Findings OSCaR: Object State Captioning and State Change Representation
Nguyen Nguyen, Jing Bi, Ali Vosoughi, Yapeng Tian, Pooyan Fazli, and Chenliang Xu
Findings of the Association for Computational Linguistics: NAACL 2024, 2024

CVPR Discover and Mitigate Multiple Biased Subgroups in Image Classifiers
Zeliang Zhang, Mingqian Feng, Zhiheng Li, and Chenliang Xu
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024 [ project ]

CVPR Learning to Transform Dynamically for Better Adversarial Transferability
Rongyi Zhu, Zeliang Zhang, Susan Liang, Zhuo Liu, and Chenliang Xu
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024 [ project ]

TMM Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA
Ali Vosoughi, Shijian Deng, Yapeng Tian, Chenliang Xu, and Jiebo Luo
IEEE Transactions on Multimedia, 2024

EACL Findings Random Smooth-based Certified Defense against Text Adversarial Attack
Zeliang Zhang, Wei Yao, Susan Liang, and Chenliang Xu
Findings of the Association for Computational Linguistics: EACL 2024, 2024

ICLR One Forward is Enough for Neural Network Training via Likelihood Ratio Method
Jinyang Jiang, Zeliang Zhang, Chenliang Xu, Zhaofei Yu, and Yijie Peng
International Conference on Learning Representations, 2024 [ code ]

ICASSP Adaptive Super Resolution for One-Shot Talking Head Generation
Luchuan Song, Pinxin Liu, Guojun Yin, and Chenliang Xu
IEEE International Conference on Acoustics, Speech and Signal Processing, 2024 [ project ]

ICASSP Learning Audio Concepts from Counterfactual Natural Language
Ali Vosoughi, Luca Bondi, Ho-Hsiang Wu, and Chenliang Xu
IEEE International Conference on Acoustics, Speech and Signal Processing, 2024 [ project ]

2023

npj Computational Materials Automated Classification of Big X-Ray Diffraction Data using Deep Learning Models
Jerardo E. Salgado, Samuel Lerman, Zhaotong Du, Chenliang Xu, and Niaz Abdolrahim
npj Computational Materials, 2023

NeurIPS AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis
Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, and Chenliang Xu
Neural Information Processing Systems, 2023 [ project ]

ICCV Emotional Listener Portrait: Neural Listener Head Generation with Emotion
Luchuan Song, Guojun Yin, Zhenchao Jin, Xiaoyi Dong, and Chenliang Xu
IEEE/CVF International Conference on Computer Vision, 2023 [ video ]

UIST PEANUT: A Human-AI Collaborative Tool for Annotating Audio-Visual Data
Zheng Zhang, Zheng Ning, Chenliang Xu, Yapeng Tian, and Toby Jia-Jun Li
ACM Symposium on User Interface Software and Technology, 2023

CVPR A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others
Zhiheng Li, Ivan Evtimov, Albert Gordo, Caner Hazirbas, Tal Hassner, Cristian C. Ferrer, Chenliang Xu, and Mark Ibrahim
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023 [ project | video ]

CVPR Egocentric Audio-Visual Object Localization
Chao Huang, Yapeng Tian, Anurag Kumar, and Chenliang Xu
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023 [ video ]

Workshop Neural Acoustic Context Field: Rendering Realistic Room Impulse Response With Neural Fields
Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, and Chenliang Xu
ICCV 2023 Workshop on Audio-Visual Learning for 4D Scene Understanding, 2023

Workshop MISAR: A Multimodal Instructional System with Augmented Reality
Jing Bi, Nguyen Manh Nguyen, Ali Vosoughi, and Chenliang Xu
ICCV 2023 Workshop on Audio-Visual Learning for 4D Scene Understanding, 2023

Workshop Separating Invisible Sounds Toward Universal Audio-Visual Scene-Aware Sound Separation
Yiyang Su, Ali Vosoughi, Shijian Deng, Yapeng Tian, and Chenliang Xu
ICCV 2023 Workshop on Audio-Visual Learning for 4D Scene Understanding, 2023

Workshop Audio-Visual Action Prediction with Soft-Boundary in Egocentric Videos
Luchuan Song, Jing Bi, Chao Huang, and Chenliang Xu
CVPR 2023 Workshop on Sight and Sound, 2023

2022

ECCV Discover and Mitigate Unknown Biases with Debiasing Alternate Networks
Zhiheng Li, Anthony Hoogs, and Chenliang Xu
European Conference on Computer Vision, 2022 [ project | video ]

ACM MM Face Forgery Detection via Symmetric Transformer
Luchuan Song, Xiaodan Li, Zheng Fang, Zhenchao Jin, Yuefeng Chen, and Chenliang Xu
ACM International Conference on Multimedia, 2022 [ video | code ]

ICPR Cross-Modal Contrastive Distillation for Instructional Activity Anticipation
Zhengyuan Yang, Jinggen Liu, Jing Huang, Xiaodong He, Tao Mei, Chenliang Xu, and Jiebo Luo
International Conference on Pattern Recognition, 2022

CVPR StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis
Zhiheng Li, Martin Renqiang Min, Kai Li, and Chenliang Xu
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022 [ project | video ]

CVPR SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Editing
Jing Shi, Ning Xu, Haitian Zheng, Alex Smith, Jiebo Luo, and Chenliang Xu
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

CVPR Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Guangyao Li, Yake Wei, Yapeng Tian, Chenliang Xu, Ji-Rong Wen, and Di Hu
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022 [ project ]
★ Oral Presentation

CVPR Transformer-Empowered Multi-scale Contextual Matching and Aggregation for Multi-contrast MRI Super-resolution
Guangyuan Li, Jun Lv, Yapeng Tian, Dou Qi, Chengyan Wang, Chenliang Xu, and Jing Qin
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

ICCV Discover the Unknown Biased Attribute of an Image Classifier
Zhiheng Li and Chenliang Xu
IEEE/CVF International Conference on Computer Vision, 2021 [ project | video ]

ICCV Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning
Jing Bi, Jiebo Luo, and Chenliang Xu
IEEE/CVF International Conference on Computer Vision, 2021
★ Oral Presentation

ICCV Explaining Local, Global, And Higher-Order Interactions In Deep Learning
Samuel Lerman, Charles Venuto, Henry Kautz, and Chenliang Xu
IEEE/CVF International Conference on Computer Vision, 2021 [ code ]

ICCV A Simple Baseline for Weakly-Supervised Scene Graph Generation
Jing Shi, Yiwu Zhong, Ning Xu, Yin Li, and Chenliang Xu
IEEE/CVF International Conference on Computer Vision, 2021

ICCV Learning to Generate Scene Graph from Natural Language Supervision
Yiwu Zhong, Jing Shi, Jianwei Yang, Chenliang Xu, and Yin Li
IEEE/CVF International Conference on Computer Vision, 2021

CVPR Can audio-visual integration strengthen robustness under multimodal attacks?
Yapeng Tian and Chenliang Xu
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021 [ supp | code ]

CVPR Cyclic Co-learning of Sounding Object Visual Grounding and Sound Separation
Yapeng Tian, Di Hu, and Chenliang Xu
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021 [ supp | code ]

CVPR High-fidelity Face Tracking for AR/VR via Deep Lighting Adaptation
Lele Chen, Chen Cao, Fernando De la Torre, Jason Saragih, Chenliang Xu, and Yaser Sheikh
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021 [ supp | project ]

CVPR Learning by Planning: Language-Guided Global Image Editing
Jing Shi, Ning Xu, Yihang Xu, Trung Bui, Franck Dernoncourt, and Chenliang Xu
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021 [ supp | project ]

WACV Improve CAM with Auto-adapted Segmentation and Co-supervised Augmentation
Ziyi Kou, Guofeng Cui, Shaojie Wang, Wentian Zhao, and Chenliang Xu
IEEE/CVF Winter Conference on Applications of Computer Vision, 2021 [ supp ]

WACV How to Make a BLT Sandwich? Learning VQA towards Understanding Web Instructional Videos
Shaojie Wang, Wentian Zhao, Ziyi Kou, Jing Shi, and Chenliang Xu
IEEE/CVF Winter Conference on Applications of Computer Vision, 2021 [ data ]

2020

Chemical Science Graph Neural Network based Coarse-Grained Mapping
Zhiheng Li, Geemi P. Wellawatte, Maghesree Chakraborty, Heta A. Gandhi, Chenliang Xu, and Andrew D. White
Chemical Science, 2020

ACCV A Benchmark and Baseline for Language-Driven Image Editing
Jing Shi, Ning Xu, Trung Bui, Franck Dernoncourt, Zheng Wen, and Chenliang Xu
Asian Conference on Computer Vision, 2020 [ supp ]
★ Oral Presentation

ECCV Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing
Yapeng Tian, Dingzeyu Li, and Chenliang Xu
European Conference on Computer Vision, 2020 [ supp | project ]
★ Spotlight Presentation (5%)

ECCV Talking-head Generation with Rhythmic Head Motion
Lele Chen, Guofeng Cui, Celong Liu, Zhong Li, Ziyi Kou, Yi Xu, and Chenliang Xu
European Conference on Computer Vision, 2020 [ supp | project ]

CVPR Deep Grouping Model for Unified Perceptual Parsing
Zhiheng Li, Wenxuan Bao, Jiayang Zheng, and Chenliang Xu
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

CVPR Learning a Weakly-Supervised Video Actor-Action Segmentation Model with a Wise Selection
Jie Chen, Zhiheng Li, Jiebo Luo, and Chenliang Xu
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
★ Oral Presentation (5.7%)

CVPR Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution
Xiaoyu Xiang, Yapeng Tian, Yulun Zhang, Yun Fu, Jan Allebach, and Chenliang Xu
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020 [ supp | code | demo ]

CVPR TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution
Yapeng Tian, Yulun Zhang, Yun Fu, and Chenliang Xu
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020 [ code | demo ]

ICASSP End-to-End Generation of Talking Faces from Noisy Speech
Sefik Emre Eskimez, Ross K. Maddox, Chenliang Xu, and Zhiyao Duan
IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

WACV TailorGAN: Making User-Defined Fashion Design
Lele Chen, Justin Tian, Guo Li, Cheng-Haw Wu, Erh-Kan King, Kuan-Ting Chen, Shao-Hang Hsieh, and Chenliang Xu
IEEE/CVF Winter Conference on Applications of Computer Vision, 2020 [ project ]

AAAI Learning from Interventions Using Hierarchical Policies for Safe Learning
Jing Bi, Vikas Dhiman, Tianyou Xiao, and Chenliang Xu
AAAI Conference on Artificial Intelligence, 2020
★ Oral Presentation (under 5%)

Workshop Deep Audio Prior: Learning Sound Source Separation from a Single Audio Mixture
Yapeng Tian, Chenliang Xu, and Dingzeyu Li
CVPR 2020 Workshop on Sight and Sound, 2020

Workshop What comprises a good talking-head video generation?
Lele Chen, Guofeng Cui, Ziyi Kou, Haitian Zheng, and Chenliang Xu
CVPR 2020 Workshop on Sight and Sound, 2020

2019

VRCAI 3D Human Avatar Digitization from a Single Image
Zhong Li, Lele Chen, Celong Liu, Yu Gao, Yuanzhou Ha, Chenliang Xu, Shuxue Quan, and Yi Xu
ACM SIGGRAPH International Conference on Virtual Reality Continuum and Its Applications in Industry, 2019
★ Best Paper Award

CVPR Hierarchical Cross-modal Talking Face Generation with Dynamic Pixel-wise Loss
Lele Chen, Ross K. Maddox, Zhiyao Duan, and Chenliang Xu
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019 [ code ]

CVPR Not All Frames Are Equal: Weakly-Supervised Video Grounding with Contextual Similarity and Visual Clustering Losses
Jing Shi, Jia Xu, Boqing Gong, and Chenliang Xu
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019 [ code ]

IJCAI GAN-EM: GAN based EM Learning Framework
Wentian Zhao, Shaojie Wang, Zhihuai Xie, Jing Shi, and Chenliang Xu
International Joint Conference on Artificial Intelligence, 2019 [ supp | code ]

BMVC Dynamic Graph Modules for Modeling Object-Object Interactions in Activity Recognition
Hao Huang, Luowei Zhou, Wei Zhang, Jason J. Corso, and Chenliang Xu
British Machine Vision Conference, 2019 [ project ]

JOV Can Multisensory Training Aid Visual Learning?: A Computational Investigation
Robert A. Jacobs and Chenliang Xu
Journal of Vision, 2019

IJCV A Weakly Supervised Multi-Task Ranking Framework for Actor-Action Semantic Segmentation
Yan Yan, Chenliang Xu, Dawen Cai, and Jason J. Corso
International Journal of Computer Vision, 2019

TASLP Noise-resilient Training Method for Face Landmark Generation from Speech
Sefik Emre Eskimez, Ross K. Maddox, Chenliang Xu, and Zhiyao Duan
IEEE/ACM Transactions on Audio, Speech and Language Processing, 2019

TISMIR Online Audio-Visual Source Association for Chamber Music Performances
Bochen Li, Karthik Dinesh, Chenliang Xu, Gaurav Sharma, and Zhiyao Duan
Transactions of the International Society for Music Information Retrieval, 2019

Workshop Audio-Visual Interpretable and Controllable Video Captioning
Yapeng Tian, Chenxiao Guan, Justin Goodman, Marc Moore, and Chenliang Xu
CVPR 2019 Workshop on Sight and Sound, 2019

2018

ECCV Lip Movements Generation at a Glance
Lele Chen, Zhiheng Li, Ross K. Maddox, Zhiyao Duan, and Chenliang Xu
European Conference on Computer Vision, 2018 [ demo | code ]
★ The system is also an accepted Demo at ECCV'18.

ECCV Audio-Visual Event Localization in Unconstrained Videos
Yapeng Tian, Jing Shi, Bochen Li, Zhiyao Duan, and Chenliang Xu
European Conference on Computer Vision, 2018 [ project ]

CVPR Weakly-Supervised Action Segmentation with Iterative Soft Boundary Assignment
Li Ding and Chenliang Xu
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018

AAAI Towards Automatic Learning of Procedures from Web Instructional Videos
Luowei Zhou, Chenliang Xu, and Jason J. Corso
AAAI Conference on Artificial Intelligence, 2018 [ YouCookII ]
★ Oral Presentation

WACV Improving Text-based Person Search by Spatial Matching and Adaptive Threshold
Tianlang Chen, Chenliang Xu, and Jiebo Luo
IEEE Winter Conference on Applications of Computer Vision, 2018

LVA-ICA Generating Talking Face Landmarks from Speech
Sefik Emre Eskimez, Ross K. Maddox, Chenliang Xu, and Zhiyao Duan
International Conference on Latent Variable Analysis and Signal Separation, 2018 [ arXiv ]

JCP Encoding and Selecting Coarse-Grain Mapping Operators with Hierarchical Graphs
Maghesree Chakraborty, Chenliang Xu, and Andrew D. White
Journal of Chemical Physics, 2018 [ arXiv ]

SPIE MRI Tumor Segmentation with Densely Connected 3D CNN
Lele Chen, Yue Wu, Adora M. DSouza, Anas Z. Abidin, Axel Wismuller, and Chenliang Xu
SPIE Medical Imaging: Image Processing, 2018

2017

ACMMMW Deep Cross-Modal Audio-Visual Generation
Lele Chen, Sudhanshu Srivastava, Zhiyao Duan, and Chenliang Xu
ACM International Conference on Multimedia Thematic Workshops, 2017

ACMMMW Watch What You Just Said: Image Captioning with Text-Conditional Attention
Luowei Zhou, Chenliang Xu, Parker Koch, and Jason J. Corso
ACM International Conference on Multimedia Thematic Workshops, 2017

SMC Audio-Visual Source Association for String Ensembles through Multi-Modal Vibrato Analysis
Bochen Li, Chenliang Xu, and Zhiyao Duan
Sound and Music Computing, 2017
★ Best Paper Award

CVPR Weakly Supervised Actor-Action Segmentation via Robust Multi-Task Ranking
Yan Yan, Chenliang Xu, Dawen Cai, and Jason J. Corso
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2017

TMM Dancelets Mining for Video Recommendation Based on Dance Styles
Tingting Han, Hongxun Yao, Chenliang Xu, Xiaoshuai Sun, Yanhao Zhang, and Jason J. Corso
IEEE Transactions on Multimedia, 2016

CVPR Actor-Action Semantic Segmentation with Grouping Process Models
Chenliang Xu and Jason J. Corso
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2016 [ video ]
★ Spotlight Presentation (9.7%)

IJCV LIBSVX: A Supervoxel Library and Benchmark for Early Video Processing
Chenliang Xu and Jason J. Corso
International Journal of Computer Vision, 2016 [ project ]

CVPR Can Humans Fly? Action Understanding with Multiple Classes of Actors
Chenliang Xu, Shao-Hang Hsieh, Caiming Xiong, and Jason J. Corso
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2015 [ video | project ]

IJSC A Study of Actor and Action Semantic Retention in Video Supervoxel Segmentation
Chenliang Xu, Richard F. Doell, Stephen José Hanson, Catherine Hanson, and Jason J. Corso
International Journal of Semantic Computing, 2013

ICCV Flattening Supervoxel Hierarchies by the Uniform Entropy Slice
Chenliang Xu, Spencer Whitt, and Jason J. Corso
IEEE/CVF International Conference on Computer Vision, 2013 [ video | project ]

ICSC Are Actor and Action Semantics Retained in Video Supervoxel Segmentation?
Chenliang Xu, Richard F. Doell, Stephen José Hanson, Catherine Hanson, and Jason J. Corso
IEEE International Conference on Semantic Computing, 2013

CVPR A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching
Pradipto Das, Chenliang Xu, Richard F. Doell, and Jason J. Corso
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2013 [ project ]

ECCV Streaming Hierarchical Video Segmentation
Chenliang Xu, Caiming Xiong, and Jason J. Corso
European Conference on Computer Vision, 2012 [ oral video | spotlight video | project ]
★ Oral Presentation (2.8%)

CVPR Evaluation of Super-Voxel Methods for Early Video Processing
Chenliang Xu and Jason J. Corso
IEEE Conference on Computer Vision and Pattern Recognition, 2012 [ project ]

Selected Papers

2026