Publications

2025

SEAL: Semantic Attention Learning for Long Video Representation
Lan Wang, Yujia Chen, Du Tran, Vishnu Boddeti, Wen-Sheng Chu
IEEE Computer Vision and Pattern Recognition (CVPR), 2025.
(oral: acceptance rate 0.7% [96 / 13008], top 3.3% of accepted papers [96 / 2878]).
[paper] [code] [project]

2024

Open-world Instance Segmentation: Top-down Learning with Bottom-up Supervision
Tarun Kalluri, Weiyao Wang, Heng Wang, Manmohan Chandraker, Lorenzo Torresani, Du Tran
CVPR 2024 L3D-IVU Workshop, 2024.
[paper] [project]

2023

	Learning Space-Time Semantic Correspondences. Du Tran, Jitendra Malik. arXiv, 2023. [paper] [datasets] [project]
	MINOTAUR: Multi-task Video Grounding From Multimodal Queries. Raghav Goyal, Effrosyni Mavroudi, Xitong Yang, Sainbayar Sukhbaatar, Leonid Sigal, Matt Feiszli, Lorenzo Torresani, Du Tran. arXiv, 2023. [paper] [project]
	Relational Space-Time Query in Long-Form Videos. Xitong Yang, Fu-Jen Chu, Matt Feiszli, Raghav Goyal, Lorenzo Torresani, Du Tran. IEEE Computer Vision and Pattern Recognition (CVPR), 2023. (highlight: acceptance rate 2.5%). [paper] [datasets] [project]
	FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation. Tarun Kalluri, Deepak Pathak, Manmohan Chandraker, Du Tran. IEEE Winter Conference on Applications of Computer Vision (WACV), 2023. (Best Paper Finalist: 12 out of 641 accepted papers). [paper] [demo] [project]

2022

	Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity. Weiyao Wang, Matt Feiszli, Heng Wang, Jitendra Malik, Du Tran. IEEE Computer Vision and Pattern Recognition (CVPR), 2022. [paper] [project] [code]
	Long-short Temporal Contrastive Learning of Video Transformers. Jue Wang, Gedas Bertasius, Du Tran, Lorenzo Torresani. IEEE Computer Vision and Pattern Recognition (CVPR), 2022. [paper]

2021

Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation
Weiyao Wang, Matt Feiszli, Heng Wang, Du Tran
International Conference on Computer Vision (ICCV), 2021.
[paper] [video] [project]

2020

	Self-Supervised Learning by Cross-Modal Audio-Video Clustering. Humam Alwassel, Dhruv Mahajan, Bruno Korbar, Lorenzo Torresani, Bernard Ghanem, Du Tran. Neural Information Processing Systems (NeurIPS), 2020. (spotlight: acceptance rate 4.1%). [paper] [models] [project]
	What Makes Training Multi-modal Classification Networks Hard? Weiyao Wang, Du Tran, Matt Feiszli. IEEE Computer Vision and Pattern Recognition (CVPR), 2020. [paper] [code]
	Video Modeling with Correlation Networks. Heng Wang, Du Tran, Lorenzo Torresani, Matt Feiszli. IEEE Computer Vision and Pattern Recognition (CVPR), 2020. [paper] [code]
	FASTER Recurrent Networks for Efficient Video Classification. Linchao Zhu, Laura Sevilla-Lara, Du Tran, Matt Feiszli, Heng Wang. AAAI Conference on Artificial Intelligence (AAAI), 2020. [paper] [code]

2019

	Video Classification with Channel-Separated Convolutional Networks. Du Tran, Heng Wang, Lorenzo Torresani, Matt Feiszli. International Conference on Computer Vision (ICCV), 2019. [paper] [code]
	SCSampler: Sampling Salient Clips from Video for Efficient Action Recognition. Bruno Korbar, Du Tran, Lorenzo Torresani. International Conference on Computer Vision (ICCV), 2019. (oral: acceptance rate 4.3%). [paper] [code]
	DistInit: Learning Video Representations without a Single Labeled Video. Rohit Girdhar, Du Tran, Lorenzo Torresani, Deva Ramanan. International Conference on Computer Vision (ICCV), 2019. [paper] [code]
	Learning Temporal Pose Estimation from Sparsely-Labeled Videos. Gedas Bertasius, Christoph Feichtenhofer, Du Tran, Jianbo Shi, Lorenzo Torresani. Neural Information Processing Systems (NeurIPS), 2019. [paper] [code]
	Leveraging the Present to Anticipate the Future in Videos. Antoine Miech, Ivan Laptev, Josef Sivic, Heng Wang, Lorenzo Torresani, Du Tran. IEEE Computer Vision and Pattern Recognition (CVPR) Precognition Workshop, 2019. (2nd place at CVPR'19 EPIC-KITCHENS Challenge). [paper] [code]
	Large-scale Weakly-Supervised Pre-training for Video Action Recognition. Deepti Ghadiyaram, Matt Feiszli, Du Tran, Xueting Yan, Heng Wang, Dhruv Mahajan. IEEE Computer Vision and Pattern Recognition (CVPR), 2019. (2nd place at CVPR'19 EPIC-KITCHENS Challenge). [paper] [code]

2018

	Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization. Bruno Korbar, Du Tran, Lorenzo Torresani. Neural Information Processing Systems (NeurIPS), 2018. [paper] [code]
	Scenes-Objects-Actions: A Multi-Task, Multi-Label Video Dataset. Jamie Ray, Heng Wang, Du Tran, Yufei Wang, Matt Feiszli, Lorenzo Torresani, Manohar Paluri. European Conference on Computer Vision (ECCV), 2018. [paper] [code]
	A Closer Look at Spatiotemporal Convolutions for Action Recognition. Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, Manohar Paluri. IEEE Computer Vision and Pattern Recognition (CVPR), 2018. [paper] [code]
	Detect-and-Track: Efficient Pose Estimation in Videos. Rohit Girdhar, Georgia Gkioxari, Lorenzo Torresani, Manohar Paluri, Du Tran. IEEE Computer Vision and Pattern Recognition (CVPR), 2018. (1st place at ICCV'17 PoseTrack Challenge). [paper] [code]

2017

Simple, Efficient and Effective Keypoint Tracking.
Rohit Girdhar, Georgia Gkioxari, Lorenzo Torresani, Deva Ramanan, Manohar Paluri, Du Tran.
International Conference on Computer Vision (ICCV) PoseTrack Workshop, 2017.
[paper] [code]

2016

	Deep End2End Voxel2Voxel Prediction. Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri. IEEE Computer Vision and Pattern Recognition (CVPR) DeepVision Workshop, 2016. [paper] [code]
	EXMOVES: Mid-level Features for Efficient Action Recognition and Video Analysis. Du Tran, Lorenzo Torresani. International Journal of Computer Vision (IJCV), 2016. [paper] [code]

2015

Learning Spatiotemporal Features with 3D Convolutional Networks.
Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri.
International Conference on Computer Vision (ICCV), 2015.
(the 3rd most cited paper of ICCV'15: [link] [link]).
[paper] [code]

2014

	EXMOVES: Classifier-based Features for Scalable Action Recognition. Du Tran, Lorenzo Torresani. International Conference on Learning Representations (ICLR), 2014. [paper] [code]
	Video Event Detection: from Subvolume Localization to Spatio-Temporal Path Search. Du Tran, Junsong Yuan, David Forsyth. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2014. [paper] [code]

2012 and before

	Max-Margin Structured Output Regression for Spatio-Temporal Action Localization. Du Tran, Junsong Yuan. Neural Information Processing Systems (NIPS), 2012. [paper]
	Optimal Spatio-Temporal Path Discovery for Video Event Detection. Du Tran, Junsong Yuan. IEEE Computer Vision and Pattern Recognition (CVPR), 2011. [paper] [code] [data]
	Human Activity Recognition with Metric Learning. Du Tran, Alexander Sorokin. European Conference on Computer Vision (ECCV), 2008. [paper] [code]