Publications

2023

Learning Space-Time Semantic Correspondences
Du Tran, Jitendra Malik
arXiv, 2023.
[paper] [datasets] [project]
Open-world Instance Segmentation: Top-down Learning with Bottom-up Supervision
Tarun Kalluri, Weiyao Wang, Heng Wang, Manmohan Chandraker, Lorenzo Torresani, Du Tran
arXiv, 2023.
[paper] [project]
MINOTAUR: Multi-task Video Grounding From Multimodal Queries
Raghav Goyal, Effrosyni Mavroudi, Xitong Yang, Sainbayar Sukhbaatar, Leonid Sigal, Matt Feiszli, Lorenzo Torresani, Du Tran
arXiv, 2023.
[paper] [project]
Relational Space-Time Query in Long-Form Videos
Xitong Yang, Fu-Jen Chu, Matt Feiszli, Raghav Goyal, Lorenzo Torresani, Du Tran
IEEE Computer Vision and Pattern Recognition (CVPR), 2023.
(highlight: acceptance rate 2.5%).
[paper] [datasets] [project]
FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation
Tarun Kalluri, Deepak Pathak, Manmohan Chandraker, Du Tran
IEEE Winter Conference on Applications of Computer Vision (WACV), 2023.
(Best Paper Finalist: 12 out of 641 accepted papers).
[paper] [demo] [project]

2022

Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity
Weiyao Wang, Matt Feiszli, Heng Wang, Jitendra Malik, Du Tran
IEEE Computer Vision and Pattern Recognition (CVPR), 2022.
[paper] [project] [code]
Long-short Temporal Contrastive Learning of Video Transformers
Jue Wang, Gedas Bertasius, Du Tran, Lorenzo Torresani
IEEE Computer Vision and Pattern Recognition (CVPR), 2022.
[paper]

2021

Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation
Weiyao Wang, Matt Feiszli, Heng Wang, Du Tran
International Conference on Computer Vision (ICCV), 2021.
[paper] [video] [project]

2020

Self-Supervised Learning by Cross-Modal Audio-Video Clustering.
Humam Alwassel, Dhruv Mahajan, Bruno Korbar, Lorenzo Torresani, Bernard Ghanem, Du Tran.
Neural Information Processing Systems (NeurIPS), 2020.
(spotlight: acceptance rate 4.1%).
[paper] [models] [project]
What Makes Training Multi-modal Classification Networks Hard?
Weiyao Wang, Du Tran, Matt Feiszli.
IEEE Computer Vision and Pattern Recognition (CVPR), 2020.
[paper] [code]
Video Modeling with Correlation Networks.
Heng Wang, Du Tran, Lorenzo Torresani, Matt Feiszli.
IEEE Computer Vision and Pattern Recognition (CVPR), 2020.
[paper] [code]
FASTER Recurrent Networks for Efficient Video Classification.
Linchao Zhu, Laura Sevilla-Lara, Du Tran, Matt Feiszli, Heng Wang.
AAAI Conference on Artificial Intelligence (AAAI), 2020.
[paper] [code]

2019

Video Classification with Channel-Separated Convolutional Networks.
Du Tran, Heng Wang, Lorenzo Torresani, Matt Feiszli.
International Conference on Computer Vision (ICCV), 2019.
[paper] [code]
SCSampler: Sampling Salient Clips from Video for Efficient Action Recognition.
Bruno Korbar, Du Tran, Lorenzo Torresani.
International Conference on Computer Vision (ICCV), 2019.
(oral: acceptance rate 4.3%).
[paper] [code]
DistInit: Learning Video Representations without a Single Labeled Video.
Rohit Girdhar, Du Tran, Lorenzo Torresani, Deva Ramanan.
International Conference on Computer Vision (ICCV), 2019.
[paper] [code]
Learning Temporal Pose Estimation from Sparsely-Labeled Videos.
Gedas Bertasius, Christoph Feichtenhofer, Du Tran, Jianbo Shi, Lorenzo Torresani.
Neural Information Processing Systems (NeurIPS), 2019.
[paper] [code]
Leveraging the Present to Anticipate the Future in Videos.
Antoine Miech, Ivan Laptev, Josef Sivic, Heng Wang, Lorenzo Torresani, Du Tran.
IEEE Computer Vision and Pattern Recognition (CVPR) Precognition Workshop, 2019.
(2nd place at CVPR'19 EPIC-KITCHENS Challenge).
[paper] [code]
Large-scale Weakly-Supervised Pre-training for Video Action Recognition.
Deepti Ghadiyaram, Matt Feiszli, Du Tran, Xueting Yan, Heng Wang, Dhruv Mahajan.
IEEE Computer Vision and Pattern Recognition (CVPR), 2019.
(2nd place at CVPR'19 EPIC-KITCHENS Challenge).
[paper] [code]

2018

Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization.
Bruno Korbar, Du Tran, Lorenzo Torresani.
Neural Information Processing Systems (NeurIPS), 2018.
[paper] [code]
Scenes-Objects-Actions: A Multi-Task, Multi-Label Video Dataset.
Jamie Ray, Heng Wang, Du Tran, Yufei Wang, Matt Feiszli, Lorenzo Torresani, Manohar Paluri.
European Conference on Computer Vision (ECCV), 2018.
[paper] [code]
A Closer Look at Spatiotemporal Convolutions for Action Recognition.
Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, Manohar Paluri.
IEEE Computer Vision and Pattern Recognition (CVPR), 2018.
[paper] [code]
Detect-and-Track: Efficient Pose Estimation in Videos.
Rohit Girdhar, Georgia Gkioxari, Lorenzo Torresani, Manohar Paluri, Du Tran.
IEEE Computer Vision and Pattern Recognition (CVPR), 2018.
(1st place at ICCV'17 PoseTrack Challenge).
[paper] [code]

2017

Simple, Efficient and Effective Keypoint Tracking.
Rohit Girdhar, Georgia Gkioxari, Lorenzo Torresani, Deva Ramanan, Manohar Paluri, Du Tran.
International Conference on Computer Vision (ICCV) PoseTrack Workshop, 2017.
[paper] [code]

2016

Deep End2End Voxel2Voxel Prediction.
Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri.
IEEE Computer Vision and Pattern Recognition (CVPR) DeepVision Workshop, 2016.
[paper] [code]
EXMOVES: Mid-level Features for Efficient Action Recognition and Video Analysis.
Du Tran, Lorenzo Torresani.
International Journal of Computer Vision (IJCV), 2016.
[paper] [code]

2015

Learning Spatiotemporal Features with 3D Convolutional Networks.
Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri.
International Conference on Computer Vision (ICCV), 2015.
(the 3rd most cited paper of ICCV'15).
[paper] [code]

2014

EXMOVES: Classifier-based Features for Scalable Action Recognition.
Du Tran, Lorenzo Torresani.
International Conference on Learning Representations (ICLR), 2014.
[paper] [code]
Video Event Detection: from Subvolume Localization to Spatio-Temporal Path Search.
Du Tran, Junsong Yuan, David Forsyth.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2014.
[paper] [code]

2012 and before

Max-Margin Structured Output Regression for Spatio-Temporal Action Localization.
Du Tran, Junsong Yuan.
Neural Information Processing Systems (NIPS), 2012.
[paper]
Optimal Spatio-Temporal Path Discovery for Video Event Detection.
Du Tran, Junsong Yuan.
IEEE Computer Vision and Pattern Recognition (CVPR), 2011.
[paper] [code] [data]
Human Activity Recognition with Metric Learning.
Du Tran, Alexander Sorokin.
European Conference on Computer Vision (ECCV), 2008.
[paper] [code]