2023
Learning Space-Time Semantic Correspondences , arXiv, 2023. [paper] [datasets] [project] |
Open-world Instance Segmentation: Top-down Learning with Bottom-up Supervision , , , , , arXiv, 2023. [paper] [project] |
MINOTAUR: Multi-task Video Grounding From Multimodal Queries , , , , , , , arXiv, 2023. [paper] [project] |
Relational Space-Time Query in Long-Form Videos , , , , , IEEE Computer Vision and Pattern Recognition (CVPR), 2023. (highlight: acceptance rate 2.5%). [paper] [datasets] [project] |
FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation , , , IEEE Winter Conference on Applications of Computer Vision (WACV), 2023. (Best Paper Finalist: 12 out of 641 accepted papers). [paper] [demo] [project] |
2022
Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity , , , , IEEE Computer Vision and Pattern Recognition (CVPR), 2022. [paper] [project] [code] |
Long-short Temporal Contrastive Learning of Video Transformers , , , IEEE Computer Vision and Pattern Recognition (CVPR), 2022. [paper] |
2021
Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation , , , International Conference on Computer Vision (ICCV), 2021. [paper] [video] [project] |
2020
Self-Supervised Learning by Cross-Modal Audio-Video Clustering. , , , , , . Neural Information Processing Systems (NeurIPS), 2020. (spotlight: acceptance rate 4.1%). [paper] [models] [project] |
What Makes Training Multi-modal Classification Networks Hard? , , . IEEE Computer Vision and Pattern Recognition (CVPR), 2020. [paper] [code] |
Video Modeling with Correlation Networks. , , , . IEEE Computer Vision and Pattern Recognition (CVPR), 2020. [paper] [code] |
FASTER Recurrent Networks for Efficient Video Classification. , , , , . AAAI Conference on Artificial Intelligence (AAAI), 2020. [paper] [code] |
2019
Video Classification with Channel-Separated Convolutional Networks. , , , . International Conference on Computer Vision (ICCV), 2019. [paper] [code] |
SCSampler: Sampling Salient Clips from Video for Efficient Action Recognition. , , . International Conference on Computer Vision (ICCV), 2019. (oral: acceptance rate 4.3%). [paper] [code] |
DistInit: Learning Video Representations without a Single Labeled Video. , , , . International Conference on Computer Vision (ICCV), 2019. [paper] [code] |
Learning Temporal Pose Estimation from Sparsely-Labeled Videos. , , , , . Neural Information Processing Systems (NeurIPS), 2019. [paper] [code] |
Leveraging the Present to Anticipate the Future in Videos. , , , , , . IEEE Computer Vision and Pattern Recognition (CVPR) Precognition Workshop, 2019. (2nd place at CVPR'19 EPIC-KITCHENS Challenge). [paper] [code] |
Large-scale Weakly-Supervised Pre-training for Video Action Recognition. , , , , , . IEEE Computer Vision and Pattern Recognition (CVPR), 2019. (2nd place at CVPR'19 EPIC-KITCHENS Challenge). [paper] [code] |
2018
Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization. , , . Neural Information Processing Systems (NeurIPS), 2018. [paper] [code] |
Scenes-Objects-Actions: A Multi-Task, Multi-Label Video Dataset. , , , , , , . European Conference on Computer Vision (ECCV), 2018. [paper] [code] |
A Closer Look at Spatiotemporal Convolutions for Action Recognition. , , , , , . IEEE Computer Vision and Pattern Recognition (CVPR), 2018. [paper] [code] |
Detect-and-Track: Efficient Pose Estimation in Videos. , , , , . IEEE Computer Vision and Pattern Recognition (CVPR), 2018. (1st place at ICCV'17 PoseTrack Challenge). [paper] [code] |
2017
Simple, Efficient and Effective Keypoint Tracking. , , , , , . International Conference on Computer Vision (ICCV) PoseTrack Workshop, 2017. [paper] [code] |
2016
Deep End2End Voxel2Voxel Prediction. , , , , . IEEE Computer Vision and Pattern Recognition (CVPR) DeepVision Workshop, 2016. [paper] [code] |
EXMOVES: Mid-level Features for Efficient Action Recognition and Video Analysis. , . International Journal of Computer Vision (IJCV), 2016. [paper] [code] |
2015
Learning Spatiotemporal Features with 3D Convolutional Networks. , , , , . International Conference on Computer Vision (ICCV), 2015. (the 3rd most cited paper of ICCV'15 link, link). [paper] [code] |
2014
EXMOVES: Classifier-based Features for Scalable Action Recognition. , . International Conference on Learning Representations (ICLR), 2014. [paper] [code] |
Video Event Detection: from Subvolume Localization to Spatio-Temporal Path Search. , , . IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2014. [paper] [code] |
2012 and before
Max-Margin Structured Output Regression for Spatio-Temporal Action Localization. , . Neural Information Processing Systems (NIPS), 2012. [paper] |
Optimal Spatio-Temporal Path Discovery for Video Event Detection. , . IEEE Computer Vision and Pattern Recognition (CVPR), 2011. [paper] [code] [data] |
Human Activity Recognition with Metric Learning. , . European Conference on Computer Vision (ECCV), 2008. [paper] [code] |