CVPR 2018 Video Analysis Papers to Watch
05-02
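Updated 4/14.
This post covers papers related to video analysis, with emphasis on: Spatial-Temporal feature, Temporal Reasoning, Relation Network, Representation for Spatial-Temporal feature. Additions are welcome.
Notably, this year saw many discussion and reflection papers on video. For Action Localization, the biggest problem remains ambiguous annotation: temporal boundaries are hard to tell apart, which means the datasets themselves are not very good. Follow-up work on this should appear.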
Video Tracking:
- End-to-end Flow Correlation Tracking with Spatial-temporal Attention
- A Twofold Siamese Network for Real-Time Object Tracking
- Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking
Video Captioning:
- Reconstruction Network for Video Captioning
Relation Network:
- Learning to Compare: Relation Network for Few-Shot Learning
- Relation Network for Object Detection
- Recurrent Residual Module for Fast Inference in Videos
- Iterative Visual Reasoning Beyond Convolutions (Fei-Fei Li's group)
- Referring Relationships (Fei-Fei Li's group)
Video Understanding:
- What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets (spotlight, Fei-Fei Li's group)
- What have we learned from deep representations for action recognition?
- A Closer Look at Spatiotemporal Convolutions for Action Recognition
- Rethinking Spatiotemporal Feature Learning For Video Understanding
- On the Integration of Optical Flow and Action Recognition (highly recommended; @林天威 wrote paper notes on it)
- End-to-End Learning of Motion Representation for Video Understanding (Tencent AI Lab)
- Guess Where? Actor-Supervision for Spatiotemporal Action Localization
- A Unifying Contrast Maximization Framework for Event Cameras, with Applications to Motion, Depth, and Optical Flow Estimation
- Video Representation Learning Using Discriminative Pooling
- Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?
- Fast End-to-End Trainable Guided Filter
- Density-aware Single Image De-raining using a Multi-stream Dense Network
Video Classification/Action Recognition:
- Non-local Neural Networks
- Appearance-and-Relation Networks for Video Classification
- Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition
- Learning to Localize Sound Source in Visual Scenes
- Towards Universal Representation for Unseen Action Recognition
- Non-Linear Temporal Subspace Representations for Activity Recognition
- Fine-grained Activity Recognition in Baseball Videos (workshop)
- Learning Latent Super-Events to Detect Multiple Activities in Videos
Video Segmentation:
- Actor and Action Video Segmentation from a Sentence
- Dynamic Video Segmentation Network
- Low-Latency Video Semantic Segmentation (spotlight, Dahua Lin's group)
- CNN in MRF: Video Object Segmentation via Inference in A CNN-Based Higher-Order Spatio-Temporal MRF
- Efficient Video Object Segmentation via Network Modulation
Video Question Answering:
- Motion-Appearance Co-Memory Networks for Video Question Answering (the author, @高繼揚, has done a lot of work on video and is worth following if you are interested)