This material was originally distributed as an internal company document on December 15, 2020, and has been reworked here for the web.
■Purpose
Purpose of this material
Explore a solution to the task of video summarization using attention.
■Agenda
Contents
●Introduction
Motivation
Contributions
●Dataset
●VASNet
Feature Extraction
Attention Network
Regressor Network
●Inference
Changepoint Detection
Kernel Temporal Segmentation
●Results
Measuring method
Dataset Results
■Introduction
Motivation
●Early video summarization methods were unsupervised, leveraging low-level spatio-temporal features and dimensionality reduction with clustering techniques. Their success depends entirely on the ability to define distance/cost functions between the keyshots/frames and the original video.
●Current state-of-the-art methods for video summarization are based on recurrent encoder-decoder architectures, usually with a bidirectional LSTM or GRU and soft attention. They are computationally demanding, especially in the bidirectional configuration.
Contributions
A novel approach to sequence to sequence transformation for