Video Summarization with an Attention Mechanism


This material was originally shared internally on December 15, 2020, and has been reworked for publication as a web page.




■Purpose

 

Purpose of this material

  • Explore a solution to the task of video summarization using attention.



■Agenda

 

Contents

●Introduction

  • Motivation

  • Contributions

●Dataset

●VASNet

  • Feature Extraction

  • Attention Network

  • Regressor Network

●Inference

  • Changepoint Detection

  • Kernel Temporal Segmentation

●Results

  • Measuring method

  • Dataset Results


■Introduction

 

Motivation

●Early video summarization methods were unsupervised, leveraging low-level spatio-temporal features together with dimensionality reduction and clustering techniques. The success of these methods rests entirely on the ability to define distance/cost functions between the keyshots/frames with respect to the original video; a minimal clustering sketch is shown after this list.


●Current state-of-the-art methods for video summarization are based on recurrent encoder-decoder architectures, usually a bidirectional LSTM or GRU with soft attention. They are computationally demanding, especially in the bidirectional configuration; a simplified sketch of such a recurrent model is also shown below.
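
The clustering recipe in the first bullet can be made concrete with a small sketch. This is not code from the original material: the feature dimension, the number of keyframes, and the use of k-means are assumptions chosen for illustration, and the frame features are stubbed out with random vectors.

```python
# Hypothetical illustration of classic unsupervised keyframe selection:
# cluster per-frame features and keep the frame nearest to each centroid.
import numpy as np
from sklearn.cluster import KMeans

def select_keyframes(frame_features: np.ndarray, n_keyframes: int = 5) -> list[int]:
    """Return one representative frame index per cluster, in temporal order."""
    kmeans = KMeans(n_clusters=n_keyframes, n_init=10, random_state=0)
    labels = kmeans.fit_predict(frame_features)
    keyframes = []
    for c in range(n_keyframes):
        members = np.where(labels == c)[0]
        # distance of each member frame to its cluster centroid
        dists = np.linalg.norm(frame_features[members] - kmeans.cluster_centers_[c], axis=1)
        keyframes.append(int(members[np.argmin(dists)]))
    return sorted(keyframes)

# Toy example: 300 frames with 1024-dimensional features (random stand-ins).
features = np.random.rand(300, 1024).astype(np.float32)
print(select_keyframes(features, n_keyframes=5))
```

The quality of such a summary depends entirely on how well the feature space and its distance function capture visual similarity, which is exactly the limitation pointed out above.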
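
For the recurrent baseline in the second bullet, the sketch below wires together the two ingredients mentioned, a bidirectional LSTM and soft attention, into a per-frame importance scorer. It is a simplified stand-in rather than any specific published architecture; the class name, hidden size, and feature dimension are assumptions.

```python
# Hypothetical sketch: bidirectional LSTM encoder + soft attention pooling,
# producing an importance score in [0, 1] for every frame.
import torch
import torch.nn as nn

class BiLSTMAttentionScorer(nn.Module):
    def __init__(self, feat_dim: int = 1024, hidden: int = 256):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)       # attention energy per frame
        self.regressor = nn.Linear(2 * hidden, 1)  # frame importance score

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, n_frames, feat_dim)
        h, _ = self.encoder(frames)                          # (B, T, 2H), sequential over T
        weights = torch.softmax(self.attn(h), dim=1)         # soft attention over time
        context = (weights * h).sum(dim=1, keepdim=True)     # attention-pooled summary (B, 1, 2H)
        scores = torch.sigmoid(self.regressor(h + context))  # combine local state and global context
        return scores.squeeze(-1)                            # (B, T)

model = BiLSTMAttentionScorer()
dummy = torch.randn(1, 120, 1024)    # 120 frames of CNN features (random stand-ins)
print(model(dummy).shape)            # torch.Size([1, 120])
```

The LSTM must process the frames step by step in both directions, which is what makes this family of models comparatively expensive on long videos.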



Contribution

  • A novel approach to sequence to sequence transformation for