This material was originally circulated as an internal document on December 15, 2020, and has been reworked here for the web.



Purpose of this material

  • Explore a solution to the task of video summarization using attention.





  • Motivation

  • Contributions



  • Feature Extraction

  • Attention Network

  • Regressor Network


  • Changepoint Detection

  • Kernel Temporal Segmentation


  • Measuring method

  • Dataset Results




● Early video summarization methods were unsupervised, leveraging low-level spatio-temporal features and dimensionality reduction with clustering techniques. Their success depends entirely on the ability to define distance/cost functions between the keyshots/frames and the original video.

● Current state-of-the-art methods for video summarization are based on recurrent encoder-decoder architectures, usually with bidirectional LSTM or GRU and soft attention. They are computationally demanding, especially in the bidirectional configuration.
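As a rough illustration of the soft attention mentioned above, the NumPy sketch below computes attention weights over a sequence of per-frame feature vectors and uses them to form a weighted context for every frame. It is a minimal sketch under stated assumptions, not the architecture of any particular method: the function name `soft_attention`, the dot-product scoring with 1/sqrt(D) scaling, and the random stand-in features are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_attention(features, scale=None):
    """Dot-product soft attention over a sequence of frame features.

    features: array of shape (T, D), one D-dimensional vector per frame.
    Returns (context, weights) with shapes (T, D) and (T, T).
    NOTE: the scoring function and scaling are illustrative assumptions.
    """
    features = np.asarray(features, dtype=np.float64)
    _, d = features.shape
    if scale is None:
        scale = 1.0 / np.sqrt(d)

    # Similarity of every frame to every other frame, in one matrix product.
    scores = scale * (features @ features.T)        # shape (T, T)

    # Row-wise softmax; subtracting the row max keeps exp() numerically stable.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Each output row is a weighted average of all frame features.
    context = weights @ features                    # shape (T, D)
    return context, weights


# Stand-in data: 6 frames with 4-dimensional features (not real video features).
rng = np.random.default_rng(0)
frames = rng.normal(size=(6, 4))
context, weights = soft_attention(frames)
```

Here all frames are handled with two matrix products, whereas a recurrent encoder has to step through the sequence frame by frame (and twice in the bidirectional case).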


  • A novel approach to sequence to sequence transformation for