YOLACT: Real-Time Instance Segmentation


This material was originally circulated as an internal document on October 29, 2020, and has been reworked for publication as a web page.



 

From MSeg (paper)



■Instance Segmentation

 

“Instance segmentation is the task of detecting and delineating each distinct object of interest appearing in an image” -- source


➔ Sub-task of:

◆ “Object Detection”

◆ “Semantic Segmentation”


➔ Improvements in baselines (R-CNN, FCN) for the “parent” tasks do not automatically carry over to the “daughter” task


➔ Typically combines:

◆ detection of bounding boxes for all objects

◆ segmentation of the pixels belonging to each object
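To make the "boxes plus pixels" combination concrete, here is a minimal sketch in plain Python. It assumes a toy instance-ID map (0 = background, each positive integer = one object); the function name `split_instances` and the output layout are hypothetical, chosen only for illustration. It shows that each object gets its own bounding box and its own binary mask, even when two objects share a class.

```python
from typing import Dict, List

Grid = List[List[int]]


def split_instances(id_map: Grid) -> Dict[int, dict]:
    """Turn an instance-ID map into one box and one binary mask per object.

    Boxes are [x_min, y_min, x_max, y_max] in inclusive pixel coordinates.
    """
    height, width = len(id_map), len(id_map[0])
    instances: Dict[int, dict] = {}
    for y, row in enumerate(id_map):
        for x, inst_id in enumerate(row):
            if inst_id == 0:  # background pixel, belongs to no object
                continue
            if inst_id not in instances:
                instances[inst_id] = {
                    "box": [x, y, x, y],
                    "mask": [[0] * width for _ in range(height)],
                }
            inst = instances[inst_id]
            inst["mask"][y][x] = 1  # pixel-level segmentation
            box = inst["box"]       # box-level detection
            box[0], box[1] = min(box[0], x), min(box[1], y)
            box[2], box[3] = max(box[2], x), max(box[3], y)
    return instances


# Toy 3x4 image containing two separate objects (IDs 1 and 2).
toy_map = [
    [0, 1, 1, 0],
    [0, 1, 0, 2],
    [0, 0, 2, 2],
]
result = split_instances(toy_map)
for inst_id, inst in sorted(result.items()):
    print(inst_id, inst["box"], inst["mask"])
```

A semantic segmentation output would instead give every pixel a class label only, so two adjacent objects of the same class would not be separated.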



■Methodology

 

➔ Based on the Mask R-CNN model:

◆ Approach is “detect” and THEN “segment”: two steps

◆ A region-based CNN (Faster R-CNN) outputs class labels and bounding-box offsets for each candidate

  • Start with a Region Proposal Network (RPN)

  • Extract features from each RoI (Region of Interest) and predict its class and bounding box

◆ Adds an extra branch that outputs the pixel mask of each object

  • Uses a Fully Convolutional Network (FCN), which shares weights and preserves spatial correspondence
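The two steps above can be sketched in plain Python, with no trained network involved. Step 1 refines a candidate box with predicted offsets, using the (tx, ty, tw, th) parameterization common to the R-CNN family. Step 2 places a small per-RoI mask back into the image inside that box. The function names `decode_box` and `paste_mask` are hypothetical, and all numeric inputs are made up. Nearest-neighbour resizing and the 0.5 threshold are simplifying assumptions for illustration, not a description of any particular implementation.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def decode_box(proposal: Box, offsets: Tuple[float, float, float, float]) -> Box:
    """Step 1 (detect): apply (tx, ty, tw, th) offsets to a candidate box."""
    x1, y1, x2, y2 = proposal
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    tx, ty, tw, th = offsets
    # The centre shifts in units of the box size; the size scales exponentially.
    new_cx, new_cy = cx + tx * w, cy + ty * h
    new_w, new_h = w * math.exp(tw), h * math.exp(th)
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)


def paste_mask(mask_probs: List[List[float]], box: Box,
               image_h: int, image_w: int,
               threshold: float = 0.5) -> List[List[int]]:
    """Step 2 (segment): stretch a small RoI mask over the box and binarise it."""
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    mask_h, mask_w = len(mask_probs), len(mask_probs[0])
    box_w, box_h = max(x2 - x1, 1), max(y2 - y1, 1)
    canvas = [[0] * image_w for _ in range(image_h)]
    for y in range(max(y1, 0), min(y2, image_h)):
        for x in range(max(x1, 0), min(x2, image_w)):
            # Nearest-neighbour lookup keeps the mask aligned with the box.
            my = min((y - y1) * mask_h // box_h, mask_h - 1)
            mx = min((x - x1) * mask_w // box_w, mask_w - 1)
            canvas[y][x] = 1 if mask_probs[my][mx] >= threshold else 0
    return canvas


# Made-up example: one proposal, zero offsets, and a 2x2 mask prediction.
refined = decode_box((2.0, 2.0, 6.0, 6.0), (0.0, 0.0, 0.0, 0.0))
full_mask = paste_mask([[0.9, 0.1], [0.2, 0.8]], refined, image_h=8, image_w=8)
for row in full_mask:
    print(row)
```

Because the mask is predicted per RoI and only afterwards mapped back to image coordinates, the quality of the final mask depends on the box found in the first step, which is the sense in which the approach is "detect, then segment".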