GRAIL LAB

"At-a-glance" Visualization of Highlights in Raw Personal Videos

Fig. 1: Retrieving domain-specific highlights (e.g., for surfing) from unconstrained personal videos is an important step toward automatic video editing. Our system automatically learns how to rank the "highlightness" of every moment (a short 2-second clip) in a raw video by analyzing edited videos on YouTube. Here we show the ranking results of our system on two raw videos (click to watch on YouTube: link1, link2) captured by GoPro cameras, where each clip is represented by a frame sampled from the clip.
Fig. 2: Given a video (click to watch on YouTube: link) captured by a head-mounted camera (top row), we first automatically identify montageable moments (highlighted by the color-coded bounding boxes) containing the salient person (the little girl in pink) while ignoring irrelevant frames. A set of salient montages, ordered by our novel montageability scores, is then generated automatically. Here we show four typical examples.

Nowadays, people share enormous numbers of images and videos online. However, raw personal videos are rarely watched again, since they are long and boring most of the time. Our research focuses on generating an "at-a-glance" visualization of highlights in raw personal videos. This project consists of two related works. The first, "Ranking Domain-specific Highlights by Analyzing Edited Videos," focuses on automatically finding highlights in raw personal videos (see Fig. 1). The second, "Salient Montages from Unconstrained Videos," focuses on automatically generating an at-a-glance visualization of those highlights, which we call salient montages (see Fig. 2). Their abstracts are shown below.

Abstract of "Ranking Domain-specific Highlights by Analyzing Edited Videos": We present a fully automatic system for ranking domain-specific highlights in unconstrained personal videos by analyzing online edited videos. A novel latent linear ranking model is proposed to handle noisy data harvested online. Specifically, given a search query (domain) such as ``surfing'', our system mines the Youtube database to find pairs of raw and their corresponding edited videos. Leveraging the assumption that edited video is more likely to contain highlights than the trimmed parts of the raw video, we obtain pair-wise ranking constraints to train our model. The learning task is challenging due to the amount of noise and variation in the mined data. Hence, a latent loss function is incorporated to robustly deal with the noise. We efficiently learned the latent model on a large number of videos (about 700 minutes in all) using a novel EM-like self-paced model selection procedure. Our latent ranking model outperforms its classification counterpart and a fully-supervised ranking system that requires labels from Amazon Mechanical Turk. Finally, we show that impressive highlights can be retrieved without additional human supervision for domains like skating, surfing, skiing, gymnastics, parkour, and dog in unconstrained personal videos.

Abstract of "Salient Montages from Unconstrained Videos": We present a novel method to generate salient montages from unconstrained videos, by finding ``montageable moments'' and identifying the salient people and actions to depict in each montage. Our method addresses the need for generating concise visualizations from the increasingly large number of videos being captured from portable devices. Our main contributions are (1) the process of finding salient people and moments to form a montage, and (2) the application of this method to videos taken ``in the wild'' where the camera moves freely. As such, we demonstrate results on head-mounted cameras, where the camera moves constantly, as well as on videos downloaded from YouTube. Our approach can operate on videos of any length; some will contain many montageable moments, while others may have none. We demonstrate that a novel ``montageability'' score can be used to retrieve results with relatively high precision which allows us to present high quality montages to users.

Publications

  • Min Sun, Ali Farhadi, and Steve Seitz, "Ranking Domain-specific Highlights by Analyzing Edited Videos." ECCV 2014 (pdf) (tech) (github).
  • Min Sun, Ali Farhadi, Ben Taskar, and Steve Seitz, "Salient Montages from Unconstrained Videos." ECCV 2014 (pdf) (tech) (github).

Contact : sunmin at ee dot nthu dot edu dot tw

Last update : March 16th, 2014