Research

Personal Video Analysis (NEW)

People now share huge numbers of images and videos online. However, raw personal videos are rarely watched again, since they are usually long and boring. Our research focuses on generating an “at-a-glance” visualization of the highlights in raw personal videos.

Min Sun, Ali Farhadi, and Steve Seitz, “Ranking Domain-specific Highlights by Analyzing Edited Videos.” ECCV’14 (pdf)
Min Sun, Ali Farhadi, Ben Taskar, and Steve Seitz, “Salient Montages from Unconstrained Videos.” ECCV’14 (pdf)

Code and data are available here.

Large-scale Vision

The large number of visual concepts in the real world makes it challenging for computers to make sense of the visual world at a human level. We propose novel methods to predict fine-grained object categories and to learn hierarchical linguistic descriptions of visual concepts.

M. Sun, W. Huang, and S. Savarese, “Find the Best Path: an Efficient and Accurate Classifier for Image Hierarchies.” ICCV’13 (pdf)
R. Mittelman, M. Sun, B. Kuipers, and S. Savarese, “Learning Hierarchical Linguistic Descriptions of Visual Datasets”. NAACL-HLT Workshop on Vision and Language 2013 (pdf) (bibtex)

3D Vision

Combining image (RGB) and depth (D) data is a promising direction in computer vision. We propose methods that take advantage of large amounts of RGBD data during training while remaining flexible enough to handle either RGB-only or RGBD data.

Ashutosh Saxena, Min Sun, Andrew Y. Ng, “Make3D: Learning 3-D Scene Structure from a Single Still Image”. TPAMI’08 (pdf) (project)
M. Sun, G. Bradski, B. Xu, and S. Savarese, “Depth-Encoded Hough Voting for Joint Object Detection and Shape Recovery”. ECCV’10 (pdf) (bibtex) (dataset)
M. Sun, P. Kohli, and J. Shotton, “Conditional Regression Forests for Human Pose Estimation”. CVPR’12 (pdf)

3D Object Representation

Objects exist in a three-dimensional physical world. However, from images and videos, computers observe objects only through a few sampled 2D projected views. We develop novel 3D object representations for pose-invariant detection, object viewpoint classification, and view synthesis.

M. Sun, H. Su, Silvio Savarese, L. Fei-Fei, “A Multi-View Probabilistic Model for 3D Object Classes”. CVPR’09 (pdf) (bibtex)
M. Sun, H. Su, Silvio Savarese, L. Fei-Fei, “Learning a dense multi-view representation for detection, viewpoint classification and synthesis of object categories” (Oral). ICCV’09 (pdf) (bibtex)
L. Mei, M. Sun, K.M. Carter, A.O. Hero III, S. Savarese, “Unsupervised Object Pose Classification from Short Video Sequences”. BMVC’09 (pdf) (bibtex)

Human Pose Estimation

Human pose estimation (e.g., determining body part locations) from images and videos is a challenging problem in computer vision, and it is critical in many applications such as human-computer interaction, video surveillance, and gaming. We aim to tackle this challenge by exploring the hierarchical dependencies and long-range interactions among body parts.

Min Sun and Silvio Savarese, “Articulated Part-based Model for Joint Object Detection and Pose Estimation”. ICCV’11 (pdf) (project).
Min Sun, Murali Telaprolu, Honglak Lee, and Silvio Savarese, “An Efficient Branch-and-Bound Algorithm for Optimal Human Pose Estimation”. CVPR’12 (pdf) (bibtex) (technical report) (project).
Min Sun, Murali Telaprolu, Honglak Lee, and Silvio Savarese, “Efficient and Exact MAP Inference using Branch and Bound”. AISTATS’12 (pdf) (bibtex) (technical report).

Scene Understanding

Humans have the remarkable ability to fully understand a scene: its category, its layout, and the configuration of the objects within it. We propose novel models that jointly capture the interdependencies among layout, segments, and objects.

S. Yingze Bao, M. Sun, and S. Savarese, “Toward Coherent Object Detection And Scene Layout Understanding”. CVPR’10 (pdf) (bibtex)
M. Sun, S. Ying-Ze Bao, and S. Savarese, “Object Detection with Geometrical Context Feedback Loop”. BMVC’10 (pdf) (bibtex)
M. Sun, B. Kim, P. Kohli, and S. Savarese, “Relating Things and Stuff via Object Property Interactions”. TPAMI’13 (pdf) (bibtex)

My junior colleague Yingze Bao went on to pursue “Semantic Structure from Motion” following our joint CVPR’10 and BMVC’10 work.