People now share enormous numbers of images and videos online. However, raw personal videos are rarely watched again, since they are usually long and mostly uneventful. Our research focuses on generating an “at-a-glance” visualization of the highlights in raw personal videos.

Large-scale Vision

The large number of visual concepts in the real world makes it challenging for computers to make sense of the visual world at a human level. We propose novel methods to predict fine-grained object categories and to learn hierarchical linguistic descriptions of visual concepts.

3D Vision

Using image (RGB) + depth (D) data is a promising direction in computer vision. We proposed methods that take advantage of large amounts of RGBD data during training while remaining flexible enough to handle either RGB-only or RGBD data.

3D Object Representation

Objects exist in a three-dimensional physical world. However, from images and videos, computers observe objects only through a few sampled 2D projected views. We develop novel 3D object representations for pose-invariant detection, object viewpoint classification, and view synthesis.

Human Pose Estimation

Human pose estimation (e.g., determining body part locations) from images and videos is a challenging problem in computer vision, and it is critical in many applications such as human-computer interaction, video surveillance, and gaming. We aim to explore hierarchical dependencies and long-range interactions among body parts to tackle this challenge.

Scene Understanding

Humans have an amazing ability to fully understand a scene, including its category, its layout, and the configuration of the objects within it. We propose novel models that jointly capture the interdependencies among layout, segments, and objects.

My junior colleague Yingze Bao carries out the “Semantic Structure from Motion” project, following our joint CVPR’10 and BMVC’10 work.