


MlioLight: Projector-camera Based Multi-layered Image Overlay System for Multiple Flashlights Interaction

Published in The 2018 ACM International Conference on Interactive Surfaces and Spaces (ISS 2018, Full paper), 2018

Recommended citation: Sato et al. "MlioLight: Projector-camera Based Multi-layered Image Overlay System for Multiple Flashlights Interaction." Proceedings of the 2018 ACM International Conference on Interactive Surfaces and Spaces. 2018.

Lightweight 3D Human Pose Estimation Network Training Using Teacher-Student Learning

Published in IEEE/CVF Winter Conference on Applications of Computer Vision 2020 (IEEE/CVF WACV 2020, Full paper), 2020

Recommended citation: Hwang et al. "Lightweight 3D human pose estimation network training using teacher-student learning." Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2020.


Synthesizing Pseudo-2.5D Content from Monocular Videos for Mixed Reality (NAVER)


Free-viewpoint video (FVV) is an advanced media format that offers a more immersive user experience than traditional media: users can interact with the content by viewing it from any desired viewpoint, and FVV is emerging as a next-generation medium. However, existing FVV systems require complex, specialized capture equipment and considerable expertise to operate, which gives them low usability for end users. This is a barrier for individuals and small organizations who want to create content. It limits end users' ability to produce FVV-based user-generated content (UGC) and inhibits the creation and sharing of diverse content. To tackle these problems, this work proposes ParaPara, an end-to-end system that, unlike previously proposed systems, uses a simple yet effective method to generate pseudo-2.5D FVV content from monocular videos. First, the system detects persons in the monocular video with a deep neural network, computes the real-world homography matrix from minimal user interaction, and estimates the pseudo-3D positions of the detected persons. Next, person textures are extracted using general image processing algorithms and placed at the estimated real-world positions. Finally, the pseudo-2.5D content is synthesized from these elements. The synthesized content runs on Microsoft HoloLens, where the user can freely place it in the real world and watch it from any viewpoint.
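The position-estimation step can be illustrated with a minimal sketch. It assumes (the abstract does not say this) that the user clicks four image points whose ground-plane coordinates in meters are known, that the homography is estimated from exactly those four correspondences with the direct linear transform, and that each person's pseudo-3D position is the ground-plane point under the bottom-center of their detection box. All function names are hypothetical, and ParaPara's actual implementation may differ.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations update a and b together.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_4_points(img: List[Point], world: List[Point]) -> Matrix:
    """Direct linear transform with h33 fixed to 1 (exactly 4 correspondences)."""
    a: Matrix = []
    b: List[float] = []
    for (u, v), (x, y) in zip(img, world):
        # x * (h31*u + h32*v + 1) = h11*u + h12*v + h13, rearranged to be linear in h.
        a.append([u, v, 1.0, 0.0, 0.0, 0.0, -u * x, -v * x])
        b.append(x)
        a.append([0.0, 0.0, 0.0, u, v, 1.0, -u * y, -v * y])
        b.append(y)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def image_to_ground(h: Matrix, p: Point) -> Point:
    """Apply the homography to an image point and dehomogenize."""
    u, v = p
    x = h[0][0] * u + h[0][1] * v + h[0][2]
    y = h[1][0] * u + h[1][1] * v + h[1][2]
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    return (x / w, y / w)


def person_ground_position(h: Matrix, box: Tuple[float, float, float, float]) -> Point:
    """Map the bottom-center of a box (x0, y0, x1, y1) onto the ground plane."""
    x0, _y0, x1, y1 = box
    return image_to_ground(h, ((x0 + x1) / 2.0, y1))


if __name__ == "__main__":
    clicked = [(120.0, 300.0), (520.0, 300.0), (630.0, 470.0), (10.0, 470.0)]
    court_m = [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0), (0.0, 5.0)]
    H = homography_from_4_points(clicked, court_m)
    print(person_ground_position(H, (300.0, 250.0, 340.0, 400.0)))
```

Fixing h33 = 1 reduces the problem to an 8x8 linear system that four correspondences determine exactly; with more clicked points, a least-squares or SVD-based solve would be the usual choice.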

Mobile Human Pose Estimation (Microsoft Research Asia)


Microsoft Research Asia Invited Talk

We present MoVNect, a lightweight deep neural network that captures 3D human pose from a single RGB camera. To improve the model's overall performance, we apply knowledge distillation based on teacher-student learning to 3D human pose estimation. Real-time post-processing turns the CNN output into temporally stable 3D skeletal information that applications can use directly. We implement a 3D avatar application that runs on mobile devices in real time to demonstrate that our network achieves both high accuracy and fast inference. Extensive evaluations on the Human3.6M dataset and on mobile devices show the advantages of our lightweight model and the proposed training method over previous 3D pose estimation methods.
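The two ideas in this abstract, teacher-student distillation and temporal post-processing, can be sketched under stated assumptions. The loss below is a generic distillation objective for regression outputs (a weighted sum of squared errors against the ground truth and against a frozen teacher's prediction), and the smoother is a simple exponential moving average used as a stand-in; the talk does not specify MoVNect's exact loss weighting or filter, and every name here is hypothetical.

```python
from typing import List, Optional, Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized flat vectors."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(
    student: Sequence[float],
    teacher: Sequence[float],
    ground_truth: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Blend supervision from labels and from a frozen teacher network.

    Assumed form: alpha weights the ground-truth term and (1 - alpha)
    weights the teacher term.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * mse(student, ground_truth) + (1.0 - alpha) * mse(student, teacher)


def smooth_joints(
    prev: Optional[Sequence[float]], current: Sequence[float], beta: float = 0.8
) -> List[float]:
    """Exponential moving average per joint coordinate (illustrative stand-in).

    prev is None on the first frame; a larger beta gives smoother but laggier output.
    """
    if prev is None:
        return list(current)
    return [beta * p + (1.0 - beta) * c for p, c in zip(prev, current)]
```

Here the teacher contributes a soft target alongside the labels, which is the core idea of teacher-student training; the smoothing trades a little latency for reduced frame-to-frame jitter.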