Developing Human-like 3D Perception in Robots

The technology will enable machines to navigate safely in an environment with humans and other objects

Much work has been done in the area of robotics, from control and path planning to collision avoidance and mechanical design. But one thing remains elusive – giving robots the ability to perceive the 3D world the way humans do. And this is the challenge that Assistant Professor Lee Gim Hee from the Department of Computer Science, National University of Singapore (NUS), has set out to solve with his research into 3D computer vision.

Prof Lee’s research journey began during his undergraduate and Master’s studies at NUS, when he was working on mobile robots as part of his final year project and Master’s thesis. He noticed that the challenge of getting a robot to function autonomously in any given environment was far from being solved. And this was because the most important competency in this respect – 3D computer vision/perception – was still lacking.

Deciding to devote his time to solving this problem, he went to ETH Zurich to do his PhD in 3D computer vision.

“The main objective of 3D computer vision research is to recover geometric and semantic information about the 3D world that we live in, using sensory inputs such as RGB camera images and/or 3D point clouds from range scanners,” explained Prof Lee. This involves reconstructing 3D models and combining them with an object-level and/or scene-level understanding of the 3D world.

There are many exciting application possibilities for this. Prof Lee cites the example of a self-driving car or an industrial/domestic robot, which can use the technology to “see” its 3D surroundings and “know” its own position and orientation within that environment. This will enable it to move, navigate and interact safely in the environment with humans and other objects.

Armed with accurate 3D perception capabilities, a self-driving car can help to reduce traffic accidents, and a fully autonomous industrial/domestic robot can help lighten the workload for humans.

3D computer vision also has many applications in navigation and metrology. Prof Lee gave the example of a student who is lost on campus. The student can take a photo of his current location and send it to a cloud server which has a 3D model of the campus. 3D computer vision algorithms can then register his photo with the 3D model to identify the student’s location and suggest a possible route to his destination.

The same 3D model and 3D computer vision algorithms can be used in construction metrology to monitor and analyse structural integrity, and to aid architectural design, so as to prevent catastrophic failure in buildings. They can also be used to help archaeologists and historians preserve a digital copy of an ancient artefact, or to help law enforcers preserve a crime scene for further investigation.

However, despite the progress that has been made, Prof Lee finds that there is still a gap in the research. “Studies of geometry and semantics are still largely decoupled,” he explained. “Algebra and physics are used to solve the geometry problems, while machine learning techniques that learn from big data are used to solve the semantic problems.”

He is intent on finding the missing link between the two approaches, so that geometric and semantic information can complement each other’s strengths and compensate for each other’s deficiencies. The ability to learn from big data can be used to mitigate the difficulty of handcrafting sophisticated geometric models for complex scenes, and prior knowledge of geometry can help reduce the reliance on large amounts of training data without sacrificing accuracy in semantic understanding. “This will take 3D computer vision into a whole new level, paving the way for a more holistic understanding of our 3D world by machines,” he said.