You’ve decided you’re in the market for a new couch but aren’t sure how it will look in your living room. You also can’t afford to purchase a couch, have it delivered, and only then decide whether it is the right size and color for your space. Instead, you take out your smartphone and open a furniture app. When you point the camera at your living room wall, the couch you are considering appears on screen, a simulation of what it would look like in your home. It looks like you’ve already bought the couch, and it’s a perfect fit.
Augmented reality (AR) is not a new technology: from the first-down line in a football broadcast to the ads displayed along the baseline of a basketball court, AR has been in use for years. But the proliferation of smartphones with built-in cameras has now brought the technology into the hands of everyday consumers. AR lies at the intersection of computer vision (inferring three-dimensional geometry from a two-dimensional image) and computer graphics (rendering a two-dimensional image from a three-dimensional model).
One of the challenges in AR is using a two-dimensional screen to interface with a three-dimensional world. To properly trick the mind, an app must render a two-dimensional image of a virtual object so that the object appears to sit in the three-dimensional world and matches the user’s perspective. There are two steps to this process:
1. Computing the 3D space of the real world from the 2D camera image (i.e., mapping 2D space to 3D space—the computer vision problem)
2. Modeling an object in 3D and displaying it as 2D on top of the camera image (i.e., mapping 3D space to 2D space—the computer graphics problem)
The illusion only works if the 3D space in both steps is the same and the mapping in Step 2 is the reverse of Step 1. Therefore, the challenge is in computing the 3D transformation in Step 1.
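The Step 2 mapping, from a 3D model down to 2D screen coordinates, can be sketched with the standard pinhole camera model. The focal length and image-center values below are illustrative, not taken from any particular device:

```python
def project(point_3d, f=800.0, cx=320.0, cy=240.0):
    """Project a 3D point (in camera coordinates, Z pointing forward)
    onto the 2D image plane using the pinhole camera model:
    u = f * X / Z + cx,  v = f * Y / Z + cy."""
    X, Y, Z = point_3d
    return (f * X / Z + cx, f * Y / Z + cy)

# A point 2 m in front of the camera and 0.5 m to the right
# lands at pixel (520, 240) on a 640x480 image:
u, v = project((0.5, 0.0, 2.0))
print(u, v)  # 520.0 240.0
```

Step 1 is exactly the reverse, and harder: a single pixel (u, v) corresponds to an entire ray of possible 3D points, which is why the app needs extra information, such as a marker or motion through the scene, to pin down depth.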
There are several potential solutions to this problem. When available, two cameras can be used simultaneously, mimicking human stereo vision: the slight shift of a feature between the two views reveals how far away it is. But most smartphones have only a single camera, so a different solution is needed.
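The two-camera idea reduces to a one-line formula for an idealized, rectified stereo pair: depth is focal length times the distance between the cameras, divided by the feature's pixel shift. The focal length and baseline below are hypothetical values for illustration:

```python
def depth_from_disparity(disparity_px, focal_px=800.0, baseline_m=0.06):
    """For a rectified stereo pair, depth Z = f * B / d, where d is the
    horizontal shift (disparity, in pixels) of a feature between the
    left and right images, f is the focal length in pixels, and B is
    the distance between the two cameras in metres."""
    return focal_px * baseline_m / disparity_px

# A feature shifted 24 pixels between the two views is 2 m away:
print(depth_from_disparity(24.0))  # 2.0
```

Note how depth and disparity are inversely related: distant objects barely shift between the two views, which is why stereo depth gets unreliable far from the camera.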
Marker-based AR addresses the single-camera problem by placing an object of known size and shape, a marker, in view of the camera. Because the marker’s true geometry is known, its appearance in the image reveals the camera’s position and orientation in the room. Since a user may not have a marker on hand, markerless AR instead learns everything about the world from the world itself. Rather than relying on a single camera image, the app analyzes the video stream to find flat planes and straight lines in the real world; the orientation of the room can then be computed geometrically by forming triangles between detected feature points.
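One way to see how a marker of known size pins down the geometry: the four corners of a flat marker, matched against where they appear in the image, determine a homography (a plane-to-plane mapping) that can then be decomposed into camera position and orientation. Below is a minimal sketch of the classic direct linear transform step; the corner coordinates are made up for illustration, and real AR libraries add refinement and outlier handling on top of this:

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate the 3x3 homography H mapping src points to dst points
    (four or more correspondences) via the direct linear transform:
    each correspondence contributes two linear equations in H's nine
    entries, and the SVD gives the null-space solution."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.array(rows))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the arbitrary scale

# Corners of a 10 cm square marker (in metres), and the pixel
# locations where the camera happens to see them:
marker = [(0, 0), (0.1, 0), (0.1, 0.1), (0, 0.1)]
image = [(120, 90), (210, 95), (205, 185), (115, 180)]
H = homography_dlt(marker, image)

# H now maps any point on the marker's plane into image coordinates,
# e.g. the marker's centre (homogeneous coordinates, divide by p[2]):
p = H @ np.array([0.05, 0.05, 1.0])
print(p[:2] / p[2])
```

Because the marker’s physical size is known (10 cm here), the recovered mapping carries real-world scale, which is exactly what a single uncalibrated image cannot provide on its own.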
Mines students are tackling all aspects of the augmented reality problem. The Intro to Computer Vision course teaches students how to compute the 3D space from a 2D image. The Computer Graphics class teaches students how to draw a 2D image from a 3D model. Mobile Application Development teaches students how to combine everything they’ve learned and create an app. Several students have even successfully deployed their app in the Google Play store. See examples at cs-courses.mines.edu/csci448/homework/appstore.html.
By Dr. Jeffrey Paone
Teaching Associate Professor, Computer Science