Two presentations in a session on urban modeling here delved into generating three-dimensional models of buildings and streets from casual sets of photographs.
Generating 3D models from 2D images isn't a particularly advanced field, so these two new approaches definitely caught my eye. The state of the art requires a fair amount of user guidance to help the image-processing algorithms differentiate between a target object and visual clutter, such as trees, passing cars, and street signs. There's plenty of room for improvement in accuracy and detail, and users can always hope for a faster process and simpler interfaces.
Currently, the most accessible method of 3D modeling from photographs is probably Google SketchUp's Photo Match feature. SketchUp is a modeling application that Google bought and then released almost three years ago. In Photo Match, a user imports an image and then traces over the lines of a building: the more sets of parallel lines, the better. Not surprisingly, those lines carry information about the perspective of the camera when the image was shot. The program uses that data to extrapolate the overall shape of the building. Once the rough outline is in place, the software can extract patterns from the photo to overlay texture detail. Voilà, a quick-and-dirty 3D building. For better results, you can do the whole thing over again with another photo of the hidden sides.
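The perspective cue those traced lines provide is classical projective geometry: edges that are parallel in 3D converge in the image at a vanishing point, which pins down the camera's orientation. As a minimal illustration of that one computation (my own sketch with made-up pixel coordinates, not SketchUp's actual code), homogeneous coordinates make it a pair of cross products:

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points (x, y)."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def vanishing_point(line_a, line_b):
    """Intersect two homogeneous lines; edges parallel in 3D meet here in 2D."""
    v = np.cross(line_a, line_b)
    return v[:2] / v[2]  # back from homogeneous to pixel coordinates

# Two traced edges of a wall that are parallel in 3D but converge in the photo:
top = line_through((100, 120), (400, 150))
bottom = line_through((100, 300), (400, 260))
vp = vanishing_point(top, bottom)  # roughly (871.4, 197.1)
```

With two or three such vanishing points in hand, the camera's perspective, and with it the orientation of the building's walls, follows.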
The two approaches presented here apply new techniques to processing a collection of photos of a target scene.
One technique came out of a partnership between the University of North Carolina at Chapel Hill, UC Berkeley, ETH Zurich, and Microsoft Research. This approach starts with a jumble of images of a building or city. Preliminary image analysis identifies each image's vanishing points, much as Photo Match does. A user traces the rectangular outlines of the primary building walls, a geometric model is generated, and the textures from the original photographs are applied. My sense is that the main advances here over Photo Match are the intelligent way the photos are processed together to create a preliminary model and the simpler user experience. In ten to fifteen minutes, you can easily generate a model of a building from eight or nine photos. Give it an hour and 120 photographs and it'll generate a fairly accurate model of a city. Of course, it's a trade-off between the quantity of data needed to start off and the fidelity of the model.
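Applying textures from the photographs to a traced rectangle boils down to un-warping a perspective-distorted quadrilateral into a flat square, which a 3x3 homography handles. Here is a rough sketch of that single step (my own illustration with invented coordinates, not the researchers' pipeline), using the direct linear transform:

```python
import numpy as np

def homography(src, dst):
    """Direct linear transform: 3x3 H mapping four src points to four dst points."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.array(A))
    return Vt[-1].reshape(3, 3)  # null-space vector, reshaped

def warp(H, p):
    """Map an image point through H, dividing out the projective scale."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

# A traced wall quadrilateral in the photo (perspective-distorted) ...
quad = [(120, 80), (520, 130), (540, 420), (100, 380)]
# ... mapped onto a 256x256 texture square:
square = [(0, 0), (255, 0), (255, 255), (0, 255)]
H = homography(quad, square)
```

Sampling the photo through the inverse of `H` fills in each texel of the wall texture, which is essentially what any photo-to-model texturing step has to do.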
The second method came from researchers at the Hong Kong University of Science and Technology and the National University of Singapore. It focused on façades rather than complete buildings. To start, a photographer drives down a street and takes successive shots of a continuous façade (of a shopping street, for example). Those photos are automatically lined up, pattern-matched, and analyzed at a fairly deep level to generate a large map of points that captures the color, texture, and depth of various parts of the façade. The images are broken down into sections, analyzed for things such as embedded symmetries (to identify evenly spaced features that ought to be identical), then merged back together to speed up the rendering. A user helps the program identify the façade's salient features (this part of the talk was unclear), and voilà, an extremely detailed rendering of a street face pops up.
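The embedded-symmetries idea (spotting evenly spaced features, such as window columns, so they can be treated as identical copies) can be approximated with something as simple as autocorrelation. This is my own toy sketch on a synthetic 1D façade profile, not the authors' algorithm:

```python
import numpy as np

def dominant_spacing(profile):
    """Estimate the repeat period of a 1D facade profile via autocorrelation."""
    x = profile - profile.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0 .. N-1
    # The strongest lag after 0 is the dominant repeat distance.
    return 1 + int(np.argmax(ac[1:]))

# Synthetic facade strip: a bright "window" feature every 40 pixels.
profile = np.zeros(400)
profile[::40] = 1.0
period = dominant_spacing(profile)  # 40
```

Knowing the period lets the renderer reuse one window's geometry and texture for the whole row, which is presumably part of how the merged model stays fast to render.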
Neither approach is complete, but things move fast in the graphics world. It could be a matter of months before something along these lines gets incorporated into existing 3D modeling tools.