Harnessing the Wild Power of AI Image Generation

A layout- and style-based architecture shows how to control AI capabilities to generate complex images

A new AI method (nicknamed "LostGANs") creates and retains a background image, while also creating figures that are consistent from picture to picture, but show change or movement.
TIANFU WU & WEI SUN

AI has already shown off the capability to create photorealistic images of cats, dogs, and people’s faces that never existed before. More recently, researchers have been investigating how to train AI models to create more complex images that could include many different objects arranged in different poses and configurations.

The challenge involves figuring out how to get AI models—in this case typically a class of deep learning algorithms known as generative adversarial networks (GANs)—to generate more controlled images based on certain conditions rather than simply spitting out any random image. A team at North Carolina State University has developed a way for GANs to create such conditional images more reliably by using reconfigurable image layouts as the starting point.

“We want a model that is flexible enough such that when the input layout is reconfigurable, then we can generate an image that can be consistent,” says Tianfu Wu, an assistant professor in the department of electrical and computer engineering at North Carolina State University in Raleigh.

This layout- and style-based architecture for GANs (nicknamed “LostGANs”) came out of research by both Wu and Wei Sun, a former Ph.D. student in the department of electrical and computer engineering at North Carolina State University who is currently a research scientist at Facebook. Their paper on this work was published last month in the journal IEEE Transactions on Pattern Analysis and Machine Intelligence.

The starting point for the LostGANs approach involves a simple reconfigurable layout that includes rectangular bounding boxes showing where a tree, road, bus, sky, or person should be within the overall image. Yet previous AI models have generally failed to create photorealistic and perfectly proportioned images when they tried to work directly from such layouts.
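
To make the idea concrete, a layout like that can be represented as nothing more than a list of labeled boxes. The short Python sketch below is purely illustrative; the class and field names are hypothetical and are not taken from the LostGANs code.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class LayoutObject:
    label: str                           # semantic class, e.g. "tree", "bus", "person"
    bbox: Tuple[float, float, float, float]  # (x, y, width, height) in normalized [0, 1] units

# A reconfigurable layout is just an ordered list of labeled bounding boxes.
layout: List[LayoutObject] = [
    LayoutObject("sky",    (0.0, 0.0, 1.0, 0.4)),
    LayoutObject("road",   (0.0, 0.7, 1.0, 0.3)),
    LayoutObject("bus",    (0.1, 0.4, 0.5, 0.35)),
    LayoutObject("person", (0.7, 0.5, 0.1, 0.3)),
]

# Reconfiguring the layout -- moving, resizing, adding, or removing boxes --
# is the change the generator is asked to respond to consistently.
layout.append(LayoutObject("tree", (0.85, 0.2, 0.15, 0.5)))
```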

This is why Wu and Sun trained their AI model to use the bounding boxes in the layout as a starting point to first create “object masks” that look like silhouettes of each object. This intermediate “layout-to-mask” step lets the model refine the general shape of each object silhouette, which helps produce a more realistic final “mask-to-image” result in which all the visual details have been filled in.
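
A rough way to picture that two-stage design is as a pair of modules: one that turns labeled boxes into soft masks, and one that turns masks into pixels. The PyTorch-style sketch below is a toy illustration under that assumption; the module names, layer choices, and sizes are invented for clarity and do not reflect the published architecture.

```python
import torch
import torch.nn as nn

class MaskGenerator(nn.Module):
    """Stage 1 (layout-to-mask): predict a soft silhouette for each bounding box."""
    def __init__(self, num_classes: int, mask_size: int = 16):
        super().__init__()
        self.embed = nn.Embedding(num_classes, 64)
        self.to_mask = nn.Linear(64 + 4, mask_size * mask_size)
        self.mask_size = mask_size

    def forward(self, labels, boxes):
        # labels: (num_objects,) class indices; boxes: (num_objects, 4) box coordinates
        h = torch.cat([self.embed(labels), boxes], dim=-1)
        masks = torch.sigmoid(self.to_mask(h))
        return masks.view(-1, self.mask_size, self.mask_size)

class ImageGenerator(nn.Module):
    """Stage 2 (mask-to-image): fill in visual detail guided by the masks (placeholder)."""
    def __init__(self, out_size: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Upsample(size=(out_size, out_size)),
            nn.Conv2d(1, 3, kernel_size=3, padding=1),
            nn.Tanh(),
        )

    def forward(self, masks):
        # Collapse the per-object masks into one rough scene map, then decode to RGB.
        scene = masks.sum(dim=0, keepdim=True).clamp(max=1.0).unsqueeze(0)
        return self.net(scene)

labels = torch.tensor([0, 1, 2])               # e.g. sky, road, bus
boxes = torch.rand(3, 4)                       # (x, y, w, h) for each object
masks = MaskGenerator(num_classes=10)(labels, boxes)
image = ImageGenerator()(masks)                # (1, 3, 64, 64) toy output
```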

The team’s approach also enables researchers to have the AI change the visual appearance of specific objects within the overall image layout based on reconfigurable “style codes.” For example, the AI can generate different versions of the same general wintry mountain landscape with people skiing by making specific style changes to the skiers’ clothing or even their body pose.
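
One way to think about those style codes is as a separate latent vector for each object in the layout, which can be held fixed or resampled independently. The snippet below sketches that idea; the generator call is only indicated in comments, and the dimensions are arbitrary assumptions rather than values from the paper.

```python
import torch

num_objects, style_dim = 4, 128

# One latent style code per object in the layout (e.g. each skier in the scene).
style_codes = torch.randn(num_objects, style_dim)

# Keep the scene and every other object fixed, but resample the style of
# object 2 (say, one skier) to change its clothing or pose in the output.
new_styles = style_codes.clone()
new_styles[2] = torch.randn(style_dim)

# Conceptually: image_a = generator(layout, style_codes)
#               image_b = generator(layout, new_styles)
# Only the targeted object's appearance should differ between the two images.
```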

The results from the LostGANs approach are still not exactly photorealistic—such AI-generated images can sometimes resemble impressionistic paintings with strangely distorted proportions and poses. But LostGANs can synthesize images at a resolution of up to 512 x 512 pixels compared to prior layout-to-image AI models that usually generated lower-resolution images. The LostGANs approach also demonstrated some performance improvements over the competition during benchmark testing with the COCO-Stuff dataset and Visual Genome dataset.

A next step for LostGANs could involve better capturing the details of interactions between people and small objects, such as a person holding a tennis racket in a certain way. One way that LostGANs might improve here would be to use “part-level masks” that represent various components making up an object.

But just as importantly, Wu and Sun showed how to train LostGANs more efficiently using fewer labeled conditions without having to sacrifice the quality of the final image. Such semi-supervised training can rely on just 50 percent of the usual training images to bring LostGANs up to its usual performance standards. The source code and pretrained models of LostGANs are available online at GitHub for any other researchers interested in giving this approach a try.

Tech companies and organizations with much deeper pockets than academic labs have already begun showing the potential of harnessing AI-generated images. In 2019, NVIDIA demonstrated an AI art application called GauGAN that can convert rough sketches drawn by human artists into realistic-looking final images. In early 2021, OpenAI showed off DALL·E, a version of its GPT-3 language model that can convert text prompts such as “an armchair in the shape of an avocado” into a realistic final image.

Still, the LostGANs research has a lot to offer, even if its images are not yet as polished. By taking the layout-to-mask-to-image approach, LostGANs enables researchers to better understand how the AI model is generating the various objects within an image. Such transparency represents an improvement on the typical “black box” approach of many AI models, which can leave even experts scratching their heads over how the final image was generated.

“For example, if you look at the image and the person doesn’t look correct, you can trace it back and see that it’s because the mask is not correctly computed,” Wu explains. “The mask is better for understanding what’s going on in the generated image and also makes it easier to control the image generation.”
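
In practice, that kind of debugging could be as simple as dumping the intermediate masks to disk and eyeballing them. The sketch below assumes stage-1 masks are available as a tensor; the values here are random placeholders rather than output from the actual model.

```python
import torch
import matplotlib.pyplot as plt

# Placeholder stage-1 masks; in practice these would come from the
# layout-to-mask stage of the generator.
masks = torch.sigmoid(torch.randn(3, 16, 16))

# Save each object's mask so a misshapen object in the final image can be
# traced back to a badly computed silhouette rather than to the final
# rendering stage.
fig, axes = plt.subplots(1, len(masks), figsize=(9, 3))
for ax, mask in zip(axes, masks):
    ax.imshow(mask, cmap="gray")
    ax.axis("off")
fig.savefig("intermediate_masks.png")
```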

The research could eventually help robots and AI agents to better envision the results of future interactions with objects within their immediate environment. Such image generation based on reconfigurable layouts could also potentially help generate different visual scenarios that could help train autonomous vehicles.

And in the near term, LostGANs could play the role of an educational tool that invites students and other curious learners to interact with AI by setting up a simple image layout. During a departmental open house, an early version of LostGANs attracted the attention of local high school students with its still imperfect AI-generated images.

“I think that will be fun for those students to play with,” Wu says. “Then they can get a rough understanding that ‘Oh, this is something where I can interact with an AI system through this simple painting.’”

