Harnessing the Wild Power of AI Image Generation

AI has already shown off the capability to create photorealistic images of cats, dogs, and people’s faces that never existed before. More recently, researchers have been investigating how to train AI models to create more complex images that could include many different objects arranged in different poses and configurations.

The challenge involves figuring out how to get AI models—in this case typically a class of deep learning algorithms known as generative adversarial networks (GANs)—to generate more controlled images based on certain conditions rather than simply spitting out any random image. A team at North Carolina State University has developed a way for GANs to create such conditional images more reliably by using reconfigurable image layouts as the starting point.

“We want a model that is flexible enough such that when the input layout is reconfigurable, then we can generate an image that can be consistent,” says Tianfu Wu, an assistant professor in the department of electrical and computer engineering at North Carolina State University in Raleigh.

This layout- and style-based architecture for GANs (nicknamed “LostGANs”) came out of research by both Wu and Wei Sun, a former Ph.D. student in the department of electrical and computer engineering at North Carolina State University who is currently a research scientist at Facebook. Their paper on this work was published last month in the journal IEEE Transactions on Pattern Analysis and Machine Intelligence.

The starting point for the LostGANs approach involves a simple reconfigurable layout that includes rectangular bounding boxes showing where a tree, road, bus, sky, or person should be within the overall image. Yet previous AI models have generally failed to create photorealistic and perfectly proportioned images when they tried to work directly from such layouts.
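To make the idea of a reconfigurable layout concrete, here is a minimal sketch of one way such a layout could be represented. The `Box` class, the `move_box` helper, and the normalized coordinate convention are assumptions made for illustration, not the format the LostGANs code actually uses.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Box:
    """One labeled bounding box, in coordinates normalized to [0, 1]."""
    label: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float

    def __post_init__(self) -> None:
        # Reject boxes that are empty or extend past the image border.
        if not (0.0 <= self.x and 0.0 <= self.y
                and self.width > 0.0 and self.height > 0.0
                and self.x + self.width <= 1.0
                and self.y + self.height <= 1.0):
            raise ValueError(f"box for {self.label!r} falls outside the image")


def move_box(layout: List[Box], index: int, dx: float, dy: float) -> List[Box]:
    """Return a new layout with one box shifted; the original is unchanged."""
    moved = replace(layout[index], x=layout[index].x + dx, y=layout[index].y + dy)
    return layout[:index] + [moved] + layout[index + 1:]


# A hypothetical street scene built from the object types named in the article.
layout = [
    Box("sky", 0.0, 0.0, 1.0, 0.4),
    Box("road", 0.0, 0.6, 1.0, 0.4),
    Box("bus", 0.1, 0.45, 0.4, 0.3),
    Box("person", 0.7, 0.5, 0.1, 0.3),
]

# Reconfigure the scene: slide the bus to the right.
reconfigured = move_box(layout, 2, dx=0.25, dy=0.0)
print(reconfigured[2])
```

Because each layout is just a list of labeled boxes, "reconfiguring" the scene amounts to producing a new list with a box moved, resized, added, or removed.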

This is why Wu and Sun trained their AI model to use the bounding boxes in the layout as a starting point to first create “object masks” that look like silhouettes of each object. This intermediate “layout-to-mask” step allows the model to refine the general shape of each object silhouette, which helps produce a more realistic final “mask-to-image” result in which all the visual details have been filled in.
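The first step described above, going from boxes to masks, can be illustrated with a rough sketch. The code below only rasterizes each box into a coarse rectangular mask, which is the kind of starting point the article describes. The learned refinement into silhouettes is what the model itself does and is not reproduced here. The function names and the tuple format are hypothetical.

```python
from typing import List, Tuple

# (label, x, y, width, height) with coordinates normalized to [0, 1].
BoxSpec = Tuple[str, float, float, float, float]
Mask = List[List[int]]


def box_to_mask(box: BoxSpec, size: int) -> Mask:
    """Rasterize one box into a size x size grid of 0s and 1s."""
    _, x, y, w, h = box
    left, top = round(x * size), round(y * size)
    right, bottom = round((x + w) * size), round((y + h) * size)
    return [
        [1 if left <= col < right and top <= row < bottom else 0
         for col in range(size)]
        for row in range(size)
    ]


def layout_to_masks(layout: List[BoxSpec], size: int) -> List[Tuple[str, Mask]]:
    """Produce one coarse mask per object, in layout order."""
    return [(box[0], box_to_mask(box, size)) for box in layout]


# A hypothetical two-object layout, drawn on a tiny 8 x 8 grid for readability.
layout = [("sky", 0.0, 0.0, 1.0, 0.5), ("person", 0.25, 0.5, 0.25, 0.5)]
masks = layout_to_masks(layout, size=8)
for label, mask in masks:
    print(label)
    for row in mask:
        print("".join("#" if cell else "." for cell in row))
```

Keeping one mask per object, rather than a single merged image, is what makes it possible to inspect each object's shape separately before any visual detail is filled in.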

The team’s approach also enables researchers to have the AI change the visual appearance of specific objects within the overall image layout based on reconfigurable “style codes.” For example, the AI can generate different versions of the same general wintry mountain landscape with people skiing by making specific style changes to the skiers’ clothing or even their body pose.
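One way to picture per-object style codes is as a small table of random vectors, one per object, where changing a single entry changes only that object's appearance. The sketch below assumes style codes are drawn from a standard normal distribution and uses a tiny vector length; both choices, along with the object names and the `sample_style` and `restyle` helpers, are illustrative assumptions rather than details taken from the LostGANs paper.

```python
import random
from typing import Dict, List

STYLE_DIM = 4  # hypothetical size, chosen small so the vectors are easy to print


def sample_style(seed: int, dim: int = STYLE_DIM) -> List[float]:
    """Draw one style code from a standard normal distribution."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def restyle(styles: Dict[str, List[float]], name: str, seed: int) -> Dict[str, List[float]]:
    """Return a copy of the style table with one object's code resampled."""
    updated = dict(styles)
    updated[name] = sample_style(seed)
    return updated


# One style code per object in a fixed layout (object names are illustrative).
styles = {
    "mountain": sample_style(0),
    "skier_1": sample_style(1),
    "skier_2": sample_style(2),
}

# Change only the first skier's appearance; the layout and other codes stay fixed.
variant = restyle(styles, "skier_1", seed=42)
changed = [name for name in styles if styles[name] != variant[name]]
print(changed)
```

In a real generator, the layout and the style table would both be fed to the network, so two runs with the same layout and a single different style code would be expected to differ only in that one object.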

The results from the LostGANs approach are still not exactly photorealistic—such AI-generated images can sometimes resemble impressionistic paintings with strangely distorted proportions and poses. But LostGANs can synthesize images at a resolution of up to 512 x 512 pixels compared to prior layout-to-image AI models that usually generated lower-resolution images. The LostGANs approach also demonstrated some performance improvements over the competition during benchmark testing with the COCO-Stuff dataset and Visual Genome dataset.

A next step for LostGANs could involve better capturing the details of interactions between people and small objects, such as a person holding a tennis racket in a certain way. One way that LostGANs might improve here would be to use “part-level masks” that represent various components making up an object.

But just as importantly, Wu and Sun showed how to train LostGANs more efficiently using fewer labeled conditions without having to sacrifice the quality of the final image. Such semi-supervised training can rely on just 50 percent of the usual training images to bring LostGANs up to its usual performance standards. The source code and pretrained models of LostGANs are available online at GitHub for any other researchers interested in giving this approach a try.

Tech companies and organizations with much deeper pockets than academic labs have already begun showing the potential of harnessing AI-generated images. In 2019, NVIDIA demonstrated an AI art application called GauGAN that can convert rough sketches drawn by human artists into realistic-looking final images. In early 2021, OpenAI showed off a DALL·E version of its GPT-3 language model that can convert text prompts such as “an armchair in the shape of an avocado” into a realistic final image.

Still, the LostGANs research has a lot to offer even though its images are not yet as polished. By taking the layout-to-mask-to-image approach, LostGANs enables researchers to better understand how the AI model is generating the various objects within an image. That transparency is an improvement on the typical “black box” behavior of many AI models, which can leave even experts scratching their heads over how the final image was generated.

“For example, if you look at the image and the person doesn’t look correct, you can trace it back and see that it’s because the mask is not correctly computed,” Wu explains. “The mask is better for understanding what’s going on in the generated image and also makes it easier to control the image generation.”

The research could eventually help robots and AI agents better envision the results of future interactions with objects in their immediate environment. Image generation based on reconfigurable layouts could also produce varied visual scenarios for training autonomous vehicles.

And in the near term, LostGANs could serve as an educational tool that invites students and other curious learners to interact with AI by setting up a simple image layout. During a departmental open house, an early version of LostGANs attracted the attention of local high school students with its still-imperfect AI-generated images.

“I think that will be fun for those students to play with,” Wu says. “Then they can get a rough understanding that ‘Oh, this is something where I can interact with an AI system through this simple painting.’”
