Google’s Text-to-Image AI will create any crazy image you can imagine

Pictures from Imagen by Google. Image credits: Google (collage by info-tech.vision)

Imagen is a text-to-image AI generator built on large transformer language models that… okay, let’s slow down and unpack that quickly.

Text-to-image models take text input such as “a dog on a bike” and produce a corresponding image. This has been done for years, but it has recently made great strides in quality and accessibility.

Part of this comes from diffusion techniques. These start with an image of pure noise and refine it bit by bit until the model thinks it can't make it look any more like a dog on a bike than it already does. This was an improvement over top-to-bottom generators, which could get the image hilariously wrong on the first guess, and over other approaches that were easily led astray.
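The "start from noise, refine step by step" loop can be sketched as a toy, DDPM-style reverse process. DDPM stands for denoising diffusion probabilistic models. This is a generic illustration, not Imagen's actual sampler. Several details are assumptions chosen for the sketch:

- The 1,000-step linear noise schedule (betas from 1e-4 to 0.02) is a common default.
- The image is a flat list of a few "pixels".
- `oracle_noise_predictor` is a hypothetical stand-in that cheats by knowing the target image. Predicting that noise is exactly the part a real, text-conditioned model has to learn from data.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances for the forward (noising) process (assumed defaults)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + step * i for i in range(num_steps)]


def oracle_noise_predictor(x_t, alpha_bar_t, target):
    """HYPOTHETICAL stand-in for the trained network.

    It recovers the noise in x_t exactly because it is handed the clean target.
    A real model only sees the noisy image (plus the text) and must learn this.
    """
    return [
        (x - math.sqrt(alpha_bar_t) * tgt) / math.sqrt(1.0 - alpha_bar_t)
        for x, tgt in zip(x_t, target)
    ]


def sample(target, num_steps=1000, seed=0):
    """Start from pure Gaussian noise and denoise one step at a time."""
    rng = random.Random(seed)
    betas = linear_beta_schedule(num_steps)
    alphas = [1.0 - b for b in betas]
    alpha_bars, running = [], 1.0
    for a in alphas:
        running *= a  # cumulative product: how much signal survives to step t
        alpha_bars.append(running)

    x = [rng.gauss(0.0, 1.0) for _ in target]  # pure noise
    for t in reversed(range(num_steps)):
        eps = oracle_noise_predictor(x, alpha_bars[t], target)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # Mean of the slightly less noisy image at step t-1.
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            # Re-inject a little fresh noise at every step except the last one.
            sigma = math.sqrt(betas[t])
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return x


if __name__ == "__main__":
    toy_image = [0.2, -0.5, 0.9, 0.0]  # four "pixels"
    print(sample(toy_image))
```

Because the oracle knows the answer, the loop lands on the target exactly. In a real system, each step's noise estimate is only the model's best guess. That is why the output is a plausible new picture rather than a copy of any one training image.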

The other part is better language understanding through large language models that use the transformer approach. We won't (and can't) get into the technical aspects here, but this and a few other recent advances have led to convincing language models such as GPT-3.

Imagen starts by generating a small image (64×64 pixels) and then runs two “super-resolution” passes on it to bring it up to 1024×1024. This is not like normal resizing, though. The AI's super-resolution creates new details that are consistent with the smaller image, using the original as a base.
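The size arithmetic of this cascade, and what "normal resizing" does by contrast, can be sketched in a few lines. The sketch assumes each super-resolution pass multiplies the side length by 4, since that is what takes 64 to 1024 in two passes. `upsample_nearest` is ordinary nearest-neighbour resizing, not a super-resolution model.

```python
def cascade_resolutions(base=64, upscale_factors=(4, 4)):
    """Side length after the base model and each super-resolution pass (assumed 4x each)."""
    sizes = [base]
    for factor in upscale_factors:
        sizes.append(sizes[-1] * factor)
    return sizes


def upsample_nearest(image, factor):
    """Ordinary resizing: copy each pixel into a factor x factor block.

    No new information is added. A super-resolution model instead has to
    invent plausible detail that stays consistent with the smaller image.
    """
    return [
        [row[c // factor] for c in range(len(row) * factor)]
        for row in image
        for _ in range(factor)
    ]


print(cascade_resolutions())  # [64, 256, 1024]
print(upsample_nearest([[1, 2], [3, 4]], 2))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

The nearest-neighbour output only repeats the values it was given, so a 3-pixel eye would simply become a blocky 12-pixel eye. Closing that gap with new, coherent detail is the job of the super-resolution passes.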

Say, for example, you have a dog on a bike, and the dog's eye is 3 pixels across in the first image. Not much room for expression! But in the second image, it is 12 pixels across. Where does the detail needed for this come from? Well, the AI knows what a dog's eye looks like, so it generates more detail as it draws. Then this happens again when the eye is redrawn, this time at 48 pixels across. But at no point did the AI have to pull a whole 48-pixel dog eye out of its… let's say, magic bag. Like many artists, it started with the equivalent of a rough sketch, fleshed it out in a study, and then really went to town on the final canvas.
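The arithmetic behind the dog's-eye example can be made concrete with a small sketch. It assumes a 4× per-side scale for each pass, as the 3 → 12 → 48 pixel figures imply. The disc-area formula is only a rough estimate of how many pixels a round eye covers.

```python
import math

UPSCALE = 4  # assumed per-side factor of each super-resolution pass


def eye_stages(diameter_px, passes=2, factor=UPSCALE):
    """(diameter, approximate pixel area) of a round feature at each stage."""
    stages = []
    d = diameter_px
    for _ in range(passes + 1):
        stages.append((d, math.pi * (d / 2) ** 2))
        d *= factor
    return stages


def invented_fraction(factor=UPSCALE):
    """Share of output pixels that must be generated even if every input pixel is kept.

    One input pixel becomes factor**2 output pixels, so 1 - 1/factor**2 are new.
    """
    return 1 - 1 / factor ** 2


for d, area in eye_stages(3):
    print(f"eye {d:>2} px across, roughly {area:.0f} pixels")
print(f"share of pixels each pass must invent: {invented_fraction():.1%}")
```

Each pass multiplies the eye's pixel count by 16. Even if every earlier pixel were carried over unchanged, at least 15 of every 16 pixels in the new image would still have to come from what the model has learned about dog eyes.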

It is hard to judge for yourself, however, since Google is not making the Imagen model public. There's a good reason for that, too. Text-to-image models certainly have fantastic creative potential, but they also have a range of problematic uses. Imagine a system that can generate just about any image you want for, say, fake news or harassment. As Google points out, these systems also encode social biases, and their output is often racist or toxic in some other inventive way.

Much of this is due to the way these systems are built. Essentially, they are trained on large amounts of data (in this case, many pairs of images and captions), which they study for patterns and learn to replicate. But these models need a lot of data, and most researchers have decided that it is too expensive to filter that input thoroughly. This includes even those working for well-funded tech giants like Google. So they scrape massive amounts of data from the web, and as a result their models soak up all the hate you'd expect to find online.

Google warns other AI makers to be cautious about releasing text-to-image models to the public without paying close attention to the data they are trained on.

For more information, visit the official site page: Imagen
