A mistake I see again and again: far too many people try to create photorealistic images. In my view, this is not ideal, for two reasons:
First, the results often look even more artificial than the stock photos they are modeled on. The images also tend to lack a distinctive look: stock photos are usually designed to be as neutral as possible, which makes them flexible but also boring. Photos become interesting through composition, lighting, and the interplay of sharpness and blur. If you don't specify any of that, AI tools tend to produce something mediocre.
Second, problems and errors stand out more in photorealistic images, while in other styles they pass as "creative freedom". The technical term here is the "uncanny valley": the point at which, for example, an almost correct human face becomes unsettling because of one small flaw.
That's why I often focus on illustrations and graphics instead. This doesn't mean photorealistic images aren't useful at all, but it's good to have other options in mind.
Regardless of the style, it is important to understand the limitations of these tools, and those limits can be surprising. One subject works on the first try, while another idea fails even after dozens of attempts. This often has to do with what the AI knows from its training material. It can generate images that exist nowhere else.
At the same time, you have to be aware that these tools have no understanding whatsoever of what they are depicting. They have no concept of the world in general or, for example, of human anatomy in particular.
[Image: sample of a photorealistic AI-generated image]
Photorealism doesn't really work yet
A well-known example of this problem is hands. Dall-E or Stable Diffusion do not know what a human hand looks like or how it works. They have seen hands during training, but sometimes only from the side, partially hidden, or with two hands overlapping. The AI does not understand that an average human hand has five fingers and that, due to perspective or other circumstances, you sometimes cannot see all of them.
Complex scenes are also difficult. Say you want a picture of a team of five people, and you have specific ideas about what each person should look like. Good luck with that! I hope you have time and patience...
It's similar when a person is supposed to strike a clearly defined pose or you have an exact composition in mind. Here it helps to create an image not just from a prompt but also from a template (known as "image to image", as opposed to "text to image"); a minimal sketch follows below. Stable Diffusion also has the ControlNet extension, which you can use to pick out elements of a template that should appear in the new image.
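To make the image-to-image idea concrete, here is a minimal sketch using the Hugging Face diffusers library. The model ID, file names, and parameter values are illustrative assumptions, not part of any fixed workflow:

```python
# Minimal image-to-image sketch with diffusers (values are assumptions to vary).
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # illustrative model ID
    torch_dtype=torch.float16,
).to("cuda")

template = Image.open("template.png").convert("RGB")  # hypothetical template file

result = pipe(
    prompt="flat vector illustration of a person in a running pose",
    image=template,       # the template whose layout should be preserved
    strength=0.6,         # 0..1: lower values stay closer to the template
    guidance_scale=7.5,   # how strictly the prompt is followed
).images[0]
result.save("result.png")
```

For the ControlNet route, diffusers provides StableDiffusionControlNetPipeline, which additionally takes a conditioning image such as a pose skeleton or an edge map extracted from the template.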
At this point you will surely notice: the higher your demands and the more detailed your idea, the harder it gets. It works well, however, if you let the AI inspire you: describe to ChatGPT what purpose you need the image for and what it should show, see how much you like the result, and refine it step by step. With Stable Diffusion, on the other hand, you will experiment not only with the prompt but also with numerous other options and settings, as the sketch below illustrates.
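As an illustration of the kinds of settings you can play with, here is a minimal text-to-image sketch, again using the diffusers library; every value shown is a starting point to experiment with, not a recommendation:

```python
# Minimal text-to-image sketch with diffusers (all values are assumptions).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # illustrative model ID
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="isometric illustration of a five-person team at a whiteboard",
    negative_prompt="photo, photorealistic, blurry",  # what to steer away from
    num_inference_steps=30,    # more steps: slower, often more detail
    guidance_scale=7.5,        # higher: follows the prompt more literally
    generator=torch.Generator("cuda").manual_seed(42),  # fixed seed for reproducibility
).images[0]
image.save("team.png")
```

Changing the seed alone produces a different variation of the same prompt, which makes the generator a useful knob for the step-by-step refinement described above.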
The problematic aspects of image generators
However, this is not the only challenge. Another: these AIs reproduce what is in their training material, and that includes prejudices and clichés, from stereotypical gender roles to racist worldviews. ChatGPT and Dall-E actively try to counter this, but in the end it is your responsibility to recognize and weed out such problematic representations.
Another point concerns the "training material" mentioned several times already. Like text generators, these tools learned their skills from human-made works: they were fed an enormous amount of data. Whether all these photos, graphics, illustrations, paintings, and other works were allowed to be used for this purpose is a hotly debated question.