Google’s new AI tool Whisk uses images as prompts

Google has yet another AI tool to add to the pile. Whisk is a Google Labs image generator that lets you use an existing image as your prompt. But its output only captures the “gist” of your original image rather than recreating it with new details. So it’s better suited to brainstorming and rapid visualization than to editing source images.

The company describes Whisk as “a new kind of creative tool.” The landing screen starts with a bare-bones interface with inputs for subject and style. This simple interface only lets you choose from three predefined styles: sticker, enamel pin and plushie. I suspect Google found that those three produce the kind of results that fit neatly within the scope of an exploratory tool, which is quite convenient in its current form.

As you can see in the image above, it produced a convincing plushie of Wilford Brimley. (Google’s terms prohibit images of celebrities, but Wilford slipped through the gates, Quaker Oats in tow, without alerting the guards.)

Whisk also includes a more advanced editor (available by clicking “Start from scratch” on the main screen). In this mode, you can supply text or a source image for three categories: subject, scene and style. There is also an input bar for adding more text as a finishing touch. However, in its current form, the advanced controls didn’t produce results that matched my prompts.

For example, check out my attempt to create the late Mr. Brimley in a lightbox scene, in the style of a walrus plushie I found online:

A screenshot of an AI generation tool that generates images of a man who looks like Wilford Brimley.

Google / Screenshot by Will Shanklin for Engadget

The result was what looks like a vaguely Wilford Brimley-esque character eating oatmeal inside a lightbox frame. As far as I can tell, that guy is not a plushie. So it’s clear why Google recommends the tool more for quick visual exploration and less for production-ready content.

Google acknowledges that Whisk will only extract “a few important features” of your source image. “For example, a reproduced subject may have a different height, weight, hairstyle or skin tone,” the company warns.

To understand why, look no further than Google’s explanation of how Whisk works under the hood. It uses the Gemini language model to write a detailed caption of the source image you upload. It then feeds that description into the Imagen 3 image generator. So the result is an image based on Gemini’s description of your image, not on the source image itself.

Whisk is only available in the US, at least for now. You can try it out on the Google Labs project site.
