At the top of the AI model pecking order sit models like Gemini 1.5 Pro and GPT-4o, alongside other major players such as Llama 3 and Claude 3. Somewhere in the mix is xAI’s Grok model. It’s been gaining functionality over the past couple of months, and Elon Musk just announced a new capability that seems like something out of a sci-fi movie. Grok can now understand images and even explain jokes.
Grok has been in a bit of trouble ever since it gained the ability to generate images. It, of course, lacked any safeguards to keep people from generating problematic material. Regardless, the model has continued to develop. If you want to use this mode, you’ll need to be an X Premium+ subscriber. This is the most expensive subscription tier that the company offers. It costs $16/month ($22/month if you sign up through the app).
Grok can now understand images
For any AI model to carry us to the AGI (Artificial General Intelligence) era, it needs to be multimodal. This means that it can both understand and generate multiple forms of media. Grok gained the ability to generate images, but understanding them is a different story.
Elon Musk posted on X about Grok’s newfound ability. In his example, Grok explains an image that’s been uploaded to it, in this case a meme. The meme shows a pair of soldiers spotting other soldiers who are pretending to be dead (and who also happen to be physicists).
Grok gives a six-bullet-point explanation of what happens in the image and ends with a closing statement. So, how accurate was it? Let’s just say that Grok won’t be speaking at any comedy workshops anytime soon. The model was able to identify the elements of the scene, such as the soldiers, the hill, and the people on the ground.
However, it says, “The humor comes from the punchline in the second panel, where one guard says, ‘Isaac Newton invented gravity,’ implying that the reason the physicists are not moving (and thus appear dead) is because of gravity, which Newton is famous for describing mathematically, not inventing.”
It also says that the humor comes from the fact that physicists “would be so dedicated to their work that they’d lie down to study or ‘discover’ gravity.” So, it clearly misunderstood the meaning of the joke.
Does this mean that Grok is bad?
No, it means that AI itself has some improvements to make. We’re talking about training a model to understand humor, one of the most human creations ever. Not only that, but we gave the same image to Gemini, and it also got the joke wrong.
The models understand the individual elements in the image, and they have a surface-level, albeit very analytical, understanding of comedy. However, they don’t understand the bone that the artist has to pick with scientists who are pedantic about correcting people on the finer details of speech. We’re not sure how companies will be able to teach AI models about that.