How Grok Imagine's AI Video Stacks Up to the Competition
Elon Musk's AI venture, xAI, recently rolled out Grok Imagine, a new tool for generating images and videos, now accessible to its paid subscribers. Musk has been actively promoting the tool on X, sharing user-generated content, some of which is labeled "Spicy" for its mildly NSFW nature.
The emergence of AI video is a major, and somewhat controversial, development in the tech world. Supporters see it as a revolutionary medium for artists and a way to cut production costs for animation and film. However, critics are concerned about the potential for misuse, including the creation of harmful sexual deepfakes and the spread of misinformation.
Setting that debate aside, how does Grok Imagine perform against its heavyweight rivals? The current leader in this space is arguably Google's Veo 3 AI video model, known for its lifelike results. Other major players include Sora from OpenAI and a new video feature from the popular image generator Midjourney.
Frankly, the initial results from Grok Imagine are not very impressive. While it's a new tool that Musk says "should get better every day," it currently appears to be significantly behind the competition.
Grok Imagine vs. the Competition: A Head-to-Head Test
To put these tools to the test, I used a simple prompt inspired by a viral AI video trend of animals on trampolines: "Security camera footage of rabbits jumping on a trampoline at night."
It's important to note a fundamental difference in how these tools work. Google's Veo 3 and OpenAI's Sora generate video directly from a text prompt. In contrast, Grok Imagine and Midjourney first create an image from text, which the user can then animate. This two-step process already puts Grok Imagine at a disadvantage compared to the more direct text-to-video models from Google and OpenAI.
The Underwhelming Results
My test prompt in Grok produced a rather disappointing set of images.
I picked the best of the bunch and animated it. The result was... okay. It can best be described as "mid" or just "meh."
When compared to the competition, the gap in quality becomes obvious. Both Google Veo 3 and OpenAI's Sora handled the same prompt with much greater success.
Even Midjourney, which uses a similar image-to-animation process, produced a more convincing image and video with the grainy aesthetic of actual surveillance footage.
Strengths, Weaknesses, and Final Verdict
Audio is another significant weak point for Grok Imagine. While a tool like Veo 3 can generate relevant sound effects and even coherent dialogue, Grok Imagine's audio is often limited to basic sound effects and unintelligible noise.
Musk has compared Grok Imagine to a modern-day Vine, suggesting it is "optimized for most fun and shareable content." Based on initial tests, it seems best suited for creating memes and anime-style content. If your goal is to animate memes or create suggestive anime videos, it might suffice. For anything more sophisticated, it falls short.
However, Grok Imagine does have one clear advantage: speed. In my experience, it generates both images and videos much faster than its competitors.
Ultimately, while Grok Imagine is quick and may have a niche in creating simple, shareable content, it currently lags far behind its rivals in the race for high-quality AI video generation.
Disclosure: Ziff Davis, Mashable’s parent company, filed a lawsuit in April against OpenAI, alleging it infringed Ziff Davis' copyrights in training and operating its AI systems.