Upon the release of OpenAI’s video-generation model Sora in February 2024, a wave of fear and awe swept public discourse on the future of filmmaking. The sleek photorealistic videos shared by the company elicited a flurry of hyperbolic statements regarding Sora’s technical sophistication and its imminent impact on entertainment production. These clips were “good enough to freak us out” (Stern) and “unsettling, but hard not to be excited about” (Welk); they were like “we [Hollywood workers] were seeing our murder but it was beautiful” (Oxford).

At first glance, it seemed like our AI animation cup runneth over but, in doing so, threatened to drown us. No wonder, then, that OpenAI chose the following prompt for one of their demo videos: “Photorealistic closeup video of two pirate ships battling each other as they sail inside a cup of coffee” (Fig. 1).

Figure 1. Result for the prompt: “Photorealistic closeup video of two pirate ships battling each other as they sail inside a cup of coffee”

Indeed, the scene of two ships being tossed to and fro in the middle of an incongruous landscape is an apt visual representation of the current state of text-to-video models, if not necessarily in the way OpenAI intended. The clip was meant to showcase Sora’s capacity “to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background” (Peebles et al.). Instead, it metaphorically plays out and self-reflexively animates Sora’s key constraints as a creative tool.

Uncontrollability

In global culture, seas and oceans often stand in for powers that are fickle, unpredictable, and difficult to tame. Despite its unusual choice of liquid, Sora’s video effectively renders the turbulent, erratic force of tidal waves. However, in Sora’s own workflow, uncontrollability has emerged as a key drawback. Shy Kids, the Canadian production studio behind the viral Sora short Air Head, has recently offered a detailed overview of the model’s current limitations, explaining that extensive post-production and editing were required to achieve the final cut. Throughout the interview, director Patrick Cederberg repeatedly emphasizes that “control is still the thing that is the most desirable and also the most elusive at this point” (Seymour). For example, he laments that there was “no explicit way” to have the yellow balloon head of the protagonist appear the same from shot to shot (Fig. 2). Since generative AI models engage in implicit, rather than explicit, shot generation, a tool like Sora can (and often does) interpret “yellow balloon” differently every time it is prompted (Seymour). These concerns are echoed by former Pixar animator Craig Good, who explains that Sora does not allow its user to “iterate in a controlled way” (Novak). As he points out, constant revisions and shot-level tweaks in response to feedback are essential to the animation process, but Sora does not allow for iterative changes, as it does not support multimodal input beyond the text prompt (Novak).

Figure 2. Inconsistency with a yellow balloon in the shot.

Inconsistency

The absence of fine-tuning opportunities is compounded by Sora’s struggles with consistency, both within the videos themselves and in the production process. As John Naughton has noted, “Sora can be a bit hazy on cause and effect; ‘a person might take a bite out of a cookie, but afterward, the cookie may not have a bitemark’.” But while the model’s inability to ensure continuity is a known issue, OpenAI has yet to address it, leaving filmmakers to attempt workarounds such as hyper-descriptive prompting because “shot to shot / generation to generation, there isn’t [a] feature set in place yet for full control over consistency” (Seymour). For instance, as Cederberg reports, “you can put in a ‘Camera Pan’ and […] you’d get it six out of ten times” (Seymour). While such problems can be fixed by a team with post-production and visual effects experience, they undermine Sora’s claim to offer autonomous generation of sophisticated photorealistic footage based on prompts alone.

Unpredictability

Sora’s difficulties with consistency mean that it can be hard for filmmakers to predict what the model will output in response to a given prompt, or even on repeated entry of the same prompt. Cederberg reports that “many of the Air Head clips were generated as if shot in slow motion, while this was not requested in the prompt” (Seymour). Such persistent, yet inexplicable, deviations from the prompt have caused Cederberg to compare certain aspects of working with Sora to a “shot in the dark” and “a slot machine” (Seymour). While some degree of serendipity is not unheard of during film production and may even push a piece in exciting unforeseen directions, Sora’s reliably high unreliability remains a hurdle to be addressed before the model can be effectively incorporated into time-based image-making.

Copyright Issues

Like the aroma of a freshly brewed cuppa, OpenAI’s decision to reference piracy is both bold and rich, given that the company has been sued for copyright infringement by eight daily newspapers and several popular writers, among a growing list of plaintiffs (Robertson). While these lawsuits target the illegal use of news articles to train chatbots, the lack of transparency around Sora’s training has also raised concerns. The technical paper shared upon Sora’s release did not specify the provenance of the training data, prompting suspicions of data scraping and copyright infringement (Mauran). In the absence of strict disclosure requirements and regulations, the use of AI models born of opaque business practices comes with a number of ethical and legal challenges and uncertainties.

Scale Mismatch

The image of two ships in a coffee cup is also a fitting representation of the mismatch between the Sora marketing hype and the model’s objective capabilities. While Sora’s videos are impressive, especially for such an early prototype, they are neither error-free nor, as the Air Head example demonstrates, necessarily possible without extensive human edits. After the initial round of shock and awe, evaluations of Sora’s performance have been more measured in their critique of what the model can currently accomplish. For instance, Naughton’s review of the ships video notes that “while Sora may know a lot about the reflection of lights in fluids, it knows little or nothing about the physical laws that govern the movements of galleons.” In a recent Financial Times article, industry professionals were asked to provide feedback on Sora’s responses to their prompts. While all of them acknowledged that the tool shows promise and will likely have a range of future applications (such as generating stock footage and early concept clips), they also pointed out obvious animation mistakes, imperfections, and problems with shot transitions and with following the particulars of the prompt (Criddle and Griffiths). Given how new and untested Sora’s underlying technology is, it is hardly surprising that its galleon-sized ambitions have mostly yielded toy-boat-sized deliverables, but this fundamental mismatch is nevertheless important to acknowledge. Sora and similar tools may promise to one day take the animation world by storm, but for now, their creative output remains a tempest in a teapot.

However, even if this early moment in the evolution of text-to-video models boils down to much ado about prompting, the discourse around Sora and its competitors is shaping the public’s understanding of contemporary animation production in ways that are worth paying attention to. While articles such as the aforementioned exposé on Air Head’s actual production process are contributing to the demystification of AI generators’ role in filmmaking and countering some of the most egregious examples of marketing speak, debates around AI and the moving image are inundated with misleading and hyperbolic promises that frame animation as an instant, made-to-order product. But it is not too late to change the narrative. Sora’s limitations are not simply a sign of insufficient scientific sophistication; they also serve as a reminder that human creative decision-making cannot be automated. AI approximates, predicts, and interpolates. It does not intuit, improvise, sense, and feel. It can’t put the anima in animation. In the wake of generative AI’s entry into media production, distinguishing skilled filmmaking from parlor tricks has never been more valuable and more urgent. Animation creators, educators, and scholars can – and should – spotlight the difference. It’s time to wake up and smell the coffee.

References

Criddle, Cristina, and Rory Griffiths. “How Good is OpenAI’s Sora video model – and will it transform jobs?” Financial Times, 1 May 2024, https://www.ft.com/content/ab70695f-584a-49bf-a635-38175be0718f

Mauran, Cecily. “What was Sora Trained On? Creatives Demand Answers.” Mashable, 16 Feb. 2024, https://mashable.com/article/openai-sora-ai-video-generator-training-data

Naughton, John. “OpenAI’s new video generation tool could learn a lot from babies.” The Guardian, 24 Feb. 2024, https://www.theguardian.com/commentisfree/2024/feb/24/openai-video-generation-tool-sora-babies-ai-artificial-intelligence

Novak, Matt. “Former Pixar Animator Gives One Big Reason AI Video won’t Work in Hollywood.” Gizmodo, 26 Apr. 2024, https://gizmodo.com/pixar-ai-openai-sora-animation-tiktok-hollywood-video-1851438404

Oxford, Dwayne. “Could OpenAI’s Sora text-to-video Generator Kill off Jobs in Hollywood?” Al Jazeera, 29 Mar. 2024, https://www.aljazeera.com/news/2024/3/29/what-is-openais-sora-text-to-video-generator

Peebles, Bill, et al. “Creating Video from Text.” OpenAI, 15 Feb. 2024, https://openai.com/index/sora/

Robertson, Katie. “8 Daily Newspapers Sue OpenAI and Microsoft over A.I.” The New York Times, 30 Apr. 2024, https://www.nytimes.com/2024/04/30/business/media/newspapers-sued-microsoft-openai.html

Seymour, Mike. “Actually Using SORA.” FX Guide, 14 Apr. 2024, https://www.fxguide.com/fxfeatured/actually-using-sora/

Stern, Joanna. “OpenAI Made AI Videos for Us. These Clips are Good Enough to Freak us out.” The Wall Street Journal, 13 Mar. 2024, https://www.wsj.com/tech/personal-tech/openai-cto-sora-generative-video-interview-b66320bb

Welk, Brian. “Is OpenAI’s Sora the Filmmaking Apocalypse, or Just a Great Demo for a Tech Company?” IndieWire, 26 Feb. 2024, https://www.indiewire.com/news/business/openai-sora-analysis-filmmaking-apocalypse-great-demo-tech-company-1234955252/


Mihaela Mihailova is an Assistant Professor in the School of Cinema at San Francisco State University. She is the editor of Coraline: A Closer Look at Studio LAIKA’s Stop-Motion Witchcraft (Bloomsbury, 2021). She has published in Journal of Cinema and Media Studies, The Velvet Light Trap, Journal of Japanese and Korean Cinema, Convergence, Feminist Media Studies, animation: an interdisciplinary journal, Studies in Russian and Soviet Cinema, and [in]Transition. Dr. Mihailova serves as editor of the open-access journal Animation Studies and as president of the Society for Animation Studies. Her current book project, Synthetic Creativity: Deepfakes in Contemporary Media, was recently awarded an NEH grant.