Earlier this week, the team at DeepMind, Google’s AI research lab, unveiled a new framework dubbed Transframer, which allows AI to generate 30-second videos from a single image input. It’s a nifty little trick at first glance, but the implications are much larger than an interesting GIF.
“Transframer is state-of-the-art on a variety of video generation benchmarks, and… can generate coherent 30 second videos from a single image without any explicit geometric information,” the DeepMind research team explains. Basically, all Transframer needs is a single photo, which it analyzes to identify the picture’s framing, i.e., clues like a table, a hallway, or a street. After predicting a subject’s surroundings using these “context images,” it then envisions (and subsequently shows) what that target would look like from various angles. DeepMind’s team illustrates the procedure with targets like a chair, a laptop, a glass of water, and even a GRE textbook.
“Given a collection of context images with associated annotations (time-stamps, camera viewpoints, etc.), and a query annotation, the task is to predict a probability distribution over the target image,” continues the team. “This framework supports a range of visual prediction tasks, including video modelling, novel view synthesis, and multi-task vision.”
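To make that abstract description a little more concrete, here is a rough, runnable Python sketch of the interface the quote describes: condition on (image, annotation) pairs plus a query annotation, get back a distribution over the unseen target image, and roll the prediction forward to stretch one photo into a short clip. DeepMind has not released a public Transframer API, so every name below (Annotation, ContextFrame, ToyFramePredictor, and the crude Gaussian stand-in for the predicted distribution) is an illustrative assumption, not the real model.

```python
import numpy as np
from dataclasses import dataclass
from typing import List, Tuple

# Illustrative sketch only: these types mirror the *interface* the paper
# quote describes, not DeepMind's actual (unreleased) implementation.

@dataclass
class Annotation:
    timestamp: float          # e.g. seconds into the clip
    viewpoint: np.ndarray     # e.g. a camera-pose vector

@dataclass
class ContextFrame:
    image: np.ndarray         # H x W x 3 array of pixel values in [0, 1]
    annotation: Annotation

class ToyFramePredictor:
    """Stand-in for the learned model: returns parameters (mean, scale)
    of a crude distribution over the target image so the loop runs."""
    def predict(self, context: List[ContextFrame],
                query: Annotation) -> Tuple[np.ndarray, float]:
        mean = context[-1].image  # naive: expect the latest frame, roughly
        # uncertainty grows the further the query is from known context
        scale = 0.05 * abs(query.timestamp - context[-1].annotation.timestamp)
        return mean, scale

def generate_video(model, photo: np.ndarray, n_frames: int = 30):
    """Query the model at successive timestamps, sample a frame from the
    predicted distribution, and feed it back in as new context."""
    rng = np.random.default_rng(0)
    context = [ContextFrame(photo, Annotation(0.0, np.zeros(3)))]
    frames = [photo]
    for t in range(1, n_frames + 1):
        query = Annotation(float(t), np.zeros(3))  # assumed query format
        mean, scale = model.predict(context, query)
        frame = np.clip(mean + rng.normal(0.0, scale, mean.shape), 0.0, 1.0)
        frames.append(frame)
        context.append(ContextFrame(frame, query))
    return frames

video = generate_video(ToyFramePredictor(),
                       np.random.default_rng(1).random((64, 64, 3)))
print(len(video), video[0].shape)  # 31 frames of 64x64 RGB
```

The toy predictor just copies the latest frame and adds noise; the real system replaces it with a learned network. But the loop itself, sample a frame, append it to the context, and query the next annotation, is the kind of rollout that would be needed to extend a single image into 30 seconds of coherent video.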
As noted by Futurism, Transframer could one day offer an entirely new avenue for the video game industry by using machine learning to build digital environments rather than relying on more time-consuming rendering methods. As the technology progresses, Transframer-style training could also open new possibilities for art, scientific analysis, and further AI development. Additionally, one Twitter user envisioned piggybacking OpenAI’s DALL-E pictures on top of the Transframer program to create stacked AI creations, as if those images couldn’t get any more surreal.