Genie: AI Model
Google DeepMind has just introduced Genie, a new model that can generate interactive video games from a single text or image prompt.
- Genie AI Model is a foundation world model that is trained on videos sourced from the Internet.
- The model can “generate an endless variety of playable (action-controllable) worlds from synthetic images, photographs, and even sketches.”
- It is the first generative interactive environment that has been trained in an unsupervised manner from unlabelled internet videos.
- In terms of size, Genie stands at 11B parameters and consists of a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model.
- These components let Genie act in generated environments on a frame-by-frame basis, despite being trained without action labels or any other domain-specific supervision.
- Although it is trained on video-only data, Genie can be prompted to generate a diverse set of interactive, controllable environments.
- It makes playable environments from a single image prompt.
- It can be prompted with images it has never seen, including real-world photographs and sketches, allowing people to interact with their imagined virtual worlds.
- It is trained primarily on videos of 2D platformer games and robotics footage.
- Genie is trained with a general method, allowing it to function across any type of domain and to scale to even larger Internet datasets.
- The standout aspect of Genie is its ability to learn and reproduce controls for in-game characters exclusively from internet videos.
- This is noteworthy because internet videos carry no labels about which action is performed in the video, or even which part of the image should be controlled.
- In short, it allows anyone to create an entirely new interactive environment from a single image.
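To make the three components above concrete, here is a hedged, toy sketch of how they might fit together in a frame-by-frame generation loop. The function names and the tiny stand-in implementations are assumptions for illustration only; they are not DeepMind's actual API or architecture.

```python
# Toy sketch of Genie's pipeline: tokenizer -> latent action -> dynamics.
# All implementations below are illustrative stand-ins, NOT the real
# 11B-parameter networks.
import numpy as np

rng = np.random.default_rng(0)

def video_tokenizer(frame):
    """Spatiotemporal tokenizer: map a frame to a grid of discrete tokens."""
    return (frame * 7).astype(int) % 16  # toy quantization into 16 codes

def latent_action_model(tokens, action_id):
    """Map a user input to one of a small set of learned latent actions."""
    return action_id % 8  # assume a small discrete latent-action vocabulary

def dynamics_model(tokens, latent_action):
    """Autoregressive dynamics: predict next-frame tokens from the current
    tokens and the chosen latent action (toy update rule here)."""
    return (tokens + latent_action + 1) % 16

def detokenize(tokens):
    """Decode tokens back into a displayable frame."""
    return tokens / 16.0

# Frame-by-frame interactive loop starting from a single prompt image.
frame = rng.random((4, 4))          # stand-in for a prompt image
tokens = video_tokenizer(frame)
for step in range(3):
    action = latent_action_model(tokens, action_id=step)  # "player input"
    tokens = dynamics_model(tokens, action)
    frame = detokenize(tokens)
print(frame.shape)  # each step yields a new playable frame
```

The key design point this sketch mirrors is that the environment is generated one frame at a time, with a discrete latent action conditioning each step.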