Tiny worlds: A minimal implementation of DeepMind's Genie world model

github.com

44 points by simonpure 4 days ago


mattnewton - 17 hours ago

Really neat project!

Do we know if the 3B model shown in the Twitter thread is saturated and we need to train a bigger one, or if it is still converging? 3B parameters seems light for this but I don’t have a good intuition!

(Nit: “Zelda Ocarina of Time” is definitely showing Zelda A Link to the Past sprites, which would make more sense, as that is a top-down 2D SNES game and Ocarina was a 3D N64 game)

quanto - 12 hours ago

What a great project.

What's interesting is that the action tokens are learned from video alone. In other words, the training dataset does not include labeled actions like "go left" and "go right"; these actions are inferred from which pixels moved. As a result, the learned actions may not map exactly to the game actions available to the player, so we (humans) cannot necessarily use this world model to play the game.

I suspect the inferred actions directly correspond to human-understandable ones: after playing with the action tokens, a reasonable human could probably guess what, say, the third action token in the dictionary corresponds to ("jump"). This is likely because game actions are sparse (in both time and action space) and often independent/orthogonal (in action space).
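The mechanism described above can be sketched in a few lines: an encoder maps each consecutive frame pair to a continuous latent, which is then snapped to the nearest entry in a small codebook, and the codebook index serves as the discrete action token. This is a minimal, numpy-only illustration, not the project's actual model; the encoder here is a stand-in random projection, and the names (`encode_transition`, `infer_action`) and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny latent-action sketch: a (frame_t, frame_t+1) pair is
# encoded to a continuous latent and quantized against a small codebook.
# The codebook index *is* the inferred action token -- no action labels
# are ever seen during training.
H, W, C = 32, 32, 3
N_ACTIONS, LATENT_DIM = 8, 16                      # small discrete vocabulary
codebook = rng.normal(size=(N_ACTIONS, LATENT_DIM))
proj = rng.normal(size=(H * W * C, LATENT_DIM))    # stand-in for a learned encoder

def encode_transition(frame_t, frame_tp1):
    """Encode the pixel change between two frames into a latent vector."""
    diff = (frame_tp1 - frame_t).ravel()           # "the pixels that moved"
    return diff @ proj

def infer_action(frame_t, frame_tp1):
    """Quantize the transition latent to its nearest codebook entry."""
    z = encode_transition(frame_t, frame_tp1)
    dists = np.linalg.norm(codebook - z, axis=1)   # nearest-neighbour lookup
    return int(np.argmin(dists))                   # discrete action token

frames = rng.random((2, H, W, C))
token = infer_action(frames[0], frames[1])
assert 0 <= token < N_ACTIONS
```

In a real latent-action model the encoder and codebook are trained jointly so that the token is maximally predictive of the next frame, which is why the learned tokens tend to line up with genuine controls rather than arbitrary pixel noise.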

nxobject - 16 hours ago

With the caveat that I know little about the implementation of NNs (I do embedded): what would it take to go from predicting a 2D array of RGB pixels to predicting more general 2D arrays of structs?

It might be fun, for example, to simulate the evolution of multiple coupled discretized fields.
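From the network's point of view, a "struct per cell" is just extra channels: an RGB frame is an (H, W, 3) array, and a grid of coupled field values is an (H, W, C) array, so in principle only the input/output channel counts of the model change. As a hedged illustration of the kind of data this would mean predicting, here is a toy simulator for two coupled discretized fields; the fields, coefficients, and function names are made up for the example.

```python
import numpy as np

# Two coupled discretized fields u, v stored as channels of one (H, W, 2)
# array -- the same layout as an RGB frame, just with different channels.
# Each step applies diffusion plus a linear coupling term; a world model
# would be trained to predict next_state from state.
H, W = 64, 64
state = np.random.default_rng(1).random((H, W, 2))   # channels: [u, v]

def laplacian(f):
    """Discrete 5-point Laplacian with periodic boundaries."""
    return (np.roll(f, 1, 0) + np.roll(f, -1, 0)
            + np.roll(f, 1, 1) + np.roll(f, -1, 1) - 4 * f)

def step(state, du=0.1, dv=0.05, k=0.01):
    """Advance both fields one timestep (toy dynamics, made-up constants)."""
    u, v = state[..., 0], state[..., 1]
    u_next = u + du * laplacian(u) + k * (v - u)     # coupling: v feeds u
    v_next = v + dv * laplacian(v) + k * (u - v)     # coupling: u feeds v
    return np.stack([u_next, v_next], axis=-1)

next_state = step(state)
assert next_state.shape == state.shape               # same (H, W, C) layout
```

The harder part in practice is the tokenizer/loss rather than the shapes: RGB models often assume bounded, perceptually meaningful values, whereas arbitrary struct fields may need per-channel normalization and a regression-style loss instead of a discretized pixel loss.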