Discussion about this post

User's avatar
jiajun's avatar

This is a fascinating.

I’ve been discussing a similar idea with friends: world models could unlock a completely new form of interacted entertainment expereiment. Imagine a TikTok where every video is not just watched but entered—something like Roblox, but with higher entertainment value, interactivity, and remixability.

In this paradigm, interaction itself becomes authorship. Your path, choices, and tweaks generate new experiences that others can view and explore, much like watching a gameplay recording on youtube but with the ability to “drop in” at any moment and continue playing with the cloud gaming technology, and start your game. With Word Models, these wouldn’t be limited to game loops—they’d become living, generative spaces.

Yondon Fu's avatar

Very interesting framing around the different layers!

Another related framing [1] I've been thinking about it is:

1. WMs can offer a non-deterministic simulation (the "brain"?) good for fast creative exploration

2. Game engines and 3D modeling tools can offer a deterministic simulation (the "memory"?) that is slower to construct, but gives fine grained control and persistence helpful for collaboration

3. 3DGS/NeRFs can help translate between the two simulations by helping transform multi-viewpoint video frames into 3D assets/scenes

If 1 is a sort of fluid, dynamic dream then 3 is a way to freeze parts of the dream and pull them into 2 for further play (with others) where you know that you won't lose it to the ephemerality of the dream.

WMs are only going to better, but perhaps the determinism (and the control/collaboration it enables) you get from actual 3D representations ends up being why they continue to be used even if we get increasingly capable and efficient WMs.

[1] Influenced by this great talk https://www.youtube.com/watch?v=hFlF33JZbA0

No posts

Ready for more?