You can literally run 3D reconstruction algorithms like NeRF or COLMAP on those videos (check the tweet I sent). It's not my opinion: those videos are sufficiently 3D-consistent that you can extract 3D geometry from them
Sure, it's not perfect, but previous video generation algorithms weren't 3D-consistent enough for this to work at all