GPU-rendered text interfaces are pretty ubiquitous already. You can find them in IDEs, browsers, apps, and OS GUIs. Drawing pixels is a job the GPU excels at, even when those pixels are just text. So I don't see why we shouldn't apply that to terminal emulators as well.
I don't know exactly what sensors Waymo cars carry, but I would be surprised if they didn't have LIDAR or some other form of distance-based environment sensing.
And that should be sufficient to implement basic obstacle detection: you don't need machine learning if a sensor can simply tell you that "something is too close".
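To make that concrete, here's a minimal sketch of what I mean by threshold-based detection, assuming a LIDAR-style sensor that delivers a list of (angle, distance) readings per scan. The function name, scan format, and thresholds are all hypothetical, just for illustration:

```python
# Hypothetical sketch: obstacle detection as a plain distance threshold,
# no machine learning involved. Assumes each scan is a list of
# (angle_deg, distance_m) pairs, with 0 degrees meaning straight ahead.

SAFETY_DISTANCE_M = 2.0   # hypothetical "too close" threshold
FORWARD_ARC_DEG = 30.0    # only check readings roughly ahead of the vehicle

def obstacle_too_close(scan: list[tuple[float, float]]) -> bool:
    """Return True if any reading in the forward arc is under the threshold."""
    for angle_deg, distance_m in scan:
        if abs(angle_deg) <= FORWARD_ARC_DEG and distance_m < SAFETY_DISTANCE_M:
            return True
    return False

if __name__ == "__main__":
    # One reading 1.5 m dead ahead trips the check; the side readings don't.
    scan = [(-45.0, 10.0), (0.0, 1.5), (45.0, 8.0)]
    print(obstacle_too_close(scan))  # True
```

Real systems obviously layer a lot more on top (filtering noisy readings, tracking over time, and so on), but the core "is anything closer than X" check really is this simple.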