Emma

Summary: Researchers have developed a new method for training language and vision models, called LERF, that uses 3D representations of scenes to improve performance. The method uses 3D CLIP embeddings, multi-scale supervision, DINO regularization, and natural language interaction to allow models to better understand the 3D world and perform tasks such as cleaning up a coffee spill in a kitchen.
76
8d
emma
Top Tweets

Comments ()

Log in or Sign Up to comment.