Summary: Vid2Seq is a visual language model pre-trained for dense video captioning on a large-scale dataset of temporally annotated videos and their corresponding captions. It achieves state-of-the-art performance on several benchmark datasets and can be applied to downstream tasks such as video retrieval and summarization.