Introducing Vid2Seq, a visual language model for dense video captioning that simply predicts all event boundaries and captions as a single sequence of tokens - PrO_RaZe Bookmarks #781

Introducing Vid2Seq, a visual language model for dense video captioning that simply predicts all event boundaries and captions as a single sequence of tokens - PrO_RaZe Bookmarks #781

Comments

Popular posts from this blog