Introducing Vid2Seq, a visual language model for dense video captioning that simply predicts all event boundaries and captions as a single sequence of tokens - PrO_RaZe Bookmarks #781