filename : Dog16a.pdf
entry : inproceedings
conference : ECCV 2016 Workshop on Web-scale Vision and Social Media, Amsterdam, Netherlands, 8-16 October, 2016
pages : 605-620
year : 2016
month : October
title : Label-Based Automatic Alignment of Video with Narrative Sentences
subtitle :
author : Pelin Dogan and Markus Gross and Jean-Charles Bazin
booktitle : Computer Vision - ECCV 2016 Workshops
ISSN/ISBN :
editor :
publisher : Springer
publ.place :
volume :
issue :
language : english
keywords : video processing, natural language processing, video-text alignment
abstract : In this paper we consider videos (e.g. Hollywood movies) and their accompanying natural language descriptions in the form of narrative sentences (e.g. movie scripts without timestamps). We propose a method for temporally aligning the video frames with the sentences using both visual and textual information, which provides automatic timestamps for each narrative sentence. We compute the similarity between both types of information using vectorial descriptors and propose to cast this alignment task as a matching problem that we solve via dynamic programming. Our approach is simple to implement, highly efficient and does not require the presence of frequent dialogues, subtitles, or character face recognition. Experiments on various movies demonstrate that our method can successfully align the movie script sentences with the video frames of movies.
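The abstract describes casting sentence-to-frame alignment as a matching problem solved by dynamic programming over a similarity matrix. As a rough illustration only, the sketch below shows one generic order-preserving formulation: given an assumed score sim[i][j] between sentence i and frame j, it assigns each sentence to exactly one frame so that frame indices strictly increase with sentence order and the total similarity is maximal. The function name align_sentences, the matrix layout, and the one-frame-per-sentence recurrence are hypothetical choices for this example; the paper's actual descriptors, costs, and recurrence may differ.

```python
from typing import List, Sequence


def align_sentences(sim: Sequence[Sequence[float]]) -> List[int]:
    """Hypothetical sketch: map each sentence to one frame, order-preserving.

    sim[i][j] is an assumed similarity between sentence i and frame j.
    Returns strictly increasing frame indices maximizing sum(sim[i][f[i]]).
    """
    n = len(sim)
    m = len(sim[0]) if n else 0
    if n > m:
        raise ValueError("need at least as many frames as sentences")

    neg_inf = float("-inf")
    # best[i][j]: best total score for placing the first i sentences
    # within the first j frames (-inf where that is impossible, j < i).
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        best[0][j] = 0.0

    for i in range(1, n + 1):
        for j in range(i, m + 1):
            skip = best[i][j - 1]                            # frame j-1 left unused
            take = best[i - 1][j - 1] + sim[i - 1][j - 1]    # sentence i-1 -> frame j-1
            best[i][j] = max(skip, take)

    # Backtrack from the full table to recover the chosen frame per sentence.
    frames = [0] * n
    i, j = n, m
    while i > 0:
        if j > i and best[i][j] == best[i][j - 1]:
            j -= 1                    # skipping frame j-1 was optimal
        else:
            frames[i - 1] = j - 1     # sentence i-1 was placed at frame j-1
            i -= 1
            j -= 1
    return frames
```

For example, with sim = [[0.9, 0.1, 0.2, 0.0], [0.1, 0.2, 0.8, 0.3]] the function returns [0, 2]. The table has (n + 1) x (m + 1) entries, so the run time is O(nm); the monotonicity constraint encodes the assumption that a script is narrated in the same order as the video.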