Automatic Video Scene Annotation and Summarization Framework (AVSAS)

Addis Ababa University


With a huge amount of video data stored on the web, efficient use of these resources can be achieved through content-based video indexing, grouping, searching, and retrieval. Such content-based video processing requires automatic video scene annotation, which identifies and labels events and objects in a video with a descriptive statement. Annotations also inform viewers about the content of a video. Video annotation requires extensive knowledge to define the semantic meaning of events and objects in the video. In this regard, both manual and semi-supervised video annotation fall short, as they require expertise for the correct identification and labeling of video concepts. Individuals may also be biased by their own interests while annotating a video, which can place the video in an incorrect class during grouping or indexing. Annotation requires a great deal of concept dependency and relatedness processing to produce a descriptive statement about a scene in a video. This thesis introduces a novel scene-level automatic video annotation and summarization framework in which annotation provides a scene-level semantic description of videos. The framework uses the speech content of a video to support event and object identification, with proper filtering and normalization. It addresses the problems of concept relatedness and concept formulation with a new event-object affinity matrix, and delivers shot-level and scene-level video annotations. It provides algorithms and components covering every stage from video pre-processing to final video summarization, for efficient annotation and summarization. A prototype demonstrating scene-level video annotation was developed. Its accuracy in event and object prediction was evaluated on standard video processing evaluation datasets, and the overall efficiency of the system was evaluated using ratings from individuals in different professions.
Analysis of the evaluation results shows that the system provides efficient scene-level annotation of video content, with 81% accuracy in event prediction and an average user rating of 3.47 out of 4 in the overall system evaluation.
Keywords: Video annotation, Scene-level video annotation, Concept formulation, Concept normalization, Video summarization, Event prediction, Shot boundary detection
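The abstract names an event-object affinity matrix but does not define it. As a purely illustrative sketch, and not the thesis's actual formulation, one plausible reading is a co-occurrence table: for each event, the share of that event's shots in which a given object also appears. The function name `build_affinity`, the input layout, and the sample labels below are all hypothetical.

```python
from collections import defaultdict


def build_affinity(shots):
    """Build a hypothetical event-object affinity table.

    shots: list of (events, objects) pairs, one pair per shot.
    Returns {event: {object: affinity}}, where affinity is the fraction
    of the event's shots in which the object also appears.
    This co-occurrence definition is an assumption, not the thesis's.
    """
    event_counts = defaultdict(int)
    pair_counts = defaultdict(lambda: defaultdict(int))
    for events, objects in shots:
        # Use sets so a label repeated within one shot is counted once.
        for event in set(events):
            event_counts[event] += 1
            for obj in set(objects):
                pair_counts[event][obj] += 1
    return {
        event: {obj: n / event_counts[event] for obj, n in objs.items()}
        for event, objs in pair_counts.items()
    }


# Made-up shot labels, for illustration only.
shots = [
    (["cooking"], ["pan", "stove"]),
    (["cooking"], ["pan", "knife"]),
    (["driving"], ["car", "road"]),
]
affinity = build_affinity(shots)
```

Under this assumed definition, an object that appears in every shot of an event gets an affinity of 1.0 for that event, and objects that never co-occur with an event are simply absent from its row.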


