We all know what the abstract of an article is: a short summary of a document, often used to preselect material relevant to the user. The medium of the abstract and the document are the same, namely text. In the age of multimedia, it would be desirable to use video abstracts in very much the same way: as short clips containing the essence of a longer video, without a break in the presentation medium. However, the state of the art is to use textual abstracts for indexing and searching large video archives. This media break is harmful since it typically leads to considerable loss of information. For example, it is unclear at what level of abstraction the textual description should be; if we see a famous politician at a dinner table with a group of other politicians, what should the text say? Should it specify the names of the people, give their titles, specify the event, or just describe the scene as if it were a painting, emphasizing colors and geometry? An audio-visual abstract, to be interpreted by a human user, is semantically much richer than a text. We define a video abstract to be a sequence of moving images, extracted from a longer video, much shorter than the original, and preserving the essential message of the original.
The power of visual abstracts can be helpful in many application contexts. Let us look at some examples.
Multimedia archives. With the advent of multimedia PCs and workstations, the World Wide Web and standardized video compression techniques, more and more video material is being digitized and archived worldwide. Wherever digital video material is stored, we can use video abstracts for indexing and retrieval. For instance, on-line abstracts could support journalists when searching old video material or when producing documentaries. Another example is the Internet movie database IMDb on the Web (http://uk.imdb.com/). It is indexed on the basis of "hand-made" textual information about the movies; sometimes a short clip, selected at random, is also included. The index could easily be extended by automatically generated video abstracts.
Movie marketing. Trailers are widely used for movie advertising in cinemas and on television. Currently the production of this type of abstract is quite costly and time-consuming. With our system we could produce trailers automatically. In order to tailor a trailer to a specific audience, we would set certain parameters such as the desired amount of action or of violence. Another possibility would be a digital TV magazine. Instead of reading short textual descriptions of upcoming programs, you could view the abstracts without even having to get up from your couch (supposing you have an integrated TV set and Web browser). And for digital video-on-demand systems, the content provider could supply video abstracts in an integrated fashion.
Home entertainment. If you miss an episode of your favorite television series, the abstracting system could tell you briefly what happened "in the meantime".
Many more innovative applications could be built around the basic video abstracting technique.
We have implemented a system called VAbstract that automatically produces a trailer from a longer film, much like the trailers shown in cinemas to advertise upcoming movies.
VAbstract takes into account different target groups and produces a concise summary according to quality requirements which we have set up. VAbstract uses only unchanged material from the original movie. The reason is simple: we assume that in a video-on-demand archive only the original movie is available, with no further material from the movie production process. VAbstract is a fully automatic abstracting system whose behavior is controlled by the parameters passed to it.
Two fundamentally different kinds of abstract can be produced from the picture stream: still-picture abstracts and moving-picture abstracts. A still-picture abstract is a collection of single, salient pictures taken from different places in the original. If one frame is extracted per scene, these frames are called keyframes, as they identify a scene. A moving-picture abstract consists of a collection of picture sequences from the original movie and is thus a proper movie itself. VAbstract is such a moving-picture abstracting system.
Before scenes are extracted, scene limits are detected with a cut detection algorithm. Cuts have to be considered in both video and audio so that the abstract does not include meaningless audio fragments. A scene limit is therefore defined by both an audio cut and a video cut.
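To illustrate the definition of a scene limit, here is a minimal Python sketch. It is not taken from VAbstract (which is written in ANSI C, and whose audio modules are not yet implemented); it only assumes that cut detection has already produced two lists of cut times in seconds. The function name `scene_limits` and the `tolerance` parameter are hypothetical: the text does not say how closely an audio cut and a video cut must coincide.

```python
from bisect import bisect_left


def scene_limits(video_cuts, audio_cuts, tolerance=0.5):
    """Return the video cut times that are matched by an audio cut.

    video_cuts, audio_cuts: cut times in seconds (any order).
    tolerance: maximum distance in seconds between a video cut and an
        audio cut for them to count as the same scene limit. This value
        is an assumption made for illustration only.
    """
    audio = sorted(audio_cuts)
    limits = []
    for t in sorted(video_cuts):
        # Index of the first audio cut at or after (t - tolerance).
        i = bisect_left(audio, t - tolerance)
        # Accept the video cut if that audio cut is not later than t + tolerance.
        if i < len(audio) and audio[i] <= t + tolerance:
            limits.append(t)
    return limits


if __name__ == "__main__":
    video = [10.0, 25.0, 40.2, 61.0]
    audio = [9.8, 40.0, 55.0]
    print(scene_limits(video, audio))
```

In this sketch a video cut without a nearby audio cut (for example a cut in the middle of a sentence) is not treated as a scene limit, so a scene is never ended while the sound continues.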
The most meaningful scenes are extracted and recomposed into an abstract in their original order. To speed up the extraction procedure, the video is divided into 5 parts of equal length, and the first appropriate scene found by each algorithm is used. (Feature films are divided into 6 parts and the last part is not used, so that the end of the film is not revealed in the abstract.)
The following partial algorithms are used to extract scenes for the abstract from each part:
At the end of this whole procedure, we get a list of scenes that is used for the synthesis of the abstract.
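The selection procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not VAbstract's actual code: scenes are represented as `(start, end)` tuples in seconds, a scene is assigned to the part in which it starts, and each partial algorithm is modeled as a hypothetical predicate that says whether a scene is appropriate. The name `select_scenes` and the example predicates are invented for this sketch.

```python
def select_scenes(scenes, total_length, algorithms, feature_film=False):
    """Pick, per part, the first appropriate scene for each algorithm.

    scenes: list of (start, end) tuples in seconds, sorted by start time.
    total_length: length of the whole video in seconds.
    algorithms: list of predicates scene -> bool (hypothetical stand-ins
        for the partial algorithms, which are not specified here).
    feature_film: if True, use 6 parts and skip the last one so that the
        ending is not revealed; otherwise use 5 parts.
    """
    n_parts = 6 if feature_film else 5
    used_parts = n_parts - 1 if feature_film else n_parts
    part_len = total_length / n_parts

    chosen = set()
    for p in range(used_parts):
        lo, hi = p * part_len, (p + 1) * part_len
        # Assumption: a scene belongs to the part in which it starts.
        in_part = [s for s in scenes if lo <= s[0] < hi]
        for accepts in algorithms:
            for scene in in_part:
                if accepts(scene):
                    chosen.add(scene)  # first match only, then stop searching
                    break
    # Recompose the chosen scenes in their order within the original.
    return sorted(chosen)


if __name__ == "__main__":
    scenes = [(0, 10), (10, 30), (30, 35), (60, 80), (90, 95), (110, 115)]
    is_long = lambda s: s[1] - s[0] >= 15   # hypothetical criterion
    is_short = lambda s: s[1] - s[0] <= 5   # hypothetical criterion
    print(select_scenes(scenes, 120, [is_long, is_short], feature_film=True))
```

Stopping at the first match in each part is what makes the procedure fast: no part is searched exhaustively, and the fixed partition still spreads the selected scenes over the whole film.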
Several films have been abstracted by VAbstract, including feature films and documentaries. These tests yielded a lot of valuable information, especially about suitable threshold values for the partial algorithms. We were even able to compare an abstract produced automatically by VAbstract with a professionally produced one, which demonstrated the quality of VAbstract's output.
VAbstract is implemented in about 1500 lines of ANSI C using the Vista library V2.1.3. The audio modules and title addition are still missing. The movies were recorded from German television, digitized by a Parallax video card and stored as a collection of single JPEG pictures. Two example application interfaces were built on top of VAbstract in about another 2500 lines of Tcl/Tk code: a user interface which helps a video library user select a video, and a "provider" interface which supports a video library provider in constructing and administering a video library.
Short Demo Video (MPEG-1: 31MB)
Other example videos:
Recorded from German TV (without sound, converted to MPEG):
Fred Feuerstein (The Flintstones): "Fred, der Aufsteiger"
(This film is used for scientific purposes only. We hope not to break any law by presenting the entire film in really bad quality and an automatically produced abstract of it.)
Large video-on-demand databases consisting of thousands of digital movies are not easy to handle: the user must have an attractive means to retrieve his movie of choice. For analog video, movie trailers are produced to allow a quick preview and perhaps to attract potential buyers. The following literature presents techniques to automatically produce such movie abstracts of digital videos.