Many video-on-demand databases containing thousands of films have now been built. They hold a large amount of data but very little information about that data. The goal of the MoCA (Movie Content Analysis) project is to extract the information hidden in the video and audio through automatic content analysis, a process that includes video, audio and text processing, so that users can select exactly the movies they want. A first step toward that ambitious goal is MoCA genre recognition.
Our approach to genre recognition is to obtain a variety of statistical data from the films. In step one we gather raw statistics. In the video domain these include frame-to-frame pixel differences, image homogeneity, hue, saturation and motion vectors; in the audio domain, loudness and frequency statistics. In step two we derive higher-level information from the statistics of step one: camera motion, cuts, fades and dissolves in the video domain, and the amount of silence and human voice per scene and the overall loudness in the audio domain. In step three we compare the data obtained in steps one and two to genre profiles using a chi-square test. We assign a film to a specific genre if its data is very similar to one of these profiles.
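As an illustration only, the following sketch shows one raw statistic from step one (a frame-to-frame pixel difference) and the profile comparison from the last step. It is not the MoCA implementation: the function names, the representation of a frame as a flat list of grey values, the histogram-shaped profiles, the genre labels and the acceptance threshold are all assumptions made for this example. The comparison uses the Pearson chi-square statistic, with each profile rescaled to the film's total count.

```python
from typing import Dict, Optional, Sequence


def mean_abs_frame_difference(prev_frame: Sequence[float],
                              curr_frame: Sequence[float]) -> float:
    """Mean absolute grey-value difference between two frames.

    Assumption: each frame is a flat sequence of grey values of equal length.
    """
    if len(prev_frame) != len(curr_frame) or not prev_frame:
        raise ValueError("frames must be non-empty and of equal size")
    total = sum(abs(a - b) for a, b in zip(prev_frame, curr_frame))
    return total / len(prev_frame)


def chi_square_statistic(observed: Sequence[float],
                         expected: Sequence[float]) -> float:
    """Pearson chi-square statistic: sum of (o - e)^2 / e over all bins.

    The expected histogram is rescaled to the observed total first.
    Bins whose expected count is zero are skipped (a simplification).
    """
    if len(observed) != len(expected):
        raise ValueError("histograms must have the same number of bins")
    total_obs, total_exp = sum(observed), sum(expected)
    if total_obs <= 0 or total_exp <= 0:
        raise ValueError("histograms must have a positive total")
    scale = total_obs / total_exp
    stat = 0.0
    for o, e in zip(observed, expected):
        e_scaled = e * scale
        if e_scaled > 0:
            stat += (o - e_scaled) ** 2 / e_scaled
    return stat


def classify_genre(film_histogram: Sequence[float],
                   profiles: Dict[str, Sequence[float]],
                   threshold: float) -> Optional[str]:
    """Return the best-matching genre, or None if no profile is close enough.

    `threshold` is a hypothetical cut-off on the chi-square statistic.
    """
    best_genre, best_stat = None, float("inf")
    for genre, profile in profiles.items():
        stat = chi_square_statistic(film_histogram, profile)
        if stat < best_stat:
            best_genre, best_stat = genre, stat
    return best_genre if best_stat <= threshold else None


if __name__ == "__main__":
    # Hypothetical 3-bin profiles (e.g. low / medium / high frame difference).
    profiles = {"news": [70, 20, 10], "sports": [20, 30, 50]}
    print(classify_genre([65, 25, 10], profiles, threshold=5.0))  # prints: news
```

A smaller statistic means a closer match, so the sketch picks the profile with the minimum value and rejects the film when even that minimum exceeds the threshold. A real system would combine many such statistics per film rather than a single histogram.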