The search criteria used by each search engine depend on its nature and on the purpose of its searches.
=== Metadata ===
Metadata is information about the data itself: for example, who the author of a video is, its creation date, its duration, and any other information that can be extracted from and embedded in the file. On the Internet, metadata is often encoded in XML, a format that works well across the web and is also human-readable. The information contained in these files is therefore often the easiest way to find data of interest. Videos carry two types of metadata: internal metadata embedded in the video file itself, and external metadata found on the page where the video is hosted. In both cases the metadata should be optimized so that it is indexed well.
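As an illustration, a hypothetical XML metadata record for a video might look like the one below, and can be read with Python's standard library. The element names here are illustrative, not a real metadata standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML metadata record for a video (the schema is
# illustrative, not a real standard).
XML_METADATA = """
<video>
  <title>Interview with the author</title>
  <author>Jane Doe</author>
  <creationDate>2011-05-14</creationDate>
  <duration>312</duration>
</video>
"""

def read_metadata(xml_text):
    """Extract the child elements of the record into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}

metadata = read_metadata(XML_METADATA)
print(metadata["title"])     # Interview with the author
print(metadata["duration"])  # 312
```

Because the format is plain text, both search robots and people can read the same record, which is what makes XML metadata easy to index.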
==== Internal metadata ====
All video formats incorporate their own metadata fields, such as the title, description, encoding quality, or a transcription of the content. Programs such as FLV MetaData Injector, Sorenson Squeeze or Castfire exist to review these data; each has its own utilities and specifications. Converting a video from one format to another can lose much of this metadata, so the information in the new format should be checked for correctness. It is therefore advisable to keep the video in multiple formats, so that all search robots are able to find and index it.
==== External metadata ====
In most cases, the same mechanisms apply as in the positioning of an image or of text content.
===== Title and description =====
These are the most important factors when positioning a video, because they contain most of the necessary information. Titles should be clearly descriptive, and every word or phrase that is not useful should be removed.
===== Filename =====
The filename should be descriptive, including keywords that describe the video without the need to see its title or description. Ideally, the words are separated by dashes ("-").
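A minimal sketch of this practice: turning a video title into a lowercase, dash-separated filename (the helper name `slugify_filename` is ours, not a standard function).

```python
import re

def slugify_filename(title, extension="mp4"):
    """Lowercase the title, drop punctuation, and join the words with dashes."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words) + "." + extension

print(slugify_filename("How to Train Your Dog!"))  # how-to-train-your-dog.mp4
```

Dashes are preferred over underscores or spaces because search engines generally treat them as word separators, so each keyword in the filename remains individually searchable.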
===== Tags =====
The page where the video is hosted should include a list of keywords marked up with the "rel-tag" microformat. Search engines use these words as a basis for organizing information.
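A sketch of how a crawler might collect such keywords, using Python's standard HTML parser on an illustrative page fragment (the markup and class name are our own examples; rel-tag itself only requires `rel="tag"` on a link):

```python
from html.parser import HTMLParser

# Sample page fragment using the rel-tag microformat (illustrative markup).
HTML_FRAGMENT = """
<p>Tags:
  <a href="/tags/cooking" rel="tag">cooking</a>,
  <a href="/tags/pasta" rel="tag">pasta</a>
</p>
"""

class RelTagExtractor(HTMLParser):
    """Collect the link text of every <a rel="tag"> element on the page."""
    def __init__(self):
        super().__init__()
        self.tags = []
        self._in_tag_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "a" and ("rel", "tag") in attrs:
            self._in_tag_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_tag_link = False

    def handle_data(self, data):
        if self._in_tag_link:
            self.tags.append(data.strip())

parser = RelTagExtractor()
parser.feed(HTML_FRAGMENT)
print(parser.tags)  # ['cooking', 'pasta']
```

The microformat's convention is that the last path segment of the link's URL is the tag; here the link text matches it, so the simpler text extraction suffices for the sketch.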
===== Transcription and subtitles =====
Although not completely standardized, there are formats that store text together with the point in time at which it occurs: SRT and SUB for subtitles, and TTXT for transcripts, which can also be used for subtitles.
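To show what this temporal component looks like, here is a small SRT sample and a simplified parser for it (real SRT files can vary in details such as formatting tags, which this sketch ignores):

```python
# A minimal SRT document: numbered cues, a start --> end time range,
# and one or more lines of subtitle text, separated by blank lines.
SRT_SAMPLE = """1
00:00:01,000 --> 00:00:04,000
Welcome to the lecture.

2
00:00:04,500 --> 00:00:07,250
Today we talk about video metadata.
"""

def parse_srt(text):
    """Split an SRT document into (index, start, end, subtitle) tuples."""
    cues = []
    for block in text.strip().split("\n\n"):
        lines = block.splitlines()
        index = int(lines[0])
        start, end = lines[1].split(" --> ")
        cues.append((index, start, end, " ".join(lines[2:])))
    return cues

cues = parse_srt(SRT_SAMPLE)
print(cues[0])  # (1, '00:00:01,000', '00:00:04,000', 'Welcome to the lecture.')
```

Because each piece of text is anchored to a time range, a search engine that indexes these cues can jump straight to the moment in the video where a phrase is spoken.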
=== Speech recognition ===
Speech recognition produces a transcript of the audio track of a video as a text file. With the help of a phrase extractor, it is then easy to determine whether the video content is of interest. Some search engines use speech recognition not only to find videos, but also to locate the specific point in a multimedia file at which a given word or phrase occurs, so the user can jump directly to that point. Gaudi (Google Audio Indexing), a project developed by Google Labs, uses voice-recognition technology to locate the exact moment at which one or more words are spoken in an audio track, allowing the user to go directly to that moment. If the search query matches some videos from YouTube, the positions are indicated by yellow markers; hovering the mouse over a marker shows the transcribed text.
=== Speaker recognition ===
In addition to transcription, analysis can detect different speakers and sometimes attribute the speech to an identified, named speaker.
=== Text recognition ===
Text recognition can be very useful for recognizing characters in videos, such as "chyrons" (on-screen captions). As with speech recognition, there are search engines that use character recognition to play a video from a particular point. TalkMiner, an example of a search engine for specific fragments of videos based on text recognition, analyzes each video once per second, looking for the identifying signs of a slide, such as its shape and static nature; it captures the image of the slide and uses optical character recognition (OCR) to detect the words on it. These words are then indexed in TalkMiner's search engine, which currently offers users more than 20,000 videos from institutions such as Stanford University, the University of California at Berkeley, and TED.
=== Frame analysis ===
Visual descriptors make it possible to analyze the frames of a video and extract information that can be stored as metadata. The descriptions are generated automatically and can describe different aspects of the frames, such as color, texture, shape, motion, and situation.
=== Chaptering ===
Video analysis can lead to automatic chaptering, using techniques such as detecting changes of camera angle or identifying audio jingles. By knowing the typical structure of a video document, it is possible to identify the opening and closing credits, the content parts, and the beginning and end of advertising breaks.

== Ranking criterion ==