Overview

The search engine consists of three main components:

• An agent is a search robot. It crawls the network, downloading and analyzing documents. If a new link is found during site analysis, it is added to the robot's list of web addresses. Search robots are of the following types: spiders, which download sites the way a user's browser does; crawlers, which discover new, still unknown links by analyzing already known documents; and indexers, which analyze the detected web pages and add data to the index. The set of downloaded documents is divided into disjoint parts and stripped of markup.
• The index is a database compiled by the search engine's indexing robots. Documents are searched for in the index.
• The search engine. A search request from the user is sent to the least loaded server after the load on the search system has been analyzed. To make this possible, Yandex servers are clustered. The user request is then processed by a program called "Metapoisk".
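The routing step described above (send each request to the least loaded server in a cluster) can be illustrated with a minimal sketch. The `Server` class, the server names, and the use of an active-request count as the load metric are all assumptions made for illustration; the source does not say how Yandex measures load.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical cluster node; 'active_requests' is an assumed load metric."""
    name: str
    active_requests: int


def pick_least_loaded(servers):
    """Return the server with the smallest current load."""
    if not servers:
        raise ValueError("cluster is empty")
    return min(servers, key=lambda s: s.active_requests)


def route_request(servers):
    """Choose a server for a new request and record the extra load on it."""
    target = pick_least_loaded(servers)
    target.active_requests += 1
    return target


# Hypothetical cluster state
cluster = [Server("search-a", 12), Server("search-b", 3), Server("search-c", 7)]
chosen = route_request(cluster)
print(chosen.name)
```

A production balancer would also need health checks and a fresher view of load than a simple counter; this sketch shows only the selection rule.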
The search engine indexes documents in the following formats: html, pdf, rtf, doc, xls, ppt, docx, odt, odp, ods, odg, xlsx, pptx. It can also index text inside Shockwave Flash objects (provided the text is not embedded in an image) if these objects are served as a separate page with the MIME type application/x-shockwave-flash, as well as files with the .swf extension.

Yandex has two scanning robots: the “main” one and the “fast” one. The first is responsible for the whole Internet; the second indexes sites whose information changes and is updated frequently (news sites and news agencies). In 2010, the “fast” robot received a new technology called “Orange”, developed jointly by the California and Moscow divisions of Yandex.

Yandex robots identify themselves with the following user agent strings:

• Mozilla/5.0 (compatible; YandexBot/3.0) - the main indexing bot
• Mozilla/5.0 (compatible; YandexBot/3.0; MirrorDetector) - a bot that detects site mirrors. If several sites have the same content, only one is shown in the search results.
• Mozilla/5.0 (compatible; YandexImages/3.0) - Yandex image indexer
• Mozilla/5.0 (compatible; YandexVideo/3.0) - Yandex video indexer
• Mozilla/5.0 (compatible; YandexMedia/3.0) - multimedia data indexer
• Mozilla/5.0 (compatible; YandexBlogs/0.99; robot) - a search bot that indexes post comments
• Mozilla/5.0 (compatible; YandexAddurl/2.0) - a search bot that indexes pages submitted through the "Add URL" form
• Mozilla/5.0 (compatible; YandexDirect/2.0; Dyatel) - checks Yandex Direct
• Mozilla/5.0 (compatible; YandexMetrika/2.0) - Yandex Metrica indexer
• Mozilla/5.0 (compatible; YandexCatalog/3.0; Dyatel) - checks Yandex Catalog
• Mozilla/5.0 (compatible; YandexNews/3.0) - Yandex News indexer
• Mozilla/5.0 (compatible; YandexAntivirus/2.0) - Yandex anti-virus bot
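The strings listed above share one shape: a robot name, a version, and optional extra tokens such as MirrorDetector or Dyatel. As a rough sketch, a site owner could pull these parts out of a request's User-Agent header as shown below. The regular expression and the function name `parse_yandex_user_agent` are illustrative assumptions derived only from the strings in this list, not an official Yandex specification. Because a User-Agent header can be set to anything by the client, a match is not proof that the request really came from Yandex.

```python
import re

# Pattern inferred from the strings listed above (an assumption, not a spec):
# "(compatible; <RobotName>/<version>[; <extra token>]...)"
YANDEX_UA = re.compile(r"\(compatible; (Yandex\w+)/([\d.]+)((?:; [^;)]+)*)\)")


def parse_yandex_user_agent(user_agent):
    """Return (robot_name, version, extra_tokens), or None if nothing matches."""
    match = YANDEX_UA.search(user_agent)
    if match is None:
        return None
    name, version, rest = match.groups()
    extras = [token.strip() for token in rest.split(";") if token.strip()]
    return name, version, extras


print(parse_yandex_user_agent("Mozilla/5.0 (compatible; YandexBot/3.0; MirrorDetector)"))
print(parse_yandex_user_agent("Mozilla/5.0 (compatible; YandexImages/3.0)"))
```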
Query language

The following operators are used to refine a query:

• "" - exact quote
• | - placed between words to find any one of them
• * - placed between words to stand for a missing word
• site: - search on a specific site
• date: - search for documents by date, for example, date:2007
• + - placed before a word that must appear in the document
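To show how these operators combine, here is a small sketch that assembles a query string from them. The helper names (`exact`, `any_of`, `require`, `on_site`, `by_date`) and the host example.com are hypothetical, and the spacing around each operator is an assumption; the sketch only builds text and does not contact any search service.

```python
def exact(phrase):
    """Wrap a phrase in double quotes for an exact-quote search."""
    return f'"{phrase}"'


def any_of(*words):
    """Join alternatives with '|' so that any one of them may match."""
    return " | ".join(words)


def require(word):
    """Prefix a word with '+' to mark it as mandatory in the document."""
    return f"+{word}"


def on_site(host):
    """Restrict the search to one site."""
    return f"site:{host}"


def by_date(date):
    """Restrict the search to documents with the given date."""
    return f"date:{date}"


# Hypothetical query: exact phrase, a required word, one site, one year
query = " ".join([exact("search engine"), require("index"),
                  on_site("example.com"), by_date("2007")])
print(query)
```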
Search results

Along with the original “exact form” of the query, Yandex automatically searches for its variations and reformulations. Yandex search takes the morphology of the Russian language into account, so regardless of the form of a word in the query, the search covers all of its word forms. If morphological analysis is undesirable, you can put an exclamation mark (!) before the word; the search will then match only that specific form. In addition, the search practically ignores so-called stop-words, that is, prepositions, punctuation, pronouns, etc., because they are so widespread.

Relevance is determined by a ranking formula, which is constantly updated using machine learning algorithms. The search is performed in Russian, English, French, German, Ukrainian, Belarusian, Tatar, and Kazakh.

Search results can be sorted by relevance or by date (buttons below the search results). The results page consists of 10 links with short annotations called “snippets”. A snippet includes a text comment, link, address, popular sections of the site, pages on social networks, etc. As an alternative to snippets, Yandex introduced a new interface called “Islands” in 2014.

Yandex implements a “parallel searches” mechanism: together with the web search, a search is performed on Yandex services such as Catalog, News, Market, Encyclopedias, Images, etc. As a result, in response to a user's request, the system shows not only textual information but also links to video files, pictures, dictionary entries, etc.

A distinctive feature of the search engine is also its "intent search" technology, meaning search aimed at solving the user's problem. Elements of intent search include dialog prompts for ambiguous requests, automatic text translation, information about the characteristics of a requested car, etc. For example, for the query “Boris Grebenshchikov - Golden City”, the system shows a form for listening to music online from the Yandex Music service, and for the query "st. Koroleva 12" it shows a fragment of the map with the object marked on it.
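The query handling described at the start of this section (stop-words are largely ignored, and a leading exclamation mark pins a word to its exact form) can be pictured with a toy sketch. The stop-word list, the function name `analyze_query`, and the rule that stop-words are dropped even when marked with "!" are assumptions made for illustration; the sketch does no real morphological analysis and is not Yandex's implementation.

```python
# Hypothetical English stop-word list; a real list would be far larger
STOP_WORDS = {"the", "of", "in", "a", "and", "it"}


def analyze_query(query):
    """Split a query into (word, exact_form) pairs, dropping stop-words.

    exact_form is True when the word was written with a leading '!',
    meaning only that specific form should be matched.
    """
    terms = []
    for token in query.split():
        exact_form = token.startswith("!")
        word = token.lstrip("!").strip(".,;:?").lower()
        if not word or word in STOP_WORDS:
            continue  # punctuation-only tokens and stop-words are ignored
        terms.append((word, exact_form))
    return terms


print(analyze_query("the history of !cities"))
```

In a fuller system, terms with `exact_form` set to False would be expanded to all their word forms before the index lookup.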
Promotion of misinformation and propaganda

Search results from the Yandex search engine tend to favor Russian media sources, including state media, and Yandex-delivered ads tend to promote misinformation and propaganda produced by more than half a dozen Russian-language news sites. One study found that Yandex-delivered adverts ran alongside false stories about US bioweapons labs in Ukraine, claims that Ukrainian President Volodymyr Zelenskiy is a drug user, and reports repeating Kremlin claims that the war against Ukraine is going entirely to plan. Other fake news promoted by Yandex ads referred to the Russian invasion using Kremlin talking points, calling the war an “operation to denazify and demilitarise Ukraine”.

By 2016, Yandex had slipped to third place, with Google in first.

Web page checking and user warnings appeared on Yandex in 2009: since then, a dangerous site on the search results page has been accompanied by the note “This site may threaten the security of your computer”. Two technologies are used together to detect threats. The first was purchased from the antivirus vendor Sophos and is based on a signature approach: when a web page is accessed, the antivirus system consults a database of already known viruses and malware. This approach is fast, but practically powerless against new viruses that have not yet entered the database. Therefore, alongside the signature approach, Yandex also uses its own antivirus complex, based on behavioral analysis. When accessing a site, the Yandex program checks whether the site requested additional files from the browser, redirected it to an extraneous resource, etc. If the site is found to perform certain actions without the user's permission (launching cascading style sheets, JavaScript modules, or complete programs), it is placed on the “black list” and in the database of virus signatures. Information about the infection appears in the search results, and the site owner receives a notification through the Yandex.Webmaster service. After the first check, Yandex performs a second one, and if the infection is confirmed again, checks become more frequent until the threat is eliminated. Infected sites make up no more than 1% of the Yandex database. Approximately one billion sites are checked monthly.
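The trade-off of the signature approach described above (fast lookup, but blind to anything not yet in the database) can be seen in a deliberately simplified sketch. Here a "signature" is just the SHA-256 hash of the whole payload, which is an assumption for illustration; real antivirus signatures typically match byte patterns rather than whole-file hashes, and nothing here reflects the actual Sophos or Yandex implementation.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Simplified 'signature': the SHA-256 hex digest of the content."""
    return hashlib.sha256(content).hexdigest()


def is_known_threat(content: bytes, signatures) -> bool:
    """Fast set lookup; it can only recognise payloads already recorded."""
    return fingerprint(content) in signatures


# Hypothetical database holding one previously seen malicious payload
known_signatures = {fingerprint(b"known malicious payload")}

print(is_known_threat(b"known malicious payload", known_signatures))
print(is_known_threat(b"new, unseen payload", known_signatures))
```

The second lookup fails even if the payload is harmful, which is the gap the behavioral checks are meant to cover.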
Search hints

As the user types a query in the search bar, the search engine offers hints in the form of a drop-down list. Hints appear even before the search results appear and allow you to refine the query, correct the keyboard layout or a typo, or go directly to the site you are looking for. For each user, hints are generated based on the history of their search queries using the My Finds service. In 2012, the so-called “Smart Search Hints” appeared, which instantly give out information about the main constants (equator length, speed of light, and so on) and traffic jams, and have a built-in calculator. In addition, the hints integrate a translator (the query “love in French” instantly gives out amour, affection), the schedule and results of football matches, exchange rates, weather forecasts, and more. You can find out the exact time by asking "what time is it". In 2011, hints in Yandex search were fully localized for 83 regions of Russia.

In addition to the search itself, hints are built into Yandex.Dictionaries, Yandex.Market, Yandex.Maps, and other Yandex services. The hint function is a consequence of the development of intent search technology; it first appeared in Yandex.Bar in August 2007, and in October 2008 it was introduced on the main page of the search engine. Available in both the desktop and mobile versions of the site, Yandex shows its users more than a billion search hints per day.

== History ==