Automated Similarity Judgment Program

The Automated Similarity Judgment Program (ASJP) is a collaborative project applying computational approaches to comparative linguistics using a database of word lists. The database is open access and consists of 40-item basic-vocabulary lists for well over half of the world's languages. It is continuously being expanded. In addition to isolates and languages of demonstrated genealogical groups, the database includes pidgins, creoles, mixed languages, and constructed languages. Words in the database are transcribed into a simplified standard orthography (ASJPcode). The database has been used to estimate the dates at which language families diverged into daughter languages (by a method related to, but distinct from, glottochronology), to determine the homeland (Urheimat) of a proto-language, to investigate sound symbolism, to evaluate different phylogenetic methods, and for several other purposes.

History

Original goals

ASJP was originally developed as a means of objectively evaluating the similarity of words with the same meaning from different languages, with the ultimate goal of classifying languages computationally on the basis of the lexical similarities observed. In the first ASJP paper … So word lists gathered subsequently contain only 40 items (or fewer, when attestations for some items are lacking).

Levenshtein distance

In papers published since 2008, ASJP has employed a similarity judgment program based on Levenshtein distance (LD). This approach was found to produce better classificatory results, measured against expert opinion, than the method used initially. LD is defined as the minimum number of successive changes necessary to convert one word into another, where each change is the insertion, deletion, or substitution of a symbol. Within the Levenshtein approach, differences in word length can be corrected for by dividing LD by the number of symbols in the longer of the two compared words. This produces normalized LD (LDN). An LDN divided (LDND) between two languages is calculated by dividing the average LDN for all word pairs involving the same meaning by the average LDN for all word pairs involving different meanings. This second normalization is intended to correct for chance similarity.
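The three measures described above (LD, LDN, and LDND) can be sketched in Python as follows. This is a minimal illustration of the definitions given in the text, not the ASJP project's own software. The function names, the dictionary-based word-list format, and the toy word lists are hypothetical. The sketch also assumes that every character of a transcription counts as one symbol and that each meaning has exactly one word per language.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, or substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def ldn(a: str, b: str) -> float:
    """LD divided by the number of symbols in the longer word."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def ldnd(list_a: dict[str, str], list_b: dict[str, str]) -> float:
    """Average same-meaning LDN divided by average different-meaning LDN.

    Assumes at least two shared meanings and a non-zero different-meaning average.
    """
    shared = sorted(set(list_a) & set(list_b))
    same = [ldn(list_a[m], list_b[m]) for m in shared]
    diff = [ldn(list_a[m], list_b[n]) for m in shared for n in shared if m != n]
    return (sum(same) / len(same)) / (sum(diff) / len(diff))


# Hypothetical two-item word lists (made-up strings, not real language data).
toy_a = {"one": "ab", "two": "cd"}
toy_b = {"one": "ab", "two": "ce"}
print(ldnd(toy_a, toy_b))  # → 0.25
```

In the toy example, the same-meaning pairs have LDN values of 0 and 0.5 (average 0.25), while both different-meaning pairs have an LDN of 1, so the LDND is 0.25. A lower value indicates that same-meaning words resemble each other more than chance pairings do.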

Word list

The ASJP uses the following 40-word list. It is similar to the Swadesh–Yakhontov list, but has some differences.

Body parts
• eye
• ear
• nose
• tongue
• tooth
• hand
• knee
• blood
• bone
• breast (woman's)
• liver
• skin

Animals and plants
• louse
• dog
• fish (noun)
• horn (animal part)
• tree
• leaf

People
• person
• name (noun)

Nature
• sun
• star
• water
• fire
• stone
• path
• mountain
• night (dark time)

Verbs and adjectives
• drink (verb)
• die
• see
• hear
• come
• new
• full

Numerals and pronouns
• one
• two
• I
• you
• we

ASJPcode

The 2016 version of ASJP uses the following symbols to encode phonemes:

p b f v m w 8 t d s z c n r l S Z C j T 5 y k g x N q X h 7 L 4 G ! i e E 3 a u o

They represent 7 vowels and 34 consonants, all found on a standard QWERTY keyboard. A ~ mark follows two consonants so that they are considered to be in the same position. Thus, … becomes …. Syllables like …, …, and … are considered lexically similar to …. Similarly, a $ mark follows three consonants so that they are considered to be in the same position. … is considered similar to …, … and …. A " mark marks the preceding consonant as glottalized.
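The idea of treating two or three consonants as occupying a single position can be sketched as a small tokenizer. This is an illustrative sketch only, not the ASJP software. It assumes that the two-consonant mark is `~` and the three-consonant mark is `$`, as in published descriptions of ASJPcode. The function name `positions` and the sample strings are hypothetical, and the glottalization mark is deliberately not handled.

```python
# Symbol inventory as listed in the text: 7 vowels and 34 consonants.
VOWELS = set("ieE3auo")
CONSONANTS = set("pbfvmw8tdszcnrlSZCjT5ykgxNqXh7L4G!")


def positions(word: str, pair_mark: str = "~", triple_mark: str = "$") -> list[str]:
    """Split a transcription into positions, merging marked consonant groups.

    pair_mark and triple_mark are assumptions about the modifier symbols.
    """
    out: list[str] = []
    for ch in word:
        if ch in (pair_mark, triple_mark):
            size = 2 if ch == pair_mark else 3
            # The mark must directly follow `size` single consonants.
            if len(out) < size or any(s not in CONSONANTS for s in out[-size:]):
                raise ValueError(f"{ch!r} must follow {size} consonants in {word!r}")
            group = "".join(out[-size:])
            del out[-size:]
            out.append(group)  # the whole group now occupies one position
        elif ch in VOWELS or ch in CONSONANTS:
            out.append(ch)
        else:
            raise ValueError(f"unknown symbol {ch!r} in {word!r}")
    return out


# Hypothetical transcriptions used only to exercise the function.
print(positions("kw~at"))   # → ['kw', 'a', 't']
print(positions("ndy$im"))  # → ['ndy', 'i', 'm']
```

Grouping the marked consonants into one list element means that a position-by-position comparison sees three positions in both sample strings, rather than four or five separate symbols.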

Source: Wikipedia ↗
