The affiliation of Bai is obscured by over two millennia of influence from
varieties of Chinese, leaving most of its lexicon related to Chinese
etyma of various periods. To determine its origin, researchers must first identify and remove from consideration the various layers of
loanwords and then examine the residue. In his survey of the field, Wang (2006) notes that early work was hampered by a lack of data on Bai and uncertainties in the reconstruction of early forms of Chinese. Recent authors have suggested that Bai is an early offshoot from Chinese, a sister language to Chinese, or more distantly related (though usually still
Sino-Tibetan). There are different tonal correspondences in the various layers. Many words can be identified as later Chinese loans because they display Chinese sound changes from the last two millennia:
• labiodental fricatives, which developed from earlier labial stops in certain environments;
• palatal affricates, which developed from earlier velar stops in palatal environments;
• aspirated stops, which developed from earlier voiced stops in words having the Middle Chinese level tone;
• the initial l-, which developed from Old Chinese *r-.
Some of these changes date back to the first centuries AD.

The oldest layer of Bai vocabulary with Chinese cognates, of which Wang lists some 250 words, includes common Bai words that were also common in
Classical Chinese, but are not used in modern varieties of Chinese. Its features have been compared with current ideas on Old Chinese phonology:
• The voiceless nasals and lateral postulated for Old Chinese are absent, though in some cases the reflexes match those in western dialects of Han Chinese, rather than those of eastern dialects from which Middle Chinese and most modern varieties are descended.
• Where Middle Chinese has l-, thought to be a reflex of Old Chinese *r, Bai varieties have before , before a nasal final, and elsewhere. However, in words where Middle Chinese l- corresponds to in inland Min dialects, Bai often has a stop initial, providing support for Baxter and Sagart's suggestion that such initials derive from clusters.
• Old Chinese *l- generally has similar palatal and dental reflexes in Bai and Middle Chinese, but seems to be preserved in a few Bai words.
• The Old Chinese finals *-aw and *-u merged in Middle Chinese syllables without a palatal medial by the 4th century AD, but are still distinguished in Bai.
• Several words with Old Chinese *-ts, which developed to -j with the departing tone in Middle Chinese, produce tonal reflexes in Bai corresponding to an original stop coda.
Sergei Starostin suggests that these facts indicate a split from mainstream Chinese around the 2nd century BC, corresponding to the Western Han period. Wang argues that a few of the correspondences between his reconstructed Proto-Bai and Old Chinese cannot be explained by the Old Chinese forms, and that Chinese and Bai therefore form a Sino-Bai group. However, Gong suggests that at least some of these cases can be accounted for by refining the Proto-Bai reconstruction to take account of complementary distribution within Bai.

Starostin and Zhengzhang Shangfang have separately argued that the oldest Chinese layer accounts for all but an insignificant residue of Bai vocabulary, and that Bai is therefore an early branching from Chinese. On the other hand, Lee and Sagart (1998) argued that the various layers of Chinese vocabulary are loans, and that when they are removed, a significant non-Chinese residue remains, including 15 entries from the 100-word Swadesh list of basic vocabulary. They suggest that this residue shows similarities with Proto-Loloish.
James Matisoff (2001) argued that the comparison with Loloish is less persuasive when Bai varieties other than the Jianchuan dialect used by Lee and Sagart are considered, and that it is safer to treat Bai as an independent branch of Sino-Tibetan, though perhaps close to the neighbouring Loloish. Lee and Sagart (2008) refined their analysis, presenting the residue as a non-Chinese form of Sino-Tibetan, though not necessarily Loloish. They also note that this residue includes the Bai vocabulary relating to pig rearing and rice agriculture. Lee and Sagart's analysis has been further discussed by List (2009). Gong (2015) suggests that the residual layer may be Qiangic, pointing out that the Bai, like the Qiang, call themselves "white", whereas the Lolo use "black".

==Phonology==