Linguists analyze the Japanese inventory of consonant phonemes in significantly different ways. Smith recognizes only 12 underlying consonants (/m p b n t d s dz r k ɡ h/), whereas another analysis recognizes 16, equivalent to Smith's 12 plus the following 4 (/j w ts ɴ/), and a third recognizes 21, equivalent to Smith's 12 plus the following 9 (/j w ts tɕ (d)ʑ ɕ ɸ N Q/). Consonants inside parentheses in the table can be analyzed as
allophones of other phonemes, at least in native words. In loanwords, some of these sometimes occur phonemically. In some analyses, the glides/semivowels are not interpreted as consonant phonemes. In non-loanword vocabulary, they generally occur only in the sequences and , which are sometimes analyzed as
rising diphthongs rather than as consonant-vowel sequences. analyzes the glides as non-syllabic variants of the high vowel phonemes , arguing that the use of vs. may be predictable if both phonological and morphological context is taken into account.
===Phonetic notes===
====Details of articulation====
• are variously described as lamino-alveolar, apico-alveolar or apico-dental, or simply dental or denti-alveolar.
• are lamino-alveolar.
• are lamino(dorso)-alveolopalatal. The affricates are sometimes transcribed broadly as (standing for prepalatal ). The palatalized allophone of /n/ before /i/ or /j/ is also lamino-alveolopalatal or prepalatal, and so can be transcribed as , or more broadly as . One account reports its place of articulation as dentoalveolar or alveolar.
• /w/ is traditionally described as a velar approximant or labialized velar approximant or something between the two, or as the semivocalic equivalent of /u/ with little to no rounding, while a 2020 real-time MRI study found it is better described as a bilabial approximant.
• /h/ is [ç] before /i/ and /j/, and [ɸ] before /u/, coarticulated with the labial compression of that vowel. When not preceded by a pause, it may often be breathy-voiced [ɦ] rather than voiceless [h].
• Realization of the liquid phoneme /r/ varies greatly depending on environment and dialect. The prototypical and most common pronunciation is an apical tap, either alveolar or postalveolar. Utterance-initially and after , the tap is typically articulated in such a way that the tip of the tongue is at first momentarily in light contact with the alveolar ridge before being released rapidly by airflow. This sound is described variably as a tap, a "variant of ", "a kind of weak plosive", and "an affricate with short friction, ". The apical alveolar or postalveolar lateral approximant is a common variant in all conditions, particularly utterance-initially and before . According to one description, utterance-initially and intervocalically (that is, except after ), the lateral variant is better described as a tap rather than an approximant. The retroflex lateral approximant is also found before . In Tokyo's Shitamachi dialect, the alveolar trill is a variant marked with vulgarity. Other reported variants include the alveolar approximant, the alveolar stop, the retroflex flap, the lateral fricative, and the retroflex stop.
====Voice onset time====
At the start of a word, the voiceless stops are slightly aspirated: less so than English stops, but more than those in Spanish. Word-medial voiceless stops seem to be unaspirated on average. Phonetic studies in the 1980s observed an effect of accent as well as word position, with longer voice onset time (greater aspiration) in accented syllables than in unaccented syllables. A 2019 study of young adult speakers found that after a pause, word-initial voiced stops may be pronounced as plosives with zero or low positive voice onset time (categorizable as voiceless unaspirated or "short-lag" plosives); although these were significantly less aspirated on average than word-initial voiceless stops, some overlap in voice onset time was observed. A secondary cue to the distinction between voiced and voiceless stops in word-initial position is a pitch offset on the following vowel: vowels after word-initial (but not word-medial) voiceless stops start out with a higher pitch compared to vowels after voiced stops, even when the latter are phonetically devoiced. Word-medial voiced stops are normally fully voiced (or prevoiced), but may become non-plosives through lenition.
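The voice onset time categories mentioned above (prevoiced, short-lag, and aspirated or "long-lag") can be illustrated with a minimal sketch. The function name `classify_vot` and the 30 ms boundary between short-lag and long-lag are illustrative assumptions, not values taken from the studies described in this section; real category boundaries vary by language, speaker, and place of articulation.

```python
def classify_vot(vot_ms: float, long_lag_threshold_ms: float = 30.0) -> str:
    """Label a plosive token by its voice onset time (VOT) in milliseconds.

    Negative VOT means voicing starts before the release (prevoicing).
    The default threshold is a hypothetical, illustrative boundary.
    """
    if vot_ms < 0:
        return "prevoiced"
    if vot_ms <= long_lag_threshold_ms:
        return "short-lag"  # zero or low positive VOT: voiceless unaspirated
    return "long-lag"  # aspirated


if __name__ == "__main__":
    for sample in (-60.0, 10.0, 55.0):
        print(sample, classify_vot(sample))
```

Because the section reports overlap between word-initial voiced and voiceless stops, a single threshold like this cannot separate the two phoneme classes reliably; the pitch cue on the following vowel would need to be taken into account as well.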
====Lenition====
The voiced plosive phonemes have weakened non-plosive pronunciations that can be broadly transcribed as voiced fricatives, although they may be realized instead as voiced
approximants . There is no context where the non-plosive pronunciations are consistently used, but they occur most often between vowels: : These weakened pronunciations can occur after a vowel in the middle of a word, or when a word starting with follows a vowel-final word with no intervening pause. found that, as with the pronunciation of as vs. , the use of plosive vs. non-plosive realizations of is closely correlated with the time available to a speaker to articulate the consonant, which is affected by speech rate as well as the identity of the preceding sound. All three show a high (over 90%) rate of plosive pronunciations after or after a pause; after , plosive pronunciations occur at high (over 80%) rates for and , but less frequently for , probably because word-medial after is often pronounced instead as a
velar nasal (although the use of here may be declining for younger speakers). Across contexts, generally has a higher rate of plosive realizations than and .
===Moraic consonants===
Certain consonant sounds are called "moraic" because they count for a mora, a unit of timing or prosodic length. The phonemic analysis of moraic consonants is disputed. One approach, particularly popular among Japanese scholars, analyzes moraic consonants as the phonetic realization of special "mora phonemes": a mora nasal /N/, called the hatsuon, and a mora obstruent consonant /Q/, called the sokuon. The pronunciation of these sounds varies depending on context; because of this, they may be analyzed as "placeless" phonemes with no phonologically specified place of articulation. A competing approach rejects the transcriptions /N/ and /Q/ and the identification of moraic consonants as their own phonemes, treating them instead as the syllable-final realizations of other consonant phonemes (although some analysts prefer to avoid using the concept of syllables when discussing Japanese phonology).
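The idea that a moraic consonant counts for a mora by itself can be illustrated by counting morae in a kana string. The sketch below is a simplification under stated assumptions: the input is written purely in hiragana or katakana, every full-size kana (including ん/ン, っ/ッ, and the long-vowel mark ー) counts as one mora, and small kana such as ゃ or ォ merge with the preceding kana. The names `SMALL_KANA` and `count_morae` are hypothetical.

```python
import unicodedata

# Small kana that combine with the preceding kana into a single mora.
# The small tsu (っ/ッ) is deliberately NOT in this set: it is moraic.
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")


def count_morae(kana: str) -> int:
    """Count morae in a kana-only string (simplified sketch)."""
    # NFC keeps voiced kana such as ぽ as a single code point.
    normalized = unicodedata.normalize("NFC", kana)
    return sum(1 for ch in normalized if ch not in SMALL_KANA)


if __name__ == "__main__":
    for word in ("にっぽん", "きょう", "ロンドン"):
        print(word, count_morae(word))
```

Under these assumptions, にっぽん counts as four morae (に, っ, ぽ, ん) and きょう as two (きょ, う), which shows the moraic obstruent and the moraic nasal each contributing a full unit of timing.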
====Moraic nasal====
The moraic nasal or mora nasal (hiragana ん, katakana ン, romanized as n or n') can be interpreted as a syllable-final nasal consonant. Aside from
certain marginal exceptions, it is found only after a vowel, which is phonetically
nasalized in this context. It can be followed by a consonant, a vowel, or the end of a word. Its pronunciation varies depending on the sound that follows it (including across a word boundary).
• Before a plosive, affricate, nasal, or liquid, it is pronounced as a nasal consonant assimilated to the place of the following consonant.
• Before a vowel, an approximant, or a voiceless fricative, it is a
nasalized vowel or moraic semivowel that can be broadly transcribed as (its specific quality depends on the surrounding sounds). This pronunciation may also occur before the voiced fricatives , although more often, they are pronounced as affricates when preceded by the moraic nasal. At the end of an utterance, the moraic nasal is pronounced as a nasal segment with a variable place of articulation and variable degree of constriction. Its pronunciation in this position is traditionally described and transcribed as uvular , sometimes with the qualification that it is, or approaches, velar after front vowels. Some descriptions state that it may have incomplete occlusion and can potentially be realized as a nasalized vowel, as in intervocalic position. Instrumental studies in the 2010s showed that there is considerable variability in its pronunciation and that it often involves a lip closure or constriction. A study of
real-time MRI data collected between 2017 and 2019 found that the pronunciation of the moraic nasal in utterance-final position most often involves vocal tract closure with a tongue position that can range from uvular to alveolar: it is assimilated to the position of the preceding vowel (for example, uvular realizations were observed only after the back vowels ), but the range of overlap observed between similar vowel pairs suggests this assimilation is not a categorical allophonic rule, but a gradient phonetic process. 5% of the utterance-final samples of the moraic nasal were realized as nasalized vowels with no closure: in this case, appreciable tongue raising was observed only when the preceding vowel was . There are a variety of competing phonemic analyses of the moraic nasal. It may be transcribed with the non-
IPA symbol and analyzed as a "placeless" nasal. Some analysts do not categorize it as a phonological consonant. Alternatively, it may be analyzed as a
uvular nasal , based on the traditional description of its pronunciation before a pause. It is sometimes analyzed as a syllable-final allophone of the coronal nasal consonant , but this requires treating syllable or mora boundaries as potentially distinctive, because there is a clear contrast in pronunciation between the moraic nasal and non-moraic before a vowel or before : : Alternatively, in an analysis that treats syllabification as distinctive, the moraic nasal can be interpreted as an
archiphoneme (a contextual neutralization of otherwise contrastive phonemes), since there is no contrast in syllable-final position between and . Thus, depending on the analysis, a word like , pronounced phonetically as , could be phonemically transcribed as , , or .
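The contextual realizations of the moraic nasal described in this section can be summarized as a lookup on the class of the following sound. This is a minimal sketch, not a phonetic model: the class labels, the function name `moraic_nasal_realization`, and the use of `None` for the end of an utterance are assumptions made for illustration, and the returned strings are descriptions rather than IPA transcriptions.

```python
from typing import Optional

# Following-sound classes that trigger place assimilation.
ASSIMILATING = {"plosive", "affricate", "nasal", "liquid"}
# Following-sound classes before which there is no full oral closure.
VOCALIC = {"vowel", "approximant", "voiceless fricative"}


def moraic_nasal_realization(next_sound_class: Optional[str]) -> str:
    """Describe the moraic nasal given the class of the next sound.

    Pass None when the moraic nasal is utterance-final.
    """
    if next_sound_class is None:
        return "nasal segment with variable place and degree of constriction"
    if next_sound_class in ASSIMILATING:
        return "nasal consonant assimilated to the place of the following consonant"
    if next_sound_class in VOCALIC:
        return "nasalized vowel or moraic semivowel"
    if next_sound_class == "voiced fricative":
        # More often the fricative surfaces as an affricate, in which case
        # the assimilating branch applies instead.
        return "nasalized vowel (less common) or place-assimilated nasal before an affricate realization"
    raise ValueError(f"unknown sound class: {next_sound_class!r}")


if __name__ == "__main__":
    for ctx in ("plosive", "vowel", "voiced fricative", None):
        print(ctx, "->", moraic_nasal_realization(ctx))
```

The utterance-final case is kept deliberately vague because the instrumental studies cited above describe it as a gradient phonetic process rather than a categorical rule.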
====Moraic obstruent====
There is a contrast between short (or singleton) and long (or
geminate) consonant sounds. Compared to singleton consonants, geminate consonants have greater phonetic duration (realized for plosives and affricates in the form of a longer
hold phase before the release of the consonant, and for fricatives in the form of a longer period of frication). A geminate can be analyzed phonologically as a syllable-final consonant followed by a syllable-initial consonant (although the hypothesized syllable boundary is not evident at the phonetic level) and can be transcribed phonetically as two occurrences of the same consonant phone in sequence: a geminate plosive or affricate is pronounced with just one release, so the first portion of such a geminate may be transcribed as an
unreleased stop. As discussed above, geminate nasal consonants are normally analyzed as sequences of a
moraic nasal followed by a non-moraic nasal, e.g. , = , . In the case of non-nasal consonants,
gemination is mostly restricted by Japanese phonotactics to the voiceless
obstruents /p t k s/ and their allophones. (However, other consonant phonemes can appear as geminates in special contexts, such as in loanwords.) Geminate consonants can also be phonetically transcribed with a length mark, as in , but this notation obscures mora boundaries. uses the length marker to mark a moraic nasal, as , based on the fact that a moraic consonant by itself has the same prosodic weight as a consonant-vowel sequence: consequently, Vance transcribes Japanese geminates with two length markers, e.g. , , and refers to them as "extra-long" consonants, on the grounds that there is no acoustic boundary between two halves of a geminate. In the following transcriptions, geminates will be phonetically transcribed as two occurrences of the same consonant across a syllable boundary, the first being unreleased. : A common phonemic analysis treats all geminate obstruents as sequences starting with the same consonant: a "mora obstruent", called the in Japanese, which can be phonemically transcribed with the non-IPA character . According to this analysis, , , are phonemically , , . This analysis seems to be supported by the intuition of native speakers and matches the use in kana spelling of a single symbol, a small version of the
tsu sign (
hiragana ,
katakana ) to write the first half of any geminate obstruent. Some analyses treat as an underlyingly placeless consonant. Another approach dispenses with and treats geminate consonants as double consonant phonemes, that is, as sequences consisting of a consonant phoneme followed by itself. According to this analysis, , , are phonemically , , . Alternatively, since the contrast between different obstruent consonants such as , , is neutralized in syllable-final position, the first half of a geminate obstruent can be interpreted as an
archiphoneme (just as the moraic nasal can be interpreted as an archiphoneme representing the neutralization of the contrast between the nasal consonants , in syllable-final position). It has been suggested that the underlying phonemic representation of the
sokuon might be a
glottal stop . The sound is used in certain marginal forms that can be interpreted as containing not followed by another obstruent. For example, can be found
at the end of an exclamation, or before a sonorant in forms with
emphatic gemination, and is used as a written representation of in these contexts. This suggests that Japanese speakers identify as the default form of , or the form it takes when it is not possible for it to share its place and manner of articulation with a following obstruent. According to this analysis, , , are phonemically , , . Even if it can be phonemically analyzed as , the
sokuon is not always phonetically glottal. A study by used a video recording system and observed no glottal constriction during the pronunciation of Japanese geminate consonants. These results stand in conflict with the impressionistic descriptions of some authors, such as , who ascribes glottal tension to the first half of geminate consonants. An acoustic study by reported some evidence of
creaky voice being more frequent for vowels following geminate consonants in Japanese (although only one of three measures of creakiness showed a significant difference). concludes that the role of glottal tension in Japanese geminates requires further research.
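The "mora obstruent" analysis discussed in this section, in which the first half of any geminate obstruent is a single placeless phoneme that copies the following consonant, can be sketched as a string rewrite. The ASCII notation (a capital `Q` inside a romanized phonemic string), the one-character-per-segment simplification, the obstruent set, and the function name `expand_mora_obstruent` are all assumptions made for illustration. Following the glottal-stop proposal described above, a `Q` with no following obstruent is rendered here as a glottal stop.

```python
# Simplified obstruent set: the voiceless obstruents that geminate in core
# vocabulary, plus voiced ones that can geminate in loanwords.
OBSTRUENTS = set("ptksbdgz")


def expand_mora_obstruent(phonemic: str) -> str:
    """Replace each 'Q' with a copy of the following obstruent.

    If no obstruent follows (end of string, vowel, or sonorant), fall back
    to a glottal stop as the default form.
    """
    out = []
    for i, seg in enumerate(phonemic):
        if seg != "Q":
            out.append(seg)
            continue
        nxt = phonemic[i + 1] if i + 1 < len(phonemic) else ""
        out.append(nxt if nxt in OBSTRUENTS else "ʔ")
    return "".join(out)


if __name__ == "__main__":
    for form in ("kaQta", "kiQpu", "aQ"):
        print(form, "->", expand_mora_obstruent(form))
```

The output corresponds to the "double consonant" notation, so the same function also shows how the two competing phonemic transcriptions map onto each other; it says nothing about whether the phonetic realization actually involves glottal constriction, which the studies cited above leave open.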
===Voiced affricate vs. fricative===
The distinction between the voiced fricatives (originally allophones of ) and the voiced affricates (originally allophones of ) is
neutralized in Standard Japanese and in most (although not all) regional Japanese dialects. (Some dialects, e.g.
Tosa, retain the distinctions between and and between and , while others distinguish only and but not and . Yet others merge all four, e.g. north
Tōhoku.) argues that the difference between and may be marginally contrastive for some speakers, whereas denies that are ever distinguished in pronunciation from in adapted forms, regardless of whether the spellings and are used in writing. The sequence (as opposed to either or ) also has some marginal use in loanwords. An example is . In many cases a variant adaptation with exists.
===Alternations involving===
Aside from arguments based on loanword phonology, there is also disagreement about the phonemic analysis of native Japanese forms. Some verbs can be analyzed as having an underlying stem that ends in either or ; these become or respectively before inflectional suffixes that start with . In addition, notes that in casual speech, or in verb forms may undergo coalescence with a following (marking the conditional), forming and respectively, as in for 'if (I) lend' and for 'if (I) win'. On the other hand, per Vance, (more narrowly, ) can occur instead of for some speakers in contracted speech forms, such as for 'saying', for 'if one waits', and for 'if one speaks'; Vance notes these could be dismissed as non-phonemic rapid speech variants. argues that alternations in verb forms do not prove is phonemically , citing (with ) vs. , , , etc. as evidence that a stem-final consonant is not always maintained without phonemic change throughout a verb's conjugated forms, and ~ '(must not) read' as evidence that palatalization produced by vowel coalescence can result in alternation between different consonant phonemes.
===Competing phonemic analyses===
There are several alternatives to the interpretation of as allophones of before or . Some interpretations agree with the analysis of as an allophone of and as an allophone of (or ), but treat as the palatalized allophone of a
voiceless coronal affricate phoneme (to clarify that it is analyzed as a single phoneme, some linguists phonemically transcribe this affricate as or with the non-IPA symbol ). In this sort of analysis, = . Other interpretations treat as their own phonemes, while treating other palatalized consonants as allophones or clusters. The status of as phonemes rather than clusters ending in is argued to be supported by the stable use of the sequences in loanwords; in contrast, is somewhat unstable (it may be variably replaced with or ), and other consonant + sequences such as , are generally absent. (Aside from loanwords, also occur marginally in native vocabulary in certain exclamatory forms.) It has alternatively been suggested that pairs like vs. could be analyzed as vs. . objects to analyses like on the basis that the sequence is otherwise forbidden in Japanese phonology.
===Voiceless bilabial fricative===
In core vocabulary, the
voiceless bilabial fricative occurs only before . In this context, can be analyzed as an allophone of . Examples include () and (), which can be phonemically transcribed as , . Some descriptions of Japanese phonetics state that the initial sound of is not consistently produced as , but is sometimes a sound with weak or no bilabial friction that could be transcribed as (a voiceless approximant similar to the start of English "who"). In loanwords, can occur before other vowels or before . Examples include (), (), (), (), and (). Because of loanwords like these, the consonant is distinguished from before , as in the minimal pair () and () from English
fork and
hawk; likewise, is distinguished from before . Even in loanwords, is not distinguished from before : for example, English
hood and
food are both adopted as Japanese (). The integration of , , , and into contemporary spoken Standard Japanese seems to have been completed at some point after the middle of the twentieth century, in the post-war period: before then, these sequences of sounds seem to have been commonly used only in educated pronunciation. Loanwords borrowed more recently than around 1890 fairly consistently show as an adaptation of foreign . Some older borrowed forms show adaptation of foreign to Japanese before a vowel other than , such as and . Another old adaptation pattern replaced foreign with before a vowel other than , e.g.
film > . Both of these replacement strategies are largely obsolete nowadays, although certain old adapted forms continue to be used, sometimes with specialized meanings compared to a variant pronunciation: for example, tends to be restricted in modern use to photographic films, whereas is used for other senses of "film" such as movie films.
===Voiced bilabial fricative===
Spellings with the kana ヴ have been used in narrow transcriptions into Japanese, in an attempt to render a voiced labiodental fricative, [v], in other languages, which most Japanese speakers find difficult. The actual pronunciation of a foreign "v sound" is normally not distinguished from a Japanese /b/: there is normally no meaningful phonological or phonetic difference in pronunciation between variant spellings of the same loanword. An attempt at rendering [v] has been considered a "foreignism"; in other words, if an innovative Japanese speaker tries to pronounce it, they are treating it as part of a foreign word, rather than of a word that is fully integrated into the Japanese lexicon. According to some descriptions, the foreign [v] is realized in Japanese as a voiced bilabial fricative, [β], which already exists as an allophone of /b/ in the Yamato and Sino-Japanese strata, although it "seems to be much less fricative than the corresponding Castilian Spanish sound in lobo for instance". Irwin is non-committal on the phonemic status of . A different realization, a "voiced labiodental spirant," has also been suggested, but it has been questioned or rejected by other authors. Depending on the source language, a foreign "v sound" can alternatively be rendered (in Hepburn romanization) as b, v or w.
===Velar nasal onset===
For some speakers, the
velar nasal can occur as an onset in place of the
voiced velar plosive in certain conditions. Onset , called , is generally restricted to word-internal position, where it may occur either after a vowel (as in ) or after a moraic nasal (as in ). It is debated whether onset constitutes a separate phoneme or an allophone of . They are written the same way in kana, and native speakers have the intuition that the two sounds belong to the same phoneme. Speakers can be divided in three groups based on the extent to which they use in contexts where is not required: some consistently use , some never use , and some show variable use of versus (or ). Speakers who consistently use are a minority. The distribution of versus for these speakers mostly follows predictable rules (as described below): however, a number of complications and exceptions exist, and as a result, some linguists analyze as a distinct phoneme for consistent nasal speakers. The contrast has very low functional load, but it is possible to find or construct some pairs of words that are segmentally identical aside from the use of versus for consistent nasal speakers, such as () versus (). Another commonly cited pair is versus , although aside from the segmental difference in the consonant, these are prosodically distinct: the first is normally pronounced as two accent phrases, , whereas the second is pronounced as a single accent phrase (either or ).
====Distribution of [ŋ] vs. [ɡ]====
At the start of an independent word, all speakers use in almost all circumstances. However, postpositional particles, such as the subject marker , are pronounced with by consistent nasal speakers. In addition, a few words may be pronounced with even when they occur at the start of an utterance: examples include the conjunction and the word . In the middle of a native morpheme, consistent nasal speakers always use . But in the middle of foreign-stratum morphemes, may be used even by consistent nasal speakers. It is also possible for foreign morphemes to be pronounced with medial : there is considerable variability, but this may be more common in older borrowings (such as , from Portuguese ) or in borrowings that contained in the source language (such as , from Portuguese ). At the start of a morpheme in the middle of a word, either or may be possible, depending on the word. Only is possible after the honorific prefix (as in ) or at the start of a reduplicated mimetic morpheme (as in ). Consistent nasal speakers typically use at the start of the second morpheme of a bimorphemic Sino-Japanese word, or at the start of a morpheme that has undergone
rendaku (that is, one that begins with when pronounced as an independent word). In cases where the second morpheme in a compound starts with when used independently, the compound might be pronounced with either or by consistent nasal speakers: factors such as the lexical stratum of the morpheme may play a role, but it seems difficult to establish precise rules predicting which pronunciation occurs in this context, and the pronunciation of some words varies even among consistent nasal speakers, such as . The morpheme , is pronounced with when it is used as part of a compound numeral, as in (accented as ), although can potentially be pronounced as when it occurs non-initially in certain proper nouns or lexicalized compound words, such as (a male given name), (the name of a
festival for children aged seven, five or three), or (a night of the full moon). To summarize: :
====Sociolinguistics of [ŋ]====
The frequency of onset [ŋ] in Tokyo Japanese speech was falling as of 2008, and seems to have already been on the decline in 1940. Pronunciations with [ŋ] are generally less frequent for younger speakers, and even though the use of [ŋ] was traditionally prescribed as a feature of standard Japanese, pronunciations with [ɡ] seem in practice to have acquired a more prestigious status, as shown by studies that find higher rates of [ɡ] usage when speakers read words from a list. The frequency of [ŋ] also varies by region: it is rare in the southwestern Kansai dialects, but more common in the northeastern Tohoku dialects, with an intermediate frequency in the Kanto dialects (which include the Tokyo dialect).

==Vowels==