Santali, like all
Munda languages, is a suffixing
agglutinating language. It remains a subject of intense linguistic debate whether Santali and related languages such as
Mundari and
Kherwarian lects have recognizable
parts of speech (verbs, nouns, adjectives,...). Traditional grammatical descriptions often treat lexemes that take cases in a syntactical unit as parts of the nominal system, and those that take TAM/Person/Number as verbal. However, deeper analyses by Neukom (2001),
Hengeveld & Rijkhoff (2005), Peterson (2005), and Rau (2013) suggest that Santali is in fact a flexible language; that is, its lexemes are inherently underspecified for lexical category and can function in referential ("noun"), predicative ("verb"), or attributive ("modifier") roles, while
Evans & Osada (2005) and
Croft (2005) argue that the Kherwarian languages do possess defined, albeit fluid, word classes. According to Neukom (2001), about one-third of all Santali lexemes ("contentives") are rigid, underived verbs, which means they are syntactically restricted to the predicative function. The rest of the lexicon (nominals, proforms, adpositions, derived "nominals", etc.) is purely contentive and syntactically flexible. Currently, the
Oxford Handbook of Word Classes (2023) rates Santali as a Type I Flexible language.
Nouns Nouns are inflected for number and case.
Number Three numbers are distinguished: singular, dual and plural.
Case The case suffix follows the number suffix. The following cases are distinguished:
Possession Santali has possessive suffixes which are only used with kinship terms: 1st person
-ɲ, 2nd person
-m, 3rd person
-t. The suffixes do not distinguish possessor number.
Definiteness To mark nominals as definite, Santali morphology uses suffixes
-tɛtˀ for nouns, and
-ʈakˀ for pronouns, respectively.
Gender and noun class True gender distinction marking on nominals and verbs (like in Sanskrit, Hindi, other
Indo-Aryan and
Dravidian languages) does not exist in Santali. Native peripheral markers such as the genitive, locative markers, and nominalizers can be used to distinguish between animate and inanimate noun classes. For lexicalized gender distinction, there are several ways to mark the contrast between female and male: - Morphologically-marked modifiers borrowed from Indo-Aryan such as
-i for feminine, and
-a for masculine are found in certain lexemes:
• kuɽa ("boy") – kuɽi ("girl")
• bhola ("dog") – bholi ("bitch")
• mama ("maternal uncle") – məni ("maternal aunt")
• caɖra ("bald man") – cəɖri ("bald woman")
• bheɖa ("ram") – bheɖi ("sheep")
- Sex-based gender lexemes. These words are inherently gendered and cannot be inflected for gender, unlike the words listed above.
• dʒãwaj "husband" – bəhu "wife"
• bɔeha "brother" – misɛra "sister"
• ənɖiə "ox" – gəi "cow"
• kaɖa "male buffalo" – bitkil "female buffalo"
- Compounded sex-based gender. The head noun is compounded with a gender-denoting modifying word. Masculine compounds go with
ənɖiə,
sanɖi,
pɛ̄ʈhar,
kuɖu, and feminine objects go with
ɛŋga,
bətʃhi, and
pəʈhi.
• ənɖiə pusi "male cat" – ɛŋga pusi "female cat"
• sənɖi sim "rooster" – ɛŋga sim "hen"
• pɛ̄ʈhar mihu "male calf" – bətʃhi mihu "female calf"
• kuɖu sukri "boar" – pəʈhi sukri "pig"
Pronouns The personal pronouns in Santali distinguish inclusive and exclusive first person and anaphoric and demonstrative third person. The interrogative pronouns have different forms for animate ('who?') and inanimate ('what?'), and referential ('which?') vs. non-referential. The indefinite pronouns are: The demonstratives distinguish three degrees of deixis (proximate, distal, remote) and simple ('this', 'that', etc.) and particular ('just this', 'just that') forms.
Numerals The basic cardinal numbers (transcribed into Latin script IPA) are: The numerals are used with
numeral classifiers. Distributive numerals are formed by reduplicating the first consonant and vowel, e.g. 'two each'. Numbers basically follow a
base-10 pattern. Numbers from 11 to 19 are formed by addition, ('10') followed by the single-digit number (1 through 9). Multiples of ten are formed by multiplication: the single-digit number (2 through 9) is followed by ('10'). Some numbers are part of a base-20 number system. 20 can be or .
Adpositions Santali has quite a large number of postpositional words that can be added either to bare nominals or to the number suffixes and the definite marker. Some of them require the genitive case. There are also complex forms that combine a postposition with a case suffix.
Derivation New nominals can be derived from the stems of lexical verbs, adjectives, and other nouns by several different methods, including
affixation,
reduplication, and
compounding.
Suffixation: Two nominalising suffixes
-itʃˀ for animate, and
-akˀ for inanimate noun class, are used to form referential nominals. Verbs → nouns:
jɔm ('eat') >
jɔmakˀ ('food') adjectives → nouns:
nɔtɛ ('this side') >
nɔtɛn ('belonging to this side') >
nɔtɛnakˀ ('thing of this side') /
nɔtɛnitʃˀ ('one of this side')
ponɖ ('white') >
ponɖakˀ ('white thing') /
ponɖitʃˀ ('white one') suffixes → nouns:
ɔl-tɛ (write-INS) >
ɔltɛakˀ ('that with which one writes', i.e. a pen) An entire verbal construction can also be nominalised.
Infixation is the most productive derivation method in Santali. Infixes
-tV-,
-nV-,
-mV-,
-ɽV-, and
-pV- are often inserted into nouns, verbs, and adjectives to derive new words.
ɛhɔp ('begin') >
ɛtɔhɔp ('beginning')
rakap ('rise', 'ascend') >
ranakap ('development')
Prefixation in North Munda has been reduced to a few restricted exceptions.
tʃɛt ('teach') >
matʃɛt ('teacher') Despite bearing noun-like semantics, the derived forms remain precategorial and can appear in other functions, although such uses are probably seldom attested.
Verbs Verbs in Santali inflect for tense, aspect and mood, voice and the person and number of the subject and sometimes of the object. However, defining
parts of speech in traditional linguistic terms, such as "
verbs" and "
nouns" in
Jharkhandi
Munda languages more generally (including most
Kherwarian varieties and
Kharia) is a highly controversial issue, since the evidence for discrete lexical categories like nouns, verbs, and adjectives is often extremely weak or even virtually absent, at least at the basic lexical level. From this perspective, it may be nearly infeasible to apply the conventional parts-of-speech framework to North Munda. A single element with apparently nominal semantics (possibly metonymic in nature) may function as the predicate base in one sentence (typically in clause-final position), while appearing elsewhere as an argument in the identical form, with no phonological or morphological change. In fact, predicates and their complements may be primarily defined by syntactic configurations rather than by inherent lexical categories. Similarly, Santali has been described as a language with a regular degree of lexical flexibility. On one analysis, "nouns" do not exist in Santali; instead there are "flexible lexemes" that can function either as arguments (the referential role) or as predicates within phrasal units, with no profound categorical distinction between these uses. In everyday speech, Santali flexibility may show even more idiosyncrasies than those documented for Mundari. Attested examples show that, within accepted usage, even proper names, which are cross-linguistically often treated as purely referential expressions denoting inherent properties, may frequently occur as predicates in Santali without eliciting objections. For instance, the sentence
unkin-dɔ Kaɽa ar Guja-wa-kin-a 'Their names were Kara and Guja' (lit. "they were Kara-and-Guja-ed") uses the second proper name directly as an active applicative predicate, while the first name precedes the conjunctive element, producing a distributive interpretation of the predication. Almost any type of lexeme, including nominals, interrogatives, and indefinites, can reportedly function predicatively when it is combined with either a light verb copula (
kan "COP.IPFV" or
tahɛ̃kan "COP.IMPERF") or an applicative suffix
-a/-wa (often glossed as "for/to someone") plus the indicative/finite suffix. Together, these elements act as a compositional verbalising operator, yielding a structure that exhibits characteristics of a nominal sentence.
Santali TAMs The Santali TAM system is very complicated. Tense-aspect and voice categories fuse into an interlocked system consisting of a series of verbal subtemplates, so an analysis cannot single out a morpheme that marks just one TAM category. TAM paradigms interact with
active and
middle voice intricately: active TAMs denote unmarked, transitive, volitional, and outwardly directed senses and are mostly employed in polyvalent predicates; middle TAMs signify intransitive, self-directed, and avolitional status and are mostly found in monovalent predicates. There are two subtemplates for the imperfective and perfective. The two recognisable tense categories are non-past and past, and the past is further divided into two tenses: anterior and
aorist. The
imperative/prohibitive do not have any markers but possess their own unique verbal templates.
Applicative TAMs Applicative voice in Santali is represented by adding the applicative marker
-a- to four tenses (Future, Imperfective, Past 1, Perfect) with an additional and rare Past 2 tense in the case of inanimate objects. The active set serves polyvalent predicates, while the middle set marks monovalent ones.
Subject markers Object markers Transitive verbs with pronominal objects take infixed object markers. In applicative constructions, inanimate objects are marked with a pronominal suffix, a checked
-kˀ.
Possessor argument indexing Transitive verbs may agree with non-arguments (outside or indirect objects). To denote inalienable possession of the indirect object concerned, the prefix
-t- is attached to the applicative forms of the pronouns; otherwise it is marked in the noun phrase and functions as an attribute.
Dual person as honorific In specific contexts nowadays, Santali speakers have been increasingly using the pronominal duals to express
honorific reference in a generalised sense, showing respect to addressees such as senior, highly regarded, or unfamiliar persons.
To be and to have Two verbs
mena ("to be") and
hena ("to have") have irregular templates. The subject pronominal marker, instead of being an enclitic form, appears as a suffix in the slot where the object marker normally would be placed. All constructions involving these two verbs are conjugated in the middle voice to express existence, possession, and location. Santali
mena seems to stem from a small number of originally middle, intransitive predicate bases that have an inverted pronominalized pattern. Some other inherently intransitive, low-agency, and non-volitional verbs such as
rɛnɛtʃ ("be hungry") may display irregular behaviour similar to that of
mena. Below is the paradigm of non-negated, non-past, fully finite existential/locative copula
mena:
Semantics and pragmatics in Santali verb indexation In Santali, as in other
Kherwarian languages, the pronominal subject markers are mobile
clitics that may encompass the whole clause. In most cases, except with the stems
mena and
hena mentioned above, the pronominal subject clitics have two placements: (1) they attach to the word preceding the verb stem, or (2) they encliticise to the final position of the verbal complex:
(1) X=S Verb
(2) X Verb=S
According to MacPhail (1957), (1) occurs more frequently than (2). In complex predicates, where the predicate consists of more than one lexeme, the subject clitic follows indexation pattern (2), not pattern (1) as might be expected. The placement of the subject clitic can also distinguish the types of nominal sentences (sentences with copulae). In a
predicational sentence where the subject is
referential and the complement is non-referential, the host of the clitic is the subject. In an
equative sentence where both the subject and the complement are referential, the subject clitic is placed at the end of the sentence. Indexing arguments in Santali is essentially intertwined with the distinction of
animacy of arguments. Distinction between animate/inanimate is not marked on nouns at all, but is conveyed through morphosyntax, such as in genitive and locative
cases and verbal agreement. That is, if an argument of the verb does not belong to the animate noun class, the verb will not index that argument. Inanimate entities such as flowers, trees, rice, books, and food, and objects that cannot move by themselves, such as vehicles (e.g. motorbikes, cars, aeroplanes), are never indexed by the verb. However, there are some notable exceptions: inanimate objects that are significant ('sun', 'moon', 'star') or culturally important ('doll') are considered animate in Santali. Likewise, 'Government' is considered a single body of animate entities and is marked with the third person singular. Even mushrooms, pricking thorns, puff-balls, and earwax are perceived as animate and are indexed by pronominal markers as such, showing the unpredictability of the Santali animacy-based indexation system. In negative formations, the negation particle may show indexation of an inanimate subject, while other Kherwarian languages suppress it.
Imperative There are no specific markers for the imperative series. However, in the affirmative imperative, the indicative/finite marker
-a is replaced by second person markers. In the negative imperative, the verb (TAM/person-syntagma) takes
-a while the imperative subject marker moves to the enclitic position behind the negative particle, right before the verb (see Negation).
Finiteness Every finite predicate takes
-a, except in the imperative and in subordinate clauses. This suffix also marks the predicate as indicative (real, default, narrative), while unmarked predicates can be interpreted as partially finite or non-finite, in which case they can take the infinitive, converb, or a case marker.
Causative There are two causative markers:
a- and
-otʃo.
-otʃo can be attached to every type of verb stem, while
a- is restricted to the two transitive verbs
jɔm ('eat') and
ɲu ('drink').
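The distribution of the two causative markers described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `causative` and the `prefixed` flag are hypothetical, the affixes are attached by plain string concatenation with no morphophonemic adjustment (an assumption), and the base `ran` is inferred from the form ranotʃo cited in the passive section.

```python
# Stems that, per the description above, may take the causative prefix a-.
PREFIX_CAUSATIVE_STEMS = {"jɔm", "ɲu"}  # 'eat', 'drink'


def causative(stem: str, prefixed: bool = False) -> str:
    """Return a causative stem (hypothetical helper, plain concatenation).

    prefixed=False -> suffix -otʃo, available to every verb stem.
    prefixed=True  -> prefix a-, allowed only for jɔm and ɲu.
    """
    if prefixed:
        if stem not in PREFIX_CAUSATIVE_STEMS:
            raise ValueError(f"a- is restricted to {sorted(PREFIX_CAUSATIVE_STEMS)}")
        # Assumption: no sound change at the prefix boundary.
        return "a" + stem
    return stem + "otʃo"


print(causative("ran"))                  # suffixal causative
print(causative("jɔm", prefixed=True))   # prefixal causative
```

Modelling the prefix as an explicit option keeps the lexical restriction visible: any stem outside the two-member set raises an error rather than silently producing an unattested form.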
Permissive While both the causative and the permissive share the same suffix
-otʃo, the permissive differs in that an applicative marker is combined with the causative morpheme, shifting the person concerned from the accusative to the dative position.
Reciprocal Infix
-pV- derives reciprocal stems from transitive and ditransitive verb roots, but with many verbs it also conveys that the action is done together by two participants.
dal ('beat') >
dapal ('beat each other')
landa ('laugh') >
lapanda ('laugh together')
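The two examples above suggest a simple placement rule for the infix: -pV- follows the first vowel of the root, and V copies that vowel. The Python sketch below encodes this rule as an assumption inferred from these two forms only; the function name `reciprocal` and the vowel inventory `VOWELS` are hypothetical, and nasalised vowels written with combining marks are not handled.

```python
# Hypothetical, deliberately small vowel inventory for this sketch.
VOWELS = set("aeiouɛɔə")


def reciprocal(root: str) -> str:
    """Insert -pV- after the first vowel of the root, copying that vowel.

    Assumption: the copy-vowel rule is inferred from dal > dapal and
    landa > lapanda; it is not a full account of Santali infixation.
    """
    for i, ch in enumerate(root):
        if ch in VOWELS:
            return root[: i + 1] + "p" + ch + root[i + 1:]
    raise ValueError(f"no vowel found in root {root!r}")


print(reciprocal("dal"))    # dapal
print(reciprocal("landa"))  # lapanda
```

The same skeleton could be parameterised by consonant to cover the other infixes, but forms such as ɛhɔp > ɛtɔhɔp show that the infix vowel is not always a copy of the first root vowel, so that extension would need more data.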
Benefactive The benefactive for transitive and ditransitive stems is
-ka in the Northern Santali dialect and
-ka-k in Southern Santali. In Southern Santali, if the object is animate, the last
-k will be replaced by pronominal clitics. All benefactive stems are conjugated with active TAM markers.
tɔl ('bind') >
tɔlka ('to bind for somebody')
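The dialect split in the benefactive can be summarised as a small Python sketch. The function name `benefactive`, the dialect labels, and the `object_clitic` parameter are hypothetical; the source does not list the actual pronominal clitics, so the caller must supply one, and suffixes are joined by plain concatenation (an assumption).

```python
from typing import Optional


def benefactive(stem: str, dialect: str = "northern",
                object_clitic: Optional[str] = None) -> str:
    """Build a benefactive stem (illustrative sketch only).

    northern: stem + -ka
    southern: stem + -ka-k, where the final -k is replaced by a
              pronominal clitic when the object is animate.
    """
    if dialect == "northern":
        return stem + "ka"
    if dialect == "southern":
        # object_clitic stands in for the animate-object clitic; its
        # real shapes are not given in the text above.
        final = object_clitic if object_clitic is not None else "k"
        return stem + "ka" + final
    raise ValueError(f"unknown dialect {dialect!r}")


print(benefactive("tɔl"))              # tɔlka
print(benefactive("tɔl", "southern"))  # Southern form with final -k
```

Either output would then be conjugated with the active TAM markers, which this sketch does not model.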
Medio-passive Transitive verbs and a limited number of intransitive and intransitive-transitive verb roots will take
-jɔn to form the Medio-passive voice.
Passive and Reflexive Transitive roots, transitive-intransitive roots, and causative stems will take
-ok to derive passive stems. In the transitive-intransitive roots, it denotes the prominence of transitivity. Attaching it to transitive verbs will create reflexivity.
ɲɛl ('see') >
ɲɛlok ('be seen') (passive)
ranotʃo ('cause to medicate') >
ranotʃok ('be caused to medicate') (causative > passive)
mak ('cut') >
makok ('cut oneself') (reflexive) The intransitive applicative TAM set is also interpreted as expressing reflexivity and used to emphasise the action directed toward the subject themselves.
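The three -ok derivations listed above can be reproduced with a short Python sketch. The function name `passive` is hypothetical, and the rule that a stem-final o merges with the suffix vowel is an assumption inferred from the single pair ranotʃo > ranotʃok.

```python
def passive(stem: str) -> str:
    """Attach -ok to derive a passive or reflexive stem (sketch).

    Assumption: when the stem already ends in o (as causative -otʃo
    stems do), only -k surfaces, matching ranotʃo > ranotʃok.
    """
    if stem.endswith("o"):
        return stem + "k"
    return stem + "ok"


for stem in ("ɲɛl", "ranotʃo", "mak"):
    print(stem, ">", passive(stem))
```

Whether the result is read as passive or reflexive depends on the root and context, as described above; the sketch derives only the form and not the interpretation.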
Noun incorporation Noun incorporation is not a feature of Santali.
Nominal "verbalisation" In everyday speech, nominal roots can be found functioning as verbs with appropriate inflection. The verbalisation of nominals extends to interrogatives and indefinites. Adjectives that are derived from nominals can take inflection as well as person indexation, too. It is said that virtually every entity-denoting lexeme is capable of functioning in the predicative role in Santali. (1) "medicine" (2) "king" (3) "orphan" (4) Pronoun In example (1), the "verbalized" predicate structure of the lexeme
ɔdʒɔn bears the same semantics as the free lexeme itself, with an additional applicative (
to give DATIVE) sense. The (2) sentence with middle TAM suffix also shows compositional semantics, producing an inchoative meaning
to become X (X here is entity/state/property-denoting semantics). The (3) sentence exemplifies an active TAM suffixed predicate using a "noun-like" lexeme
ʈuər ("orphan") as the semantic base, which brings up a subtle shift to causative theme
to make X/make someone be X, but the semantics is still mostly uniform (orphan–motherless). Similar "verbalization/recategorization" via
zero derivation like these can occur in English (e.g. gun–gunned "get shot by gunfire", ice–iced "become ice", empty–emptied "become empty, make something empty"). However, English has both idiosyncratic verbalization (unpredictable semantic outcome) and compositional verbalization (predictable semantic outcome), whereas Santali verbalization displays extreme regularity and predictability: the derived predicates correspond directly in meaning to their nominal counterparts and show very few idiosyncrasies. (5) "big" (6) "kind" (7) superlative comparison The existence of an independent adjective class in Santali is invalidated by sentences (5), (6), and (7), since these adjective-like lexemes can occur in predicate position, take TAM/Person/Number, and behave semantically and syntactically like the aforementioned examples (1), (2), and (3). Furthermore, mimetic sounds, such as
ãã (animal groan) (8), complex units, such as the postpositional phrase
kombɽo tuluj "with thieves" (9), and even proper names (10) can function as the semantic heads of predicates. The examples below provide a compelling argument against analyzing the flexibility as a lexical derivational process. This perspective on "verbalization" supports the view that flexibility is not a linguistic anomaly but the nature of the language itself. (8) mimetic sound (9) phrase (10) proper name In the case of proper names, when an active applicative suffix is applied, it expresses that
x is caused to be the individual named N, which translates into
being called N. In nonpast active form, the construction describes the (temporal) property of
being the individual named N to the subject.
Serial verb constructions Two or more verbs and modifiers can combine together to derive a compound verb. Normally they are combinations of two transitive verbs or two intransitive verbs and limited numbers of transitive+intransitive and intransitive+transitive combinations.
Auxiliary verb constructions Complex predicates are pervasive in Munda clause structure. Simple verbs like go, become, finish, come, and try are often employed as auxiliary verbs (v2 in South Asian linguistics) to add or reinforce modality, aktionsart, and orientation in the predicate. In Santali, there are univerbated auxiliary constructions that mark many functions. For example, the verb
gɔt ("pluck") is often used as an auxiliary verb to denote
telicity, that is, a quick, sudden, or intense action. Santali AVCs exhibit a split-doubled pattern: the lexical verb may index the object argument, and the auxiliary verb may index the subject argument. Some auxiliary constructions may behave like compound verbs. The two most commonly used auxiliary verbs in Santali are
daɽe ("can") and
lega ("try"). The former is often combined with an active applicative suffix, while the latter is mostly found with the middle TAMs.
Negation There are three particles in Santali used to express negation:
baŋ,
ɔhɔ and
alo.
baŋ and
ba (shortened form) are the negatives for interrogative and declarative sentences;
ɔhɔ is the emphatic negative of declarative sentences;
alo is the prohibitive negative in the imperative. These negative particles attract the subject marker away from the verb. In existential/locative copular formations, negation differs between the present tense and the past tense. Below is the chart of negative, non-past, fully finite existential/locative copula paradigm: In negative past copular constructions, the negative particle
ban encodes the subject, and the past tense is indicated by the separate copula
taheken.
Expressives Expressives arguably can be justified as an independent lexical category in Santali.
Echo-word formation can proceed by three processes: (1) generating a masdar in an identical form; (2) augmenting a consonant in the repeated element; (3) vowel mutation. Sometimes masdars co-occur with vowel mutation. Expressives can express highly detailed semantics depicting complex sound symbolism, emotions, attitudes, sensory imagery, et cetera, and are not constrained by syntactic rules.
(1) masdars. These expressives are formed by simply reduplicating the first element.
• ahal ahal "distressed"
• atrɔm atrɔm "incompletely"
• baɖgɔˀt baɖgɔˀt "rough"
• datʃaŋ datʃaŋ "ubiquitous"
• halaˀt halaˀt "slightly"
• kãˀtʃ kãˀtʃ "whine as a dog"
• adʒaˀk adʒaˀk "clamour for"
• baɖgaˀk baɖgaˀk "sharp painful sensation"
• tʃəɖuˀk tʃəɖuˀk "noise of pumping into water"
• gab gab "sink deeply"
• dʒeleˀp dʒeleˀp "flashing"
• məkur məkur "sound of crunching"
(2) (∅VX CVX) masdars with augmenting a consonant
• əbuˀk tʃəbuˀk "here and there"
• abɛ tabɛ "just at the time of"
• adha padha "unfinished"
• əɖəi bəɖəi "arrogant"
• albaʈ salbaʈ "contradictory"
(3) (∅V1X CV2X) with vowel mutation
• adha padhə "half"
• agaɽ bigəɽ "topsy turvy"
• əhir kuhir "fix the eyes upon"
• ə̃iʈhə̃ dʒithə̃ "leavings of food"
• əril kuril "stare as smoke nips the eyes"
(4) CV1X CV2X with vowel mutation
• batʃaˀk botʃoˀk "nonsensical"
• badha bidhi "occult adverse influence"
• bhaɽ bhuɽ "crashing noise"
• baɖaˀk buɖuˀk "move the lips as if speaking"
• tʃaʈa tʃuʈu "crackle"
(5) V1CV1C V2CV2C with vowel mutation
• adaˀtʃ uduˀtʃ "unwieldy through corpulence"
• agar ogor "dumby"
• araˀk oroˀk "stare vacantly"
• asam usum "leisurely"
(6) V1CV1 V1CV2 (V1 is invariably a and V2 is i)
• ãʈa aʈi "dispute"
• adra ədri "be ill-humoured"
• ahka əhki "painting"
• andka əndki "a strong smell"
• ankha ənkhi "disgusting"
• aɽsa əɽsi "plead an excuse"
The initial and medial consonants of the first element may be alternated in masdars.
• kadar kapar "rubbish"
• hadraˀk gasraˀk "stumblingly"
The Santals categorize expressives as a form of "twisted speech" (benta katha), a discourse mode characterized by profound metaphorical depth. These items occupy a central role in Santali daily communication and cultural life. Expressives are especially prevalent within performance traditions, including music, storytelling, folktales, and poetry, with an extensive presence in the oral genres of performance.

== Syntax ==