Kismet's social intelligence software system, or
synthetic nervous system (SNS), was designed with human models of intelligent behavior in mind. It contains six subsystems as follows.
===Low-level feature extraction system===
This system processes raw visual and auditory information from cameras and microphones. Kismet's
vision system can perform eye detection, motion detection, and skin-color detection, the last of which is controversial. Whenever Kismet moves its head, it momentarily disables its motion detection system so that its own motion is not mistaken for motion in the scene. It also uses its stereo cameras to estimate the distance of an object in its visual field, for example to detect threats—large, close objects with a lot of movement. Kismet's
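The threat heuristic and the self-motion gate described above might be sketched as follows. This is an illustrative reconstruction, not Kismet's actual code: the `VisualTarget` structure, units, and all threshold values are assumptions.

```python
from dataclasses import dataclass

@dataclass
class VisualTarget:
    size: float        # apparent size in the image, 0..1 (assumed units)
    distance_m: float  # distance estimated from stereo disparity
    motion: float      # motion-energy score, 0..1 (assumed units)

def is_threat(target: VisualTarget, head_is_moving: bool,
              size_thresh: float = 0.4, dist_thresh_m: float = 1.0,
              motion_thresh: float = 0.5) -> bool:
    """Flag large, close, fast-moving targets as threats.

    While the head moves, the motion cue is ignored so the robot's own
    motion is not mistaken for object motion. Thresholds are guesses.
    """
    if head_is_moving:
        return False  # motion detection is disabled during self-motion
    return (target.size > size_thresh
            and target.distance_m < dist_thresh_m
            and target.motion > motion_thresh)
```

A large, near, fast-moving target (`VisualTarget(0.6, 0.5, 0.9)`) is flagged only while the head is still.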
audio system is mainly tuned towards identifying affect in
infant-directed speech. In particular, it can detect five different types of affective speech: approval, prohibition, attention, comfort, and neutral. The affective intent classifier was created as follows. Low-level features such as pitch mean and energy (volume) variance were extracted from samples of recorded speech. The classes of affective intent were then modeled as a
Gaussian mixture model and trained with these samples using the
expectation-maximization algorithm. Classification is done with multiple stages, first classifying an utterance into one of two general groups (e.g. soothing/neutral vs. prohibition/attention/approval) and then doing more detailed classification. This architecture significantly improved performance for hard-to-distinguish classes, like
approval ("You're a clever robot") versus
attention ("Hey Kismet, over here"). Kismet can be in only one emotional state at a time. However, Breazeal states that Kismet is not conscious, so it does not have feelings.
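The two-stage classification scheme described above can be sketched with scikit-learn, whose `GaussianMixture` is fit via expectation-maximization. This is an illustrative reconstruction rather than Kismet's implementation: the coarse grouping follows the text, but the two-dimensional features stand in for the actual prosodic features (pitch mean, energy variance), and the staging logic is simplified.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Coarse split from the text: soothing/neutral vs. the remaining classes,
# followed by a finer decision within the winning group.
GROUPS = {
    "soothing_neutral": ["comfort", "neutral"],
    "arousing": ["prohibition", "attention", "approval"],
}

def fit_models(features_by_class, n_components=2, seed=0):
    """Fit one GMM per affective class on low-level prosodic features.
    scikit-learn fits each mixture with the EM algorithm."""
    return {label: GaussianMixture(n_components, random_state=seed).fit(X)
            for label, X in features_by_class.items()}

def classify(models, x):
    """Stage 1: pick the super-group whose member classes best explain x
    (summed average log-likelihood). Stage 2: pick the best class in it."""
    x = np.atleast_2d(x)
    def ll(label):
        return models[label].score(x)
    group = max(GROUPS, key=lambda g: sum(ll(c) for c in GROUPS[g]))
    return max(GROUPS[group], key=ll)
```

With synthetic feature clusters per class, an utterance near the "comfort" cluster is first routed to the soothing/neutral group and then labeled "comfort".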
===Motor system===
Kismet speaks a proto-language with a variety of phonemes, similar to a baby's babbling. It uses the
DECtalk voice synthesizer, and changes pitch, timing, articulation, etc. to express various emotions. Intonation distinguishes question-like from statement-like utterances. Lip synchronization was important for realism, and the developers used a strategy from animation: "simplicity is the secret to successful lip animation." Thus, they did not try to imitate lip motions perfectly, but instead aimed to "create a visual shorthand that passes unchallenged by the viewer."
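The idea of modulating pitch, timing, and intonation to convey emotion can be sketched as a parameter-mapping table. The emotions, multipliers, and baseline values below are illustrative assumptions; the actual DECtalk settings Kismet used are not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class Prosody:
    pitch_scale: float  # multiplier on baseline pitch
    rate_scale: float   # multiplier on speaking rate
    pitch_range: float  # multiplier on pitch variation

# Illustrative emotion-to-prosody table (values are guesses).
EMOTION_PROSODY = {
    "happy":   Prosody(pitch_scale=1.3, rate_scale=1.1, pitch_range=1.5),
    "sad":     Prosody(pitch_scale=0.8, rate_scale=0.8, pitch_range=0.6),
    "angry":   Prosody(pitch_scale=1.1, rate_scale=1.2, pitch_range=1.3),
    "neutral": Prosody(pitch_scale=1.0, rate_scale=1.0, pitch_range=1.0),
}

def synth_params(emotion: str, is_question: bool,
                 base_pitch_hz: float = 200.0, base_wpm: float = 160.0):
    """Map an emotion plus question/statement intonation onto synthesizer
    parameters, in the spirit of the scheme described above."""
    p = EMOTION_PROSODY.get(emotion, EMOTION_PROSODY["neutral"])
    return {
        "pitch_hz": base_pitch_hz * p.pitch_scale,
        "rate_wpm": base_wpm * p.rate_scale,
        "pitch_range": p.pitch_range,
        # Questions get a rising final contour, statements a falling one.
        "final_contour": "rising" if is_question else "falling",
    }
```

For example, a "happy" question raises the baseline pitch and ends on a rising contour, while a "sad" statement slows the speaking rate and falls at the end.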