Semantic compression is achieved in two steps, using frequency dictionaries and a semantic network:
* determining cumulated term frequencies to identify the target lexicon,
* replacing less frequent terms with their hypernyms (generalization) from the target lexicon.

Step 1 requires assembling word frequencies and information on semantic relationships, specifically hyponymy. Moving upwards in the word hierarchy, a cumulative concept frequency is calculated by adding the sum of the hyponyms' cumulated frequencies to the frequency of their hypernym:

<math>cum f(k_{i}) = f(k_{i}) + \sum_{j} cum f(k_{j})</math>

where <math>k_{i}</math> is a hypernym of <math>k_{j}</math>. Then a desired number of words with the top cumulated frequencies are chosen to build the target lexicon.

In the second step, compression mapping rules are defined for the remaining words, so that every occurrence of a less frequent hyponym is handled as its hypernym in the output text.

;Example
The fragment of text below has been processed by semantic compression. Words in bold have been replaced by their hypernyms.

They are both
'''nest''' building '''social insects''', but '''paper wasps''' and honey '''bees''' '''organize''' their '''colonies''' in very different '''ways'''. In a new study, researchers report that despite their '''differences''', these insects '''rely on''' the same network of genes to '''guide''' their '''social behavior'''. The study appears in the Proceedings of the '''Royal Society B''': Biological Sciences. Honey '''bees''' and '''paper wasps''' are separated by more than 100 million years of '''evolution''', and there are '''striking''' '''differences''' in how they divvy up the work of '''maintaining''' a '''colony'''.

The procedure outputs the following text:

They are both
facility building insect, but insects and honey insects arrange their biological groups in very different structure. In a new study, researchers report that despite their difference of opinions, these insects act the same network of genes to steer their party demeanor. The study appears in the proceeding of the institution bacteria Biological Sciences. Honey insects and insect are separated by more than hundred million years of organic processes, and there are impinging differences of opinions in how they divvy up the work of affirming a biological group.

==Implicit semantic compression==
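Before turning to the implicit variant, the explicit two-step procedure described in the preceding section (cumulated frequencies, then replacement by hypernyms) can be made concrete with a small sketch. This is only an illustration under simplifying assumptions: the toy frequencies and hierarchy are invented, each term is assumed to have at most one hypernym and the hierarchy is assumed to be acyclic, and the function names (<code>cumulated_frequencies</code>, <code>target_lexicon</code>, <code>generalize</code>, <code>compress</code>) are hypothetical rather than taken from any particular implementation.

```python
from collections import defaultdict

# Toy data, invented for illustration only.
# HYPERNYM maps a term to its single direct hypernym (an assumption;
# real semantic networks may give a term several hypernyms).
HYPERNYM = {"bee": "insect", "wasp": "insect", "insect": "animal", "dog": "animal"}
FREQ = {"animal": 5, "insect": 10, "bee": 3, "wasp": 2, "dog": 4}


def cumulated_frequencies(freq, hypernym):
    """Step 1: cum f(k_i) = f(k_i) + sum of cum f(k_j) over direct hyponyms k_j."""
    children = defaultdict(list)
    for child, parent in hypernym.items():
        children[parent].append(child)

    cache = {}

    def cum(term):
        if term not in cache:
            cache[term] = freq.get(term, 0) + sum(cum(c) for c in children[term])
        return cache[term]

    terms = set(freq) | set(hypernym) | set(hypernym.values())
    return {t: cum(t) for t in terms}


def target_lexicon(cum_freq, size):
    """Keep the `size` terms with the highest cumulated frequency."""
    ranked = sorted(cum_freq, key=lambda t: (-cum_freq[t], t))
    return set(ranked[:size])


def generalize(term, lexicon, hypernym):
    """Step 2: walk up the hierarchy until a term of the target lexicon is found.

    If no ancestor is in the lexicon, the term is left unchanged (an assumption).
    """
    current = term
    while current not in lexicon and current in hypernym:
        current = hypernym[current]
    return current if current in lexicon else term


def compress(tokens, lexicon, hypernym):
    return [generalize(t, lexicon, hypernym) for t in tokens]


cum_freq = cumulated_frequencies(FREQ, HYPERNYM)
lexicon = target_lexicon(cum_freq, size=2)
compressed = compress(["bee", "wasp", "dog", "cat"], lexicon, HYPERNYM)
print(compressed)
```

With this toy data the target lexicon consists of "animal" and "insect", so "bee" and "wasp" are generalized to "insect", "dog" to "animal", and the unknown word "cat" is passed through unchanged.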