The following section contains an overview of the basic framework of rough set theory, as originally proposed by Zdzisław I. Pawlak, along with some of the key definitions. More formal properties and boundaries of rough sets can be found in the cited references. The initial and basic theory of rough sets is sometimes referred to as "Pawlak rough sets" or "classical rough sets", to distinguish it from more recent extensions and generalizations.
===Information system framework===

Let I = (\mathbb{U},\mathbb{A}) be an information system (attribute–value system), where \mathbb{U} is a non-empty, finite set of objects (the universe) and \mathbb{A} is a non-empty, finite set of attributes such that a:\mathbb{U} \rightarrow V_a for every a \in \mathbb{A}, where V_a is the set of values that attribute a may take. The information table assigns a value a(x) from V_a to each attribute a and object x in the universe \mathbb{U}. With any P \subseteq \mathbb{A} there is an associated equivalence relation \mathrm{IND}(P):

: \mathrm{IND}(P) = \left\{(x,y) \in \mathbb{U}^2 \mid \forall a \in P,\ a(x)=a(y)\right\}

The relation \mathrm{IND}(P) is called the P-indiscernibility relation. The family of all equivalence classes of \mathrm{IND}(P) forms a partition of \mathbb{U}, denoted \mathbb{U}/\mathrm{IND}(P) (or \mathbb{U}/P). If (x,y)\in \mathrm{IND}(P), then x and y are indiscernible (or indistinguishable) by the attributes in P. The equivalence class of x under the P-indiscernibility relation is denoted [x]_P.
===Example: equivalence-class structure===

For example, consider the following information table:

When the full set of attributes P = \{P_{1},P_{2},P_{3},P_{4},P_{5}\} is considered, we obtain the following seven equivalence classes:

: \begin{cases} \{O_{1},O_{2}\} \\ \{O_{3},O_{7},O_{10}\} \\ \{O_{4}\} \\ \{O_{5}\} \\ \{O_{6}\} \\ \{O_{8}\} \\ \{O_{9}\} \end{cases}

Thus, the two objects within the first equivalence class, \{O_{1},O_{2}\}, cannot be distinguished from each other based on the available attributes, and the three objects within the second equivalence class, \{O_{3},O_{7},O_{10}\}, cannot be distinguished from one another. The remaining five objects are each discernible from all other objects.

Different attribute subsets will in general lead to different indiscernibility classes. For example, if the attribute subset P = \{P_{1}\} alone is selected, we obtain the following, much coarser, equivalence-class structure:

: \begin{cases} \{O_{1},O_{2}\} \\ \{O_{3},O_{5},O_{7},O_{9},O_{10}\} \\ \{O_{4},O_{6},O_{8}\} \end{cases}
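The indiscernibility partition can be computed mechanically by grouping objects on their tuple of attribute values. A minimal Python sketch, using a small hypothetical table for illustration (the article's own table values are not reproduced here):

```python
from collections import defaultdict

def partition(table, attrs):
    """Equivalence classes of IND(attrs): objects that agree on every
    attribute in attrs fall into the same class."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return {frozenset(c) for c in classes.values()}

# Hypothetical three-object table for illustration (not the article's table)
table = {
    "O1": {"P1": 1, "P2": 0},
    "O2": {"P1": 1, "P2": 0},
    "O3": {"P1": 2, "P2": 0},
}

print(partition(table, ["P1", "P2"]))  # O1 and O2 indiscernible; O3 alone
print(partition(table, ["P2"]))        # a single class: all agree on P2
```

As in the example above, restricting to a smaller attribute subset can only coarsen the partition.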
===Definition of a rough set===

Let X \subseteq \mathbb{U} be a target set that we wish to represent using attribute subset P; that is, we are told that an arbitrary set of objects X comprises a single class, and we wish to express this class (i.e., this subset) using the equivalence classes induced by attribute subset P. In general, X cannot be expressed exactly, because the set may include and exclude objects which are indistinguishable on the basis of the attributes in P.

For example, consider the target set X = \{O_{1},O_{2},O_{3},O_{4}\}, and let the attribute subset P = \{P_{1},P_{2},P_{3},P_{4},P_{5}\} be the full available set of features. The set X cannot be expressed exactly, because the objects \{O_{3},O_{7},O_{10}\} are indiscernible under [x]_P. Thus, there is no way to represent any set X which includes O_{3} but excludes objects O_{7} and O_{10}. However, the target set X can be approximated using only the information contained within P by constructing the P-lower and P-upper approximations of X:

: {\underline P}X = \{x \mid [x]_P \subseteq X\}

: {\overline P}X = \{x \mid [x]_P \cap X \neq \emptyset \}
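Both approximations follow directly from the definitions once the equivalence classes of IND(P) are known. A minimal Python sketch, using the seven classes and the target set from the running example:

```python
def approximations(classes, X):
    """P-lower and P-upper approximations of target set X, given the
    equivalence classes of IND(P)."""
    lower, upper = set(), set()
    for c in classes:
        if c <= X:       # class wholly inside X: certainly in X
            lower |= c
        if c & X:        # class overlaps X: possibly in X
            upper |= c
    return lower, upper

# Equivalence classes and target set from the article's example
classes = [{"O1", "O2"}, {"O3", "O7", "O10"}, {"O4"},
           {"O5"}, {"O6"}, {"O8"}, {"O9"}]
X = {"O1", "O2", "O3", "O4"}

lower, upper = approximations(classes, X)
print(sorted(lower))  # ['O1', 'O2', 'O4']
print(sorted(upper))  # ['O1', 'O10', 'O2', 'O3', 'O4', 'O7']
```

These are exactly the lower and upper approximations worked out in the next two subsections.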
===Lower approximation and positive region===

The P-lower approximation, or positive region, is the union of all equivalence classes in [x]_P which are contained by (i.e., are subsets of) the target set – in the example, {\underline P}X = \{O_{1},O_{2}\} \cup \{O_{4}\}, the union of the two equivalence classes in [x]_P which are contained in the target set. The lower approximation is the complete set of objects in \mathbb{U}/P that can be positively (i.e., unambiguously) classified as belonging to target set X.
===Upper approximation and negative region===

The P-upper approximation is the union of all equivalence classes in [x]_P which have non-empty intersection with the target set – in the example, {\overline P}X = \{O_{1},O_{2}\} \cup \{O_{4}\} \cup \{O_{3},O_{7},O_{10}\}, the union of the three equivalence classes in [x]_P that have non-empty intersection with the target set. The upper approximation is the complete set of objects in \mathbb{U}/P that cannot be positively (i.e., unambiguously) classified as belonging to the complement (\overline X) of the target set X. In other words, the upper approximation is the complete set of objects that are possibly members of the target set X. The set \mathbb{U}-{\overline P}X therefore represents the negative region, containing the set of objects that can be definitely ruled out as members of the target set.
===Boundary region===

The boundary region, given by the set difference {\overline P}X - {\underline P}X, consists of those objects that can neither be ruled in nor ruled out as members of the target set X.

In summary, the lower approximation of a target set is a conservative approximation consisting of only those objects which can positively be identified as members of the set. (These objects have no indiscernible "clones" which are excluded by the target set.) The upper approximation is a liberal approximation which includes all objects that might be members of the target set. (Some objects in the upper approximation may not be members of the target set.) From the perspective of \mathbb{U}/P, the lower approximation contains objects that are members of the target set with certainty (probability = 1), while the upper approximation contains objects that are members of the target set with non-zero probability (probability > 0).
===The rough set===

The tuple \langle{\underline P}X,{\overline P}X\rangle composed of the lower and upper approximations is called a rough set; thus, a rough set is composed of two crisp sets, one representing a lower boundary of the target set X, and the other representing an upper boundary of the target set X.

The accuracy of the rough-set representation of the set X can be given by the ratio of the size of the lower approximation to the size of the upper approximation:

: \alpha_{P}(X) = \frac{\left | {\underline P}X \right |}{\left | {\overline P}X \right |}

The accuracy equals 1 when the two approximations coincide (the boundary region is empty), and decreases toward 0 as the boundary region grows.

Unlike other methods, such as those given above, classical rough set analysis requires no additional information, external parameters, models, functions, grades or subjective interpretations to determine set membership – instead, it uses only the information presented within the given data. More recent adaptations of rough set theory, such as dominance-based, decision-theoretic and fuzzy rough sets, have introduced more subjectivity to the analysis.
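The standard accuracy measure is the ratio of the sizes of the two approximations; for the running example it is easy to check numerically:

```python
# Approximations taken from the article's worked example
lower = {"O1", "O2", "O4"}                      # P-lower approximation
upper = {"O1", "O2", "O3", "O4", "O7", "O10"}   # P-upper approximation

alpha = len(lower) / len(upper)                 # accuracy of approximation
print(alpha)  # 0.5
```

An accuracy of 0.5 reflects that half of the objects possibly in X cannot be confirmed as members.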
===Definability===

In general, the upper and lower approximations are not equal; in such cases, we say that the target set X is undefinable or roughly definable on attribute set P. When the upper and lower approximations are equal (i.e., the boundary is empty), {\overline P}X = {\underline P}X, the target set X is definable on attribute set P. We can distinguish the following special cases of undefinability:

* Set X is internally undefinable if {\underline P}X = \emptyset and {\overline P}X \neq \mathbb{U}. This means that on attribute set P, there are no objects which we can be certain belong to target set X, but there are objects which we can definitively exclude from set X.
* Set X is externally undefinable if {\underline P}X \neq \emptyset and {\overline P}X = \mathbb{U}. This means that on attribute set P, there are objects which we can be certain belong to target set X, but there are no objects which we can definitively exclude from set X.
* Set X is totally undefinable if {\underline P}X = \emptyset and {\overline P}X = \mathbb{U}. This means that on attribute set P, there are no objects which we can be certain belong to target set X, and there are no objects which we can definitively exclude from set X. Thus, on attribute set P, we cannot decide whether any object is, or is not, a member of X.
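These cases reduce to two simple checks on the approximations: whether the lower approximation is empty, and whether the upper approximation covers the whole universe. A Python sketch (the function name is illustrative):

```python
def definability(lower, upper, universe):
    """Classify a target set from its P-lower and P-upper approximations."""
    if lower == upper:
        return "definable"
    no_certain_members = (len(lower) == 0)       # lower approximation empty
    no_certain_exclusions = (upper == universe)  # upper approximation is all of U
    if no_certain_members and no_certain_exclusions:
        return "totally undefinable"
    if no_certain_members:
        return "internally undefinable"
    if no_certain_exclusions:
        return "externally undefinable"
    return "roughly definable"

# The article's example: lower and upper approximations over U = {O1..O10}
universe = {f"O{i}" for i in range(1, 11)}
print(definability({"O1", "O2", "O4"},
                   {"O1", "O2", "O3", "O4", "O7", "O10"},
                   universe))  # roughly definable
```

The example target set is roughly definable: its lower approximation is non-empty and its upper approximation does not exhaust the universe.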
===Reduct and core===

An interesting question is whether there are attributes in the information system (attribute–value table) which are more important to the knowledge represented in the equivalence-class structure than other attributes. Often, we wonder whether there is a subset of attributes which can, by itself, fully characterize the knowledge in the database; such an attribute set is called a reduct.

Formally, a reduct is a subset of attributes \mathrm{RED} \subseteq P such that:

* [x]_{\mathrm{RED}} = [x]_P; that is, the equivalence classes induced by the reduced attribute set \mathrm{RED} are the same as the equivalence-class structure induced by the full attribute set P;
* the attribute set \mathrm{RED} is minimal, in the sense that [x]_{(\mathrm{RED}-\{a\})} \neq [x]_P for any attribute a \in \mathrm{RED}; in other words, no attribute can be removed from set \mathrm{RED} without changing the equivalence classes [x]_P.

A reduct can be thought of as a sufficient set of features – sufficient, that is, to represent the category structure. In the example table above, attribute set \{P_3,P_4,P_5\} is a reduct – the information system projected on just these attributes possesses the same equivalence-class structure as that expressed by the full attribute set:

: \begin{cases} \{O_{1},O_{2}\} \\ \{O_{3},O_{7},O_{10}\} \\ \{O_{4}\} \\ \{O_{5}\} \\ \{O_{6}\} \\ \{O_{8}\} \\ \{O_{9}\} \end{cases}

Attribute set \{P_3,P_4,P_5\} is minimal, because eliminating any of these attributes causes a collapse of the equivalence-class structure, with the result that [x]_{\mathrm{RED}} \neq [x]_P.

The reduct of an information system is not unique: there may be many subsets of attributes which preserve the equivalence-class structure (i.e., the knowledge) expressed in the information system. In the example information system above, another reduct is \{P_1,P_2,P_5\}, producing the same equivalence-class structure as [x]_P.

The set of attributes which is common to all reducts is called the core: the core is the set of attributes possessed by every reduct, and therefore consists of attributes which cannot be removed from the information system without causing collapse of the equivalence-class structure. The core may be thought of as the set of necessary attributes – necessary, that is, for the category structure to be represented. In the example, the only such attribute is \{P_5\}; any one of the other attributes can be removed singly without damaging the equivalence-class structure, and hence all of the others are dispensable. However, removing \{P_5\} by itself does change the equivalence-class structure, and thus \{P_5\} is the indispensable attribute of this information system, and hence the core.

It is possible for the core to be empty, which means that there is no indispensable attribute: any single attribute in such an information system can be deleted without altering the equivalence-class structure. In such cases, there is no essential or necessary attribute which is required for the class structure to be represented.
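For small tables, reducts can be found by brute force: test every attribute subset, in order of increasing size, for whether it preserves the full partition, and keep only the minimal ones. The core is then the intersection of all reducts. A sketch (Python, exponential in the number of attributes, and using a small hypothetical table rather than the article's):

```python
from itertools import combinations
from collections import defaultdict

def partition(table, attrs):
    """Equivalence classes of IND(attrs), as a hashable set of frozensets."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return frozenset(frozenset(c) for c in classes.values())

def reducts(table, attrs):
    """All minimal attribute subsets preserving the full partition.
    Brute force: suitable only for small tables."""
    full = partition(table, attrs)
    found = []
    for r in range(1, len(attrs) + 1):
        for cand in combinations(attrs, r):
            if (partition(table, cand) == full
                    and not any(set(f) <= set(cand) for f in found)):
                found.append(cand)   # minimal: no smaller reduct inside it
    return found

# Hypothetical three-object table for illustration (not the article's table)
table = {
    "O1": {"P1": 1, "P2": 0, "P3": 0},
    "O2": {"P1": 1, "P2": 0, "P3": 1},
    "O3": {"P1": 2, "P2": 1, "P3": 0},
}

reds = reducts(table, ["P1", "P2", "P3"])
core = set.intersection(*(set(r) for r in reds))
print(reds)  # [('P1', 'P3'), ('P2', 'P3')]
print(core)  # {'P3'}
```

In this toy table, P3 belongs to every reduct and is therefore the core, mirroring the role of \{P_5\} in the article's example.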
===Attribute dependency===

One of the most important aspects of database analysis or data acquisition is the discovery of attribute dependencies; that is, we wish to discover which variables are strongly related to which other variables. Generally, it is these strong relationships that will warrant further investigation, and that will ultimately be of use in predictive modeling.

In rough set theory, the notion of dependency is defined very simply. Let us take two (disjoint) sets of attributes, set P and set Q, and inquire what degree of dependency obtains between them. Each attribute set induces an (indiscernibility) equivalence-class structure: the equivalence classes induced by P are given by [x]_P, and the equivalence classes induced by Q are given by [x]_Q. Let [x]_Q = \{Q_1, Q_2, Q_3, \dots, Q_N\}, where Q_i is a given equivalence class from the equivalence-class structure induced by attribute set Q. Then, the dependency of attribute set Q on attribute set P, \gamma_{P}(Q), is given by

: \gamma_{P}(Q) = \frac{\sum_{i=1}^N \left | {\underline P}Q_i \right |}{\left | \mathbb{U} \right |} \leq 1

That is, for each equivalence class Q_i in [x]_Q, we add up the size of its lower approximation by the attributes in P, i.e., {\underline P}Q_i. The size of this approximation (as above, for an arbitrary set X) is the number of objects which, on attribute set P, can be positively identified as belonging to target set Q_i. Added across all equivalence classes in [x]_Q, the numerator above represents the total number of objects which – based on attribute set P – can be positively categorized according to the classification induced by attributes Q. The dependency ratio therefore expresses the proportion (within the entire universe) of such classifiable objects. The dependency \gamma_{P}(Q) "can be interpreted as a proportion of such objects in the information system for which it suffices to know the values of attributes in P to determine the values of attributes in Q".

Another, intuitive, way to consider dependency is to take the partition induced by Q as the target class C, and consider P as the attribute set we wish to use in order to "re-construct" the target class C. If P can completely reconstruct C, then Q depends totally upon P; if P results in a poor and perhaps random reconstruction of C, then Q does not depend upon P at all. Thus, this measure of dependency expresses the degree of functional (i.e., deterministic) dependency of attribute set Q on attribute set P; it is not symmetric. The relationship of this notion of attribute dependency to more traditional information-theoretic (i.e., entropic) notions of attribute dependence has been discussed in a number of sources, e.g. Pawlak, Wong, & Ziarko (1988), Yao & Yao (2002), Wong, Ziarko, & Ye (1986), and Quafafou & Boussouf (2000).

==Rule extraction==