Schema matching

The terms schema matching and mapping are often used interchangeably for a database process. In this article we distinguish them as follows: schema matching is the process of identifying that two objects are semantically related, while mapping refers to the transformations between the objects. For example, given the two schemas DB1.Student and DB2.Grad-Student, possible matches would be DB1.Student ≈ DB2.Grad-Student and DB1.SSN = DB2.ID, among others, and a possible transformation or mapping would be DB1.Marks to DB2.Grades.
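The distinction can be made concrete with a minimal Python sketch. It uses the element names from the example above; the `Match` and `Mapping` classes and the grading scale in `marks_to_grade` are hypothetical, introduced only for illustration, and are not part of any particular matching tool.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Match:
    """A correspondence: two schema elements judged semantically related."""
    source: str
    target: str


@dataclass(frozen=True)
class Mapping:
    """A transformation that turns source values into target values."""
    source: str
    target: str
    transform: Callable[[float], str]


def marks_to_grade(marks: float) -> str:
    # Hypothetical grading scale, assumed here purely for illustration.
    if marks >= 90:
        return "A"
    if marks >= 80:
        return "B"
    if marks >= 70:
        return "C"
    return "F"


# Matching only states which elements correspond.
matches = [
    Match("DB1.Student", "DB2.Grad-Student"),
    Match("DB1.SSN", "DB2.ID"),
]

# Mapping additionally states how to convert one representation to the other.
mapping = Mapping("DB1.Marks", "DB2.Grades", marks_to_grade)

print(mapping.transform(85))  # B
```

A match carries no conversion logic, which is why it can be discovered before any transformation has been specified.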

Impediments

Common challenges to automating matching and mapping have previously been classified, in one case especially for relational DB schemas and in another as a fairly comprehensive list of heterogeneities that is not limited to the relational model and that distinguishes schematic from semantic differences. Most of these heterogeneities exist either because schemas use different representations or definitions for the same information (schema conflicts), or because different expressions, units, and precision result in conflicting representations of the same data (data conflicts).

• Syntactic heterogeneity – differences in the language used for representing the elements
• Structural heterogeneity – differences in the types and structures of the elements
• Model / Representational heterogeneity – differences in the underlying models (database, ontologies) or their representations (key-value pairs, relational, document, XML, JSON, triples, graph, RDF, OWL)
• Semantic heterogeneity – where the same real-world entity is represented using different terms, or vice versa

Schema matching

Methodology

Prior work discusses a generic methodology for the task of schema integration and the activities it involves. Hybrid matchers directly combine several matching approaches to determine match candidates based on multiple criteria or information sources. Most of these techniques also employ additional information such as dictionaries, thesauri, and user-provided match or mismatch information.

Reusing matching information

Another initiative has been to reuse previous matching information as auxiliary information for future matching tasks. The motivation for this work is that structures or substructures often repeat, for example in schemas in the e-commerce domain. Such reuse of previous matches, however, needs to be chosen carefully: it may make sense only for some part of a new schema or only in some domains. For example, Salary and Income may be considered identical in a payroll application but not in a tax reporting application. Several open-ended challenges in such reuse deserve further work.

Sample Prototypes

Typically, implementations of such matching techniques can be classified as either rule-based or learner-based systems. The complementary nature of these approaches has prompted a number of applications that combine techniques depending on the nature of the domain or application under consideration. Several state-of-the-art matching tools today can identify many simple matches (1:1 / 1:n / n:1 element-level matches) and complex matches (n:1 / n:m element- or structure-level matches) between objects.

Evaluation of quality

The quality of schema matching is commonly measured by precision and recall. Precision measures the proportion of correctly matched pairs out of all pairs that were matched, while recall measures the proportion of the actual pairs that have been matched.
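The two quality measures can be computed directly from sets of element pairs. The following is a minimal Python sketch, not taken from any matching tool: the function name `precision_recall` is an assumption, and `DB1.Name` is a hypothetical element added only so that the matcher output contains wrong pairs.

```python
def precision_recall(found, reference):
    """Return (precision, recall) for a set of proposed match pairs.

    found     -- pairs the matcher proposed
    reference -- pairs that are actually correct (the gold standard)
    """
    found, reference = set(found), set(reference)
    correct = found & reference
    # Guard against empty sets to avoid division by zero.
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Gold-standard correspondences, using the element names from this article.
reference = {
    ("DB1.Student", "DB2.Grad-Student"),
    ("DB1.SSN", "DB2.ID"),
    ("DB1.Marks", "DB2.Grades"),
}

# Hypothetical matcher output: two correct pairs and two wrong ones.
found = {
    ("DB1.Student", "DB2.Grad-Student"),
    ("DB1.SSN", "DB2.ID"),
    ("DB1.Name", "DB2.ID"),   # DB1.Name is an invented element
    ("DB1.Marks", "DB2.ID"),
}

p, r = precision_recall(found, reference)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Here two of the four proposed pairs are correct (precision 2/4) and two of the three actual pairs were found (recall 2/3), which shows that a matcher can over-propose and under-cover at the same time.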

Source: Wikipedia
