The concept of data warehousing dates back to the late 1980s when IBM researchers Barry Devlin and Paul Murphy developed the "business data warehouse". In essence, the data warehousing concept was intended to provide an architectural model for the flow of data from operational systems to
decision support environments. The concept attempted to address the various problems associated with this flow, mainly its high cost. In the absence of a data warehousing architecture, an enormous amount of redundancy was required to support multiple decision support environments. In larger corporations, it was typical for multiple decision support environments to operate independently. Though each environment served different users, they often required much of the same stored data. The process of gathering, cleaning and integrating data from various sources, usually long-standing operational systems (usually referred to as legacy systems), was typically replicated in part for each environment. Moreover, the operational systems were frequently reexamined as new decision support requirements emerged. Often new requirements necessitated gathering, cleaning and integrating new data from "
data marts" that was tailored for ready access by users. Additionally, with the publication of The IRM Imperative (Wiley & Sons, 1991) by James M. Kerr, the idea of managing an organization's data resources, putting a dollar value on them, and reporting that value as an asset on a balance sheet became popular. In the book, Kerr described a way to populate subject-area databases from data derived from transaction-driven systems, creating a storage area where summary data could be further leveraged to inform executive decision-making. This concept promoted further thinking about how a data warehouse could be developed and managed in a practical way within any enterprise.

Key developments in the early years of data warehousing:

• 1960s – General Mills and Dartmouth College, in a joint research project, develop the terms dimensions and facts.
• 1970s – ACNielsen and IRI provide dimensional data marts for retail sales.
• 1975 – Sperry Univac introduces MAPPER (maintain, prepare, and produce executive reports), a database management and reporting system that includes the world's first 4GL. It is the first platform designed for building information centers (a forerunner of contemporary data warehouse technology).
• 1983 – Teradata introduces the DBC/1012 database computer, specifically designed for decision support.
• 1984 – Metaphor Computer Systems, founded by David Liddle and Don Massaro, releases a hardware/software package and GUI for business users to create a database management and analytic system.
• 1988 – Barry Devlin and Paul Murphy publish the article "An architecture for a business and information system", in which they introduce the term "business data warehouse".
• 1990 – Red Brick Systems, founded by Ralph Kimball, introduces Red Brick Warehouse, a database management system built specifically for data warehousing.
• 1991 – James M. Kerr authors The IRM Imperative, which suggests that data resources could be reported as an asset on a balance sheet, furthering commercial interest in the establishment of data warehouses.
• 1991 – Prism Solutions, founded by Bill Inmon, introduces Prism Warehouse Manager, software for developing a data warehouse.
• 1992 – Bill Inmon publishes the book Building the Data Warehouse.
• 1995 – The Data Warehousing Institute, a for-profit organization that promotes data warehousing, is founded.
• 1996 – Ralph Kimball publishes the book The Data Warehouse Toolkit.
• 1998 – Focal modeling is implemented as an ensemble (hybrid) data warehouse modeling approach, with Patrik Lager as one of its main drivers.
• 2000 – Dan Linstedt releases data vault modeling into the public domain. Conceived in 1990 as an alternative to the Inmon and Kimball approaches, it provides long-term historical storage of data coming in from multiple operational systems, with emphasis on tracing, auditing, and resilience to changes in the source data model.
• 2008 – Bill Inmon, along with Derek Strauss and Genia Neushloss, publishes DW 2.0: The Architecture for the Next Generation of Data Warehousing, explaining his top-down approach to data warehousing and coining the term "data warehousing 2.0".
• 2008 – Anchor modeling is formalized in a paper presented at the International Conference on Conceptual Modeling, where it wins the best paper award.
• 2012 – Bill Inmon develops and makes public a technology known as "textual disambiguation", which applies context to raw text and reformats the raw text and context into a standard database format. Once raw text has passed through textual disambiguation, it can be accessed and analyzed easily and efficiently by standard business intelligence technology. Textual disambiguation is accomplished through the execution of textual ETL and is useful wherever raw text is found, such as in documents, Hadoop, email, and so forth.
• 2013 – Data vault 2.0 is released, with minor changes to the modeling method as well as integration of best practices from other methodologies, architectures and implementations, including agile and CMMI principles.

==Data organization==