In computing, raw data may have the following attributes: it may contain human, machine, or instrument errors; it may not be validated; it might be in different regional or colloquial formats; it might be uncoded or unformatted; or some entries might be "suspect" (e.g., outliers), requiring confirmation or citation. For example, a data input sheet might contain dates as raw data in many forms: "31st January 1999", "31/01/1999", "31/1/99", "31 Jan", or "today". Once captured, this raw data may be processed and stored in a normalized format, perhaps as a Julian date, to make it easier for computers and humans to interpret during later processing. Raw data (sometimes colloquially called "sources" data or "eggy" data, the latter a reference to the data being "uncooked", that is, "unprocessed", like a raw egg) is the data input to processing. A distinction is made between
data and information, to the effect that information is the end product of data processing. Raw data that has undergone processing is sometimes colloquially referred to as "cooked" data. Although raw data has the potential to become "information", it must first be extracted, organized, analyzed, and formatted for presentation before it becomes usable. For example, a
point-of-sale terminal (POS terminal, a computerized cash register) in a busy supermarket collects huge volumes of raw data each day about customers' purchases. However, this list of grocery items, their prices, and the time and date of purchase does not yield much information until it is processed. Once processed and analyzed by a software program, or even by a researcher using pen and paper and a calculator, this raw data may indicate the particular items that each customer buys, when they buy them, and at what price; in addition, an analyst or manager could calculate the average total sales per customer or the average expenditure by day of the week and hour. This processed and analyzed data provides information that the manager could then use to determine, for example, how many cashiers to hire and at what times. Such information could then become data for further processing, for example as part of a predictive marketing campaign. As a result of processing, raw data sometimes ends up in a database, which makes it accessible for further processing and analysis in any number of different ways.
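The path from raw data to information described above can be illustrated with a minimal Python sketch. It makes several assumptions that are not in the text: "Julian date" is read as the astronomical Julian Day Number (one common interpretation), dates are day-first and written in English, and the sales rows, customer identifiers, and function names (`normalize_date`, `julian_day_number`, `summarize`) are hypothetical. Entries such as "31 Jan" or "today" cannot be resolved without a year or a reference date, so the sketch flags them as suspect instead of guessing.

```python
import re
from collections import defaultdict
from datetime import date, datetime

# Formats tried in order; all assume day-first dates (an assumption).
DATE_FORMATS = ("%d %B %Y", "%d/%m/%Y", "%d/%m/%y")
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def normalize_date(raw: str) -> date:
    """Parse one raw date string into a date, or raise ValueError."""
    # Drop ordinal suffixes so "31st January 1999" becomes "31 January 1999".
    cleaned = re.sub(r"(?<=\d)(st|nd|rd|th)\b", "", raw.strip())
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    # No format matched: the entry is ambiguous or incomplete.
    raise ValueError(f"cannot normalize raw date: {raw!r}")


def julian_day_number(d: date) -> int:
    """Julian Day Number of a (proleptic Gregorian) calendar date."""
    return d.toordinal() + 1721425


# Hypothetical raw POS rows: (customer id, raw purchase date, item, price).
RAW_SALES = [
    ("c1", "31st January 1999", "milk", 1.50),
    ("c1", "31/01/1999", "bread", 2.50),
    ("c2", "31/1/99", "eggs", 3.00),
    ("c2", "1/2/1999", "milk", 1.50),
    ("c3", "today", "tea", 4.00),
]


def summarize(rows):
    """Return (spend per customer, spend per weekday, suspect rows)."""
    per_customer = defaultdict(float)
    per_weekday = defaultdict(float)
    suspect = []
    for customer, raw_date, item, price in rows:
        try:
            day = normalize_date(raw_date)
        except ValueError:
            # Keep unresolvable rows aside for later confirmation.
            suspect.append((customer, raw_date, item, price))
            continue
        per_customer[customer] += price
        per_weekday[WEEKDAYS[day.weekday()]] += price
    return dict(per_customer), dict(per_weekday), suspect


if __name__ == "__main__":
    per_customer, per_weekday, suspect = summarize(RAW_SALES)
    print("Julian Day Number:", julian_day_number(normalize_date("31/1/99")))
    print("Spend per customer:", per_customer)
    print("Spend per weekday:", per_weekday)
    print("Suspect rows:", suspect)
```

The three parseable spellings of the same date collapse to one normalized value, and only then can the rows be grouped meaningfully. Setting suspect rows aside, rather than discarding or guessing them, keeps the raw record available for confirmation.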
Tim Berners-Lee (inventor of the World Wide Web) argues that sharing raw data is important for society. Inspired by a post by Rufus Pollock of the Open Knowledge Foundation, his call to action is "Raw Data Now", meaning that everyone should demand that governments and businesses share the data they collect as raw data. He points out that "data drives a huge amount of what happens in our lives… because somebody takes the data and does something with it." To Berners-Lee, it is essentially from this sharing of raw data that advances in science will emerge. Advocates of
open data argue that once citizens and civil society organizations have access to data from businesses and governments, they will be able to do their own analysis of the data, which can empower people and civil society. For example, a government may claim that its policies are reducing the unemployment rate, but a poverty advocacy group may have its staff econometricians do their own analysis of the raw data, which may lead the group to draw different conclusions from the data set.

== Critiques of raw data ==