The non-technical term "wrangler" is often said to derive from work done by the
United States Library of Congress's
National Digital Information Infrastructure and Preservation Program (NDIIPP) and its program partner, the
MetaArchive Partnership, based at Emory University Libraries. The term "mung" has its roots in
munging as described in the
Jargon File. The term "data wrangler" was also suggested as the best analogy to describe someone working with data.

One of the first mentions of data wrangling in a scientific context was by Donald Cline during the NASA/NOAA Cold Lands Processes Experiment. Cline stated that the data wranglers "coordinate the acquisition of the entire collection of the experiment data." Cline also specified duties typically handled by a
storage administrator for working with large amounts of
data. This can occur in areas such as major
research projects and the making of
films with a large amount of complex
computer-generated imagery. In research, this involves both
data transfer from the research instrument to a storage grid or storage facility and data manipulation for re-analysis via high-performance computing instruments or access via cyberinfrastructure-based
digital libraries. With the rise of artificial intelligence in
data science, it has become increasingly important for automated data wrangling to have very strict checks and balances, which is why the munging process has not been automated by
machine learning. Data munging requires more than an automated solution: it requires knowledge of what information should be removed, and artificial intelligence is not yet at the point of understanding such things.

==Connection to data mining==