One of the most common applications of data capture is extracting information from paper documents and saving it into databases (CMS, ECM, etc.). The basic technologies used for data capture vary by data type:
• Optical character recognition (OCR) – for printed text recognition
• Intelligent character recognition (ICR) – for hand-printed text recognition
• Optical mark recognition (OMR) – for mark recognition
• Barcode recognition (BCR/OBR)
• Document layer recognition (DLR)
These technologies enable data extraction from paper documents for processing in enterprise systems such as enterprise resource planning (ERP) and customer relationship management (CRM).
The documents for data capture can be divided into three groups: structured, semi-structured, and unstructured.
Structured documents (e.g., questionnaires, tests, tax returns, insurance forms, ballots) have identical layouts, making data capture straightforward since fields are always in the same location.
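Because fields on a structured document sit at fixed positions, capture can be sketched as a lookup against a template of page zones. The example below is a minimal illustration, not a reference implementation: it assumes a recognition step (OCR/ICR) has already produced words with page coordinates, and the `Word`, `Zone`, and `TEMPLATE` names, the field names, and the coordinate values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """A recognized word and the centre of its bounding box (arbitrary page units)."""
    text: str
    x: float
    y: float


@dataclass(frozen=True)
class Zone:
    """A rectangular region of the page where one field is expected."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, word: Word) -> bool:
        return self.left <= word.x <= self.right and self.top <= word.y <= self.bottom


# Hypothetical template: every copy of this form keeps its fields in these zones.
TEMPLATE = {
    "name": Zone(left=100, top=40, right=400, bottom=60),
    "date": Zone(left=100, top=80, right=250, bottom=100),
}


def extract_fields(words, template):
    """Collect the words that fall inside each zone, in reading order."""
    fields = {}
    for field_name, zone in template.items():
        inside = sorted(
            (w for w in words if zone.contains(w)),
            key=lambda w: (w.y, w.x),  # top-to-bottom, then left-to-right
        )
        fields[field_name] = " ".join(w.text for w in inside)
    return fields


# Illustrative recognition output for one filled-in form (made-up values).
words = [
    Word("Name:", 50, 50), Word("Jane", 120, 50), Word("Doe", 170, 50),
    Word("Date:", 50, 90), Word("2024-01-15", 140, 90),
]
print(extract_fields(words, TEMPLATE))
```

The printed labels ("Name:", "Date:") fall outside the zones and are ignored, so only the filled-in values are captured. The same template works for every copy of the form, which is what makes this case straightforward.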
Semi-structured documents (e.g., invoices, purchase orders, waybills) follow a general format, but layout varies by vendor or parameters. Capturing data requires more advanced methods.
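One common way to handle a layout that moves between vendors is to locate a field by its label rather than by its position. The sketch below illustrates that idea on already-recognized text using regular expressions. It is only one possible approach, and the field names, label patterns, and sample invoices are assumptions made up for the example.

```python
import re

# Hypothetical label patterns: each field is found by the text that introduces it,
# wherever on the page that text happens to appear.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"invoice\s*(?:no\.?|number|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "total": re.compile(
        r"\btotal(?:\s+due)?\s*:?\s*\$?\s*([0-9][0-9,]*\.[0-9]{2})", re.IGNORECASE
    ),
}


def extract_by_label(text):
    """Return the first value following each known label, or None if the label is absent."""
    result = {}
    for field_name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[field_name] = match.group(1) if match else None
    return result


# Two made-up invoices with the same information in different places and wording.
vendor_a = "ACME Corp\nInvoice No: INV-1001\nSubtotal: $90.00\nTotal: $99.00\n"
vendor_b = "Total due $1,250.50\nGlobex Ltd.\nINVOICE # 77812\n"

print(extract_by_label(vendor_a))
print(extract_by_label(vendor_b))
```

The word boundary in the `total` pattern keeps it from matching "Subtotal". Real documents need many more label variants than this, which is part of why capture from semi-structured documents is harder than from fixed forms.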
Unstructured documents (e.g., letters, contracts, articles) can vary freely in both structure and appearance.
==The Internet and the future==