As the state of the art advanced, document processing transitioned to handling "document components ... as database entities." A technology called automatic document processing, or sometimes intelligent document processing (IDP), emerged as a specific form of Intelligent Process Automation (IPA). It combines artificial intelligence techniques such as Machine Learning (ML), Natural Language Processing (NLP) or Intelligent Character Recognition (ICR) to extract data from several types of documents. Advances in IDP improve the ability to process unstructured data with fewer exceptions and at greater speed.
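The idea of handling document components as database entities can be illustrated with a minimal Python sketch. It assumes that an OCR or ICR engine has already turned a scanned invoice into plain text. The sample text, the field names, the regular-expression patterns and the `invoices` table are all hypothetical. Regular expressions stand in here for the ML or NLP models an IDP system would more likely use, and storage uses Python's standard `sqlite3` module.

```python
import re
import sqlite3

# Hypothetical OCR output for a scanned invoice; in a real IDP system this text
# would come from an OCR/ICR engine rather than being hard-coded.
OCR_TEXT = """
ACME Supplies Ltd.
Invoice No: INV-2024-0042
Date: 2024-03-15
Total Due: 1,250.00 EUR
"""

# Hypothetical field patterns. Regular expressions stand in for the ML/NLP
# extraction models an IDP product would actually use.
FIELD_PATTERNS = {
    "invoice_number": r"Invoice No:\s*(\S+)",
    "invoice_date": r"Date:\s*(\d{4}-\d{2}-\d{2})",
    "total": r"Total Due:\s*([\d,]+\.\d{2})",
    "currency": r"Total Due:\s*[\d,]+\.\d{2}\s*([A-Z]{3})",
}


def extract_fields(text: str) -> dict:
    """Return each field's first match, or None when the pattern is absent."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text)
        fields[name] = match.group(1) if match else None
    return fields


def store_invoice(conn: sqlite3.Connection, fields: dict) -> None:
    """Persist the extracted fields as one row of a hypothetical 'invoices' table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS invoices (
               invoice_number TEXT PRIMARY KEY,
               invoice_date TEXT,
               total REAL,
               currency TEXT)"""
    )
    # Normalize the amount ("1,250.00" -> 1250.0) so it is stored as a number.
    total = float(fields["total"].replace(",", "")) if fields["total"] else None
    conn.execute(
        "INSERT INTO invoices VALUES (?, ?, ?, ?)",
        (fields["invoice_number"], fields["invoice_date"], total, fields["currency"]),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    fields = extract_fields(OCR_TEXT)
    store_invoice(conn, fields)
    print(conn.execute("SELECT * FROM invoices").fetchall())
    # -> [('INV-2024-0042', '2024-03-15', 1250.0, 'EUR')]
```

Fields that the patterns cannot find come back as `None`. Such cases correspond to the exceptions mentioned above, which a real system would typically route to manual review. Fixed patterns like these break easily on unstructured layouts, which is one reason IDP relies on learned models instead.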
== Applications ==

Automatic document processing applies to a whole range of documents, whether structured or not. For instance, in business and finance, these technologies may be used to process paper-based invoices, forms, purchase orders, contracts, and currency bills. Financial institutions use intelligent document processing to handle high volumes of forms, such as regulatory forms or loan documents. IDP uses AI to extract and classify data from documents, replacing manual data entry.

In medicine, document processing methods have been developed to facilitate patient follow-up and streamline administrative procedures, in particular by digitizing medical or laboratory analysis reports. Another goal is to standardize medical databases. Algorithms are also used directly to assist physicians in medical diagnosis, for example by analyzing magnetic resonance images or microscopic images.

Document processing is also widely used in the humanities and digital humanities to extract historical big data from archives or heritage collections. Specific approaches have been developed for various sources, including textual documents such as newspaper archives, as well as images and maps.{{cite journal |last1=Tang |first1=Yuan Y. |last2=Lee |first2=Seong-Whan |last3=Suen |first3=Ching Y. |title=Automatic document processing: a survey}}

Many technologies support document processing, in particular optical character recognition (OCR) and handwritten text recognition (HTR), which transcribe text automatically. Text segments themselves are located using instance segmentation or object detection algorithms, which can sometimes also be used to detect the structure of the document. Solving the latter problem sometimes also involves semantic segmentation algorithms. These technologies often form the core of document processing, but other algorithms may operate before or after them. Upstream, document digitization technologies are involved, whether in the form of classical or three-dimensional scanning.{{cite web |url=https://artmyn.com/ |title=Revolutionary Scanning Technology for Art}} At the other end of the chain are various image completion, extrapolation, or data cleanup algorithms. For textual documents, interpretation can rely on natural language processing (NLP) technologies.

== See also ==