Digital processing of documents


Problem

Despite the development of digital technologies, printed documents still make up a significant part of everyday document management. These are documents that carry their information on paper. Electronic documents, in turn, are mostly digital files designed to display data on a computer screen rather than to expose the information the document itself contains. In other words, most electronic documents merely imitate their printed counterparts.

Given this, extracting useful information from such documents and representing it as structured data is a relevant problem. One approach to solving it is presented in this article.


Typical process of digitizing printed documents

A typical document processing workflow can be described by the following diagram:

[Diagram: OCR system architecture]

As the diagram shows, solving this problem requires the following base components:

  • Classifier of incoming documents
  • Pre-processing module
  • Processing module
  • Post-processing module
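
The four base components can be pictured as stages of a simple pipeline. The sketch below is only an illustration: it assumes the stages run in the order listed, and it replaces real image handling with plain text. The `Document` class, the stage functions, and the `key: value` input format are all hypothetical and are not taken from the system described in this article.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """A document moving through the pipeline (hypothetical structure)."""
    raw_text: str                                          # stand-in for scanned content
    doc_type: str = "unknown"                              # set by the classifier
    fields: Dict[str, str] = field(default_factory=dict)   # extracted structured data
    history: List[str] = field(default_factory=list)       # stages already applied


def classify(doc: Document) -> Document:
    # Toy rule; a real classifier would inspect layout or image features.
    doc.doc_type = "invoice" if "invoice" in doc.raw_text.lower() else "other"
    doc.history.append("classify")
    return doc


def preprocess(doc: Document) -> Document:
    # Stand-in for image cleanup: here it only normalizes whitespace.
    doc.raw_text = " ".join(doc.raw_text.split())
    doc.history.append("preprocess")
    return doc


def process(doc: Document) -> Document:
    # Stand-in for recognition: read "key: value" pairs separated by ";".
    for part in doc.raw_text.split(";"):
        if ":" in part:
            key, value = part.split(":", 1)
            doc.fields[key.strip().lower()] = value.strip()
    doc.history.append("process")
    return doc


def postprocess(doc: Document) -> Document:
    # Stand-in for validation: drop fields whose value is empty.
    doc.fields = {k: v for k, v in doc.fields.items() if v}
    doc.history.append("postprocess")
    return doc


# Assumed stage order, following the list of base components.
PIPELINE: List[Callable[[Document], Document]] = [
    classify,
    preprocess,
    process,
    postprocess,
]


def run_pipeline(doc: Document) -> Document:
    """Pass the document through every stage in order."""
    for stage in PIPELINE:
        doc = stage(doc)
    return doc


if __name__ == "__main__":
    result = run_pipeline(Document("Invoice   number: 42;  total: 100;  note: "))
    print(result.doc_type, result.fields)
```

Keeping each stage as a function with the same signature means a stage can be replaced or extended without touching the others.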

The following service components are also needed:

  • Scaling
  • Load balancing
  • Self-diagnosis
  • Logging
  • Cybersecurity management

Solution and description