Cognitive automation of document processing: 10 questions to consider


As enterprises mature in their usage of robotic process automation and pursue more complex opportunities using artificial intelligence methods, there is a need to better understand the contours of the journey that lies ahead of them. In this article, we propose a list of ten questions that teams tasked with such projects should consider before kicking off cognitive automation projects for document processing.


Internal documents vs. external documents

Organisations like insurance companies deal with both internal documents (policies and contracts) and external documents (claims and complaints). Processes dealing with internal documents yield easily to cognitive automation, as the documents are clean in appearance and often have a known structure that can be leveraged to create solutions using natural language processing. For example, insurance policies may always have the same sections in the same order, with even the content of each section being known; the variation comes primarily from differences in policy details. This allows the document to be indexed and queried. A complaint letter from an irate customer, however, has an unpredictable structure, making it difficult to know “where” to look for “what”. Moreover, such letters are usually scanned and can be of very poor visual quality. Automating such processes is far more complex and poses significant development and feasibility risk.

Sources and formats

Documents are often received by enterprises through numerous sources (or channels) such as fax, email, normal mail, etc. Typically, the format and the visual quality of a document tend to vary by the channel through which it is received. So, if the target process has multiple intake channels, the implementation will be more involved and customised, stretching project timelines.

Handwritten content

Accuracy of handwriting recognition, either through “home-grown” open-source solutions or through vendor-based solutions (3rd party or through cloud APIs), has improved significantly of late. However, it is still extremely difficult to achieve near-perfect accuracy (say 90%+). The scenario also matters greatly here. Recognising numbers or letters written inside boxes on a claim form is still tractable compared to recognising the word “urgent” scribbled somewhere on a complaint letter. A process where the success of the project hinges on recognising handwritten content with near-perfect accuracy is usually a red flag in terms of feasibility.

“Look-up” systems/applications

During either data entry or the processing of a document, a lot of ancillary systems are usually looked up, for example to determine whether a customer was eligible for a certain treatment at a hospital or to find standard claim codes. As the number of systems to “look up” increases, the implementation timeline tends to draw out because of the integration work. It also requires collaboration with the internal IT teams who own these systems and may have different priorities.

Historical documents

To develop any kind of cognitive automation solution using AI methods, the implementation team needs a large sample (“dump”) of historical documents so that they can “train” the solution during development to cater to all sorts of variations and formats. Typically, these “dumps” have to be procured from enterprise document repository applications. These applications may not have been built in the first place to make it easy to extract such “dumps.” This step may thus involve significant cost and time unless prior collaboration has been arranged with the internal IT teams who own and manage these repositories.

Near-real-time feeds

To operationalise a cognitive automation solution for document processing, near-real-time feeds of two kinds are usually required: (a) a “document” feed that supplies the documents to the solution “on demand”; and (b) a “data” feed that supplies the data for the ancillary systems that need to be “looked up.” If these feeds are not easy to arrange, then it is a clear red flag from a Development Go/No Go perspective.

Critical application integration

Some processes in the enterprise may handle documents that are extremely critical and cannot afford any downtime, for example the processing of “urgent” complaints. In such cases, the cognitive automation solution will also have to be engineered to be “always on” and appropriate production support will have to be set up. This will extend timelines and increase costs.

Centralised vs. decentralised

Whether the eventual solution is a complete automation or a partial one (requiring human assistance/intervention), consideration has to be given at the outset to whether the solution deployment should be centralised (on a server) or decentralised (on the desktop). Among other things, a centralised solution has the advantage of easier deployment and debugging but comes at a higher cost, whereas a decentralised deployment may be difficult to maintain but costs less.

Coverage and accuracy

To complete the feasibility assessment and arrive at a Development Go/No Go decision, discussions are required to understand and calibrate expectations around: (1) Coverage (the minimum % of the in-scope inventory that the solution needs to be able to “handle” for financial viability and process adoption); and (2) Accuracy (which could be measured at a document level or at a data-entry field level). It is an early red flag if the eventual solution has to have both high coverage (66%+) and high accuracy (80%+) for the project to have a positive financial impact.
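As an illustration of how these two measures might be computed on a hand-checked sample of documents, here is a minimal Python sketch. The `DocResult` record, the function names and the sample figures are all hypothetical and are not taken from any particular tool; the sketch also assumes that accuracy is measured only over the documents the solution actually handled.

```python
from dataclasses import dataclass


@dataclass
class DocResult:
    """Outcome for one in-scope document (hypothetical record layout)."""
    handled: bool        # did the solution attempt this document at all?
    fields_total: int    # fields compared against a human-checked answer
    fields_correct: int  # fields the solution got right


def coverage(results):
    """Share of in-scope documents that the solution handled."""
    if not results:
        return 0.0
    return sum(r.handled for r in results) / len(results)


def field_accuracy(results):
    """Field-level accuracy, counted over handled documents only."""
    handled = [r for r in results if r.handled]
    total = sum(r.fields_total for r in handled)
    correct = sum(r.fields_correct for r in handled)
    return correct / total if total else 0.0


def document_accuracy(results):
    """Share of handled documents in which every field was correct."""
    handled = [r for r in results if r.handled]
    if not handled:
        return 0.0
    perfect = sum(r.fields_correct == r.fields_total for r in handled)
    return perfect / len(handled)


# Made-up sample: three documents handled, one rejected by the solution.
sample = [
    DocResult(True, 10, 10),
    DocResult(True, 10, 8),
    DocResult(True, 10, 9),
    DocResult(False, 0, 0),
]

print(f"coverage:          {coverage(sample):.0%}")
print(f"field accuracy:    {field_accuracy(sample):.0%}")
print(f"document accuracy: {document_accuracy(sample):.0%}")
```

The same sample can look very different depending on the level chosen: field-level accuracy is high here while document-level accuracy is much lower, which is why the level of measurement should be agreed before any target is set.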

Turnaround time

Finally, the implementation team will have to consider the speed with which the cognitive automation solution has to respond. If the solution eventually takes the form of some sort of chat bot or user interface, it will have “user experience” requirements to cater to. On the other hand, if the solution integrates with critical high-speed applications, the required response time may be of the order of minutes or even seconds. Given that cognitive automation solutions need to “look up” data and utilise machine learning, it may sometimes not be feasible to meet extremely short response times.


The above article appeared in ETCIO.in.
