Available in PaperCut MF only.

Set up locally hosted OCR (On-premise)

To set up locally hosted OCR (On-premise), you need to:

  1. Determine where to install locally hosted OCR (On-premise)

  2. Install locally hosted OCR (On-premise)

  3. Configure the host location and available languages

  4. Tune the OCR server performance

IMPORTANT
  • The locally hosted OCR (On-premise) solution requires the On-prem OCR & Document Processing Pack once the trial period is finished. For more information, contact your local Authorized Solution Center or reseller.

  • The locally hosted OCR (On-premise) solution is available only for Windows.

Step 1: Determine where to install locally hosted OCR (On-premise)

For smaller environments, it makes sense to install locally hosted OCR (On-premise) alongside the Application Server. In medium to larger environments, though, you can ensure optimum system and Application Server performance by setting up one or more dedicated OCR servers that the Application Server can contact.

See the recommendations below, by environment size.

Small (approx. 0 – 50 jobs per day)

  • Recommended processors*: 2

  • Recommended installation location: Application Server

  • Benefits: Less infrastructure cost. Great for smaller businesses with an occasional OCR load.

Medium (approx. 50 – 200 jobs per day)

  • Recommended processors*: 3

  • Recommended installation location: Start on a well-resourced Application Server. Monitor, and plan for a separate server on an as-needed basis.

  • Benefits: Balances resource use, system performance, and OCR processing performance.

Large (approx. 200+ jobs per day)

  • Recommended processors*: 4+

  • Recommended installation location: One or more separate high-performing OCR servers

  • Benefits: Isolates resources. Better handles high OCR load, spikes, and multiple jobs, for example in Enterprise or Education environments. OCR’s heavy resource requirements don’t interfere with the normal operation of the Application Server.

*Recommended number of available processors (to support parallel jobs).
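If you want to apply these recommendations in a capacity-planning script, the following Python sketch maps an approximate daily job count to a tier. The class and function names are hypothetical, and because the published ranges share their boundaries (50 and 200 jobs), the sketch assumes a boundary value belongs to the smaller tier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrSizing:
    environment: str
    processors: str
    location: str


def recommend_ocr_sizing(jobs_per_day: int) -> OcrSizing:
    """Map approximate daily OCR jobs to the sizing recommendations.

    Assumption: a value exactly on a boundary (50 or 200) falls into
    the smaller tier, because the published ranges overlap there.
    """
    if jobs_per_day < 0:
        raise ValueError("jobs_per_day must be non-negative")
    if jobs_per_day <= 50:
        return OcrSizing("Small", "2", "Application Server")
    if jobs_per_day <= 200:
        return OcrSizing(
            "Medium", "3",
            "Well-resourced Application Server; plan for a separate server as needed",
        )
    return OcrSizing("Large", "4+", "One or more separate high-performing OCR servers")


print(recommend_ocr_sizing(120).environment)  # Medium
```

Treat the result as a starting point; the sizing guidance itself is approximate.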

Keep in mind that the more storage and processing power that is available, the better locally hosted OCR (On-premise) performs, so make as much available as you can. For any environment size, we recommend:

  • at least 10 GB available disk space

  • 512 MB available memory

  • running a 64-bit edition of Microsoft Windows.
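Before installing, you could check a candidate host against these minimums with a short Python script like the one below. It's a sketch, not a PaperCut tool: `check_ocr_host` and `gather_and_check` are hypothetical names, free memory is passed in by the caller (for example, a value read from Task Manager) rather than queried, the GB/MB figures are treated as binary units, and the 64-bit check assumes an x64 host, where Python reports the machine type as AMD64.

```python
from __future__ import annotations

import platform
import shutil

# Minimums from the recommendations above (binary units assumed).
MIN_FREE_DISK_BYTES = 10 * 1024**3      # at least 10 GB available disk space
MIN_FREE_MEMORY_BYTES = 512 * 1024**2   # 512 MB available memory
X64_MACHINE_TYPES = {"amd64", "x86_64"}  # assumption: an x64 host


def check_ocr_host(free_disk_bytes: int, free_memory_bytes: int,
                   os_name: str, machine: str) -> list[str]:
    """Return a list of problems; an empty list means the host meets the minimums."""
    problems = []
    if os_name != "Windows":
        problems.append("locally hosted OCR is available only for Windows")
    if machine.lower() not in X64_MACHINE_TYPES:
        problems.append(f"expected a 64-bit (x64) host, found machine type {machine!r}")
    if free_disk_bytes < MIN_FREE_DISK_BYTES:
        problems.append(f"{free_disk_bytes / 1024**3:.1f} GB disk free; at least 10 GB recommended")
    if free_memory_bytes < MIN_FREE_MEMORY_BYTES:
        problems.append(f"{free_memory_bytes / 1024**2:.0f} MB memory free; 512 MB recommended")
    return problems


def gather_and_check(install_dir: str, free_memory_bytes: int) -> list[str]:
    """Read free disk space, OS name, and machine type from this host, then check them."""
    free_disk = shutil.disk_usage(install_dir).free
    return check_ocr_host(free_disk, free_memory_bytes, platform.system(), platform.machine())


# Example, run on the candidate OCR server with a free-memory figure you measured:
# print(gather_and_check("C:\\", 4 * 1024**3) or "meets the minimums")
```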


Step 2: Install locally hosted OCR (On-premise)

  1. Download the installer:

    Download OCR installer

  2. On the OCR server, run the file. The Setup Wizard is displayed.

  3. Follow the prompts during the install.

    • If you intend to scan documents to PDF, ensure that the GhostTrap component is selected for installation.

    • If you intend to scan to DOCX, ensure that the Pandoc component is selected for installation.

    On Windows servers, the installer configures the Windows Firewall.

  4. If you are using a firewall other than Windows Firewall, open inbound port 9181 to allow connections from the PaperCut MF Application Server.
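To confirm that port 9181 is reachable, you could run a TCP connection test from the Application Server. This Python sketch (the function name is hypothetical) only checks that something on the OCR server accepts connections on that port; it does not confirm that the listener is the PaperCut OCR service.

```python
import socket

OCR_PORT = 9181  # inbound port to open for the Application Server


def can_reach_ocr_server(host: str, port: int = OCR_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


# Example, run on the Application Server:
# print(can_reach_ocr_server("ocr01.example.local"))
```

A False result usually points at a firewall rule, a wrong hostname, or the OCR service not running.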

Step 3: Configure the host location and available languages

  1. In the PaperCut MF Admin web interface, do one of the following:

    • If you’re already on the Capture page, refresh the page.

    • Click Options > Capture. The Capture page is displayed.

  2. In the OCR area, select Use locally hosted OCR (requires additional setup).

  3. In the OCR area, in Hostname, type the hostname or the IP address of the OCR server where you installed locally hosted OCR (On-premise).

    NOTE

    We recommend that you use the IP address only if it’s static. Otherwise, use the hostname.

  4. Click Add.

  5. If you want to set up multiple OCR servers, click Add new OCR Server; then repeat steps 3 and 4.

    NOTE

    Adding more than one OCR server is currently in Percolator (Project Wollemi), so bear in mind that this configuration is still being tested and will be refined before launch. No additional downloads or registrations are required.

    Each OCR server is listed on the Capture page.

  6. In Language support for OCR, select the languages (up to 10) that you want OCR to recognize in scanned documents.

    NOTE

    Although you can select up to 10 languages, the more you select, the poorer the overall scanning performance. For most environments, five languages or fewer is a good number.

  7. Click Apply.

  8. Ensure that your scan actions have been configured with OCR enabled.

  9. Run a test job and check the file for success:

    • For a PDF file, check that the text is searchable.

    • For a DOCX file, check that the text is displayed.
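For DOCX output, you could also check the result programmatically. A DOCX file is a ZIP archive whose main body text is stored in word/document.xml, so this Python sketch (the function names are hypothetical) extracts the text runs using only the standard library. It does not cover PDF output, which needs a PDF library to inspect; opening the PDF and searching for a word is the simpler check there.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used for <w:t> text elements in DOCX files.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(path) -> str:
    """Concatenate the text runs (w:t elements) in the DOCX main document part."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return "".join(node.text or "" for node in root.iter(f"{W_NS}t"))


def docx_has_text(path) -> bool:
    """True if the scanned DOCX contains any non-whitespace body text."""
    return bool(docx_text(path).strip())


# Example: print(docx_has_text(r"C:\scans\test-job.docx"))
```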

Step 4: Tune the OCR server performance

The approach to tuning the performance of an OCR server depends on whether it is on a standalone system or co-located with other services.

By default, the OCR server processes two jobs in parallel at normal CPU priority. You can change the number of parallel jobs by modifying the configuration file at [ocr-server-path]/data/config/config.toml, as described below.

After making changes to the config file, you’ll need to restart the Windows service: PaperCut OCR.
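To see which value a given config.toml will apply, you could use a small Python check like the one below. It's a sketch that assumes the option sits on its own line as MaxJobsInParallel = N and that a line starting with # is inactive; it uses a regular expression rather than a full TOML parser, and the function names are hypothetical. The running service only picks up a changed value after the restart.

```python
import re
from pathlib import Path

DEFAULT_MAX_JOBS = 2  # documented default when MaxJobsInParallel is not active

# An active (uncommented) line such as "MaxJobsInParallel = 4", optionally with a trailing comment.
_ACTIVE_SETTING = re.compile(
    r"^[ \t]*MaxJobsInParallel[ \t]*=[ \t]*(\d+)[ \t]*(?:#.*)?$", re.MULTILINE
)


def effective_max_jobs(config_text: str) -> int:
    """Return the active MaxJobsInParallel value, or the default if absent or commented out."""
    match = _ACTIVE_SETTING.search(config_text)
    return int(match.group(1)) if match else DEFAULT_MAX_JOBS


def read_effective_max_jobs(config_path: str) -> int:
    """Read config.toml and report how many jobs it tells the OCR server to run in parallel."""
    return effective_max_jobs(Path(config_path).read_text(encoding="utf-8"))
```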

Tuning for installation on a standalone system

When installing the OCR server on a standalone system, to achieve the best performance, it's a good idea to maximize the number of jobs that can be processed in parallel.

The ideal number to use depends on many factors, such as the type and size of the documents being processed and the system architecture. A reasonable starting point is the total number of virtual CPUs (or physical cores times threads per core on a “bare metal” system) minus two.

Put another way, if you want to process four OCR jobs in parallel and you are installing OCR on a virtual machine, give it six virtual CPUs.
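The rule of thumb works in both directions, as this Python sketch shows (the function names are hypothetical). The floor of one job on very small hosts is an assumption of the sketch, not part of the guidance.

```python
import os

RESERVED_CPUS = 2  # "total number of virtual CPUs ... minus two"


def suggested_parallel_jobs(logical_cpus: int) -> int:
    """Starting value for MaxJobsInParallel on a standalone OCR server."""
    # Assumption: never suggest fewer than one job, even on very small hosts.
    return max(1, logical_cpus - RESERVED_CPUS)


def vcpus_for_parallel_jobs(jobs: int) -> int:
    """Virtual CPUs to give an OCR virtual machine that should run `jobs` jobs in parallel."""
    return jobs + RESERVED_CPUS


print(suggested_parallel_jobs(6))  # 4
print(vcpus_for_parallel_jobs(4))  # 6
# On the OCR server itself, os.cpu_count() reports logical CPUs (cores times threads per core).
print(suggested_parallel_jobs(os.cpu_count() or 1))
```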

To make this change:

  1. In the config.toml file, remove the # at the start of the MaxJobsInParallel line to uncomment the option and make it active.

  2. Set the line to MaxJobsInParallel = 4 (in this example, for a system with six virtual CPUs), or to the value you calculated for your system.

  3. Restart the Windows service: PaperCut OCR
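The config edit in steps 1 and 2 can also be scripted. This Python sketch assumes, as the steps describe, that the option is on a single line that is either active or commented out with a leading #; it edits the first such line only, is line-based rather than a full TOML parser, and its function names are hypothetical. It writes a .bak copy of the file before changing it.

```python
import re
from pathlib import Path

# The MaxJobsInParallel line, whether or not it is commented out with '#'.
_SETTING_LINE = re.compile(r"^[ \t]*#?[ \t]*MaxJobsInParallel[ \t]*=.*$", re.MULTILINE)


def set_max_jobs(config_text: str, jobs: int) -> str:
    """Uncomment (if needed) and set the MaxJobsInParallel line; other lines are unchanged."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    new_text, count = _SETTING_LINE.subn(f"MaxJobsInParallel = {jobs}", config_text, count=1)
    if count == 0:
        raise ValueError("no MaxJobsInParallel line found; edit config.toml manually")
    return new_text


def update_config_file(config_path: str, jobs: int) -> None:
    """Back up config.toml to config.toml.bak, then write the new setting."""
    path = Path(config_path)
    original = path.read_text(encoding="utf-8")
    path.with_name(path.name + ".bak").write_text(original, encoding="utf-8")
    path.write_text(set_max_jobs(original, jobs), encoding="utf-8")
```

For example, update_config_file(r"[ocr-server-path]\data\config\config.toml", 4), replacing the placeholder with your installation path, and then restart the PaperCut OCR Windows service.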

Tuning for co-location with the Application Server

NOTE

For medium to large environments, we do not recommend this approach; see the recommendations in Step 1. OCR’s heavy resource requirements can interfere with the normal operation of the Application Server.

If your system has additional available processors (beyond what the Application Server is using), you might want to consider increasing the number of jobs that are processed in parallel from the default of two.

To make this change:

  1. In the config.toml file, remove the # at the start of the MaxJobsInParallel line to uncomment the option and make it active.

  2. Set the line to MaxJobsInParallel = 3

  3. Restart the Windows service: PaperCut OCR