Tutorials

Text Search

nut.js allows you to locate template images on your screen, but in some cases locating a certain piece of text might be more useful and flexible.

Remark: Text search uses the exact same set of screen methods as image search, only with different query types. For a general understanding of different screen methods, please also take a look at the image search tutorial.

Another remark: Both @nut-tree/plugin-ocr and @nut-tree/plugin-azure are very similar in terms of usage, they only differ in their configuration.

TextFinder Providers


To search for text on screen, we have to install an additional package providing the actual text search implementation. Otherwise, all functions relying on text search will throw an error like Error: No TextFinder registered.

Currently, nut.js provides two TextFinder implementations:

  • @nut-tree/plugin-ocr: performs OCR on-device and works offline.
  • @nut-tree/plugin-azure: performs OCR via the Azure AI Vision service and requires network access.

Attention: These are nut.js premium packages which require an active subscription. See the registry access tutorial to learn how to subscribe and access the private registry.

Text queries

Both plugins process text queries to search for text on screen. Currently, nut.js provides two different text queries:

  • singleWord: Searches for a single word.
  • textLine: Searches for a whole line of text, making it possible to search for multiple consecutive words. E.g. textLine("How to use this plugin") would search for this exact phrase. Both query types are demonstrated in the sketch below.
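
The following minimal sketch shows both query types in action. It assumes one of the TextFinder plugins described below has already been registered; otherwise screen.find will throw the error mentioned above:

const {screen, singleWord, textLine} = require("@nut-tree/nut-js");

(async () => {
    try {
        // Locate a single word on screen
        const wordLocation = await screen.find(singleWord("Confirm"));

        // Locate a whole line of consecutive words
        const lineLocation = await screen.find(textLine("How to use this plugin"));
    } catch (e) {
        console.error(e);
    }
})();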

Prerequisites

In order to use @nut-tree/plugin-azure, you need to have an Azure account and an Azure AI Vision OCR resource. You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.

Once you have both things set up, you'll need a key and the endpoint of the resource you created to connect your application to the Azure AI Vision service:

  • After your Azure Vision resource has been deployed, select Go to resource.
  • In the left navigation menu, select Keys and Endpoint.
  • Copy one of the keys and the endpoint.
  • Use them in your code via e.g. environment variables.

@nut-tree/plugin-azure

npm i @nut-tree/plugin-azure

Configure credentials

Assuming you went through the Prerequisites step, let's load our credentials.

First, we'll create a .env file where we store our credentials obtained from Azure:

VISION_KEY=<YOUR_API_KEY>
VISION_ENDPOINT=<YOUR_API_ENDPOINT>

Next, we will install dotenv, one of the most widely used packages to work with .env files.

npm i dotenv

Now let's use it in our script to populate our environment from our .env file:

const {screen, singleWord} = require("@nut-tree/nut-js");
require('dotenv').config();

The @nut-tree/plugin-azure package provides a subpackage for OCR: @nut-tree/plugin-azure/ocr.

To get things set up, we need to import two things:

const {screen, singleWord} = require("@nut-tree/nut-js");
require('dotenv').config();
const {configure, useAzureVisionOCR} = require("@nut-tree/plugin-azure/ocr");

Since we're using dotenv, we can now simply reference our credentials via process.env. After the configuration step, we have to call useAzureVisionOCR() to register the plugin and we're good to go:

const {screen, singleWord} = require("@nut-tree/nut-js");
require('dotenv').config();
const {configure, useAzureVisionOCR} = require("@nut-tree/plugin-azure/ocr");

// Connect the plugin to your Azure AI Vision resource
configure({
    apiKey: process.env.VISION_KEY,
    apiEndpoint: process.env.VISION_ENDPOINT,
});

// Register the plugin as the TextFinder provider
useAzureVisionOCR();

(async () => {
    try {
        // Search for the word "nut.js" on screen
        const location = await screen.find(singleWord("nut.js"));
    } catch (e) {
        console.error(e);
    }
})();

And that's basically it!

We've provided the minimum required configuration to use the Azure Vision OCR service. Since the Azure Vision OCR service will detect all languages present in an image automatically, we don't have to provide them explicitly, nor do we have to fetch any language data locally as we do with @nut-tree/plugin-ocr.

On the other hand, this plugin does not work offline, so it's up to you to decide which package to use.

Remark: Tests have shown that @nut-tree/plugin-azure yields more accurate results than @nut-tree/plugin-ocr with little to no additional configuration.

Specify OCR language

As we learned earlier, the Azure Vision OCR service automatically extracts text from images, even if they contain mixed languages. However, it is still possible to specify one particular language to be used for an OCR run. This language can be configured via the providerData object of the find function.

Remark: Other screen methods like findAll, waitFor or read also accept the providerData parameter.

const {screen, singleWord} = require("@nut-tree/nut-js");
require('dotenv').config();
const {configure, useAzureVisionOCR, Language} = require("@nut-tree/plugin-azure/ocr"); // Language enum is provided by the plugin

configure({
    apiKey: process.env.VISION_KEY,
    apiEndpoint: process.env.VISION_ENDPOINT,
});

useAzureVisionOCR();

(async () => {
    try {
        const location = await screen.find(singleWord("Bestätigen"), {
            providerData: {
                language: Language.German,
            }
        });
    } catch (e) {
        console.error(e);
    }
})();
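
As noted in the remark above, the same providerData can be passed to other screen methods. For example, here's a sketch using findAll to retrieve every match instead of only the first one, assuming the same configure / useAzureVisionOCR setup as above:

(async () => {
    try {
        // findAll returns all matches for the given query
        const locations = await screen.findAll(singleWord("Bestätigen"), {
            providerData: {
                language: Language.German,
            }
        });
        console.log(`Found ${locations.length} matches`);
    } catch (e) {
        console.error(e);
    }
})();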

This way you can specify the language you want to use for OCR. The Language enum is provided by the plugin. You can find a list of all available languages in the Configuration section of the OCR plugin documentation.

Alternatively, if you don't want to specify the language on every call to find, the configuration can be moved to the global plugin configuration:

const {screen, singleWord} = require("@nut-tree/nut-js");
require('dotenv').config();
const {configure, useAzureVisionOCR, Language} = require("@nut-tree/plugin-azure/ocr");

configure({
    apiKey: process.env.VISION_KEY,
    apiEndpoint: process.env.VISION_ENDPOINT,
    language: Language.German,
});

useAzureVisionOCR();

(async () => {
    try {
        const location = await screen.find(singleWord("Bestätigen"));
    } catch (e) {
        console.error(e);
    }
})();

Dealing with flawed results

OCR engines are not perfect and sometimes return slightly garbled results. Emojis are interpreted as characters, sometimes a space is lost and two words are joined, you name it. To deal with such inconsistencies, it's possible to adjust two parameters via providerData:

  • partialMatch: If a word returned by the OCR engine contains additional characters, e.g. a trailing period, setting partialMatch to true will still yield a hit as long as your search term is contained in the result.
  • caseSensitive: Toggles case sensitivity when looking for matches. This is another way to deal with possible inconsistencies in OCR results.

const {screen, singleWord} = require("@nut-tree/nut-js");
require('dotenv').config();
const {configure, useAzureVisionOCR, Language} = require("@nut-tree/plugin-azure/ocr");

configure({
    apiKey: process.env.VISION_KEY,
    apiEndpoint: process.env.VISION_ENDPOINT,
    language: Language.German,
});

useAzureVisionOCR();

(async () => {
    try {
        const location = await screen.find(singleWord("Bestätigen"), {
            providerData: {
                partialMatch: true,
                caseSensitive: true,
            }
        });
    } catch (e) {
        console.error(e);
    }
})();

Custom OCR confidence value

One way to configure the minimum required confidence value for a match when performing on-screen search is the screen.config.confidence value. This property was introduced with the initial image search plugin and was thus used exclusively for image search.

Now that there are additional things to search for on screen, like text, this single confidence value becomes limiting. If you use both image and text search, you'll want a separate way to configure the confidence value used for OCR-based searches.

After importing @nut-tree/plugin-azure you'll have another property at your disposal to configure the confidence value required for text search:

screen.config.ocrConfidence

This value specifies the minimum confidence, as a value between 0 and 1, required for a text search match to be accepted.

const {screen, singleWord} = require("@nut-tree/nut-js");
require('dotenv').config();
const {configure, useAzureVisionOCR, Language} = require("@nut-tree/plugin-azure/ocr");

configure({
    apiKey: process.env.VISION_KEY,
    apiEndpoint: process.env.VISION_ENDPOINT,
    language: Language.German,
});

screen.config.ocrConfidence = 0.8;

useAzureVisionOCR();

(async () => {
    try {
        const location = await screen.find(singleWord("Bestätigen"), {
            providerData: {
                partialMatch: true,
                caseSensitive: true,
            }
        });
    } catch (e) {
        console.error(e);
    }
})();

Full example

Let's take a look at a full example which brings all the previously discussed pieces together. The following sample demonstrates a hypothetical scenario where we are trying to click a button labelled "Bestätigen" (German for "Confirm").

We configure our Azure credentials, set the OCR language to German, configure a custom OCR confidence value of 80% and run a non-case-sensitive search for a singleWord, allowing for partial matches.

const {centerOf, mouse, screen, singleWord, straightTo} = require("@nut-tree/nut-js");
require('dotenv').config();
const {configure, useAzureVisionOCR, Language} = require("@nut-tree/plugin-azure/ocr");

configure({
    apiKey: process.env.VISION_KEY,
    apiEndpoint: process.env.VISION_ENDPOINT,
    language: Language.German,
});

useAzureVisionOCR();

screen.config.ocrConfidence = 0.8;
screen.config.autoHighlight = true;

(async () => {
    // Search for the button label, allowing partial, case-insensitive matches
    const location = await screen.find(singleWord("Bestätigen"), {
        providerData: {
            partialMatch: true,
            caseSensitive: false
        }
    });
    // Move the mouse in a straight line to the center of the match and click
    await mouse.move(
        straightTo(
            centerOf(
                location
            )
        )
    );
    await mouse.leftClick();
})();