
Text Search

nut.js allows you to locate template images on your screen, but in some cases locating a certain piece of text might be more useful and flexible.

Remark: Text search uses the exact same set of screen methods as image search, only with different query types. For a general understanding of different screen methods, please also take a look at the image search tutorial.

Another remark: Both @nut-tree/plugin-ocr and @nut-tree/plugin-azure are very similar in terms of usage; they differ only in their configuration.
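
To illustrate the first remark: the very same screen.find call accepts either an image query or a text query. A minimal sketch, assuming the required finder plugins are registered; button.png and the word "Confirm" are just placeholder examples:

const {screen, imageResource, singleWord} = require("@nut-tree/nut-js");

(async () => {
    // Image search: locate a template image on screen
    const imageMatch = await screen.find(imageResource("button.png"));

    // Text search: locate a single word on screen - same method, different query type
    const textMatch = await screen.find(singleWord("Confirm"));
})();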

TextFinder Providers


To perform text search, we have to install an additional package that provides the actual implementation. Otherwise, all functions relying on text search will throw an error like Error: No TextFinder registered.

Currently, nut.js provides two TextFinder implementations: @nut-tree/plugin-ocr, which performs text search offline on your machine, and @nut-tree/plugin-azure, which is backed by Azure.

Attention: These are nut.js premium packages which require an active subscription. See the registry access tutorial to learn how to subscribe and access the private registry.

Text queries

Both plugins process text queries to search for text on screen. Currently, nut.js provides two different text queries:

  • singleWord: Searches for a single word.
  • textLine: Searches for a whole line of text, so it's possible to search for multiple consecutive words. E.g. textLine("How to use this plugin") would search for this very phrase (see the sketch below).
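
Both query types are used the same way. A minimal sketch, assuming a TextFinder plugin like @nut-tree/plugin-ocr is already registered; the search terms are just placeholder examples:

const {screen, singleWord, textLine} = require("@nut-tree/nut-js");
require("@nut-tree/plugin-ocr");

(async () => {
    try {
        // Locate a single word on screen
        const word = await screen.find(singleWord("Settings"));

        // Locate a whole line of text, i.e. multiple consecutive words
        const line = await screen.find(textLine("How to use this plugin"));
    } catch (e) {
        console.error(e);
    }
})();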

@nut-tree/plugin-ocr

npm i @nut-tree/plugin-ocr

In its simplest form, we only need to install the package and require it in our code to use it:

const {screen, singleWord} = require("@nut-tree/nut-js");
// Requiring the plugin registers its TextFinder implementation with nut.js
require("@nut-tree/plugin-ocr");

(async () => {
    try {
        const location = await screen.find(singleWord("nut.js"));
    } catch (e) {
        console.error(e);
    }
})();

This is all we need to perform offline text search using the defaults provided by the module.

But it wouldn't be much of a tutorial if we stopped here!

Configure the language model type

The @nut-tree/plugin-ocr package exposes some configuration options, so let's require its configure function in our code, e.g. index.js:

const {screen, singleWord} = require("@nut-tree/nut-js");
const {configure} = require("@nut-tree/plugin-ocr");

(async () => {
    try {
        const location = await screen.find(singleWord("nut.js"));
    } catch (e) {
        console.error(e);
    }
})();

With configure we're able to customize two settings:

  • languageModelType: The language model type to use for OCR. Possible values are DEFAULT, BEST and FAST. Visit the Configuration section of the OCR plugin documentation for more information.
  • dataPath: The path where we store OCR models. This may be useful if you want to store the models in a specific location (a sketch follows further below).

const {screen, singleWord} = require("@nut-tree/nut-js");
const {configure, LanguageModelType} = require("@nut-tree/plugin-ocr");

configure({
    languageModelType: LanguageModelType.BEST
});

(async () => {
    try {
        const location = await screen.find(singleWord("nut.js"));
    } catch (e) {
        console.error(e);
    }
})();

In this example we use LanguageModelType.BEST, which is slower than the other models but yields the most accurate results.
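
The dataPath option can be set in the same configure call. A minimal sketch, assuming we want to cache downloaded models in an ocr-models folder next to our script (the folder name is just an example):

const path = require("path");
const {configure, LanguageModelType} = require("@nut-tree/plugin-ocr");

configure({
    languageModelType: LanguageModelType.BEST,
    // Example cache location - any writable directory works
    dataPath: path.join(__dirname, "ocr-models"),
});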

Specify OCR languages

The default configuration of the OCR plugin uses the English language. But if we want to use a different language, or multiple languages at once, we can specify them in the providerData object of the find function.

Remark: Other screen methods like findAll, waitFor or read also accept the providerData object.

const {screen, singleWord} = require("@nut-tree/nut-js");
const {configure, LanguageModelType, Language} = require("@nut-tree/plugin-ocr");

configure({
    languageModelType: LanguageModelType.BEST
});

(async () => {
    try {
        const location = await screen.find(singleWord("nut.js"), {
            providerData: {
                lang: [Language.English, Language.German],
            }
        });
    } catch (e) {
        console.error(e);
    }
})();

This way we can specify the languages we want to use for OCR. The Language enum is provided by the OCR plugin. You can find a list of all available languages in the Configuration section of the OCR plugin documentation.
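
As mentioned in the remark above, the same providerData object can be passed to other screen methods as well. A minimal sketch using findAll, which returns all matches instead of only the first one:

const {screen, singleWord} = require("@nut-tree/nut-js");
const {Language} = require("@nut-tree/plugin-ocr");

(async () => {
    try {
        // findAll returns every on-screen occurrence of the word "nut.js"
        const locations = await screen.findAll(singleWord("nut.js"), {
            providerData: {
                lang: [Language.English, Language.German],
            }
        });
        console.log(`Found ${locations.length} matches`);
    } catch (e) {
        console.error(e);
    }
})();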

Preload OCR languages

Multi-language support works by downloading required models on the fly. When using a new combination of LanguageModelType and Language for the first time, the plugin will automatically download and cache the required model locally.

What if we want to avoid these occasional loading times during execution? We can preload the models we know we'll use at a defined point in time, so they're available when needed. If a model is already cached locally, it won't be re-downloaded.

So let's make sure we have both English and German at our disposal:

const {screen, singleWord} = require("@nut-tree/nut-js");
const {configure, LanguageModelType, Language, preloadLanguages} = require("@nut-tree/plugin-ocr");

configure({
    languageModelType: LanguageModelType.BEST
});

(async () => {
    try {
        await preloadLanguages([Language.English, Language.German]);

        const location = await screen.find(singleWord("nut.js"), {
            providerData: {
                lang: [Language.English, Language.German],
            }
        });
    } catch (e) {
        console.error(e);
    }
})();

Dealing with flawed results

OCR engines are not perfect and sometimes return slightly garbled results: emojis are interpreted as characters, a space gets lost and two words are joined, you name it. To deal with such inconsistencies, it's possible to adjust two more parameters via providerData:

  • partialMatch: When set to true, a query still counts as a hit even if it only partially matches the text returned by the OCR engine, e.g. when a word comes back with a trailing period attached.
  • caseSensitive: Toggles case sensitivity when looking for matches. This is another way to deal with possible inconsistencies in OCR results.

For our example, let's allow partial matches and disable case sensitivity:

const {screen, singleWord} = require("@nut-tree/nut-js");
const {configure, LanguageModelType, Language, preloadLanguages} = require("@nut-tree/plugin-ocr");

configure({
    languageModelType: LanguageModelType.BEST
});

(async () => {
    try {
        await preloadLanguages([Language.English, Language.German]);

        const location = await screen.find(singleWord("nut.js"), {
            providerData: {
                lang: [Language.English, Language.German],
                partialMatch: true,
                caseSensitive: false,
            }
        });
    } catch (e) {
        console.error(e);
    }
})();

Custom OCR confidence value

One way to configure the minimum required confidence value for a match during on-screen searches is the screen.config.confidence property. It was introduced with the initial image search plugin, so it was used exclusively for image search.

Now that there are additional things to search for on screen, like text, this single confidence value becomes limiting. When using both image and text search, we'd like a separate way to configure the confidence value used for OCR-based searches.

After importing @nut-tree/plugin-ocr there's another property at our disposal to configure the confidence value required for text search:

screen.config.ocrConfidence

This value specifies the minimum confidence (e.g. 0.9 for 90%) required for a text search result to be accepted.

const {screen, singleWord} = require("@nut-tree/nut-js");
const {configure, LanguageModelType, Language, preloadLanguages} = require("@nut-tree/plugin-ocr");

configure({
    languageModelType: LanguageModelType.BEST
});

screen.config.ocrConfidence = 0.9;

(async () => {
    try {
        await preloadLanguages([Language.English, Language.German]);

        const location = await screen.find(singleWord("nut.js"), {
            providerData: {
                lang: [Language.English, Language.German],
                partialMatch: true,
                caseSensitive: false,
            }
        });
    } catch (e) {
        console.error(e);
    }
})();

Full example

Let's take a look at a full example which brings all previously discussed pieces together. The following sample demonstrates a hypothetical scenario where we try to click a button labelled "Bestätigen" in German (that would be "Confirm" in English).

We configure our languageModelType to the model which delivers the most accurate results, preload German language data, set a custom OCR confidence value of 80% and run a case-insensitive search for a singleWord, allowing partial matches.

const {centerOf, mouse, screen, singleWord, straightTo} = require("@nut-tree/nut-js");
const {configure, Language, LanguageModelType, preloadLanguages} = require("@nut-tree/plugin-ocr");

configure({
    languageModelType: LanguageModelType.BEST
});

(async () => {
    await preloadLanguages([Language.German]);

    // Accept OCR matches with a confidence of at least 80%
    screen.config.ocrConfidence = 0.8;
    // Automatically highlight matches on screen
    screen.config.autoHighlight = true;

    const location = await screen.find(singleWord("Bestätigen"), {
        providerData: {
            lang: [Language.German],
            partialMatch: true,
            caseSensitive: false
        }
    });
    await mouse.move(
        straightTo(
            centerOf(
                location
            )
        )
    );
    await mouse.leftClick();
})();