@nut-tree/element-inspector

Read this first!

This plugin is currently released in beta. Please report any issues you encounter.
This plugin is currently only available for Windows.
New features and platforms will be added in the future.

Installation

npm i @nut-tree/element-inspector

Buy

@nut-tree/element-inspector is included in the Solo and Team plans.

Description

@nut-tree/element-inspector is a window element inspection plugin for nut.js. It provides an implementation of the ElementInspectionProviderInterface to enable inspection of GUI elements of a window.

Usage: Retrieve elements of a window

Let's dive right into an example:

import {
    useConsoleLogger,
    ConsoleLogLevel,
    screen,
    windowWithTitle
} from "@nut-tree/nut-js";
import {useBolt} from "@nut-tree/bolt";
import "@nut-tree/element-inspector";
import {elements} from "@nut-tree/element-inspector/win";

useConsoleLogger({logLevel: ConsoleLogLevel.DEBUG});
useBolt();

const vs = await screen.find(windowWithTitle(/Visual Studio Code/));
await vs.focus();
// We can configure the maximum depth of the search tree
// const items = await vs.getElements(); // <== By default, it will search through up to 100 levels deep
const items = await vs.getElements(5); // <== This will search 5 levels deep
console.log(JSON.stringify(items, null, 2));

The above example is a great way to examine the elements of a window.

Here's an excerpt of the output:

{
  "id": "",
  "role": "",
  "title": "",
  "type": "Group",
  "children": [
    {
      "id": "",
      "role": "toolbar",
      "title": "",
      "type": "ToolBar",
      "children": [
        {
          "id": "",
          "role": "button",
          "title": "Go Back (Alt+LeftArrow)",
          "type": "Button",
          "children": [
            {
              ...
            }
          ]
        }
      ]
    },
    {
      ...
    }
  ]
}

As you can see, the output is a JSON object that represents the hierarchy of elements in the window. Each entry in the JSON object is an element description which looks like this:

interface ShortWindowElementInfo {
    id?: string,
    role?: string,
    subRole?: string,
    title?: string,
    type?: string,
    children?: ShortWindowElementInfo[]
}

This element tree is useful to understand the structure of the window and to identify elements that you want to interact with.
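When the JSON dump gets large, a condensed outline can be easier to scan. The following is a minimal sketch and not part of the plugin: it assumes a tree shaped like the element description above (a subset is re-declared locally so the snippet is self-contained), and outline and maxDepth are hypothetical helper names.

```typescript
// Local copy (subset) of the element shape described above.
interface ShortWindowElementInfo {
    id?: string;
    role?: string;
    title?: string;
    type?: string;
    children?: ShortWindowElementInfo[];
}

// Hypothetical helper: renders one line per element, indented by depth.
function outline(node: ShortWindowElementInfo, depth = 0): string[] {
    const label = `${node.type || "?"}${node.title ? ` "${node.title}"` : ""}`;
    const lines = [`${"  ".repeat(depth)}${label}`];
    for (const child of node.children ?? []) {
        lines.push(...outline(child, depth + 1));
    }
    return lines;
}

// Hypothetical helper: number of levels in the tree (a single node counts as 1).
function maxDepth(node: ShortWindowElementInfo): number {
    const childDepths = (node.children ?? []).map((child) => maxDepth(child));
    return 1 + Math.max(0, ...childDepths);
}

// Sample data modeled on the output excerpt above.
const sample: ShortWindowElementInfo = {
    type: "Group",
    children: [{
        role: "toolbar",
        type: "ToolBar",
        children: [{role: "button", title: "Go Back (Alt+LeftArrow)", type: "Button"}],
    }],
};

const outlineLines = outline(sample);
console.log(outlineLines.join("\n"));
console.log(`levels: ${maxDepth(sample)}`);
```

In a real script, the result of getElements() would take the place of the sample object.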

Usage: Search for a specific element of a window

import {
    useConsoleLogger,
    ConsoleLogLevel,
    screen,
    windowWithTitle,
    mouse,
    Button,
    straightTo,
    centerOf
} from "@nut-tree/nut-js";
import {useBolt} from "@nut-tree/bolt";
import "@nut-tree/element-inspector";
import {elements} from "@nut-tree/element-inspector/win";

useConsoleLogger({logLevel: ConsoleLogLevel.DEBUG});
useBolt();

const vs = await screen.find(windowWithTitle(/Visual Studio Code/));
await vs.focus();
// We can configure the maximum depth of the search tree
// const items = await vs.getElements(); // <== By default, it will search through up to 100 levels deep
const items = await vs.getElements(5); // <== This will search 5 levels deep
console.log(JSON.stringify(items, null, 2));

const fileMenu = await vs.find(elements.menuItem({title: "File"}));
if (fileMenu.region != null) {
    await screen.highlight(fileMenu.region);
    await mouse.move(straightTo(centerOf(fileMenu.region)));
    await mouse.click(Button.LEFT);
}

Searching for a specific element is a great way to interact with a window.

Window element searches are based on WindowElementQuery objects.

{
    id: string;
    type: "window-element";
    by: {
        description: WindowElementDescription;
    }
}

These queries search for an element described by a WindowElementDescription object.

interface WindowElementDescription {
    id?: string | RegExp;
    type?: string;
    title?: string | RegExp;
    value?: string | RegExp;
    selectedText?: string | RegExp;
    role?: string;
}

A WindowElementDescription describes an element by a varying set of properties. However, it requires at least one property to be set to be valid.
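The "at least one property" rule can be checked before a query is built. The sketch below re-declares the two shapes shown above so it is self-contained. hasAnyProperty and toQuery are hypothetical helpers, not part of the plugin, and since this document does not say what the query's id should contain, the caller supplies it.

```typescript
// Local copies of the shapes shown above.
interface WindowElementDescription {
    id?: string | RegExp;
    type?: string;
    title?: string | RegExp;
    value?: string | RegExp;
    selectedText?: string | RegExp;
    role?: string;
}

interface WindowElementQuery {
    id: string;
    type: "window-element";
    by: { description: WindowElementDescription };
}

// Hypothetical helper: true if at least one property of the description is set.
function hasAnyProperty(description: WindowElementDescription): boolean {
    const keys = Object.keys(description) as (keyof WindowElementDescription)[];
    return keys.some((key) => description[key] !== undefined);
}

// Hypothetical helper: wraps a description in the query shape shown above.
// The meaning of `id` is an assumption here, so it is passed in by the caller.
function toQuery(id: string, description: WindowElementDescription): WindowElementQuery {
    if (!hasAnyProperty(description)) {
        throw new Error("A WindowElementDescription needs at least one property");
    }
    return {id, type: "window-element", by: {description}};
}

const fileQuery = toQuery("file-menu", {title: "File"});
console.log(fileQuery.by.description.title);
```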

Comparing a WindowElementDescription to an entry in the getElements() tree shows that elements are described by the same properties that getElements() returns. To search for a specific element, look it up in the tree and build a query from its properties.
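To make the relationship between a description and a tree entry concrete, here is a small illustrative matcher. It is not the plugin's implementation. It assumes that string properties must match exactly and that RegExp properties are tested against the element's value. ElementInfo, ElementDescription, fieldMatches and matchesDescription are names made up for this sketch.

```typescript
// Subset of an element entry, re-declared for a self-contained snippet.
interface ElementInfo {
    id?: string;
    role?: string;
    title?: string;
    type?: string;
}

// Subset of a description: strings match exactly, RegExps are tested (assumption).
interface ElementDescription {
    id?: string | RegExp;
    type?: string;
    title?: string | RegExp;
    role?: string;
}

function fieldMatches(expected: string | RegExp | undefined, actual: string | undefined): boolean {
    if (expected === undefined) {
        return true; // property is not part of the description, so it is not checked
    }
    if (actual === undefined) {
        return false;
    }
    return typeof expected === "string" ? expected === actual : expected.test(actual);
}

// Every property that is set in the description has to match the element.
function matchesDescription(element: ElementInfo, description: ElementDescription): boolean {
    return fieldMatches(description.id, element.id)
        && fieldMatches(description.type, element.type)
        && fieldMatches(description.title, element.title)
        && fieldMatches(description.role, element.role);
}

// Entry taken from the output excerpt earlier in this document.
const goBack: ElementInfo = {id: "", role: "button", title: "Go Back (Alt+LeftArrow)", type: "Button"};

const matchByTitlePattern = matchesDescription(goBack, {type: "Button", title: /^Go Back/});
const matchByWrongType = matchesDescription(goBack, {type: "MenuItem", title: /^Go Back/});
console.log(matchByTitlePattern, matchByWrongType);
```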

To avoid cumbersome object creation, the elements object provides a set of factory functions to create WindowElementQuery objects. Element types are platform dependent, so the factory functions are grouped by platform.

To use Windows element types, import the elements object from @nut-tree/element-inspector/win.

If you either do not know the type of element you're looking for, or if you want to search for an element with a specific title, no matter the type, you can use the windowElementDescribedBy factory function to construct a WindowElementQuery.

import {screen, windowWithTitle} from "@nut-tree/nut-js";
import {useBolt} from "@nut-tree/bolt";
import "@nut-tree/element-inspector";
import {windowElementDescribedBy} from "@nut-tree/element-inspector/win";

useBolt();

const vs = await screen.find(windowWithTitle(/Visual Studio Code/));
await vs.focus();

// This line searches for any element with the title "File", no matter the type
const elementWithTitle = await vs.find(windowElementDescribedBy({title: "File"}));

Usage: Search for multiple instances of an element in a window

The element inspection API follows the same pattern as the screen API.

So besides find, there is also a findAll method that returns all instances of an element that match the query.

import {
    useConsoleLogger,
    ConsoleLogLevel,
    screen,
    windowWithTitle,
} from "@nut-tree/nut-js";
import {useBolt} from "@nut-tree/bolt";
import "@nut-tree/element-inspector";
import {elements} from "@nut-tree/element-inspector/win";

useConsoleLogger({logLevel: ConsoleLogLevel.DEBUG});
useBolt();

const vs = await screen.find(windowWithTitle(/Visual Studio Code/));
const menuItems = await vs.findAll(elements.menuItem({}));

const itemTitles = await Promise.all(menuItems.map(item => item.title));
console.log(itemTitles);

As you can see, the findAll method returns an array of elements that match the query. If there are no elements that match the query, an empty array is returned.

Usage: Waiting for an element in a window

Since the element inspection API follows the same pattern as the screen API, it's also possible to wait for an element to appear in a window.

import {
    useConsoleLogger,
    ConsoleLogLevel,
    screen,
    windowWithTitle,
} from "@nut-tree/nut-js";
import {useBolt} from "@nut-tree/bolt";
import "@nut-tree/element-inspector";
import {elements} from "@nut-tree/element-inspector/win";

useConsoleLogger({logLevel: ConsoleLogLevel.DEBUG});
useBolt();

const vs = await screen.find(windowWithTitle(/Visual Studio Code/));
const menu = await vs.waitFor(elements.menu({}));
console.log(menu);

The above snippet will repeatedly search for a menu element in the window until it appears. This way you can wait for elements to appear in a window, like menus do after a click on a menu bar.

Stable elements

Even though it is possible to wait for an element to appear in a window, one still has to be careful when interacting with elements that are not stable. For example, a menu that appears after a click on a menu bar is in a moving state, because as the menu unfolds, its size changes.

This can lead to issues when trying to interact with the menu while it is still moving, because even though the menu might already exist, positions of its child elements might still change.
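One way to deal with this is to poll an element's region until it stops changing. The following is a minimal sketch under stated assumptions. waitForStableRect is a hypothetical helper, not a plugin API. Rect is a local stand-in carrying the left, top, width and height fields of a nut.js region. The simulated menu only exists to make the snippet runnable on its own.

```typescript
// Local stand-in for a region with the same four fields as a nut.js Region.
interface Rect {
    left: number;
    top: number;
    width: number;
    height: number;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

function sameRect(a: Rect, b: Rect): boolean {
    return a.left === b.left && a.top === b.top && a.width === b.width && a.height === b.height;
}

// Hypothetical helper: polls `readRect` until two consecutive reads are identical,
// or gives up after `maxAttempts` reads.
async function waitForStableRect(
    readRect: () => Promise<Rect>,
    intervalMs = 100,
    maxAttempts = 20,
): Promise<Rect> {
    let previous = await readRect();
    for (let attempt = 1; attempt < maxAttempts; attempt++) {
        await sleep(intervalMs);
        const current = await readRect();
        if (sameRect(previous, current)) {
            return current;
        }
        previous = current;
    }
    throw new Error(`Region did not settle within ${maxAttempts} reads`);
}

// Simulated menu that grows for three reads and then stays put.
const heights = [40, 120, 200, 200];
let reads = 0;
const fakeMenu = async (): Promise<Rect> => ({
    left: 10,
    top: 30,
    width: 180,
    height: heights[Math.min(reads++, heights.length - 1)],
});

const settled = waitForStableRect(fakeMenu, 1);
settled.then((rect) => console.log(`settled at height ${rect.height}`));
```

In a real script, readRect could re-run the element search and return the region of the result, provided the region is present.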

Element relations

Sometimes it is not obvious which element we're targeting. There might be multiple elements with the same title, or the element we're looking for might be a child of another.

In order to precisely target the element we're looking for, we can use the in relation to specify an element's relation to its parent element(s).

import { windowWithTitle, screen, straightTo, centerOf, mouse, Button, keyboard, Key } from "@nut-tree/nut-js";
import "@nut-tree/element-inspector";
import { elements } from "@nut-tree/element-inspector/win";
import { useBoltWindowFinder } from "@nut-tree/bolt";

useBoltWindowFinder();
mouse.config.autoDelayMs = 100;

const explorer = await screen.find(windowWithTitle("This PC"));

await explorer.focus();

const dvdItem = await explorer.find(elements.treeItem({
    title: "boot",
    in: elements.treeItem({
        title: /DVD.*/,
        in: elements.treeItem({
            title: "Desktop"
        })
    })
}));

await mouse.move(straightTo(centerOf(dvdItem.region)));
await mouse.click(Button.LEFT);

const bootFiles = await explorer.findAll(elements.listItem({
    title: /.*boot.*/,
    in: elements.group({
        title: "Files Currently on the Disc"
    })
}));

In this example, we're looking for a treeItem that is nested within two parent treeItems.

const dvdItem = await explorer.find(elements.treeItem({
    title: "boot",
    in: elements.treeItem({
        title: /DVD.*/,
        in: elements.treeItem({
            title: "Desktop"
        })
    })
}));
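To decide which parents to name in such a relation, it can help to look up the target in the getElements() output and list its ancestors. The sketch below is an illustration only: pathTo is a hypothetical helper, InfoNode is a local subset of the element description, and the sample tree, including its titles, is made up.

```typescript
// Subset of an element entry, re-declared for a self-contained snippet.
interface InfoNode {
    title?: string;
    type?: string;
    children?: InfoNode[];
}

// Hypothetical helper: returns the chain of elements from the root down to the
// first element with the given title, or undefined if there is none.
function pathTo(node: InfoNode, title: string): InfoNode[] | undefined {
    if (node.title === title) {
        return [node];
    }
    for (const child of node.children ?? []) {
        const rest = pathTo(child, title);
        if (rest) {
            return [node, ...rest];
        }
    }
    return undefined;
}

// Made-up sample tree, loosely modeled on the Explorer example above.
const explorerTree: InfoNode = {
    type: "Tree",
    title: "Navigation Pane",
    children: [{
        type: "TreeItem",
        title: "Desktop",
        children: [{
            type: "TreeItem",
            title: "DVD Drive (D:)",
            children: [{type: "TreeItem", title: "boot"}],
        }],
    }],
};

const bootPathLabels = (pathTo(explorerTree, "boot") ?? []).map((n) => `${n.type}: ${n.title}`);
console.log(bootPathLabels.join(" > "));
```

Each ancestor in the printed chain is a candidate for one level of nesting in the query.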

Relation resolving

Relations are resolved in reverse order, so the innermost relation is resolved first.

Why so?

Resolving relations gives us an additional level of control over the search process. By resolving relations in reverse order we're able to immediately discard elements that do not match the innermost relation. So in case of very deep element hierarchies, we can avoid searching through the entire hierarchy by specifying a relation path from the root element to the element we're looking for.
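The ordering can be illustrated with a small, self-contained sketch. It is not the plugin's implementation, which this document does not show. TreeNode, RelQuery and resolveQuery are names made up for the illustration, and the sample tree is invented. The innermost relation is resolved first, and each resolved ancestor then becomes the search scope for the next query outward.

```typescript
// Simplified stand-ins for an element tree and a query with a nested relation.
interface TreeNode {
    id?: string;
    type: string;
    title: string;
    children?: TreeNode[];
}

interface RelQuery {
    type: string;
    title: string | RegExp;
    in?: RelQuery;
}

function titleMatches(expected: string | RegExp, actual: string): boolean {
    return typeof expected === "string" ? expected === actual : expected.test(actual);
}

// All descendants of `node` (excluding `node` itself), in depth-first order.
function descendants(node: TreeNode): TreeNode[] {
    const result: TreeNode[] = [];
    for (const child of node.children ?? []) {
        result.push(child, ...descendants(child));
    }
    return result;
}

// Resolves the innermost relation first; only the subtrees of the resolved
// ancestors are searched for the next query outward.
function resolveQuery(root: TreeNode, query: RelQuery): TreeNode[] {
    const scopes = query.in ? resolveQuery(root, query.in) : [root];
    const hits: TreeNode[] = [];
    for (const scope of scopes) {
        for (const node of descendants(scope)) {
            const isMatch = node.type === query.type && titleMatches(query.title, node.title);
            if (isMatch && hits.indexOf(node) === -1) {
                hits.push(node);
            }
        }
    }
    return hits;
}

// Invented sample tree with two "boot" items under different parents.
const windowTree: TreeNode = {
    type: "Window", title: "This PC", children: [{
        type: "TreeItem", title: "Desktop", children: [
            {type: "TreeItem", title: "DVD Drive (D:)", children: [{id: "dvd-boot", type: "TreeItem", title: "boot"}]},
            {type: "TreeItem", title: "Local Disk (C:)", children: [{id: "c-boot", type: "TreeItem", title: "boot"}]},
        ],
    }],
};

const scopedHits = resolveQuery(windowTree, {
    type: "TreeItem", title: "boot",
    in: {type: "TreeItem", title: /DVD.*/, in: {type: "TreeItem", title: "Desktop"}},
});
const unscopedHits = resolveQuery(windowTree, {type: "TreeItem", title: "boot"});
console.log(scopedHits.length, unscopedHits.length);
```

With the relation in place, only the subtree below the DVD item is searched for "boot". Without it, both "boot" items match.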

Troubleshooting

Element inspection is a complex task that relies on the underlying platform's accessibility API. This means that the results of element inspection can vary depending on the platform and the application being inspected.

Here are some common issues and how to solve them:

No/only a few elements are returned when calling getElements

If you did not manually limit the element search depth, it might be an application-specific issue. Some applications have a dedicated accessibility mode that needs to be enabled to expose elements to the accessibility API.

Visual Studio Code, for example, has an accessibility mode that needs to be enabled:

"editor.accessibilitySupport": "on"

However, some applications might not have such a mode, or it might not be enough to enable it. In this case, you might need to use a different approach to interact with the application.

Detected element has no size/position

In scenarios where elements are located within dynamic containers, the element's position might not be stable. One example of such a scenario is a menu that appears after a click on a menu bar. You might have to add a short delay before querying the element to ensure that it has settled in its final position.

Element Types

Element types are platform dependent. @nut-tree/element-inspector provides a set of factory functions to create WindowElementQuery objects. These factory functions are grouped by platform.

Windows

On Windows, the following element types are available:

button

A button is an object that a user interacts with to perform an action such as the OK and Cancel buttons on a dialog box. The button control is a simple control to expose because it maps to a single command that the user wishes to complete.

calendar

A calendar control allows the user to easily determine the date and select other dates.

checkBox

A check box is an object used to indicate a state that users can interact with to cycle through that state. Check boxes present either a binary (Yes/No, On/Off) or ternary (On, Off, Indeterminate) option to the user.

comboBox

A combo box is a list box combined with a static control or an edit control that displays the currently selected item in the list box portion of the combo box. The list box portion of the control is displayed at all times or only appears when the user selects the drop-down arrow (which is a push button) next to the control. If the selection field is an edit control, the user can enter information that is not in the list; otherwise, the user can only select items in the list.

customControl

An application might make use of custom controls which are not part of the standard control types. In this case, the customControl control type is used.

dataGrid

The dataGrid control type lets a user easily work with items that contain data or automation elements presented in columns or rows. dataGrid controls have rows of items and columns of information about those items. A list-view control in Windows Explorer is an example that supports the dataGrid control type.

dataItem

The dataItem control is an item in a dataGrid control. An entry in a contacts list is an example of a data item control. A dataItem control contains information that is of interest to an end user. It is more complicated than the simple list item because it contains richer information.

document

document controls let a user view and manipulate multiple pages of text. Unlike edit controls which only support a simple line of unformatted text, document controls can host text that is richly styled and formatted. A typical example of a document control is the document area in e.g. MS Word.

editField

editField controls enable a user to view and edit a simple line of text without rich formatting support.

group

A group control represents a node within a hierarchy, similar to a div in HTML. The Group control type creates a separation in the UI Automation tree so items that are grouped together have a logical division within the UI Automation tree.

header

The header control provides a visual container for the labels for rows or columns of information.

headerItem

The headerItem control type provides a visual label for a row or column of information.

link

link controls create links that enable users to navigate within the same page, or from one page to another.

image

image controls used as icons, informational graphics, and charts will support the Image control type. Controls used as background or watermark images will not support the Image control type.

list

The list control type provides a way to organize a flat group or groups of items and allows a user to select one or more of those items.

listItem

A listItem control is an item in a list control. It is a child of a list control and contains data or automation elements.

menu

A menu control allows hierarchical organization of elements associated with commands and event handlers. In a typical Microsoft Windows application, a menu bar contains several menu buttons (such as File, Edit, and Window), and each menu button displays a menu. A menu contains a collection of menu items (such as New, Open, and Close), which can be expanded to display additional menu items or to perform a specific action when clicked. So to put it simply, the window that pops up when you click on the "File" menu in an application is a menu control.

menuBar

menuBar controls are an example of controls that implement the MenuBar control type. Menu bars provide a means for users to activate commands and options contained in an application. A typical example of a menu bar is the bar at the top of the window in an application that contains the File, Edit, and Help menus.

menuItem

A menuItem control is an item representing a menu action item like "New", "Open..." etc. This means that both the "File" menu and the "New" menu item are menuItem controls.

pane

The pane control type is for potentially scrollable regions that have disparate content. It is used to represent an object within a frame or document window.

progressBar

progressBar controls indicate the progress of a lengthy operation. The control consists of a rectangle that is gradually filled with the system highlight color as an operation progresses.

radioButton

A radioButton consists of a round button and application-defined text (a label), an icon, or a bitmap that indicates a choice the user can make by selecting the button.

scrollBar

scrollBar controls enable a user to scroll content within a window or item container. The control consists of a set of buttons and a thumb control.

semanticZoom

semanticZoom is a technique introduced in Windows 8 for presenting and navigating large sets of related data or content within a single view, such as a photo album, app list, or address book. Semantic Zoom uses two distinct modes of classification, or zoom levels, for organizing and presenting the content. The low-level (or zoomed in) mode displays items in a flat, "all-up" structure; and the high-level (or zoomed out) mode displays items in groups, enabling the user to quickly navigate and browse through the content. For example, zooming a list of cities might change to a list of states containing those cities. Zooming a list of programs might change to a list of logical program groups.

separator

separator controls are used to visually divide a space into two regions. For example, a separator control can be a bar that defines two panes in a window.

slider

A slider control is a composite control with buttons that enable a user to set a numerical range or select from a set of items.

spinner

spinner controls are used to select from a domain of items or a range of numbers.

splitButton

The splitButton control enables an action to be performed on a control, and to expand the control to see a list of other possible actions that can be performed.

statusBar

A statusBar control displays information about an object being viewed in a window of an application, the object's component, or contextual information that relates to that object's operation within your application.

tab

A tab control is analogous to the dividers in a notebook or the labels in a file cabinet. By using a tab control, an application can define multiple pages for the same area of a window or dialog box.

tabItem

A tabItem control is used as the control within a tab control that selects a specific page to be shown in a window.

table

table controls contain rows and columns of text and, optionally, row headers and column headers.

textBox

A textBox control is a basic user interface item that represents a piece of text on the screen.

thumb

thumb controls provide the functionality that enables a control to be moved (or dragged), such as a scroll bar button, or resized, such as a window resizing widget.

titleBar

A titleBar control represents a title or caption bar in a window.

toolBar

toolBar controls enable end users to activate commands and tools contained within an application.

toolTip

toolTip controls are pop-up windows that contain text.

tree

The tree control type is used for containers whose contents have relevance as a hierarchy of nodes, as with the way files and folders are displayed in the left pane of Windows Explorer.

treeItem

The treeItem control type represents a node within a tree container. Each node might contain other nodes, called child nodes. Parent nodes, or nodes that contain child nodes, can be displayed as expanded or collapsed.

window

The window control consists of the window frame, which contains child objects such as title bar, client, and other objects.
