JSON Protocol
Every command returns structured JSON. Parseable by any LLM, any language, any agent framework.
Structured JSON commands for mouse, keyboard, screen, windows, and accessibility trees. One npm install, zero boilerplate.
# Discover all interactive elements in a window
nib snapshot --window "Login" -i
# → @txt:Username @txt:Password @chk:Remember @btn:Login
# Fill in the form using stable element refs
nib type-element @txt:Username admin --window "Login"
nib type-element @txt:Password secret123 --window "Login"
nib click-element @btn:Login --window "Login"
# Verify the result
nib diff --window "Login"
# → @btn:Login changed: isEnabled true → false

Every design decision optimized for AI agents — structured output, semantic identifiers, and minimal token overhead.
Human-readable @btn:Save refs instead of fragile coordinates or XPath. Stable across layout changes.
Snapshot, query, and diff the full a11y tree. Agents see the UI semantically, not just pixels.
Chain commands in .nib files. Execute multi-step workflows in a single call.
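A batch file could look like the sketch below. The one-command-per-line format is an assumption, not documented syntax; the commands themselves are the ones shown elsewhere on this page:

```
# login.nib — hypothetical batch script (file format assumed, not official)
snapshot --window "Login" -i
type-element @txt:Username admin --window "Login"
type-element @txt:Password secret123 --window "Login"
click-element @btn:Login --window "Login"
diff --window "Login"
```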
The agent loop is four steps: observe the UI, let the LLM decide, execute the command, verify the result. NIB handles steps 1, 3, and 4.
No SDK required
Shell exec + JSON parse. Works from TypeScript, Python, Go, Rust — anything.
Observe-Act-Verify built in
Snapshot captures state, diff confirms the effect. The agent always knows what happened.
Structured error handling
Errors include codes, messages, and suggestions the LLM can act on directly.
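As a sketch of what acting on those errors might look like: the success envelope (`ok`, `command`, `data`) appears in the examples on this page, but the nesting of `code`, `message`, and `suggestion` under an `error` key is an assumption.

```typescript
// Assumed error envelope: failures set ok: false and carry an `error`
// object with the code/message/suggestion fields described above.
interface NibError {
  code: string;
  message: string;
  suggestion?: string;
}

interface NibResult {
  ok: boolean;
  command: string;
  data?: unknown;
  error?: NibError;
}

// Turn a failed result into a single line the LLM can act on directly,
// instead of throwing and losing the suggestion.
function describeFailure(res: NibResult): string | null {
  if (res.ok || !res.error) return null;
  const { code, message, suggestion } = res.error;
  return suggestion
    ? `${code}: ${message} (try: ${suggestion})`
    : `${code}: ${message}`;
}
```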
import { execSync } from "node:child_process";
import Anthropic from "@anthropic-ai/sdk";

function nib(cmd: string) {
  return JSON.parse(execSync(`nib ${cmd}`).toString());
}

const client = new Anthropic();

async function agentStep(task: string, window: string) {
  // 1. Observe — snapshot the UI
  const { data } = nib(`snapshot --window "${window}" -i`);
  // 2. Decide — let the LLM pick an action
  const response = await client.messages.create({
    model: "claude-sonnet-4-5-20250929",
    max_tokens: 1024,
    system: `You control a desktop via nib commands.
Available elements:\n${data.refs.join("\n")}`,
    messages: [{ role: "user", content: task }],
    tools: nibTools,
  });
  // 3. Act — execute the command
  const tool = response.content.find((b) => b.type === "tool_use");
  if (!tool) throw new Error("model did not call a tool");
  nib(`${tool.name} ${tool.input.ref} --window "${window}"`);
  // 4. Verify — diff to confirm the effect
  return nib(`diff --window "${window}"`);
}

$ nib snapshot --window "Settings" -i
{
  "ok": true,
  "command": "snapshot",
  "data": {
    "window": { "title": "Settings" },
    "refCount": 8,
    "refs": [
      "@btn:Apply",
      "@btn:Cancel",
      "@chk:DarkMode",
      "@chk:Notifications [checked]",
      "@sld:Volume = \"75\"",
      "@tab:General [focused]",
      "@tab:Privacy",
      "@txt:DisplayName = \"Alice\""
    ]
  }
}

Snapshots return every interactive element as a human-readable ref. Diffs show exactly what changed after an action. Your agent understands the UI without parsing pixels.
After every action, nib diff compares the current accessibility tree against the last snapshot. The agent gets a precise report of what was added, removed, or changed — no guesswork, no screenshots to parse.
Confirm actions succeeded
Did the checkbox actually toggle? Did the dialog close? Diff tells you.
Catch unexpected side effects
See new elements that appeared, buttons that became disabled, or values that changed.
Faster than re-snapshotting
Diff returns only what changed — fewer tokens, faster agent decisions.
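A minimal verify step over a diff payload might look like this. The `added`/`removed`/`changed` shape is taken from the example output on this page; any field names beyond those are illustrative.

```typescript
// Diff payload shape, mirroring the example diff output on this page.
interface RefChange {
  ref: string;
  [prop: string]: unknown; // e.g. isChecked: { from: false, to: true }
}

interface DiffData {
  added: string[];
  removed: string[];
  changed: RefChange[];
}

// Did a specific property of a specific element change?
function propChanged(diff: DiffData, ref: string, prop: string): boolean {
  return diff.changed.some((c) => c.ref === ref && prop in c);
}

// Did the action have no observable effect at all?
function isNoOp(diff: DiffData): boolean {
  return (
    diff.added.length === 0 &&
    diff.removed.length === 0 &&
    diff.changed.length === 0
  );
}
```

An agent can use these checks to retry or re-plan when an action silently failed, rather than blindly proceeding.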
# Take a baseline snapshot
$ nib snapshot --window "Settings" -i
# Perform an action
$ nib click-element @chk:DarkMode --window "Settings"
# Compare current state against the last snapshot
$ nib diff --window "Settings"
{
  "ok": true,
  "command": "diff",
  "data": {
    "added": [
      "@btn:Restart"
    ],
    "removed": [],
    "changed": [
      {
        "ref": "@chk:DarkMode",
        "isChecked": { "from": false, "to": true }
      }
    ]
  }
}

NIB is a CLI. It works with whatever agent framework you already use.
Expose nib commands as MCP tools. Claude Code, Cursor, and other MCP clients drive desktops directly.
Map nib commands to OpenAI/Anthropic tool schemas. The LLM picks the tool, your harness runs the command.
LLM generates a .nib batch script, you execute it in one shot. Ideal for planned, multi-step workflows.
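A `nibTools` array like the one the TypeScript example on this page passes to the Messages API could be sketched as follows. The command names come from nib's command list; the input fields are an illustrative guess, not nib's published schema.

```typescript
// Hypothetical Anthropic tool-use schemas wrapping two nib commands.
// The `input_schema` fields are assumptions for illustration only.
const nibTools = [
  {
    name: "click-element",
    description: "Click a UI element by its accessibility ref, e.g. @btn:Save.",
    input_schema: {
      type: "object",
      properties: {
        ref: { type: "string", description: "Element ref from the last snapshot" },
      },
      required: ["ref"],
    },
  },
  {
    name: "type-element",
    description: "Type text into a UI element identified by ref.",
    input_schema: {
      type: "object",
      properties: {
        ref: { type: "string" },
        text: { type: "string" },
      },
      required: ["ref", "text"],
    },
  },
];
```

Because each tool name is itself a nib command, the harness can translate a tool call back into a shell invocation with no lookup table.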
Mouse, keyboard, screen, windows, clipboard, accessibility, OCR, and more — all through a single CLI with consistent JSON responses.
click, type, press, drag, scroll, mouse-move-smooth, screenshot-base64, read, find-text, screen-size, color-at, list-windows, focus-window, active-window, resize-window, move-window, click-element, type-element, focus-element, hover-element, scroll-element, snapshot, diff, find-element, get-element, check-element, wait, wait-for-window, batch, clipboard-get, clipboard-set

Let Claude or GPT operate any desktop app on the user's behalf — no per-app integration needed.
Agents that explore applications, fill forms, and verify state changes without hand-written test scripts.
Snapshot a window, read text via OCR, and parse structured data from any application.
Chain desktop actions with API calls. Automate cross-app workflows that have no API.
NIB ships with a SKILL.md that teaches AI coding agents how to use it. Drop it into your project and Claude Code, Cursor, or any agent that reads markdown instructions can drive your desktop immediately.
Complete command reference
Every command, option, and pattern documented for the agent
Common patterns included
Form filling, menu navigation, dialog handling, state verification
Reliability rules baked in
Best practices the agent follows automatically: snapshot before refs, re-snapshot after transitions, prefer refs over coordinates
SKILL.md
Agent instruction file
# Workflow
Always prefer accessibility-based element interaction over coordinate-based mouse clicks.
# Rules for reliability
1. Always snapshot before using element refs
2. Re-snapshot after UI transitions
3. Use diff for incremental updates
# Common patterns
Form filling, menu navigation, dialog handling, keyboard shortcuts, state verification...
NIB is included in Solo and Team plans. Let your AI operate any application — no per-app integration, no browser required.