Release

ScreenHand is Now on npm: AI Desktop Automation in One Command

5 min read · By Clazro Technology
TL;DR: ScreenHand v0.1.1 is live on npm. Run npm i screenhand to install the open-source MCP server that gives Claude, Cursor, and any AI agent the ability to see your screen, click buttons, type text, and control any desktop app. 70+ tools. Native speed. macOS + Windows.
npm i screenhand

v0.1.1 · 73 files · 521 KB · AGPL-3.0

Why This Matters

AI assistants are incredibly powerful at reasoning, writing code, and answering questions. But they have a fundamental limitation: they can't interact with your computer. They can't see what's on your screen. They can't click a button. They can't fill out a form.

ScreenHand removes that wall. It's an MCP server that connects AI agents to your desktop through native OS APIs. Once installed, your AI can see, click, type, and control any application — at native speed.

And now it's on npm, which means getting started takes one command.

What's in the Package

ScreenHand v0.1.1 ships with 70+ tools organized by what you need to do:

Screen Vision

Screenshots, OCR, bounding boxes. See everything visible on screen.

Native App Control

Read UI trees, click buttons by name, set values, trigger menus. ~50ms per action.

Keyboard & Mouse

Click, type, drag, scroll, key combos. Full input simulation.

Chrome Browser (CDP)

Navigate, run JS, query the DOM, fill forms. Uses the Chrome DevTools Protocol at ~10ms per operation.

Learning Memory

Auto-learns successful strategies. O(1) recall. Gets better over time.

Cross-Platform

Swift bridge on macOS, C# bridge on Windows. Same tools on both.
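
To make the Learning Memory idea concrete, here is a minimal TypeScript sketch of how constant-time recall could work: a hash map keyed by app and task. The names StrategyMemory, recordSuccess, and recall, as well as the example step strings, are hypothetical and chosen for illustration; ScreenHand's actual implementation may differ.

```typescript
// Hypothetical sketch of a strategy memory with average-case constant-time lookup.
// None of these names come from ScreenHand itself.
interface Strategy {
  steps: string[];   // the action sequence that worked last time
  successes: number; // how many times this app/task pair has succeeded
}

class StrategyMemory {
  private readonly store = new Map<string, Strategy>();

  // Combine app and task into one map key; "\u0000" avoids accidental collisions.
  private key(app: string, task: string): string {
    return `${app}\u0000${task}`;
  }

  // Remember the latest successful steps and bump the success counter.
  recordSuccess(app: string, task: string, steps: string[]): void {
    const k = this.key(app, task);
    const previous = this.store.get(k);
    this.store.set(k, { steps, successes: (previous?.successes ?? 0) + 1 });
  }

  // A single Map lookup, independent of how many strategies are stored.
  recall(app: string, task: string): Strategy | undefined {
    return this.store.get(this.key(app, task));
  }
}

const memory = new StrategyMemory();
memory.recordSuccess("Finder", "new folder", ["menu: File > New Folder"]);
memory.recordSuccess("Finder", "new folder", ["key: cmd+shift+n"]);

const hit = memory.recall("Finder", "new folder");
console.log(hit?.steps, hit?.successes);
console.log(memory.recall("Finder", "empty trash"));
```

The lookup cost stays flat as the memory grows because recall never scans stored strategies; it only hashes the key.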

Install and Connect in 2 Minutes

Step 1: Install

npm i screenhand
cd node_modules/screenhand
npm run build:native         # macOS
# npm run build:native:windows  # Windows

Step 2: Connect to Your AI Client

Add ScreenHand to your MCP config. Here's Claude Code as an example:

File: .mcp.json or ~/.claude/settings.json
{
  "mcpServers": {
    "screenhand": {
      "command": "npx",
      "args": ["tsx", "node_modules/screenhand/mcp-desktop.ts"]
    }
  }
}

Works the same way with Claude Desktop, Cursor, Windsurf, and OpenAI Codex CLI. Any MCP client, three lines of config.
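
If your client already has an MCP config with other servers in it, the ScreenHand entry needs to be merged in rather than pasted over the file. The TypeScript sketch below shows one way to do that on an in-memory config object. The entry is copied from the config above; McpConfig, withScreenHand, and the "other" server are illustrative names, not part of ScreenHand or any client.

```typescript
// Sketch: add the ScreenHand entry to an existing MCP config without
// dropping other servers. Helper and type names here are hypothetical.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface McpConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown; // keep any unrelated settings untouched
}

// Same command and args as in the Claude Code example above.
const screenhandEntry: McpServerEntry = {
  command: "npx",
  args: ["tsx", "node_modules/screenhand/mcp-desktop.ts"],
};

// Returns a new config object; the input is not mutated.
function withScreenHand(config: McpConfig): McpConfig {
  return {
    ...config,
    mcpServers: { ...(config.mcpServers ?? {}), screenhand: screenhandEntry },
  };
}

// Example: a config that already lists one (made-up) server.
const existingConfig: McpConfig = {
  mcpServers: { other: { command: "node", args: ["other-server.js"] } },
};

const mergedConfig = withScreenHand(existingConfig);
console.log(JSON.stringify(mergedConfig, null, 2));
```

Writing the result back to disk is left out on purpose, since the config file location differs per client.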

Step 3: Automate

Open your AI client and describe the task in plain language.

The agent turns your request into ScreenHand tool calls, which run as native OS actions. No scripting required.
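
Under the hood, an MCP client invokes a server tool by sending a JSON-RPC 2.0 request with the method tools/call. The TypeScript sketch below builds such a message and serializes it the way the stdio transport expects, as one JSON object per line. The tool name click_element and its arguments are hypothetical placeholders; check the tool list your client reports for ScreenHand's real tool names.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextRequestId = 1;

// Build a request for a named tool with the given arguments.
function buildToolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// "click_element" and its arguments are made up for this example.
const toolRequest = buildToolCall("click_element", { app: "Notes", title: "New Note" });

// The stdio transport sends each message as a single newline-terminated line.
const wireMessage = JSON.stringify(toolRequest) + "\n";
console.log(wireMessage);
```

You normally never write these messages yourself. The AI client generates them from your plain-language request.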

How ScreenHand Compares

| Feature | ScreenHand | Screenshot-Based Tools | Browser-Only Tools |
| --- | --- | --- | --- |
| Speed | ~50ms per action | 2-5 seconds | ~100ms |
| Scope | Any app + browser | Any app (slow) | Browser only |
| Targeting | By element name | By coordinates (guessing) | By CSS selector |
| Reliability | High (native APIs) | Low (layout shifts break it) | Medium |
| Platform | macOS + Windows | Varies | Any |
| Learning | Built-in memory | None | None |
| Install | npm i screenhand | Varies | npm i |

The key advantage: ScreenHand uses native Accessibility APIs to read the actual UI element tree. It knows where every button, text field, and menu item is — by name, not by guessing from pixels. This makes it fast, reliable, and layout-independent.
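
The difference between targeting by name and targeting by coordinates can be sketched in a few lines of TypeScript. The UiNode type below is a simplified, hypothetical stand-in for an accessibility tree, not ScreenHand's actual data model. The point is that the lookup uses role and name, so it still finds the button after the layout shifts.

```typescript
// Hypothetical, simplified UI element tree.
interface UiNode {
  role: string;
  name: string;
  frame: { x: number; y: number; width: number; height: number };
  children: UiNode[];
}

// Depth-first search for the first node matching role and name.
function findByName(root: UiNode, role: string, name: string): UiNode | undefined {
  const stack: UiNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    if (node.role === role && node.name === name) return node;
    stack.push(...node.children);
  }
  return undefined;
}

// Click point derived from the element's current frame, not a stored coordinate.
function centerOf(node: UiNode): { x: number; y: number } {
  return {
    x: node.frame.x + node.frame.width / 2,
    y: node.frame.y + node.frame.height / 2,
  };
}

const settingsWindow: UiNode = {
  role: "window",
  name: "Settings",
  frame: { x: 0, y: 0, width: 800, height: 600 },
  children: [
    { role: "button", name: "Cancel", frame: { x: 560, y: 540, width: 100, height: 40 }, children: [] },
    { role: "button", name: "Save", frame: { x: 680, y: 540, width: 100, height: 40 }, children: [] },
  ],
};

const saveButton = findByName(settingsWindow, "button", "Save");
if (saveButton) console.log(centerOf(saveButton)); // { x: 730, y: 560 }

// Simulate a layout shift: the button moves, but the same lookup still finds it.
if (saveButton) saveButton.frame.y = 300;
const saveAfterShift = findByName(settingsWindow, "button", "Save");
if (saveAfterShift) console.log(centerOf(saveAfterShift)); // { x: 730, y: 320 }
```

A coordinate-based tool that had memorized (730, 560) would now click empty space, which is the failure mode the comparison table calls out.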

What's Next

Frequently Asked Questions

How do I install ScreenHand?
Run npm i screenhand, then build the native bridge with npm run build:native (macOS) or npm run build:native:windows (Windows). Add the MCP config to your AI client and you're ready.

Which AI clients does ScreenHand work with?
Any MCP-compatible client: Claude Desktop, Claude Code, Cursor, Windsurf, OpenAI Codex CLI. It uses standard MCP over stdio and needs three lines of JSON config.

How fast is ScreenHand?
Native UI actions take ~50ms via Accessibility APIs. Chrome DevTools Protocol (CDP) operations take ~10ms. Screenshots with OCR take ~600ms. At ~50ms per native action versus 2-5 seconds, that is roughly 40-100x faster than screenshot-based approaches.

Is ScreenHand free?
Yes. ScreenHand is free and open-source under the AGPL-3.0 license. Full source at github.com/manushi4/screenhand. Built by Clazro Technology Private Limited.

Does ScreenHand work on both macOS and Windows?
Yes. macOS uses a Swift bridge with Accessibility APIs. Windows uses a C# (.NET 8) bridge with UI Automation. Both use the same protocol, and all 70+ tools work identically on both platforms.

Is ScreenHand secure?
ScreenHand runs entirely locally. No screen data is sent to external servers. All tool calls are audit-logged. On macOS you grant Accessibility permission; on Windows no administrator rights are needed. See the Security Policy for details.

How does ScreenHand compare to Computer Use?
Computer Use is cloud-based screenshot interpretation. ScreenHand is local-first and uses native OS APIs, which makes it up to ~100x faster and more reliable (it targets elements by name, not coordinates), and all data stays on your machine. It also works with any MCP client, not just Claude.

Get Started Now

70+ tools. Native speed. macOS + Windows. Free and open source.