ScreenHand is an open-source MCP (Model Context Protocol) server that gives AI agents like Claude, Cursor, and Codex CLI the ability to see and control your desktop. It provides 70+ tools for screenshots, OCR, native UI control, keyboard/mouse input, Chrome browser automation, and cross-app workflows on macOS and Windows.

Does ScreenHand work on both macOS and Windows?

Yes. On macOS, ScreenHand uses a Swift native bridge with Accessibility APIs. On Windows, it uses a C# (.NET 8) bridge with UI Automation. Both platforms use the same JSON-RPC protocol, so all 70+ tools work identically on both operating systems.

What can ScreenHand do that other MCP servers cannot?

ScreenHand provides full desktop control through native OS APIs, not just browser automation. It can see the screen via OCR, read UI element trees via Accessibility APIs, click buttons by name (not coordinates), type text, control Chrome via CDP, run AppleScript (macOS only), and chain actions across multiple apps. It also learns from past sessions and remembers successful strategies.

Release

ScreenHand is Now on npm: AI Desktop Automation in One Command

March 7, 2026 · 5 min read · By Clazro Technology

TL;DR: ScreenHand v0.1.1 is live on npm. Run npm i screenhand to install the open-source MCP server that gives Claude, Cursor, and any AI agent the ability to see your screen, click buttons, type text, and control any desktop app. 70+ tools. Native speed. macOS + Windows.

npm i screenhand

v0.1.1 · 73 files · 521 KB · AGPL-3.0

Why This Matters

AI assistants are incredibly powerful at reasoning, writing code, and answering questions. But they have a fundamental limitation: they can't interact with your computer. They can't see what's on your screen. They can't click a button. They can't fill out a form.

ScreenHand removes that wall. It's an MCP server that connects AI agents to your desktop through native OS APIs. Once installed, your AI can see, click, type, and control any application — at native speed.

And now it's on npm, which means getting started takes one command.

What's in the Package

ScreenHand v0.1.1 ships with 70+ tools organized by what you need to do:

Screen Vision

Screenshots, OCR, bounding boxes. See everything visible on screen.

Native App Control

Read UI trees, click buttons by name, set values, trigger menus. ~50ms per action.

Keyboard & Mouse

Click, type, drag, scroll, key combos. Full input simulation.

Chrome Browser (CDP)

Navigate, run JS, query DOM, fill forms. Chrome DevTools Protocol at ~10ms.

Learning Memory

Auto-learns successful strategies. O(1) recall. Gets better over time.

Cross-Platform

Swift bridge on macOS, C# bridge on Windows. Same tools on both.
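
As an illustration of the O(1) recall claim under Learning Memory, here is a minimal TypeScript sketch of a strategy store keyed by app and goal. The StrategyMemory class, its method names, and the key format are hypothetical and are not taken from ScreenHand's source; the point is only that a hash-map lookup gives constant-time recall on average.

```typescript
// Sketch only: a hypothetical strategy memory, not ScreenHand's actual implementation.
interface Strategy {
  steps: string[];   // ordered tool calls that worked last time
  successes: number; // how many times this strategy has been recorded as successful
}

class StrategyMemory {
  // Map lookups are O(1) on average, which is what keeps recall constant-time.
  private readonly byKey = new Map<string, Strategy>();

  // Normalise case so "Chrome" and "chrome" share one entry (an assumed design choice).
  private static key(app: string, goal: string): string {
    return `${app.toLowerCase()}::${goal.toLowerCase()}`;
  }

  // Record a successful run; repeated successes bump the counter and keep the latest steps.
  remember(app: string, goal: string, steps: string[]): void {
    const k = StrategyMemory.key(app, goal);
    const existing = this.byKey.get(k);
    this.byKey.set(k, { steps, successes: (existing?.successes ?? 0) + 1 });
  }

  // Return the stored strategy, or undefined if this app/goal pair has never succeeded.
  recall(app: string, goal: string): Strategy | undefined {
    return this.byKey.get(StrategyMemory.key(app, goal));
  }
}

const memory = new StrategyMemory();
memory.remember("Chrome", "export as PDF", ["open_menu:File", "click:Print", "click:Save as PDF"]);
const known = memory.recall("chrome", "Export as PDF"); // found despite different casing
```

A real memory would also need persistence across sessions and a way to retire strategies that stop working; both are left out here.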

Install and Connect in 2 Minutes

Step 1: Install

npm i screenhand
cd node_modules/screenhand
npm run build:native         # macOS
# npm run build:native:windows  # Windows

Step 2: Connect to Your AI Client

Add ScreenHand to your MCP config. Here's Claude Code as an example:

In .mcp.json or ~/.claude/settings.json:
{
  "mcpServers": {
    "screenhand": {
      "command": "npx",
      "args": ["tsx", "node_modules/screenhand/mcp-desktop.ts"]
    }
  }
}

Works the same way with Claude Desktop, Cursor, Windsurf, and OpenAI Codex CLI. Any MCP client, three lines of config.
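
If you already have other servers in your MCP config, the ScreenHand entry has to be merged in rather than pasted over the file. The following TypeScript sketch shows one way to do that on the JSON text. withScreenhand is a hypothetical helper, not part of the package, and it assumes your client reads the mcpServers key as in the Claude Code example.

```typescript
// Sketch only: withScreenhand is a hypothetical helper, not shipped with ScreenHand.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface McpConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown; // keep any unrelated settings untouched
}

// The entry from Step 2, copied as-is.
const SCREENHAND_ENTRY: McpServerEntry = {
  command: "npx",
  args: ["tsx", "node_modules/screenhand/mcp-desktop.ts"],
};

// Take the current config file contents (possibly empty) and return new JSON text
// with the screenhand server added, leaving other servers and keys in place.
function withScreenhand(existingJson: string): string {
  const config = (existingJson.trim() === "" ? {} : JSON.parse(existingJson)) as McpConfig;
  const merged: McpConfig = {
    ...config,
    mcpServers: { ...(config.mcpServers ?? {}), screenhand: SCREENHAND_ENTRY },
  };
  return JSON.stringify(merged, null, 2);
}

// Example: a config that already lists another (made-up) server.
const updated = withScreenhand('{"mcpServers":{"other":{"command":"node","args":["server.js"]}}}');
```

Reading and writing the file itself is left to the caller, so the function stays easy to test.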

Step 3: Automate

Open your AI client and just ask:

"Open Chrome and search for flights to Delhi"
"Fill out this signup form with my details"
"Read what's on my screen right now"
"Export this spreadsheet as PDF"
"Check all open tabs and list their titles"

ScreenHand translates natural language into native OS actions. No scripting required.

How ScreenHand Compares

Feature	ScreenHand	Screenshot-Based Tools	Browser-Only Tools
Speed	~50ms per action	2-5 seconds	~100ms
Scope	Any app + browser	Any app (slow)	Browser only
Targeting	By element name	By coordinates (guessing)	By CSS selector
Reliability	High (native APIs)	Low (layout shifts break it)	Medium
Platform	macOS + Windows	Varies	Any
Learning	Built-in memory	None	None
Install	npm i screenhand	Varies	npm i

The key advantage: ScreenHand uses native Accessibility APIs to read the actual UI element tree. It knows where every button, text field, and menu item is — by name, not by guessing from pixels. This makes it fast, reliable, and layout-independent.
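
To make the name-versus-coordinates distinction concrete, here is a minimal sketch of looking up an element by role and name in a UI tree. The UiNode shape and the findByName and center functions are assumptions for illustration; the trees ScreenHand actually returns may be structured differently.

```typescript
// Sketch only: UiNode is an assumed shape, not ScreenHand's real element type.
interface UiNode {
  role: string;
  name?: string;
  frame: { x: number; y: number; width: number; height: number };
  children?: UiNode[];
}

// Depth-first search for the first node with the given role and name.
function findByName(root: UiNode, role: string, name: string): UiNode | undefined {
  const stack: UiNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    if (node.role === role && node.name === name) return node;
    const kids = node.children ?? [];
    // Push in reverse so children are visited in their original order.
    for (let i = kids.length - 1; i >= 0; i--) stack.push(kids[i]);
  }
  return undefined;
}

// The click point is derived from the element's current frame, not stored ahead of time.
function center(node: UiNode): { x: number; y: number } {
  return { x: node.frame.x + node.frame.width / 2, y: node.frame.y + node.frame.height / 2 };
}

// Same "Save" button in two layouts: top-right at first, then moved into a bottom toolbar.
const layoutA: UiNode = {
  role: "window", frame: { x: 0, y: 0, width: 800, height: 600 },
  children: [{ role: "button", name: "Save", frame: { x: 700, y: 20, width: 80, height: 30 } }],
};
const layoutB: UiNode = {
  role: "window", frame: { x: 0, y: 0, width: 800, height: 600 },
  children: [{
    role: "toolbar", frame: { x: 0, y: 550, width: 800, height: 50 },
    children: [{ role: "button", name: "Save", frame: { x: 20, y: 560, width: 80, height: 30 } }],
  }],
};
```

The same findByName(root, "button", "Save") call succeeds against both layouts, whereas a hard-coded coordinate from layoutA would miss the button in layoutB.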

What's Next

Simpler install — Working toward npx screenhand as a zero-config launcher
Pre-built binaries — No more building the native bridge from source
More platform playbooks — Pre-built automation guides for common apps and websites
Claude Tasks integration — Scheduled desktop automation with Anthropic's new Cron feature

Frequently Asked Questions

How do I install ScreenHand from npm?

Run npm i screenhand, then build the native bridge with npm run build:native (macOS) or npm run build:native:windows (Windows). Add the MCP config to your AI client and you're ready.

What AI clients work with ScreenHand?

Any MCP-compatible client: Claude Desktop, Claude Code, Cursor, Windsurf, OpenAI Codex CLI. It uses standard MCP over stdio and needs three lines of JSON config.

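How fast is ScreenHand compared to other desktop automation tools?

Native UI actions take ~50ms via Accessibility APIs. Chrome CDP operations take ~10ms. Screenshots with OCR take ~600ms. At ~50ms per native action, that is roughly 40 to 100 times faster than screenshot-based approaches, which take 2-5 seconds per action.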

Is ScreenHand free to use?

Yes. ScreenHand is free and open-source under the AGPL-3.0 license. Full source at github.com/manushi4/screenhand. Built by Clazro Technology Private Limited.

Does ScreenHand work on both macOS and Windows?

Yes. macOS uses a Swift bridge with Accessibility APIs. Windows uses a C# (.NET 8) bridge with UI Automation. Both bridges speak the same JSON-RPC protocol, so all 70+ tools work identically on both platforms.
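
This post describes the protocol shared by the two native bridges as JSON-RPC. As a rough picture of what that means on the wire, here is a TypeScript sketch of building a request and unwrapping a response. It assumes JSON-RPC 2.0 envelopes; the method name ui.press and its parameters are made up for illustration and are not ScreenHand's actual bridge API.

```typescript
// Sketch only: assumes JSON-RPC 2.0; "ui.press" and its params are hypothetical.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number;
  result?: unknown;
  error?: { code: number; message: string };
}

let nextId = 1;

// Build a request envelope; the same shape would be sent to either native bridge.
function makeRequest(method: string, params?: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id: nextId++, method, params };
}

// Match a raw response to its request and return the result, throwing on a JSON-RPC error.
function unwrap(request: JsonRpcRequest, raw: string): unknown {
  const response = JSON.parse(raw) as JsonRpcResponse;
  if (response.id !== request.id) {
    throw new Error(`response id ${response.id} does not match request id ${request.id}`);
  }
  if (response.error) {
    throw new Error(`bridge error ${response.error.code}: ${response.error.message}`);
  }
  return response.result;
}

// Hypothetical call: press a button by its accessible name.
const request = makeRequest("ui.press", { app: "Notes", name: "New Note" });
const wire = JSON.stringify(request); // the text that would be written to the bridge
```

Because the envelope is platform-neutral, only the process on the other end of the pipe (Swift or C#) would need to differ.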

Is ScreenHand secure and private?

ScreenHand runs entirely locally. No screen data is sent to external servers. All tool calls are audit-logged. On macOS you grant Accessibility permission; on Windows, no administrator rights are needed. See the Security Policy for details.

How is ScreenHand different from Computer Use?

Computer Use is cloud-based screenshot interpretation. ScreenHand is local-first and uses native OS APIs: it is roughly 40 to 100 times faster per action, more reliable (it targets elements by name, not coordinates), and keeps all data on your machine. It also works with any MCP client, not just Claude.
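
The speed comparison can be checked from the per-action timings quoted in this post: ~50ms for a native UI action against 2-5 seconds for a screenshot-based action. The short TypeScript sketch below does that arithmetic; the function and constant names are only illustrative, and the inputs are the post's own approximate figures, not new measurements.

```typescript
// Ratio of how long the baseline takes to how long the candidate takes.
function speedup(baselineMs: number, candidateMs: number): number {
  return baselineMs / candidateMs;
}

// Figures quoted in this post (approximate).
const NATIVE_ACTION_MS = 50;       // native UI action via Accessibility APIs
const SCREENSHOT_LOW_MS = 2000;    // fast end of a screenshot-based action
const SCREENSHOT_HIGH_MS = 5000;   // slow end of a screenshot-based action

const lowEnd = speedup(SCREENSHOT_LOW_MS, NATIVE_ACTION_MS);   // 40
const highEnd = speedup(SCREENSHOT_HIGH_MS, NATIVE_ACTION_MS); // 100
```

So "about 100x" corresponds to the slow end of the screenshot-based range, and the fast end works out to about 40x.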

Get Started Now

70+ tools. Native speed. macOS + Windows. Free and open source.
