ScreenHand is Now on npm: AI Desktop Automation in One Command
Run `npm i screenhand` to install the open-source MCP server that gives Claude, Cursor, and any AI agent the ability to see your screen, click buttons, type text, and control any desktop app. 70+ tools. Native speed. macOS + Windows.
npm i screenhand
v0.1.1 · 73 files · 521 KB · AGPL-3.0
Why This Matters
AI assistants are incredibly powerful at reasoning, writing code, and answering questions. But they have a fundamental limitation: they can't interact with your computer. They can't see what's on your screen. They can't click a button. They can't fill out a form.
ScreenHand removes that wall. It's an MCP server that connects AI agents to your desktop through native OS APIs. Once installed, your AI can see, click, type, and control any application — at native speed.
And now it's on npm, which means getting started takes one command.
What's in the Package
ScreenHand v0.1.1 ships with 70+ tools organized by what you need to do:
- **Screen Vision:** Screenshots, OCR, bounding boxes. See everything visible on screen.
- **Native App Control:** Read UI trees, click buttons by name, set values, trigger menus. ~50ms per action.
- **Keyboard & Mouse:** Click, type, drag, scroll, key combos. Full input simulation.
- **Chrome Browser (CDP):** Navigate, run JS, query DOM, fill forms. Chrome DevTools Protocol at ~10ms.
- **Learning Memory:** Auto-learns successful strategies. O(1) recall. Gets better over time.
- **Cross-Platform:** Swift bridge on macOS, C# bridge on Windows. Same tools on both.
Install and Connect in 2 Minutes
Step 1: Install
```bash
npm i screenhand
cd node_modules/screenhand
npm run build:native             # macOS
# npm run build:native:windows   # Windows
```
Step 2: Connect to Your AI Client
Add ScreenHand to your MCP config. Here's Claude Code as an example:
```jsonc
// .mcp.json or ~/.claude/settings.json
{
  "mcpServers": {
    "screenhand": {
      "command": "npx",
      "args": ["tsx", "node_modules/screenhand/mcp-desktop.ts"]
    }
  }
}
```
Works the same way with Claude Desktop, Cursor, Windsurf, and OpenAI Codex CLI. Any MCP client, three lines of config.
Step 3: Automate
Open your AI client and just ask:
- "Open Chrome and search for flights to Delhi"
- "Fill out this signup form with my details"
- "Read what's on my screen right now"
- "Export this spreadsheet as PDF"
- "Check all open tabs and list their titles"
ScreenHand translates natural language into native OS actions. No scripting required.
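To make "translates natural language into native OS actions" concrete, here is a hypothetical decomposition of one of the requests above into a sequence of tool calls. The tool names (`screen.read_ui_tree`, `app.set_value`, `app.click`) are illustrative placeholders, not ScreenHand's actual tool identifiers:

```typescript
// Hypothetical plan an agent might produce for
// "Fill out this signup form with my details".
type ToolCall = { tool: string; args: Record<string, string> };

const plan: ToolCall[] = [
  { tool: "screen.read_ui_tree", args: {} },                          // inspect what's on screen
  { tool: "app.set_value", args: { element: "Name", value: "Ada" } }, // fill fields by label
  { tool: "app.set_value", args: { element: "Email", value: "ada@example.com" } },
  { tool: "app.click", args: { element: "Sign Up" } },                // click by element name
];

for (const step of plan) {
  console.log(`${step.tool}(${JSON.stringify(step.args)})`);
}
```

The agent plans the steps; the MCP server executes each one against the OS.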
How ScreenHand Compares
| | ScreenHand | Screenshot-Based Tools | Browser-Only Tools |
|---|---|---|---|
| Speed | ~50ms per action | 2-5 seconds | ~100ms |
| Scope | Any app + browser | Any app (slow) | Browser only |
| Targeting | By element name | By coordinates (guessing) | By CSS selector |
| Reliability | High (native APIs) | Low (layout shifts break it) | Medium |
| Platform | macOS + Windows | Varies | Any |
| Learning | Built-in memory | None | None |
| Install | `npm i screenhand` | Varies | `npm i` |
The key advantage: ScreenHand uses native Accessibility APIs to read the actual UI element tree. It knows where every button, text field, and menu item is — by name, not by guessing from pixels. This makes it fast, reliable, and layout-independent.
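Targeting by element name amounts to searching the accessibility tree for a node with a matching role and label, rather than guessing at pixel coordinates. A minimal sketch, assuming a simplified node shape (real AX trees expose many more attributes):

```typescript
// Minimal sketch of element-name targeting over an accessibility tree.
// The node shape here is illustrative, not ScreenHand's internal type.
interface UINode {
  role: string;          // e.g. "window", "button", "textField"
  name: string;          // the accessible label exposed by the OS
  children?: UINode[];
}

// Depth-first search: find an element by role + name, not by pixels.
function findElement(root: UINode, role: string, name: string): UINode | null {
  if (root.role === role && root.name === name) return root;
  for (const child of root.children ?? []) {
    const hit = findElement(child, role, name);
    if (hit) return hit;
  }
  return null;
}

const appWindow: UINode = {
  role: "window",
  name: "Signup",
  children: [
    { role: "textField", name: "Email" },
    { role: "button", name: "Submit" },
  ],
};

console.log(findElement(appWindow, "button", "Submit")?.name); // "Submit"
```

This is why the approach is layout-independent: a redesign moves the button's pixels, but its accessible name stays the same, so the lookup still succeeds.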
What's Next
- **Simpler install:** Working toward `npx screenhand` as a zero-config launcher
- **Pre-built binaries:** No more building the native bridge from source
- **More platform playbooks:** Pre-built automation guides for common apps and websites
- **Claude Tasks integration:** Scheduled desktop automation with Anthropic's new Cron feature
Frequently Asked Questions
**How do I install ScreenHand?**

`npm i screenhand`, then build the native bridge with `npm run build:native` (macOS) or `npm run build:native:windows` (Windows). Add the MCP config to your AI client and you're ready.

Get Started Now
70+ tools. Native speed. macOS + Windows. Free and open source.