Set Up Agent S2: An Open-Source Computer Use Agent
🎯 TLDR
Agent S2 is an open-source computer use agent that controls your desktop through text commands. Set it up locally in 5 minutes or run it on a cloud desktop via Orgo.
# Get started
git clone https://github.com/spencerkinney/agent-s2-example.git
cd agent-s2-example
pip install -r requirements.txt
python agent_s2.py
Note: This AI agent is experimental. It can be slow and may take time to complete actions. Great for learning and building on!
What is Agent S2?
Agent S2 is a state-of-the-art computer use agent from Simular Research. It takes screenshots, understands what's on screen, and performs mouse/keyboard actions based on your natural language commands.
📊 Performance
Agent S2 reports the highest success rate on the OSWorld benchmark among the agents compared here:
Screenshot from Agent S2 GitHub repo
Key achievements:
- 34.5% success rate (50-step evaluation)
- Outperforms OpenAI's Operator (32.6%)
- Beats Claude Computer Use (26.0%)
🧠 How It Works
Agent S2 is a compositional framework using two primary AI models:
Agent S2 architecture showing the flow between LLM, Grounding Model, and Python execution
The loop:
- 📸 Screenshot captured
- 🧠 LLM decides action (e.g. click firefox)
- 👁️ Grounding model finds coordinates
- 🖱️ Python clicks at (x, y)
- 🔄 Repeat until done
You can swap models: the defaults are GPT-4o for planning and Claude for grounding, but any compatible model works.
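Here is a minimal sketch of that loop, just to make the flow concrete. It is not Agent S2's actual code: the `Action` shape, `run_loop`, and the four callables are hypothetical names, and the models are passed in as plain functions so you can plug in stubs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Action:
    """A planned step, e.g. kind="click", target="firefox" (hypothetical shape)."""
    kind: str
    target: str


def run_loop(
    task: str,
    take_screenshot: Callable[[], bytes],
    plan_next: Callable[[str, bytes, List[Action]], Optional[Action]],
    ground: Callable[[bytes, str], Tuple[int, int]],
    click: Callable[[int, int], None],
    max_steps: int = 10,
) -> List[Action]:
    """Screenshot -> plan -> ground -> act, until the planner returns None."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()                 # 1. capture the screen
        action = plan_next(task, screenshot, history)  # 2. LLM picks an action
        if action is None:                             # planner says the task is done
            break
        x, y = ground(screenshot, action.target)       # 3. locate the target
        click(x, y)                                    # 4. perform the click
        history.append(action)                         # 5. go around again
    return history
```

Because the planner and the grounding model are separate arguments, swapping either one does not touch the loop itself.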
🚀 Quickstart
Prerequisites
- Python 3.7-3.11
- API keys (both required): OpenAI and Anthropic
Step 1: Install
# Clone and enter directory
git clone https://github.com/spencerkinney/agent-s2-example.git
cd agent-s2-example
# Create virtual environment
python3 -m venv venv
# Activate it
source venv/bin/activate # Mac/Linux
# OR
venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
Step 2: Configure API Keys
# Copy template
cp .env.example .env
# Edit .env file
nano .env # or use any text editor
Add your keys:
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
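If you want to confirm both keys are visible before launching, a small check like the one below can help. It is a sketch, not part of the example repo: `missing_keys` is a hypothetical helper, and it only reads the process environment, so values in .env show up only if something has loaded that file (or you exported them in your shell).

```python
import os
from typing import List, Mapping

REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required key names that are unset or blank in env."""
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]
```

Calling `missing_keys()` returns an empty list when both keys are set, and the names of the missing ones otherwise.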
Step 3: macOS Permissions
macOS only: Grant accessibility permissions
- Open System Settings → Privacy & Security → Accessibility
- Click + button
- Add Terminal (or your IDE)
- Enable the checkbox ✓
Step 4: Run Agent S2
python agent_s2.py
Try these commands:
"Open Chrome"
"Take a screenshot"
"Create a folder called projects on the desktop"
"Search for weather in San Francisco"
Type exit to quit.
Heads up: Agent S2 is experimental and can be slow. Simple commands work best. Try it out and build your own improvements!
☁️ Cloud Desktop Mode
Run Agent S2 on a remote Linux desktop instead of your local machine:
1. Get Orgo Account
Sign up at orgo.ai/start
2. Update Configuration
# Add to .env
ORGO_API_KEY=your_orgo_key
USE_CLOUD_ENVIRONMENT=true
3. Run
python agent_s2.py # Now controls cloud desktop
View the desktop in real-time through Orgo's dashboard.
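To illustrate how the two settings interact, here is a sketch of the decision a script could make at startup. `pick_environment` is a hypothetical helper, and accepting values such as "1" or "yes" besides "true" is an assumption; the example repo may parse the flag differently.

```python
import os
from typing import Mapping

TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings of "enabled"


def pick_environment(env: Mapping[str, str] = os.environ) -> str:
    """Return "cloud" or "local" based on USE_CLOUD_ENVIRONMENT."""
    flag = env.get("USE_CLOUD_ENVIRONMENT", "false").strip().lower()
    if flag not in TRUTHY:
        return "local"
    # Cloud mode was requested, so the Orgo key has to be present too.
    if not env.get("ORGO_API_KEY", "").strip():
        raise RuntimeError("USE_CLOUD_ENVIRONMENT is set but ORGO_API_KEY is missing")
    return "cloud"
```

Failing early on a missing key is a design choice here: it is cheaper than discovering the problem after the first screenshot request.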
⚙️ Configuration
Model Selection
# Option 1: Use Claude for everything
AGENT_MODEL=claude-3-7-sonnet-20250219
AGENT_MODEL_TYPE=anthropic
# Option 2: Mix and match (default)
AGENT_MODEL=gpt-4o # Planning
GROUNDING_MODEL=claude-3-7-sonnet-20250219 # Vision
Performance Tuning
MAX_STEPS=20 # More steps for complex tasks (default: 10)
STEP_DELAY=0.1 # Faster execution (default: 0.5s)
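As a rough sketch of how these two settings could be read, the snippet below falls back to the defaults quoted above (10 steps, 0.5s) when a variable is unset. `Tuning` and `load_tuning` are hypothetical names; the example repo may load its settings another way.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Tuning:
    max_steps: int     # upper bound on loop iterations per command
    step_delay: float  # pause between steps, in seconds


def load_tuning(env: Mapping[str, str] = os.environ) -> Tuning:
    """Read MAX_STEPS and STEP_DELAY, falling back to the documented defaults."""
    max_steps = int(env.get("MAX_STEPS", "10"))
    step_delay = float(env.get("STEP_DELAY", "0.5"))
    if max_steps < 1:
        raise ValueError("MAX_STEPS must be at least 1")
    if step_delay < 0:
        raise ValueError("STEP_DELAY must not be negative")
    return Tuning(max_steps, step_delay)
```

Assuming the delay is applied once per step, the pauses add at most max_steps × step_delay seconds per command: about 5 seconds with the defaults and about 2 seconds with the values shown above. Model response time will usually dominate either way.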
🎯 Technical Details
Mixture of Grounding
Agent S2 uses three specialist models for precise interactions:
Specialist | Purpose | Example |
---|---|---|
Visual | Find UI elements | "Click the red button" |
Text | Select document text | "Highlight paragraph 2" |
Structural | Handle tables/sheets | "Update cell B4" |
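One way to picture the table is as a routing step that sends each action to a specialist. The sketch below uses a plain lookup table. The action names and `pick_specialist` are hypothetical, and in the real agent the choice is made by the model rather than a fixed mapping.

```python
from enum import Enum


class Specialist(Enum):
    VISUAL = "visual"          # find UI elements on screen
    TEXT = "text"              # select spans of document text
    STRUCTURAL = "structural"  # address table or spreadsheet cells


# Hypothetical action names; the real agent's action vocabulary may differ.
ROUTES = {
    "click": Specialist.VISUAL,
    "highlight_text": Specialist.TEXT,
    "set_cell": Specialist.STRUCTURAL,
}


def pick_specialist(action_kind: str) -> Specialist:
    """Map an action kind to a grounding specialist, defaulting to visual."""
    return ROUTES.get(action_kind, Specialist.VISUAL)
```

With this mapping, "Update cell B4" would be planned as a `set_cell` action and routed to the structural specialist, while an unknown action falls back to visual grounding.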
Proactive Planning
Unlike reactive agents, S2 updates its plan after each step:
Agent S2's proactive planning adapts to new observations after each action
This allows recovery from errors and adaptation to unexpected UI changes.
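The idea can be sketched as a loop that throws away the rest of the plan after every step and asks for a fresh one based on what is now on screen. This is an illustration under assumptions, not the paper's algorithm: `run_proactive`, `observe`, `replan`, and `execute` are hypothetical names.

```python
from typing import Callable, List


def run_proactive(
    goal: str,
    observe: Callable[[], str],
    replan: Callable[[str, str, List[str]], List[str]],
    execute: Callable[[str], None],
    max_steps: int = 10,
) -> List[str]:
    """Execute one step at a time, rebuilding the remaining plan after each."""
    done: List[str] = []
    plan = replan(goal, observe(), done)      # initial plan
    while plan and len(done) < max_steps:
        step = plan[0]                        # only the first step is committed
        execute(step)
        done.append(step)
        plan = replan(goal, observe(), done)  # revise using the new observation
    return done
```

If an unexpected popup appears after the first step, the next call to `replan` sees it and can put a "dismiss popup" step at the front of the plan instead of pressing on with steps that no longer apply.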
📈 Benchmarks
Benchmark | Agent S2 | Previous Best | Relative Improvement |
---|---|---|---|
OSWorld (50 steps) | 34.5% | 32.6% (OpenAI) | +5.8% |
WindowsAgentArena | 29.8% | 19.5% (NAVI) | +52.8% |
AndroidWorld | 54.3% | 46.8% (UI-TARS) | +16.0% |
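The last column is a relative gain over the previous best, not a difference in percentage points. A quick check of the arithmetic, using only the numbers from the table (`relative_improvement` is just a helper name for this example):

```python
def relative_improvement(new: float, old: float) -> float:
    """Relative gain of new over old, in percent."""
    return (new - old) / old * 100


rows = {
    "OSWorld (50 steps)": (34.5, 32.6),
    "WindowsAgentArena": (29.8, 19.5),
    "AndroidWorld": (54.3, 46.8),
}
for name, (s2, prev) in rows.items():
    points = s2 - prev
    relative = relative_improvement(s2, prev)
    print(f"{name}: +{points:.1f} points, +{relative:.1f}% relative")

# OSWorld (50 steps): +1.9 points, +5.8% relative
# WindowsAgentArena: +10.3 points, +52.8% relative
# AndroidWorld: +7.5 points, +16.0% relative
```

So the OSWorld lead is 1.9 percentage points, which works out to the 5.8% shown in the table.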
🐛 Troubleshooting
macOS: "Permission denied"
Terminal needs Accessibility permissions. See Step 3 above.
"Module not found" error
# Ensure virtual environment is activated
source venv/bin/activate # Mac/Linux
venv\Scripts\activate # Windows
# Reinstall dependencies
pip install -r requirements.txt
Agent clicks wrong location
- Set display scaling to 100%
- Use standard resolution (1024x768 or 1920x1080)
- Check if GROUNDING_MODEL_RESIZE_WIDTH matches your resolution
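One plausible cause of misplaced clicks: if the grounding model looks at a resized screenshot, its coordinates have to be scaled back to real screen pixels. The sketch below shows that scaling, assuming this is what GROUNDING_MODEL_RESIZE_WIDTH controls (the text above does not spell it out). `scale_to_screen` is a hypothetical helper.

```python
from typing import Tuple


def scale_to_screen(
    x: int,
    y: int,
    resized: Tuple[int, int],
    screen: Tuple[int, int],
) -> Tuple[int, int]:
    """Map a point found on a resized screenshot back to real screen pixels."""
    resized_w, resized_h = resized
    screen_w, screen_h = screen
    return round(x * screen_w / resized_w), round(y * screen_h / resized_h)


# The center of a 1024x768 screenshot maps to the center of a 1920x1080 screen.
print(scale_to_screen(512, 384, resized=(1024, 768), screen=(1920, 1080)))
# (960, 540)
```

If the scaling step uses the wrong size, the click in this example would land at (512, 384) instead of (960, 540), which matches the "wrong location" symptom.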
Agent seems stuck or confused
Agent S2 is experimental: it can be slow and sometimes gets stuck. If that happens:
- Use simple, specific commands
- Break big tasks into smaller steps
- Restart if it loops
- Consider reporting the problem or contributing a fix on the Agent S2 GitHub repository
⚠️ Safety Notes
Agent S2 has full control of your computer when running:
- ✅ Always supervise during execution
- ✅ Start simple - test basic commands first
- ✅ Use cloud mode for risky operations
- ❌ Never leave unattended
📚 Resources
- 📄 Agent S2 Paper - Technical details
- 💻 GitHub Repository - Source code
- 🔧 Example Code - Get up and running
Agent S2 is experimental and open source. It can be slow but it's fun to tinker with. Your improvements are welcome!
python agent_s2.py # Start here