Set Up Agent S2: An Open-Source Computer Use Agent

June 27, 2025

🎯 TLDR

Agent S2 is an open-source computer use agent that controls your desktop through text commands. Set it up locally in 5 minutes or run it on a cloud desktop via Orgo.

# Get started
git clone https://github.com/spencerkinney/agent-s2-example.git
cd agent-s2-example
pip install -r requirements.txt
cp .env.example .env  # then add your API keys (see Step 2)
python agent_s2.py

Note: This AI agent is experimental. It can be slow to complete actions, but it's great for learning and building on!

What is Agent S2?

Agent S2 is a state-of-the-art computer use agent from Simular Research. It takes screenshots, understands what's on screen, and performs mouse/keyboard actions based on your natural language commands.

📊 Performance

At release, Agent S2 reported state-of-the-art success rates on the OSWorld benchmark:

[Figure: Agent S2 performance on the OSWorld benchmark (screenshot from the Agent S2 GitHub repo)]

Key achievements:

34.5% success rate (50-step evaluation)
Outperforms OpenAI's Operator (32.6%)
Beats Claude Computer Use (26.0%)

🧠 How It Works

Agent S2 is a compositional framework using two primary AI models:

[Figure: Agent S2 architecture, showing the flow between the LLM, the grounding model, and Python execution]

The loop:

📸 Screenshot captured
🧠 LLM decides action (e.g., click Firefox)
👁️ Grounding model finds coordinates
🖱️ Python clicks at (x, y)
🔄 Repeat until done

You can swap models: the defaults use GPT-4o for planning and Claude for grounding, but other compatible models can be substituted.
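The loop above can be sketched in a few lines of Python. This is a minimal illustration, not Agent S2's implementation: the `Action` type, `run_agent`, and every callable passed into it (screenshot capture, planner, grounder, mouse and keyboard functions) are hypothetical stand-ins. Passing the planner and grounder in as plain functions mirrors the point about swapping models: either can be replaced without touching the loop.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


# Hypothetical action produced by the planning LLM. It names *what* to act on
# in words; it never contains pixel coordinates.
@dataclass
class Action:
    kind: str         # "click", "type", or "done"
    target: str = ""  # e.g. "the Firefox icon"
    text: str = ""    # text to type when kind == "type"


def run_agent(
    instruction: str,
    take_screenshot: Callable[[], bytes],
    plan: Callable[[str, bytes, List[Action]], Action],  # planning LLM
    ground: Callable[[bytes, str], Tuple[int, int]],     # grounding model
    click: Callable[[int, int], None],
    type_text: Callable[[str], None],
    max_steps: int = 10,
) -> List[Action]:
    """Screenshot -> decide -> locate -> act, repeated until done or out of steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()                    # 1. observe
        action = plan(instruction, screenshot, history)   # 2. decide
        if action.kind == "done":
            break
        if action.kind == "click":
            x, y = ground(screenshot, action.target)      # 3. locate
            click(x, y)                                   # 4. act
        elif action.kind == "type":
            type_text(action.text)
        history.append(action)
    return history


# Usage with fake components: the "planner" clicks once, then finishes.
clicks: List[Tuple[int, int]] = []
scripted = iter([Action("click", target="the Firefox icon"), Action("done")])
steps = run_agent(
    "open firefox",
    take_screenshot=lambda: b"fake-png",
    plan=lambda instr, shot, hist: next(scripted),
    ground=lambda shot, target: (120, 640),
    click=lambda x, y: clicks.append((x, y)),
    type_text=lambda text: None,
)
print(clicks)  # [(120, 640)]
```

Note the division of labour: the planner only ever produces a description such as "the Firefox icon", and only the grounder turns it into coordinates.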

🚀 Quickstart

Prerequisites

Python 3.7-3.11
API Keys (both required):
- OpenAI API key 💳
- Anthropic API key 💳
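To check the Python requirement above before creating a virtual environment, a small script like this works. The bounds simply restate this guide's 3.7-3.11 range; they are not read from the repository, so adjust them if its requirements change.

```python
import sys
from typing import Tuple

# Range stated in this guide's prerequisites (inclusive, major.minor).
SUPPORTED_MIN: Tuple[int, int] = (3, 7)
SUPPORTED_MAX: Tuple[int, int] = (3, 11)


def is_supported(version: Tuple[int, int]) -> bool:
    """True if a (major, minor) version lies within the supported range."""
    return SUPPORTED_MIN <= version <= SUPPORTED_MAX


current = (sys.version_info.major, sys.version_info.minor)
if is_supported(current):
    print(f"Python {current[0]}.{current[1]} is supported")
else:
    print(f"Python {current[0]}.{current[1]} is outside 3.7-3.11; "
          "create the venv with a supported interpreter")
```

Save it as, say, `check_python.py` and run it with the same `python3` you will use for `python3 -m venv venv`.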

Step 1: Install

# Clone and enter directory
git clone https://github.com/spencerkinney/agent-s2-example.git
cd agent-s2-example
 
# Create virtual environment
python3 -m venv venv
 
# Activate it
source venv/bin/activate  # Mac/Linux
# OR
venv\Scripts\activate     # Windows
 
# Install dependencies
pip install -r requirements.txt

Step 2: Configure API Keys

# Copy template
cp .env.example .env
 
# Edit .env file
nano .env  # or use any text editor

Add your keys:

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

Step 3: macOS Permissions

macOS only: Grant accessibility permissions

Open System Settings → Privacy & Security → Accessibility
Click + button
Add Terminal (or your IDE)
Enable the checkbox ✓

Step 4: Run Agent S2

python agent_s2.py

Try these commands:

"Open Chrome"
"Take a screenshot" 
"Create a folder called projects on the desktop"
"Search for weather in San Francisco"

Type exit to quit.
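The interactive session described above amounts to a read-and-run loop. The sketch below is a hypothetical outline of that loop, not the code in `agent_s2.py`: `run_task` stands in for whatever hands one instruction to the agent, and `read` defaults to the built-in `input` so scripted input can be substituted for testing.

```python
from typing import Callable, List


def repl(run_task: Callable[[str], None], read: Callable[[str], str] = input) -> None:
    """Prompt for commands and run each one until the user types 'exit'."""
    while True:
        try:
            command = read("Command> ").strip()
        except EOFError:          # Ctrl-D or end of piped input
            break
        if command.lower() == "exit":
            break
        if command:               # ignore empty lines
            run_task(command)


# Scripted demo: two commands, a blank line, then exit.
ran: List[str] = []
scripted = iter(["Open Chrome", "", "Take a screenshot", "exit"])
repl(ran.append, read=lambda prompt: next(scripted))
print(ran)  # ['Open Chrome', 'Take a screenshot']
```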

Heads up: Agent S2 is experimental and can be slow. Simple commands work best. Try it out and build your own improvements!

☁️ Cloud Desktop Mode

Run Agent S2 on a remote Linux desktop instead of your local machine:

1. Get an Orgo account and API key

2. Update Configuration

# Add to .env
ORGO_API_KEY=your_orgo_key
USE_CLOUD_ENVIRONMENT=true

3. Run

python agent_s2.py  # Now controls cloud desktop

View the desktop in real time through Orgo's dashboard.
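Whether the script drives your local machine or an Orgo desktop comes down to the two variables above. The sketch below shows one reasonable way to interpret them; the accepted truthy spellings and the `choose_environment` helper are assumptions, not the example repository's actual logic, and no Orgo SDK calls are shown.

```python
from typing import Mapping

TRUTHY = {"1", "true", "yes", "on"}  # assumption: lenient boolean parsing


def choose_environment(env: Mapping[str, str]) -> str:
    """Return 'cloud' or 'local', failing early if cloud mode lacks a key."""
    use_cloud = env.get("USE_CLOUD_ENVIRONMENT", "false").strip().lower() in TRUTHY
    if not use_cloud:
        return "local"
    if not env.get("ORGO_API_KEY"):
        raise ValueError("USE_CLOUD_ENVIRONMENT=true but ORGO_API_KEY is not set")
    return "cloud"


print(choose_environment({"USE_CLOUD_ENVIRONMENT": "true", "ORGO_API_KEY": "your_orgo_key"}))
# cloud
```

Failing early when the key is missing is a deliberate choice here: silently falling back to local mode would hand control of your own desktop to the agent when you expected a sandbox.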

⚙️ Configuration

Model Selection

# Option 1: Use Claude for everything
AGENT_MODEL=claude-3-7-sonnet-20250219
AGENT_MODEL_TYPE=anthropic
 
# Option 2: Mix and match (default)
AGENT_MODEL=gpt-4o                      # Planning
GROUNDING_MODEL=claude-3-7-sonnet-20250219  # Vision

Performance Tuning

MAX_STEPS=20      # More steps for complex tasks (default: 10)
STEP_DELAY=0.1    # Faster execution (default: 0.5s)
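The settings above, with the defaults this guide states (GPT-4o for planning, Claude 3.7 Sonnet for grounding, 10 steps, 0.5 s delay), can be gathered into one object. This is a hypothetical loader for illustration: the `AgentConfig` name, the `openai` default for `AGENT_MODEL_TYPE`, and the validation rules are assumptions, not the example repository's code.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AgentConfig:
    agent_model: str
    agent_model_type: str
    grounding_model: str
    max_steps: int
    step_delay: float

    @property
    def max_pause_seconds(self) -> float:
        """Upper bound on time spent sleeping between steps (excludes model latency)."""
        return self.max_steps * self.step_delay


def load_config(env: Mapping[str, str]) -> AgentConfig:
    cfg = AgentConfig(
        agent_model=env.get("AGENT_MODEL", "gpt-4o"),
        agent_model_type=env.get("AGENT_MODEL_TYPE", "openai"),  # assumed default
        grounding_model=env.get("GROUNDING_MODEL", "claude-3-7-sonnet-20250219"),
        max_steps=int(env.get("MAX_STEPS", "10")),
        step_delay=float(env.get("STEP_DELAY", "0.5")),
    )
    if cfg.max_steps < 1 or cfg.step_delay < 0:
        raise ValueError("MAX_STEPS must be >= 1 and STEP_DELAY must be >= 0")
    return cfg


default = load_config({})
tuned = load_config({"MAX_STEPS": "20", "STEP_DELAY": "0.1"})
print(default.max_pause_seconds, tuned.max_pause_seconds)
```

The arithmetic shows why the tuning example is faster despite allowing more steps: the worst-case pause budget drops from 10 × 0.5 = 5 s to 20 × 0.1 = 2 s. In practice, model latency per step usually dominates either figure.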

🎯 Technical Details

Mixture of Grounding

Agent S2 uses three specialist models for precise interactions:

Specialist	Purpose	Example
Visual	Find UI elements	"Click the red button"
Text	Select document text	"Highlight paragraph 2"
Structural	Handle tables/sheets	"Update cell B4"
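One way to picture mixture of grounding is as a dispatcher: the type of action the planner requests decides which specialist resolves it. The sketch below illustrates that routing idea only. The action names, the `route` function, and the stub specialists (which just describe what they would do) are invented for this example and do not reproduce Agent S2's actual interfaces.

```python
from typing import Callable, Dict


# Stub specialists: real ones would call a vision model, text matching, or a spreadsheet API.
def visual_grounder(request: str) -> str:
    return f"visual: locate '{request}' in the screenshot"


def text_grounder(request: str) -> str:
    return f"text: find the span '{request}' in the document"


def structural_grounder(request: str) -> str:
    return f"structural: address '{request}' in the table"


# Hypothetical mapping from planner action type to specialist.
ROUTES: Dict[str, Callable[[str], str]] = {
    "click": visual_grounder,
    "highlight_text": text_grounder,
    "set_cell": structural_grounder,
}


def route(action_type: str, request: str) -> str:
    """Send a request to the specialist for its action type (visual by default)."""
    return ROUTES.get(action_type, visual_grounder)(request)


print(route("set_cell", "B4"))  # structural: address 'B4' in the table
```

Falling back to the visual specialist is a design choice of this sketch: pointing at something on screen is the most general kind of grounding, so unknown action types still get a best-effort answer.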

Proactive Planning

Unlike reactive agents, S2 updates its plan after each step:

[Figure: Agent S2's proactive planning process, adapting to new observations after each action]

This allows recovery from errors and adaptation to unexpected UI changes.
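The contrast with committing to a fixed plan up front can be shown with a small replanning loop. In this hypothetical sketch, `make_plan` stands in for the planning model and `execute` for the component that carries out one subtask; after every subtask the remaining plan is regenerated from the newest observation instead of being reused. The demo scenario (an unexpected popup) is invented for illustration.

```python
from typing import Callable, List


def run_proactive(
    goal: str,
    make_plan: Callable[[str, str, List[str]], List[str]],  # (goal, observation, done) -> remaining subtasks
    execute: Callable[[str], str],                           # runs one subtask, returns the new observation
    observation: str,
    max_subtasks: int = 10,
) -> List[str]:
    """Execute one subtask at a time, replanning from the newest observation each time."""
    done: List[str] = []
    for _ in range(max_subtasks):
        plan = make_plan(goal, observation, done)  # replan on every iteration
        if not plan:
            break                                  # planner judges the goal complete
        subtask = plan[0]                          # only the first subtask is committed
        observation = execute(subtask)
        done.append(subtask)
    return done


# Demo: opening the file manager unexpectedly shows a popup; the replanner handles it.
BASE = ["open file manager", "create folder 'projects'"]


def make_plan(goal: str, obs: str, done: List[str]) -> List[str]:
    remaining = [s for s in BASE if s not in done]
    return (["dismiss popup"] + remaining) if "popup" in obs else remaining


def execute(subtask: str) -> str:
    return "file manager with popup" if subtask == "open file manager" else "file manager"


result = run_proactive("Create a folder called projects", make_plan, execute, observation="desktop")
print(result)
```

The "dismiss popup" step appears in `result` even though it was never in the original plan: it was added only because the planner looked again after the first subtask.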

📈 Benchmarks

Benchmark	Agent S2	Previous Best	Relative Improvement
OSWorld (50 steps)	34.5%	32.6% (OpenAI)	+5.8%
WindowsAgentArena	29.8%	19.5% (NAVI)	+52.8%
AndroidWorld	54.3%	46.8% (UI-TARS)	+16.0%
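The improvement figures in this table are relative gains over the previous best, not differences in percentage points. Recomputing both from the two score columns:

```python
# Scores from the table above: (Agent S2 %, previous best %).
SCORES = {
    "OSWorld (50 steps)": (34.5, 32.6),
    "WindowsAgentArena": (29.8, 19.5),
    "AndroidWorld": (54.3, 46.8),
}


def relative_gain(new: float, old: float) -> float:
    """Percentage improvement of new over old: (new - old) / old * 100."""
    return (new - old) / old * 100


for name, (s2, best) in SCORES.items():
    points = s2 - best
    print(f"{name}: +{points:.1f} points, +{relative_gain(s2, best):.1f}% relative")
```

So on OSWorld the gain is 1.9 percentage points (+5.8% relative), on WindowsAgentArena 10.3 points (+52.8%), and on AndroidWorld 7.5 points (+16.0%).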

🐛 Troubleshooting

macOS: "Permission denied"

Terminal needs Accessibility permissions. See Step 3 above.

"Module not found" error

# Ensure virtual environment is activated
source venv/bin/activate  # Mac/Linux
venv\Scripts\activate     # Windows
 
# Reinstall dependencies
pip install -r requirements.txt
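To confirm that `python` really is the virtual environment's interpreter (the usual cause of this error), compare `sys.prefix` with `sys.base_prefix`; they differ only inside a venv. This is a standard-library check, not part of the example repository.

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv (sys.prefix then points at the venv directory)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

If it prints `False`, activate the environment and reinstall. If it prints `True` but imports still fail, `python -m pip install -r requirements.txt` guarantees pip installs into that same interpreter.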

Agent clicks wrong location

Set display scaling to 100%
Use a standard resolution (1024x768 or 1920x1080)
Check that GROUNDING_MODEL_RESIZE_WIDTH matches your resolution
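The resize width matters because a grounding model often sees a downscaled screenshot, so the coordinates it returns must be scaled back to real screen pixels; if that factor is computed from the wrong width, every click lands off target. A minimal sketch of the mapping, assuming the screenshot is resized with its aspect ratio preserved. `to_screen_coords` is a hypothetical helper, not a function from Agent S2.

```python
from typing import Tuple


def to_screen_coords(
    point: Tuple[int, int],
    screen_size: Tuple[int, int],
    resize_width: int,
) -> Tuple[int, int]:
    """Map (x, y) predicted on a resized screenshot back to screen pixels.

    Assumes the screenshot was scaled to `resize_width` with its aspect ratio kept,
    so a single factor applies to both axes.
    """
    screen_w, _ = screen_size
    scale = screen_w / resize_width
    x, y = point
    return round(x * scale), round(y * scale)


# A 1920x1080 screen sent to the model at width 1366: a point predicted at
# (683, 384) on the resized image is the centre of the real screen.
print(to_screen_coords((683, 384), (1920, 1080), 1366))  # (960, 540)
```

If the code instead assumed the model saw a 1024-pixel-wide image, the same prediction would map to x = 683 × 1920 / 1024 ≈ 1281 rather than 960, which is exactly the "clicks the wrong location" symptom.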

Agent seems stuck or confused

Agent S2 is experimental: it can be slow and sometimes gets stuck. If that happens:

Use simple, specific commands
Break big tasks into smaller steps
Restart if it loops
Consider contributing improvements on GitHub
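"Restart if it loops" can be made concrete by watching for the same action repeating. The helper below is a hypothetical sketch, not part of Agent S2: it treats actions as plain strings and flags a loop when the last `window` actions are identical. The window size of 3 is an arbitrary choice.

```python
from collections import deque
from typing import Deque


class LoopDetector:
    """Flags when the agent issues the same action `window` times in a row."""

    def __init__(self, window: int = 3) -> None:
        self.recent: Deque[str] = deque(maxlen=window)

    def record(self, action: str) -> bool:
        """Store an action; return True if the last `window` actions are identical."""
        self.recent.append(action)
        return len(self.recent) == self.recent.maxlen and len(set(self.recent)) == 1


detector = LoopDetector(window=3)
history = ["click Chrome icon", "click Chrome icon", "click Chrome icon"]
looping = [detector.record(a) for a in history]
print(looping)  # [False, False, True]
```

A real agent can also loop through a short cycle of different actions (for example, opening and closing the same menu), which this simple check would miss.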

⚠️ Safety Notes

Agent S2 has full control of your computer when running:

✅ Always supervise during execution
✅ Start simple - test basic commands first
✅ Use cloud mode for risky operations
❌ Never leave unattended

📚 Resources

📄 Agent S2 Paper - Technical details
💻 GitHub Repository - Source code
🔧 Example Code - Get up and running

Agent S2 is experimental and open source. It can be slow but it's fun to tinker with. Your improvements are welcome!

python agent_s2.py  # Start here