Set Up Agent S2: An Open-Source Computer Use Agent

🎯 TLDR

Agent S2 is an open-source computer use agent that controls your desktop through text commands. Set it up locally in 5 minutes or run it on a cloud desktop via Orgo.

# Get started
git clone https://github.com/spencerkinney/agent-s2-example.git
cd agent-s2-example
pip install -r requirements.txt
python agent_s2.py

Note: Agent S2 is experimental and can take a while to complete actions. It is a good base for learning and building on!


What is Agent S2?

Agent S2 is a state-of-the-art computer use agent from Simular Research. It takes screenshots, understands what's on screen, and performs mouse/keyboard actions based on your natural language commands.

📊 Performance

Agent S2 posts the highest success rate among the agents compared on the OSWorld benchmark:

[Figure: Agent S2 performance on the OSWorld benchmark. Screenshot from the Agent S2 GitHub repo.]

Key achievements:

  • 34.5% success rate (50-step evaluation)
  • Outperforms OpenAI's Operator (32.6%)
  • Beats Claude Computer Use (26.0%)

🧠 How It Works

Agent S2 is a compositional framework built on two primary AI models: an LLM that decides the next action and a grounding model that locates it on screen.

[Figure: Agent S2 flow diagram. Agent S2 architecture showing the flow between LLM, Grounding Model, and Python execution.]

The loop:

  1. 📸 Screenshot captured
  2. 🧠 LLM decides action (e.g. click firefox)
  3. 👁️ Grounding model finds coordinates
  4. 🖱️ Python clicks at (x, y)
  5. 🔄 Repeat until done

You can swap models: the defaults are GPT-4o for planning and Claude for grounding, but any compatible model works.
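The loop above can be sketched in a few lines of Python. This is a minimal illustration rather than Agent S2's actual code: `Action`, `run_agent`, and the four callables (`capture_screenshot`, `decide_action`, `locate`, `click`) are hypothetical stand-ins, passed in as parameters so the planner and the grounding model can be swapped independently.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Action:
    """A high-level action chosen by the planning LLM (hypothetical shape)."""
    description: str      # e.g. "click firefox"
    done: bool = False    # True once the planner considers the task finished


def run_agent(
    task: str,
    capture_screenshot: Callable[[], bytes],
    decide_action: Callable[[str, bytes], Action],
    locate: Callable[[str, bytes], Tuple[int, int]],
    click: Callable[[int, int], None],
    max_steps: int = 10,
) -> int:
    """Run the screenshot -> plan -> ground -> act loop; return actions performed."""
    for step in range(1, max_steps + 1):
        screenshot = capture_screenshot()              # 1. capture the screen
        action = decide_action(task, screenshot)       # 2. LLM picks an action
        if action.done:
            return step - 1                            # finished before acting this step
        x, y = locate(action.description, screenshot)  # 3. grounding model finds coordinates
        click(x, y)                                    # 4. perform the click
    return max_steps                                   # 5. step budget exhausted
```

Keeping the planner and the grounding model behind separate callables mirrors the two-model split: either one can be replaced without touching the loop.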


🚀 Quickstart

Prerequisites

Step 1: Install

# Clone and enter directory
git clone https://github.com/spencerkinney/agent-s2-example.git
cd agent-s2-example
 
# Create virtual environment
python3 -m venv venv
 
# Activate it
source venv/bin/activate  # Mac/Linux
# OR
venv\Scripts\activate     # Windows
 
# Install dependencies
pip install -r requirements.txt
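If you are unsure whether the virtual environment is active before running pip, a quick standard-library check is to compare `sys.prefix` with `sys.base_prefix`. The helper name `in_virtualenv` is only illustrative; the check applies to environments created with `python3 -m venv` as above.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```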

Step 2: Configure API Keys

# Copy template
cp .env.example .env
 
# Edit .env file
nano .env  # or use any text editor

Add your keys:

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
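A missing or blank key is an easy mistake to make at this step. The sketch below checks for the two key names shown above; it assumes the values from `.env` have already been loaded into the process environment (how the example script does that is not covered here), and `missing_keys` is a hypothetical helper, not part of the repo.

```python
import os
from typing import List, Mapping

REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required key names that are unset or blank in env."""
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


if __name__ == "__main__":
    absent = missing_keys()
    print("missing keys:", ", ".join(absent) if absent else "none")
```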

Step 3: macOS Permissions

macOS only: Grant accessibility permissions

  1. Open System Settings → Privacy & Security → Accessibility
  2. Click + button
  3. Add Terminal (or your IDE)
  4. Enable the checkbox ✓

Step 4: Run Agent S2

python agent_s2.py

Try these commands:

"Open Chrome"
"Take a screenshot" 
"Create a folder called projects on the desktop"
"Search for weather in San Francisco"

Type exit to quit.
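If you want to build your own front end on top of the agent, the prompt behaviour described above (run each command, stop at `exit`) might look roughly like this. `run_session` and `handle` are hypothetical names; the real script's prompt loop may differ.

```python
from typing import Callable, Iterable


def run_session(commands: Iterable[str], handle: Callable[[str], None]) -> int:
    """Pass each command to handle until 'exit' is seen; return how many ran."""
    handled = 0
    for raw in commands:
        command = raw.strip()
        if not command:
            continue                      # ignore blank lines
        if command.lower() == "exit":
            break                         # same quit keyword as the prompt
        handle(command)
        handled += 1
    return handled
```

Because `commands` is any iterable of strings, the same function works with a list in tests or with `sys.stdin` for interactive use.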

Heads up: Agent S2 is experimental and can be slow. Simple commands work best. Try it out and build your own improvements!


☁️ Cloud Desktop Mode

Run Agent S2 on a remote Linux desktop instead of your local machine:

1. Get Orgo Account

Sign up at orgo.ai/start

2. Update Configuration

# Add to .env
ORGO_API_KEY=your_orgo_key
USE_CLOUD_ENVIRONMENT=true

3. Run

python agent_s2.py  # Now controls cloud desktop

View the desktop in real-time through Orgo's dashboard.
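The switch between local and cloud control comes down to the two `.env` values above. A minimal sketch of that decision follows, assuming the flag is read as a case-insensitive boolean string; `select_environment` and the accepted truthy spellings are assumptions, not the script's documented behaviour.

```python
from typing import Mapping

TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings for "enabled"


def select_environment(env: Mapping[str, str]) -> str:
    """Return 'cloud' or 'local' based on the .env values shown above."""
    use_cloud = env.get("USE_CLOUD_ENVIRONMENT", "false").strip().lower() in TRUTHY
    if not use_cloud:
        return "local"
    if not env.get("ORGO_API_KEY", "").strip():
        # Fail early instead of silently falling back to the local desktop.
        raise ValueError("USE_CLOUD_ENVIRONMENT is set but ORGO_API_KEY is missing")
    return "cloud"
```

Raising on a missing key, rather than falling back to local mode, avoids the agent unexpectedly taking control of your own machine.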


⚙️ Configuration

Model Selection

# Option 1: Use Claude for everything
AGENT_MODEL=claude-3-7-sonnet-20250219
AGENT_MODEL_TYPE=anthropic
 
# Option 2: Mix and match (default)
AGENT_MODEL=gpt-4o                      # Planning
GROUNDING_MODEL=claude-3-7-sonnet-20250219  # Vision

Performance Tuning

MAX_STEPS=20      # More steps for complex tasks (default: 10)
STEP_DELAY=0.1    # Faster execution (default: 0.5s)
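To make the defaults explicit, here is one way these two settings could be parsed into typed values. `Tuning` and `load_tuning` are hypothetical names for illustration; only the variable names and default values come from the lines above.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Tuning:
    max_steps: int = 10       # default stated above
    step_delay: float = 0.5   # seconds, default stated above


def load_tuning(env: Mapping[str, str]) -> Tuning:
    """Parse MAX_STEPS and STEP_DELAY, falling back to the documented defaults."""
    max_steps = int(env.get("MAX_STEPS", Tuning.max_steps))
    step_delay = float(env.get("STEP_DELAY", Tuning.step_delay))
    if max_steps < 1:
        raise ValueError("MAX_STEPS must be at least 1")
    if step_delay < 0:
        raise ValueError("STEP_DELAY cannot be negative")
    return Tuning(max_steps=max_steps, step_delay=step_delay)
```

Validating the values up front turns a typo in `.env` into an immediate error instead of a confusing run.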

🎯 Technical Details

Mixture of Grounding

Agent S2 uses three specialist models for precise interactions:

| Specialist | Purpose | Example |
| --- | --- | --- |
| Visual | Find UI elements | "Click the red button" |
| Text | Select document text | "Highlight paragraph 2" |
| Structural | Handle tables/sheets | "Update cell B4" |
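To make the division of labour concrete, here is a toy dispatcher over the three specialists. How Agent S2 actually chooses a specialist is not described here; the keyword rules, `Specialist`, and `pick_specialist` below are purely illustrative assumptions.

```python
from enum import Enum


class Specialist(Enum):
    VISUAL = "visual"
    TEXT = "text"
    STRUCTURAL = "structural"


# Illustrative keyword hints only; not Agent S2's real routing logic.
_HINTS = {
    Specialist.STRUCTURAL: ("cell", "row", "column", "table", "sheet"),
    Specialist.TEXT: ("highlight", "paragraph", "sentence", "word"),
}


def pick_specialist(instruction: str) -> Specialist:
    """Pick a grounding specialist, defaulting to visual grounding."""
    words = instruction.lower().split()
    for specialist, hints in _HINTS.items():
        if any(hint in words for hint in hints):
            return specialist
    return Specialist.VISUAL
```

With the examples from the table, "Click the red button" falls through to the visual specialist, "Highlight paragraph 2" goes to text, and "Update cell B4" goes to structural.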

Proactive Planning

Unlike reactive agents, S2 updates its plan after each step:

[Figure: Proactive planning process. Agent S2's proactive planning adapts to new observations after each action.]

This allows recovery from errors and adaptation to unexpected UI changes.
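The difference from a fixed plan is easiest to see in code. In this sketch the remaining plan is revised after every executed subgoal; `run_proactive`, `execute`, and `replan` are hypothetical names, and in a real agent the replanning step would be an LLM call rather than a simple function.

```python
from typing import Callable, List

Plan = List[str]


def run_proactive(
    plan: Plan,
    execute: Callable[[str], str],
    replan: Callable[[Plan, str], Plan],
    max_steps: int = 10,
) -> List[str]:
    """Execute subgoals one at a time, revising the remaining plan after each."""
    history: List[str] = []
    remaining = list(plan)
    while remaining and len(history) < max_steps:
        subgoal = remaining.pop(0)
        observation = execute(subgoal)              # act, then observe the result
        history.append(subgoal)
        remaining = replan(remaining, observation)  # update the plan every step
    return history
```

For example, if executing "open browser" produces an unexpected popup, `replan` can prepend "dismiss popup" so the agent handles it before continuing with the original subgoals.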


📈 Benchmarks

| Benchmark | Agent S2 | Previous Best | Relative Improvement |
| --- | --- | --- | --- |
| OSWorld (50 steps) | 34.5% | 32.6% (OpenAI) | +5.8% |
| WindowsAgentArena | 29.8% | 19.5% (NAVI) | +52.8% |
| AndroidWorld | 54.3% | 46.8% (UI-TARS) | +16.0% |
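The improvement figures are relative gains over the previous best, not percentage-point differences. The short check below reproduces them from the success rates in the table; the function name is just for illustration.

```python
def relative_improvement(new: float, previous: float) -> float:
    """Percentage gain of new over previous, rounded to one decimal place."""
    return round((new - previous) / previous * 100, 1)


results = {
    "OSWorld (50 steps)": (34.5, 32.6),
    "WindowsAgentArena": (29.8, 19.5),
    "AndroidWorld": (54.3, 46.8),
}

for name, (agent_s2, previous_best) in results.items():
    print(f"{name}: +{relative_improvement(agent_s2, previous_best)}%")
# OSWorld (50 steps): +5.8%
# WindowsAgentArena: +52.8%
# AndroidWorld: +16.0%
```

On OSWorld, for instance, the gap is 1.9 percentage points, which is a 5.8% gain relative to 32.6%.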

🐛 Troubleshooting

macOS: "Permission denied"

Terminal needs Accessibility permissions. See Step 3 above.

"Module not found" error

# Ensure virtual environment is activated
source venv/bin/activate  # Mac/Linux
venv\Scripts\activate     # Windows
 
# Reinstall dependencies
pip install -r requirements.txt

Agent clicks wrong location

  • Set display scaling to 100%
  • Use standard resolution (1024x768 or 1920x1080)
  • Check if GROUNDING_MODEL_RESIZE_WIDTH matches your resolution
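Misplaced clicks are often a scaling problem. Assuming `GROUNDING_MODEL_RESIZE_WIDTH` is the width the screenshot is resized to before the grounding model sees it (an assumption; check the repo for the exact meaning), coordinates returned in the resized image have to be scaled back to real screen pixels. `to_screen_coords` is a hypothetical helper that shows the arithmetic.

```python
from typing import Tuple


def to_screen_coords(
    x: float, y: float, resized_width: int, screen_width: int
) -> Tuple[int, int]:
    """Map a point from the resized screenshot back to screen pixels.

    Assumes the screenshot was scaled uniformly (aspect ratio preserved),
    so a single factor applies to both axes.
    """
    if resized_width <= 0 or screen_width <= 0:
        raise ValueError("widths must be positive")
    scale = screen_width / resized_width
    return round(x * scale), round(y * scale)


print(to_screen_coords(512, 384, resized_width=1024, screen_width=1920))
# (960, 720)
```

If the resize width and the real screen width disagree with what the agent assumes, every click is off by the same factor, which is why display scaling other than 100% causes trouble.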

Agent seems stuck or confused

Agent S2 is experimental, so it can be slow and sometimes gets stuck:

  • Use simple, specific commands
  • Break big tasks into smaller steps
  • Restart if it loops
  • Consider reporting the problem or contributing a fix on the Agent S2 GitHub repo

⚠️ Safety Notes

Agent S2 has full control of your computer when running:

  • Always supervise during execution
  • Start simple - test basic commands first
  • Use cloud mode for risky operations
  • Never leave unattended

📚 Resources


Agent S2 is experimental and open source. It can be slow, but it's fun to tinker with. Your improvements are welcome!

python agent_s2.py  # Start here