Orgo

Set Up Agent S2: An Open-Source Computer Use Agent

🎯 TLDR

Agent S2 is an open-source computer use agent that controls your desktop through text commands. Set it up locally in 5 minutes or run it on a cloud desktop via Orgo.

# Get started
git clone https://github.com/spencerkinney/agent-s2-example.git
cd agent-s2-example
pip install -r requirements.txt
python agent_s2.py

Note: This AI agent is experimental and can be slow to complete actions, but it's a great base for learning and building on!


What is Agent S2?

Agent S2 is a state-of-the-art computer use agent from Simular Research. It takes screenshots, understands what's on screen, and performs mouse/keyboard actions based on your natural language commands.

📊 Performance

Agent S2 achieves the highest success rates on the OSWorld benchmark:

[Figure: Agent S2 performance on the OSWorld benchmark (screenshot from the Agent S2 GitHub repo)]

Key achievements:

  • 34.5% success rate (50-step evaluation)
  • Outperforms OpenAI's Operator (32.6%)
  • Beats Claude Computer Use (26.0%)

🧠 How It Works

Agent S2 is a compositional framework using two primary AI models:

[Figure: Agent S2 architecture, showing the flow between the LLM, the grounding model, and Python execution]

The loop:

  1. 📸 Screenshot captured
  2. 🧠 LLM decides the next action (e.g., click Firefox)
  3. 👁️ Grounding model finds the coordinates
  4. 🖱️ Python clicks at (x, y)
  5. 🔄 Repeat until done
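The loop above can be sketched in a few lines of Python. This is a simplified illustration, not Agent S2's actual code: `Action`, `run_loop`, and the callables passed into it are hypothetical stand-ins for the screenshot tool, the planning LLM, the grounding model, and the mouse controller. The default of 10 steps mirrors the MAX_STEPS default listed under Configuration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-ins: none of these names come from the Agent S2 codebase.

@dataclass
class Action:
    kind: str                      # e.g. "click" or "done"
    target: Optional[str] = None   # natural-language description, e.g. "Firefox icon"


def run_loop(
    take_screenshot: Callable[[], bytes],
    plan_next_action: Callable[[str, bytes], Action],
    ground: Callable[[str, bytes], Tuple[int, int]],
    click: Callable[[int, int], None],
    task: str,
    max_steps: int = 10,
) -> List[Tuple[int, int]]:
    """Screenshot -> LLM picks an action -> grounding model finds (x, y) -> click."""
    clicks = []
    for _ in range(max_steps):
        screen = take_screenshot()               # 1. capture the screen
        action = plan_next_action(task, screen)  # 2. LLM decides what to do
        if action.kind == "done":
            break                                # 5. stop once the task is finished
        x, y = ground(action.target, screen)     # 3. grounding model -> coordinates
        click(x, y)                              # 4. perform the click
        clicks.append((x, y))
    return clicks


# Toy components: the "LLM" clicks Firefox once, then reports done.
steps = iter([Action("click", "Firefox icon"), Action("done")])
performed = []
result = run_loop(
    take_screenshot=lambda: b"fake-png",
    plan_next_action=lambda task, screen: next(steps),
    ground=lambda target, screen: (120, 840),
    click=lambda x, y: performed.append((x, y)),
    task="Open Firefox",
)
print(result)  # [(120, 840)]
```

Passing the components in as functions keeps the loop independent of any particular model, which is the same idea that lets you swap models.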

You can swap models: the defaults use GPT-4o for planning and Claude for grounding, but any compatible model works.


🚀 Quickstart

Prerequisites

  • Python 3 (with pip and venv)
  • Git
  • OpenAI and Anthropic API keys (for the default models)

Step 1: Install

# Clone and enter directory
git clone https://github.com/spencerkinney/agent-s2-example.git
cd agent-s2-example
 
# Create virtual environment
python3 -m venv venv
 
# Activate it
source venv/bin/activate  # Mac/Linux
# OR
venv\Scripts\activate     # Windows
 
# Install dependencies
pip install -r requirements.txt

Step 2: Configure API Keys

# Copy template
cp .env.example .env
 
# Edit .env file
nano .env  # or use any text editor

Add your keys:

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
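To catch a missing key before the agent starts, you can check the file yourself. A minimal sketch using only the standard library, assuming simple `KEY=VALUE` lines; agent_s2.py may load .env differently (for example with the python-dotenv package), and `parse_env` and `missing_keys` are hypothetical helpers.

```python
from typing import Dict, Iterable, List

# Keys the default models need (per the .env example above).
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.split(" #", 1)[0].strip()  # drop trailing inline comments
        values[key.strip()] = value.strip('"').strip("'")
    return values


def missing_keys(values: Dict[str, str], required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [k for k in required if not values.get(k)]


sample = """
# sample .env contents
OPENAI_API_KEY=sk-test
ANTHROPIC_API_KEY=
"""
env = parse_env(sample)
print(missing_keys(env))  # ['ANTHROPIC_API_KEY']
```

To check your real file, pass it in with `parse_env(open(".env").read())`.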

Step 3: macOS Permissions

macOS only: Grant accessibility permissions

  1. Open System Settings → Privacy & Security → Accessibility
  2. Click the + button
  3. Add Terminal (or your IDE)
  4. Enable the checkbox ✓

Step 4: Run Agent S2

python agent_s2.py

Try these commands:

"Open Chrome"
"Take a screenshot" 
"Create a folder called projects on the desktop"
"Search for weather in San Francisco"

Type exit to quit.

Heads up: Agent S2 is experimental and can be slow. Simple commands work best. Try it out and build your own improvements!


☁️ Cloud Desktop Mode

Run Agent S2 on a remote Linux desktop instead of your local machine:

1. Get Orgo Account

Sign up at orgo.ai/start

2. Update Configuration

# Add to .env
ORGO_API_KEY=your_orgo_key
USE_CLOUD_ENVIRONMENT=true

3. Run

python agent_s2.py  # Now controls cloud desktop

View the cloud desktop in real time through Orgo's dashboard.
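How the script interprets the flag isn't spelled out here. A minimal sketch of one reasonable approach, assuming the values come from .env or the process environment: `select_environment` is a hypothetical helper, and treating `1`, `yes`, and `on` as true is an assumption beyond the documented `true`.

```python
from typing import Mapping

# Assumption: accept a few common spellings of "true", case-insensitively.
TRUTHY = {"1", "true", "yes", "on"}


def select_environment(env: Mapping[str, str]) -> str:
    """Return "cloud" or "local" based on USE_CLOUD_ENVIRONMENT.

    Raises ValueError if cloud mode is requested without an ORGO_API_KEY.
    """
    use_cloud = env.get("USE_CLOUD_ENVIRONMENT", "false").strip().lower() in TRUTHY
    if not use_cloud:
        return "local"
    if not env.get("ORGO_API_KEY"):
        raise ValueError("USE_CLOUD_ENVIRONMENT=true but ORGO_API_KEY is not set")
    return "cloud"


print(select_environment({}))  # local
print(select_environment({"USE_CLOUD_ENVIRONMENT": "true", "ORGO_API_KEY": "abc"}))  # cloud
```

You can pass `os.environ` directly, since it behaves like a read-only mapping here.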


⚙️ Configuration

Model Selection

# Option 1: Use Claude for everything
AGENT_MODEL=claude-3-7-sonnet-20250219
AGENT_MODEL_TYPE=anthropic
 
# Option 2: Mix and match (default)
AGENT_MODEL=gpt-4o                      # Planning
GROUNDING_MODEL=claude-3-7-sonnet-20250219  # Vision

Performance Tuning

MAX_STEPS=20      # More steps for complex tasks (default: 10)
STEP_DELAY=0.1    # Faster execution (default: 0.5s)
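The settings above amount to a handful of overridable defaults. The sketch below shows one way to collect them, using the default values this guide lists; the `AgentConfig` class and `load_config` function are hypothetical and not part of the repo, and AGENT_MODEL_TYPE is left out for brevity.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class AgentConfig:
    agent_model: str = "gpt-4o"                          # planning model (default)
    grounding_model: str = "claude-3-7-sonnet-20250219"  # vision/grounding model (default)
    max_steps: int = 10                                  # default per this guide
    step_delay: float = 0.5                              # seconds, default per this guide


def load_config(env: Mapping[str, str]) -> AgentConfig:
    """Build a config from .env-style values, falling back to the defaults above."""
    defaults = AgentConfig()
    return AgentConfig(
        agent_model=env.get("AGENT_MODEL", defaults.agent_model),
        grounding_model=env.get("GROUNDING_MODEL", defaults.grounding_model),
        max_steps=int(env.get("MAX_STEPS", defaults.max_steps)),
        step_delay=float(env.get("STEP_DELAY", defaults.step_delay)),
    )


cfg = load_config({"MAX_STEPS": "20", "STEP_DELAY": "0.1"})
print(cfg.max_steps, cfg.step_delay, cfg.agent_model)  # 20 0.1 gpt-4o
```

If STEP_DELAY is a pause between steps, the deliberate pauses alone add at most MAX_STEPS × STEP_DELAY seconds: 10 × 0.5 = 5 s with the defaults, or 20 × 0.1 = 2 s with the tuned values. Model calls come on top of that.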

🎯 Technical Details

Mixture of Grounding

Agent S2 uses three specialist models for precise interactions:

Specialist | Purpose              | Example
---------- | -------------------- | -----------------------
Visual     | Find UI elements     | "Click the red button"
Text       | Select document text | "Highlight paragraph 2"
Structural | Handle tables/sheets | "Update cell B4"
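One way to picture the routing: the planner states what it wants and which kind of specialist should handle it, and each specialist returns a different kind of answer (screen coordinates, a text span, or a cell address). This is a sketch under that assumption; the expert functions return canned values and every name here is hypothetical, not Agent S2's API.

```python
from typing import Callable, Dict, Tuple

Result = Tuple[str, object]

# Hypothetical experts. Real ones would inspect a screenshot, document text,
# or a table structure; these ignore the query and return canned answers.
def visual_expert(query: str) -> Result:
    return ("click_at", (640, 360))   # screen coordinates of a UI element

def text_expert(query: str) -> Result:
    return ("select_span", (120, 480))  # start/end character offsets in a document

def structural_expert(query: str) -> Result:
    return ("set_cell", "B4")         # cell address in a table or sheet

EXPERTS: Dict[str, Callable[[str], Result]] = {
    "visual": visual_expert,
    "text": text_expert,
    "structural": structural_expert,
}


def ground(expert: str, query: str) -> Result:
    """Send the query to the specialist the planner picked."""
    handler = EXPERTS.get(expert)
    if handler is None:
        raise ValueError(f"unknown grounding expert: {expert!r}")
    return handler(query)


print(ground("structural", "Update cell B4"))  # ('set_cell', 'B4')
```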

Proactive Planning

Unlike reactive agents, S2 updates its plan after each step:

[Figure: Agent S2's proactive planning process, which adapts to new observations after each action]

This allows recovery from errors and adaptation to unexpected UI changes.
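A toy version of plan, act, replan, assuming a planner that can rewrite the remaining steps after each observation. In Agent S2 an LLM does the planning; here `toy_execute` and `toy_replan` are hard-coded, hypothetical stand-ins that simulate a pop-up appearing and a recovery step being added.

```python
from typing import Callable, List


def run_with_replanning(
    plan: List[str],
    execute: Callable[[str], str],
    replan: Callable[[List[str], str], List[str]],
    max_steps: int = 10,
) -> List[str]:
    """Execute one subtask at a time and revise the remaining plan after each."""
    done: List[str] = []
    remaining = list(plan)
    while remaining and len(done) < max_steps:
        step = remaining.pop(0)
        observation = execute(step)                 # act, then look at the result
        done.append(step)
        remaining = replan(remaining, observation)  # update plan from the new observation
    return done


def toy_execute(step: str) -> str:
    # Pretend a pop-up appears after the browser opens.
    return "popup visible" if step == "open browser" else "ok"


def toy_replan(remaining: List[str], observation: str) -> List[str]:
    # Without replanning, the original plan would continue unchanged.
    if observation == "popup visible":
        return ["close popup"] + remaining
    return remaining


order = run_with_replanning(["open browser", "search weather"], toy_execute, toy_replan)
print(order)  # ['open browser', 'close popup', 'search weather']
```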


📈 Benchmarks

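Benchmark          | Agent S2 | Previous best   | Relative improvement
------------------ | -------- | --------------- | --------------------
OSWorld (50 steps) | 34.5%    | 32.6% (OpenAI)  | +5.8%
WindowsAgentArena  | 29.8%    | 19.5% (NAVI)    | +52.8%
AndroidWorld       | 54.3%    | 46.8% (UI-TARS) | +16.0%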
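The improvement column is a relative gain over the previous best, not a difference in percentage points. A quick check of the arithmetic, using the scores from the table:

```python
# Success rates (%) from the benchmark table: (Agent S2, previous best).
results = {
    "OSWorld (50 steps)": (34.5, 32.6),
    "WindowsAgentArena": (29.8, 19.5),
    "AndroidWorld": (54.3, 46.8),
}


def relative_improvement(new: float, old: float) -> float:
    """Percentage gain of `new` over `old`, measured relative to `old`."""
    return (new - old) / old * 100


for name, (s2, prev) in results.items():
    print(f"{name}: +{relative_improvement(s2, prev):.1f}% relative, "
          f"+{s2 - prev:.1f} points absolute")
```

For example, WindowsAgentArena improves by 10.3 points, which is 52.8% of the previous 19.5% score.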

🐛 Troubleshooting

macOS: "Permission denied"

Terminal needs Accessibility permissions. See Step 3 above.

"Module not found" error
# Ensure virtual environment is activated
source venv/bin/activate  # Mac/Linux
venv\Scripts\activate     # Windows
 
# Reinstall dependencies
pip install -r requirements.txt
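If the error persists, confirm that the interpreter you're running is the one inside the virtual environment. This standard-library check relies on `sys.prefix` differing from `sys.base_prefix` inside a venv; run it with the same `python` command you use for agent_s2.py.

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a virtual environment created with `python -m venv`."""
    # Inside a venv, sys.prefix points at the venv directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
```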
Agent clicks the wrong location
  • Set display scaling to 100%
  • Use standard resolution (1024x768 or 1920x1080)
  • Check if GROUNDING_MODEL_RESIZE_WIDTH matches your resolution
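Why scaling matters: the grounding model predicts coordinates on the image it was given, and those must be mapped back to where the mouse should go. This guide doesn't say exactly how GROUNDING_MODEL_RESIZE_WIDTH is used, so the sketch assumes it is the width the screenshot is resized to (aspect ratio preserved) and that the mouse API expects logical points, i.e. physical pixels divided by the display scale. `to_screen_coords` is a hypothetical helper.

```python
from typing import Tuple


def to_screen_coords(
    point: Tuple[float, float],
    resized_width: int,
    screenshot_size: Tuple[int, int],
    display_scale: float = 1.0,
) -> Tuple[int, int]:
    """Map a point predicted on a resized screenshot back to mouse coordinates.

    point           -- (x, y) predicted by the grounding model on the resized image
    resized_width   -- width the screenshot was resized to (assumed to keep aspect ratio)
    screenshot_size -- (width, height) of the original screenshot in physical pixels
    display_scale   -- OS display scaling, e.g. 2.0 on a typical Retina display
    """
    shot_w, _ = screenshot_size
    factor = shot_w / resized_width  # resized image -> physical pixels
    x_px, y_px = point[0] * factor, point[1] * factor
    # Assumption: the mouse API works in logical points (physical pixels / scale).
    return round(x_px / display_scale), round(y_px / display_scale)


# 1920x1080 screen at 100% scaling, screenshot resized to 960 px wide
print(to_screen_coords((480, 270), 960, (1920, 1080)))       # (960, 540)
# Same prediction on a 2x screen (3840x2160 physical pixels)
print(to_screen_coords((480, 270), 960, (3840, 2160), 2.0))  # (960, 540)
```

If the display scale were ignored on the 2x screen, the click would land at (1920, 1080) instead of (960, 540), which is one reason setting display scaling to 100% helps.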
Agent seems stuck or confused

Agent S2 is experimental; it can be slow and sometimes gets stuck. A few things that help:

  • Use simple, specific commands
  • Break big tasks into smaller steps
  • Restart if it loops
  • Consider contributing a fix to the Agent S2 GitHub repo
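If you're building on the agent, you can also detect the simplest kind of loop automatically instead of watching for it. A minimal sketch, assuming each action can be logged as a hashable value such as a tuple; `LoopDetector` is hypothetical and not part of Agent S2.

```python
from collections import deque
from typing import Any, Deque


class LoopDetector:
    """Flags when the last `window` actions are all identical (hypothetical helper)."""

    def __init__(self, window: int = 3) -> None:
        # Actions must be hashable (e.g. tuples) so they can go into a set.
        self.recent: Deque[Any] = deque(maxlen=window)

    def record(self, action: Any) -> bool:
        """Record an action; return True if the agent looks stuck."""
        self.recent.append(action)
        return len(self.recent) == self.recent.maxlen and len(set(self.recent)) == 1


detector = LoopDetector(window=3)
actions = [("click", 120, 840), ("type", "weather"), ("click", 50, 50),
           ("click", 50, 50), ("click", 50, 50)]
flags = [detector.record(a) for a in actions]
print(flags)  # [False, False, False, False, True]
```

It only catches the same action repeated back to back; alternating cycles such as A, B, A, B would need a different check.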

⚠️ Safety Notes

Agent S2 has full control of your computer when running:

  • ✅ Always supervise during execution
  • ✅ Start simple: test basic commands first
  • ✅ Use cloud mode for risky operations
  • ❌ Never leave it unattended

πŸ“š Resources


Agent S2 is experimental and open source. It can be slow, but it's fun to tinker with. Your improvements are welcome!

python agent_s2.py  # Start here