Set Up Anthropic Computer Use with Orgo in 30 Seconds
This tutorial is also available as a YouTube video. Watch it if you want to speedrun the setup, or read this article for a step-by-step walkthrough.
TLDR
This guide shows you how to quickly set up Anthropic's Computer Use API with Orgo, so that Claude can control a virtual desktop through simple English commands. We'll install the Orgo library and the other dependencies, get our API keys, launch a new headless VM with Orgo, and write some Python code that prompts Claude to operate the new computer.
For those who don't know, in late 2024 Anthropic released the first iteration of an AI model custom-trained to control a computer. Models that control computers have been improving quickly since then. You can check out the current leaderboard for these models here: https://os-world.github.io/
Prerequisites
You'll need:
- Python 3.7+
- Anthropic API key (Claude model access)
- Orgo API key
Setup
First, create a Python virtual environment and activate it:
MacOS/Linux:
python3 -m venv venv
source venv/bin/activate
Windows:
python -m venv venv
venv\Scripts\activate
Install the required packages:
pip install orgo anthropic python-dotenv
Create a .env file in your project directory:
ORGO_API_KEY=your_orgo_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
You can get an Orgo API key by signing up at orgo.ai/start and going to the settings page.
For the Anthropic API key, visit the Anthropic Console.
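Before launching anything, it can help to confirm that both keys are actually visible to Python. The following is a minimal sketch using only the standard library; the helper name missing_keys is made up for this example and is not part of Orgo or Anthropic's SDK. In a real script you would call load_dotenv() first so the values from .env are in the environment.

```python
import os
from typing import List, Mapping

# The two variable names used in the .env file above.
REQUIRED_KEYS = ("ORGO_API_KEY", "ANTHROPIC_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required key names that are unset or blank in env.

    Hypothetical helper for this tutorial, not an Orgo or Anthropic function.
    """
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


if __name__ == "__main__":
    # Assumption: load_dotenv() has already run, or the keys were exported
    # in the shell, so os.environ reflects the .env contents.
    missing = missing_keys()
    if missing:
        print("Missing keys:", ", ".join(missing))
    else:
        print("Both API keys are set.")
```

Passing the environment in as a mapping keeps the check easy to try out with a plain dictionary.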
Basic Usage
Here's a simple script to get started:
# basic_computer_use.py
from orgo import Computer
from dotenv import load_dotenv
# Load environment variables
load_dotenv()
# Prompt Claude to open up Firefox
computer = Computer()
computer.prompt("Open up firefox")
# Or you can use the direct API methods
# if you are building your own agent
#computer.left_click(200, 200)
#computer.scroll("down", 1)
#computer.key("ctrl+c")
#computer.bash("ls -l")
#computer.screenshot()
Run the script:
python basic_computer_use.py
In the Orgo dashboard, you should see something like the following. We prompted Claude to open Firefox. To achieve this, Claude decided to double-click the Firefox icon and then take a screenshot to get the latest screen state.
How It Works
When you run the code, Orgo creates a virtual desktop environment that Claude can control. The prompt() method lets you give natural language instructions to Claude, which then uses Computer Use capabilities to interact with the desktop.
Behind the scenes, Claude:
- Observes the screen (through screenshots)
- Plans the necessary actions to complete your request
- Executes actions like clicking, typing, and navigating
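The observe, plan, execute cycle above can be sketched as a small loop. This is an illustration only, not how Orgo or Claude implement it: FakeDesktop, run_agent, and scripted_planner are hypothetical names, the desktop just records actions, and the planner replays a fixed list where a real one would send the screenshot to the model.

```python
from typing import Callable, List, Optional, Tuple

# An action is (method name, positional arguments), e.g. ("left_click", (200, 200)).
Action = Tuple[str, tuple]


class FakeDesktop:
    """Hypothetical stand-in for a virtual desktop; it records actions only."""

    def __init__(self) -> None:
        self.log: List[Action] = []

    def screenshot(self) -> str:
        # A real desktop would return image data; a text description is enough here.
        return "screen after {} action(s)".format(len(self.log))

    def perform(self, action: Action) -> None:
        self.log.append(action)


def run_agent(
    desktop: FakeDesktop,
    plan_next: Callable[[str], Optional[Action]],
    max_steps: int = 10,
) -> int:
    """Observe, plan, act until the planner returns None or max_steps is reached.

    Returns the number of actions performed.
    """
    for step in range(max_steps):
        observation = desktop.screenshot()  # 1. observe the screen
        action = plan_next(observation)     # 2. plan the next action
        if action is None:                  # planner considers the task done
            return step
        desktop.perform(action)             # 3. execute the action
    return max_steps


def scripted_planner(actions: List[Action]) -> Callable[[str], Optional[Action]]:
    """Hypothetical planner that replays a fixed list instead of calling a model."""
    queue = list(actions)

    def plan_next(observation: str) -> Optional[Action]:
        return queue.pop(0) if queue else None

    return plan_next


if __name__ == "__main__":
    desktop = FakeDesktop()
    planner = scripted_planner([("left_click", (200, 200)), ("type", ("hello",))])
    print(run_agent(desktop, planner), desktop.log)
```

The max_steps cap is a design choice worth keeping in any real loop, since a model that never decides it is finished would otherwise run indefinitely.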
Direct Control Methods
Beyond natural language prompts, you can directly control the computer with methods like:
- computer.left_click(x, y): Mouse click at coordinates
- computer.type("text"): Type text
- computer.key("ctrl+c"): Press key combinations
- computer.screenshot(): Take a screenshot
- computer.bash("command"): Run a bash command
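These methods can be combined into small helpers when you are building your own agent. The sketch below runs against a stub rather than a real Orgo Computer, so it works without API keys. RecordingComputer and replace_field_text are hypothetical names, the stub mirrors only the method names listed above, and the assumption that "ctrl+a" is accepted in the same form as "ctrl+c" is mine. What the real methods return is not covered here.

```python
class RecordingComputer:
    """Hypothetical stub with the same method names as above; it only records calls."""

    def __init__(self):
        self.calls = []

    def left_click(self, x, y):
        self.calls.append(("left_click", x, y))

    def type(self, text):
        self.calls.append(("type", text))

    def key(self, combo):
        self.calls.append(("key", combo))

    def screenshot(self):
        self.calls.append(("screenshot",))


def replace_field_text(computer, x, y, text):
    """Click a text field, select its contents, type over them, then capture the screen."""
    computer.left_click(x, y)      # focus the field
    computer.key("ctrl+a")         # assumed combo format, following "ctrl+c"
    computer.type(text)            # typing replaces the selection
    return computer.screenshot()   # capture the resulting screen state


if __name__ == "__main__":
    stub = RecordingComputer()
    replace_field_text(stub, 200, 200, "orgo.ai")
    for call in stub.calls:
        print(call)
```

Because the helper only relies on method names, the same function should accept a real Computer instance in place of the stub, provided those methods behave as described in the list above.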
Additional Resources
Well, what now?
You can create a more complex setup by chaining multiple commands or creating a custom agent loop.
Orgo handles the desktop environment hosting, while Anthropic's Claude provides the intelligence to navigate and interact with it. You can also download Anthropic's custom Docker image from here if you don't want to use Orgo and would rather self-host a local setup.
If you want to dive deeper, check out the complete code example in the GitHub repository.