
UI-TARS Desktop

Framework-agnostic · Advanced · Computer Use · Open Source

UI-TARS Desktop is ByteDance's open-source multimodal agent that can see, understand, and interact with desktop applications. It uses vision models to understand the screen, plans actions, and executes them through mouse and keyboard — enabling AI agents that operate any desktop software.

Input / Output

Accepts

screenshot, task-description

Produces

mouse-actions, keyboard-actions, task-result
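The input/output contract above can be modeled as a small schema. The type and field names below are illustrative only, assumed for this sketch; the real project defines its own internal types:

```python
from dataclasses import dataclass, field

# Hypothetical schema for the inputs and outputs listed above;
# not part of the actual UI-TARS Desktop API.

@dataclass
class AgentInput:
    screenshot: bytes        # raw screen capture (e.g. PNG bytes)
    task_description: str    # natural-language goal

@dataclass
class AgentAction:
    kind: str                         # "mouse" or "keyboard"
    detail: dict = field(default_factory=dict)

@dataclass
class AgentOutput:
    actions: list                     # sequence of AgentAction
    task_result: str                  # summary of what was accomplished

inp = AgentInput(screenshot=b"\x89PNG", task_description="open settings")
out = AgentOutput(
    actions=[AgentAction("mouse", {"type": "click", "x": 120, "y": 48})],
    task_result="Settings window opened",
)
```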

Overview

UI-TARS Desktop brings AI computer use to open source. ByteDance’s multimodal agent sees your screen, understands UI elements, and interacts with any desktop application — handling multi-step tasks across applications.

How It Works

  1. Capture — Takes a screenshot of the current screen
  2. Understand — A vision model identifies UI elements and layout
  3. Plan — Determines the next action sequence toward the task goal
  4. Execute — Performs mouse clicks, keyboard input, and window management
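The loop above can be sketched in a few lines. Every component here is a stand-in of my own naming: a real agent would call a vision-language model for `understand`/`plan` and an OS automation layer for `execute`:

```python
# Minimal sketch of the capture -> understand -> plan -> execute loop.
# All functions are stand-ins, not UI-TARS Desktop APIs.

def capture_screen() -> bytes:
    # Stand-in for a real screenshot (e.g. via an OS API)
    return b"fake-screenshot"

def understand(screenshot: bytes, task: str) -> dict:
    # Stand-in for the vision model: returns located UI elements
    return {"elements": [{"name": "OK button", "x": 200, "y": 150}]}

def plan(observation: dict, task: str) -> list:
    # Stand-in planner: turn the observation into concrete actions
    el = observation["elements"][0]
    return [{"type": "click", "x": el["x"], "y": el["y"]}]

def execute(action: dict, log: list) -> None:
    # Stand-in executor: a real agent would drive mouse/keyboard here
    log.append(action)

def run_step(task: str, log: list) -> None:
    shot = capture_screen()
    obs = understand(shot, task)
    for action in plan(obs, task):
        execute(action, log)

log = []
run_step("press the OK button", log)
# log now holds one click action at the button's coordinates
```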

Use Cases

  • Desktop automation — Automate tasks across any application
  • Software testing — AI-driven testing of desktop apps
  • Data entry — Form filling across legacy apps
  • Process automation — Multi-application workflows

Getting Started

git clone https://github.com/bytedance/UI-TARS-desktop
cd UI-TARS-desktop
npm install && npm run start

Example

Task: "Open Photoshop, create 1920x1080 canvas, add blue gradient"
Agent: Opens Photoshop → New canvas → Gradient tool → Apply
Total: ~30 seconds (vs 2 minutes manually)
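Under the hood, UI-TARS-style models emit each step as a short function-call string that the executor parses into a concrete mouse or keyboard event. The grammar below (e.g. `click(start_box='(x,y)')`) is a simplified illustration and varies by model version:

```python
import re

# Minimal parser for a simplified UI-TARS-style action string such as
# "click(start_box='(120,48)')". Illustrative only; the real grammar
# differs between model releases.

ACTION_RE = re.compile(r"^(\w+)\((.*)\)$")
ARG_RE = re.compile(r"(\w+)='([^']*)'")

def parse_action(text: str) -> dict:
    m = ACTION_RE.match(text.strip())
    if not m:
        raise ValueError(f"unrecognized action: {text!r}")
    name, argstr = m.groups()
    args = dict(ARG_RE.findall(argstr))
    # Convert "(x,y)" coordinate strings into integer tuples
    for key, val in args.items():
        coords = re.match(r"\((\d+),\s*(\d+)\)", val)
        if coords:
            args[key] = (int(coords.group(1)), int(coords.group(2)))
    return {"action": name, "args": args}

parsed = parse_action("click(start_box='(120,48)')")
# parsed == {"action": "click", "args": {"start_box": (120, 48)}}
```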

Alternatives

  • CUA — Desktop sandboxes for computer-use agents
  • Anthropic Computer Use — Claude’s built-in computer use
  • Open Interpreter — Terminal-based computer control

Tags

#computer-use #desktop #multimodal #vision #gui-automation
