UI-TARS Desktop is ByteDance's open-source multimodal agent for AI computer use. It uses a vision-language model to read the screen and identify UI elements, plans the next action, and executes it through the mouse and keyboard, which lets it carry out multi-step tasks across desktop applications.
git clone https://github.com/bytedance/UI-TARS-desktop
cd UI-TARS-desktop
npm install && npm run start
Task: "Open Photoshop, create 1920x1080 canvas, add blue gradient"
Agent: Opens Photoshop → New canvas → Gradient tool → Apply
Total: ~30 seconds (vs 2 minutes manually)
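The see, plan, act cycle behind a run like the one above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not UI-TARS Desktop's actual code: the names Action, run_agent, capture, plan, and execute are hypothetical, and the screenshot, vision model, and input control are replaced by stubs so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One step chosen by the planner (hypothetical shape, not the project's API)."""
    kind: str         # e.g. "click", "type", or "done"
    target: str = ""  # description of the UI element, or the text to type


def run_agent(
    task: str,
    capture: Callable[[], str],
    plan: Callable[[str, str, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Repeat capture -> plan -> execute until the planner says "done"
    or the step budget runs out. Returns the actions that were executed."""
    history: List[Action] = []
    for _ in range(max_steps):
        screen = capture()                    # see: grab the current screen state
        action = plan(task, screen, history)  # plan: pick the next action
        if action.kind == "done":
            break
        execute(action)                       # act: drive mouse/keyboard
        history.append(action)
    return history


def demo() -> List[Action]:
    """Replay the Photoshop example with a scripted planner standing in for the model."""
    script = [
        Action("click", "Photoshop"),
        Action("click", "New canvas"),
        Action("click", "Gradient tool"),
        Action("click", "Apply"),
    ]
    executed: List[Action] = []

    def capture() -> str:
        # Stub: a real agent would take a screenshot here.
        return f"screenshot-{len(executed)}"

    def plan(task: str, screen: str, history: List[Action]) -> Action:
        # Stub: a real agent would query a vision-language model here.
        return script[len(history)] if len(history) < len(script) else Action("done")

    def execute(action: Action) -> None:
        # Stub: a real agent would move the mouse or send keystrokes here.
        executed.append(action)

    return run_agent(
        "Open Photoshop, create 1920x1080 canvas, add blue gradient",
        capture, plan, execute,
    )


if __name__ == "__main__":
    print([a.target for a in demo()])
```

The max_steps budget is a design choice in this sketch: a planner that never reports "done" would otherwise loop forever, so capping the number of steps keeps a stuck run bounded.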
AI agents that work well with UI-TARS Desktop.