Maestro: YAML flows for mobile UI testing that actually tolerate flakiness

November 5, 2025

repo-review

by Florian Narr

Maestro is a cross-platform E2E testing framework for Android, iOS, and web apps. You write flows in YAML, run them against a real emulator or simulator, and the engine handles element waiting, retries, and flakiness tolerance without you wiring any of it up.

Why I starred it

Mobile E2E testing has a reputation for being the slowest, most painful part of a CI pipeline. Appium is verbose and setup-heavy. Espresso and XCTest are platform-locked. Most teams end up with either a minimal test suite or tests that work 70% of the time and get ignored when they go red.

Maestro's bet is that the YAML layer should be high enough that you never touch selectors, and the runtime should retry aggressively enough that transient animation delays don't matter. That's a strong claim. I wanted to see whether the code actually delivers it.

How it works

The architecture is a clean pipeline: YAML file → command list → orchestrator → platform driver.

Parsing. YamlCommandReader in maestro-orchestra/src/main/java/maestro/orchestra/yaml/YamlCommandReader.kt reads a flow file and delegates immediately to MaestroFlowParser. That parser uses Jackson with a custom deserializer (YamlFluentCommand) that maps YAML keys to typed command objects — one Kotlin data class per command. The deserializer handles both the string shorthand (- launchApp) and the object form (- tapOn: { text: "Save" }). That mapping is defined in MaestroFlowParser.kt in a stringCommands map that converts bare strings directly to YamlFluentCommand instances, keeping the YAML surface clean while the internal model stays typed.
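The two YAML shapes that the deserializer accepts look like this side by side (a sketch based on the description above; the app id is illustrative):

```yaml
appId: com.example.app
---
# String shorthand: a bare string maps directly to a command instance
# via the stringCommands table.
- launchApp
# Object form: keys become fields on the typed command object
# (a TapOnElementCommand in this case).
- tapOn:
    text: "Save"
```

Both forms land in the same typed command model, so the rest of the pipeline never sees the syntactic difference.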

Orchestration. Orchestra.kt is the core class. Its executeCommand method is a when dispatch over the full sealed command hierarchy — roughly 40 cases covering everything from TapOnElementCommand to SetAirplaneModeCommand to AssertWithAICommand. The comment at the top is explicit: Orchestra knows nothing about specific platforms, only about the Maestro object (which wraps the platform driver) and Sink abstractions for output. That separation is well-maintained.

Flakiness tolerance. MaestroTimer in maestro-utils is the piece that makes the "no manual sleep" promise work:

fun <T> withTimeout(timeoutMs: Long, block: () -> T?): T? {
    val endTime = System.currentTimeMillis() + timeoutMs
    do {
        // Poll: a non-null result means the lookup succeeded.
        val result = block()
        if (result != null) return result
    } while (System.currentTimeMillis() < endTime) // do-while: always at least one attempt
    return null
}

It's a tight poll loop. Every element lookup runs inside this, with a 17-second default timeout (lookupTimeoutMs = 17000L in Orchestra's constructor). That's long enough to cover most real-world animation and network delays. The coroutine-aware variant adds a yield() checkpoint each iteration, which landed in a recent commit (feat: make Maestro natively suspend-aware).
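When a specific screen is known to be slower than the default window, flows can wait longer for one element with the extendedWaitUntil command rather than changing the global timeout. A sketch (the text and timeout value are illustrative; timeout is in milliseconds):

```yaml
# Wait up to 30 s for a slow-loading confirmation,
# without touching the 17 s element-lookup default.
- extendedWaitUntil:
    visible: "Order confirmed"
    timeout: 30000
```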

MCP server. There's an mcp module in the CLI with a McpServer.kt that exposes Maestro as an MCP server over stdio. Tools registered include TapOnTool, InputTextTool, InspectViewHierarchyTool, RunFlowTool, and QueryDocsTool. This means you can drive a device or run flows directly from an AI agent — Cursor, Claude Code, whatever. That's a non-obvious integration that most testing tools don't bother with.
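Wiring that into an MCP client is a standard server entry. A minimal config sketch, assuming the CLI exposes the server through a maestro mcp subcommand — verify the exact invocation against the repo's MCP docs:

```json
{
  "mcpServers": {
    "maestro": {
      "command": "maestro",
      "args": ["mcp"]
    }
  }
}
```

With that in place, the registered tools (TapOnTool, RunFlowTool, and the rest) show up in the agent's tool list like any other MCP server.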

Using it

Installation is a single curl:

curl -fsSL "https://get.maestro.mobile.dev" | bash

A basic flow file:

appId: com.android.contacts
---
- launchApp
- tapOn: "Create new contact"
- tapOn: "First Name"
- inputText: "John"
- tapOn: "Last Name"
- inputText: "Snow"
- tapOn: "Save"
- assertVisible: "John Snow"

Run it:

maestro test flow_contacts.yaml

Maestro connects to the running emulator, executes the commands, and reports pass/fail with a live ANSI view. The --continuous flag watches the file for changes and re-runs on save — useful when authoring tests.
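Flows can also be parameterized, which pairs well with --continuous authoring: values passed with -e on the command line are referenced as ${NAME} placeholders in the YAML. A sketch (variable names are illustrative):

```yaml
appId: com.android.contacts
---
- launchApp
- tapOn: "First Name"
- inputText: ${FIRST_NAME}   # supplied at run time
```

Run it with the value injected:

```shell
maestro test -e FIRST_NAME=John flow_contacts.yaml
```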

For AI assertions (requires an API key):

- assertWithAI: "The contact form shows a success confirmation"

That routes through CloudAIPredictionEngine in maestro-ai, sending a screenshot to their inference endpoint and checking the assertion against it.

Rough edges

The 17-second element lookup timeout is a sensible default, but it's not per-command configurable in the YAML — you set it globally or not at all. On slow CI machines with sluggish emulators, you'll hit edge cases where something takes 18 seconds and the test fails with no obvious explanation.

The AI assertion feature (assertWithAI, assertNoDefectsWithAI) depends on Maestro's own cloud endpoint, not a bring-your-own-key model. That's a coupling that may matter if you're running in an air-gapped environment.

Maestro Studio (the visual IDE) is not open source. The CLI is Apache 2.0 and fully inspectable, but Studio lives outside this repo. That's a notable split — the tool that most non-engineers would use to author tests is proprietary.

The web support (maestro-web) exists but is clearly less mature than the Android and iOS paths. If you're primarily testing web apps, Playwright is still the better option.

Bottom line

Maestro is worth evaluating if you have a React Native or Flutter app and want E2E coverage that doesn't require platform-specific expertise to write. The YAML surface is genuinely minimal, the flakiness tolerance is built into the runtime rather than being a wrapper you add, and the MCP integration makes it composable with AI workflows in ways most testing frameworks haven't thought about yet.

mobile-dev-inc/Maestro on GitHub