Web testing is rarely just web anymore.
A checkout flow that starts in a React component, hands off to a third-party payment WebView, and returns to a native confirmation screen. An enterprise portal accessible only through Remote Desktop, with no DOM to query. A workflow that ends with a file dialog or an OS-level authentication prompt outside the browser entirely.
Most web automation tools were built for the simple case. The simple case is increasingly rare.
A computer-use agent starts from a different assumption. If it is visible on screen, it can be tested.
How AskUI Runs on Web
AskUI deploys a computer-use agent that observes the screen, reasons about what it sees, and acts through OS-level input. The same loop a human tester runs, just automated.
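The observe, reason, act cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not the AskUI SDK: `Action`, `run_agent_loop`, and the three callables passed in are hypothetical names, and the "screen" is faked as a text description so the sketch runs without any screen capture.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """Hypothetical action record; not an AskUI type."""
    kind: str          # e.g. "click", "type", or "done"
    target: str = ""   # plain-language description of what to act on


def run_agent_loop(
    observe: Callable[[], str],
    decide: Callable[[str, str], Action],
    act: Callable[[Action], None],
    goal: str,
    max_steps: int = 20,
) -> bool:
    """Repeat observe -> decide -> act until the goal is reported done."""
    for _ in range(max_steps):
        screen = observe()              # look at what is currently visible
        action = decide(goal, screen)   # reason about the next step
        if action.kind == "done":
            return True
        act(action)                     # carry out the step via input events
    return False                        # gave up after max_steps


# Stand-ins for real screen capture, model reasoning, and OS-level input.
screens = iter(["cart page", "checkout page", "confirmation page"])
performed: List[Action] = []


def fake_decide(goal: str, screen: str) -> Action:
    if "confirmation" in screen:
        return Action("done")
    return Action("click", f"next button on {screen}")


finished = run_agent_loop(
    observe=lambda: next(screens),
    decide=fake_decide,
    act=performed.append,
    goal="complete checkout",
)
print(finished, [a.target for a in performed])
```

Nothing in the loop refers to a DOM or an element identifier. Every decision is made from the latest observation, which is the property the rest of this article relies on.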
The agent does not depend on DOM access or stable element identifiers. It reads what is visible, which means it works where standard web automation tools stop: legacy portals, mixed-architecture apps, and workflows that leave the browser mid-flow.
Tests are written in plain English as Markdown or CSV files. No code translation required. The agent finds elements on screen the same way a tester would.
Where Web Testing Gets Complicated
Flows That Leave the Browser
File download dialogs. OS-level authentication prompts. App-to-app handoffs. These steps happen outside the browser context, and outside the reach of tools that instrument only the browser.
Scripted automation can only handle what it was programmed to expect. When a workflow crosses into OS-level surfaces, a browser-only script has no fallback.
The agent operates at the OS level. It sees the full screen regardless of whether the active surface is a browser tab, a system dialog, or a desktop application.
API-Free and Legacy Environments
Some enterprise web applications cannot be reached through standard automation paths. SaaS products accessed via Remote Desktop. Legacy portals with no API access, or web applications running inside Remote Desktop where the DOM cannot be reached from outside the session. Systems where the only path in is the same path a human operator uses.
The agent connects via AgentOS, captures the screen, and interacts through OS-level input. No DOM access required. No changes to the target system.
React, WebViews, and Mixed Architecture
Modern web applications frequently combine native components, embedded WebViews, and third-party SDKs in a single flow. Each layer boundary is a potential failure point for script-based automation that requires every UI state to be defined in advance.
The agent does not distinguish between layers. It reads what is on screen and interacts with it, regardless of what is rendering underneath.
What the Test Project Looks Like
Everything the agent needs lives in plain text files. The folder structure determines what runs and in what order.
├── prompts/
│   ├── device_information.md   # browser + OS details
│   ├── ui_information.md       # app-specific concepts
│   └── report_format.md
├── procedures/
│   └── login.md
├── plans/
│   └── regression.md
└── tests/
    └── your_web_app/
        ├── setup.md
        ├── rules.md
        └── checkout_flow.md
ui_information.md tells the agent how the application works:
# Application UI Concepts
The app has three primary sections: Products, Cart, and Account.
Login state is shown in the top-right corner.
Payment step opens in an embedded WebView:
wait for the "Pay Now" button before proceeding.
A test file looks like this:
# Test: Verify checkout completes successfully
## Preconditions
- User is logged in
- At least one item is in the cart
## Steps
1. Navigate to the cart
2. Click Proceed to Checkout
3. Enter shipping details
4. Complete payment in the payment screen
5. Wait for the order confirmation page
## Postconditions
- Order confirmation number is visible
- Confirmation email is noted as sent
QA engineers, domain experts, and testers who know the application can write and maintain tests in plain text. No scripting or automation expertise required.
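Because a test file like the one above follows a predictable layout (one title heading, then Preconditions, Steps, and Postconditions), it is easy to lint before a run. The sketch below is an assumption-laden illustration rather than part of AskUI: `ParsedTest` and `parse_test` are hypothetical helpers that split a Markdown test into its sections, so a missing or empty section can be flagged early.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ParsedTest:
    """Hypothetical container for one plain-text test file."""
    title: str = ""
    sections: dict[str, list[str]] = field(default_factory=dict)


def parse_test(markdown: str) -> ParsedTest:
    """Split a Markdown test into its title and named sections."""
    parsed = ParsedTest()
    current = None
    for raw in markdown.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("## "):      # section heading
            current = line[3:].strip()
            parsed.sections[current] = []
        elif line.startswith("# "):     # test title
            parsed.title = line[2:].strip()
        elif current is not None:
            # Drop a leading "- " bullet or "1. " step number.
            item = re.sub(r"^(?:-|\d+\.)\s+", "", line)
            parsed.sections[current].append(item)
    return parsed


SAMPLE = """\
# Test: Verify checkout completes successfully
## Preconditions
- User is logged in
- At least one item is in the cart
## Steps
1. Navigate to the cart
2. Click Proceed to Checkout
## Postconditions
- Order confirmation number is visible
"""

case = parse_test(SAMPLE)
required = ("Preconditions", "Steps", "Postconditions")
missing = [name for name in required if not case.sections.get(name)]
print(case.title, len(case.sections["Steps"]), missing)
```

A check like this could run as a pre-commit step so that incomplete test files are caught before the agent ever sees them.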
Deployment
AgentOS connects the agent to the browser and OS. Two configurations:
Host Mode runs AgentOS on the same machine as the browser. This is the standard setup for CI pipelines and local development environments.
Companion Mode runs AgentOS on a separate machine, connected via USB HID and HDMI capture. It is used for Remote Desktop environments, locked-down enterprise systems, or cases where software installation on the target machine is not possible. The target system stays untouched.
Same SDK, same tests, same files. Only the connection changes.
Common Questions About Web Agent Testing
What is web agent testing?
Web agent testing is an approach to web test automation where a computer-use agent observes the screen and acts through OS-level input. The agent works from what is visible on screen, not from the application's internal structure, which makes it applicable to web environments where standard automation paths are unavailable or unreliable.
Can AskUI test web applications accessed via Remote Desktop or VDI?
Yes. AgentOS connects via Companion Mode: screen capture for display, OS-level input for interaction. The agent does not require DOM access, which makes it compatible with web applications running inside Remote Desktop or VDI environments. The target system stays untouched.
Can AskUI handle web flows that cross into OS-level surfaces?
Yes. The agent operates at the OS level and sees the full screen regardless of whether the active surface is a browser, a system dialog, or a desktop application.
Can AskUI test web applications built with React, WebViews, or mixed architectures?
Yes. The agent reads what is visible on screen and does not depend on a consistent DOM structure. It interacts with whatever is rendered, regardless of the underlying framework.
How are web tests written with AskUI?
Tests are plain Markdown or CSV files describing preconditions, numbered steps, and expected outcomes. No instrumentation setup required.
Can AskUI run web tests in CI pipelines?
Yes. AgentOS installs on a CI VM in Host Mode. Tests run on a schedule or on every commit from the same repository.
How does AskUI handle web applications that change frequently?
The agent reads the screen at runtime rather than depending on pre-mapped element structures. When the UI changes, the agent adapts to what it sees rather than breaking on a stale reference.
Does AskUI work with enterprise security requirements?
Yes. With BYOM (bring your own model), model inference stays within your own cloud infrastructure. AskUI is ISO 27001 certified and supports on-premise deployment.
YouYoung Seo