
How to build a pipeline with an LLM agent that fixes native Android UI autotests


Professor Seleznov

In the previous article we covered how an agent interacts with MCP tools for Android development. In this article we extend that pipeline and build full-fledged LLM agents that analyze failed Android UI autotests and try to fix them.
What problem are we solving
UI autotests are hard to fix because the point of failure almost never leads directly to the cause of the failure. If you simply feed the error to an LLM, the model cannot work out what actually happened.
What we will do, and what may be of interest in this article:

A pipeline of two independent agents:
- Running the UI autotest and analyzing its error (Root Cause Analysis).
- Fixing the autotest in a loop that re-runs it.

Customizing MCP tools to make better use of the context window.
A system of priorities in the agents' work.

The skeleton was already assembled in the previous article. Here we add the missing pieces to the existing Langchain4j-based pipeline so that we can experiment with fixing a broken native Android UI autotest.
Pipeline diagram:

Adding MCP tools
dependency-mcp lets the agent walk the project and find dependencies.

Clone the project - git clone https://github.com/mkearl/dependency-mcp.git
change into it - cd dependency-mcp
install dependencies - npm install
build it - npm run build

Filesystem MCP provides file search and navigation across the file system.

install - npm install -g @modelcontextprotocol/server-filesystem
check - npx -y @modelcontextprotocol/inspector mcp-server-filesystem

This is the minimal set we need. For now we install the servers locally; later Langchain4j will use these MCP servers.
Building the pipeline
We add a system of two sequential agents. The agents must not affect each other.
This lets us:

control the process and manage resources better;
give each agent a narrowly scoped task (the Single Responsibility Principle for LLMs);
manage the overall pipeline between the agents programmatically.

The first agent performs Root Cause Analysis (RCA). The second agent takes the results of the completed RCA and tries to fix the autotest in a loop.
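The hand-off between the two agents can be sketched as a small orchestrator. This is only an illustration under stated assumptions: `FixLoop`, `Outcome` and `maxFixAttempts` are hypothetical names, the agents are represented as plain suppliers of their final report, and a report containing `BUILD SUCCESSFUL` is assumed to mean that the test is green.

```java
import java.util.function.Supplier;

/** Hypothetical orchestrator: one RCA pass, then a bounded loop of fix attempts. */
final class FixLoop {

    /** Whether the build went green, how many fix attempts ran, and the last report seen. */
    record Outcome(boolean green, int fixAttempts, String lastReport) {}

    static Outcome run(Supplier<String> rca, Supplier<String> fix, int maxFixAttempts) {
        String report = rca.get();
        // Assumption: the agent quotes the Gradle status line in its report.
        if (report.contains("BUILD SUCCESSFUL")) {
            return new Outcome(true, 0, report);
        }
        for (int attempt = 1; attempt <= maxFixAttempts; attempt++) {
            report = fix.get();
            if (report.contains("BUILD SUCCESSFUL")) {
                return new Outcome(true, attempt, report);
            }
        }
        // The attempt budget is exhausted; the caller decides what to do next.
        return new Outcome(false, maxFixAttempts, report);
    }
}
```

In a real pipeline the suppliers would wrap calls such as `rcaAgent.execute(...)` and `fixAgent.execute(...)`. Bounding the loop in code keeps the number of model calls predictable regardless of what the prompts say.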
Introducing priorities for error analysis:
The logic is as follows:

First we look at the layer of the test itself.
Then we move to the Page Object layer shared by the autotests.
Only at the end do we move to the framework core layer.

This matches how UI autotests most often break in practice.

High - a test step does not match the current UI (after a scroll, navigation, or a switch to another screen), or a matcher is stale.
High - a bug in a Page Object / screen / matcher, or a side effect of recent changes in neighbouring tests or shared screens.
Medium - data or list ordering, changed context, dialogs, overlays, an emulator in the wrong state, and so on.
Low - a bug in the application under test.
Very low - a defect in custom controllers or in Kaspresso/Espresso at the framework source level.

In summary, these are the RCA agent's priorities, tuned for heavy UI autotests:

Evidence layers: tier-1 → tier-2
- Tier-1: right after a red build, only what gives the picture without a "megadump": the KASPRESSO tag, complete in meaning, plus a bounded pass over TestRunner-style logs.
- Tier-2 (ui_dump and a deep logcat_read). Engage it if:
  - Confidence is low after tier-1 (logs) and code review, confidence < 0.8: the evidence base needs strengthening.
  - There are no facts about the tree: tier-1 has neither a short piece of the hierarchy nor support from ui_dump, and a hypothesis about specific widgets (id, Recycler, overlay, "what is actually on screen") cannot be verified without it.
  - The TestRunner log is "silent" but the UI failure is obvious: all bounded TestRunner-style probes came back empty, yet KASPRESSO already shows a typical UI failure. Stop spinning grep and take the tree instead (often ui_dump first).
logcat_read discipline
No raw buffer without a filter, and every call must use a narrow tag. Also forbidden:
- duplicating identical calls;
- playing a lottery of a dozen random greps.
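The same discipline can also be enforced on the host side rather than only in the prompt. A minimal sketch, assuming the tool call carries `device`, `grep` and `lines` arguments (the names are taken from the prompt wording, not from a verified tool schema); `LogcatCallGuard` is a hypothetical helper:

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical guard: rejects unfiltered and duplicate logcat_read calls. */
final class LogcatCallGuard {
    private final Set<String> seen = new HashSet<>();

    /** Returns null when the call is allowed, otherwise a short refusal message for the model. */
    String check(String device, String grep, int lines) {
        if (grep == null || grep.isBlank()) {
            return "logcat_read rejected: a grep/tag filter is required";
        }
        String key = device + "|" + grep + "|" + lines;
        if (!seen.add(key)) {
            return "logcat_read rejected: identical arguments were already used in this pass";
        }
        return null;
    }

    /** Call after logcat_clear and a new test run, when repeating a read becomes legitimate. */
    void reset() {
        seen.clear();
    }
}
```

Returning the refusal as tool output, instead of throwing, lets the model read the reason and narrow its next call.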
Hypothesis priorities (priors), not just tool priorities
Order of reasoning:
- scenario/matchers vs the real UI, and PO/helpers: high;
- application/data/overlays: medium;
- a Kaspresso/Espresso bug: nearly the last resort, and only with hard evidence.
Confidence gate (threshold 0.8)
After tier-1 + code: if confidence is sufficient, RCA may finish without the full tree. If not, tier-2 is mandatory before the final conclusion.
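The escalation rule above boils down to a small predicate. The sketch below is illustrative only: in the article the decision is made by the model from the prompt, and `Tier2Gate` with its parameter names is a hypothetical restatement of the three triggers.

```java
/** Hypothetical restatement of the tier-2 triggers as a pure function. */
final class Tier2Gate {
    static final double CONFIDENCE_BAR = 0.8;

    static boolean needsTier2(double confidence,
                              boolean hasHierarchyEvidence,
                              boolean hypothesisDependsOnWidgets,
                              boolean runnerSilentButUiFailureVisible) {
        // Trigger 1: confidence after tier-1 + code is below the bar.
        boolean lowConfidence = confidence < CONFIDENCE_BAR;
        // Trigger 2: the hypothesis is about concrete widgets, but no tree facts were collected.
        boolean missingTree = hypothesisDependsOnWidgets && !hasHierarchyEvidence;
        // Trigger 3: all TestRunner probes were empty while KASPRESSO shows a UI failure.
        return lowConfidence || missingTree || runnerSilentButUiFailureVisible;
    }
}
```

For example, `needsTier2(0.92, false, false, false)` is false: with high confidence and a hypothesis that does not hinge on specific widgets, RCA may finish on tier-1 evidence alone.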

Add a ToolProvider that contains all the MCP tools:

private static final List<String> TOOLS_DEVICE_AND_BUILD = List.of(
        "device_list", "gradle_task", "logcat_read", "logcat_clear", "ui_dump");
private static final List<String> TOOLS_DEPENDENCY_INSIGHT = List.of(
        "analyze_dependencies", "get_dependency_graph", "get_file_metadata", "get_architectural_score");
private static final List<String> TOOLS_WORKSPACE_IO = List.of(
        "read_text_file", "read_multiple_files", "search_files", "list_directory",
        "directory_tree", "get_file_info", "list_allowed_directories", "edit_file", "write_file");
Set<String> toolWhitelist = Stream.of(TOOLS_DEVICE_AND_BUILD, TOOLS_DEPENDENCY_INSIGHT, TOOLS_WORKSPACE_IO)
        .flatMap(List::stream)
        .collect(Collectors.toCollection(LinkedHashSet::new));
ToolProvider toolProvider = McpToolProvider.builder()
        .mcpClients(List.of(androidMcp, dependencyMcp, filesystemMcp))
        .filterToolNames(toolWhitelist.toArray(String[]::new))
        .build();

Note that I use only filtered lists rather than the full set of tools the MCP servers expose. Here is the list for the Android tooling:
List.of("device_list", "gradle_task", "logcat_read", "logcat_clear", "ui_dump");
This helps avoid the use of potentially dangerous tools.
Building androidMcp:

McpClient androidMcp = new DeviceMcpClient(
        new DefaultMcpClient.Builder()
                .transport(new StdioMcpTransport.Builder()
                        .command(List.of(
                                "npx",                      // Node package runner: starts the server without a global install
                                "-y",                       // non-interactively accept downloading the package on first run
                                "android-pilot-mcp"         // name of the published MCP server (stdio)
                        ))
                        .build())
                .build());

Here the command is a Node-ecosystem launch chain (npx + package) rather than "one executable + roots". This pins a specific server (android-pilot-mcp) and passes it no extra arguments.
Building dependencyMcp:

String dependencyMcpEntry = "<absolute-path>/dependency-mcp/dist/index.mjs"; // entry point of the stdio server
McpClient dependencyMcp = new DefaultMcpClient.Builder()
        .transport(new StdioMcpTransport.Builder()
                .command(List.of(
                        "node",                 // runtime: execute the MCP script locally
                        dependencyMcpEntry      // single argument: the server file (the rest of the configuration lives inside it / in env)
                ))
                .build())
        .build();

For dependency-mcp we not only hardcode the tool list (it is already in the whitelist above) but also narrow the launch surface: one node process and one specific index.mjs, so that no arbitrary scripts from $PATH get invoked.
Building filesystemMcp:

McpClient filesystemMcp = new DefaultMcpClient.Builder()
        .transport(new StdioMcpTransport.Builder()
                .command(List.of(
                        "mcp-server-filesystem",
                        "<path-to-android-sandbox>"   // the only allowed root on disk
                ))
                .build())
        .build();

In command, the MCP server to launch comes first; everything after it is a list of absolute directories inside which the server may read and search (and, when the corresponding tools are enabled, modify files). The shorter the list of roots, the narrower the disk access.
Creating the RCA agent

RcaAgent rcaAgent = AiServices.builder(RcaAgent.class)
       .chatModel(model)
       .chatMemory(memory)
       .toolProvider(toolProvider)
       .build();

Next we need to write the prompt for the RCA agent. I wrote all prompts in English, since it is still the more native language for the model. I quote them in full because a great deal depends on the system prompts.
System prompt:

@SystemMessage("""
       You are an AI assistant for Android test failure investigation.
       Use tools to collect real evidence only. Never invent tool outputs.
       Work by strategy: simple-to-deep recursive RCA.
       Gradle vs Logcat:
       - `gradle_task` output is only Gradle/InstrumentationRunnerSummary: short failure lines, HTML report path—not full Kaspresso session logs.
       - **`KASPRESSO` is your primary narrative for HOW the scenario actually ran**: BEFORE/AFTER sections, DSL steps, flakiness/retry chatter—use **maximum `lines`/`logcat_read` allows** so early steps are not dropped.
       - **`TestRunner`** holds the terminal failure bundle; **never paste the entire View Hierarchy in the first pass**—it wastes context. Use **tiered exposure** below.
       - After `logcat_clear` → `gradle_task`, the ring buffer holds **only this run**.
       - **`confidence` (0..1) is the coordinator for the whole RCA pipeline**: set it after tier-1 evidence + code; escalate tools only when confidence is still below your bar (use **0.8** unless evidence forces lower).
       - Future extensions (not wired here yet): screenshots, TestOps step metadata—you may assume they could be added later; do not hallucinate having them now.
       Root-cause priors (empirical—investigate in **this order** before deep framework theories):
       - **High**: The **test DSL / assertion is out of sync with real UI** (stale step after scroll/navigation, wrong list item/index, matcher assumes on-screen state that **KASPRESSO + TestRunner lite hierarchy summary** contradicts). Tie every hypothesis to the **failing line** and what the hierarchy actually shows.
       - **High**: **Page Object / screen / shared test helpers** wrong or regressed—especially if related tests or common screens were edited recently. Use filesystem + dependency chain; compare matchers/ids to layout before blaming frameworks.
       - **Medium**: **App-under-test** defect, **data or ordering** assumptions, **timing/flake**, **dialogs/overlays**, **emulator or process state** (wrong foreground)—still not Kaspresso core by default.
       - **Very low**: **Kaspresso / Espresso / Android test framework internals broken at source**—last resort only. Do **not** anchor RCA on long Kaspresso stack traces until the high-prior layers are ruled out with cited evidence. If you still claim a framework bug: name **exact file+line**, give a **minimal repro**, and explain why test/PO/app layers cannot explain the failure.
       - **Caveat**: low prior ≠ impossible—list what evidence would **upgrade** a framework hypothesis (e.g. reproducible across minimal projects, wrong behavior with trivial `onView`).
       Context / `logcat_read` discipline (critical):
       - **Never** call `logcat_read` on the **whole** device buffer with only a big `lines` count—unfiltered logs pull megabytes of unrelated noise (media, `UiDevice`, package uninstall, kernel) and exhaust the model context; the host may truncate long results.
       - **Every** `logcat_read` MUST narrow output: set `grep`/filter (whichever the tool exposes) to **one** purpose—typically **`KASPRESSO`**, **`TestRunner`**, **`AndroidRuntime`**, **`FATAL EXCEPTION`**, **`Process:`**, or **`com.<app-under-test>`** substrings. Separate calls per concern beat one giant dump.
       - **Do not repeat** the same `logcat_read` arguments (same device + grep + lines) in one RCA pass unless you cleared logcat and re-ran tests.
       - **`grep` semantics differ by MCP**: a substring that appears inside your **KASPRESSO** tail may still yield **empty** `logcat_read` when you pass that substring as `grep` (tag-only matching, case, or field filters). **Do not** burn the budget on 5–10 random single-word greps hoping one sticks.
       - **Bounded TestRunner probes (max 3 reads after KASPRESSO, before tier-2)**: try, in order, filters likely to match **your** buffer format—e.g. `TestRunner`, then `TestRunner:` (colon after tag), then `AndroidJUnitRunner` or `InstrumentationResult`, then the **Gradle failing class simple name** (e.g. `RecyclerTest`). Use **enough `lines`** on the first hit (e.g. **120–250**) so `failed:`, exception headline, and `View Hierarchy:` head can land in one read. If **all probes are empty**, stop probing; tier-1 **TestRunner** snippet may be **absent** through no fault of the model.
       - **Terminal failure text already in tier-1 KASPRESSO** (e.g. `NoMatchingViewException`, `failed:`, matcher lines): you **may** quote those verbatim into block **(B)** and state provenance **`from KASPRESSO`**. You still need a **`Hierarchy summary`**; if hierarchy lines are **not** present in KASPRESSO, **do not** pretend you saw the tree—set confidence lower and use **tier-2** for real widgets.
       - Espresso **View Hierarchy** in logcat can be enormous: tier-1 report body keeps **1–3 hierarchy lines** plus your summary; tier-2 delivers **full tree / guts** (`ui_dump` or a larger **still-filtered** logcat read).
       Logcat/UI evidence tiers (failed instrumentation tests):
       **Tier 1 (always first after red build)** — minimal transcript, same evidence goals:
       - `logcat_read` **KASPRESSO**: max lines supported (full session trace since `logcat_clear`).
       - `logcat_read` **TestRunner** (lite): **one** read with **small–medium** `lines` (e.g. **80–200**) and the **first** filter from the bounded probe list that returns non-empty; goal—`failed:`, terminal exception headline, `View Hierarchy:` + **first 1–3** hierarchy lines. If every probe is empty, **skip** duplicate greps; proceed with KASPRESSO-only for (B) exception lines and lower tier-1 confidence.
       - In Observed Failure, **`Terminal failure (TestRunner) — lite`**: quoted snippets (from TestRunner read **or** from KASPRESSO if that is where they appeared) + mandatory **`Hierarchy summary`** (what ids/resources you actually saw). If you only have KASPRESSO without hierarchy lines, say so explicitly.
       - From tier-1 + Gradle + code, compute **Root Cause Hypothesis confidence**.
       **Tier 2 (“tree and guts”—only when needed)** — if **confidence < 0.8** after tier-1 + code **OR** hierarchy/widget proof is missing **OR** TestRunner probes were all empty while the failure is clearly UI/instrumentation:
       - Prefer **`ui_dump`** once for the **real on-screen tree** (Recycler children, overlays, dialogs)—usually better than a logcat megadump.
       - Else **one** deeper `logcat_read`: **same** best filter as tier-1 if any worked, with **higher `lines`** (e.g. **400–1200**), still **never** unfiltered max-lines-on-whole-buffer.
       - If tier-1 had **no** TestRunner lines but KASPRESSO proved UI failure, tier-2 **`ui_dump`** is the default next step—not more blind greps.
       - Re-evaluate confidence; repeat tier-2 only if justified (avoid infinite loops).
       - If **`KASPRESSO` is empty**, note it and lower tier-1 confidence; rely more on lite **TestRunner** + code unless tier-2 helps.
       - Before every connected `gradle_task`, call **`logcat_clear`** on that device.
       Proceed to fix-phase planning without tier-2 when **confidence ≥ 0.8** early and the hypothesis is evidenced.
       Gradle outcome (always first after gradle_task):
       - Read the gradle_task tool result text and locate the final Gradle status line.
       - Green build: the result contains a line matching `BUILD SUCCESSFUL` (e.g. `BUILD SUCCESSFUL in 5s`). Treat as no test failure for this run—do not invent failures.
       - Red build: the result contains `BUILD FAILED` or task failure with failing tests—then run full failure RCA below.
       - Quote the exact status line you relied on (verbatim) in Observed Failure.
       Mandatory flow:
       1) Build TODO checklist.
       2) Check emulator availability (`device_list`).
       3) Clear logcat (`logcat_clear`) on the device/emulator that will run tests—immediately before `gradle_task`.
       4) Run test task (`gradle_task`).
       5) If failed, parse failure artifacts from gradle output.
       6) Run RCA:
          - simple diagnosis first (config/network/http/api/environment/timeout);
          - if not enough, deep RCA:
            test -> steps DSL -> page/screen objects -> framework wrappers -> Espresso/UIAutomator -> app code/resources.
          - on red build / failed tests only, you MUST do:
            a) **Tier-1 logcat** as in "Logcat/UI evidence tiers" (`KASPRESSO` max lines; bounded TestRunner probes + lite snippet + hierarchy summary; cite terminal lines from KASPRESSO when TestRunner grep is empty). If **confidence < 0.8** or hierarchy proof is missing, run **Tier-2** (`ui_dump` preferred, else larger filtered `logcat_read`). Use device from `device_list` where required;
            b) at least 1 dependency-mcp call;
            c) at least 2 filesystem-mcp calls (find file + read file).
          - on green build (`BUILD SUCCESSFUL`), skip deep RCA tooling; set T2–T7 to N/A or DONE with reason "no failure".
       7) Update TODO after each step.
       8) If confidence < 0.8 after tier-1, apply tier-2 tools (then code/deps again if needed). Stop escalating when confidence ≥ 0.8 or tools cannot add signal.
       Rules:
       - Use only available tools.
       - Do not use unavailable tools.
       - Do not stop at first hypothesis.
       - This is native Android UI testing; root cause may be in app code.
       - Final report with "RCA pending" is forbidden for failed test.
       Required failure artifacts:
       - report path;
       - failed tests list;
       - short reason from FAILURE/What went wrong;
       - for instrumentation failures: Observed Failure blocks:
          (A) **`Execution trace (KASPRESSO)`** — from tier-1 max-line read; quote chronologically ordered snippets that prove which steps/screens ran—do not replace with Gradle tail-only;
          (B) **`Terminal failure (TestRunner) — lite`** — `failed:` line, exception line, hierarchy **first 1–3 lines only** (from TestRunner read **or** from tier-1 KASPRESSO if repeated `grep` calls are empty), **`Hierarchy summary`** you author;
          (C) **`Terminal failure — deep` (tier-2 only)** — fuller hierarchy / `ui_dump` excerpt—only attach if confidence remained < 0.8 after lite + code evidence.
       - full RCA report must be in English.
       TODO template:
       - [ ] T1 Parse failure artifacts from gradle output (class, method, error, report path).
       - [ ] T2 Locate failed test file via filesystem search.
       - [ ] T3 Read failed test code and identify exact failing assertion/step.
       - [ ] T4 Build dependency/call chain via dependency-mcp.
       - [ ] T5 Read top related files from chain (page objects, wrappers, framework, app code).
       - [ ] T6 Validate primary root-cause hypothesis against observed evidence.
       - [ ] T7 Produce fix plan and verification plan.
       TODO rules:
       - On red UI failures, T1 closes after tier-**1** full `KASPRESSO` + bounded TestRunner probes / lite (or KASPRESSO-sourced terminal lines when probes empty). Use tier-**2** when confidence dictates or hierarchy is missing. If `KASPRESSO` empty, declare it and compensate with tier-2 sooner if needed.
       Confidence gate:
       - After tier-1 + T2–T3 (code locate + read failing test/hotspots), reassess Root Cause Hypothesis confidence.
       - **≥ 0.8**: you may finalize RCA and fix plan **without** full hierarchy/`ui_dump` unless a specific gap remains—state briefly why certainty is sufficient.
       - **< 0.8**: you **must** run tier-**2** tooling before concluding T7.
       - T3/T4/T5 cannot be DONE without tool calls when the build is red; on green build, mark them N/A.
       - If T6 is not evidence-backed, keep PENDING and continue analysis.
       - On red build: final answer allowed only if T1-T5 are DONE.
       - On green build: T1 DONE after quoting `BUILD SUCCESSFUL` line; remaining T2-T7 N/A.
       - If confidence >= 0.9 and response includes concrete code fragment + concrete fix plan, set T6/T7 DONE (red build only).
       Final answer sections:
       1) TODO Progress
       2) Observed Failure
       3) Dependency/Call Chain
       4) Root Cause Hypothesis (confidence 0..1) — state which **prior tier** (test/UI sync, PO/shared code, app/emulator, framework) you used as primary and why evidence supports it
       5) Alternative Hypotheses
       6) Concrete Fix Plan
       7) What to verify next (if confidence < 0.8)
       8) Evidence tier (1-only / tier-2 logcat / tier-2 ui_dump / tier-2 both)—one line + trigger
       """)

User prompt:

@UserMessage("""
       Check emulators via device_list, then logcat_clear on the test device, then run gradle_task:
       - projectDir: {{projectDir}}
       - task: {{task}}
       - args (JSON array of Gradle JVM args, append exactly as given): {{gradleArgsJson}}
       If gradleArgsJson is not "[]", you MUST pass the same strings in gradle_task "args".
       If gradle output shows `BUILD SUCCESSFUL`, report success and stop (no failure RCA).
       If build is red / tests failed:
       - Tier-1: logcat_read **KASPRESSO** (max lines); up to **3** bounded `logcat_read` probes for TestRunner-style tags (`TestRunner`, `TestRunner:`, `AndroidJUnitRunner`, failing class name)—**no** identical-arg repeats; if probes empty, use exception/matcher lines from KASPRESSO for (B) and lower confidence.
       - If after code you are not confident (**< 0.8**) or you lack hierarchy proof: Tier-2—**ui_dump** first, else larger filtered logcat; then revise confidence.
       - extract failed test class/method from logs;
       - find and read this test file via filesystem MCP;
       - run dependency analysis for project and failed test file;
       - produce RCA in required format—including **confidence** and which evidence tier justified it.
       """)

These were all annotations on the method:

String execute(@V("projectDir") String projectDir, @V("task") String task, @V("gradleArgsJson") String gradleArgsJson);

Creating the FIX agent

FixAgent fixAgent = AiServices.builder(FixAgent.class)
       .chatModel(model)
       .chatMemory(memory) // the same ChatMemory instance that the RCA agent uses
       .toolProvider(toolProvider)
       .build();

For simplicity we attach the same ToolProvider to both agents, so both see the same set of MCP tools. In production it makes more sense to split tools by role (RCA and Fix) and give each agent its own slice or its own ToolProvider.
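One possible way to split the whitelist by role is sketched below. The tool names are the ones already listed in the article. The split itself (read-only tools for RCA, write tools only for Fix) and the `ToolsByRole` class are assumptions for illustration, not the article's setup.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical role-based whitelists built from the tool names used in the article. */
final class ToolsByRole {
    static final List<String> DEVICE_AND_BUILD = List.of(
            "device_list", "gradle_task", "logcat_read", "logcat_clear", "ui_dump");
    static final List<String> DEPENDENCY_INSIGHT = List.of(
            "analyze_dependencies", "get_dependency_graph", "get_file_metadata", "get_architectural_score");
    static final List<String> WORKSPACE_READ = List.of(
            "read_text_file", "read_multiple_files", "search_files", "list_directory",
            "directory_tree", "get_file_info", "list_allowed_directories");
    static final List<String> WORKSPACE_WRITE = List.of("edit_file", "write_file");

    /** RCA only investigates, so it gets no tools that modify files. */
    static Set<String> rcaTools() {
        return union(DEVICE_AND_BUILD, DEPENDENCY_INSIGHT, WORKSPACE_READ);
    }

    /** Fix needs everything RCA has plus the write tools. */
    static Set<String> fixTools() {
        return union(DEVICE_AND_BUILD, DEPENDENCY_INSIGHT, WORKSPACE_READ, WORKSPACE_WRITE);
    }

    @SafeVarargs
    private static Set<String> union(List<String>... groups) {
        Set<String> all = new LinkedHashSet<>();
        for (List<String> group : groups) {
            all.addAll(group);
        }
        return all;
    }
}
```

Each set can then be passed to its own provider the same way the article does it for the common whitelist, via `filterToolNames(...toArray(String[]::new))`.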
The dialogue memory is shared: .chatMemory(...) points to the same object, so the first agent's context is available to the second. For now we do no separate processing of the shared memory when moving from RCA to Fix (no compression, no summarization, no dropping of old tool results). This will most likely be needed later to stay within the model's context window limit and keep the history from bloating.
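A rough idea of what such processing could look like: keep the most recent tool results in full and shorten the older ones. The sketch deliberately uses its own hypothetical `Entry` record instead of Langchain4j message types, so it only illustrates the compaction rule, not the integration with ChatMemory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical compaction step applied to the history between the RCA and Fix phases. */
final class MemoryCompactor {

    /** Simplified stand-in for a chat message: only tool results are candidates for shortening. */
    record Entry(boolean toolResult, String text) {}

    static List<Entry> compact(List<Entry> history, int keepLastToolResults, int stubChars) {
        List<Entry> out = new ArrayList<>(history.size());
        int toolResultsSeen = 0;
        // Walk from newest to oldest so that the newest tool results stay intact.
        for (int i = history.size() - 1; i >= 0; i--) {
            Entry e = history.get(i);
            if (e.toolResult()) {
                toolResultsSeen++;
                if (toolResultsSeen > keepLastToolResults && e.text().length() > stubChars) {
                    e = new Entry(true, e.text().substring(0, stubChars) + " [older tool result shortened]");
                }
            }
            out.add(e);
        }
        Collections.reverse(out); // restore chronological order
        return out;
    }
}
```

The RCA report itself is an ordinary message here and is never shortened, which matches the Fix prompt's reliance on it as the source of truth.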
System and user prompts for the FIX agent:

@SystemMessage("""
       You are an AI assistant for automatic Android test fixing.
       Use the latest RCA report from chat memory as source of truth.
       Never invent tool outputs.
       Before editing: if the latest gradle_task output in memory shows a line `BUILD SUCCESSFUL`, do not change code; report that no fix is needed.
       Strict fixing policy:
       1) Pick minimal safe code change.
       2) Edit only relevant files.
       3) Before verification `gradle_task`: call **logcat_clear** on the device, then run `gradle_task` with the SAME projectDir, task and args as the failing run (see user message placeholders).
       4) If verification still shows failure, collect diagnostics like RCA: **`logcat_read`** only with **grep/filter**—never unfiltered huge buffers; KASPRESSO=max useful lines; bounded TestRunner-style probes (no duplicate args, no grep lottery); terminal lines may come from KASPRESSO if tag-grep is empty; tier-2 prefers **`ui_dump`** or one larger filtered read—then reconcile with RCA.
       5) If still failing after analysis, provide next iteration plan.
       Safety rules:
       - Do not make broad refactors.
       - Keep changes focused on proven root cause.
       - If confidence in fix is low, explain uncertainty and stop.
       - Follow the same **root-cause priors** as RCA: prefer fixes to **test steps**, **Page Objects**, **shared helpers**, or **app code**; do **not** patch Kaspresso/Espresso framework sources unless the RCA already justified that path with hard evidence.
       Final answer in English:
       - Fix Summary
       - Files Changed
       - Verification Result
       - Remaining Risks
       - Next Iteration Plan (if needed)
       """)
@UserMessage("""
       Apply automatic fix based on previous RCA context and verify:
       - projectDir: {{projectDir}}
       - task: {{task}}
       - args (JSON array, same as RCA run): {{gradleArgsJson}}
       Before verification: logcat_clear, then gradle_task.
       If gradleArgsJson is not "[]", reuse the exact args in verification gradle_task calls.
       """)
String execute(@V("projectDir") String projectDir, @V("task") String task, @V("gradleArgsJson") String gradleArgsJson);

Compressing the data obtained via MCP tools to reduce context usage
As you can see, the system prompts have grown considerably. Right now we rely 100% on the MCP tools through which the model requests new data. The problem is that a lot of data has to be passed to the model on every call. While an autotest runs, the test framework generates a huge amount of output. On top of that, ChatMemory keeps the results of previous calls, so the prompt sent to the model grows with every step.
We will wrap the logcat_read MCP tool so that, instead of relying on the model, we trim the incoming data directly in code.
Below is an example class in which we override the executeTool method we are interested in:

// Wrapper around an MCP client that caps the size of logcat_read results.
// Note: the remaining McpClient methods must also be implemented by delegating
// to `delegate`; they are omitted here for brevity.
public final class LogcatReadClient implements McpClient {
   static final int DEFAULT_MAX_LOGCAT_CHARS = 32768;
   private final McpClient delegate;
   private final int maxLogcatChars;
   public LogcatReadClient(McpClient delegate) {
       this(delegate, DEFAULT_MAX_LOGCAT_CHARS);
   }
   public LogcatReadClient(McpClient delegate, int maxLogcatChars) {
       this.delegate = delegate;
       this.maxLogcatChars = maxLogcatChars;
   }
   @Override
   public ToolExecutionResult executeTool(ToolExecutionRequest request) {
       ToolExecutionResult r = delegate.executeTool(request);
       // Only logcat_read output is trimmed; all other tools pass through unchanged.
       if (!"logcat_read".equals(request.name())) {
           return r;
       }
       String text = r.resultText();
       if (text == null || text.length() <= maxLogcatChars) {
           return r;
       }
       // Keep the head of the log and append a marker that tells the model how to narrow the next read.
       String truncated = text.substring(0, maxLogcatChars)
               + "\n\n<<< LOGCAT_TRUNCATED_BY_HOST retainedChars="
               + maxLogcatChars
               + " totalChars="
               + text.length()
               + " >>>\n"
               + "Narrow the next read: use tool grep/tag for KASPRESSO, TestRunner, AndroidRuntime, "
               + "or your app package—do not request unfiltered buffers.\n";
       return ToolExecutionResult.builder()
               .isError(r.isError())
               .resultText(truncated)
               .result(truncated)
               .build();
   }
}
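The wrapper above keeps only the beginning of an oversized log. Since the terminal failure often sits near the end of a log, a variant that keeps both the head and the tail may be worth trying. The sketch below is an alternative idea rather than what the article's pipeline does; `LogTruncation.headAndTail` is a hypothetical helper with no Langchain4j dependencies.

```java
/** Hypothetical alternative truncation: keep the first and the last part of a long log. */
final class LogTruncation {

    static String headAndTail(String text, int maxChars) {
        if (text == null || text.length() <= maxChars) {
            return text; // nothing to trim
        }
        int head = maxChars / 2;        // characters kept from the start
        int tail = maxChars - head;     // characters kept from the end
        int dropped = text.length() - maxChars;
        return text.substring(0, head)
                + "\n<<< " + dropped + " chars omitted >>>\n"
                + text.substring(text.length() - tail);
    }
}
```

The retained payload is still at most `maxChars` characters; only the short marker line is added on top, so the budget stays predictable.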

Now the agent acts step by step. First it pulls only the KASPRESSO tag and the autotest exception itself. Then, if the agent decides that the information is insufficient, it pulls more detailed tags at the TestRunner level.
Here are examples of the trimmed logs.
Exception:

KASPRESSO: Failed to interact with view matching: (an instance of android.widget.TextView and view.getText() with or without transformation to match: is "0") because of NoMatchingViewException
KASPRESSO: All attempts to interact for 10000 ms totally failed because of NoMatchingViewException
KASPRESSO: ------... [+6970 chars]

Test steps:

KASPRESSO: BEFORE TEST SECTION
KASPRESSO: ---------------------------------------------------------------------------
KASPRESSO: ---------------------------------------------------------------------------
KASPRESSO: TEST SECTION
KASPRESSO: ---------------------------------------------------------------------------
KASPRESSO: scroll RecyclerView to position: 15 on RecyclerView(id=recycler;)
ViewInteraction: Checking 'com.kaspersky.kaspresso.proxy.ViewAssertionProxy@dc0f451' assertion on view (an instance of android.widget.TextView and view.getText() with or without transformation to match: is "14")
KASPRESSO: Check (view has effective visibility <VISIBLE> and view.getGlobalVisibleRect() to return non-empty rectangle) on TextView(id=textView;text=14;)
KASPRESSO: Scroll RecyclerView to the bottom on RecyclerView(id=recycler;)
ViewInteraction: Checking 'com.kaspersky.kaspresso.proxy.ViewAssertionProxy@593a8d6' assertion on view (an instance of android.widget.TextView and view.getText() with or without transformation to match: is "0")
:... [+9384 chars]

Setting up an Android project for the experiments
I simply took the Kaspresso samples project. It has good examples for various areas: scrolls, matchers, Kakao Page Objects, and so on.
Download and build the project, then start an emulator. The pipeline simply calls into the Android project on the local machine and runs the test on the already running emulator.

rcaAgent.execute(
        "/Kaspresso-master",
        ":samples:kaspresso-sample:connectedDebugAndroidTest",
        "[\"-Pandroid.testInstrumentationRunnerArguments.class=com.kaspersky.kaspressample.recycler.RecyclerTest#test\"]");

Experiment No. 1 - An uninformative NoMatchingViewException
As the LLM I used deepseek-v4-flash via the API. Note that we use a bare model that has no internet access and cannot look up the autotest code on GitHub. The model has to work things out on its own from locally gathered data.
I broke the autotest right at the facade (the code of the test steps in the test class). This is the simplest variant of the most common UI error: NoMatchingViewException.
Was:

recycler { scrollToEnd() }
element29 { isDisplayed() }

Became:

recycler { scrollToEnd() }
element0 { isDisplayed() }

What happens on screen:
The list is scrolled to the end, but the test expects the first element instead of the last. The first element is therefore not on screen and NoMatchingViewException is thrown.
NoMatchingViewException by itself is hard to track down, because an element can be missing from the screen for any number of reasons. It is not as simple as a compilation error.
RCA agent - analysis and verdict:

### 4) Root Cause Hypothesis (confidence: 0.92)
**Primary prior tier: Test DSL / assertion out of sync with real UI (HIGH)**
The test code at line 42-43 of `RecyclerTest.kt`:
```kotlin
recycler { scrollToEnd() }
element0 { isDisplayed() }
```
After `scrollToEnd()`, the RecyclerView is scrolled to the **bottom** — showing the last items (positions ~27-29, with text "27", "28", "29"). The test then asserts that `element0` (a `TextView` with text `"0"`) is displayed. But element "0" is at the **top** of the list (position 0), far off-screen after scrolling to the end. The `NoMatchingViewException` is correct and expected: no view with text "0" exists in the visible view hierarchy.
The KASPRESSO trace confirms:
1. `scrollTo(15)` → success
2. `element14 { isDisplayed() }` → success (text "14" is visible near position 15)
3. `scrollToEnd()` → success (scroll action completed)
4. `element0 { isDisplayed() }` → **fails** — text "0" is not on screen after scrolling to end
The adapter has 30 items (positions 0-29), each displaying its position as text. After `scrollToEnd()`, only the last few items are visible. The test appears to have a logical error: it expects `element0` to be visible after scrolling to the **end**, which is impossible unless the list wraps around or the test intends to scroll back to 0 (which is what the third block does: `scrollTo(0)` then `element0 { isDisplayed() }`).
The second block (`scrollToEnd()` + `element0 { isDisplayed() }`) is either:
- A **bug in the test**: the assertion should be `element29 { isDisplayed() }` (the last element), not `element0`
- Or the test was written expecting `scrollToEnd()` to have different semantics
Given that the third block correctly does `scrollTo(0)` + `element0 { isDisplayed() }`, the second block's `element0` is almost certainly a copy-paste error — it should be `element29`.
---
### 5) Alternative Hypotheses
1. **Kaspresso `scrollToEnd()` regression** (confidence: 0.03): If `scrollToEnd()` were supposed to auto-scroll back to find the matching view (via `AutoScrollProviderImpl`), the KASPRESSO trace shows auto-scroll *was* engaged (the stack trace goes through `AutoScrollProviderImpl.withAutoScroll`), but it couldn't find the view because "0" is at position 0 and auto-scroll typically scrolls forward, not backward to the beginning. This is expected behavior, not a framework bug.
2. **RecyclerView not rendering items** (confidence: 0.02): Ruled out — `element14` was successfully found and checked after `scrollTo(15)`, proving the RecyclerView renders items correctly.
3. **Emulator/timing issue** (confidence: 0.02): The test retried for 10 seconds (flaky safety), so timing is not the issue. The view simply isn't there.
4. **Dialog/overlay blocking** (confidence: 0.01): No evidence of system dialogs in the KASPRESSO trace; `SystemDialogSafetyProviderImpl` was in the call chain but didn't report any dialog.
---
### 6) Concrete Fix Plan
**Fix the test assertion in `RecyclerTest.kt`**, line 43. Change:
```kotlin
recycler { scrollToEnd() }
element0 { isDisplayed() }       // BUG: element0 is at position 0, not visible at end
```
To:
```kotlin
recycler { scrollToEnd() }
element29 { isDisplayed() }      // FIX: element29 is at position 29, visible at end
```
The `element29` KTextView is already defined in `RecyclerScreen.kt`:
```kotlin
val element29 = KTextView { withText("29") }
```

FIX agent - test fix and report to the user:

Tool name: edit_file
Arguments: {"path": "/Kaspresso-master/samples/kaspresso-sample/src/androidTest/kotlin/com/kaspersky/kaspressample/recycler/RecyclerTest.kt", "edits": [{"oldText": "            recycler { scrollToEnd() }\n            element0 { isDisplayed() }", "newText": "            recycler { scrollToEnd() }\n            element29 { isDisplayed() }"}]}
✅ Result: {"content":"```diff\nIndex: /Kaspresso-master/samples/kaspresso-sample/src/androidTest/kotlin/com/kaspersky/kaspressample/recycler/RecyclerTest.kt\n===================================================================\n--- /Kaspresso-master/samples/kaspresso-sample/src/androidTest/kotlin/com/kaspersky/kaspressample/recycler/RecyclerTest.kt\toriginal\n+++ /Kaspresso-master/samples/kaspresso-sample/src/androidTest/kotlin/com/kaspersky/kaspressample/recycler/RecyclerTest.kt\tmodified\n@@ -34,9 +34,9 @@\n             recycler { scrollTo(15) }\n             element14 { isDisplayed() }\n \n             recycler { scrollToEnd() }\n-            element0 { isDisplayed() }\n+            element29 { isDisplayed() }\n \n             recycler { scrollTo(0) }\n             element0 { isDisplayed() }\n         }\n```\n\n"}

FIX agent - run after the fix:

Tool name: gradle_task
Arguments: {"projectDir": "/Kaspresso-master", "task": ":samples:kaspresso-sample:connectedDebugAndroidTest", "args": ["-Pandroid.testInstrumentationRunnerArguments.class=com.kaspersky.kaspressample.recycler.RecyclerTest#test"]}
✅ Result: Task ':samples:kaspresso-sample:connectedDebugAndroidTest' completed.
BUILD SUCCESSFUL in 6s
159 actionable tasks: 9 executed, 1 from cache, 149 up-to-date

FIX agent - final report to the user:

## Fix Summary
**Root cause:** The test `RecyclerTest#test` had a logical error in its second assertion block. After `scrollToEnd()`, the RecyclerView displays the last items (positions ~27-29). The test incorrectly asserted that `element0` (text "0", at position 0) was displayed, but position 0 is at the top of the list and off-screen after scrolling to the end. The correct assertion is `element29` (text "29", at position 29), which is the last element and visible after `scrollToEnd()`.
**Fix:** Changed line 43 of `RecyclerTest.kt` from `element0 { isDisplayed() }` to `element29 { isDisplayed() }`.
## Files Changed
- `samples/kaspresso-sample/src/androidTest/kotlin/com/kaspersky/kaspressample/recycler/RecyclerTest.kt` — one line changed (`element0` → `element29` in the second block)
## Verification Result
- **BUILD SUCCESSFUL** in 6s
- 1 test started, 1 test finished, no failures
- The fix is confirmed working
## Remaining Risks
- **None.** The fix is minimal and targeted. The `element29` KTextView was already defined in `RecyclerScreen.kt`. The test now correctly verifies: scroll to position 15 → check element 14; scroll to end → check element 29; scroll to 0 → check element 0.
## Next Iteration Plan
- No further iteration needed. The test passes.

Conclusions from the first experiment

With this setup, the RCA agent quickly and reliably traced the symptom to a high-priority cause: the test DSL and its expectations were out of sync with the real UI state (after scrolling to the end the tail of the list is on screen, while the test checks the head).
- The report shows reliance on the order of steps, the KASPRESSO trace, common-sense reasoning about cell positions and texts, and a careful review of low-confidence alternatives, without jumping to "a Kaspresso bug" without evidence.
- For this class of errors NoMatchingViewException is "mute" on its own, but combined with the scenario trace and the list context the agent made it interpretable.
The FIX agent, with a minimal change (one line in the test) and the same set of constraints, brought the task to a confirmed verification (BUILD SUCCESSFUL, green test). This shows that for an obvious logical error in the scenario a dedicated "super tool" for fixing is not required: a consistent RCA and an edit to the test code are enough.

Limits of applicability:
Experiment 1 is deliberately simple: the cause lies on the surface of the scenario, reproduction is deterministic, and the signal in the logs is readable.
It does not follow from this experiment that every NoMatchingViewException (duplicate matchers, overlays, a wrong Page Object, a wrong id in the layout) will be resolved with the same quality without tier-2 evidence (hierarchy/ui_dump), without comparison against the layout, and without the dependency chain.
Experiment 2 - An implicit error buried in the PageObject
The second experiment is a bit harder. This time it is an AmbiguousViewMatcherException, and I change the application code so that the RecyclerView shows two identical elements. The test side stays exactly as in the original Kaspresso project, that is, a raw locator by text, while there are now two matching views because of my changes to the app.
The original locator:

val element0 = KTextView { withText("0") }

Here the adapter is patched so that "0" appears twice on one screen:

fun bind(position: Int) {
    textView.text = when (position) {
        0, 5 -> "0"
        else -> position.toString()
    }
}

So the model has to go and look at what is actually inside the RecyclerView and then decide what to do about it.
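Why this change produces an ambiguity can be shown with a small model of the adapter. The Java sketch below mirrors the patched `bind()` logic and counts how many rows in the viewport carry a given text; it is not Espresso code, and the viewport size of 8 rows is an assumption chosen so that positions 0 and 5 are on screen together.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the patched adapter: finds the visible rows that a text matcher would hit. */
final class DuplicateTextModel {
    /** Mirrors the patched bind(): positions 0 and 5 both render "0". */
    static String textAt(int position) {
        return (position == 0 || position == 5) ? "0" : String.valueOf(position);
    }

    /** Positions in [first, first + visibleRows) whose rendered text equals the given text. */
    static List<Integer> matches(String text, int first, int visibleRows) {
        List<Integer> hits = new ArrayList<>();
        for (int position = first; position < first + visibleRows; position++) {
            if (textAt(position).equals(text)) {
                hits.add(position);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // 8 visible rows is an assumption made for illustration only.
        System.out.println("rows matching 0 at the top: " + matches("0", 0, 8));
        System.out.println("rows matching 14 around the middle: " + matches("14", 10, 8));
    }
}
```

A matcher that resolves to more than one view is what Espresso reports as AmbiguousViewMatcherException, while a text that occurs once, such as "14", stays unambiguous.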
RCA agent - test analysis and verdict:

**Hierarchy summary:** The RecyclerView contains 30 items. Positions 0 and 5 both have `text="0"` (by design in `RecyclerAdapter.bind()`). After `scrollTo(0)`, both items are visible on screen simultaneously (or at least both are present in the view hierarchy), causing Espresso's `AmbiguousViewMatcherException` when the test tries to assert `element0 { isDisplayed() }` using the matcher `withText("0")`.
## TODO Progress
- [x] T1 Parse failure artifacts from gradle output (class, method, error, report path).
- [x] T2 Locate failed test file via filesystem search.
- [x] T3 Read failed test code and identify exact failing assertion/step.
- [x] T4 Build dependency/call chain via dependency-mcp.
- [x] T5 Read top related files from chain (page objects, wrappers, framework, app code).
- [x] T6 Validate primary root-cause hypothesis against observed evidence.
- [x] T7 Produce fix plan and verification plan.
---
## Observed Failure
**Gradle status line (verbatim):** `BUILD FAILED in 8s`
**Failed test:** `com.kaspersky.kaspressample.recycler.RecyclerTest#test`
**Report path:** `file://Kaspresso-master/samples/kaspresso-sample/build/reports/androidTests/connected/debug/index.html`
### (A) Execution trace (KASPRESSO)
```
I KASPRESSO: scroll RecyclerView to position: 15 on RecyclerView(id=recycler;)
I KASPRESSO: Check ... on TextView(id=textView;text=14;)   ← PASSED
I KASPRESSO: Scroll RecyclerView to the bottom on RecyclerView(id=recycler;)
I KASPRESSO: Check ... on TextView(id=textView;text=29;)   ← PASSED
I KASPRESSO: scroll RecyclerView to position: 0 on RecyclerView(id=recycler;)
E KASPRESSO: Failed to interact with view matching: (an instance of android.widget.TextView and view.getText() with or without transformation to match: is "0") because of AmbiguousViewMatcherException
```
### (B) Terminal failure (TestRunner) — lite
**From KASPRESSO log** (TestRunner probes were empty; terminal lines embedded in KASPRESSO output):
```
AmbiguousViewMatcherException: '(an instance of android.widget.TextView and view.getText() with or without transformation to match: is "0")' matches 2 views in the hierarchy:
- [1] TextView{id=2131231242, res-name=textView, visibility=VISIBLE, width=680, height=54, ... text=0, ...}
```
**Hierarchy summary:** The Gradle output shows only one of the two matching views (the first one). Both are `TextView` with `id=textView` and `text="0"`. The KASPRESSO trace confirms the ambiguity: after scrolling to position 0, two `TextView` items with text "0" are visible on screen simultaneously.
### (C) Terminal failure — deep (tier-2)
`ui_dump` showed the launcher (app already torn down post-test), so no additional hierarchy evidence from the app. The root cause is fully explained by source code analysis below.
---
## Dependency/Call Chain
```
RecyclerTest.test()
  → TestCase.run { }
    → RecyclerScreen { }
      → recycler { scrollTo(0) }          // KRecyclerView scrolls to position 0
      → element0 { isDisplayed() }        // KTextView withText("0") — AMBIGUOUS!
        → KakaoViewInterceptor.interceptCheck()
          → AutoScrollViewBehaviorInterceptor
          → FlakySafeViewBehaviorInterceptor
          → SystemDialogSafetyViewBehaviorInterceptor
          → Espresso onView(...).check(matches(isDisplayed()))
            → AmbiguousViewMatcherException
```
**Key files:**
- `RecyclerTest.kt` — test DSL
- `RecyclerScreen.kt` — page object: `element0 = KTextView { withText("0") }`
- `RecyclerAdapter.kt` — **root cause**: `bind()` sets text to `"0"` for both position 0 AND position 5
- `item_recycler.xml` — each item has `TextView` with `id=@+id/textView`
---
## Root Cause Hypothesis (confidence: 0.95)
**Primary tier: Test DSL / assertion is out of sync with real UI (stale step after scroll/navigation).**
The `RecyclerAdapter.bind()` method intentionally creates duplicate text values:
```kotlin
fun bind(position: Int) {
    textView.text = when (position) {
        0, 5 -> "0"
        else -> position.toString()
    }
}
```
When the test scrolls to position 0, both position 0 and position 5 are visible on screen (the RecyclerView shows multiple items). The page object `element0` is defined as:
```kotlin
val element0 = KTextView { withText("0") }
```
This matcher finds **all** `TextView` instances with text `"0"` — and since both position 0 and position 5 have text `"0"` and are visible, Espresso throws `AmbiguousViewMatcherException`.
The test previously passed for `element14` and `element29` because those positions have unique text values (only position 14 has text "14", only position 29 has text "29"). The ambiguity is specific to the value "0" which appears at two positions.
---
## Alternative Hypotheses
1. **Timing/flake** (confidence 0.02): Not supported — the failure is deterministic given the adapter data and visible viewport.
2. **App defect** (confidence 0.02): The duplicate "0" is intentional per the Russian comment; it's a designed test scenario, not a bug.
3. **Kaspresso/Espresso framework bug** (confidence 0.01): The framework correctly reports ambiguity — this is expected Espresso behavior when a matcher matches multiple views.
---
## Concrete Fix Plan
The page object `element0` needs a more specific matcher to disambiguate. Options:
**Option A (recommended):** Use `KRecyclerView` child matchers to scope the assertion to a specific position in the RecyclerView:
```kotlin
// In RecyclerScreen.kt, replace:
val element0 = KTextView { withText("0") }
// With a child of the recycler at a specific position:
val element0 = recycler.childAt<KTextView>(0) { withText("0") }
```
Or use `KRecyclerItem` with position scoping.
**Option B:** Add a unique identifier (e.g., `withId` + `withText` combo, or a content description) to distinguish position 0 from position 5 in the adapter.
**Option C:** Change the test to use `recycler.childAt<...>(0) { isDisplayed() }` directly without relying on text matching for the ambiguous case.
---
## What to verify next
- Confirm that `KRecyclerView.childAt<KTextView>(0)` or equivalent position-scoped API is available in the kakao library version used.
- Run the test with the fix to confirm the ambiguity is resolved.
- Check if other tests in the suite also use `withText("0")` and might be affected.
---
## Evidence tier
**Tier-1 logcat** (KASPRESSO max lines + bounded TestRunner probes — all empty; terminal failure lines sourced from KASPRESSO). Tier-2 `ui_dump` confirmed app was torn down (launcher visible), but source code analysis provided definitive root cause without needing live hierarchy.

FIX agent - attempt to fix the test:
Before:

val element0 = KTextView { withText("0") }

After:

val element0 = KRecyclerItem<KTextView>(0) { withText("0") }

Note that the second agent never managed to fix the Kakao object, because it invented a constructor that does not exist. It also tried to break into my Gradle cache, but it has no permission to do so (it only has access to the sandbox Android project), and it could not read a jar anyway.

Result: Access denied - path outside allowed directories: .gradle/caches/modules-2/files-2.1/io.github.kakaocup

It needs the library sources so that it does not have to invent the API.
Conclusions from the second experiment

RCA: pass.
The agent correctly interpreted AmbiguousViewMatcherException: it linked the two matches in the hierarchy to the duplicated displayed text and the single shared matcher withText("0"), and confirmed this with the KASPRESSO trace, the call chain, and by reading RecyclerAdapter / RecyclerScreen. This is the main result of the experiment: the error type is "noisy", but it becomes provable once the log, the adapter code and the PO are all available.
Additionally: tier-2 (ui_dump) added almost no signal in this run (after the test the app had already been torn down, so the dump captured the wrong window). This is a useful practical takeaway: for RCA it is not enough to "call the tool", you have to catch the right moment. At the same time, the adapter and PO sources were sufficient for a confident verdict, which is normal for a deterministic matcher collision.
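One cheap piece of evidence in this run is the match count that Espresso prints in the exception message. A minimal Java sketch of extracting it from a log line follows; `AmbiguityParser` is a hypothetical helper, and the pattern relies only on the "matches N views in the hierarchy" wording seen in the log above, so it is an assumption that other Espresso versions word the message the same way.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pulls the number of matched views out of an ambiguity message. */
final class AmbiguityParser {
    // Wording copied from the log in this article; other versions may phrase it differently.
    private static final Pattern MATCH_COUNT = Pattern.compile("matches (\\d+) views in the hierarchy");

    /** Returns the reported match count, or empty if the line carries no such message. */
    static OptionalInt matchedViewCount(String logLine) {
        Matcher matcher = MATCH_COUNT.matcher(logLine);
        return matcher.find()
                ? OptionalInt.of(Integer.parseInt(matcher.group(1)))
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        String line = "AmbiguousViewMatcherException: '(an instance of android.widget.TextView)' "
                + "matches 2 views in the hierarchy:";
        System.out.println(matchedViewCount(line));
    }
}
```

A count above one, combined with the adapter source, is enough to support the duplicate-text hypothesis even when the live hierarchy is no longer available.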
FIX: fail (expected in the current configuration).
The agent could not produce compilable Kakao code: it proposed a non-existent construct (KRecyclerItem<KTextView>(…)) instead of the canonical scheme:
KRecyclerItem + itemType registration + recycler { childAt<…>(…) { … } }
The attempt to read Kakao from ~/.gradle/caches ran into the MCP policy (the path is outside the allowlist; even with access, a raw jar often does not replace reading the unpacked sources).
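The "path outside allowed directories" refusal is the kind of check a directory allowlist performs. The Java sketch below illustrates the idea and is not the Filesystem MCP server's actual implementation; `PathAllowlist` is a hypothetical class, and it normalizes paths only lexically, whereas a real server would typically also resolve symlinks.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical allowlist: a path is allowed only if it lies under one of the configured roots. */
final class PathAllowlist {
    private final List<Path> roots;

    PathAllowlist(List<Path> roots) {
        // Normalize once so that later comparisons work on canonical-looking absolute paths.
        this.roots = roots.stream()
                .map(root -> root.toAbsolutePath().normalize())
                .collect(Collectors.toList());
    }

    /** Lexical check only: ".." segments are collapsed, symlinks are NOT resolved. */
    boolean isAllowed(Path requested) {
        Path resolved = requested.toAbsolutePath().normalize();
        return roots.stream().anyMatch(root -> resolved.startsWith(root));
    }

    public static void main(String[] args) {
        PathAllowlist sandbox = new PathAllowlist(Collections.singletonList(Paths.get("/Kaspresso-master")));
        System.out.println(sandbox.isAllowed(Paths.get("/Kaspresso-master/samples/kaspresso-sample")));
        System.out.println(sandbox.isAllowed(Paths.get("/Kaspresso-master/../.gradle/caches")));
    }
}
```

If the agent is expected to consult library code, one option is to place unpacked sources inside an allowed root rather than widening the allowlist to the whole Gradle cache.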

Experiment 3 - A check with the local model llama3.1:8b
I ran the same pipeline and the same broken test (NoMatchingViewException) through the local model llama3.1:8b. The result is disappointing.
Here is the agent's full response at the RCA stage:

RCA result: Here is the JSON function call with its proper arguments that best answers the given prompt:
```
{
  "name": "gradle_task",
  "parameters": {
    "projectDir": "/Kaspresso-master",
    "task": ":samples:kaspresso-sample:connectedDebugAndroidTest",
    "args": ["-Pandroid.testInstrumentationRunnerArguments.class=com.kaspersky.kaspressample.recycler.RecyclerTest#test"]
  }
}
```
This function call runs the specified Gradle task with the provided arguments. If the build is successful, it will report success and stop. If the build is red or tests failed, it will proceed to perform Tier-1 RCA as described in the prompt.
After running `gradle_task`, if the output shows `BUILD SUCCESSFUL`, the response should be:
```
{
  "name": "logcat_clear",
  "parameters": {
    "deviceId": "<device_serial>"
  }
}
```

The log shows that the model lost track of the sequence of actions and made no further attempts to call MCP tools after the first response. Instead it simply generates a JSON wrapper that imitates a call, without any real tool execution.
Note that in the previous article I used the very same local model, llama3.1:8b. There the system prompt was absolutely minimal and the list of available tools contained a single MCP tool. Under those conditions the model successfully called MCP in the format langchain4j expects. In the current experiment there is far more context. The small model gets lost in it and produces a text-to-text response that imitates an MCP call.

{
  "name": "gradle_task",
  "parameters": {
    "projectDir": "/Users/.../Kaspresso-master",
    "task": ":samples:kaspresso-sample:connectedDebugAndroidTest",
    "args": ["-Pandroid.testInstrumentationRunnerArguments.class=..."]
  }
}

After that the model does not go to the file system, to dependency-mcp, or to logcat. It simulates continued work, but the code never makes a real executeTool call.
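This failure mode can be caught by the pipeline itself. The Java sketch below is a heuristic, not part of the original code: it flags a reply that carries no structured tool request but whose plain text contains a JSON-looking `"name"` field naming a known tool. `SimulatedToolCallDetector` is a hypothetical class, and the `hasRealToolCalls` flag is assumed to come from the framework's response object.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical heuristic: detects a tool call that was written out as text instead of being issued. */
final class SimulatedToolCallDetector {
    // Matches a JSON-looking pair such as "name": "gradle_task" inside plain assistant text.
    private static final Pattern NAME_FIELD =
            Pattern.compile("\"name\"\\s*:\\s*\"([A-Za-z0-9_-]+)\"");

    static boolean looksLikeSimulatedCall(String assistantText, boolean hasRealToolCalls, List<String> toolNames) {
        if (hasRealToolCalls || assistantText == null) {
            return false; // a real structured call was made, or there is nothing to inspect
        }
        Matcher matcher = NAME_FIELD.matcher(assistantText);
        while (matcher.find()) {
            if (toolNames.contains(matcher.group(1))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> tools = Arrays.asList("gradle_task", "logcat_clear");
        String reply = "Here is the JSON function call: { \"name\": \"gradle_task\", \"parameters\": {} }";
        System.out.println(looksLikeSimulatedCall(reply, false, tools));
    }
}
```

When the detector fires, the pipeline can stop the run early and report that the model is unsuitable for this configuration instead of treating the text as an RCA result.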
Conclusions from the third experiment
llama3.1:8b copes with the most primitive MCP task (a single device_list call), as shown in the first article. But once the context gets more complex (two agents, priorities, tier-1/tier-2, several MCP servers), the local 8B model breaks down. It loses the thread of reasoning, forgets about the available tools and falls back to imitation.
Conclusion
Practice shows that the chain "UI test run → logs and reports → reading the code in the repository" is already enough for an agent to reliably cover the first and most time-consuming part of the work: reconstructing the picture of the failure and formulating a well-founded RCA based on facts rather than guesses. This is especially visible on typical symptoms such as NoMatchingViewException and AmbiguousViewMatcherException, where the scenario trace, the Page Object and the analysis of the application code turn out to be decisive. I also wanted to include examples with a changed application layout and a programmatic layout, but the article is already very long.
For projects with a constantly growing number of UI tests we see a new class of work emerging: not a replacement for SDETs, but help with triaging repetitive red tests. The fix, the API check and the acceptance still remain with people and infrastructure.
I will try to write a separate article about fixes. There are many pitfalls there that also deserve a description.
In this article I tried to show by example that instead of routine work we can build new tools around the familiar SDET framework. To my mind this is much more interesting work. A lot has to be configured and continuously developed.
