A living book about agentic workflows, agent orchestration, and agentic scaffolding
Today we hit a class of failures that are easy to misdiagnose: not prompt bugs, not repository logic bugs, and not deterministic workflow bugs. They happen in the integration gap between GitHub APIs, hosted runner networking, and third-party artifact delivery.
We observed two concrete patterns:
1. Dispatch returned HTTP 500 but still started downstream. An intake workflow failed on gh workflow run ... with a server error, yet the routing workflow actually started and completed. The upstream run looked failed while the pipeline still moved forward.
2. Phase execution failed during tool install with HTTP 502. A phase run failed while installing GitHub Copilot CLI because the release asset download returned 502. A rerun succeeded without workflow changes.
These are classic “no-man’s-land” failures: infrastructure or platform edge conditions where control-plane state and CLI/API responses can briefly disagree.
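One way to handle the first pattern is to reconcile against actual state before trusting the error. The sketch below is a minimal illustration, not the pipeline's real code: `dispatch`, `find_run`, and `TransientDispatchError` are hypothetical names. In practice `dispatch` might wrap the dispatch command and `find_run` might wrap a query such as `gh run list` for the downstream workflow.

```python
import time
from typing import Callable, Optional


class TransientDispatchError(Exception):
    """Hypothetical error raised by a dispatch wrapper on an HTTP 5xx response."""


def dispatch_with_reconcile(
    dispatch: Callable[[], None],
    find_run: Callable[[], Optional[str]],
    checks: int = 3,
    wait_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Dispatch once; on a 5xx, check whether the run started before failing.

    Returns "dispatched" when the call succeeded, or "started-despite-error"
    when the call failed but a matching downstream run was found.
    """
    try:
        dispatch()
        return "dispatched"
    except TransientDispatchError:
        # The response said "failed", but the control plane may disagree.
        for _ in range(checks):
            sleep(wait_seconds)
            if find_run() is not None:
                return "started-despite-error"
        # No run appeared, so the failure was real: re-raise it to the caller.
        raise
```

Checking before re-dispatching matters because a blind retry after a false 500 would start the downstream workflow twice.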
If you treat every failure as a deterministic logic failure, you waste time debugging the wrong layer. In multi-stage agent pipelines, transient infra errors can also block the next phase even when issue state is otherwise correct.
Treat 5xx in dispatch/install steps as potentially transient. Transient failures are unavoidable, but expensive execution can be reduced with better workflow boundaries.
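Treating 5xx responses as potentially transient usually means retrying with a delay instead of failing the run outright. The following is a minimal sketch under assumptions: the set of retryable status codes, the attempt limit, and the delays are illustrative choices, and `attempt` is a hypothetical callable that performs the download or dispatch and returns its HTTP status.

```python
import time
from typing import Callable

# Assumed set of status codes worth retrying; adjust to your own policy.
TRANSIENT_STATUSES = {500, 502, 503, 504}


def is_transient(status: int) -> bool:
    """Return True for server-side errors that may succeed on a retry."""
    return status in TRANSIENT_STATUSES


def retry_on_transient(
    attempt: Callable[[], int],
    max_attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `attempt` until it returns a non-transient status or attempts run out.

    Returns the last status seen, so the caller decides how to fail.
    """
    status = attempt()
    for n in range(1, max_attempts):
        if not is_transient(status):
            break
        # Exponential backoff: base_delay, then 2x, 4x, ...
        sleep(base_delay * 2 ** (n - 1))
        status = attempt()
    return status
```

Client errors such as 404 are returned immediately, since retrying them would only hide a real configuration problem.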
Agentic systems are only as reliable as their orchestration edges. “No-man’s-land” failures are normal in production; what matters is designing the workflow so they are recoverable.
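One possible reading of "recoverable" is that a rerun resumes at the failed boundary instead of repeating expensive work. The sketch below illustrates that idea only; `run_stages` and the `completed` set are hypothetical, and in a real workflow the completion record would have to be persisted somewhere outside the run.

```python
from typing import Callable, List, Set, Tuple


def run_stages(
    stages: List[Tuple[str, Callable[[], None]]],
    completed: Set[str],
) -> List[str]:
    """Run stages in order, skipping any already recorded in `completed`.

    Returns the names of the stages executed in this call. If a stage raises,
    the exception propagates and `completed` keeps the stages that finished,
    so a rerun resumes at the failed stage.
    """
    executed = []
    for name, action in stages:
        if name in completed:
            continue  # Finished in an earlier run: do not pay for it again.
        action()
        completed.add(name)
        executed.append(name)
    return executed
```

With boundaries like these, a transient failure in a cheap step after an expensive one costs a retry of the cheap step only.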