| Radian IT | Production AI | 11 min read
From AI prototype to controlled workflow
Moving an AI workflow from prototype to regular use needs a workflow boundary, production implementation, evaluation fixtures, telemetry, release controls, handover, and a runbook.
A useful AI prototype can show that a workflow is worth building. It does not show that the workflow is ready for regular use.
The prototype may answer the right question, draft the right message, summarise the right source, update a test record, or guide a user through a manual process. That is useful evidence, but it is not production evidence.
The production question is narrower: what will run, where will it run, which systems may it use, who approves its actions, what evidence will the release owner receive, how will failures be handled, and how will the buyer's team operate it after handover?
Answering that question takes a bounded workflow, a production implementation, evaluation fixtures, telemetry, release controls, a deployment handover, and a runbook.
Start with a bounded workflow
The first build decision is not model selection. It is the workflow boundary.
A bounded workflow describes one repeatable job, the people involved, the source systems it may use, the actions it may take, the approval points it must respect, and the evidence needed for release. Without that boundary, the build drifts towards a general assistant. General assistants are harder to test, harder to hand over, and harder to stop when something goes wrong.
The starting map should answer these questions.
| Boundary | Build question |
|---|---|
| Workflow goal | Which job should the workflow complete, and which related jobs are outside scope? |
| Release owner | Who can approve controlled use, pause rollout, or accept a residual risk? |
| Users | Which roles can start the workflow, review output, approve actions, or override a decision? |
| Source systems | Which systems are authoritative, indexed, read-only, synthetic, or excluded? |
| Inputs | Which documents, records, messages, prompts, forms, or events can start the workflow? |
| Outputs | Which answers, drafts, tickets, records, messages, decisions, or tasks can the workflow produce? |
| Tool authority | Which tools can be called, at what authority level, and with what approval path? |
| Data classes | Which data is public, internal, sensitive, customer-owned, regulated, or blocked? |
| Human approvals | Which actions require human approval before side effects occur? |
| Telemetry | Which traces, costs, latency, denials, approvals, failures, and audit events are captured? |
| Rollback | What happens when output is wrong, a tool call fails, a source is unavailable, or audit logging is incomplete? |
The boundary should be short enough to read and specific enough to test. It becomes the contract for implementation, evaluation, release, handover, and later expansion.
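One way to keep the boundary "short enough to read and specific enough to test" is to hold it as a small typed record rather than a prose document. The sketch below is illustrative only: the class name `WorkflowBoundary`, the `Authority` levels, and the example invoice workflow are assumptions, not part of any named framework, and a real build would carry more of the table's rows.

```python
from dataclasses import dataclass, field
from enum import Enum


class Authority(Enum):
    """Hypothetical authority levels; a real build would agree its own."""
    READ_ONLY = "read_only"
    WRITE_WITH_APPROVAL = "write_with_approval"


@dataclass
class WorkflowBoundary:
    goal: str
    release_owner: str
    user_roles: set[str]
    source_systems: set[str]
    blocked_data_classes: set[str]
    tool_authority: dict[str, Authority] = field(default_factory=dict)
    rollback_route: str = ""

    def gaps(self) -> list[str]:
        """Return the names of boundary entries that are still empty."""
        return [name for name, value in vars(self).items() if not value]

    def may_call(self, tool: str) -> bool:
        """A tool that is not listed in the boundary is denied by default."""
        return tool in self.tool_authority

    def needs_approval(self, tool: str) -> bool:
        """True only for listed tools that write with human approval."""
        return self.tool_authority.get(tool) is Authority.WRITE_WITH_APPROVAL


# Example values are invented for illustration.
boundary = WorkflowBoundary(
    goal="Draft replies to supplier invoice queries",
    release_owner="Finance operations lead",
    user_roles={"ap_clerk", "ap_supervisor"},
    source_systems={"erp_invoices"},
    blocked_data_classes={"regulated"},
    tool_authority={
        "lookup_invoice": Authority.READ_ONLY,
        "send_email": Authority.WRITE_WITH_APPROVAL,
    },
)
print(boundary.gaps())  # ['rollback_route']
```

Because the record is data, the same object can feed the evaluation fixtures and the release gate, so an empty rollback route shows up as a test failure rather than a review comment.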
Use control language, not confidence language
Current governance and practitioner sources are useful because they treat production AI as a control problem, not just a capability problem.
The IMDA Model AI Governance Framework for Agentic AI frames agentic systems around bounded risk, human accountability, technical controls, testing, monitoring, traceability, and user responsibility. For a workflow build, that means the design has to name the limits on data, tools, autonomy, approval, rollout, and monitoring before the system is treated as ready.
The 2026 preprint Agentic AI in Industry: Adoption Level and Deployment Barriers is useful for the same reason. It describes a capability-deployment verification gap: teams can demonstrate useful agentic capability experimentally, but production integration is blocked when verification is inadequate.
The practitioner review Making Sense of AI Agents Hype points towards architecture, task decomposition, coordination, reliability constraints, and operational limits. Those are build concerns, not afterthoughts.
The Thomson Reuters 2026 AI in Professional Services Report is useful for buyer communication. AI work needs clear success criteria and clear communication about how AI is being used. A workflow build should therefore leave release owners with evidence they can understand, not just a transcript from a successful demo.
For a buyer, the practical shift is simple: do not ask whether the prototype looked good. Ask which controls the workflow needs before it can run regularly.
Turn the prototype into a service boundary
The build should turn prototype behaviour into an explicit service boundary that can be operated after delivery.
That boundary does not have to imply a large platform. A bounded first workflow may be a small service, scheduled job, internal tool, agentic pipeline, retrieval-backed assistant, or event-driven process. The important point is that it has named inputs, outputs, dependencies, permissions, telemetry, and release rules.
Workflow request or event
-> user, role, channel, and task scope
-> input validation and data-class check
-> source selection
    -> source-of-truth system
    -> approved indexed corpus
    -> current record lookup
    -> synthetic or fixture data in test
-> workflow orchestration
    -> prompt, policy, and model selection
    -> retrieval or memory read where allowed
    -> tool proposal
    -> human approval where required
    -> tool execution where allowed
-> output contract
    -> answer, draft, record update, task, ticket, or blocked action
    -> citations or source references where needed
    -> evidence state and escalation route where useful
-> evidence capture
    -> run ID, workflow version, sources, tool calls, approvals, denials
    -> telemetry, cost, latency, errors, audit events
-> release controls
    -> fixture verdicts, rollback route, runbook checks, handover notes
This map is intentionally plain. It should be easy for the release owner, engineering team, and operator to inspect. A workflow that cannot be described this way is not ready for regular use.
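The orchestration steps in the map (tool proposal, human approval where required, tool execution where allowed) can be sketched as a single gate in front of every tool call. This is a minimal illustration under stated assumptions: `ToolProposal`, `handle_proposal`, the tool names, and the two policy sets are hypothetical, and a real service would load policy from the agreed workflow boundary rather than module constants.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolProposal:
    tool: str
    arguments: dict
    approved_by: Optional[str] = None  # set by a human reviewer, never by the model


# Hypothetical policy tables for one bounded workflow.
ALLOWED_TOOLS = {"lookup_record", "update_record"}
NEEDS_APPROVAL = {"update_record"}


def handle_proposal(
    proposal: ToolProposal,
    execute: Callable[[str, dict], object],
) -> dict:
    """Return an outcome record; run the tool only when policy allows it."""
    if proposal.tool not in ALLOWED_TOOLS:
        # Outside tool authority: block and explain, with no side effect.
        return {
            "status": "blocked",
            "tool": proposal.tool,
            "reason": "outside tool authority",
        }
    if proposal.tool in NEEDS_APPROVAL and proposal.approved_by is None:
        # Approval state is visible before any side effect happens.
        return {"status": "pending_approval", "tool": proposal.tool}
    result = execute(proposal.tool, proposal.arguments)
    return {
        "status": "executed",
        "tool": proposal.tool,
        "approved_by": proposal.approved_by,
        "result": result,
    }
```

Every branch returns a record rather than raising, so blocked and pending actions can be written to the same evidence store as executed ones.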
Build the first implementation slice
The first implementation slice should prove the whole operating path, not every possible feature.
A useful slice runs from input to evidence capture: it receives a representative task, reads approved sources, produces a bounded output, applies approval rules, records telemetry, and leaves enough evidence to decide whether the workflow can continue.
| Build slice | What it proves | Output |
|---|---|---|
| Intake contract | The workflow starts from agreed inputs and rejects excluded inputs. | Input schema, validation rules, data-class checks, blocked-input examples. |
| Source access | The workflow reads the right sources through approved paths. | Source inventory, access method, freshness rule, citation or reference rule. |
| Workflow logic | The prototype behaviour can run as a repeatable service. | Prompt or policy version, orchestration path, tool proposal rules. |
| Tool boundary | Side effects are constrained and approval-aware. | Tool list, authority level, approval route, denied-action behaviour. |
| Output contract | The result is structured enough for review or downstream use. | Output schema, answer or draft format, escalation state, error format. |
| Evidence capture | The run can be reconstructed later. | Run ID, sources, tool calls, approvals, denials, cost, latency, audit events. |
| Release gate | The release owner can see pass, fail, warning, and risk decisions. | Fixture report, release recommendation, residual-risk record. |
| Handover | The buyer can operate the workflow after delivery. | Deployment note, runbook, rollback route, monitoring checks. |
This slice is deliberately narrower than the target future workflow. It gives the buyer a working operating pattern before expansion.
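As an illustration of the first row of that table, an intake contract can be as small as one validation function that accepts agreed inputs and rejects excluded ones with a stated reason. The field names, input types, and data classes below are assumptions made up for the sketch; the real values come from the workflow boundary.

```python
from dataclasses import dataclass

# Assumed contract values for illustration only.
ALLOWED_INPUT_TYPES = {"form", "ticket"}
BLOCKED_DATA_CLASSES = {"regulated", "blocked"}
REQUIRED_FIELDS = ("input_type", "data_class", "requested_by", "body")


@dataclass
class IntakeResult:
    accepted: bool
    reasons: list[str]


def check_intake(request: dict) -> IntakeResult:
    """Accept a request only if it satisfies every intake rule."""
    reasons = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if not request.get(name)
    ]
    input_type = request.get("input_type")
    if input_type and input_type not in ALLOWED_INPUT_TYPES:
        reasons.append(f"input type not in scope: {input_type}")
    data_class = request.get("data_class")
    if data_class in BLOCKED_DATA_CLASSES:
        reasons.append(f"data class blocked: {data_class}")
    return IntakeResult(accepted=not reasons, reasons=reasons)
```

Returning every reason, not just the first, gives the blocked-input examples in the slice something concrete to assert against.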
Evaluate the workflow, not the transcript
The evaluation pack for a workflow build should test the workflow boundary.
The fixture set should include the happy path, but it cannot stop there. It should include source mistakes, stale records, missing permissions, prompt-injection attempts, unavailable dependencies, tool denial, cost and latency limits, audit gaps, and known regressions.
| Fixture type | What it proves | Example pass condition |
|---|---|---|
| Happy path | The workflow completes the intended task with approved inputs. | Correct output, expected source use, trace present. |
| Expected source | The workflow uses the authoritative source, not a plausible substitute. | Output references the current approved record. |
| Excluded source | The workflow does not use blocked or out-of-scope material. | Restricted material is absent and the denial is logged. |
| Stale source | The workflow handles old indexed material that conflicts with the current source. | Current source wins or the conflict is escalated. |
| Tool approval | The workflow proposes an action that needs human approval. | Approval state is visible before side effects happen. |
| Tool denial | The workflow tries to call a tool outside its authority. | Action is blocked, logged, and explained to the user or operator. |
| Prompt injection | Source content attempts to alter policy or tool authority. | The instruction is treated as untrusted content. |
| Dependency failure | Retrieval, model, tool, audit sink, or downstream system is unavailable. | Workflow stops, degrades, or escalates through the agreed path. |
| Cost and latency | The workflow runs within practical operating limits. | Run stays within threshold or records the release impact. |
| Known regression | A previous bad output or tool path is repeated as a fixture. | The old failure does not recur and evidence shows why. |
The result is not a generic score. The result is a release decision for this workflow.
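A fixture pack of this kind can be expressed as data plus a pass condition per fixture, so that the output is a verdict per case rather than one blended score. The sketch below is a hedged illustration: `Fixture`, `run_fixtures`, and `toy_workflow` are hypothetical names, and the toy workflow stands in for the real service only to make the example runnable.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Fixture:
    name: str
    fixture_type: str               # e.g. "happy_path", "stale_source"
    task: dict
    passes: Callable[[dict], bool]  # the pass condition for this fixture


def run_fixtures(
    workflow: Callable[[dict], dict],
    fixtures: list[Fixture],
) -> dict[str, str]:
    """Run each fixture and record a verdict per fixture name."""
    verdicts = {}
    for fixture in fixtures:
        try:
            result = workflow(fixture.task)
        except Exception as exc:
            # An unhandled crash is a failed fixture, not a skipped one.
            verdicts[fixture.name] = f"fail ({type(exc).__name__})"
            continue
        verdicts[fixture.name] = "pass" if fixture.passes(result) else "fail"
    return verdicts


def toy_workflow(task: dict) -> dict:
    """Stand-in workflow: answers from the current record, flags conflicts."""
    conflict = task["current"] != task["indexed"]
    return {"answer": task["current"], "escalated": conflict}


fixtures = [
    Fixture(
        "invoice total, sources agree", "happy_path",
        {"current": "120.00", "indexed": "120.00"},
        lambda r: r["answer"] == "120.00" and not r["escalated"],
    ),
    Fixture(
        "invoice total, index is stale", "stale_source",
        {"current": "135.00", "indexed": "120.00"},
        lambda r: r["answer"] == "135.00" or r["escalated"],
    ),
]

verdicts = run_fixtures(toy_workflow, fixtures)
```

Keeping the pass condition next to the fixture makes each verdict traceable to a rule the release owner has seen.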
Capture run evidence
Run evidence is what makes the workflow operable. It lets the team debug behaviour, compare versions, explain approval decisions, and decide whether a failure is a defect, runbook issue, backlog item, or buyer-owned risk.
A minimum evidence record should include:
| Field | Purpose |
|---|---|
| Run ID | A stable identifier for the workflow execution. |
| Workflow version | Prompt, code, policy, model, tools, fixture pack, and deployment version. |
| Actor and role | User, service account, agent identity, and permission scope. |
| Input | Request, source event, fixture, data class, and expected output class. |
| Sources | Source IDs, versions, timestamps, citations, access decisions, and freshness result. |
| Tool calls | Tool name, authority level, input, output, approval state, side effects, and denial state. |
| Approvals | Who approved, denied, edited, escalated, or paused the workflow. |
| Output | Final answer, draft, record change, message, ticket, refusal, or escalation. |
| Telemetry | Cost, latency, retries, errors, fallbacks, trace completeness, and audit event state. |
| Verdict | Pass, fail, warning, release gate, runbook check, backlog item, or buyer-owned risk. |
The evidence schema should be boring. It should be simple to store, query, compare, and hand over to the buyer's team.
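A "boring" evidence schema might look like a flat record that serialises to JSON and can report its own gaps. The sketch below is one possible shape, not a prescribed one: the class name `RunEvidence`, the field names, and the choice of which fields count as required are all assumptions that map loosely onto the table above.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class RunEvidence:
    workflow_version: str
    actor: str
    role: str
    input_ref: str
    sources: list[dict] = field(default_factory=list)
    tool_calls: list[dict] = field(default_factory=list)
    approvals: list[dict] = field(default_factory=list)
    output: str = ""
    telemetry: dict = field(default_factory=dict)
    verdict: str = "unreviewed"
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    started_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def missing_evidence(self) -> list[str]:
        """Assumed minimum: a run needs sources, output, and telemetry."""
        required = ("sources", "output", "telemetry")
        return [name for name in required if not getattr(self, name)]

    def to_json(self) -> str:
        """Serialise with sorted keys so records diff cleanly across runs."""
        return json.dumps(asdict(self), sort_keys=True)
```

A record with a non-empty `missing_evidence()` result is a natural trigger for the "audit logging is incomplete" branch of the rollback policy.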
Put release controls into the build
Release controls should not be written after the workflow is already live. They should be part of the build.
| Control | Build output |
|---|---|
| Release owner | Named person or role that approves controlled use and can pause rollout. |
| Fixture threshold | Minimum fixture results needed before release. |
| Approval policy | Actions that require human approval before side effects. |
| Audit requirement | Events that must be captured for each run. |
| Failure policy | Stop, retry, degrade, escalate, or rollback behaviour by failure type. |
| Rollback route | How to disable the workflow, revert a tool action, or return to manual operation. |
| Monitoring checks | Cost, latency, denial, failure, drift, audit-gap, and override signals. |
| Handover checklist | Operator contacts, deployment notes, runbook checks, and known limitations. |
The release decision should be explicit. A workflow may be ready for limited internal use, ready only with runbook checks, blocked by missing audit evidence, or ready to expand after a controlled period. Those are different decisions and should not be blurred.
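To keep those decisions from blurring, the gate itself can be a small function with named outcomes. The sketch below is an assumption-heavy illustration: the `ReleaseDecision` values, the `must_pass` set, and the specific rules (any failure blocks, a warning downgrades to runbook checks) are example policy, and the actual thresholds belong to the release owner.

```python
from enum import Enum


class ReleaseDecision(Enum):
    BLOCKED = "blocked"
    LIMITED_WITH_RUNBOOK_CHECKS = "limited use with runbook checks"
    LIMITED_INTERNAL_USE = "limited internal use"


def decide_release(
    verdicts: dict[str, str],
    audit_complete: bool,
    must_pass: set[str],
) -> ReleaseDecision:
    """Map fixture verdicts ("pass", "warning", "fail") to one decision."""
    if not audit_complete:
        # Missing audit evidence blocks release regardless of fixture results.
        return ReleaseDecision.BLOCKED
    if any(verdicts.get(name) != "pass" for name in must_pass):
        # A must-pass fixture that is absent, warned, or failed blocks release.
        return ReleaseDecision.BLOCKED
    if any(verdict == "fail" for verdict in verdicts.values()):
        return ReleaseDecision.BLOCKED
    if any(verdict == "warning" for verdict in verdicts.values()):
        return ReleaseDecision.LIMITED_WITH_RUNBOOK_CHECKS
    return ReleaseDecision.LIMITED_INTERNAL_USE
```

Note that a must-pass fixture missing from the verdicts is treated the same as a failure, which stops a shortened fixture run from passing the gate by omission.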
The proof asset: workflow-build pack
The minimum proof asset for a Production AI workflow build is a workflow-build pack. It does not need private buyer material. It is a generic, anonymised structure showing what the build will make explicit.
Production AI workflow-build pack
1. Workflow boundary
- workflow goal
- release owner
- user roles
- source systems
- data classes
- input contract
- output contract
- tool authority
- approval points
- telemetry and audit sinks
- rollback path
2. Implementation slice
- prototype behaviour to preserve
- service boundary
- source adapters
- prompt, policy, and model versioning
- orchestration path
- tool integration
- output schema
- error and escalation path
3. Evaluation pack
- happy path
- expected-source cases
- excluded-source cases
- stale-source conflicts
- tool approval cases
- tool denial cases
- prompt-injection cases
- dependency failures
- cost and latency limits
- known regressions
4. Run-evidence schema
- run ID and workflow version
- actor, role, and permission scope
- retrieved or read sources
- tool calls and side effects
- approvals, denials, and escalations
- output decision
- telemetry, cost, and latency
- audit events and trace completeness
- verdict and release impact
5. Release and handover
- release-gate rubric
- rollback route
- monitoring checks
- incident and support route
- deployment note
- runbook
- handover session
This pack is enough to start a specific buyer conversation without claiming a private case study. The buyer can see the operating shape, the evidence they will receive, and the handover artefacts that make the workflow manageable after delivery.
What a build should produce
A Production AI workflow build should leave the buyer with running software and operating evidence. It is not just a prototype review or a strategy note.
Useful outputs include:
- A written workflow boundary with release owner, source systems, users, inputs, outputs, tool authority, approval points, telemetry, and rollback route.
- A production implementation for the first bounded workflow, integrated with the agreed repository, platform, source systems, and delivery process.
- An evaluation pack with fixtures tied to the workflow's real risks.
- A run-evidence schema or telemetry path that captures sources, tool calls, approvals, denials, cost, latency, errors, and audit events.
- Release controls that separate defects, runbook checks, backlog items, monitoring requirements, and buyer-owned risk decisions.
- Deployment handover with a runbook, rollback route, monitoring checks, support expectations, and known limitations.
The purpose is not to make the first workflow large. The purpose is to make it controlled enough that the buyer can use it, inspect it, pause it, and decide what to expand next.
Related service path: Production AI workflow build. For rollout-readiness context, see How to evaluate agentic workflows before rollout and the service-page evaluation method.
To discuss a bounded workflow build, email [email protected] with:
- the workflow or manual process that needs to move into regular use;
- the current prototype, demo, spreadsheet, prompt chain, or manual process;
- the source systems and tools it may need to use;
- the release owner or decision owner;
- the main evidence gap blocking controlled use.