
From Autonomous AI
to Production-Grade
Agent Systems

Human checkpoints, SOP-managed context, and sandboxed execution for production-grade agent systems.

crazywoola (Banana)
Developer Relations @ Dify
banana@dify.ai

Agents Can Act — But Production Is Not Ready

Model capability has crossed the "usable" threshold, but teams pushing agents into production keep hitting the same three walls.

Hallucinations Reach Users

Without human checkpoints, AI-generated errors can reach end users directly. One incident destroys trust.

Compliance Cannot Close

Finance, healthcare, and government need approval records and traceable audit trails. Pure automation fails the audit.

Fragile Toolchains

Prompts grow endlessly, tool lists keep expanding, handoffs rely on hidden state — maintenance cost far exceeds expectations.

Today we will cover how Dify addresses these gaps with three architectural advances.

Session Agenda

01
The Evolution
From prompt pipelines to orchestrated agents — why the shift matters
3 min
02
Human-in-the-Loop (HITL)
Pause / resume / approve mid-execution for compliance-safe deployments
8 min
03
Agent × Skills
Current failure modes, Skills, SOP context, and explicit deliverables
12 min
04
Sandboxed Runtime & Collaboration
POSIX-style runtime, Command node, safe execution, and team collaboration
10 min
Q&A
Open Discussion
Live demo · your production use-cases · roadmap preview
5 min
Total ~38 min
01

The Evolution of AI Systems

Moving beyond one-shot prompts into fully orchestrated, controllable agent architectures.

Three Generations of LLM Applications

Each generation unlocks new value — and new complexity.

Gen 1

Prompt → Response

Single-turn completions. No memory, no tools, no state.

ChatGPT wrappers · one-shot summarizers
Gen 2

Pipeline Orchestration

Chained nodes with data transformation, RAG, and conditional branching.

LangChain · Dify Workflow

Gen 3

Orchestrated Agents

Thin agents that choose SOPs, call skills, and hand off explicit deliverables, with human checkpoints where policy requires.

Dify Agent · HITL workflows

Three Architectural Advances

The features that define Dify's production-grade agent direction.

Human-in-the-Loop Nodes
Dedicated execution nodes that pause a workflow and wait for human approval, override, or rejection before continuing. Enables auditable, compliance-friendly AI.
Agent × Skills
A thinner agent layer built around reusable Skills, SOP-managed context, and explicit deliverables instead of giant prompt blobs.
Sandboxed Runtime & Collaboration
A POSIX-style runtime with isolated execution, command-based workflows, and collaborative authoring around shared operational knowledge.
02

Human-in-the-Loop (HITL)

Put human judgment inside the workflow graph, not beside it.

Available in Workflow and Chatflow modes

Why Teams Need HITL & Where It Fits

Oversight should not be a patch — it should be a native gate placed exactly where the workflow needs it.

Shifting Objectives

Work changes mid-run. A pause point keeps workflows from becoming rigid.

Trust Gap

High-stakes teams need visible checkpoints before AI can act for the business.

Integration Complexity

External approval queues and webhooks turn oversight into extra engineering, not native capability.

Before External Actions

Pause before a workflow sends an email, publishes content, submits a ticket, or contacts a customer.

When Confidence Drops

Use HITL on anomalies and edge cases instead of reviewing every run.

When Context Is Missing

Ask for one missing field, then continue automatically with the updated value.

When Policy Requires Sign-Off

Finance, compliance, and customer-facing flows often need a visible approval checkpoint.

Fewer, better-placed nodes usually beat reviewing every step. If reviewers still need another system to finish the task, the node design is incomplete.

How a Human Input Node Works

Execution pauses, a human sees the right context, and the workflow resumes on one of three simple outcomes.

Workflow Running
HITL Gate
Paused · Notification Sent
Human Reviews Context
Approve
Continue
Edit & Approve
Modified Values
Reject
Alt Branch
Delivery

Generate a review page and route it to the right person.

Variables

Insert editable fields and return new values safely.

Actions

Buttons, branches, and timeout rules to ensure resumption.
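The three outcomes above can be sketched as one small resume function. This is a minimal illustration, assuming hypothetical `HumanDecision` and `resume` names rather than Dify's actual API:

```python
from dataclasses import dataclass, field

# Hypothetical types for illustration only; not Dify's actual API.
@dataclass
class HumanDecision:
    action: str                       # "approve" | "edit" | "reject"
    edits: dict = field(default_factory=dict)

def resume(paused_values: dict, decision: HumanDecision) -> tuple[str, dict]:
    """Map a reviewer's decision to the (branch, values) the workflow resumes with."""
    if decision.action == "approve":
        return "continue", paused_values
    if decision.action == "edit":
        # Edited fields override the paused values, then execution continues.
        return "continue", {**paused_values, **decision.edits}
    # Reject routes execution down the alternate branch unchanged.
    return "alt_branch", paused_values

branch, values = resume({"amount": 120}, HumanDecision("edit", {"amount": 100}))
```

The key property: the reviewer's edit returns through the same variable interface the workflow already uses, so downstream nodes never know a human intervened.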

Liang · Investment Services

HITL adds expert judgment exactly where automated output becomes client-facing.

Scaling Problem
40 min
manual work per client
100+
clients to serve

Report generation was automated, but compliance still needed a final look before financial updates reached clients.

HITL Placement
after synthesis · on anomalies · before send

Reviewers saw exactly what clients would receive, edited if needed, and approved with one click. By June, all 100 clients received consistent reports.

"The system handles the calculations, but humans apply judgment where it matters most."

Min · Global Support Team

HITL is not only for approval. It is also a clean way to request missing context.

Support Challenge

Employees moved across separate HR, finance, and IT portals. Requests often arrived without the details needed to route them correctly.

single entry point · query classification · knowledge routing
Where HITL Helped

When Jason from R&D asked about reimbursement, the workflow noticed missing location data, requested it through Human Input, then returned the right Shanghai-office policy.

"The result feels like collaborative intelligence."
03

Agent × Skills

The agent becomes smaller: choose the right SOP, call the right skill, and leave usable artifacts behind.

Works across Workflow and Agent modes

From Monolithic Prompt to Thin Orchestrator

Four failure modes in today's agent workflows — and the better operating model.

Single-run flow · Tool noise · Fragile files · Long debug loops
Before — Prompt Does Everything
Inside the prompt
tool routing · file handling · retry logic · output formatting
What breaks
duplicated logic · hard to test · tool bloat · hidden state
After — Agent Orchestrates
Agent owns
goal · choose SOP · call skills · pick deliverables
Workflow gets
text · files · fields · memory snapshot

Missing Deliverables Break Workflows

If the useful artifact stays inside the agent's memory, downstream nodes can only guess from prose.

Workflow example showing an agent feeding an IF/ELSE node

Example: an IF/ELSE node tries to infer state from plain text.

Text is not state

Checking whether the agent happened to say success is brittle and hard to maintain.

Files disappear

Raw tables, reports, or generated artifacts can stay buried in memory while the next node only sees the summary.

Agents cannot relay work

The next agent cannot reliably see what the previous one actually delivered.

What a Node Should Hand Off

A production workflow needs more than a polished answer.

Text Answer

The human-facing explanation or final response.

Files

Reports, tables, images, and other artifacts downstream steps can keep using.

Structured Fields

Status, decisions, IDs, and parameters that branches or tools can read directly.

Memory Snapshot

Reusable context that later nodes can extract facts, parameters, or files from.

If downstream cannot consume it, it is not a real deliverable.
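The four deliverable types can be modeled as one explicit handoff object. A sketch assuming a hypothetical `Handoff` type; Dify's real schema may differ:

```python
from dataclasses import dataclass, field

# Hypothetical handoff contract covering the four deliverable types above.
@dataclass
class Handoff:
    text: str                                          # human-facing answer
    files: list[str] = field(default_factory=list)     # paths to artifacts
    fields: dict = field(default_factory=dict)         # status, IDs, parameters
    memory: list[dict] = field(default_factory=list)   # reusable context snapshot

h = Handoff(text="Report generated.",
            files=["/workspace/report.csv"],
            fields={"status": "success", "rows": 412})

# A downstream IF/ELSE branch reads a field directly instead of parsing prose:
take_success_branch = h.fields.get("status") == "success"
```

Compare this with the brittle alternative of checking whether the agent's text happened to contain the word "success".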

What Is a Skill?

A reusable execution unit that bundles SOP context, runtime behavior, and a reliable handoff contract.

SOP-Backed

The “how to do this well” playbook lives with the skill instead of being copied across nodes.

Reusable

Publish once, then invoke it from different agents and workflows.

Testable

Run the skill with fixture inputs without triggering the full workflow.

Version-Pinned

Agents can pin a stable version instead of breaking whenever a shared skill changes.

Typical Input Sources
conversation context · prior node outputs · files · memory extraction
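Putting the four properties together, a skill's contract can be sketched as data plus a fixture check. All names here are illustrative, not the real Dify schema:

```python
# Hypothetical skill descriptor; field names are illustrative, not Dify's schema.
skill = {
    "name": "ticket_classification",
    "version": "1.2.0",                       # callers pin this instead of "latest"
    "sop": "/sops/ticket_classification.md",  # the playbook ships with the skill
    "inputs": ["conversation context", "prior node outputs", "files"],
    "outputs": {"category": "str", "priority": "int"},
}

def check_fixture(skill: dict, fixture: dict) -> bool:
    """Testable in isolation: validate a fixture against the skill's I/O
    contract without running any workflow."""
    return (set(fixture["inputs"]) <= set(skill["inputs"])
            and set(fixture["expected"]) == set(skill["outputs"]))

ok = check_fixture(skill, {"inputs": ["files"],
                           "expected": {"category": "billing", "priority": 2}})
```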

One SOP Library, Many Entrypoints

Context engineering needs a shared home, not repeated prompt snippets.

Today — SOPs buried in nodes
same SOP pasted again · hard to review · best practices drift
Better — Shared /sops workspace
write once · different entry files · version with workflow
Different agents often share the same SOP library and simply enter through different files.

In Practice: From Scattered Prompts to a Shared Skill Library

An e-commerce ops team consolidated duplicated customer-service SOPs from 5 separate workflows into one shared Skill, cutting maintenance effort by 4x.

Before — 5 workflows, each with its own copy
returns flow · shipping inquiry · complaint handling · order exception · VIP service

Each workflow contained a near-identical "customer-service script SOP" and "ticket classification logic." Updating one meant updating all five.

After — 1 Skill, 5 entrypoints
shared SOP: service script · shared SOP: ticket classification

Each workflow only defines its own entrypoint and unique logic. Shared knowledge is maintained and versioned in one Skill.

"Before, updating one script meant opening five workflows. Now we change the Skill once and it takes effect everywhere."

Skill + SOP Agent Architecture

Reasoning stays thin. Execution happens in a workspace built around files, commands, and reusable outputs.

Inputs
user request · prior node outputs · uploaded files
Agent Layer
choose SOP · assemble context · call skills · decide next step
Workspace
/sops · commands · files · versioned skills
Handoffs
text · files · fields · memory · HITL
Think in SOPs, files, and artifacts, not a giant dropdown of tools.

Memory Extraction Makes Context Reusable

Memory stops being an implementation detail and becomes a reusable workflow artifact.

LLM Node A
Memory Store
Extraction LLM
Downstream Node B
Runs, produces context · Full context preserved · Reads & extracts params/files · Receives structured values
Memory extraction diagram
Cost & Latency

The extraction LLM call is lightweight: it reads a bounded context window and outputs structured fields. Typical overhead is <1s and <500 tokens.

Fallback on Failure

If extraction fails, the node falls back to the upstream agent's raw text output, so the workflow never silently breaks.

How It Differs from RAG

RAG retrieves from an external corpus; Memory Extraction pulls from the same run's working context. No vector DB needed — this is intra-workflow state, not cross-session retrieval.
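The extraction-plus-fallback behavior can be sketched in a few lines. `extract` and its signature are hypothetical, not a Dify API:

```python
import json

# Illustrative sketch of extraction with fallback; not Dify's implementation.
def extract(memory_text: str, schema_keys: list[str], llm_call) -> dict:
    """Ask a lightweight model for structured fields; fall back to raw text."""
    try:
        raw = llm_call(f"Extract {schema_keys} as JSON from:\n{memory_text}")
        fields = json.loads(raw)
        return {k: fields[k] for k in schema_keys}
    except (ValueError, KeyError, TypeError):
        # On any failure, hand downstream the raw text so the run never breaks.
        return {"raw_text": memory_text}

# Stubbed model call for illustration:
result = extract("Order 42 shipped to Shanghai.", ["order_id", "city"],
                 lambda _: '{"order_id": 42, "city": "Shanghai"}')
```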

04

Sandboxed Runtime & Collaboration

Once agents work over SOPs, files, and explicit deliverables, the runtime has to feel useful and safe at the same time.

Available in self-hosted and Dify Cloud

Command Node: A Small but Powerful Primitive

One command line in, stdout out, files left behind for the next step.

Example
report --input ./turnsheet.csv --format json
command line in · stdout out · files stay in runtime
Natural for LLMs

Models already understand commands, pipes, and file paths from pretraining.

Smaller Product Surface

You do not need a custom UI for every tiny transformation or helper tool.

Better Handoffs

Bigger artifacts can stay as files and move forward explicitly instead of being squeezed into prompts.

From Tool Lists to a POSIX Workspace

Stop modeling every capability as a bespoke tool card. Let the runtime expose commands, files, and stdout.

Before — Tool-first orchestration
step1: A = google_search(query="Dify", max_size=30)
step2: B = summary(query=A)
hidden conversion · outputs stay in memory · every tool needs UI
After — POSIX-style execution
summary --query "$(google_search --query dify --max_size 30)"
string interface · shell composition · inspect with ls /bin

Sandboxed Code Execution

Agents need a real execution surface — just not the host machine. A sandbox makes both possible at once.

Host System Access

Without isolation, code can read local credentials, environment variables, and files.

No Resource Limits

A bad loop or memory spike can block workers and hurt everyone sharing the runtime.

Supply Chain Risk

Imported packages can quietly exfiltrate workflow data unless the environment is controlled.

Safety Boundary
  • No host filesystem access
  • Network restricted by allowlist
  • CPU and memory limits per run
  • Timeout configurable per node
Usable Runtime Surface
ls /bin · stdin/stdout I/O · files as handoff · Python 3.11+ · JavaScript (Node 20) · external file storage
How to Enable
Cloud — on by default · Self-hosted — set SANDBOX=true
The goal of a sandbox is safe capability: a real runtime inside hard boundaries.
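As a minimal illustration of the "hard boundaries" idea, a wall-clock timeout alone already contains runaway loops. This sketch is not Dify's sandbox; a real sandbox adds filesystem isolation, a network allowlist, and memory limits:

```python
import subprocess
import sys

# Minimal guardrail sketch: run code in a child process with a wall-clock timeout.
# Real sandboxes also isolate the filesystem, network, and memory.
def run_sandboxed(code: str, timeout_s: float = 5.0) -> str:
    try:
        proc = subprocess.run([sys.executable, "-c", code],
                              capture_output=True, text=True, timeout=timeout_s)
        return proc.stdout
    except subprocess.TimeoutExpired:
        return "<killed: timeout>"   # a bad loop cannot block the worker forever

print(run_sandboxed("print(2 + 2)"))
```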

Observability: Make Every Step Traceable

A production system must not only run — it must be diagnosable when things go wrong and measurable day to day.

Node-Level Tracing

Every node's input, output, latency, and token usage is traced independently, so failures can be pinpointed to the exact step.

Cost Tracking

Token costs broken down by workflow, by node, and by model so teams know where money is going.

Latency Analysis

Is the bottleneck in inference, tool calls, or file I/O? Latency distribution charts make optimization evidence-based.

Error Replay

Failed runs can be replayed with full context — no guessing, no reproduction steps needed.
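A node-level trace span can be as simple as a record captured around each call. A hypothetical sketch, not Dify's tracing format:

```python
import time

# Hypothetical per-node trace span covering the dimensions above
# (input, output, latency, and a slot for token usage).
def traced(node_name: str, fn, *args):
    start = time.perf_counter()
    output = fn(*args)
    span = {
        "node": node_name,
        "input": args,
        "output": output,
        "latency_ms": round((time.perf_counter() - start) * 1000, 3),
        "tokens": None,   # filled in by the model provider for LLM nodes
    }
    return output, span

out, span = traced("summarize", lambda s: s.upper(), "ok")
```

Because input and output are stored per span, a failed run can be replayed by re-invoking the node with the recorded input.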

Collaborative Workflow Development

The workflow itself becomes a shared product surface for the team.

Role-Based Access

Different people can draft, review, or publish without stepping on each other.

Version History

Every publish creates a snapshot, so teams can compare and roll back quickly.

Draft → Review → Publish

The lifecycle becomes visible and repeatable instead of living in screenshots and chat messages.

Shared SOP Library

Best practices stop living as private prompt snippets and become team assets.

Simple team flow

One person drafts the workflow, another reviews the SOPs, and a lead publishes the approved version with history still intact.

Putting It All Together

A production agent system where reasoning, runtime execution, and human review all share explicit deliverables.

Input → Agent Reasoning Layer
user query / files / scheduled trigger · choose SOP · call skills · assemble context
Execution (Sandbox) + HITL Gate
command node · skills · files + stdio · code sandbox · pause → review → resume
Deliverables + Observability
text / files / structured fields · memory snapshot · trace log · cost tracking
Production-grade means every step leaves behind artifacts the next step can actually use.

Global Community

Open-source ecosystem · GitHub Top 100 project

GitHub Top 100 · Open Source LLMOps
1M+
Powered by Dify
130K+
GitHub Stars
150+
Countries
1,000+
Contributors
60+
Industries
550M+
Total Downloads
Companies using Dify

Next Steps

You do not need all three on day one — pick the pain point that hurts most and start today.

Try the HITL Node

Drop a Human Input node into any workflow in the latest Dify release and add your first human checkpoint.

Available now

Explore Agent Skills

Extract your most-copied SOP into your first Skill and experience the efficiency of reuse and versioning.

Coming soon

Join the Community

Star the repo, join Discord, and help shape the future of agent systems with developers worldwide.

langgenius/dify

Thank You

Questions, feedback, or want to explore a specific feature deeper?

Dify Discord QR code
Discord Community
Scan to join and keep the conversation going around Dify and agent systems.
crazywoola (Banana)
Developer Relations @ Dify