Dify

The Evolution of RAG:
From State to Memory

Dify × Milvus Joint Tech Talk

Speaker: Zheng Li · Head of Open Source Ecosystem @ Dify

Unstructured Data Meetup · 30 minutes

The real value of RAG: give the LLM memory

  • LLMs excel at compute; their weights hold static knowledge from training time.
  • RAG mounts an external, dynamic memory onto the model.
  • Industry view: RAG is a spectrum.
  • Key question: how do we build, govern, and use that memory?

RAG spectrum (agenda)

  • Naive RAG: simple "state" retrieval
  • Advanced RAG: systematically upgrading the "state" quality
  • Agentic RAG: turning memory into part of an agent
  • Knowledge Pipeline: the production line for high-quality memory

Phase 1 · Naive RAG (simple state retrieval)

  • Flow: Query → Embedding → vector search (Milvus) → chunks → LLM
  • Pain: semantic breaks, noisy hits, Lost in the Middle
  • Conclusion: usable but not delightful → static, low-quality "state"
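The Naive RAG flow can be sketched end-to-end. Everything here is a toy stand-in: a bag-of-words `embed` in place of a real embedding model, and an in-memory list in place of an indexed Milvus collection:

```python
import math
import re
from collections import Counter

# Toy chunk store standing in for an indexed Milvus collection.
CHUNKS = [
    "Milvus stores and indexes dense vectors at scale.",
    "Dify orchestrates LLM applications with visual workflows.",
    "RAG mounts an external, dynamic memory onto the model.",
]

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: bag-of-words term counts.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def naive_rag(query: str, top_k: int = 2) -> str:
    # Query -> embedding -> nearest chunks -> prompt for the LLM.
    q = embed(query)
    ranked = sorted(CHUNKS, key=lambda c: cosine(q, embed(c)), reverse=True)
    context = "\n".join(ranked[:top_k])
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The pain points follow directly: whatever the top-k similarity search returns is stuffed into the prompt, with no second look at relevance or ordering.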

Phase 2 · Advanced RAG (systematic quality lift)

Three hard rules

  • Hybrid recall: vectors + keyword/regex + metadata filters; candidates 100–300
  • Re-rank before assembly: Cross-Encoder or LLM re-rank → top 20–40
  • Respect context rot: structured, tight context beats stuffing the window

Context assembly: instruction-first, dedupe/merge, diversify sources, strict token budget

Industry tip: first-stage hybrid recall with 200–300 candidates is fine; always re-rank before assembling context.
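The first two rules can be sketched together. The documents, dense scores, and the overlap-based `rerank` below are all invented stand-ins; a real system would get dense scores from Milvus and re-rank scores from a Cross-Encoder or an LLM:

```python
import re

# Toy documents with metadata; "dense" stands in for dense-retriever scores.
DOCS = [
    {"id": 1, "text": "Configure hybrid search in Milvus 2.4", "lang": "en", "dense": 0.72},
    {"id": 2, "text": "Milvus hybrid search guide (zh)", "lang": "zh", "dense": 0.70},
    {"id": 3, "text": "Release notes for an unrelated product", "lang": "en", "dense": 0.65},
    {"id": 4, "text": "Keyword tips: hybrid search and filters", "lang": "en", "dense": 0.10},
]

def tokens(text):
    return set(re.findall(r"\w+", text.lower()))

def hybrid_recall(query, lang, pool=300):
    # Rule 1: union of dense candidates and lexical matches, metadata-filtered.
    q = tokens(query)
    dense = [d for d in DOCS if d["dense"] > 0.5]
    lexical = [d for d in DOCS if q & tokens(d["text"])]
    seen, merged = set(), []
    for d in dense + lexical:
        if d["id"] not in seen and d["lang"] == lang:
            seen.add(d["id"])
            merged.append(d)
    return merged[:pool]

def rerank(query, candidates, top_k=2):
    # Rule 2: stand-in for a Cross-Encoder -- score by token overlap.
    q = tokens(query)
    return sorted(candidates, key=lambda d: len(q & tokens(d["text"])), reverse=True)[:top_k]
```

Note how lexical recall rescues doc 4, which a pure dense pass would have dropped, and how the re-rank step, not the recall order, decides what reaches the context.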

Dify practice

  • Parent-child retrieval: hit child chunks, return parent blocks to balance precision and context
  • Reranking: Milvus fast recall → re-rank → feed the LLM
  • Trend: LLM-as-reranker is rising; as cost/latency drop, expect more brute-force style information cleanup.
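Parent-child retrieval can be sketched as a two-level lookup: match against small, precise child chunks, then return each enclosing parent block once. The sections and chunks here are hypothetical:

```python
# Parent blocks (hypothetical doc sections) and their small child chunks.
PARENTS = {
    "sec-install": "Full installation section: prerequisites, pip install, verification steps.",
    "sec-search":  "Full search section: index types, search params, hybrid search examples.",
}
CHILDREN = [
    {"parent": "sec-install", "text": "pip install pymilvus"},
    {"parent": "sec-search",  "text": "hybrid search examples"},
    {"parent": "sec-search",  "text": "index types and search params"},
]

def parent_child_retrieve(query: str):
    # Hit the precise child chunks...
    hits = [c for c in CHILDREN if any(w in c["text"] for w in query.lower().split())]
    # ...but return each parent block once, preserving hit order,
    # so the LLM sees full surrounding context rather than fragments.
    out, seen = [], set()
    for c in hits:
        if c["parent"] not in seen:
            seen.add(c["parent"])
            out.append(PARENTS[c["parent"]])
    return out
```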

Don’t ship RAG; ship retrieval

  • Problem: calling it "RAG" hides the key design trade-offs.
  • Primitives: dense, lexical/regex, filters, re-rank, assembly, eval loop.
  • Move: win the first phase with hybrid recall (200–300 candidates is okay).
  • Discipline: always re-rank before context assembly; respect context rot.

Thoughts · The future and trade-offs of re-ranking

  • Trend: LLM-as-reranker will become mainstream; specialized rerankers may fade.
  • Reality: firing 300 parallel LLM re-ranks still hurts tail latency today.
  • Strategy: mix models short-term (Cross-Encoder + LLM); mid-term rely on caching/sharding to tame the tail.
  • Future: cheaper/faster LLMs make brute-force info cleanup viable.

Phase 3 · Agentic RAG (state → memory)

  • RAG moves from passive flow to an active agent tool
  • Query rewriting: clarify the ask before searching
  • Multi-step / looped retrieval: decide next action from intermediate results
  • Dify: turn RAG into tools inside agent orchestration: plan → retrieve → reflect → iterate
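The plan → retrieve → reflect → iterate loop can be sketched with stubbed decisions. The `REWRITES` table stands in for an LLM query rewriter and `KB` for a retrieval tool; all names and contents are invented for illustration:

```python
# Toy knowledge base standing in for a RAG tool the agent can call.
KB = {
    "reranker latency": "LLM re-ranking adds tail latency at high fan-out.",
    "reranker cost": "Cross-Encoders are cheaper per candidate than LLM calls.",
}

# Stand-in for LLM query rewriting: one vague ask -> concrete sub-queries.
REWRITES = {
    "why not just LLM-rerank everything?": ["reranker latency", "reranker cost"],
}

def agentic_rag(question: str, max_steps: int = 3) -> str:
    # Plan: clarify the ask into sub-queries before searching.
    plan = REWRITES.get(question, [question])
    notes = []
    for step, sub in enumerate(plan):
        if step >= max_steps:
            break
        hit = KB.get(sub)          # retrieve
        if hit:
            notes.append(hit)      # reflect: keep useful evidence
        if len(notes) >= 2:        # reflect: enough evidence -> stop looping
            break
    return " ".join(notes)
```

The point of the loop is that each retrieval is a decision, not a fixed pipeline stage: intermediate results determine whether to search again, rewrite, or stop.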

Foundation · Knowledge Pipeline (memory production line)

[Ingest]

  • Parse + chunk (domain-aware: headings, code blocks, tables)
  • Enrich: headings, anchors, symbols, metadata
  • Optional: block summaries (code/API NL gloss)
  • Embed: dense vectors + optional sparse signals
  • Write to Milvus (text, vectors, metadata)
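The parse-and-chunk step can be sketched as heading-aware splitting, so chunks never cross section boundaries and carry their heading as metadata. In a real pipeline each row would then be embedded and written to Milvus; the sample document is invented:

```python
import re

def chunk_markdown(doc: str):
    # Domain-aware chunking: split on headings so chunks never cross sections.
    rows, heading, buf = [], "intro", []
    for line in doc.splitlines():
        m = re.match(r"#+\s+(.*)", line)
        if m:
            if buf:
                rows.append({"heading": heading, "text": " ".join(buf)})
            heading, buf = m.group(1), []
        elif line.strip():
            buf.append(line.strip())
    if buf:
        rows.append({"heading": heading, "text": " ".join(buf)})
    return rows

DOC = """# Install
Run the installer.

# Search
Use hybrid recall."""

rows = chunk_markdown(DOC)
# Each row would next get a dense vector and be written to Milvus
# together with its text and heading metadata.
```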

[Query]

  • First-stage hybrid recall: vectors + lexical/regex + metadata filters
  • Candidate pool: about 100–300 → re-rank to top 20–40
  • Context assembly: instruction-first, dedupe/merge, diversify, hard token cap
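The assembly rules can be sketched with a whitespace word count standing in for a real tokenizer (an assumption; production code would count model tokens):

```python
def assemble(instruction: str, chunks: list, budget: int = 30) -> str:
    # Instruction first, then deduped chunks under a hard token budget.
    seen, parts = set(), [instruction]
    used = len(instruction.split())
    for c in chunks:
        key = c.strip().lower()
        if key in seen:
            continue               # dedupe verbatim repeats
        cost = len(c.split())
        if used + cost > budget:
            break                  # hard cap: stop, don't truncate mid-chunk
        seen.add(key)
        parts.append(c)
        used += cost
    return "\n".join(parts)
```

Stopping at the budget rather than cramming more chunks in is the "context rot" discipline from earlier: a tight, structured context beats a full window.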

Law: garbage in, garbage out

Outer loop: evaluation and operational feedback

  • Cache and cost guardrails
  • Small gold set → plug into CI + dashboards
  • Error analysis: re-chunk / tweak filters / tune re-rank prompts
  • Memory compaction: summarize interaction traces into retrievable facts

Tip: spend an evening (pizza night) to build a tiny gold set and wire it into CI and dashboards.
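A gold set small enough to build in an evening can be wired into CI as a plain assertion. The retriever, documents, and queries below are stubs; the shape of the check is the point:

```python
# Hypothetical gold set: each query maps to the doc id that must be recalled.
GOLD = [
    ("install pymilvus", "doc-install"),
    ("hybrid search syntax", "doc-search"),
]

INDEX = {
    "doc-install": "how to install pymilvus with pip",
    "doc-search": "hybrid search syntax and parameters",
    "doc-misc": "unrelated release notes",
}

def retrieve(query: str, k: int = 2):
    # Stub retriever: rank docs by shared words with the query.
    q = set(query.lower().split())
    ranked = sorted(INDEX, key=lambda d: len(q & set(INDEX[d].split())), reverse=True)
    return ranked[:k]

def recall_at_k(k: int = 2) -> float:
    hits = sum(1 for query, doc_id in GOLD if doc_id in retrieve(query, k))
    return hits / len(GOLD)

# In CI, fail the build when retrieval quality regresses, e.g.:
# assert recall_at_k(2) >= 0.9
```

The same number, plotted over time, is the dashboard half of the tip: one metric per pipeline change makes error analysis (re-chunk? tweak filters? tune re-rank prompts?) concrete.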

Process once, reuse everywhere

  • Decouple: knowledge processing ↔ app development
  • Reuse: one Milvus knowledge base serves multiple Dify apps
  • Quality: govern memory in one place, raise the ceiling for all downstream apps

Dify × Milvus: division of labor

Milvus = memory foundation

  • Store/index/recall vectors and metadata efficiently
  • Stable, reliable, scalable

Dify = memory + app platform

  • Knowledge pipeline: build/manage/optimize memory (write to Milvus)
  • Application engine: orchestrate and use memory (Advanced/Agentic RAG)

Dify platform capabilities (one-stop)

  • Prompt engineering and evaluation
  • Knowledge pipeline: parent-child docs, hybrid recall, re-rank
  • Agent orchestration: tool-augmented, visual workflows
  • Full lifecycle ops: logs, labeling, analytics

Knowledge Pipeline core capabilities

Enterprise connectors

  • Local files: 30+ formats (PDF, Word, Excel, etc.)
  • Cloud storage: Google Drive, S3, Azure Blob, etc.
  • Online docs: Notion, Confluence, SharePoint
  • Web crawling: Firecrawl, Jina, Bright Data

Visual debugging & orchestration

  • Canvas orchestration: connect sources → process documents
  • Live debugging: step testing, inspect intermediate variables
  • Standardized pipelines: publish into managed flows

Prebuilt templates & flows

  • General document: cost-effective indexing for bulk corpora
  • Long document: parent-child chunking to keep precision + global context
  • Table extraction: build structured QA pairs
  • Complex PDF parsing: targeted chart/figure extraction
  • Multimodal enrichment: LLM-generated chart descriptions for better recall

Pipeline core steps: Extract → Transform → Load

Enterprise value

Lower the barrier

  • Business teams can participate directly
  • Visual debugging to spot issues fast
  • Engineers focus on core product work

Boost efficiency

  • Templatized flows are reusable
  • Swap components flexibly
  • Stable architecture cuts maintenance cost

Vision: make enterprise unstructured data processing simple, reliable, and efficient

Summary & actions

  • RAG is evolving from static "state" to dynamic "memory."
  • The ceiling of memory is set by the knowledge pipeline and outer-loop evaluation.
  • Dify × Milvus provide an end-to-end path to build, store, and use memory.

Thank you

Contact: banana@dify.ai · GitHub: dify