
Outlook Index Overall Architecture Design

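This document no longer splits the architecture into many small, scattered diagrams; it consolidates everything into one main diagram. The goal is for a single diagram to show the reader:

  • where the system boundary lies
  • how the frontend is layered internally
  • how an upload enters a document-recognition run
  • how results become the field panel and document highlights
  • how history review restores the original document and the field revision timeline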

Main diagram

flowchart TB
  User[Internal operator]

  subgraph Frontend[Outlook Index frontend workbench]
    direction TB

    App[app.tsx]
    Workspace[FusionWorkspace]

    subgraph UI[UI layer]
      direction LR
      Tabs[WorkspaceTabs]
      Header[SubHeader]
      ViewerShell[DocViewer]
      ResultShell[ExtractionForm]
    end

    subgraph State[Workbench state layer]
      direction LR
      AgentState[selectedRuntimeAgentId and agentRecordList]
      RuntimeState[runtimeAgentDetail and uploadSlotResolution]
      UploadState[selectedUploadSlotKey and stagedDocumentSource]
      RunState[activeRun runRecordList pollingRunId]
      ReplayState[hydratedRunDocumentSource and hydratedRunDocumentCacheKey]
      HistoryState[selectedHistoryFieldId and fieldReviewTimeline]
      SelectionState[selectedTargetId currentPageNumber boundingBoxesVisible]
    end

    subgraph Adapters[Frontend adapter layer]
      direction LR
      DocumentRecognitionApi[shared/api/documentRecognition.ts]
      AuthApi[shared/api/auth.ts]
      OutputAdapter[lib/fusion-output.ts]
      GeometryAdapter[lib/bbox.ts]
    end

    subgraph ViewerStack[Document rendering layer]
      direction TB
      ViewerPane[DocumentViewerPane]
      CanvasViewer[DocumentCanvasViewer]
      Overlay[BoundingBoxOverlay]
    end

    App --> Workspace
    Workspace --> Tabs
    Workspace --> Header
    Workspace --> ViewerShell
    Workspace --> ResultShell

    Workspace --> AgentState
    Workspace --> RuntimeState
    Workspace --> UploadState
    Workspace --> RunState
    Workspace --> ReplayState
    Workspace --> HistoryState
    Workspace --> SelectionState

    AgentState --> DocumentRecognitionApi
    RuntimeState --> DocumentRecognitionApi
    UploadState --> DocumentRecognitionApi
    RunState --> DocumentRecognitionApi
    RunState --> OutputAdapter
    OutputAdapter --> GeometryAdapter
    ReplayState --> DocumentRecognitionApi
    HistoryState --> DocumentRecognitionApi

    ViewerShell --> ViewerPane
    ViewerPane --> CanvasViewer
    CanvasViewer --> Overlay

    SelectionState --> ViewerPane
    OutputAdapter --> ResultShell
    OutputAdapter --> SelectionState
    ReplayState --> ViewerPane
    UploadState --> ViewerPane
  end

  subgraph ProxyLayer[Same-origin proxy layer]
    direction TB
    DemoBackend[demo-backend]
  end

  subgraph Backend[Backend execution layer]
    direction TB

    subgraph ApiLayer[API layer]
      direction LR
      AdminAuth[admin auth]
      RuntimeRegistry[document-recognition runtime registry]
      RecognitionRuns[document-recognition runs]
      AssetDownload[source/result download]
    end

    subgraph RuntimeLayer[Runtime layer]
      direction LR
      RunService[run_service]
      Runner[runner]
    end

    subgraph Persistence[Persistence layer]
      direction LR
      RunStore[fusion_runs]
      InputStore[fusion_run_inputs]
      OutputStore[fusion_run_outputs]
      ObjectStore[Object storage]
    end
  end

  User --> App

  DocumentRecognitionApi --> DemoBackend
  AuthApi --> DemoBackend
  DemoBackend --> AdminAuth
  DemoBackend --> RuntimeRegistry
  DemoBackend --> RecognitionRuns
  RecognitionRuns --> AssetDownload

  RecognitionRuns --> RunService
  RunService --> Runner
  RunService --> RunStore
  RunService --> InputStore
  RunService --> OutputStore
  AssetDownload --> ObjectStore
  Runner --> InputStore
  Runner --> OutputStore
  InputStore --> ObjectStore

  OutputStore --> OutputAdapter
  AssetDownload --> ReplayState

How to read this diagram

This diagram shows one end-to-end main path, not a stack of parallel modules.

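  1. After the user enters Outlook Index, every interaction lands on FusionWorkspace first.
  2. FusionWorkspace maintains seven categories of core state: agent, runtime, upload, run, replay, history, and selection.
  3. Pages do not consume raw backend responses directly; they go through the frontend adapter layer:
     • shared/api/documentRecognition.ts handles calls to the canonical document-recognition API
     • lib/fusion-output.ts normalizes results
     • lib/bbox.ts converts geometry to bboxes
  4. Document display is not a single component but a rendering stack:
     • DocViewer
     • DocumentViewerPane
     • DocumentCanvasViewer
     • BoundingBoxOverlay
  5. Outlook Index reaches the backend only through demo-backend; source/result download and fusion_run_outputs in the diagram are internal capabilities of ai_service, not endpoints the frontend connects to directly.
  6. What the backend actually executes is document-recognition orchestration; Fusion is just one of the underlying runtime families.
  7. For history review, the frontend first fetches the document-recognition run detail, field summary, and download URLs through demo-backend, then reads the revision timeline of a single field on demand.
  8. The MANAGE layout only affects how the OPERATE form is displayed; the history inspector reads the complete field_reviews directly, so no field is missed because the layout hides it.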
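
The geometry-to-bbox responsibility of the adapter layer (lib/bbox.ts) can be illustrated with a small sketch. This is not the actual content of lib/bbox.ts, which this document does not show: the `Point` and `BBox` shapes, the function name `polygonToBBox`, and the assumption that geometry arrives as page-normalized (0 to 1) polygon points are all hypothetical.

```typescript
// Hypothetical shapes; the real types in lib/bbox.ts are not shown in this document.
interface Point {
  x: number; // assumed: normalized to the page width, 0..1
  y: number; // assumed: normalized to the page height, 0..1
}

interface BBox {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Convert a polygon in normalized page coordinates into an axis-aligned
// bounding box in pixels for a page rendered at pageWidth x pageHeight.
// Returns null when there is no geometry to draw.
function polygonToBBox(
  points: Point[],
  pageWidth: number,
  pageHeight: number,
): BBox | null {
  if (points.length === 0) return null;

  const xs = points.map((p) => p.x);
  const ys = points.map((p) => p.y);
  const minX = Math.min(...xs);
  const maxX = Math.max(...xs);
  const minY = Math.min(...ys);
  const maxY = Math.max(...ys);

  return {
    left: minX * pageWidth,
    top: minY * pageHeight,
    width: (maxX - minX) * pageWidth,
    height: (maxY - minY) * pageHeight,
  };
}
```

Keeping this conversion in one adapter means the overlay component only ever sees pixel rectangles, whatever geometry format the backend emits.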

Main path

The diagram below keeps only the single most important business path: from upload, to linked results, to history replay.

flowchart LR
  UserSelect[User selects a registered agent by runtime name and picks a file]
  ResolveRuntime[Read runtime detail and upload_slot_resolution]
  CreateRun[POST document-recognition/runs multipart upload]
  PollRun[Poll document-recognition run detail until a terminal state]
  NormalizeOutput[Normalize workspace_output]
  RenderResult[Render field panel and table panel]
  Highlight[Clicking a field generates highlightTarget]
  RenderViewer[Viewer jumps to page and draws bbox]
  OpenHistory[Read document-recognition/runs history]
  FetchBlob[Restore historical document blob via source_document_url/source_pdf_url]
  SelectField[Select a field in the history inspector]
  FetchTimeline[Read field revision timeline on demand]
  ReplayViewer[Rebuild historical preview in the viewer]

  UserSelect --> ResolveRuntime
  ResolveRuntime --> CreateRun
  CreateRun --> PollRun
  PollRun --> NormalizeOutput
  NormalizeOutput --> RenderResult
  RenderResult --> Highlight
  Highlight --> RenderViewer
  PollRun --> OpenHistory
  OpenHistory --> FetchBlob
  FetchBlob --> ReplayViewer
  OpenHistory --> SelectField
  SelectField --> FetchTimeline
  FetchTimeline --> RenderViewer

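This main path explains why the current design has to keep several intermediate layers:

  • Without ResolveRuntime, upload slots could only be hardcoded
  • Without PollRun, the frontend could not handle asynchronous runs
  • Without NormalizeOutput, the result panel and viewer linkage would couple directly to the backend's raw structure
  • Without FetchBlob, a historical run could only be viewed as JSON, with no way to restore the original document preview
  • Without on-demand FetchTimeline, the history view could only show a field's current state and could not explain "why did this field end up like this"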
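
As an illustration of the PollRun step, here is a minimal sketch of a polling loop. Only the four status values (queued, running, completed, failed) come from this document; the names `decidePoll` and `pollRun`, the attempt limit, and the delay schedule are assumptions made for the example, not the workspace's actual implementation.

```typescript
// Status values named in this document; everything else here is hypothetical.
type RunStatus = "queued" | "running" | "completed" | "failed";

type PollDecision =
  | { kind: "wait"; delayMs: number }
  | { kind: "done" }
  | { kind: "failed" }
  | { kind: "timeout" };

// Pure decision step: given the latest status and the number of polls
// already made, decide what the loop should do next.
function decidePoll(
  status: RunStatus,
  attempt: number,
  maxAttempts = 60,
  baseDelayMs = 1000,
): PollDecision {
  if (status === "completed") return { kind: "done" };
  if (status === "failed") return { kind: "failed" };
  if (attempt >= maxAttempts) return { kind: "timeout" };
  // Linear backoff capped at 5 seconds (an assumed schedule).
  const delayMs = Math.min(baseDelayMs * (attempt + 1), 5000);
  return { kind: "wait", delayMs };
}

// Async loop around the pure step. fetchStatus stands in for a call that
// reads the run detail; sleep is injectable so the loop can be tested
// without real timers.
async function pollRun(
  fetchStatus: () => Promise<RunStatus>,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<"done" | "failed" | "timeout"> {
  for (let attempt = 0; ; attempt++) {
    const decision = decidePoll(await fetchStatus(), attempt);
    if (decision.kind !== "wait") return decision.kind;
    await sleep(decision.delayMs);
  }
}
```

Separating the decision from the loop keeps the terminal-state logic testable on its own and makes it easy to stop polling when the user switches runs.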

State machine

If you only want to see how the page "comes alive", this state diagram is enough.

stateDiagram-v2
  [*] --> Boot
  Boot --> AgentReady : runtime agent list loaded
  AgentReady --> ContractReady : runtime name selected and internal agent_id resolved then runtime detail fetched
  ContractReady --> FileStaged : local file selected
  FileStaged --> RunCreated : document-recognition run created
  RunCreated --> Polling : status is queued or running
  RunCreated --> RunCompleted : status is completed
  RunCreated --> RunFailed : status is failed
  Polling --> Polling : keep fetching run
  Polling --> RunCompleted : becomes completed
  Polling --> RunFailed : becomes failed
  RunCompleted --> ResultBrowsing : browse fields tables and bboxes
  ResultBrowsing --> HistorySelected : switch to a historical run
  HistorySelected --> ReplayLoading : download_url present
  HistorySelected --> ReplayMissing : no download_url or object missing
  ReplayLoading --> ReplayReady : blob restored
  ReplayLoading --> ReplayMissing : 404 or object missing
  ReplayReady --> ResultBrowsing : continue viewing results highlights and field timeline
  RunFailed --> FileStaged : re-upload
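
The state diagram above can also be written down as a transition table. The sketch below is one possible encoding, not the workspace's actual code: the state names are taken from the diagram, while the event names (`agentsLoaded`, `statusPending`, and so on) and the `transition` helper are hypothetical.

```typescript
type WorkspaceState =
  | "Boot" | "AgentReady" | "ContractReady" | "FileStaged"
  | "RunCreated" | "Polling" | "RunCompleted" | "RunFailed"
  | "ResultBrowsing" | "HistorySelected"
  | "ReplayLoading" | "ReplayReady" | "ReplayMissing";

// Hypothetical event names, one per labelled edge in the diagram.
type WorkspaceEvent =
  | "agentsLoaded" | "runtimeDetailLoaded" | "fileSelected" | "runCreated"
  | "statusPending" | "statusCompleted" | "statusFailed"
  | "browseResults" | "historyRunSelected"
  | "downloadUrlFound" | "downloadUrlMissing"
  | "blobRestored" | "blobMissing" | "reupload";

type TransitionTable = Record<
  WorkspaceState,
  Partial<Record<WorkspaceEvent, WorkspaceState>>
>;

const TRANSITIONS: TransitionTable = {
  Boot: { agentsLoaded: "AgentReady" },
  AgentReady: { runtimeDetailLoaded: "ContractReady" },
  ContractReady: { fileSelected: "FileStaged" },
  FileStaged: { runCreated: "RunCreated" },
  RunCreated: {
    statusPending: "Polling",
    statusCompleted: "RunCompleted",
    statusFailed: "RunFailed",
  },
  Polling: {
    statusPending: "Polling",
    statusCompleted: "RunCompleted",
    statusFailed: "RunFailed",
  },
  RunCompleted: { browseResults: "ResultBrowsing" },
  RunFailed: { reupload: "FileStaged" },
  ResultBrowsing: { historyRunSelected: "HistorySelected" },
  HistorySelected: {
    downloadUrlFound: "ReplayLoading",
    downloadUrlMissing: "ReplayMissing",
  },
  ReplayLoading: { blobRestored: "ReplayReady", blobMissing: "ReplayMissing" },
  ReplayReady: { browseResults: "ResultBrowsing" },
  ReplayMissing: {}, // the diagram shows no outgoing edge from this state
};

// Events that the diagram does not allow in the current state are ignored.
function transition(state: WorkspaceState, event: WorkspaceEvent): WorkspaceState {
  return TRANSITIONS[state][event] ?? state;
}
```

For example, folding `agentsLoaded`, `runtimeDetailLoaded`, `fileSelected`, `runCreated`, `statusPending`, `statusCompleted` over `Boot` ends in `RunCompleted`, while an out-of-order event such as `fileSelected` in `Boot` leaves the state unchanged.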

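Code entry points

To get down to the source code, only the most important entry points are kept here rather than a long list of files.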

flowchart TB
  Root[Want to change Outlook Index]
  Root --> Workspace[FusionWorkspace.tsx]
  Workspace --> Api[shared/api/documentRecognition.ts]
  Workspace --> Output[lib/fusion-output.ts]
  Workspace --> Viewer[components/viewer]
  Api --> Backend[ai_service/document_recognition/interfaces/http/runs.py]
  Backend --> Runtime[ai_service/document_recognition/application/use_cases/create_runs.py]
  Runtime --> FusionRuntime[ai_service/fusion/application/use_cases/create_run.py and infrastructure/runtime/runner.py]

Key files:

  • outlook-index/src/components/fusion/FusionWorkspace.tsx
  • outlook-index/shared/api/documentRecognition.ts
  • outlook-index/src/lib/fusion-output.ts
  • outlook-index/src/components/viewer/DocumentViewerPane.tsx
  • outlook-index/src/components/viewer/DocumentCanvasViewer.tsx
  • outlook-index/src/components/viewer/BoundingBoxOverlay.tsx
  • demo-backend/app/main.py
  • ai_service/document_recognition/interfaces/http/runs.py

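Design conclusions

At its core, outlook-index is not "a page" but a frontend workbench shell organized around the canonical document-recognition run lifecycle, while keeping Fusion as the underlying runtime:

  • Upstream, the runtime registry and detail API resolve the upload contract
  • Midstream, /document-recognition/runs starts and tracks execution
  • Downstream, the canonical workspace_output and the viewer stack deliver field display and bbox linkage
  • On the history side, the document-recognition download endpoints restore the original document for replay

If the system keeps growing, this main diagram still holds. New capabilities should preferably attach to one of four places:

  • the workbench state layer
  • the frontend adapter layer
  • the viewer rendering layer
  • the document-recognition API and run projection layer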