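Outlook Index Overall Architecture Design
This document no longer splits the architecture into many small scattered diagrams; it consolidates everything into one main diagram. The goal is to let a reader see at a glance:
- where the system boundary lies
- how the frontend is layered internally
- how an upload enters a document-recognition run
- how results become the field area and document highlights
- how history review restores the original document and the field revision timeline
Main diagram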
flowchart TB
User[Internal operator]
subgraph Frontend[Outlook Index frontend workbench]
direction TB
App[app.tsx]
Workspace[FusionWorkspace]
subgraph UI[UI layer]
direction LR
Tabs[WorkspaceTabs]
Header[SubHeader]
ViewerShell[DocViewer]
ResultShell[ExtractionForm]
end
subgraph State[Workbench state layer]
direction LR
AgentState[selectedRuntimeAgentId and agentRecordList]
RuntimeState[runtimeAgentDetail and uploadSlotResolution]
UploadState[selectedUploadSlotKey and stagedDocumentSource]
RunState[activeRun runRecordList pollingRunId]
ReplayState[hydratedRunDocumentSource and hydratedRunDocumentCacheKey]
HistoryState[selectedHistoryFieldId and fieldReviewTimeline]
SelectionState[selectedTargetId currentPageNumber boundingBoxesVisible]
end
subgraph Adapters[Frontend adapter layer]
direction LR
DocumentRecognitionApi[shared/api/documentRecognition.ts]
AuthApi[shared/api/auth.ts]
OutputAdapter[lib/fusion-output.ts]
GeometryAdapter[lib/bbox.ts]
end
subgraph ViewerStack[Document rendering layer]
direction TB
ViewerPane[DocumentViewerPane]
CanvasViewer[DocumentCanvasViewer]
Overlay[BoundingBoxOverlay]
end
App --> Workspace
Workspace --> Tabs
Workspace --> Header
Workspace --> ViewerShell
Workspace --> ResultShell
Workspace --> AgentState
Workspace --> RuntimeState
Workspace --> UploadState
Workspace --> RunState
Workspace --> ReplayState
Workspace --> HistoryState
Workspace --> SelectionState
AgentState --> DocumentRecognitionApi
RuntimeState --> DocumentRecognitionApi
UploadState --> DocumentRecognitionApi
RunState --> DocumentRecognitionApi
RunState --> OutputAdapter
OutputAdapter --> GeometryAdapter
ReplayState --> DocumentRecognitionApi
HistoryState --> DocumentRecognitionApi
ViewerShell --> ViewerPane
ViewerPane --> CanvasViewer
CanvasViewer --> Overlay
SelectionState --> ViewerPane
OutputAdapter --> ResultShell
OutputAdapter --> SelectionState
ReplayState --> ViewerPane
UploadState --> ViewerPane
end
subgraph ProxyLayer[Same-origin proxy layer]
direction TB
DemoBackend[demo-backend]
end
subgraph Backend[Backend execution layer]
direction TB
subgraph ApiLayer[API layer]
direction LR
AdminAuth[admin auth]
RuntimeRegistry[document-recognition runtime registry]
RecognitionRuns[document-recognition runs]
AssetDownload[source/result download]
end
subgraph RuntimeLayer[Runtime layer]
direction LR
RunService[run_service]
Runner[runner]
end
subgraph Persistence[Persistence layer]
direction LR
RunStore[fusion_runs]
InputStore[fusion_run_inputs]
OutputStore[fusion_run_outputs]
ObjectStore[Object storage]
end
end
User --> App
DocumentRecognitionApi --> DemoBackend
AuthApi --> DemoBackend
DemoBackend --> AdminAuth
DemoBackend --> RuntimeRegistry
DemoBackend --> RecognitionRuns
RecognitionRuns --> AssetDownload
RecognitionRuns --> RunService
RunService --> Runner
RunService --> RunStore
RunService --> InputStore
RunService --> OutputStore
AssetDownload --> ObjectStore
Runner --> InputStore
Runner --> OutputStore
InputStore --> ObjectStore
OutputStore --> OutputAdapter
AssetDownload --> ReplayState
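How to read this diagram
The diagram describes one end-to-end main chain, not a stack of parallel modules.
- After the user enters Outlook Index, every interaction lands on FusionWorkspace first
- FusionWorkspace maintains seven kinds of core state: agent, runtime, upload, run, replay, history, selection
- Pages do not consume raw backend responses directly; they go through the frontend adapter layer:
  - shared/api/documentRecognition.ts handles calls to the canonical document-recognition API
  - lib/fusion-output.ts handles result normalization
  - lib/bbox.ts handles converting geometry to bbox
- Document display is not a single component but a rendering stack:
  - DocViewer
  - DocumentViewerPane
  - DocumentCanvasViewer
  - BoundingBoxOverlay
- Outlook Index reaches the backend only through demo-backend; the source/result download and fusion_run_outputs nodes in the diagram are internal capabilities of ai_service, not points the frontend connects to directly
- What the backend actually executes is document-recognition orchestration; Fusion is only one of the underlying runtime families
- During history review, the frontend first fetches the document-recognition run detail, field summary, and download URLs through demo-backend, then reads the revision timeline of a single field on demand
- The MANAGE layout only affects how the OPERATE form is displayed; the history inspector reads the complete field_reviews directly, so no field is missed because the layout hides it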
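The geometry adapter mentioned above can be illustrated with a minimal sketch. The real types in lib/bbox.ts are not shown in this document, so the Point and BBox shapes, the function names, and the assumption that coordinates may arrive normalized to the 0..1 range are all hypothetical.

```typescript
// Hypothetical shapes: the actual lib/bbox.ts types are not part of this document.
interface Point {
  x: number;
  y: number;
}

interface BBox {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Collapse an arbitrary polygon into its axis-aligned bounding box.
// Returns null for an empty polygon so callers can skip drawing.
function polygonToBBox(points: Point[]): BBox | null {
  if (points.length === 0) return null;
  let minX = Infinity;
  let minY = Infinity;
  let maxX = -Infinity;
  let maxY = -Infinity;
  for (const p of points) {
    minX = Math.min(minX, p.x);
    minY = Math.min(minY, p.y);
    maxX = Math.max(maxX, p.x);
    maxY = Math.max(maxY, p.y);
  }
  return { left: minX, top: minY, width: maxX - minX, height: maxY - minY };
}

// Scale a bbox given in normalized page coordinates (assumed 0..1)
// to pixel coordinates for a rendered page of the given size.
function scaleBBox(box: BBox, pageWidth: number, pageHeight: number): BBox {
  return {
    left: box.left * pageWidth,
    top: box.top * pageHeight,
    width: box.width * pageWidth,
    height: box.height * pageHeight,
  };
}
```

Keeping this conversion in one adapter means the overlay component only ever receives rectangles, whatever geometry the backend happens to emit.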
Main chain
The diagram below keeps only the single most important business path: from upload, to result linkage, to history replay.
flowchart LR
UserSelect[User picks a registered agent by runtime name and selects a file]
ResolveRuntime[Read runtime detail and upload_slot_resolution]
CreateRun[POST document-recognition/runs multipart upload]
PollRun[Poll document-recognition run detail until a terminal state]
NormalizeOutput[Normalize workspace_output]
RenderResult[Render field area and table area]
Highlight[Clicking a field produces a highlightTarget]
RenderViewer[Viewer jumps to page and draws bbox]
OpenHistory[Read document-recognition/runs history]
FetchBlob[Restore historical document blob via source_document_url/source_pdf_url]
SelectField[Select a field in the history inspector]
FetchTimeline[Fetch field revision timeline on demand]
ReplayViewer[Rebuild historical preview in the viewer]
UserSelect --> ResolveRuntime
ResolveRuntime --> CreateRun
CreateRun --> PollRun
PollRun --> NormalizeOutput
NormalizeOutput --> RenderResult
RenderResult --> Highlight
Highlight --> RenderViewer
PollRun --> OpenHistory
OpenHistory --> FetchBlob
FetchBlob --> ReplayViewer
OpenHistory --> SelectField
SelectField --> FetchTimeline
FetchTimeline --> RenderViewer
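This main chain explains why the current design has to keep several intermediate layers:
- without ResolveRuntime, upload slots could only be hard-coded
- without PollRun, the frontend could not handle asynchronous runs
- without NormalizeOutput, the result panel and viewer linkage would couple directly to the raw backend structure
- without FetchBlob, a historical run could only be viewed as JSON, with no way to restore the original document preview
- without on-demand FetchTimeline, the history view could only show a field's current state and could not explain "why did this field end up like this"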
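As an illustration of the PollRun step, here is a minimal sketch of a polling loop. The four status values come from the state diagram in this document; the RunDetail shape, the function names, the attempt limit, and the capped linear backoff are assumptions made for the example, not the workbench's actual implementation. The fetcher and the sleep function are injected so the sketch does not depend on any particular HTTP client or timer.

```typescript
type RunStatus = "queued" | "running" | "completed" | "failed";

// Hypothetical minimal view of a run detail response.
interface RunDetail {
  id: string;
  status: RunStatus;
}

type PollStep =
  | { kind: "done"; run: RunDetail }
  | { kind: "wait"; delayMs: number }
  | { kind: "give-up" };

function isTerminal(status: RunStatus): boolean {
  return status === "completed" || status === "failed";
}

// Pure decision function: given the latest run detail and the zero-based
// attempt index, decide whether to stop, wait, or give up.
function nextPollStep(
  run: RunDetail,
  attempt: number,
  maxAttempts: number,
  baseDelayMs: number,
): PollStep {
  if (isTerminal(run.status)) return { kind: "done", run };
  if (attempt + 1 >= maxAttempts) return { kind: "give-up" };
  // Linear backoff capped at 5 seconds (an arbitrary choice for this sketch).
  return { kind: "wait", delayMs: Math.min(baseDelayMs * (attempt + 1), 5000) };
}

// Async driver: fetchRun and sleep are supplied by the caller.
async function pollUntilTerminal(
  fetchRun: () => Promise<RunDetail>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts: number = 60,
  baseDelayMs: number = 500,
): Promise<RunDetail> {
  for (let attempt = 0; ; attempt++) {
    const step = nextPollStep(await fetchRun(), attempt, maxAttempts, baseDelayMs);
    if (step.kind === "done") return step.run;
    if (step.kind === "give-up") {
      throw new Error("run did not reach a terminal state in time");
    }
    await sleep(step.delayMs);
  }
}
```

Separating the decision from the driver keeps the stop and retry rules testable without timers or network calls.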
State machine
If you only want to see how the page "comes alive", this state diagram is enough.
stateDiagram-v2
[*] --> Boot
Boot --> AgentReady : runtime agent list loaded
AgentReady --> ContractReady : select runtime name and resolve internal agent_id then fetch runtime detail
ContractReady --> FileStaged : local file selected
FileStaged --> RunCreated : document-recognition run created
RunCreated --> Polling : status is queued or running
RunCreated --> RunCompleted : status is completed
RunCreated --> RunFailed : status is failed
Polling --> Polling : keep fetching run
Polling --> RunCompleted : reaches completed
Polling --> RunFailed : reaches failed
RunCompleted --> ResultBrowsing : browse fields tables bbox
ResultBrowsing --> HistorySelected : switch to a historical run
HistorySelected --> ReplayLoading : download_url present
HistorySelected --> ReplayMissing : no download_url or object missing
ReplayLoading --> ReplayReady : blob restored
ReplayLoading --> ReplayMissing : 404 or object missing
ReplayReady --> ResultBrowsing : continue viewing results highlights and field timeline
RunFailed --> FileStaged : re-upload
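The state diagram above can also be written as a transition table. This is only a sketch: the state names are taken from the diagram, while the event names, the table layout, and the choice to ignore events that are not valid in the current state are assumptions made for illustration.

```typescript
type WorkspaceState =
  | "Boot" | "AgentReady" | "ContractReady" | "FileStaged"
  | "RunCreated" | "Polling" | "RunCompleted" | "RunFailed"
  | "ResultBrowsing" | "HistorySelected"
  | "ReplayLoading" | "ReplayReady" | "ReplayMissing";

// Hypothetical event names, one per transition label in the diagram.
type WorkspaceEvent =
  | "agentsLoaded" | "runtimeResolved" | "fileSelected" | "runCreated"
  | "runPending" | "runCompleted" | "runFailed" | "browseResult"
  | "historySelected" | "downloadAvailable" | "downloadMissing"
  | "blobRestored" | "continueBrowsing" | "reupload";

type TransitionTable = {
  [S in WorkspaceState]?: { [E in WorkspaceEvent]?: WorkspaceState };
};

const TRANSITIONS: TransitionTable = {
  Boot: { agentsLoaded: "AgentReady" },
  AgentReady: { runtimeResolved: "ContractReady" },
  ContractReady: { fileSelected: "FileStaged" },
  FileStaged: { runCreated: "RunCreated" },
  RunCreated: { runPending: "Polling", runCompleted: "RunCompleted", runFailed: "RunFailed" },
  Polling: { runPending: "Polling", runCompleted: "RunCompleted", runFailed: "RunFailed" },
  RunCompleted: { browseResult: "ResultBrowsing" },
  RunFailed: { reupload: "FileStaged" },
  ResultBrowsing: { historySelected: "HistorySelected" },
  HistorySelected: { downloadAvailable: "ReplayLoading", downloadMissing: "ReplayMissing" },
  ReplayLoading: { blobRestored: "ReplayReady", downloadMissing: "ReplayMissing" },
  ReplayReady: { continueBrowsing: "ResultBrowsing" },
};

// Return the next state, or the current state unchanged when the event
// is not valid in that state (a design choice of this sketch).
function transition(state: WorkspaceState, event: WorkspaceEvent): WorkspaceState {
  const row = TRANSITIONS[state];
  if (row === undefined) return state;
  const next = row[event];
  return next === undefined ? state : next;
}
```

A table like this makes it easy to check that every arrow in the diagram has exactly one entry, and that ReplayMissing has no outgoing transition, as drawn.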
Code entry points
To map this onto source code, only the most critical entry points are kept here, rather than listing many files.
flowchart TB
Root[Want to change Outlook Index]
Root --> Workspace[FusionWorkspace.tsx]
Workspace --> Api[shared/api/documentRecognition.ts]
Workspace --> Output[lib/fusion-output.ts]
Workspace --> Viewer[components/viewer]
Api --> Backend[ai_service/document_recognition/interfaces/http/runs.py]
Backend --> Runtime[ai_service/document_recognition/application/use_cases/create_runs.py]
Runtime --> FusionRuntime[ai_service/fusion/application/use_cases/create_run.py and infrastructure/runtime/runner.py]
Key files:
- outlook-index/src/components/fusion/FusionWorkspace.tsx
- outlook-index/shared/api/documentRecognition.ts
- outlook-index/src/lib/fusion-output.ts
- outlook-index/src/components/viewer/DocumentViewerPane.tsx
- outlook-index/src/components/viewer/DocumentCanvasViewer.tsx
- outlook-index/src/components/viewer/BoundingBoxOverlay.tsx
- demo-backend/app/main.py
- ai_service/document_recognition/interfaces/http/runs.py
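Design conclusion
The essence of outlook-index is not "a page" but a frontend workbench shell organized around the canonical document-recognition run lifecycle, while keeping Fusion as the underlying runtime:
- upstream, the runtime registry and detail API resolve the upload contract
- midstream, /document-recognition/runs starts and tracks execution
- downstream, the canonical workspace_output and the viewer stack deliver field display and bbox linkage
- on the history side, the document-recognition download endpoint restores the original document for replay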
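The downstream linkage between fields and the viewer can be sketched as follows. The schema of workspace_output is not shown in this document, so RawField, RawWorkspaceOutput, and the targetId format are hypothetical; only the selection state keys (selectedTargetId, currentPageNumber, boundingBoxesVisible) are taken from the main diagram.

```typescript
type Rect = [number, number, number, number];

// Hypothetical raw shape; the real workspace_output schema may differ.
interface RawField {
  field_id: string;
  label: string;
  value: string | null;
  page_number?: number;
  bbox?: Rect;
}

interface RawWorkspaceOutput {
  fields: RawField[];
}

interface UiField {
  id: string;
  label: string;
  value: string;
  targetId: string | null; // null when the field has no geometry
}

interface HighlightTarget {
  targetId: string;
  pageNumber: number;
  bbox: Rect;
}

interface SelectionState {
  selectedTargetId: string | null;
  currentPageNumber: number;
  boundingBoxesVisible: boolean;
}

// Normalize the raw output once, so the form and the viewer share one shape.
function normalizeWorkspaceOutput(raw: RawWorkspaceOutput): {
  fields: UiField[];
  targets: Record<string, HighlightTarget>;
} {
  const fields: UiField[] = [];
  const targets: Record<string, HighlightTarget> = {};
  for (const f of raw.fields) {
    let targetId: string | null = null;
    if (f.page_number !== undefined && f.bbox !== undefined) {
      targetId = "field:" + f.field_id;
      targets[targetId] = { targetId, pageNumber: f.page_number, bbox: f.bbox };
    }
    fields.push({ id: f.field_id, label: f.label, value: f.value === null ? "" : f.value, targetId });
  }
  return { fields, targets };
}

// Clicking a field: jump to its page and show its box, or clear the selection
// when the field carries no geometry.
function selectField(
  state: SelectionState,
  field: UiField,
  targets: Record<string, HighlightTarget>,
): SelectionState {
  const target = field.targetId === null ? undefined : targets[field.targetId];
  if (target === undefined) return { ...state, selectedTargetId: null };
  return {
    selectedTargetId: target.targetId,
    currentPageNumber: target.pageNumber,
    boundingBoxesVisible: true,
  };
}
```

Under these assumptions, the form and the viewer are linked only through a target id, which is what lets either side change without touching the other.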
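If the system is extended later, this main diagram still holds. New capabilities should attach to one of four places first:
- the workbench state layer
- the frontend adapter layer
- the viewer rendering layer
- the document-recognition API and run projection layer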