Document Recognition Reference

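This page answers three questions: which modules make up document recognition today, which APIs it exposes, and how review persistence is expressed.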

Main modules

| Path | Purpose |
| --- | --- |
| ai_service/document_recognition/domain/ | Document recognition run / review / issue / summary models |
| ai_service/document_recognition/application/ports/ | Repository and asset store abstractions |
| ai_service/document_recognition/application/use_cases/create_runs.py | Use cases for runtime metadata resolution and canonical run creation |
| ai_service/document_recognition/application/use_cases/review_runs.py | Use cases for Fusion run projection and field review |
| ai_service/document_recognition/application/projections.py | Normalization of summary / issue / field review |
| ai_service/document_recognition/infrastructure/persistence/document_recognition_repository.py | SQLAlchemy repository adapter |
| ai_service/document_recognition/infrastructure/persistence/legacy_document_extraction_job_bridge.py | Bridge to the legacy review storage |
| ai_service/document_recognition/infrastructure/storage/minio_document_asset_store.py | Reads and writes document assets |
| ai_service/document_recognition/interfaces/http/router.py | HTTP aggregation entry point |
| ai_service/document_recognition/interfaces/http/runs.py | run / review API |
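The split between application/ports/ and infrastructure/persistence/ can be pictured with a minimal sketch. Every name below (Run, RunRepository, InMemoryRunRepository, correct_field) is hypothetical and not taken from the codebase; the sketch only illustrates how a use case can depend on a port while a concrete adapter, such as the SQLAlchemy one, lives elsewhere.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol


@dataclass
class Run:
    # Hypothetical, trimmed-down stand-in for the domain run model.
    run_id: str
    runtime_agent_id: str
    field_values: dict[str, str] = field(default_factory=dict)


class RunRepository(Protocol):
    # Hypothetical port: the only persistence surface a use case sees.
    def get_run(self, run_id: str) -> Optional[Run]: ...

    def save_run(self, run: Run) -> None: ...


class InMemoryRunRepository:
    # Test adapter; a SQLAlchemy adapter would satisfy the same port.
    def __init__(self) -> None:
        self._runs: dict[str, Run] = {}

    def get_run(self, run_id: str) -> Optional[Run]:
        return self._runs.get(run_id)

    def save_run(self, run: Run) -> None:
        self._runs[run.run_id] = run


def correct_field(repo: RunRepository, run_id: str, field_id: str, value: str) -> Run:
    # Use-case style function: depends on the port, not on an ORM.
    run = repo.get_run(run_id)
    if run is None:
        raise KeyError(f"unknown run: {run_id}")
    run.field_values[field_id] = value
    repo.save_run(run)
    return run
```

Because correct_field only knows the port, it can be exercised against the in-memory adapter in unit tests without a database.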

Current API surface

| Endpoint | Purpose |
| --- | --- |
| GET /document-recognition/runtime-agents | List the currently registered, selectable Fusion runtime agents |
| GET /document-recognition/runtime-agents/{runtime_agent_id} | Show a runtime's published version, upload slot resolution, and execution policy summary |
| POST /document-recognition/runs | Upload files and create a run through the canonical document-recognition entry point |
| GET /document-recognition/runs | List the document recognition runs that can be projected |
| GET /document-recognition/runs/{run_id} | Show a single run's source, summary, fields, issues, review status, and workspace_output |
| PATCH /document-recognition/runs/{run_id}/field-reviews/{field_id} | Correct a field review |
| GET /document-recognition/runs/{run_id}/field-reviews/{field_id}/revisions | Read a single field's baseline and revision timeline on demand |
| GET /document-recognition/runs/{run_id}/source-document | Download the original source document |
| GET /document-recognition/runs/{run_id}/source-pdf | Download the PDF source |
| GET /document-recognition/runs/{run_id}/result | Download the structured result JSON |
| GET /admin/document-recognition/overview | Studio overview |
| GET /admin/document-recognition/runs | Admin run records |
| GET /admin/document-recognition/runtime-agents | Show the current registry |
| PUT /admin/document-recognition/runtime-agents/{agent_id} | Register a Fusion agent that can be used for document recognition |
| DELETE /admin/document-recognition/runtime-agents/{agent_id} | Remove a Fusion agent from the registry |

Registry PUT / DELETE use agent_id as the stable identifier, require the administrator to hold document_recognition.write, and write an admin audit entry. This path does not require x-admin-challenge-token.
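As a rough client-side illustration of the routes listed above, the sketch below builds a PATCH request for a field review with the Python standard library. The helper name build_field_review_patch, the base URL, and the payload keys are assumptions; the request body schema is not described on this page, so the payload is passed in by the caller. Authentication is not shown.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_field_review_patch(base_url: str, run_id: str, field_id: str, payload: dict) -> Request:
    # Percent-encode path segments so IDs containing "/" or spaces stay inside one segment.
    path = (
        f"/document-recognition/runs/{quote(run_id, safe='')}"
        f"/field-reviews/{quote(field_id, safe='')}"
    )
    # The payload schema is an assumption; check interfaces/http/schemas.py for the real one.
    body = json.dumps(payload).encode("utf-8")
    return Request(
        base_url.rstrip("/") + path,
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )
```

The returned Request is not sent until it is passed to urllib.request.urlopen, which makes the helper easy to test without a running service.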

Run Detail Response

The run detail response returns:

  • runtime_agent_id
  • runtime_agent_version_id
  • runtime_agent_type_snapshot
  • source_document_url
  • source_pdf_url
  • summary
  • field_reviews
  • issue_list
  • preview_pages
  • workspace_output

Each field_reviews[] entry carries only a lightweight revision summary:

  • revision_count
  • is_changed_from_extracted
  • last_revised_at
  • last_revised_by

The full per-field ledger must be read separately via GET /document-recognition/runs/{run_id}/field-reviews/{field_id}/revisions.
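A client might use the lightweight summary to decide which fields deserve a follow-up timeline request. The sketch below is one way to do that. The four summary keys come from the list above, but their value types, the field_id key on each entry, and the helper name fields_with_history are assumptions.

```python
from typing import Optional, TypedDict


class FieldRevisionSummary(TypedDict):
    # Key names follow the run detail response; value types are assumed.
    revision_count: int
    is_changed_from_extracted: bool
    last_revised_at: Optional[str]
    last_revised_by: Optional[str]


def fields_with_history(run_detail: dict) -> list[str]:
    # Return the IDs of fields that have at least one recorded revision,
    # i.e. the fields whose /revisions timeline is worth fetching.
    result = []
    for review in run_detail.get("field_reviews", []):
        if review.get("revision_count", 0) > 0:
            result.append(review["field_id"])  # "field_id" key is an assumption
    return result
```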

Field Revision Timeline

The single-field timeline response returns:

  • baseline extracted value
  • the current current_value / current_review_status / current_reviewer_note
  • history_status, whose value is either recorded or unrecorded
  • append-only revisions[]

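For old runs created before this feature shipped, which have no ledger, the server returns a baseline snapshot with history_status=unrecorded. The frontend should display this as "history not recorded" rather than as a blank.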
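The display rule for history_status can be sketched as a small helper. It is written in Python for consistency with the service code, although a real frontend would likely implement it in its own language. The function name history_label and the exact label strings are assumptions; only the history_status values and the revisions[] key come from the response described above.

```python
def history_label(timeline: dict) -> str:
    # Legacy runs without a ledger: say so explicitly instead of rendering nothing.
    if timeline.get("history_status") == "unrecorded":
        return "History not recorded"
    revisions = timeline.get("revisions") or []
    if not revisions:
        # Ledger exists but nobody has revised the field yet.
        return "No revisions yet"
    return f"{len(revisions)} revision(s)"
```

Keeping the unrecorded case separate from the empty-ledger case lets a reviewer tell "nobody changed this" apart from "changes may have happened but were not tracked".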

Application boundary notes

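  • The Fusion subsystem is responsible for creating and executing runs.
  • document_recognition is responsible for turning Fusion output into a review projection.
  • run detail returns only the lightweight field revision summary by default; the timeline is read separately on demand, which keeps the default payload small.
  • Whether a runtime agent belongs to document recognition is declared explicitly in the admin registry.
  • Compatibility with the legacy review storage is handled only inside the infrastructure bridge.
  • /document-recognition/runtime-agents* and /document-recognition/runs* are both public routes; no additional prefix aliases are maintained.
  • /agents/{agent_id}/extraction-jobs is no longer registered by the ai_service/document_recognition package.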

Typical entry points when reading the code

To change the mapping from Fusion output to field reviews

application/projections.py

To change projection persistence

application/use_cases/review_runs.py and infrastructure/persistence/legacy_document_extraction_job_bridge.py

To change the fields the API returns

See:

  1. interfaces/http/schemas.py
  2. interfaces/http/serialization.py

Related documents