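Interface Contracts (Implemented)

Scope

This page describes only the interface contracts and interaction conventions that are implemented in the current code and ready for integration.

For the list of endpoints, refer to "Endpoint Overview" (maintained in a single place).
For planned capabilities, see "Chameleon → API Roadmap".

Quick Links

Endpoint Overview: quick lookup table by route (HTTP / SSE / WebSocket).
API Service (mkdocstrings): ChatRequest, session/Turn/context response models, WebSocket message models, and so on.
Storage Models (mkdocstrings): persistence models such as Session, Message, Agent, and KnowledgeSource.
AI Service Mind Map (panoramic navigation): helps build a mental model quickly.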

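MCP Outbound Capability Conventions

This version supports only the outbound MCP Client capability (the Agent consumes external MCP Servers); it does not include an inbound MCP Server.
MCP capabilities are resolved and mounted per agent_id: when no MCP mount is configured, chat behavior stays the same as the existing flow.
Two transports are currently supported: stdio and http_sse.
http_sse supports a URL Import mode that maps a plain HTTP URL to a single-tool capability.
Credentials are stored in the database only in encrypted form; the API never returns a plaintext secret.
Tool-call audit records can be queried per session and per Agent.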

Fusion Public Access Contract

Fusion Agents now support explicit public access control, with anonymous discovery separated from credentialed invocation:

public_access_mode accepts only admin_only, public_anonymous, and public_api_key
public_api_key is allowed only for agent_type="fusion"
GET /public/agents returns only Agents with status="active" and public_access_mode="public_anonymous"
Agents in public_api_key mode do not appear in the anonymous catalog, but they can be invoked through the public Fusion runtime / run routes together with x-agent-api-key

The current contract boundaries for public Fusion are:

GET /public/agents/{agent_id}/fusion-runtime
Returns runtime metadata derived from the published AgentVersion
Does not return the live Fusion draft, prompt, or governance context
POST /public/agents/{agent_id}/fusion-runs
Allows anonymous or API key invocation, depending on public_access_mode
If the request explicitly passes agent_version_id, the value must equal the current published version
GET /public/agents/{agent_id}/fusion-runs
GET /public/agents/{agent_id}/fusion-runs/{run_id}
GET /public/agents/{agent_id}/fusion-runs/{run_id}/inputs/{input_id}/download
The public read paths above expose only runs where agent_version_id == published_agent_version_id; historical snapshot runs are not public

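The Fusion Agent API key management contract is:

POST /agents/{agent_id}/public-api-keys
The plaintext key is returned exactly once, in the successful creation response
The server stores only token_hash and metadata
GET /agents/{agent_id}/public-api-keys
Returns only metadata such as label, prefix, expiry time, revocation status, and last-used time
POST /agents/{agent_id}/public-api-keys/{key_id}/revoke
After revocation the key stops working immediately, but historical runs keep their authenticated_agent_api_key_id association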
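
The "plaintext once, hash at rest" rule above can be illustrated with a minimal sketch. The key format (`ak_` prefix), the use of SHA-256, and the record layout are assumptions made for illustration; the source only states that the server keeps token_hash plus metadata.

```python
import hashlib
import secrets


def create_public_api_key(label: str):
    """Return (plaintext_key, stored_record); only the record would be persisted."""
    plaintext = "ak_" + secrets.token_urlsafe(32)  # hypothetical key format
    record = {
        "label": label,
        "prefix": plaintext[:8],  # safe to show in list responses
        # Assumed hash algorithm; the source only names the token_hash field.
        "token_hash": hashlib.sha256(plaintext.encode("utf-8")).hexdigest(),
        "revoked": False,
    }
    return plaintext, record


def verify_public_api_key(presented: str, record: dict) -> bool:
    """A revoked key fails immediately; otherwise compare hashes in constant time."""
    if record["revoked"]:
        return False
    digest = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return secrets.compare_digest(digest, record["token_hash"])
```

Because only the digest is stored, a list endpoint built on such a record has nothing but metadata to return, which matches the GET contract above.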

Caller conventions:

When public_access_mode=public_api_key, public Fusion requests must carry x-agent-api-key
A missing, expired, or revoked key returns 401
Public responses never return governance_context, trace_id, trace_url, prompt_messages, or model_raw_response
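
The access rules in this section can be summarized as a small decision function. This is a sketch under assumptions: the function names and the string outcomes are hypothetical, and the outcome for admin_only on public routes is labeled generically because the source does not state a status code for it.

```python
from typing import Optional


def listed_in_anonymous_catalog(status: str, public_access_mode: str) -> bool:
    """GET /public/agents lists only active, anonymously public Agents."""
    return status == "active" and public_access_mode == "public_anonymous"


def decide_public_access(public_access_mode: str, key_is_valid: Optional[bool]) -> str:
    """key_is_valid is None when x-agent-api-key is missing,
    False when the key is expired or revoked, True when it is usable."""
    if public_access_mode == "public_anonymous":
        return "allow"
    if public_access_mode == "public_api_key":
        return "allow" if key_is_valid else "reject_401"
    # admin_only: not reachable through public routes (exact status is unspecified).
    return "not_public"
```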

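I. General Conventions

1) Encoding and time format

All JSON requests and responses are UTF-8 encoded.
Time fields are uniformly ISO 8601 strings (emitted by Pydantic on the backend).

2) Identifier fields

id, session_id, thread_id, agent_id, source_id, and similar fields are all strings.

3) Session and message concepts

session: the container for one multi-turn conversation, holding state and model configuration.
message: a single message within a session, with a role and content.
turn: one "user input → assistant response" observation unit, including the trace, model parameters, and a context snapshot.

4) Role (`role`)

In the current external interface, role is expressed as a string (no enum type is enforced):

user: end user
assistant: AI reply
agent: human support agent
system: system prompt or error notice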

5) Token Usage conventions

The system now returns a unified token statistics structure for session turns, scheduled-task run views, and evaluation results:

TokenUsageSummary
input_tokens
output_tokens
total_tokens
token_usage_source
token_usage_source is currently always one of the following four values:
provider_reported
tokenizer_estimated
no_model_invocation
unavailable
TokenUsageRollup
Also returns input_tokens/output_tokens/total_tokens
Additionally returns provider_reported_count, tokenizer_estimated_count, no_model_invocation_count, unavailable_count
EstimatedCostSummary
input_cost_usd
output_cost_usd
total_cost_usd
estimated_interaction_count
missing_pricing_count

Where each structure is attached:

Session Turn: generation_usage
Scheduled-task run: derived_generation_usage
Evaluation Run: generation_usage_rollup, judge_usage_rollup, combined_total_tokens
Evaluation sample result: generation_usage, judge_usage
Agent cost overview: additionally returns conversation_estimated_cost, evaluation_generation_estimated_cost, evaluation_judge_estimated_cost, combined_estimated_cost

Pricing estimation rules:

The system accumulates the estimated cost of an interaction only when input/output unit prices are configured in the model config for the corresponding (model_provider, model_name)
Unit price fields are expressed in USD per 1,000,000 tokens
All cost fields are internal estimates; actual prices are determined by the model provider's bill
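
A worked sketch of the pricing rules above. The pricing table layout and the field names input_per_million_usd / output_per_million_usd are hypothetical, and reading missing_pricing_count as "interactions skipped for lack of a configured price" is an interpretation of the field name, not something the source spells out.

```python
def estimate_cost(interactions: list, pricing: dict) -> dict:
    """Accumulate an EstimatedCostSummary-shaped dict.

    pricing maps (model_provider, model_name) to unit prices in USD per 1,000,000 tokens.
    """
    summary = {
        "input_cost_usd": 0.0,
        "output_cost_usd": 0.0,
        "total_cost_usd": 0.0,
        "estimated_interaction_count": 0,
        "missing_pricing_count": 0,
    }
    for item in interactions:
        price = pricing.get((item["model_provider"], item["model_name"]))
        if price is None:
            # No configured unit price: this interaction is not costed.
            summary["missing_pricing_count"] += 1
            continue
        summary["input_cost_usd"] += item["input_tokens"] * price["input_per_million_usd"] / 1_000_000
        summary["output_cost_usd"] += item["output_tokens"] * price["output_per_million_usd"] / 1_000_000
        summary["estimated_interaction_count"] += 1
    summary["total_cost_usd"] = summary["input_cost_usd"] + summary["output_cost_usd"]
    return summary
```

For example, 1,000,000 input tokens at 2.0 USD per million plus 500,000 output tokens at 4.0 USD per million gives 2.0 + 2.0 = 4.0 USD.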

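Write rules are unified as follows:

Prefer the usage returned directly by the provider; if only partial provider usage is obtained within the same turn / evaluation item, the whole record is downgraded to a tokenizer estimate, so that a partial total is never marked as provider_reported
When provider usage is missing, use a local tokenizer estimate
Deterministic / no-model paths write 0 and are marked no_model_invocation
If estimation also fails, unavailable is written explicitly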
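
The write rules above form a fallback chain, sketched below. The function name, the shape of provider_usages (one entry per model call, None when the provider returned no usage), and the use of None for counts in the unavailable case are assumptions for illustration.

```python
from typing import Callable, Optional, Sequence


def _summary(input_tokens, output_tokens, source: str) -> dict:
    total = None if input_tokens is None or output_tokens is None else input_tokens + output_tokens
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": total,
        "token_usage_source": source,
    }


def summarize_usage(
    provider_usages: Sequence[Optional[dict]],
    estimate: Callable[[], Optional[dict]],
    model_invoked: bool = True,
) -> dict:
    if not model_invoked:
        # Deterministic / no-model path: write zeros.
        return _summary(0, 0, "no_model_invocation")
    if provider_usages and all(u is not None for u in provider_usages):
        # Every call reported usage, so the total is complete.
        return _summary(
            sum(u["input_tokens"] for u in provider_usages),
            sum(u["output_tokens"] for u in provider_usages),
            "provider_reported",
        )
    # Missing or partial provider usage: downgrade the whole record to an estimate.
    try:
        estimated = estimate()
    except Exception:
        estimated = None
    if estimated is not None:
        return _summary(estimated["input_tokens"], estimated["output_tokens"], "tokenizer_estimated")
    return _summary(None, None, "unavailable")  # None counts are an assumption
```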

II. AI Service (FastAPI) Integration Contract

1) `GET /healthz`

Purpose: health check
Response (example):

{
  "status": "ok",
  "app_env": "production",
  "app_instance": "aibot-prod",
  "database_fingerprint": "prod-db/chameleon_meta",
  "guard_mode": "strict",
  "guard_state": "ok",
  "guard_messages": []
}

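Notes:

The status field remains backward compatible (still ok).
guard_state values: ok / warning / failed.
database_fingerprint is for troubleshooting only and contains no credentials.

2) `POST /stream` (SSE streaming chat)

The request model is defined by ChatRequest (see API Service (mkdocstrings)).

Request semantics

thread_id identifies the session; if the session does not exist, a new one is created.
model_* fields can override the current session's model configuration (this triggers validation and a database write).
message_type is optional; it tags the inbound message and affects the text sent to the Agent:
Known values: user_chat / form / rule / other
Default: user_chat
Unknown values: normalized to other (no 422 is raised)
Injection rules:
- user_chat: passed through verbatim, with no prefix
- Other types: concatenated in the following format before being sent to the Agent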
```
[message_type=<type>]
<original_message>
```
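agent_id is optional and is used for:
RAG retrieval filtering: retrieval is automatically filtered to the knowledge sources mounted on this Agent
Model defaults: when the request does not explicitly provide model_*, the Agent's default model configuration is used
System prompt: when the Agent has system_prompt configured, it influences the conversation as a system message
Chat runtime selection: when agent_type=chat, the server reads the Agent's orchestrator_key to select the concrete chat orchestrator; legacy rows without one fall back to the default chameleon_chat_v1
Role-based model routing: when agent_type=chat and model_routing_config.enabled=true is configured, the tool stage can use tool_call_model, and the final reply stage chooses deterministically among general_model, complex_task_model, reasoning_model, and the main-model fallback
MCP tool capability: when the Agent has a valid MCP mount configured, the orchestrator can make controlled tool calls
Fixed-response short-circuit: only chat Agents support message_type_response_config; the server first matches the current message_type + message against message_key_responses[], and on a miss falls back to the default enabled text for that message_type; on a hit it returns the configured text directly, without entering the orchestrator / LLM / MCP, and without applying this request's model overrides to session state. WebSocket chat sessions follow the [message_type=<type>]\n<body> content convention; the server parses the tag before entering session runtime orchestration and uses the <body> with the tag stripped for keyed response matching.
MCP quick-match short-circuit: only chat Agents support mcp_response_config.quick_match_rules[]; when message_type=user_chat and the raw message matches a prefix rule, the server calls a mounted MCP tool once directly at the /stream API layer and returns either the direct-response formatted result or raw JSON, without entering the orchestrator / final LLM

Request example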

{
  "thread_id": "thread_123",
  "agent_id": "agent_abc",
  "message": "请根据这个表单内容生成一段总结。",
  "message_type": "form"
}

Note: if an unknown value is passed (for example "message_type":"foo"), the server normalizes it to other and applies the injection rule for other.
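
The normalization and injection rules for message_type can be expressed in a few lines. The function names here are hypothetical; the known values, the default, and the tag format come from the rules above.

```python
from typing import Optional

KNOWN_MESSAGE_TYPES = {"user_chat", "form", "rule", "other"}


def normalize_message_type(value: Optional[str]) -> str:
    """Missing values default to user_chat; unknown values become other."""
    if value is None:
        return "user_chat"
    return value if value in KNOWN_MESSAGE_TYPES else "other"


def build_agent_text(message: str, message_type: Optional[str] = None) -> str:
    """Return the text that would be sent to the Agent."""
    normalized = normalize_message_type(message_type)
    if normalized == "user_chat":
        return message  # passed through verbatim
    return f"[message_type={normalized}]\n{message}"
```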

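Model defaults and override priority

When the request includes agent_id, model parameters are resolved in the following priority order:

model_name, model_provider, and model_temperature explicitly provided in the request
The chat Agent's model_routing_config (only when there is no explicit override and enabled=true; decisions are made per stage)
The Agent's default main model configuration: model_name, model_provider, model_temperature
The system-wide default configuration

Additional notes:

If the request explicitly provides any model_* field, the current execution collapses to a single-model path; model_routing_config remains stored as Agent configuration but does not take part in stage routing for this request.
If a role slot in model_routing_config is empty, that stage automatically falls back to the Agent's main model triple.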
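
A minimal sketch of the priority order above. The dict layout (a main_model key on the Agent, role slots keyed directly under model_routing_config) is hypothetical, and filling the unspecified fields of a partial explicit override from the Agent's main model is an assumption; the source does not say how a partial override is completed.

```python
MODEL_FIELDS = ("model_name", "model_provider", "model_temperature")


def resolve_stage_model(request: dict, agent: dict, global_default: dict, stage_role: str) -> dict:
    """Pick the model triple for one stage (for example "tool_call_model")."""
    # Priority 3 and 4: Agent main model, else the system-wide default.
    base = agent.get("main_model") or global_default

    # Priority 1: any explicit model_* field collapses routing to a single model.
    explicit = {f: request[f] for f in MODEL_FIELDS if request.get(f) is not None}
    if explicit:
        return {**base, **explicit}

    # Priority 2: per-stage routing, only for chat Agents with routing enabled.
    routing = agent.get("model_routing_config") or {}
    if agent.get("agent_type") == "chat" and routing.get("enabled"):
        # An empty role slot falls back to the main model triple.
        return routing.get(stage_role) or base

    return base
```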

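SSE response event format

The server emits events as text/event-stream; each event has the format: data: {json}\n\n.

{"type":"token","content":"..."}: a streamed output fragment
{"type":"error","content":"..."}: a validation or runtime error
{"type":"done"}: end of this output

Note: the current implementation splits the complete reply on spaces into fragments and simulates streaming output; whether the request succeeds or fails, done is always sent last.

Note: if the request hits the Agent's message_type_response_config fixed response or the mcp_response_config.quick_match_rules[] fast path, the SSE event shape is unchanged and the stream still returns token / done; the only difference is that the reply is produced deterministically by the server rather than generated by the final model. Both fixed responses and quick-match apply only to chat Agents.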

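WebSocket notes:

WS /ws/{session_id} and WS /public/ws/{session_id} support the same message-type fixed response semantics.
When WebSocket content starts with [message_type=form]\n, [message_type=rule]\n, or [message_type=other]\n, the server first parses the message type and uses the body after the tag for exact matching against message_key_responses[].
If a fixed response is hit, the server broadcasts normal message events: one user message and one assistant fixed response; this path does not enter RAG, MCP, or the LLM, and does not write model overrides.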
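Turn-level observability notes:

context_payload.model_routing in GET /sessions/{session_id}/turns/{turn_id}/context returns the routing decisions actually made for this turn; the current fields are:
route_mode
policy_version
explicit_override_active
stage_decisions[]
final_model_role
final_model_config
The flat model_name/model_provider/model_temperature fields on ConversationTurn still represent the model actually used in the final reply stage; intermediate stages are not expanded there.

SSE response example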

data: {"type":"token","content":"这是"}

data: {"type":"token","content":"总结"}

data: {"type":"done"}
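
On the client side, a stream like the one above can be consumed with a small parser. This sketch works on an iterable of already-decoded text lines, so it is independent of any particular HTTP client; the function names are hypothetical. Fragments are returned as a list rather than joined, because the source does not say whether separators are included in each fragment.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from data: lines, stopping after done."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separator lines and anything else are skipped
        event = json.loads(line[len("data:"):].strip())
        yield event
        if event.get("type") == "done":
            return


def collect_reply(lines: Iterable[str]):
    """Return (token_fragments, error_messages) for one streamed reply."""
    tokens, errors = [], []
    for event in iter_sse_events(lines):
        if event.get("type") == "token":
            tokens.append(event.get("content", ""))
        elif event.get("type") == "error":
            errors.append(event.get("content", ""))
    return tokens, errors
```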

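3) `WS /ws/{session_id}` (WebSocket session)

Inbound/outbound models are defined by IncomingWebSocketMessage / OutgoingWebSocketMessage (see API Service (mkdocstrings)).

Behavior after the connection is established

Once the connection is established, the server immediately sends:
type=history: recent messages (currently capped at 200)
type=session: current session state, including status, human_takeover_enabled, the owner snapshot, and timestamps
Supported query parameters:
role=customer|operator
operator_id: required when a role=operator connection performs takeover / release / manual replies; may be omitted for observe-only connections
Optional operator_name

Message types (outbound)

type=history: replay of historical messages
type=message: a single message (echo of a persisted inbound message, or of a persisted AI-generated reply)
type=system: status-type system event (for example, a state change caused by takeover/release)
type=takeover_event: broadcast of a persisted takeover audit event
type=error: validation or runtime error

Events (inbound `event`)

If event is empty but content is non-empty, the server treats it as a user message event (equivalent to user_message).
When event is agent_takeover / agent_release:
Only role=operator connections can trigger it, and operator_id must be provided in the websocket query parameters
The control route directly updates the session state (AI_ACTIVE / HUMAN_ACTIVE), then broadcasts session / system / takeover_event
The server additionally sends one type=system carrying the current state: {"status": "..."}
When the session is in HUMAN_ACTIVE:
Customer messages are still written to messages and broadcast to the room
The AI does not reply automatically
No new conversation_turns are created
Manual replies must be sent from the active owner's role=operator connection and are persisted as messages.role="agent"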

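3.1) Session history observability endpoints

GET /sessions
Returns a pagination envelope: items, next_cursor, limit
Supported filter parameters: agent_id, q, date_from, date_to
Each item returns agent_id, optional agent_name, human_takeover_enabled, takeover_owner_*, takeover_started_at, takeover_released_at
When sessions.agent_id is empty, the server falls back to the session's "most recent non-empty turn.agent_id" as agent_id
The agent_id filter matches both the session-level binding and the fallback result above (no historical data migration is needed)
GET /sessions/{session_id}/messages
Manual replies additionally return sender_id_snapshot and sender_name_snapshot
GET /sessions/{session_id}/takeover-events
Returns the durable audit stream for the session; current event_type values are takeover, force_takeover, release, manual_message
GET /sessions/{session_id}/turns
Returns the turn timeline for the session; supports limit/cursor
Each turn contains previews of the user message and assistant reply, the trace, a model configuration snapshot, and generation_usage
If the turn already has a correction, a correction summary is attached; it always returns status, revision_number, manual_reply_message_id, published_at, updated_at
If the current administrator also has knowledge.read or knowledge.write, the correction summary additionally returns source_id/source_name, document_id, ingestion_job_id, ingestion_job_status(_message), and last_error_message
GET /sessions/{session_id}/turns/{turn_id}/context
Returns the turn context snapshot: context_payload, rag_context, skill_context, mcp_context
The turn sub-object also contains generation_usage
Returns trace-linked audit records: skill_calls, mcp_calls
turn.correction follows the same knowledge-field redaction rules as GET /sessions/{session_id}/turns
GET /sessions/{session_id}/turns/{turn_id}/correction
Returns the full correction draft: original_user_message, original_ai_answer, corrected_question, corrected_answer, correction_note
Also returns source_id/source_name, manual_reply_message_id, document_id, ingestion_job_id, ingestion_job_status(_message), last_error_message
This endpoint requires both conversations.read and knowledge.read
PATCH /sessions/{session_id}/turns/{turn_id}/correction
The request body requires corrected_question and corrected_answer
manual_reply_message_id is optional; it is filled in only after a corrected reply has been sent in the current session
POST /sessions/{session_id}/turns/{turn_id}/correction/publish
Accepts no additional request body; what gets published is the currently saved correction draft
On first publish, the Agent's managed correction knowledge source is resolved or created automatically
The synchronous response only means "the publish job has started"; when status=publishing is returned, the new correction is not yet searchable and must wait for the background ingestion_job to succeed
Each publish ingests only the candidate correction document for that publish; if an older publish job finishes after a newer one, the older result does not roll back or overwrite the newer publish state
GET /knowledge-sources/{source_id}/corrections
Returns the complete correction publish ledger under that managed knowledge source
This endpoint requires both knowledge.read and conversations.read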
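
A client can walk the pagination envelope returned by GET /sessions with a small generator. The fetch_page callable is a placeholder for whatever HTTP client the integrator uses, and the request parameter name cursor is an assumption: the source documents next_cursor in the response and limit/cursor on the turns endpoint, but does not name the request parameter for /sessions.

```python
from typing import Callable, Iterator


def iter_sessions(fetch_page: Callable[[dict], dict], **filters) -> Iterator[dict]:
    """Yield every session item across pages until next_cursor is empty."""
    cursor = None
    while True:
        params = dict(filters)  # for example agent_id, q, date_from, date_to, limit
        if cursor is not None:
            params["cursor"] = cursor  # assumed request parameter name
        page = fetch_page(params)
        yield from page["items"]
        cursor = page.get("next_cursor")
        if not cursor:
            break
```

Injecting fetch_page keeps the loop testable without a running server; in real use it would issue the GET request and return the decoded JSON body.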

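4) Agents / knowledge sources / documents and ingestion

This group of endpoints covers Agent management, knowledge source management, document upload, and ingestion:

For the list of endpoints, refer to "Endpoint Overview".
For request/response models, refer to "API Service (mkdocstrings)".
Agent configuration can still be updated after creation (including agent_type), but the update is rejected when incompatible running tasks exist.
agent_type is a coarse-grained product boundary; chat runtime versions are expressed through orchestrator_key rather than by adding new agent_type values.
orchestrator_key applies only to chat Agents. On create, update, or rollback to a chat snapshot, if the field is empty the server fills in the default chameleon_chat_v1; non-chat Agents do not accept this field.
model_routing_config applies only to chat Agents. The structure allows separate models to be stored for tool_call_model, general_model, complex_task_model, and reasoning_model; empty slots continue to fall back to the Agent's main model.
When model_routing_config.collapse_to_explicit_override=true, an explicit request-level or evaluation-level model_* override collapses the execution to a single-model path. This is the current default compatibility policy.
GET /chat-orchestrators returns metadata for the currently registered chat orchestrators, for use in control-plane dropdowns; it should not be hard-coded in the frontend.
All chat runtimes must continue to accept OrchestratorInput and return OrchestratorResult; runtime differences are internal server implementation details and do not extend the API request body.
All chat runtimes should emit a unified minimal observability payload; in the current version the field is marked observability_schema_version = "v2".
When RAG hits a managed correction source, rag_context[] in turn observability also carries source_name, source_kind, source_priority, document_id, and document_origin, to explain why the final answer prioritized the correction entry.
Approval checkpoints and long-term memory are internal runtime inputs:
No API-level request fields are added
The server reads runtime_checkpoints / agent_memory_records internally at runtime
Long-term memory is currently disabled by default; checkpoints are activated only when approval tool names are explicitly configured
chat Agent configuration can include message_type_response_config, which currently supports a default enabled + response_text per rule/form/other, plus a detailed message_key_responses[] mapping under each message_type.
chat Agents can also configure human_takeover_enabled, which decides whether the Agent's sessions may be taken over by a human from the console.
chat Agents can also configure hide_rag_source_filename, which hides source filenames from the model during RAG injection; this setting does not affect the original filenames in audit/troubleshooting payloads such as backend rag_context_records and trace payloads.
chat Agents can also configure response_grounding_config, which controls the grounded-answer policy of the final reply stage. The current fields are:
enabled
policy_version
mode (balanced / strict)
allow_general_knowledge
low_confidence_behavior
structured_output_failure_behavior
When response_grounding_config.enabled=true, the default chat runtime first attempts structured output in the final reply stage and, according to the policy, classifies the answer as a direct answer, a clarification request, or a conservative refusal.
Both message_key and response_text in message_key_responses[] must be non-empty strings; within the same message_type, normalized message_key values must not repeat.
Runtime priority: match message_key_responses[] first, then fall back to the default enabled + response_text.
chat Agents can also configure mcp_response_config:
direct_response_rules[] bind mcp_server_id + tool_name
Output modes: raw_json, json_subset, text
When wrap_json_code_block=true, JSON-type output is wrapped in a Markdown ```json fenced block, and this also applies when formatting fails and falls back to raw_json
quick_match_rules[] support explicit prefix commands, for example #get_tracking123556
quick_match_rules[].response_rule_key is optional and reuses a direct-response rule; if it is not configured, quick-match returns raw JSON by default
intent_gate_rules[]: the first release currently supports the tracking capability, with configurable pre-query gate keywords, Python regex, default/append/replace source modes, and allow_contextual_follow_up
The primary entry point for pre-query gate keyword matching is mcp_response_config in the Agent settings, not the global ai_service/utils/settings.py
Agent version management supports manual snapshots, history queries, and rollback; orchestrator_key, model_routing_config, message_type_response_config, mcp_response_config, and response_grounding_config are all captured together in a snapshot, and a snapshot of the current configuration is saved automatically before a rollback so that the operation is reversible.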
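
The runtime priority for message_type_response_config (keyed responses first, then the per-type default) can be sketched as follows. The config layout shown, a dict keyed by message type, is hypothetical. The normalization used for message_key (strip plus casefold) is an assumption, since the source mentions normalized keys without defining the normalization, and whether a keyed response requires the default enabled flag is also not stated.

```python
from typing import Optional


def normalize_key(text: str) -> str:
    # Assumed normalization; the source does not define it.
    return text.strip().casefold()


def resolve_fixed_response(config: dict, message_type: str, message: str) -> Optional[str]:
    """Return the configured fixed text, or None to continue into normal orchestration."""
    entry = config.get(message_type)
    if not entry:
        return None
    wanted = normalize_key(message)
    # 1) Keyed responses take priority.
    for item in entry.get("message_key_responses", []):
        if normalize_key(item["message_key"]) == wanted:
            return item["response_text"]
    # 2) Fall back to the default text for this message_type when enabled.
    if entry.get("enabled"):
        return entry.get("response_text")
    return None
```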

5) MCP management endpoints (REST)

MCP Server: /mcp-servers (CRUD + health-check + test-call)
MCP Credential: /mcp-credentials (CRUD + rotate)
Agent MCP Mount: /agents/{agent_id}/mcp-mounts (CRUD)
Agent MCP Preview: /agents/{agent_id}/mcp-response-preview
MCP audit: /sessions/{session_id}/mcp-calls, /agents/{agent_id}/mcp-calls

MCP Server configuration constraints

transport_type=stdio:
connection_config.command is required
connection_config.args is optional (a string or an array of strings)
transport_type=http_sse:
connection_config.url is required and must start with http:// or https://
connection_config.headers is optional (a JSON object)
connection_config.import_mode is optional: mcp / url
- mcp: the remote side handles health/check, tools/list, tools/call
- url: the control plane imports the URL as a single tool
When import_mode=url, connection_config.http_method (GET/POST) and connection_config.tool_name are supported
test-call:
action supports tools/list and tools/call
tool_name is required when action=tools/call
arguments, credential_id, and timeout_ms are all optional
mcp-response-preview:
Supported only for chat Agents
May call only MCP Servers that are currently mounted on the Agent and active
The response returns raw_response_payload, business_response_payload with the response wrapper layer removed, and the optional field catalog field_catalog[]
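
The connection_config constraints above can be checked with a short validator. This is a sketch of the documented rules only; the function name and error messages are hypothetical, and the real server may apply additional checks.

```python
from typing import List


def validate_connection_config(transport_type: str, cfg: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means the config passes."""
    errors: List[str] = []
    if transport_type == "stdio":
        if not cfg.get("command"):
            errors.append("connection_config.command is required")
        args = cfg.get("args")
        is_str_list = isinstance(args, list) and all(isinstance(a, str) for a in args)
        if args is not None and not (isinstance(args, str) or is_str_list):
            errors.append("connection_config.args must be a string or a list of strings")
    elif transport_type == "http_sse":
        url = cfg.get("url")
        if not isinstance(url, str) or not url.startswith(("http://", "https://")):
            errors.append("connection_config.url must start with http:// or https://")
        headers = cfg.get("headers")
        if headers is not None and not isinstance(headers, dict):
            errors.append("connection_config.headers must be a JSON object")
        if cfg.get("import_mode") not in (None, "mcp", "url"):
            errors.append("connection_config.import_mode must be mcp or url")
        if cfg.get("http_method") not in (None, "GET", "POST"):
            errors.append("connection_config.http_method must be GET or POST")
    else:
        errors.append("transport_type must be stdio or http_sse")
    return errors
```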

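Credential security conventions

When creating or rotating a credential, the client submits secret_payload.
The server encrypts it before writing to the database and persists it in encrypted_secret_json.
Responses return only credential metadata; neither the ciphertext nor the plaintext secret is returned.

Audit conventions

Every MCP tool call produces an audit record (success / timeout / error / blocked).
Request and response payloads in audit records are redacted by sensitive key before storage.
MCP audit records include a trace_id field that can be deterministically correlated with turn observability records.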
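
Redaction by sensitive key, as described in the audit conventions, might look like the sketch below. The key list and the `***` mask are hypothetical; the source does not enumerate which keys are treated as sensitive.

```python
# Hypothetical set of sensitive keys, compared case-insensitively.
SENSITIVE_KEYS = {"authorization", "api_key", "password", "secret", "token"}


def redact(payload):
    """Return a copy of a JSON-like payload with sensitive values masked."""
    if isinstance(payload, dict):
        return {
            key: "***" if str(key).lower() in SENSITIVE_KEYS else redact(value)
            for key, value in payload.items()
        }
    if isinstance(payload, list):
        return [redact(item) for item in payload]
    return payload
```

The function builds new containers instead of mutating its input, so the live request payload can still be sent to the tool unchanged while the redacted copy is stored.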

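6) Skill Runtime endpoints (REST)

Skill assets: /skills (create, query, update)
Skill versions: /skills/{skill_id}/versions (create, query, update draft, release)
Agent Skill mounts: /agents/{agent_id}/skill-mounts (CRUD)
Skill audit: /sessions/{session_id}/skill-calls, /agents/{agent_id}/skill-calls

SkillVersion release conventions

A newly created version defaults to is_released=false (draft).
Only draft versions can be modified in place.
After POST /skills/{skill_id}/versions/{version_id}/release is called, the version becomes released and no longer allows destructive updates.
For versions with an mcp backend, release requires the MCP Server referenced by backend_ref_id to be active; otherwise 409 is returned.

Agent mount conventions

The Agent mount target is skill_version_id (not the Skill parent object).
Only released versions can be mounted; unreleased versions return 409.
If the mount target is a version with an mcp backend, its MCP Server must be active; otherwise 409 is returned.
Mounts support the policy fields is_active, priority, timeout_ms, max_calls_per_turn, and tool_allowlist.

Skill execution and audit conventions

Default orchestrator execution order: Skills run first, MCP runs second.
The Skill and MCP chains do not block each other; if either chain fails, the main reply flow continues.
Every Skill execution attempt produces an audit record with one of the statuses success / timeout / error / blocked.
Skill audit records include a trace_id field for cross-chain troubleshooting.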
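
The "Skill first, MCP second, neither blocks the other" behavior can be sketched as below. The function and the audit record shape are hypothetical, and only three of the four documented statuses are modeled; blocked would come from a policy check that this sketch omits.

```python
from typing import Callable


def run_tool_chains(run_skills: Callable[[], object], run_mcp: Callable[[], object]):
    """Run the Skill chain, then the MCP chain; a failure in one never stops the other."""
    results, audit = {}, []
    for chain, runner in (("skill", run_skills), ("mcp", run_mcp)):
        try:
            results[chain] = runner()
            status = "success"
        except TimeoutError:
            results[chain] = None
            status = "timeout"
        except Exception:
            results[chain] = None
            status = "error"
        # One audit record per attempt, regardless of outcome.
        audit.append({"chain": chain, "status": status})
    return results, audit
```

The caller would then build the main reply from whatever results are non-empty, which is how a failed chain leaves the reply flow running.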

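III. Demo Gateway (demo use only)

The in-project demo-backend uses Socket.IO as a gateway that forwards frontend events to the AI Service:

Frontend → demo-backend: event user_message
demo-backend → frontend: event ai_response (token|done|error)

Note: this gateway is a relay layer for the Demo and its contract may change as the Demo evolves; external integrations should preferably use the AI Service FastAPI endpoints directly.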