| feat(agent_teams): add OpenTelemetry observability subsystem
Bridge OTel TracerProvider to three existing injection points without
modifying any observed code:
- AsyncCallbackFramework: LLM invoke/stream/error, tool start/finish/error,
agent invoke (~80% of observable surface). First-token latency captured
on the first stream chunk; reasoning_content split into a child span.
- TeamAgent.add_event_listener: team / member / task / message events
attached to a long-lived team root span, with task spans for terminal
task events.
- DeepAgent rails: a minimal ObservabilityRail covers only the outer
task-loop iteration boundary; the other 8 hooks are intentionally
no-ops because the Callback handlers already cover them.
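
The first-token latency capture described in the first bullet can be sketched as below. This is a minimal stdlib-only illustration, not the code in this commit: `StreamTiming` and `time_stream` are hypothetical names, and a real callback handler would presumably record the value as an attribute on the active OTel span rather than on a plain object.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass
class StreamTiming:
    """Hypothetical holder for values a real handler would set on a span."""

    first_token_latency_s: Optional[float] = None
    chunk_count: int = 0


def time_stream(
    chunks: Iterable[str],
    timing: StreamTiming,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[str]:
    """Yield chunks unchanged, recording latency on the first chunk only."""
    # The clock starts when the consumer first advances the generator.
    start = clock()
    for chunk in chunks:
        if timing.first_token_latency_s is None:
            # Only the first chunk sets the latency; later chunks skip this.
            timing.first_token_latency_s = clock() - start
        timing.chunk_count += 1
        yield chunk
```

The clock is injectable so the timing path can be tested deterministically without a live model stream.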
Attribute keys follow OpenLLMetry semantic conventions (gen_ai.*) plus
agentteam.* / deepagent.* namespaces. Prompt and completion bodies are
preserved by default; redaction is opt-in via ObservabilityConfig.
Exporter is selectable (otlp_grpc / otlp_http / console) and can be
disabled wholesale.
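
A rough sketch of how these options might fit together follows. The field names, the default exporter, the `[REDACTED]` placeholder, and the exact attribute keys are all assumptions made for illustration; the actual ObservabilityConfig in this commit may differ.

```python
from dataclasses import dataclass
from typing import Dict

# Exporter names taken from the commit message.
_EXPORTERS = ("otlp_grpc", "otlp_http", "console")


@dataclass(frozen=True)
class ObservabilityConfigSketch:
    """Hypothetical stand-in for ObservabilityConfig (field names assumed)."""

    enabled: bool = True           # False disables observability wholesale
    exporter: str = "otlp_grpc"    # assumed default
    redact_content: bool = False   # bodies are preserved unless opted in

    def __post_init__(self) -> None:
        if self.exporter not in _EXPORTERS:
            raise ValueError(f"unknown exporter: {self.exporter!r}")


def content_attributes(
    cfg: ObservabilityConfigSketch, prompt: str, completion: str
) -> Dict[str, str]:
    """Build span attributes for prompt/completion bodies under the config."""
    if not cfg.enabled:
        return {}  # disabled path emits nothing
    if cfg.redact_content:
        prompt = completion = "[REDACTED]"
    # Keys are illustrative members of the gen_ai.* namespace.
    return {"gen_ai.prompt": prompt, "gen_ai.completion": completion}
```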
Tests: 9 pytest cases under tests/unit_tests/agent_teams/observability/
exercise the streaming TTFT path, reasoning child span, tool nesting,
error propagation, monitor event dispatch, rail iteration spans,
redaction toggle, and the disabled no-op path.
Deployment: docker-compose stack (OTel Collector + Langfuse) under
deploy/observability/ for local end-to-end verification, plus a
matching example entry point at
examples/agent_teams/agent_team_observability_e2e.py.
Refs: #751
| 24 days ago |