LLM Integration
1. Overview
The LLM module provides a unified interface for large language model access, supporting two API protocols:
- OpenAI-compatible protocol (default) — For most LLM providers
- Anthropic protocol — Only for Kimi Coding Plan (api.kimi.com/coding)
Key components:
- LLMProvider — OpenAI-compatible API provider
- AnthropicProvider — Anthropic protocol API provider (Kimi Coding Plan)
- LLMClient — High-level client with token counting and streaming
- Factory functions — create_llm_client() and create_embedding_model()
- OpenAICompatibleEmbeddings — Embedding model for RAG
2. Supported Providers
OpenAI-Compatible Protocol (provider_type="openai", default)
| Provider | Examples |
|---|---|
| OpenAI | GPT-4o, o3, o4-mini |
| DeepSeek | deepseek-chat, deepseek-reasoner |
| Claude | claude-3-5-sonnet (via OpenAI-compatible layer) |
| GLM (Zhipu) | GLM-4 series |
| Moonshot / Kimi | kimi-k2, kimi-k2.5 (standard API) |
| Qwen / DashScope | qwen-plus, qwq |
| Doubao / Volcengine | doubao-seed |
| SiliconFlow | Various models via SiliconFlow |
| vLLM | Local deployment |
| Ollama | Local models |
Anthropic Protocol (provider_type="anthropic")
| Provider | Example Models | Notes |
|---|---|---|
| Kimi Coding Plan | kimi-for-coding | https://api.kimi.com/coding endpoint uses Anthropic protocol |
Note: Kimi's standard API (e.g., kimi-k2) uses the OpenAI-compatible protocol. Only Kimi Coding Plan (api.kimi.com/coding) requires provider_type="anthropic".
See settings.example.more.json for provider-specific configuration examples, including thinking/reasoning parameters.
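The protocol rule described in the note can be captured in a small helper. This is a minimal sketch and not part of akg_agents: `guess_provider_type` is a hypothetical name, and matching on host and path is only one assumed way to detect the Kimi Coding Plan endpoint.

```python
from urllib.parse import urlparse


def guess_provider_type(base_url: str) -> str:
    """Hypothetical helper: map a base URL to a provider_type string."""
    parsed = urlparse(base_url)
    host = (parsed.hostname or "").lower()
    path = parsed.path.rstrip("/")
    # Only the Kimi Coding Plan endpoint speaks the Anthropic protocol.
    if host == "api.kimi.com" and (path == "/coding" or path.startswith("/coding/")):
        return "anthropic"
    # Everything else, including Kimi's standard API, is OpenAI-compatible.
    return "openai"
```

The returned string could then be passed as provider_type when building a client, so a mismatch between endpoint and protocol is caught in one place.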
3. Provider Selection
Select the API protocol via the provider_type parameter:
from akg_agents.core_v2.llm import create_llm_client
# OpenAI-compatible protocol (default, for most providers)
client = create_llm_client(
    model_name="deepseek-chat",
    base_url="https://api.deepseek.com/beta/",
    api_key="your-key",
    provider_type="openai",  # optional, default value
)

# Kimi Coding Plan (requires Anthropic protocol)
client = create_llm_client(
    model_name="kimi-for-coding",
    base_url="https://api.kimi.com/coding",
    api_key="your-key",
    provider_type="anthropic",  # must be specified
)
Or via config file/environment variables:
{
  "models": {
    "standard": {
      "base_url": "https://api.kimi.com/coding",
      "api_key": "sk-kimi-xxx",
      "model_name": "kimi-for-coding",
      "provider_type": "anthropic"
    }
  }
}
export AKG_AGENTS_PROVIDER_TYPE="anthropic"
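When the same setting can come from an argument, an environment variable, and a config file, it helps to see one possible resolution order written out. The sketch below is hypothetical: `resolve_provider_type` is not an akg_agents function, and the precedence it uses (explicit argument, then AKG_AGENTS_PROVIDER_TYPE, then the config entry, then "openai") is an assumption, not documented behavior.

```python
import json
import os
from typing import Mapping, Optional

DEFAULT_PROVIDER_TYPE = "openai"


def resolve_provider_type(
    settings: Mapping,
    model_level: str = "standard",
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical resolver; the precedence order here is an assumption."""
    env = os.environ if env is None else env
    if explicit:
        return explicit
    from_env = env.get("AKG_AGENTS_PROVIDER_TYPE")
    if from_env:
        return from_env
    level_cfg = settings.get("models", {}).get(model_level, {})
    return level_cfg.get("provider_type", DEFAULT_PROVIDER_TYPE)


# Same shape as the config file example, minus the API key.
settings = json.loads("""
{"models": {"standard": {"base_url": "https://api.kimi.com/coding",
                          "model_name": "kimi-for-coding",
                          "provider_type": "anthropic"}}}
""")
```

Passing `env` as a plain mapping keeps the function easy to test without touching the real process environment.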
4. LLMProvider (OpenAI-Compatible Protocol)
LLMProvider is the low-level API client based on AsyncOpenAI.
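from akg_agents.core_v2.llm import LLMProvider  # import path assumed to match the other examples

provider = LLMProvider(
    model_name="deepseek-reasoner",
    base_url="https://api.deepseek.com/beta/",
    api_key="your-api-key",
    extra_body={"thinking": {"type": "enabled"}},  # Provider-specific params, passed through to API
)

messages = [{"role": "user", "content": "Hello!"}]

# Non-streaming (run inside an async function)
result = await provider.generate(messages, temperature=0.2)

# Streaming: chunks arrive as they are generated
async for chunk in provider.generate_stream(messages, temperature=0.2):
    print(chunk)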
The extra_body parameter is passed directly to the API request body, allowing you to configure provider-specific features like thinking/reasoning. Different providers use different parameter formats — see settings.example.more.json for examples.
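The pass-through behavior can be pictured as a plain dictionary merge. The following is an illustrative sketch only: `build_request_body` is a hypothetical function that does not call any API, and the real provider may assemble its payload differently.

```python
from typing import Any, Dict, List, Optional


def build_request_body(
    model_name: str,
    messages: List[Dict[str, str]],
    extra_body: Optional[Dict[str, Any]] = None,
    **sampling: Any,
) -> Dict[str, Any]:
    """Hypothetical illustration of how extra_body ends up in the JSON payload."""
    body: Dict[str, Any] = {"model": model_name, "messages": messages}
    body.update(sampling)          # standard parameters such as temperature
    body.update(extra_body or {})  # provider-specific keys, passed through unchanged
    return body


body = build_request_body(
    "deepseek-reasoner",
    [{"role": "user", "content": "Hello!"}],
    extra_body={"thinking": {"type": "enabled"}},
    temperature=0.2,
)
```

Because the extra keys are merged last in this sketch, a provider-specific key with the same name as a standard parameter would win; whether that holds for a given provider is worth checking against its own documentation.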
4. LLMClient
LLMClient wraps LLMProvider with additional features:
- Token counting: Tracks total tokens used
- Streaming UI: Sends streaming output to UI via session_id
- Reasoning content: Automatically handles reasoning_content from thinking models
from akg_agents.core_v2.llm import create_llm_client
client = create_llm_client(model_level="standard", session_id="my_session")
result = await client.generate(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    stream=True,
    agent_name="MyAgent",
)
content = result["content"]
reasoning = result.get("reasoning_content", "")
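To make the relationship between streamed chunks and the final result dict concrete, here is a self-contained sketch that uses a stand-in stream instead of a real provider. The chunk shape (dicts carrying either a content or a reasoning_content delta) and the names `fake_stream` and `collect` are assumptions for illustration; the actual LLMClient internals may differ.

```python
import asyncio
from typing import AsyncIterator, Dict


async def fake_stream() -> AsyncIterator[Dict[str, str]]:
    """Stand-in for a streaming provider; the chunk shape is an assumption."""
    for chunk in (
        {"reasoning_content": "User greets. "},
        {"reasoning_content": "Reply politely."},
        {"content": "Hello"},
        {"content": ", world!"},
    ):
        yield chunk


async def collect(stream: AsyncIterator[Dict[str, str]]) -> Dict[str, str]:
    """Concatenate streamed deltas into one result dict."""
    parts = {"content": [], "reasoning_content": []}
    async for chunk in stream:
        for key in parts:
            if chunk.get(key):
                parts[key].append(chunk[key])
    return {key: "".join(values) for key, values in parts.items()}


result = asyncio.run(collect(fake_stream()))
```

Keeping reasoning and answer text in separate buffers is what lets a caller read result["content"] without the model's intermediate thinking mixed in.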
Constructor Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| provider | LLMProvider | — | LLM provider instance |
| session_id | string | None | UI session ID for streaming |
| temperature | float | 0.2 | Sampling temperature |
| max_tokens | int | 8192 | Maximum output tokens |
| top_p | float | 0.9 | Top-p sampling |
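The defaults listed above can be mirrored in a small immutable container, which makes it easy to derive variations without mutating shared state. `ClientDefaults` is a hypothetical class written for this sketch; it is not the LLMClient constructor and only reuses the parameter names and default values from the table.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ClientDefaults:
    """Hypothetical container mirroring the documented constructor defaults."""
    session_id: Optional[str] = None
    temperature: float = 0.2
    max_tokens: int = 8192
    top_p: float = 0.9


defaults = ClientDefaults()
# Derive a variant with a higher temperature; other fields keep their defaults.
creative = replace(defaults, temperature=0.8)
```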
5. Factory Functions
create_llm_client
Create an LLMClient from configuration.
from akg_agents.core_v2.llm import create_llm_client
# From config level
client = create_llm_client(model_level="complex")
client = create_llm_client(model_level="standard", session_id="xxx")
client = create_llm_client(model_level="fast")
# Custom level (defined in settings.json)
client = create_llm_client(model_level="coder")
# Direct parameters (override config)
client = create_llm_client(
    model_name="gpt-4",
    base_url="https://api.openai.com/v1",
    api_key="your-key",
    temperature=0.5,
)
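The interplay between named levels and direct parameters can be sketched as a lookup followed by an override step. Everything here is an assumption made for illustration: `resolve_model_config` is a hypothetical function, the SETTINGS contents are sample values, and the real factory may resolve configuration differently.

```python
from typing import Any, Dict

# Hypothetical settings layout; only the "models" key and level names come from this document.
SETTINGS: Dict[str, Any] = {
    "models": {
        "standard": {
            "model_name": "deepseek-chat",
            "base_url": "https://api.deepseek.com/beta/",
        },
        "coder": {
            "model_name": "kimi-for-coding",
            "base_url": "https://api.kimi.com/coding",
            "provider_type": "anthropic",
        },
    }
}


def resolve_model_config(model_level: str = "standard", **overrides: Any) -> Dict[str, Any]:
    """Look up a level, then let explicitly passed values win over the config."""
    levels = SETTINGS["models"]
    if model_level not in levels:
        raise KeyError(f"unknown model_level {model_level!r}; known: {sorted(levels)}")
    config = dict(levels[model_level])  # copy so SETTINGS is not mutated
    config.update({k: v for k, v in overrides.items() if v is not None})
    return config
```

For example, `resolve_model_config("coder", temperature=0.5)` would keep the level's provider_type and add the temperature override, while an unknown level fails early with a message listing the known ones.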
create_embedding_model
Create an embedding model for RAG.
from akg_agents.core_v2.llm import create_embedding_model
embedding = create_embedding_model()
# Uses embedding config from settings.json
6. OpenAICompatibleEmbeddings
A LangChain-compatible embedding model that works with any OpenAI-compatible embedding API.
from akg_agents.core_v2.llm import OpenAICompatibleEmbeddings
embeddings = OpenAICompatibleEmbeddings(
    base_url="https://api.siliconflow.cn/v1",
    api_key="your-key",
    model_name="BAAI/bge-large-zh-v1.5",
)
vectors = embeddings.embed_documents(["Hello", "World"])
query_vector = embeddings.embed_query("Hello")
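In a RAG pipeline, the vectors returned by embed_documents and embed_query are typically compared by cosine similarity to pick the most relevant documents. The sketch below shows that comparison with toy vectors in place of real embedding output; `cosine_similarity` and `rank_documents` are hypothetical helpers written for this example, not part of the module.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(
    query_vector: Sequence[float],
    doc_vectors: Sequence[Sequence[float]],
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, most similar first."""
    scores = [(i, cosine_similarity(query_vector, v)) for i, v in enumerate(doc_vectors)]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy 3-dimensional vectors standing in for real embedding output.
doc_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
query_vector = [0.9, 0.1, 0.0]
ranking = rank_documents(query_vector, doc_vectors)
```

In practice a vector store usually performs this ranking, but the arithmetic is the same.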