LLM Integration

1. Overview

The LLM module provides a unified interface for large language model access, supporting two API protocols:

OpenAI-compatible protocol (default) — For most LLM providers
Anthropic protocol — Only for Kimi Coding Plan (api.kimi.com/coding)

Key components:

LLMProvider — OpenAI-compatible API provider
AnthropicProvider — Anthropic protocol API provider (Kimi Coding Plan)
LLMClient — High-level client with token counting and streaming
Factory functions — create_llm_client() and create_embedding_model()
OpenAICompatibleEmbeddings — Embedding model for RAG

2. Supported Providers

OpenAI-Compatible Protocol (`provider_type="openai"`, default)

| Provider | Example Models |
| --- | --- |
| OpenAI | GPT-4o, o3, o4-mini |
| DeepSeek | deepseek-chat, deepseek-reasoner |
| Claude | claude-3-5-sonnet (via OpenAI-compatible layer) |
| GLM (Zhipu) | GLM-4 series |
| Moonshot / Kimi | kimi-k2, kimi-k2.5 (standard API) |
| Qwen / DashScope | qwen-plus, qwq |
| Doubao / Volcengine | doubao-seed |
| SiliconFlow | Various models via SiliconFlow |
| vLLM | Local deployment |
| Ollama | Local models |

Anthropic Protocol (`provider_type="anthropic"`)

| Provider | Example Models | Notes |
| --- | --- | --- |
| Kimi Coding Plan | kimi-for-coding | `https://api.kimi.com/coding` endpoint uses Anthropic protocol |

Note: Kimi's standard API (e.g., kimi-k2) uses OpenAI-compatible protocol. Only Kimi Coding Plan (api.kimi.com/coding) requires provider_type="anthropic".
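
The rule in the note above can be written as a small helper. This is a minimal sketch only: `infer_provider_type` is a hypothetical name, not part of akg_agents, and it assumes the endpoint can be recognized from the base URL alone.

```python
from urllib.parse import urlparse


def infer_provider_type(base_url: str) -> str:
    """Guess the protocol for a base URL (hypothetical helper, not an akg_agents API)."""
    parsed = urlparse(base_url)
    host = (parsed.hostname or "").lower()
    path = parsed.path.rstrip("/")
    # Only the Kimi Coding Plan endpoint uses the Anthropic protocol.
    if host == "api.kimi.com" and (path == "/coding" or path.startswith("/coding/")):
        return "anthropic"
    # Everything else, including Kimi's standard API, is OpenAI-compatible.
    return "openai"


print(infer_provider_type("https://api.kimi.com/coding"))
print(infer_provider_type("https://api.deepseek.com/beta/"))
```

In practice it is probably safer to set provider_type explicitly, as the sections below do, than to rely on URL matching.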

See settings.example.more.json for provider-specific configuration examples including thinking/reasoning parameters.

3. Provider Selection

Select the API protocol with the provider_type parameter:

from akg_agents.core_v2.llm import create_llm_client

# OpenAI-compatible protocol (default, for most providers)
client = create_llm_client(
    model_name="deepseek-chat",
    base_url="https://api.deepseek.com/beta/",
    api_key="your-key",
    provider_type="openai"  # optional, default value
)

# Kimi Coding Plan (requires Anthropic protocol)
client = create_llm_client(
    model_name="kimi-for-coding",
    base_url="https://api.kimi.com/coding",
    api_key="your-key",
    provider_type="anthropic"  # must be specified
)

Alternatively, set it in the config file or through an environment variable:

{
  "models": {
    "standard": {
      "base_url": "https://api.kimi.com/coding",
      "api_key": "sk-kimi-xxx",
      "model_name": "kimi-for-coding",
      "provider_type": "anthropic"
    }
  }
}

export AKG_AGENTS_PROVIDER_TYPE="anthropic"
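
With three possible sources for provider_type, it helps to picture one way they could be combined. The sketch below is not the library's implementation: `resolve_provider_type` is a hypothetical helper, and the precedence it uses (explicit argument, then environment variable, then config entry, then the default) is an assumption. Only the variable name AKG_AGENTS_PROVIDER_TYPE, the JSON layout, and the "openai" default come from the text above.

```python
import os
from typing import Optional


def resolve_provider_type(
    explicit: Optional[str] = None,
    settings: Optional[dict] = None,
    model_level: str = "standard",
) -> str:
    """Pick a provider_type; the precedence here is assumed, not documented."""
    if explicit:
        return explicit
    from_env = os.environ.get("AKG_AGENTS_PROVIDER_TYPE")
    if from_env:
        return from_env
    models = (settings or {}).get("models", {})
    from_config = models.get(model_level, {}).get("provider_type")
    return from_config or "openai"  # "openai" is the documented default


# Illustrative settings, shaped like the JSON example above.
settings = {"models": {"standard": {"provider_type": "anthropic"}}}
print(resolve_provider_type(settings=settings))
```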

4. LLMProvider (OpenAI-Compatible Protocol)

LLMProvider is the low-level API client based on AsyncOpenAI.

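import asyncio

# Assumes LLMProvider is exported from the same package as create_llm_client
from akg_agents.core_v2.llm import LLMProvider

provider = LLMProvider(
    model_name="deepseek-reasoner",
    base_url="https://api.deepseek.com/beta/",
    api_key="your-api-key",
    extra_body={"thinking": {"type": "enabled"}}  # Provider-specific params, passed through to API
)

messages = [{"role": "user", "content": "Hello!"}]

async def main():
    # Non-streaming
    result = await provider.generate(messages, temperature=0.2)

    # Streaming
    async for chunk in provider.generate_stream(messages, temperature=0.2):
        print(chunk)

asyncio.run(main())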

The extra_body parameter is passed directly to the API request body, allowing you to configure provider-specific features like thinking/reasoning. Different providers use different parameter formats — see settings.example.more.json for examples.
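
As a rough mental model of what "passed directly to the API request body" means, the sketch below merges extra_body into the top level of a chat request payload. `build_request_body` is a hypothetical helper written for illustration, and the rule that extra keys override standard ones is an assumption; the real merging happens inside the underlying client library.

```python
import json
from typing import Optional


def build_request_body(
    model: str,
    messages: list,
    temperature: float = 0.2,
    extra_body: Optional[dict] = None,
) -> dict:
    """Assemble an illustrative chat request body (hypothetical helper)."""
    body: dict = {"model": model, "messages": messages, "temperature": temperature}
    # Provider-specific keys land at the top level of the JSON body.
    body.update(extra_body or {})
    return body


body = build_request_body(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"thinking": {"type": "enabled"}},
)
print(json.dumps(body, indent=2))
```

Because the keys are forwarded unchanged, a parameter that one provider accepts may be rejected or ignored by another.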

5. LLMClient

LLMClient wraps LLMProvider with additional features:

Token counting: Tracks total tokens used
Streaming UI: Sends streaming output to UI via session_id
Reasoning content: Automatically handles reasoning_content from thinking models

import asyncio

from akg_agents.core_v2.llm import create_llm_client

client = create_llm_client(model_level="standard", session_id="my_session")

async def main():
    result = await client.generate(
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello!"}
        ],
        stream=True,
        agent_name="MyAgent"
    )

    content = result["content"]  # final answer text
    reasoning = result.get("reasoning_content", "")  # empty string if none was returned
    return content, reasoning

content, reasoning = asyncio.run(main())
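
To illustrate how streamed output from a thinking model might be folded into the `content` and `reasoning_content` fields, here is a minimal sketch. The chunk shape (dicts carrying optional `content` and `reasoning_content` deltas) and the `collect_stream` name are assumptions made for the example, not the LLMClient internals.

```python
from typing import Iterable


def collect_stream(chunks: Iterable[dict]) -> dict:
    """Fold streamed deltas into one result dict (chunk shape is assumed)."""
    content_parts = []
    reasoning_parts = []
    for chunk in chunks:
        # Thinking models emit reasoning deltas separately from the answer.
        if chunk.get("reasoning_content"):
            reasoning_parts.append(chunk["reasoning_content"])
        if chunk.get("content"):
            content_parts.append(chunk["content"])
    return {
        "content": "".join(content_parts),
        "reasoning_content": "".join(reasoning_parts),
    }


demo = [
    {"reasoning_content": "User greets; "},
    {"reasoning_content": "reply politely."},
    {"content": "Hello"},
    {"content": " there!"},
]
result = collect_stream(demo)
print(result["content"])
```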

Constructor Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `provider` | LLMProvider | — | LLM provider instance |
| `session_id` | str | `None` | UI session ID for streaming |
| `temperature` | float | `0.2` | Sampling temperature |
| `max_tokens` | int | `8192` | Maximum output tokens |
| `top_p` | float | `0.9` | Top-p sampling |

6. Factory Functions

create_llm_client

Create an LLMClient from configuration.

from akg_agents.core_v2.llm import create_llm_client

# From config level
client = create_llm_client(model_level="complex")
client = create_llm_client(model_level="standard", session_id="xxx")
client = create_llm_client(model_level="fast")

# Custom level (defined in settings.json)
client = create_llm_client(model_level="coder")

# Direct parameters (override config)
client = create_llm_client(
    model_name="gpt-4",
    base_url="https://api.openai.com/v1",
    api_key="your-key",
    temperature=0.5
)
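
One way to think about "direct parameters override config" is as a dictionary merge on top of the selected level. The sketch below is an assumption about the general idea, not the factory's actual code: `resolve_model_config` is a hypothetical helper, and the settings values are illustrative.

```python
from typing import Any


def resolve_model_config(settings: dict, model_level: str = "standard", **overrides: Any) -> dict:
    """Merge a configured level with direct parameters (hypothetical helper)."""
    models = settings.get("models", {})
    if model_level not in models:
        raise KeyError(f"unknown model level: {model_level!r}")
    config = dict(models[model_level])  # copy so the settings stay untouched
    # Direct parameters win over values from the config file.
    config.update({k: v for k, v in overrides.items() if v is not None})
    return config


# Illustrative settings with one built-in level and one custom level.
example_settings = {
    "models": {
        "standard": {"model_name": "deepseek-chat", "temperature": 0.2},
        "coder": {"model_name": "kimi-for-coding", "provider_type": "anthropic"},
    }
}
config = resolve_model_config(example_settings, "standard", temperature=0.5)
print(config)
```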

create_embedding_model

Create an embedding model for RAG.

from akg_agents.core_v2.llm import create_embedding_model

embedding = create_embedding_model()
# Uses embedding config from settings.json

7. OpenAICompatibleEmbeddings

A LangChain-compatible embedding model that works with any OpenAI-compatible embedding API.

from akg_agents.core_v2.llm import OpenAICompatibleEmbeddings

embeddings = OpenAICompatibleEmbeddings(
    base_url="https://api.siliconflow.cn/v1",
    api_key="your-key",
    model_name="BAAI/bge-large-zh-v1.5"
)

vectors = embeddings.embed_documents(["Hello", "World"])
query_vector = embeddings.embed_query("Hello")
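
In a RAG pipeline, the vectors returned above are typically compared against the query vector to rank documents. The sketch below shows cosine-similarity ranking in plain Python with small stand-in vectors; the function names are hypothetical, and a real setup would more likely delegate this step to a vector store.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query: Sequence[float], docs: Sequence[Sequence[float]]) -> List[int]:
    """Return document indices, most similar first."""
    scores = [cosine_similarity(query, v) for v in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Stand-in vectors; in practice these would come from embed_documents / embed_query.
doc_vectors = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
sample_query = [1.0, 0.1]
print(rank_documents(sample_query, doc_vectors))
```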