Search Agent User Guide
Table of contents
- Overview
- Key features
- Architecture
- Workflow
- Configuration reference
- Quick start
- BrowseComp-Plus Milvus indexing
Overview
The Search Agent (DeepSearchAgent) is an intelligent search agent built on the openJiuwen framework. It supports deep search and multi-step reasoning: it handles complex questions, calls tools over several steps, and returns accurate answers. It uses a state-space search model (maintain and expand states to reach an answer) and async concurrency so multiple reasoning branches can be explored in parallel, improving throughput.
Key features
Main theme: logical reasoning
- Entity awareness: who / what / how things relate in the question
- Tree-like reasoning: multiple angles, similar to human deliberation
- Parallel exploration: try several solution paths at once
1. Entity-awareness engine
- Dynamic entity graph: detect key entities and how they connect
- State trajectory: per-entity history of state changes for traceable reasoning
2. Tree-like reasoning network
- Branch management: decompose hard questions into a branching tree; branches can be explored concurrently
- Pruning: deprioritize or drop low-value branches and focus capacity on promising ones
3. Intelligent action exploration
- Breadth vs depth: avoid collapsing too early to a local optimum; keep diversity of approaches
- Score-guided sampling: weighted random sampling favors high-confidence actions while preserving exploration
- Async concurrent execution: asyncio schedules many state_creation tasks in parallel; ActionPool is used from a single coroutine context and is not thread-safe
4. Other capabilities
- Stateful reasoning: multi-step reasoning driven by a state machine
- Tooling: web search, retrieval, extraction-style tools, etc.
- Configurability: tunable search and runtime parameters
Architecture
Core building blocks
- Three workflows
  - init_state: parse the question and build the initial search state
  - find_action: propose feasible search actions from the current state
  - state_creation: run an action, call tools, and expand the state space
- State management
  - State: one node in the search graph (variables, depth, id, evidence ids, etc.)
  - Action: one executable search step (question, state, proposal metadata)
  - ActionPool: pool of pending actions for sampling in the async loop (not thread-safe; sampling weights are influenced by SearchWorkflowConfig.action_sampling)
- Tools
  - WebSearch: query a search engine
  - WebFetch: fetch and interpret page content
  - Retrieve: dense/sparse/hybrid vector retrieval over a knowledge base
Workflow shape
In search mode the agent roughly follows:
- Initialize state — entity-ish structure and initial variables
- Discover actions — candidate branches
- Execute actions — concurrent tool use
- Answer — emit a final answer when criteria are met
┌────────────────┐
│   init_state   │ → initial state (entities / variables)
└────────────────┘
        ↓
┌────────────────┐
│  find_action   │ → candidate actions from current state
└────────────────┘
        ↓
┌────────────────┐
│ state_creation │ → run actions, extend state (tools / validation)
└────────────────┘
        ↓
(loop until answer or limits)
Workflow
1. Initialization
# 1. Initialize state
init_state = await Runner.run_workflow("init_state_1", {
    "query": query,
    "config": init_config,
    ...
})
# 2. Initial actions
actions = await Runner.run_workflow("find_action_1", {
    "state": init_state,
    "query": query,
    ...
})
# 3. Enqueue actions
action_pool.add(actions)
2. Search loop
while not final_answer and time < time_limit:
    # 1. Sample from the pool
    sampled_actions = action_pool.sample(available_slots)
    # 2. Run state_creation concurrently
    for action in sampled_actions:
        task = asyncio.create_task(run_state_creation_workflow(action, tool_map, semaphore))
        running_tasks.add(task)
    # 3. Wait for completions
    done, _ = await asyncio.wait(running_tasks, ...)
    running_tasks -= done  # drop finished tasks so they are not handled twice
    # 4. Handle results
    for task in done:
        result = await task
        if result.found_answer:
            return result
        # 5. Expand with new actions
        for new_state in result.new_states:
            new_actions = await Runner.run_workflow("find_action_1", {
                "state": new_state,
                "query": query,
                ...
            })
            action_pool.add(new_actions)
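The loop above is pseudocode. As a rough, runnable illustration of the same shape (fill worker slots, wait for the first completion, expand, stop on an answer or a limit), here is a toy sketch. Everything in it is a stand-in: run_state_creation and search are hypothetical names, integer actions replace real Action objects, and a FIFO list replaces ActionPool sampling.

```python
import asyncio
from typing import Optional


async def run_state_creation(action: int, semaphore: asyncio.Semaphore) -> dict:
    """Hypothetical stand-in for one state_creation run."""
    async with semaphore:
        await asyncio.sleep(0)  # yield control, as a real tool call would
        return {
            "found_answer": action >= 6,
            "answer": f"answer-from-{action}",
            "new_states": [action + 1, action + 2],
        }


async def search(max_workers: int = 3, actions_explored_limit: int = 50) -> Optional[str]:
    semaphore = asyncio.Semaphore(max_workers)
    pool = [0]          # pending actions (FIFO here; the real pool samples by weight)
    running = set()     # in-flight asyncio tasks
    explored = 0
    while (pool or running) and explored < actions_explored_limit:
        # Fill free worker slots from the pool.
        while pool and len(running) < max_workers:
            running.add(asyncio.create_task(run_state_creation(pool.pop(0), semaphore)))
        # Wait until at least one task finishes; keep the rest running.
        done, running = await asyncio.wait(running, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            explored += 1
            result = task.result()
            if result["found_answer"]:
                for leftover in running:
                    leftover.cancel()
                return result["answer"]
            # Expand: new states become new pending actions.
            pool.extend(result["new_states"])
    return None


if __name__ == "__main__":
    print(asyncio.run(search()))
```

Because all bookkeeping happens in one coroutine, the pool needs no locking, which is consistent with ActionPool being documented as not thread-safe.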
3. Action sampling
ServiceConfig.search_workflow.action_sampling (ActionSamplingConfig) works together with ActionPool:
| Field | Type | Default | Description |
|---|---|---|---|
| depth_weight | bool | True | Enable depth penalty / shallow bonus (same thresholds as in action_pool.py) |
| promote_unique_states | bool | False | Down-weight duplicate state hashes in the pool to encourage diversity |
| random_sample | bool | False | If True, draw actions uniformly at random and skip weighted sampling |
ActionPool scoring behavior includes:
- Score blend: action.proposal.score plus per-variable candidate_strength
- Depth shaping: penalty when depth > 5, bonus when depth < 2
- Unique-state promotion: when promote_unique_states is True, down-weight repeated state hashes
- Random mode: when random_sample is True, use uniform random sampling
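To make the scoring rules above concrete, here is a minimal sketch of score-guided sampling. It is not the code in action_pool.py: PendingAction, action_weight, and sample_actions are hypothetical names, and the multipliers (0.5 penalty, 1.5 bonus, division by the duplicate count) are assumed values chosen only for illustration. Only the depth thresholds (> 5, < 2) and the three flags come from this guide.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class PendingAction:
    name: str
    proposal_score: float      # stands in for action.proposal.score
    candidate_strength: float  # stands in for per-variable candidate_strength
    depth: int
    state_hash: str


def action_weight(action, hash_counts, depth_weight=True, promote_unique_states=False):
    weight = action.proposal_score + action.candidate_strength  # score blend
    if depth_weight:
        if action.depth > 5:
            weight *= 0.5   # assumed penalty factor for deep states
        elif action.depth < 2:
            weight *= 1.5   # assumed bonus factor for shallow states
    if promote_unique_states:
        # Assumed rule: split the weight across actions sharing a state hash.
        weight /= hash_counts.get(action.state_hash, 1)
    return max(weight, 1e-9)  # keep weights strictly positive


def sample_actions(pool, k, rng, depth_weight=True,
                   promote_unique_states=False, random_sample=False):
    k = min(k, len(pool))
    if random_sample:
        return rng.sample(pool, k)  # uniform, no weighting
    hash_counts = Counter(a.state_hash for a in pool)
    remaining, picked = list(pool), []
    for _ in range(k):  # weighted sampling without replacement
        weights = [action_weight(a, hash_counts, depth_weight, promote_unique_states)
                   for a in remaining]
        index = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        picked.append(remaining.pop(index))
    return picked
```

Weighted random choice, rather than always taking the top score, is what keeps low-scoring branches reachable while still favoring confident ones.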
Configuration reference
All types live in openjiuwen_deepsearch/config/config.py. There are two top-level buckets:
- AgentConfig: user-facing knobs (LLM, keys, per-question search limits, Milvus for retrieve, etc.)
- ServiceConfig: deployment/runtime defaults (workflow timeouts, search_workflow, telemetry-style flags, etc.)
Part 1 — AgentConfig
Note: Callers normally supply AgentConfig. The search subgraph wiring (init / find / state-creation agents) is configured under ServiceConfig.search_workflow — see Part 2.
Search-related pieces on AgentConfig:
- search_workflow_per_question_params: PerQuestionParams (tool_map, time_limit, …)
- search_workflow_milvus_config: MilvusConfig when using retrieve
1. PerQuestionParams (AgentConfig.search_workflow_per_question_params)
Per-question (one search episode) limits.
| Field | Type | Default | Description |
|---|---|---|---|
| max_workers | int | 5 | Max concurrent coroutines running actions (tune for APIs / hardware) |
| retry_count_on_empty_action_space | int | 3 | When the pool is empty and no workers are busy, re-run find_action at most this many times |
| time_limit | int | 4800 | Wall-clock limit per question (seconds); default 80 minutes |
| tool_map | Literal["search_fetch", "retrieve"] | "search_fetch" | Which tool stack to use |
| actions_explored_limit | int | 200 | Each completed state_creation increments the counter; if > 0 and the counter reaches this value, search stops. Increase for longer runs |
| fail_limit | int | 0 | Max consecutive failures; 0 means no cap |
| answer_mode_top_k | int | 1 | How many candidate answers to collect before picking the best; <= 1 returns on first answer |
| provide_best_guess | bool | False | On timeout without a confirmed answer, whether to return the best guess by candidate_strength |
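The limits in this table interact with each other, so a small sketch of how they might be checked can help. This is an illustration under assumptions, not the agent's actual control flow: Limits and stop_reason are hypothetical names, and the use of >= for each comparison is assumed. The defaults and the "0 means no cap" semantics are taken from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Limits:
    time_limit: int = 4800              # seconds (80 minutes)
    actions_explored_limit: int = 200   # only enforced when > 0
    fail_limit: int = 0                 # 0 means no cap


def stop_reason(limits: Limits, elapsed_s: float,
                actions_explored: int, consecutive_failures: int) -> Optional[str]:
    """Return the name of the first limit that is hit, or None to keep searching."""
    if elapsed_s >= limits.time_limit:
        return "time_limit"
    if limits.actions_explored_limit > 0 and actions_explored >= limits.actions_explored_limit:
        return "actions_explored_limit"
    if limits.fail_limit > 0 and consecutive_failures >= limits.fail_limit:
        return "fail_limit"
    return None
```

For example, stop_reason(Limits(), 10, 200, 0) reports that the exploration budget is exhausted even though most of the time budget remains, which is the case where raising actions_explored_limit helps.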
tool_map modes
- search_fetch: uses WebSearch and WebFetch.
  - Good for: live web, fresh pages, open-domain QA.
  - Requires: serper_api_key, jina_api_key (on AgentConfig) for the bundled integrations.
- retrieve: uses Retrieve (vector store).
  - Good for: KB QA, closed corpora, precise chunk retrieval.
  - Requires: a built index (see BrowseComp-Plus Milvus indexing) and AgentConfig.search_workflow_milvus_config with milvus_host, milvus_port, database_name, collection_name, embedder_model_name, embedder_api_key, embedder_base_url, embedder_timeout. These align with index-script env vars such as MILVUS_URI, MILVUS_DB_NAME, EMBED_*; the indexer favors environment variables, while retrieve at runtime reads MilvusConfig on AgentConfig.
2. MilvusConfig (AgentConfig.search_workflow_milvus_config, retrieve mode)
When tool_map == "retrieve", point the agent at Milvus + the embedding HTTP API.
| Field | Type | Default | Description |
|---|---|---|---|
| milvus_host | str | "localhost" | Milvus host |
| milvus_port | int | 19530 | Milvus port |
| database_name | str | "deepsearch_benchmarks" | DB name |
| collection_name | str | "browsecompplus_with_bm25" | Collection name |
| embedder_model_name | str | "" | Embedding model id; must match the model used when the index was built |
| embedder_api_key | bytearray | empty | Required for retrieve; empty breaks tool construction |
| embedder_base_url | str | "" | Embeddings URL, e.g. http://localhost:11450/v1/embeddings |
| embedder_timeout | int | 100 | HTTP timeout (seconds) |
Retriever mode
- dense: vector similarity only
- sparse: lexical / BM25-style
- hybrid: fuse dense + sparse (recommended default in config)
Retrieval + merge behavior
- Initial retrieval: fetch up to top_k * top_k_multiply_factor chunks for the chosen mode.
- Merge: chunks from the same document (e.g. BrowseComp-Plus) merge into one “document block”; if a non-first chunk hits, the first chunk may be backfilled for context.
- Final ranking: merged blocks sort by their max chunk score; the top top_k blocks go to the agent.
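The merge and ranking steps above can be sketched as follows. This is a simplified illustration, not the retriever's implementation: merge_and_rank is a hypothetical name, chunks are plain dicts with assumed keys (docid, chunk_idx, score, text), and first_chunks is an assumed lookup standing in for however the real tool fetches a document's first chunk.

```python
from collections import defaultdict


def merge_and_rank(chunks, top_k, first_chunks=None):
    """Group chunk hits by document, backfill the first chunk, rank by max score."""
    first_chunks = first_chunks or {}
    by_doc = defaultdict(list)
    for chunk in chunks:
        by_doc[chunk["docid"]].append(chunk)

    blocks = []
    for docid, hits in by_doc.items():
        hits = sorted(hits, key=lambda c: c["chunk_idx"])  # restore document order
        texts = [c["text"] for c in hits]
        # Backfill: if only non-first chunks hit, prepend the first chunk for context.
        if hits[0]["chunk_idx"] != 0 and docid in first_chunks:
            texts.insert(0, first_chunks[docid])
        blocks.append({
            "docid": docid,
            "score": max(c["score"] for c in hits),  # block score = best chunk score
            "text": "\n".join(texts),
        })

    blocks.sort(key=lambda b: b["score"], reverse=True)
    return blocks[:top_k]


hits = [
    {"docid": "d1", "chunk_idx": 2, "score": 0.9, "text": "a2"},
    {"docid": "d1", "chunk_idx": 0, "score": 0.2, "text": "a0"},
    {"docid": "d2", "chunk_idx": 1, "score": 0.5, "text": "b1"},
    {"docid": "d3", "chunk_idx": 0, "score": 0.1, "text": "c0"},
]
top_blocks = merge_and_rank(hits, top_k=2, first_chunks={"d2": "b0"})
```

Here d1 ranks first on the strength of its best chunk (0.9), and d2 gains its first chunk as context even though only chunk 1 matched. Over-fetching by top_k_multiply_factor matters because several hits can collapse into one block.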
You may change defaults in config.py or merge overrides from Config().agent_config.model_dump() before AgentConfig.model_validate.
Part 2 — ServiceConfig
Note: ServiceConfig is aimed at operators / SDK defaults: workflow timeouts, per-node retries, stats switches, and especially search_workflow.
The search workflow is ServiceConfig.search_workflow: type SearchWorkflowConfig, containing init / find / state-creation agent configs.
SearchWorkflowConfig (ServiceConfig.search_workflow)
SearchWorkflowConfig bundles action sampling plus three agent configs:
| Field | Type | Description |
|---|---|---|
| action_sampling | ActionSamplingConfig | Depth weighting, unique-state down-weighting, random sampling (see Action sampling) |
| init_state_agent | InitStateAgentConfig | Initial state LLM subgraph |
| find_action_agent | FindActionAgentConfig | Action proposal subgraph |
| state_creation_agent | StateCreationAgentConfig | Tool execution + expansion; includes retrieval_settings, validator_agent, … |
For each sub-agent, llm_config is a Dict[Literal["general", "plan_understanding", "info_collecting", "writing_checking"], LLMConfig] — the same idea as top-level AgentConfig.llm_config, but without vlm_chart_generating.
InitStateAgentConfig (search_workflow.init_state_agent)
| Field | Type | Default | Description |
|---|---|---|---|
| max_tries | int | 10 | Max retries for init |
| llm_config | see above | {} | Per-category LLM configs |
FindActionAgentConfig (search_workflow.find_action_agent)
| Field | Type | Default | Description |
|---|---|---|---|
| llm_config | see above | {} | Per-category LLM configs |
| action_proposals_limit | int | 5 | Max proposed actions per find_action call |
| action_pool_depleted_strategy | Literal["simple_retry", "dependent_retry"] | "dependent_retry" | When the pool is empty: simple_retry re-runs find_action with minimal context; dependent_retry adds context about explored directions |
ValidatorAgentConfig (state_creation_agent.validator_agent)
| Field | Type | Default | Description |
|---|---|---|---|
| validate_new_states | bool | False | Validate newly created states |
| validate_answer | bool | False | Validate final answers |
| llm_config | see above | {} | Per-category LLM configs |
RetrievalSettingsConfig (state_creation_agent.retrieval_settings)
| Field | Type | Default | Description |
|---|---|---|---|
| retrieval_prompt | Literal["retrieve", "retrieve_given_multihop_query"] | "retrieve" | Prompt template for LLM-generated retrieval queries. retrieve: simple keyword-style queries (BrowseComp+ default). retrieve_given_multihop_query: multi-hop style (GEAR-like; less validated). |
| top_k | int | 3 | Final number of document blocks returned to the agent |
| top_k_multiply_factor | int | 5 | Initial pool size multiplier: top_k × top_k_multiply_factor candidates before merge/rerank |
| add_instruction | bool | True | Append extra instructions when the LLM drafts retrieval queries; often helps with instruction-aware embedding backends |
| mode | Literal["dense", "sparse", "hybrid"] | "hybrid" | dense / sparse / hybrid retrieval |
StateCreationAgentConfig (search_workflow.state_creation_agent)
| Field | Type | Default | Description |
|---|---|---|---|
| log_fetch | bool | False | Log retrieval / fetch traffic |
| log_search | bool | False | Log search traffic |
| web_fetch_log_file | str | "gnosis/tool_log/web_fetch_log.jsonl" | Fetch / retrieval log path |
| web_search_log_file | str | "gnosis/tool_log/web_search_log.jsonl" | Web search log path |
| use_candidate_strength | bool | True | Feed candidate_strength into action scoring |
| discovered_clues_mode | Literal["report", "blacklist"] | "blacklist" | report: surface clues; blacklist: suppress repeats |
| max_llm_calls_per_run | int | 100 | Cap LLM calls inside one state_creation run |
| context_limit_reached_strategy | Literal["fail", "reduced_retrieval_request", "delete_tool_responses", "delete_tool_input_and_responses"] | "reduced_retrieval_request" | When context overflows during run-action: reduced_retrieval_request halves top_k / top_k_multiply_factor and retries (retrieve path only) |
| llm_config | see above | {} | Per-category LLM configs |
| retrieval_settings | RetrievalSettingsConfig | factory defaults | Retrieval behavior only (no Milvus host; that stays on AgentConfig.search_workflow_milvus_config) |
| validator_agent | ValidatorAgentConfig | factory defaults | Validator subgraph |
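As a rough picture of what the reduced_retrieval_request strategy amounts to, the sketch below retries a retrieval-backed step with halved settings after a context overflow. It is an assumption-laden illustration: ContextLimitReached, run_with_reduced_retrieval, and max_attempts are hypothetical, and halving both values with integer division and a floor of 1 is one plausible reading of "halves top_k / top_k_multiply_factor".

```python
class ContextLimitReached(Exception):
    """Hypothetical error raised when the prompt no longer fits the context window."""


def run_with_reduced_retrieval(run_action, top_k=3, top_k_multiply_factor=5, max_attempts=4):
    """Call run_action(top_k, factor); on overflow, halve both settings and retry."""
    for _ in range(max_attempts):
        try:
            return run_action(top_k, top_k_multiply_factor)
        except ContextLimitReached:
            if top_k == 1 and top_k_multiply_factor == 1:
                raise  # nothing left to shrink
            top_k = max(1, top_k // 2)
            top_k_multiply_factor = max(1, top_k_multiply_factor // 2)
    raise ContextLimitReached("context still too large after retries")


def fake_action(top_k, factor):
    # Toy stand-in: pretend anything above 2 candidate chunks overflows.
    if top_k * factor > 2:
        raise ContextLimitReached()
    return (top_k, factor)


settings_used = run_with_reduced_retrieval(fake_action)
```

With the defaults (3, 5), the toy action overflows once and then succeeds with (1, 2).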
Common fields on LLMConfig include model_name, model_type, base_url, api_key, hyper_parameters, timeout, max_tries, append_think_tags_to_messages — see LLMConfig in openjiuwen_deepsearch/config/config.py.
| Field | Type | Default | Description |
|---|---|---|---|
| model_name | str | "" | Model id (required) |
| model_type | Literal["openai", "siliconflow"] | "openai" | Backend flavor (matches main.py CLI) |
| base_url | str | "" | HTTP base URL |
| api_key | bytearray | empty | API key (required) |
| hyper_parameters | dict | {} | e.g. temperature, max_tokens |
| extension | dict | {} | Vendor-specific extensions |
| timeout | int | 600 | HTTP timeout (seconds) |
| max_tries | int | 4 | Max retries per call |
| append_think_tags_to_messages | bool | False | Append “think” tags to messages |
Other ServiceConfig fields
See the full ServiceConfig model in openjiuwen_deepsearch/config/config.py for workflow timeouts, collector limits, tracer concurrency, and debug flags.
Quick start
Install
# From repository root
pip install .
Tests and coverage (search mode)
# Default: skip real LLM / network tests (CI-friendly)
pytest -q tests/search_agent -m "not llm"
# All search_agent tests (llm tests need keys + explicit flag)
export RUN_LLM_TESTS=1
pytest -q tests/search_agent
# Mock orchestration / integration only
pytest -q tests/search_agent -m integration
# Real LLM + search_fetch (needs OPENROUTER_API_KEY, JINA_API_KEY, SERPER_API_KEY)
RUN_LLM_TESTS=1 pytest -q tests/search_agent -m llm
# Qwen-only branch (benchmarking/qwen_config.json + small_qwen); fixed log dir
export LLM_E2E_LOG_DIR="$PWD/tmp_llm_e2e_logs"
export RUN_LLM_TESTS=1
pytest -q tests/search_agent/test_llm_search_fetch_e2e.py -m llm -k "small_qwen"
# Coverage (CI-friendly: excludes llm)
pytest -q tests/search_agent -m "not llm" \
    --cov=openjiuwen_deepsearch \
    --cov-report=term-missing \
    --cov-fail-under=40
Notes:
- pytest-cov is in the dev dependency group.
- Start --cov-fail-under low (e.g. 40) and raise it as tests grow.
- Markers: unit, integration, llm. llm tests require RUN_LLM_TESTS=1 and keys; otherwise they skip.
- Optional LLM_E2E_LOG_DIR: when set, test_llm_search_fetch_e2e writes under <dir>/<profile>/ (e.g. gpt_mini_stack, small_qwen); otherwise it uses pytest’s tmp_path.
- -k small_qwen filters the Qwen profile (qwen_config.json / small_qwen).
Environment variables
Typical keys (your app usually reads AgentConfig / CLI; env vars help locally and in CI):
# Generic OpenAI-compatible gateway
export OPENAI_API_KEY="your_openai_api_key"
# search_fetch: Jina + Serper (same as main.py --jina_api_key / --serper_api_key)
export JINA_API_KEY="your_jina_api_key"
export SERPER_API_KEY="your_serper_api_key"
# tests/search_agent @pytest.mark.llm e2e (e.g. test_llm_search_fetch_e2e) expect:
# RUN_LLM_TESTS=1 and OPENROUTER_API_KEY, JINA_API_KEY, SERPER_API_KEY
# retrieve: embedding secret belongs in AgentConfig.search_workflow_milvus_config.embedder_api_key;
# if you only have EMBEDDER_API_KEY in the shell, map it when building agent_config
export EMBEDDER_API_KEY="your_embedder_api_key"
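The comment above about mapping EMBEDDER_API_KEY into the config could look roughly like this. It is a sketch on plain dicts: deep_merge and embedder_overrides are hypothetical helpers, not part of the package, and passing the key as a bytearray is an assumption based on the field type documented for MilvusConfig.embedder_api_key.

```python
import os


def deep_merge(base: dict, overrides: dict) -> dict:
    """Return a copy of base with overrides applied recursively."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def embedder_overrides(environ=None) -> dict:
    """Build a config override from the EMBEDDER_API_KEY environment variable."""
    environ = os.environ if environ is None else environ
    key = environ.get("EMBEDDER_API_KEY", "")
    if not key:
        raise ValueError("EMBEDDER_API_KEY is not set")
    return {
        "search_workflow_milvus_config": {
            # Assumption: the field is a bytearray, as in the MilvusConfig table.
            "embedder_api_key": bytearray(key, "utf-8"),
        }
    }
```

The merged dict would then be passed through AgentConfig.model_validate, following the override pattern described in the configuration reference.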
Programmatic usage
import asyncio

from openjiuwen_deepsearch.framework.openjiuwen.agent.agent_factory import AgentFactory
from openjiuwen_deepsearch.config.config import Config, AgentConfig


async def main():
    agent_factory = AgentFactory()
    agent_config = Config().agent_config.model_dump()
    agent_config["search_mode"] = "search"
    # ... merge overrides (keys, milvus, service_config["search_workflow"], etc.)
    candidate_config = AgentConfig.model_validate(agent_config)
    agent_config = candidate_config.model_dump()
    agent = agent_factory.create_agent(agent_config)
    query = "..."
    async for chunk in agent.run(
        message=query,
        conversation_id="test_session_001",
        agent_config=agent_config,
    ):
        print(chunk)


asyncio.run(main())
CLI example
Two ways to tune defaults:
- Edit defaults in openjiuwen_deepsearch/config/config.py.
- Use main.py: it starts from Config().agent_config.model_dump() and overlays CLI flags.
Retrieve mode: build an index first (below), then fill AgentConfig.search_workflow_milvus_config (MilvusConfig). Do not put Milvus connection strings in RetrievalSettingsConfig (that type only holds retrieval behavior like top_k and mode).
Search mode CLI: main.py requires full LLM flags. For search_fetch, --jina_api_key and --serper_api_key are mandatory. Pass the question with --query (multiple tokens are joined with spaces).
python -m main \
    --mode query \
    --search_mode search \
    --tool_map search_fetch \
    --query "Your question here" \
    --llm_model_name "your-model" \
    --llm_model_type openai \
    --llm_base_url "https://api.example.com/v1" \
    --llm_api_key "your-llm-key" \
    --jina_api_key "your-jina-key" \
    --serper_api_key "your-serper-key"
This repository does not ship scripts_will_be_deleted_later.run_8_queries or similar batch benchmark drivers by default; add your own if needed.
BrowseComp-Plus Milvus indexing
This repo includes tooling to prepare the BrowseComp-Plus benchmark corpus and index it into Milvus for deep-research-style evaluation: download / decrypt from Hugging Face, chunk long pages, and build a hybrid (dense + BM25) searchable collection.
1. Overview
Steps at a glance:
- Decrypt: fetch and de-obfuscate the Hugging Face dataset
- Process: chunk long HTML/text with TokenizerChunker
- Index: upsert dense vectors + BM25 sparse features into Milvus
Quick path
- Run Milvus per the official guide: Milvus standalone (Docker).
- Run an OpenAI-compatible embeddings HTTP service compatible with OpenJiuwenAPIEmbedder (see openjiuwen_deepsearch/algorithm/search_tools/retrieval/embedder.py).
- Configure openjiuwen_deepsearch/algorithm/search_index/create_browsecompplus_index.py via environment variables or module-level constants (the script prefers _env(...)). Before indexing, set EMBED_API_URL, EMBED_API_KEY, EMBED_MODEL_NAME, and HUGGINGFACE_MODEL_NAME (no in-code defaults for the model / tokenizer ids).
- Run:
uv add datasets transformers
# Run any separate download / decrypt scripts first if your fork provides them
uv run -m openjiuwen_deepsearch.algorithm.search_index.create_browsecompplus_index
The JSONL should already be decrypted (e.g. browsecompplus.jsonl) and match DATA_LOCATION.
Dataset record shape
Each example typically includes:
- query_id: stable id
- query: hard reasoning question
- answer: gold string
- evidence_docs: supporting docs (docid, text, url, …)
- gold_docs: docs that contain the answer
- negative_docs: hard negatives for retrieval evaluation
2. Dataset notes
BrowseComp-Plus isolates retriever vs LLM agent effects for deep-research stacks, using a fixed ~100k web-document slice. The public drop is obfuscated to reduce leakage.
3. Install (indexer)
uv add datasets transformers pymilvus requests tqdm
You also need:
- A running Milvus (install doc)
- A reachable OpenAI-compatible embedding server (model id must match EMBED_MODEL_NAME)
4. Index into Milvus
After preparing decrypted JSONL, set DATA_LOCATION, EMBED_API_URL, EMBED_API_KEY, EMBED_MODEL_NAME, HUGGINGFACE_MODEL_NAME, and MILVUS_* as in the table below, then:
uv run -m openjiuwen_deepsearch.algorithm.search_index.create_browsecompplus_index
The script:
- Connects to Milvus (default MILVUS_URI=http://localhost:19530, plus MILVUS_TOKEN, etc.) and selects / creates the database
- Builds OpenJiuwenAPIEmbedder (api_url, api_token, timeout)
- Loads DATA_LOCATION JSONL
- Chunks with BrowseCompChunker (TokenizerChunker, up to 2048 tokens per chunk)
- Embeds in batches and writes dense + BM25 sparse rows into MILVUS_COLLECTION_NAME
5. Configuration table
Variables in create_browsecompplus_index.py are read with _env("KEY", default); if the env var is unset, the in-file default applies (EMBED_MODEL_NAME and HUGGINGFACE_MODEL_NAME default to empty and must be set).
| Variable (env name) | Meaning | Typical in-code default |
|---|---|---|
| DATA_LOCATION | Path to decrypted JSONL | browsecompplus.jsonl |
| MILVUS_URI | Milvus HTTP URI | http://localhost:19530 |
| MILVUS_TOKEN | Auth token | root:Milvus |
| MILVUS_DB_NAME | Database name | deepsearch_benchmarks |
| MILVUS_COLLECTION_NAME | Collection name | browsecompplus_with_bm25 |
| HUGGINGFACE_MODEL_NAME | HF tokenizer id for chunking | empty (must set), e.g. Qwen/Qwen3-Embedding-8B |
| EMBED_MODEL_NAME | Embedding model id passed to the API | empty (must set) |
| EMBED_API_URL | Embeddings HTTP URL | empty (must set or startup fails) |
| EMBED_API_KEY | Embeddings API key | empty (must set or startup fails) |
| EMBED_TIMEOUT | HTTP timeout (seconds) | 60 |
| BATCH_SIZE | Docs per batch | 10 |
| INDEX_MAX_RECORDS | Index only first N rows; 0 = all | 0 |
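A minimal sketch of the env-with-default pattern behind this table follows. It does not reproduce the script's own _env helper: load_index_settings and REQUIRED are hypothetical names, and the defaults are simply copied from the table. The 60-second fallback for a non-positive EMBED_TIMEOUT mirrors the behavior described for the script's main block.

```python
import os

# Variables the guide says must be set before indexing.
REQUIRED = ("EMBED_API_URL", "EMBED_API_KEY", "EMBED_MODEL_NAME", "HUGGINGFACE_MODEL_NAME")


def load_index_settings(environ=None) -> dict:
    """Read indexer settings from the environment, applying the table's defaults."""
    environ = os.environ if environ is None else environ
    settings = {
        "DATA_LOCATION": environ.get("DATA_LOCATION", "browsecompplus.jsonl"),
        "MILVUS_URI": environ.get("MILVUS_URI", "http://localhost:19530"),
        "MILVUS_TOKEN": environ.get("MILVUS_TOKEN", "root:Milvus"),
        "MILVUS_DB_NAME": environ.get("MILVUS_DB_NAME", "deepsearch_benchmarks"),
        "MILVUS_COLLECTION_NAME": environ.get("MILVUS_COLLECTION_NAME", "browsecompplus_with_bm25"),
        "EMBED_TIMEOUT": int(environ.get("EMBED_TIMEOUT", "60")),
        "BATCH_SIZE": int(environ.get("BATCH_SIZE", "10")),
        "INDEX_MAX_RECORDS": int(environ.get("INDEX_MAX_RECORDS", "0")),
    }
    for name in REQUIRED:
        settings[name] = environ.get(name, "")
    missing = [name for name in REQUIRED if not settings[name]]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    if settings["EMBED_TIMEOUT"] <= 0:
        settings["EMBED_TIMEOUT"] = 60  # fall back to the default timeout
    return settings
```

Failing fast on the four required variables surfaces a misconfigured shell before any embedding calls are made.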
6. Main-block flow (if __name__ == "__main__")
Aligned with the current script:
- Milvus client: MilvusClient(uri=MILVUS_URI, token=MILVUS_TOKEN, database=...)
- Database: create MILVUS_DB_NAME if missing; using_database
- Embedder: require EMBED_API_URL, EMBED_API_KEY, EMBED_MODEL_NAME, and HUGGINGFACE_MODEL_NAME; OpenJiuwenAPIEmbedder(...); timeout falls back to 60s if EMBED_TIMEOUT ≤ 0; tokenizer from HUGGINGFACE_MODEL_NAME for BrowseCompChunker
- Load JSONL: read_jsonl, build doc_id2doc, query-id maps, …
- Schema: setup_milvus_collection(): PK id, dense embedding, content, sparse content_sparse, BM25 function, AUTOINDEX + COSINE for dense, SPARSE_INVERTED_INDEX for sparse
- Index: BrowseCompChunker + index_documents_milvus() calling encoder_model.encode()
7. Instruction tuning
OpenJiuwenAPIEmbedder can prepend a fixed task instruction on the query path when encode(..., is_query=True) is used. Many instruction-tuned embedding models benefit from this pattern.
- Query time — prepend a task instruction, e.g. “Given a web search query, retrieve relevant passages that answer the query”.
- Indexing time — do not prepend instructions; encode raw document text for stable vectors.
The embedder’s encode(..., is_query=True|False) flag controls this: True adds the task instruction; False is for indexing.
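The query/document asymmetry can be illustrated with a small text-preparation helper. This is not OpenJiuwenAPIEmbedder's code: prepare_text is a hypothetical function, and the "Instruct: ... / Query: ..." template is an assumed format used by some instruction-tuned embedding models; check your model's documentation for the exact wording it expects.

```python
TASK_INSTRUCTION = (
    "Given a web search query, retrieve relevant passages that answer the query"
)


def prepare_text(text: str, is_query: bool) -> str:
    """Prepend the task instruction for queries; leave documents untouched."""
    if is_query:
        # Assumed template; the real embedder may format this differently.
        return f"Instruct: {TASK_INSTRUCTION}\nQuery: {text}"
    return text  # indexing path: raw document text for stable vectors


query_input = prepare_text("who founded the observatory?", is_query=True)
doc_input = prepare_text("The observatory was founded in ...", is_query=False)
```

Keeping documents instruction-free means the index does not need rebuilding if the query instruction is later reworded.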
8. Milvus schema (summary)
Matches setup_milvus_collection() in create_browsecompplus_index.py:
- id: string PK ({docid}__{chunk_idx})
- embedding: dense vector; dimension must match the embedding model used at index time; COSINE
- content: raw chunk text (BM25 analyzer input)
- content_sparse: Milvus BM25 sparse vector
- docid: source document id
- title, authors, datetime: parsed metadata
- gold_query_id, evidence_query_id: related query id lists
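To show how chunking and the {docid}__{chunk_idx} primary key fit together, here is a toy sketch. It is not BrowseCompChunker: chunk_rows is a hypothetical function, and whitespace-separated words stand in for tokenizer tokens so the example runs without downloading a Hugging Face tokenizer. Only the id format, the docid and content fields, and the 2048-token ceiling come from this guide.

```python
def chunk_rows(docid: str, text: str, max_tokens: int = 2048) -> list:
    """Split text into fixed-size token windows and build one row per chunk."""
    tokens = text.split()  # stand-in for real tokenizer output
    rows = []
    for chunk_idx, start in enumerate(range(0, len(tokens), max_tokens)):
        window = tokens[start:start + max_tokens]
        rows.append({
            "id": f"{docid}__{chunk_idx}",   # primary key: {docid}__{chunk_idx}
            "docid": docid,
            "content": " ".join(window),     # raw chunk text (BM25 input)
        })
    return rows


rows = chunk_rows("doc42", "one two three four five", max_tokens=2)
```

With max_tokens=2, the five-word text yields three rows with ids doc42__0, doc42__1, and doc42__2. Because the key is derived from docid and chunk position, re-running the indexer on the same data produces the same ids.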
This guide is the English counterpart to README_search.md. When in doubt, prefer the version that matches your checkout date; both should track openjiuwen_deepsearch/config/config.py and main.py.