RAG (Retrieval-Augmented Generation) Module Design Documentation
Overview
The RAG module in AIKG is currently implemented through the VectorStore classes, which provide vector-based document retrieval. Given a query, vector retrieval quickly finds the stored documents whose content is most similar to it.
Core Features
- Vector Storage: Efficient vector indexing based on FAISS
- Embedding Models: Support for HuggingFace embedding models
- Automatic Document Generation: Automatic generation of retrieval documents from operator metadata
- Multiple Retrieval Methods: Similarity search, Maximum Marginal Relevance search
- Index Management: Support for insert, delete, clear operations
Core Components
VectorStore (Abstract Base Class)
The foundation of the RAG system, providing vector storage and retrieval capabilities.
Key Features:
- Abstract base class for all vector storage implementations
- Singleton pattern for resource efficiency
- HuggingFace embedding model support (default: GanymedeNil/text2vec-large-chinese)
- FAISS-based vector indexing
- Automatic document generation from operator metadata
Specialized Vector Stores
CoderVectorStore
Vector storage specialized for code generation scenarios.
Core Features:
- Focuses on computation-related features: ["op_name", "op_type", "input_specs", "output_specs", "computation"]
- Implements hierarchical search capabilities
- Supports code similarity matching
EvolveVectorStore
Vector storage specialized for evolutionary optimization scenarios.
Core Features:
- Handles schedule-related features: ["base", "pass", "text"]
- Supports multiple schedule aspects for diverse optimization
- Specialized handling of schedule block fields
Embedding Model Support
Model Loading Mechanism
VectorStore supports flexible embedding model loading strategies:
Loading Priority:
- Specified Model: First try to load the HuggingFace model specified in the configuration
- Environment Variable: If the specified model fails to load, try loading a local model from the path in the EMBEDDING_MODEL_PATH environment variable
- Graceful Degradation: If every loading method fails, vector store functionality is disabled automatically
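The loading priority above can be sketched as a simple fallback chain. This is a minimal illustration, not the AIKG implementation: load_embedding_model and the injected loader callable are hypothetical names, and only the EMBEDDING_MODEL_PATH variable comes from this document.

```python
import logging
import os
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def load_embedding_model(
    model_name: str,
    loader: Callable[[str], object],  # hypothetical: builds a model from a name or path
    env_var: str = "EMBEDDING_MODEL_PATH",
) -> Optional[object]:
    """Try the configured model, then the local path from the environment."""
    candidates = [model_name]
    local_path = os.environ.get(env_var)
    if local_path:
        candidates.append(local_path)

    for candidate in candidates:
        try:
            return loader(candidate)
        except Exception as exc:  # any load failure moves on to the next candidate
            logger.warning("Could not load embedding model %r: %s", candidate, exc)

    # Nothing loaded: the caller is expected to disable the vector store.
    return None
```

Returning None rather than raising lets the caller switch retrieval off and keep the rest of the pipeline running.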
Device Configuration
- CPU Mode: Default configuration, suitable for development and testing
- CUDA Mode: Enable GPU acceleration by setting embedding_device: "cuda"
Index Management
Automatic Index Building
- Indexes are built automatically from metadata.json files
- Supports recursive directory traversal
- Graceful handling of empty databases
- Persistent storage using FAISS
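The traversal step can be illustrated with the standard library alone. The sketch below only collects metadata records; embedding them and writing the FAISS index are left out. collect_metadata and the file_path key are assumptions made for illustration, not names taken from AIKG.

```python
import json
from pathlib import Path
from typing import Dict, List


def collect_metadata(database_path: str) -> List[Dict]:
    """Recursively gather every metadata.json found under database_path."""
    root = Path(database_path)
    records: List[Dict] = []
    if not root.is_dir():
        return records  # missing database: nothing to index

    for meta_file in sorted(root.rglob("metadata.json")):
        with meta_file.open(encoding="utf-8") as fh:
            metadata = json.load(fh)
        # Remember the document folder relative to the database root.
        metadata["file_path"] = meta_file.parent.relative_to(root).as_posix()
        records.append(metadata)
    return records
```

An empty list is a valid result here, which mirrors the graceful handling of empty databases described above.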
Usage Guide
Document Storage Structure
Each document is stored together with its metadata file in its own folder:
{doc_path}/
├── metadata.json # Metadata file
└── {document_file} # Document content file
The doc_path parameter points to the folder path containing the document, relative to database_path.
Index Operation Interfaces
insert
Function: Add new documents to vector storage
Parameters:
doc_path: Path of document to insert
Operation Process:
- Load metadata.json from specified path
- Generate document object
- Remove existing identical documents (deduplication)
- Add to vector storage and save index
delete
Function: Remove specified documents from vector storage
Parameters:
doc_path: Path of document to delete
Deletion Process:
- Iterate through existing document IDs
- Match file paths
- Delete matching documents
- Save updated index
clear
Function: Clear all documents from vector storage
Operation Process:
- Delete all document IDs
- Save empty index file
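The three operations can be modeled with an in-memory stand-in for the index. This is a sketch under stated assumptions: InMemoryIndex and its load_metadata callable are hypothetical, documents are plain dicts, and the step that saves the index to disk is omitted.

```python
from typing import Callable, Dict


class InMemoryIndex:
    """Toy stand-in for the FAISS-backed index: document ID -> document."""

    def __init__(self, load_metadata: Callable[[str], dict]):
        self._load_metadata = load_metadata  # hypothetical metadata.json reader
        self._docs: Dict[int, dict] = {}
        self._next_id = 0

    def insert(self, doc_path: str) -> None:
        metadata = self._load_metadata(doc_path)         # load metadata.json
        document = {**metadata, "file_path": doc_path}   # generate the document
        self.delete(doc_path)                            # deduplicate first
        self._docs[self._next_id] = document             # add to the store
        self._next_id += 1

    def delete(self, doc_path: str) -> int:
        # Walk the existing IDs and collect those whose file path matches.
        matching = [i for i, d in self._docs.items() if d["file_path"] == doc_path]
        for doc_id in matching:
            del self._docs[doc_id]
        return len(matching)

    def clear(self) -> None:
        self._docs.clear()

    def __len__(self) -> int:
        return len(self._docs)
```

Because insert calls delete before adding, inserting the same doc_path twice leaves a single entry.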
Retrieval Interfaces
similarity_search
Function: Execute semantic search and return matching documents
Parameters:
query: Query string
k: Number of documents to return (default: 5)
fetch_k: Number of candidate documents (default: 20, for improving recall)
Returns: List of matching Document objects
max_marginal_relevance_search
Function: Execute Maximum Marginal Relevance search, balancing similarity and diversity
Parameters:
query: Query string
k: Number of documents to return (default: 5)
Returns: List of Document objects reordered by MMR
Features:
lambda_mult=0.2: Weights selection strongly toward diversity over similarity
fetch_k=max(20, 5 * k): Dynamic candidate count
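To make the effect of lambda_mult concrete, here is a minimal pure-Python sketch of MMR selection over a handful of vectors. It is not the AIKG or FAISS code path; mmr_select, cosine, and candidate_count are hypothetical names, and cosine similarity is an assumed metric chosen for illustration.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def candidate_count(k: int) -> int:
    """Dynamic candidate pool size, as described above."""
    return max(20, 5 * k)


def mmr_select(
    query_vec: Sequence[float],
    candidate_vecs: Sequence[Sequence[float]],
    k: int = 5,
    lambda_mult: float = 0.2,
) -> List[int]:
    """Return indices of up to k candidates, chosen greedily by MMR score."""
    selected: List[int] = []
    remaining = list(range(len(candidate_vecs)))

    def score(i: int) -> float:
        relevance = cosine(query_vec, candidate_vecs[i])
        # Penalize similarity to anything already picked.
        redundancy = max(
            (cosine(candidate_vecs[i], candidate_vecs[j]) for j in selected),
            default=0.0,
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With lambda_mult=0.2 the redundancy penalty carries four times the weight of relevance, so a near-duplicate of an already selected document loses to a less similar but distinct one.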
similarity_search_with_score
Function: Execute semantic search and return matching documents with similarity scores
Parameters:
query: Query string
k: Number of documents to return (default: 5)
fetch_k: Number of candidate documents (default: 20)
Returns: List of (Document, score) tuples
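The relationship between k, fetch_k, and the returned tuples can be shown with a brute-force search. This sketch assumes a squared L2 distance, where a lower score means a closer match, and an optional post-fetch filter; both are illustrative assumptions, and search_with_score and keep are hypothetical names.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def search_with_score(
    query_vec: Sequence[float],
    docs: Sequence[Tuple[str, Sequence[float]]],  # (document, vector) pairs
    k: int = 5,
    fetch_k: int = 20,
    keep: Optional[Callable[[str], bool]] = None,  # hypothetical post-filter
) -> List[Tuple[str, float]]:
    """Return up to k (document, distance) tuples, closest first."""
    scored = [
        (doc, sum((q - v) ** 2 for q, v in zip(query_vec, vec)))
        for doc, vec in docs
    ]
    scored.sort(key=lambda pair: pair[1])

    candidates = scored[:fetch_k]  # over-fetch before filtering
    if keep is not None:
        candidates = [pair for pair in candidates if keep(pair[0])]
    return candidates[:k]
```

Fetching more candidates than k means that entries removed by the filter can be replaced by the next closest ones, which is where the recall benefit comes from.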
Document Generation
Automatic Document Creation
VectorStore automatically generates retrieval documents from operator metadata. Subclasses need to implement the gen_document method to define specific document generation logic.
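The subclass contract might look roughly like the following. The real gen_document signature is not given in this document, so the parameters, the dict-shaped return value, and the class names BaseStoreSketch and CoderStoreSketch are all assumptions; only the feature list is taken from the CoderVectorStore description.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseStoreSketch(ABC):
    """Hypothetical base class: subclasses decide how a document is built."""

    features: List[str] = []

    @abstractmethod
    def gen_document(self, metadata: Dict[str, Any], file_path: str) -> Dict[str, Any]:
        """Turn one metadata record into a retrievable document."""


class CoderStoreSketch(BaseStoreSketch):
    features = ["op_name", "op_type", "input_specs", "output_specs", "computation"]

    def gen_document(self, metadata: Dict[str, Any], file_path: str) -> Dict[str, Any]:
        # Keep only the features this store cares about, in a fixed order.
        lines = [f"{key}: {metadata[key]}" for key in self.features if key in metadata]
        return {
            "page_content": "\n".join(lines),
            "metadata": {"file_path": file_path},
        }
```

Keeping the feature list as a class attribute lets each specialized store change what gets embedded without touching the shared indexing logic.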
Specialized Document Generation
CoderVectorStore Document Generation
- Extracts computation-related features from metadata
- Builds documents containing operator type, file path, and other information
- Supports feature invariant filtering
EvolveVectorStore Document Generation
- Specialized handling of schedule block fields
- Expands scheduling information into key-value pair format
- Supports feature extraction for multiple schedule aspects
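Expanding schedule information into key-value pairs could be as simple as the sketch below. The metadata layout is an assumption: this document does not specify it, so the "schedule" key, the nesting of each aspect, and the expand_schedule helper are hypothetical. Only the aspect names come from the EvolveVectorStore feature list.

```python
from typing import Any, Dict

SCHEDULE_ASPECTS = ["base", "pass", "text"]


def expand_schedule(metadata: Dict[str, Any], aspect: str) -> str:
    """Flatten one schedule aspect into 'key: value' lines."""
    block = metadata.get("schedule", {}).get(aspect)  # assumed layout
    if block is None:
        return ""
    if isinstance(block, dict):
        return "\n".join(f"{key}: {value}" for key, value in block.items())
    return f"{aspect}: {block}"  # scalar values keep the aspect as their key
```

Generating one text per aspect would give each schedule aspect its own embedding, which is one way to support retrieval along several aspects.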
Usage Examples
Basic Vector Search
from ai_kernel_generator.database.coder_vector_store import CoderVectorStore
# Initialize vector store (config is the already-loaded AIKG configuration)
vector_store = CoderVectorStore(
    database_path="/path/to/database",
    config=config
)
# Execute similarity search (query is the search string)
docs = vector_store.similarity_search(query, k=5)
# Execute MMR search for diversity
docs = vector_store.max_marginal_relevance_search(query, k=5)
Index Management Operations
# Insert new document
vector_store.insert("ascend910b4/triton/md5_hash_123")
# Delete document
vector_store.delete("ascend910b4/triton/md5_hash_123")
# Clear all documents
vector_store.clear()
Performance Optimization
Singleton Pattern
- Prevents resource duplication
- Thread-safe implementation with locking
- Efficient memory usage
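A thread-safe singleton with locking is commonly written with a double-checked lookup, as in the sketch below. Whether AIKG keeps one instance per class or one per configuration is not stated here, so the per-class keying and the SingletonStore name are assumptions.

```python
import threading
from typing import Dict


class SingletonStore:
    """Hypothetical base class: at most one instance per concrete subclass."""

    _instances: Dict[type, "SingletonStore"] = {}
    _lock = threading.Lock()

    def __new__(cls, *args, **kwargs):
        if cls not in cls._instances:          # fast path, no lock taken
            with cls._lock:
                if cls not in cls._instances:  # re-check while holding the lock
                    cls._instances[cls] = super().__new__(cls)
        return cls._instances[cls]
```

Note that Python still calls __init__ on every construction, so a real implementation would need to guard expensive setup, such as loading the embedding model, so that it runs only once.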
FAISS Optimization
- Fast similarity search
- Configurable fetch_k for improved recall
- Configurable lambda_mult for MMR
Error Handling
- Graceful degradation when embedding models fail
- Automatic fallback to local models
- Comprehensive logging
Future Extensions
Potential Extensions
- API Documentation Integration: Support for AscendC API manual
- Multi-Source RAG: Integration with external knowledge sources
- Custom Document Adapters: Support for PDF, markdown, and other formats
- Advanced Fusion Strategies: More sophisticated result combination methods
- Query Expansion: Automatic query enhancement for better retrieval