Question
Below is my settings.yaml. There seems to be quite a lot to configure. Or is this project perhaps not supported?
This config file contains required core defaults that must be set, along with a handful of common optional settings.
For a full list of available settings, see https://microsoft.github.io/graphrag/config/yaml/
LLM settings
There are a number of settings to tune the threading and token limits for LLM calls - check the docs.
models:
  default_chat_model:
    type: chat
    model_provider: openai
    auth_type: api_key # or azure_managed_identity
    api_key: ${GRAPHRAG_API_KEY} # set this in the generated .env file, or remove if managed identity
    model: text-embedding-3-small
    api_base: https://free.v36.cm/v1/
    api_version: dall-e-2
    concurrent_requests: 25
    async_mode: threaded # or asyncio
    retry_strategy: exponential_backoff
    max_retries: 10
    tokens_per_minute: null
    requests_per_minute: null
Input settings
input:
  storage:
    type: file # or blob
    base_dir: "input"
  file_type: text # [csv, text, json]

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id]
Output/storage settings
If blob storage is specified in the following four sections,
connection_string and container_name must be provided
output:
  type: file # [file, blob, cosmosdb]
  base_dir: "output"

cache:
  type: file # [file, blob, cosmosdb]
  base_dir: "cache"

reporting:
  type: file # [file, blob]
  base_dir: "logs"

vector_store:
  default_vector_store:
    type: lancedb
    db_uri: output\lancedb
    container_name: default
Workflow settings
embed_text:
  model_id: default_embedding_model
  vector_store_id: default_vector_store

extract_graph:
  model_id: default_chat_model
  prompt: "prompts/extract_graph.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1

summarize_descriptions:
  model_id: default_chat_model
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

extract_graph_nlp:
  text_analyzer:
    extractor_type: regex_english # [regex_english, syntactic_parser, cfg]
  async_mode: threaded # or asyncio

cluster_graph:
  max_cluster_size: 10

extract_claims:
  enabled: false
  model_id: default_chat_model
  prompt: "prompts/extract_claims.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  model_id: default_chat_model
  graph_prompt: "prompts/community_report_graph.txt"
  text_prompt: "prompts/community_report_text.txt"
  max_length: 2000
  max_input_length: 8000

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes (embed_graph must also be enabled)

snapshots:
  graphml: false
  embeddings: false
Query settings
The prompt locations are required here, but each search method has a number of optional knobs that can be tuned.
See the config docs: https://microsoft.github.io/graphrag/config/yaml/#query
local_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/local_search_system_prompt.txt"

global_search:
  chat_model_id: default_chat_model
  map_prompt: "prompts/global_search_map_system_prompt.txt"
  reduce_prompt: "prompts/global_search_reduce_system_prompt.txt"
  knowledge_prompt: "prompts/global_search_knowledge_system_prompt.txt"

drift_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/drift_search_system_prompt.txt"
  reduce_prompt: "prompts/drift_search_reduce_prompt.txt"

basic_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/basic_search_system_prompt.txt"
Setting up a knowledge graph with Microsoft GraphRAG is a multi-step process. You can configure it as follows:
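- Understand the config file: settings.yaml contains the core configuration plus some optional settings. Make sure you know which settings are required, such as the models, the API key, and the concurrent request limit.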
- Core configuration:
  - LLM settings: configure the chat and embedding models, including the model provider (e.g. OpenAI), the authentication type (api_key or azure_managed_identity), the API key, the API base URL, the API version, and so on. Make sure GRAPHRAG_API_KEY is set, or remove the api_key entry if you use managed identity.
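- Input settings:
  - Choose the storage type (file or blob), then set the base directory, the file type, and the chunk size, overlap, and group-by columns.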
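- Output/storage settings:
  - Configure the output, cache, reporting, and vector store types as needed, and provide connection_string and container_name where applicable (for example, for blob storage).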
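- Workflow settings:
  - Define the model, prompt, and parameters used by each task (graph extraction, description summarization, and so on), such as max_gleanings and the text analyzer's extractor_type.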
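- Query settings:
  - Each search method (local_search, global_search, and so on) needs its prompt paths set. The other parameters are optional and can be tuned to improve results.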
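- Environment variables:
  - Put the actual value of GRAPHRAG_API_KEY in the generated .env file instead of writing the key directly into settings.yaml, so that the API key stays out of the config file.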
- Run:
  - Save the config file, then start GraphRAG with this configuration and begin using the knowledge graph for querying and analysis.
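One thing worth checking in the workflow and query settings is that every model id they name is actually defined under models. In the settings.yaml posted in the question, only default_chat_model appears under models, while embed_text and several search sections refer to default_embedding_model. The following is a minimal sketch of such a check, not part of GraphRAG itself: the helper name find_undefined_models is hypothetical, and the dictionary is a hand-written, trimmed-down stand-in for the parsed file (in practice you would load it with a YAML parser).

```python
def find_undefined_models(settings: dict) -> dict[str, str]:
    """Map "section.key" to the model id for every reference that is not defined under `models`."""
    defined = set(settings.get("models", {}))
    # Keys that name a model in the posted settings.yaml.
    reference_keys = ("model_id", "chat_model_id", "embedding_model_id")
    missing = {}
    for section, body in settings.items():
        if section == "models" or not isinstance(body, dict):
            continue
        for key in reference_keys:
            model_id = body.get(key)
            if model_id is not None and model_id not in defined:
                missing[f"{section}.{key}"] = model_id
    return missing


# Hand-written stand-in for a few sections of the settings.yaml in the question.
settings = {
    "models": {"default_chat_model": {"type": "chat", "model": "text-embedding-3-small"}},
    "embed_text": {"model_id": "default_embedding_model"},
    "extract_graph": {"model_id": "default_chat_model"},
    "local_search": {
        "chat_model_id": "default_chat_model",
        "embedding_model_id": "default_embedding_model",
    },
}

for location, model_id in find_undefined_models(settings).items():
    print(f"{location} refers to undefined model '{model_id}'")
```

If the check reports anything, the likely fix is to add a matching entry under models (here, an embedding model named default_embedding_model) or to point the reference at a model that is defined.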
While configuring, adjust the parameters to your project's actual needs, and see the official documentation at https://microsoft.github.io/graphrag/config/yaml/ for details.
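To illustrate the environment-variable step, here is a small sketch of how a `${GRAPHRAG_API_KEY}` placeholder can be filled in from the environment. It is an assumption-laden illustration, not GraphRAG's own loader: it uses Python's string.Template, which happens to share the `${NAME}` syntax, the helper name expand_env is hypothetical, and "sk-example" is a placeholder value.

```python
import os
from string import Template


def expand_env(text: str, env=None) -> str:
    """Replace ${NAME} placeholders in text with values from env (defaults to os.environ).

    Raises KeyError if a placeholder has no matching variable, which surfaces
    a missing GRAPHRAG_API_KEY early instead of sending an empty key.
    """
    mapping = os.environ if env is None else env
    return Template(text).substitute(mapping)


# Hypothetical value standing in for what the .env file would provide.
example_env = {"GRAPHRAG_API_KEY": "sk-example"}
print(expand_env("api_key: ${GRAPHRAG_API_KEY}", example_env))
```

The point of the design is that settings.yaml only ever contains the placeholder, so the file can be shared or committed without exposing the key.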