
The ai-search plugin enhances the accuracy and timeliness of AI model responses by integrating real-time results from search engines (Google/Bing/Arxiv/Elasticsearch etc.). The plugin automatically injects search results into the prompt template and determines whether to add reference sources in the final response based on configuration.

Plugin execution stage: Default stage
Plugin execution priority: 440

| Name | Data Type | Requirement | Default Value | Description |
|---|---|---|---|---|
| defaultEnable | bool | Optional | true | Whether the plugin is enabled by default. When set to false, the plugin is activated only when the request contains a web_search_options field |
| needReference | bool | Optional | false | Whether to add reference sources to the response |
| referenceFormat | string | Optional | "**References:**\n%s" | Reference content format; must include the %s placeholder |
| referenceLocation | string | Optional | "head" | Reference position: "head" places references at the beginning of the response, "tail" at the end |
| defaultLang | string | Optional | - | Default search language code (e.g. zh-CN, en-US) |
| promptTemplate | string | Optional | Built-in template | Prompt template; must include the {search_results} and {question} placeholders |
| searchFrom | array of object | Required | - | Search engine configuration (see below); at least one engine must be configured |
| searchRewrite | object | Optional | - | Search rewrite configuration, used to optimize search queries with an LLM service |

The search rewrite feature uses an LLM service to analyze and optimize the user’s original query, which can:

  1. Determine whether the user’s question requires a search engine query. If it does not, the search-related logic will not be executed
  2. Convert natural language queries into keyword combinations better suited for search engines
  3. For Arxiv paper searches, automatically identify relevant paper categories and add category constraints
  4. For private knowledge base searches, break down long queries into multiple precise keyword combinations

It is strongly recommended to enable this feature when using Arxiv or Elasticsearch engines. For Arxiv searches, it can accurately identify paper domains and optimize English keywords; for private knowledge base searches, it can provide more precise keyword matching, significantly improving search effectiveness.

Search rewrite configuration (searchRewrite):

| Name | Data Type | Requirement | Default Value | Description |
|---|---|---|---|---|
| llmServiceName | string | Required | - | LLM service name |
| llmServicePort | number | Required | - | LLM service port |
| llmApiKey | string | Optional | - | LLM service API key |
| llmUrl | string | Required | - | LLM service API URL |
| llmModelName | string | Required | - | LLM model name |
| timeoutMillisecond | number | Optional | 30000 | API call timeout (milliseconds) |
| maxCount | number | Optional | 3 | Maximum number of search queries generated by the search rewrite |
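
Common search engine configuration (each searchFrom entry):

| Name | Data Type | Requirement | Default Value | Description |
|---|---|---|---|---|
| type | string | Required | - | Engine type (google/bing/arxiv/elasticsearch/quark) |
| apiKey | string | Required | - | Search engine API key/Aliyun AccessKey (Arxiv does not require an API key) |
| serviceName | string | Required | - | Backend service name |
| servicePort | number | Required | - | Backend service port |
| count | number | Optional | 10 | Number of results returned per search |
| start | number | Optional | 0 | Search result offset (results are returned starting from result number start+1) |
| timeoutMillisecond | number | Optional | 5000 | API call timeout (milliseconds) |
| optionArgs | map | Optional | - | Search engine specific parameters (key-value format) |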

Google-specific configuration:

| Name | Data Type | Requirement | Default Value | Description |
|---|---|---|---|---|
| cx | string | Required | - | Google Custom Search Engine ID, used to specify the search scope |

Arxiv-specific configuration:

| Name | Data Type | Requirement | Default Value | Description |
|---|---|---|---|---|
| arxivCategory | string | Optional | - | Paper category to search (e.g. cs.AI, cs.CL) |

Elasticsearch-specific configuration:

| Name | Data Type | Requirement | Default Value | Description |
|---|---|---|---|---|
| index | string | Required | - | Elasticsearch index name to search |
| contentField | string | Required | - | Content field name to query |
| semanticTextField | string | Required | - | Embedding field name to query |
| linkField | string | Optional | - | Result link field name, needed when needReference is configured |
| titleField | string | Optional | - | Result title field name, needed when needReference is configured |
| username | string | Optional | - | Elasticsearch username |
| password | string | Optional | - | Elasticsearch password |

The Reciprocal Rank Fusion (RRF) query used in hybrid search requires Elasticsearch version 8.8 or higher.
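
To make the ranking step concrete, here is a minimal Python sketch of Reciprocal Rank Fusion in general; it is not the plugin's actual Elasticsearch query. Each document receives the sum of 1/(k + rank) over the result lists it appears in. The function name rrf_fuse, the sample document IDs, and k = 60 (Elasticsearch's documented default rank_constant) are illustrative assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Hypothetical helper: each document scores sum(1 / (k + rank)) over the
    lists it appears in, with ranks starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Made-up results from a keyword query and a semantic (embedding) query.
keyword_hits = ["doc-a", "doc-b", "doc-c"]
semantic_hits = ["doc-b", "doc-d", "doc-a"]
fused = rrf_fuse([keyword_hits, semantic_hits])
print(fused)  # ['doc-b', 'doc-a', 'doc-d', 'doc-c']
```

Documents that rank well in both lists (doc-b, doc-a) rise above documents that appear in only one, which is the point of combining keyword and semantic retrieval.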

Currently, document vectorization relies on Elasticsearch’s embedding model, which requires an Elasticsearch Enterprise license or a 30-day Trial license. To install the built-in embedding model in Elasticsearch, please refer to this documentation. If you want to install a third-party embedding model, please refer to this guide.

For a complete tutorial on integrating the ai-search plugin with Elasticsearch, please refer to: Building a RAG Application with LangChain + Higress + Elasticsearch.

Quark-specific configuration:

| Name | Data Type | Requirement | Default Value | Description |
|---|---|---|---|---|
| contentMode | string | Optional | "summary" | Content mode: "summary" uses snippet, "full" uses full text (markdownText first, then mainText if empty) |
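
The contentMode fallback can be sketched in Python as follows. The field names snippet, markdownText, and mainText come from the description above; the function name pick_content and the shape of the result dictionary are assumptions made for illustration.

```python
def pick_content(result, content_mode="summary"):
    """Return the text a single Quark result would contribute (hypothetical helper)."""
    if content_mode == "full":
        # Prefer markdownText; fall back to mainText when it is empty or missing.
        return result.get("markdownText") or result.get("mainText") or ""
    # "summary" (the default) uses the short snippet.
    return result.get("snippet", "")


# Made-up result in which markdownText is empty.
hit = {"snippet": "short summary", "markdownText": "", "mainText": "full plain text"}
print(pick_content(hit))          # short summary
print(pick_content(hit, "full"))  # full plain text
```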

Basic Configuration (Single Search Engine)

needReference: true
searchFrom:
- type: google
  apiKey: "your-google-api-key"
  cx: "search-engine-id"
  serviceName: "google-svc.dns"
  servicePort: 443
  count: 5
  optionArgs:
    fileType: "pdf"

Arxiv Search Configuration

searchFrom:
- type: arxiv
  serviceName: "arxiv-svc.dns"
  servicePort: 443
  arxivCategory: "cs.AI"
  count: 10

Quark Search Configuration

searchFrom:
- type: quark
  serviceName: "quark-svc.dns"
  servicePort: 443
  apiKey: "quark api key"
  contentMode: "full" # Optional values: "summary" (default) or "full"

Multiple Search Engines Configuration

defaultLang: "en-US"
promptTemplate: |
  # Search Results:
  {search_results}
  # Please answer this question:
  {question}
searchFrom:
- type: google
  apiKey: "google-key"
  cx: "github-search-id" # Search engine ID specifically for GitHub content
  serviceName: "google-svc.dns"
  servicePort: 443
- type: google
  apiKey: "google-key"
  cx: "news-search-id" # Search engine ID specifically for Google News content
  serviceName: "google-svc.dns"
  servicePort: 443
- type: bing
  apiKey: "bing-key"
  serviceName: "bing-svc.dns"
  servicePort: 443
  optionArgs:
    answerCount: "5"

Since search engines limit the number of results per query (e.g. Google limits to 100 results per query), you can get more results by:

  1. Setting a smaller count value (e.g. 10)
  2. Specifying result offset with start parameter
  3. Concurrently initiating multiple query requests, with each request’s start value incrementing by count

For example, to get 30 results, set count to 10 and configure three entries that are queried concurrently, with start values of 0, 10, and 20:

searchFrom:
- type: google
  apiKey: "your-google-api-key"
  cx: "search-engine-id"
  serviceName: "google-svc.dns"
  servicePort: 443
  start: 0
  count: 10
- type: google
  apiKey: "your-google-api-key"
  cx: "search-engine-id"
  serviceName: "google-svc.dns"
  servicePort: 443
  start: 10
  count: 10
- type: google
  apiKey: "your-google-api-key"
  cx: "search-engine-id"
  serviceName: "google-svc.dns"
  servicePort: 443
  start: 20
  count: 10

Note that excessive concurrency may trigger rate limiting; adjust the number of concurrent queries to your actual situation.
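
Writing the repeated entries by hand is error-prone, so the following Python sketch generates them from one base entry. The helper name paged_search_from is hypothetical, and the plugin itself only consumes the resulting configuration; serializing the list back to YAML is left out.

```python
def paged_search_from(base, total, count):
    """Build searchFrom entries whose start offsets cover `total` results.

    Hypothetical helper: start increases by `count` for each entry, and the
    last entry is trimmed if `total` is not a multiple of `count`.
    """
    if total <= 0 or count <= 0:
        raise ValueError("total and count must be positive")
    entries = []
    for start in range(0, total, count):
        entry = dict(base)  # copy, so the base entry is not modified
        entry["start"] = start
        entry["count"] = min(count, total - start)
        entries.append(entry)
    return entries


base = {
    "type": "google",
    "apiKey": "your-google-api-key",
    "cx": "search-engine-id",
    "serviceName": "google-svc.dns",
    "servicePort": 443,
}
entries = paged_search_from(base, total=30, count=10)
print([(e["start"], e["count"]) for e in entries])  # [(0, 10), (10, 10), (20, 10)]
```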

Elasticsearch Configuration (For Private Knowledge Base Integration)

searchFrom:
- type: elasticsearch
  serviceName: "es-svc.static"
  index: "knowledge_base"
  contentField: "content"
  semanticTextField: "semantic_text"
  # username: "elastic"
  # password: "password"

Custom Reference Format

needReference: true
referenceFormat: "### Data Sources\n%s"
searchFrom:
- type: bing
  apiKey: "your-bing-key"
  serviceName: "search-service.dns"
  servicePort: 8080

Reference Location Configuration

needReference: true
referenceLocation: "tail" # Add references at the end of the response instead of the beginning
searchFrom:
- type: bing
  apiKey: "your-bing-key"
  serviceName: "search-service.dns"
  servicePort: 8080
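
How referenceFormat and referenceLocation combine can be sketched in Python. This is an illustration under assumptions, not the plugin's code: the helper name add_references, the way individual references are rendered, and the blank line separating the reference block from the answer are all hypothetical.

```python
def add_references(answer, references,
                   reference_format="**References:**\n%s",
                   reference_location="head"):
    """Attach a formatted reference block to a model answer (hypothetical helper)."""
    if "%s" not in reference_format:
        raise ValueError("referenceFormat must include the %s placeholder")
    if reference_location not in ("head", "tail"):
        raise ValueError('referenceLocation must be "head" or "tail"')
    # Substitute the joined reference lines for the first %s placeholder.
    block = reference_format.replace("%s", "\n".join(references), 1)
    if reference_location == "tail":
        return answer + "\n\n" + block
    return block + "\n\n" + answer


# Made-up reference line and answer.
refs = ["[1] Example page (https://example.com/a)"]
print(add_references("The answer.", refs,
                     reference_format="### Data Sources\n%s",
                     reference_location="tail"))
```

With the default "head" location, the same call would place the reference block before the answer instead.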

Search Rewrite Configuration

searchFrom:
- type: google
  apiKey: "your-google-api-key"
  cx: "search-engine-id"
  serviceName: "google-svc.dns"
  servicePort: 443
searchRewrite:
  llmServiceName: "llm-svc.dns"
  llmServicePort: 443
  llmApiKey: "your-llm-api-key"
  llmUrl: "https://api.example.com/v1/chat/completions"
  llmModelName: "gpt-3.5-turbo"
  timeoutMillisecond: 15000

Configure the plugin to only be enabled when the request contains a web_search_options field:

defaultEnable: false
searchFrom:
- type: google
  apiKey: "your-google-api-key"
  cx: "search-engine-id"
  serviceName: "google-svc.dns"
  servicePort: 443

When the request contains a web_search_options field, even if it’s an empty object ("web_search_options": {}), the plugin will be activated.
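
The activation rule can be expressed as a small Python check. This is a sketch of the behavior described above, not the plugin's implementation; the function name plugin_enabled and the sample request bodies are assumptions.

```python
import json


def plugin_enabled(request_body, default_enable=True):
    """Decide whether search should run for a JSON request body (hypothetical helper)."""
    if default_enable:
        return True
    payload = json.loads(request_body)
    # The presence of the key is what matters; an empty object still activates the plugin.
    return isinstance(payload, dict) and "web_search_options" in payload


with_options = '{"messages": [], "web_search_options": {}}'
without_options = '{"messages": []}'
print(plugin_enabled(with_options, default_enable=False))     # True
print(plugin_enabled(without_options, default_enable=False))  # False
```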

You can dynamically adjust the number of search queries by adding a search_context_size parameter in the web_search_options field of the request:

{
  "web_search_options": {
    "search_context_size": "medium"
  }
}

The search_context_size supports three levels:

  • low: Generates 1 search query (suitable for simple questions)
  • medium: Generates 3 search queries (default)
  • high: Generates 5 search queries (suitable for complex questions)

This setting overrides the maxCount value in the configuration, allowing clients to dynamically adjust search depth based on question complexity.
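
The mapping from search_context_size to the number of generated queries can be sketched in Python. The three levels and their counts come from the list above. The helper name effective_query_count is hypothetical, and falling back to the configured maxCount when the parameter is absent or unrecognized is an assumption of this sketch; the source does not say how the plugin handles that case.

```python
CONTEXT_SIZE_TO_QUERIES = {"low": 1, "medium": 3, "high": 5}


def effective_query_count(payload, max_count=3):
    """Return how many search queries to generate for a parsed request body."""
    options = payload.get("web_search_options")
    size = options.get("search_context_size") if isinstance(options, dict) else None
    # Assumption: an absent or unknown level leaves the configured maxCount in effect.
    return CONTEXT_SIZE_TO_QUERIES.get(size, max_count)


request = {"web_search_options": {"search_context_size": "high"}}
print(effective_query_count(request, max_count=2))  # 5
```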

Notes:

  1. The prompt template must include the {search_results} and {question} placeholders. Optionally, use {cur_date} to insert the current date (format: January 2, 2006)
  2. The default template includes instructions for processing search results and response specifications. Use it unless you have special requirements
  3. Multiple search engines are queried in parallel, so total timeout = the maximum timeoutMillisecond value among all search engine configurations + processing time
  4. Arxiv search does not require an API key, but you can specify a paper category (arxivCategory) to narrow the search scope
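
Notes 1 and 3 can be illustrated with a short Python sketch. It is not the plugin's implementation: the helper names render_prompt and search_budget_ms are hypothetical, and the date is formatted by hand to match the "January 2, 2006" style. The 5000 ms fallback is the documented default for a search engine's timeoutMillisecond.

```python
import re
from datetime import date

PLACEHOLDER = re.compile(r"\{(search_results|question|cur_date)\}")


def render_prompt(template, search_results, question, today=None):
    """Fill a prompt template's placeholders in a single pass (hypothetical helper)."""
    for required in ("{search_results}", "{question}"):
        if required not in template:
            raise ValueError("promptTemplate is missing " + required)
    today = today or date.today()
    values = {
        "search_results": search_results,
        "question": question,
        # Month name, day without zero padding, four-digit year.
        "cur_date": f"{today:%B} {today.day}, {today.year}",
    }
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


def search_budget_ms(search_from, default_ms=5000):
    """Engines run in parallel, so the slowest allowed timeout bounds the search phase."""
    return max(entry.get("timeoutMillisecond", default_ms) for entry in search_from)


template = "Today is {cur_date}.\n{search_results}\nQ: {question}"
print(render_prompt(template, "[1] result", "What is RRF?", today=date(2006, 1, 2)))
print(search_budget_ms([{"timeoutMillisecond": 3000}, {}, {"timeoutMillisecond": 8000}]))  # 8000
```

Substituting in a single pass means placeholder-like text inside the search results is not expanded a second time.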