Higress Technical Glossary

A deep dive into the core concepts of AI gateways, API gateways, and cloud-native technologies. Each term includes a clear definition, practical examples, and Higress use cases.

46 terms, 4 categories
🤖

AI / LLM

Core concepts in artificial intelligence and large language models

Token

Token is the basic unit for processing text in large language models. Models split input text into a series of tokens for processing, where each token may be a word, subword, or character. Token count directly affects API call costs and response time.

llm prompt context-window
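
To make the link between token count and cost concrete, here is a minimal sketch of a per-call cost estimate. The function name and the prices are hypothetical and chosen only for illustration; real providers publish their own per-token prices and usually charge input and output tokens at different rates.

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  input_price_per_1k, output_price_per_1k):
    """Estimate the cost of one LLM call from its token counts.

    Prices are expressed per 1,000 tokens and are assumptions of this sketch.
    """
    input_cost = (prompt_tokens / 1000) * input_price_per_1k
    output_cost = (completion_tokens / 1000) * output_price_per_1k
    return input_cost + output_cost


# Hypothetical prices: 0.5 per 1K input tokens, 1.5 per 1K output tokens.
cost = estimate_cost(1200, 300, input_price_per_1k=0.5, output_price_per_1k=1.5)
```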

Large Language Model

An LLM (Large Language Model) is a deep learning model for natural language processing, trained on massive text corpora and capable of understanding and generating human language. Representative model families include GPT, Claude, Qwen, and DeepSeek.

token prompt agent

Prompt

Prompt is the input instruction or question sent by users to large language models. High-quality prompt design (Prompt Engineering) is crucial for obtaining accurate and useful model outputs.

llm token rag

AI Agent

AI Agent is an intelligent system capable of autonomously perceiving the environment, making decisions, and executing tasks. It combines the reasoning capabilities of large language models with the execution capabilities of external tools to automate complex tasks.

llm mcp rag

AI Hallucination

Hallucination refers to the phenomenon where large language models generate information that seems reasonable but is actually inaccurate, unfounded, or inconsistent with their training data. This occurs because models predict the next token probabilistically rather than truly understanding facts.

llm rag prompt-engineering

Prompt Engineering

Prompt Engineering is the practice of designing, refining, and optimizing the instructions (prompts) given to AI models so that they produce more accurate, higher-quality outputs. Common strategies include structured prompts and few-shot examples.

prompt llm agent
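
As an illustration of the few-shot strategy, the sketch below assembles a structured prompt from an instruction, a handful of worked examples, and the new query. The `Input:` / `Output:` layout and the function name are assumptions of this example, not a format required by any particular model.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Build a structured few-shot prompt.

    examples is a list of (input_text, expected_output) pairs that show the
    model the pattern it should follow for the final query.
    """
    lines = [instruction, ""]
    for text, label in examples:
        lines += [f"Input: {text}", f"Output: {label}", ""]
    # The prompt ends with an open "Output:" for the model to complete.
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Classify the sentiment of each input as positive or negative.",
    [("I love this gateway", "positive"), ("The request timed out again", "negative")],
    "Setup took five minutes",
)
```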

Model Routing

Model Routing is a technique that dynamically selects the most appropriate LLM based on request complexity, cost requirements, response time, or content type. It can balance cost and performance to achieve optimal utilization of AI resources.

llm ai-gateway failover
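
A routing decision of this kind can be sketched as a small rule-based function. The model names, prices, and limits below are hypothetical, and the rule "the most expensive model is the most capable" is a simplifying assumption of this example rather than how any specific gateway decides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    cost_per_1k_tokens: float
    max_prompt_tokens: int


# Hypothetical catalogue of available models.
MODELS = [
    ModelOption("small-model", 0.1, 4000),
    ModelOption("large-model", 1.0, 32000),
]


def route(prompt_tokens, needs_reasoning):
    """Pick a model name based on prompt size and task complexity."""
    candidates = [m for m in MODELS if prompt_tokens <= m.max_prompt_tokens]
    if not candidates:
        raise ValueError("prompt too long for every configured model")
    if needs_reasoning:
        # Assumption: price is used as a proxy for capability.
        return max(candidates, key=lambda m: m.cost_per_1k_tokens).name
    # Otherwise choose the cheapest model that can fit the prompt.
    return min(candidates, key=lambda m: m.cost_per_1k_tokens).name
```

With this catalogue, a short simple request goes to `small-model`, while a long prompt or a request flagged as complex goes to `large-model`.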

AI Content Safety

AI Content Safety refers to reviewing the inputs (prompts) and outputs (responses) of large language models to identify and block non-compliant, harmful, pornographic, or sensitive content, so that AI applications meet regulatory requirements.

llm prompt waf

Model Context Protocol

MCP (Model Context Protocol) is an open standard protocol proposed by Anthropic for connecting AI models with external data sources and tools. It defines a unified interface specification, enabling AI applications to securely access various resources.

agent function-calling api-gateway

Retrieval-Augmented Generation

RAG is a technique that combines information retrieval with text generation. It first retrieves relevant documents from a knowledge base, then uses the retrieval results as context input to the LLM to generate answers, effectively reducing model hallucinations and providing up-to-date information.

llm embedding vector-database
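
The retrieve-then-generate flow can be sketched in a few lines. This toy version ranks documents by keyword overlap instead of using embeddings and a vector database, and it stops at building the prompt; the knowledge base contents and function names are illustrative assumptions.

```python
import re

# Toy knowledge base, for illustration only.
KNOWLEDGE_BASE = [
    "Rate limiting controls API request rates.",
    "A circuit breaker stops calls to failing downstream services.",
    "Embeddings map text to vectors so similar content is close together.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Return the top_k documents sharing the most words with the question."""
    query_words = tokenize(question)
    ranked = sorted(documents,
                    key=lambda doc: len(query_words & tokenize(doc)),
                    reverse=True)
    return ranked[:top_k]


def build_prompt(question, documents):
    """Place the retrieved documents in the prompt as context for the LLM."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


rag_prompt = build_prompt("What does a circuit breaker do?", KNOWLEDGE_BASE)
```

A production system would replace the keyword ranking with semantic search over embeddings and send the resulting prompt to a model.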

Embedding

Embedding is a technique for converting data such as text and images into high-dimensional vectors. These vectors can capture semantic information of the data, making semantically similar content close in vector space, which is the foundation for semantic search and RAG.

rag vector-database llm
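
"Close in vector space" is commonly measured with cosine similarity. The sketch below uses hand-written three-dimensional vectors as stand-ins; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors for illustration; not produced by any real model.
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}

similar = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["kitten"])
unrelated = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["invoice"])
```

In this toy data, `similar` is much larger than `unrelated`, which is the property semantic search relies on.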

Function Calling

Function Calling is the capability of LLMs to interact with external systems. Models can decide to call predefined functions based on user intent and generate parameters that match function signatures, enabling operations such as querying databases and calling APIs.

agent mcp llm
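
On the application side, function calling boils down to mapping the model's structured output onto a real function. The sketch below assumes the model returns a JSON object with `name` and `arguments` fields; that shape, the `get_weather` stub, and the registry are assumptions of this example, and each provider defines its own wire format.

```python
import json


def get_weather(city):
    """Stub standing in for a real weather API call."""
    return f"Sunny in {city}"


# Registry of functions the model is allowed to call.
TOOLS = {"get_weather": get_weather}


def execute_tool_call(raw_call):
    """Parse a model-generated call and dispatch it to the matching function."""
    call = json.loads(raw_call)
    func = TOOLS.get(call["name"])
    if func is None:
        raise ValueError(f"unknown function: {call['name']}")
    # The arguments must match the function signature, or Python raises TypeError.
    return func(**call["arguments"])


result = execute_tool_call('{"name": "get_weather", "arguments": {"city": "Hangzhou"}}')
```

Looking the name up in an explicit registry, rather than evaluating model output directly, keeps the model limited to the predefined functions.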

Context Window

Context Window is the maximum number of tokens an LLM can process in a single request, counting both the input (prompt, conversation history, reference material) and the generated output. It determines how much conversation history and reference information the model can "remember". Larger context windows support longer conversations and more reference documents.

token llm prompt
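
One common way to stay within the limit is to drop the oldest conversation turns first. The sketch below does that with a deliberately crude token counter (whitespace-separated words); a real application would use the tokenizer of the target model, so treat `count_tokens` as a placeholder.

```python
def count_tokens(text):
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def fit_to_window(messages, max_tokens):
    """Keep the most recent messages whose combined size fits the budget."""
    kept = []
    used = 0
    # Walk backwards so the newest messages are kept first.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))


history = ["hello there", "how are you today", "fine thanks"]
trimmed = fit_to_window(history, max_tokens=6)
```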

AI Gateway

AI Gateway is an API gateway designed specifically for AI applications, providing unified access to large model APIs, protocol conversion, traffic management, security protection, and observability capabilities. It is a core component of enterprise AI infrastructure.

api-gateway llm token
🚪

Gateway

API gateway and traffic management terminology

API Gateway

API Gateway is a unified entry point in microservices architecture, responsible for request routing, protocol conversion, authentication and authorization, rate limiting and circuit breaking, monitoring and logging, etc. It shields clients from backend service complexity and provides a unified API access layer.

rate-limiting load-balancing ingress

Rate Limiting

Rate Limiting is a technique for controlling API request rates by limiting the number of requests per unit time, protecting backend services from overload and ensuring system stability and fair resource allocation. Common algorithms include token bucket, leaky bucket, sliding window, etc.

circuit-breaker api-gateway token
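
Of the algorithms listed, the token bucket is perhaps the easiest to sketch. Note that "token" here means a permit to send one request, not an LLM token. The class below is a minimal single-threaded illustration; timestamps are passed in explicitly so the behavior is easy to follow, and the parameter names are this example's own.

```python
class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_per_second`."""

    def __init__(self, capacity, refill_per_second, now=0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)  # the bucket starts full
        self.last = now

    def allow(self, now):
        """Return True and consume a token if the request may proceed."""
        elapsed = now - self.last
        # Refill for the elapsed time, never exceeding the capacity.
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(capacity=2, refill_per_second=1.0)
decisions = [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0), bucket.allow(1.0)]
```

The first two requests use up the burst capacity, the third is rejected, and one second later a single token has been refilled.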

Load Balancing

Load Balancing distributes traffic across multiple backend service instances to improve system availability and processing capacity. Common strategies include round-robin, weighted round-robin, least connections, consistent hashing, etc.

api-gateway service-discovery health-check
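
Two of the strategies named above can be sketched briefly. The weighted round-robin here simply repeats each backend according to its weight, which is the simplest possible form (production proxies usually interleave picks more smoothly); backend names and weights are made up.

```python
import itertools


def weighted_round_robin(backends):
    """Return an endless iterator over backend names.

    backends maps a name to an integer weight; a backend with weight 2
    is picked twice as often as one with weight 1.
    """
    expanded = [name for name, weight in backends.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def least_connections(active_connections):
    """Pick the backend that currently has the fewest active connections."""
    return min(active_connections, key=active_connections.get)


picker = weighted_round_robin({"a": 2, "b": 1})
first_six = [next(picker) for _ in range(6)]
quietest = least_connections({"a": 5, "b": 2, "c": 7})
```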

Circuit Breaker

Circuit Breaker is a fault tolerance mechanism that "opens" the circuit when a downstream service keeps failing, so that subsequent requests fail fast with an error response instead of piling up and spreading the fault. After a cooldown, it typically lets a few trial requests through (the "half-open" state) and "closes" again to resume normal calls once the service has recovered.

rate-limiting failover health-check
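
The open and close behavior can be sketched as a small state machine. This version opens after a configurable number of consecutive failures and, once a timeout has passed, lets one trial call through to test recovery (commonly called the half-open state). The thresholds, names, and the explicit `now` argument are assumptions of this sketch.

```python
class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, now):
        """Run func unless the circuit is open; track failures and recovery."""
        if self.opened_at is not None:
            if now - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            # Timeout elapsed: fall through and allow one trial call.
        try:
            result = func()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = now
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers get an immediate `RuntimeError` instead of waiting on a service that is known to be failing.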

Failover

Failover is a mechanism where the system automatically switches to backup resources when a fault is detected. It ensures that requests can be automatically routed to healthy backup services when the primary service is unavailable, ensuring business continuity.

circuit-breaker health-check load-balancing
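
In its simplest form, failover is an ordered list of endpoints tried one after another. The sketch below treats `ConnectionError` as the failure signal and uses a fake transport function so that it runs without a network; the endpoint names are hypothetical.

```python
def call_with_failover(endpoints, send):
    """Try each endpoint in order and return the first successful response."""
    last_error = None
    for endpoint in endpoints:
        try:
            return send(endpoint)
        except ConnectionError as exc:
            last_error = exc  # remember the failure and try the next backup
    raise RuntimeError("all endpoints failed") from last_error


def fake_send(endpoint):
    """Hypothetical transport: the primary is down, the backups respond."""
    if endpoint == "primary":
        raise ConnectionError("primary unreachable")
    return f"{endpoint}: ok"


response = call_with_failover(["primary", "backup-1", "backup-2"], fake_send)
```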

Health Check

Health Check is a mechanism that periodically probes backend service status, detecting whether services are running normally through active or passive methods. Unhealthy instances are automatically removed from the load balancing pool to prevent requests from being routed to faulty nodes.

load-balancing failover service-discovery
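
The removal logic can be sketched as a counter of consecutive failures per instance. The same bookkeeping works whether results come from active probes or from observing real traffic (passive checks); the threshold value and class name are assumptions of this example.

```python
class HealthTracker:
    def __init__(self, instances, unhealthy_threshold=2):
        self.unhealthy_threshold = unhealthy_threshold
        # Consecutive failure count per instance.
        self.failures = {name: 0 for name in instances}

    def record(self, instance, success):
        """Record one probe or request result for an instance."""
        if success:
            self.failures[instance] = 0  # any success resets the streak
        else:
            self.failures[instance] += 1

    def healthy(self):
        """Instances still eligible to receive traffic."""
        return [name for name, count in self.failures.items()
                if count < self.unhealthy_threshold]
```

An instance leaves the pool after two consecutive failures and rejoins as soon as a later check succeeds.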

Reverse Proxy

A Reverse Proxy sits in front of backend servers: it receives client requests and forwards them to the backends, hiding the real server addresses from clients. It can also provide load balancing, caching, SSL termination, and security protection.

api-gateway load-balancing ssl-termination

Service Discovery

Service Discovery is the mechanism in a microservices architecture for automatically detecting and locating service instances. Each service registers with a service registry on startup, and callers query the registry for the addresses of available instances, which allows services to call each other dynamically.

api-gateway load-balancing nacos
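
The register-and-lookup cycle can be sketched with an in-memory registry. Instances that stop sending heartbeats are treated as gone after a time-to-live; the TTL value, addresses, and class name are illustrative assumptions and do not reflect the API of Nacos or any other registry.

```python
class ServiceRegistry:
    def __init__(self, ttl=30.0):
        self.ttl = ttl
        # Maps (service name, address) to the time of the last heartbeat.
        self.last_seen = {}

    def register(self, service, address, now):
        """Register an instance; calling it again acts as a heartbeat."""
        self.last_seen[(service, address)] = now

    def lookup(self, service, now):
        """Return addresses of instances seen within the last `ttl` seconds."""
        return sorted(
            address
            for (name, address), seen in self.last_seen.items()
            if name == service and now - seen <= self.ttl
        )


registry = ServiceRegistry(ttl=30.0)
registry.register("orders", "10.0.0.1:8080", now=0.0)
registry.register("orders", "10.0.0.2:8080", now=20.0)
```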

Canary Release

Canary Release is a gradual release strategy that deploys new versions to a small subset of users first, then gradually expands the scope after observing the running status. It reduces release risks and supports quick rollback.

api-gateway load-balancing failover
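
One way to send a small, stable subset of users to the new version is to hash a user identifier into a bucket from 0 to 99. Hashing keeps each user on the same version across requests, and raising the percentage only adds users to the canary group. The function and version labels are assumptions of this sketch; gateways often split by weight or by header instead.

```python
import hashlib


def pick_version(user_id, canary_percent):
    """Return "canary" for roughly canary_percent of users, else "stable"."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Map the user deterministically to a bucket in the range 0..99.
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "canary" if bucket < canary_percent else "stable"
```

Rolling back is then a matter of setting `canary_percent` to 0, and full rollout corresponds to 100.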

gRPC Remote Procedure Call

gRPC is a modern, high-performance, open-source Remote Procedure Call (RPC) framework originally developed by Google. It runs over HTTP/2 and uses Protocol Buffers as both its interface description language and its serialization format, supporting bidirectional streaming and efficient cross-language calls.

api-gateway reverse-proxy http3

WebSockets

WebSocket is a protocol that provides full-duplex communication over a single TCP connection. It allows servers to push data to clients proactively, enabling real-time interactive web applications.

api-gateway reverse-proxy http3

Observability

Observability helps developers understand the running status of complex systems and quickly locate problems in production environments through three pillars: Metrics, Distributed Tracing, and Logging.

api-gateway health-check kubernetes
☁️

Cloud Native

Kubernetes, containers, and microservices concepts

Kubernetes

Kubernetes is an open-source container orchestration platform that automates deployment, scaling, and management of containerized applications. It provides capabilities such as service discovery, load balancing, storage orchestration, and automatic rollback, making it the de facto standard for cloud-native infrastructure.

ingress gateway-api envoy

Kubernetes Ingress

Ingress is an API object in Kubernetes that manages external access to the cluster, defining HTTP/HTTPS routing rules to route external traffic to Services within the cluster. Ingress Controller is responsible for implementing these routing rules.

kubernetes gateway-api api-gateway

Kubernetes Gateway API

Gateway API is the next-generation Kubernetes standard for gateways, providing richer routing capabilities and clearer role separation than Ingress. It supports multiple protocols such as HTTP, TCP, and gRPC, and is the intended successor to Ingress.

kubernetes ingress api-gateway

Envoy Proxy

Envoy is a high-performance edge and service proxy designed for cloud-native applications. Originally built at Lyft, it is now a graduated CNCF project. It supports dynamic configuration, rich observability, and advanced load balancing, and is a core component of projects such as Istio and Higress.

service-mesh wasm istio

Service Mesh

Service Mesh is an infrastructure layer for communication between microservices. It typically handles service-to-service traffic through sidecar proxies, providing load balancing, service discovery, encryption, and observability, and keeps these concerns out of business code.

envoy istio sidecar

Istio Service Mesh

Istio is an open-source service mesh platform that provides traffic management, security, and observability. It consists of a control plane (istiod) and a data plane (Envoy proxies), and is widely adopted for enterprise-grade service meshes.

service-mesh envoy kubernetes

WebAssembly

WebAssembly is a portable binary instruction format that supports compilation from multiple programming languages. In gateway scenarios, Wasm plugins can safely and efficiently extend gateway functionality, support hot updates, and ensure security through sandbox isolation.

envoy api-gateway plugin

Nacos

Nacos is an open-source service discovery and configuration management platform from Alibaba, supporting dynamic service discovery, configuration management, DNS services, and other functions. It is a popular choice as the service registry and configuration center in microservices architectures.

service-discovery kubernetes api-gateway

Sidecar Pattern

Sidecar is a deployment pattern that deploys auxiliary functions (such as proxies, log collection) as independent containers running in parallel with the main application container. This pattern achieves separation of concerns and can enhance functionality without modifying application code.

service-mesh envoy istio

Control Plane

Control Plane is the "brain" of distributed systems, responsible for managing configurations, formulating routing policies, and monitoring the status of the data plane. It does not handle actual user business traffic but issues commands to the data plane.

data-plane envoy kubernetes

Data Plane

Data Plane is responsible for actually processing and forwarding business traffic. It executes specific operations such as routing, filtering, rate limiting, encryption/decryption based on configurations issued by the control plane.

control-plane envoy api-gateway

Custom Resource Definition

CRD is Kubernetes' extension mechanism that allows users to define their own API object types. Through CRD, you can manage custom business resources using kubectl just like native Pods and Services.

kubernetes ingress gateway-api
🔐

Security

Authentication, authorization, and security protection terms

JSON Web Token

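JWT is a compact, URL-safe token format for transmitting claims between parties. It consists of three Base64URL-encoded parts separated by dots: Header, Payload, and Signature. The signature lets the receiver detect tampering, but the payload of a signed JWT is only encoded, not encrypted, so it should not carry secrets. JWTs are commonly used for authentication and information exchange.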

oauth oidc api-key
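
The three-part structure can be made concrete with a minimal HS256 (HMAC-SHA256) sketch built on the Python standard library. It is meant for illustration only: it skips checks a real verifier needs (the `alg` header, expiry, audience), the secret is a placeholder, and production code would normally rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json


def b64url(data):
    """Base64URL-encode bytes without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_jwt(payload, secret):
    """Build header.payload.signature using HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [json.dumps(part, separators=(",", ":")).encode("utf-8")
             for part in (header, payload)]
    signing_input = ".".join(b64url(part) for part in parts)
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


def verify_jwt(token, secret):
    """Check the signature and return the payload, or raise ValueError."""
    signing_input, _, signature = token.rpartition(".")
    expected = b64url(
        hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest())
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(signature, expected):
        raise ValueError("invalid signature")
    payload_b64 = signing_input.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


token = sign_jwt({"sub": "alice"}, b"placeholder-secret")
```

Anyone can decode the payload of such a token; only the holder of the secret can produce a signature that verifies.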

OAuth 2.0

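OAuth 2.0 is an industry-standard authorization framework that lets third-party applications access user resources with the user's consent, without the user sharing credentials with them. It defines several authorization flows (grant types), including authorization code, implicit, password, and client credentials; current security best practice discourages the implicit and password flows in favor of authorization code with PKCE.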

jwt oidc api-gateway

OpenID Connect

OIDC is an identity authentication protocol based on OAuth 2.0, adding an identity layer on top of the OAuth authorization flow. It provides a standardized way to obtain user information and is the foundation of modern Single Sign-On (SSO).

oauth jwt sso

Mutual TLS

mTLS is bidirectional TLS authentication where not only does the server prove its identity to the client, but the client also proves its identity to the server. It provides stronger security guarantees than one-way TLS and is an important component of zero-trust architecture.

ssl-termination zero-trust service-mesh

Web Application Firewall

WAF is a security protection system for web applications that analyzes HTTP traffic to identify and block common web attacks such as SQL injection, XSS, and CSRF. It is an important line of defense for application security.

api-gateway security ddos

API Key

API Key is a simple API authentication method in which clients send a pre-assigned key with each request to prove their identity. It is easy to implement but offers weaker security than token-based schemes, because the key is a long-lived shared secret, which makes it best suited to internal services or low-risk scenarios.

jwt oauth rate-limiting
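
A key check of this kind can be sketched as a lookup against the set of issued keys. The header name `x-api-key`, the sample key, and the consumer name are assumptions of this example; a real deployment would store keys outside the code and usually only in hashed form.

```python
import hmac

# Hypothetical issued keys mapped to the consumer they identify.
VALID_KEYS = {"demo-key-123": "team-a"}


def authenticate(headers):
    """Return the consumer name for a valid key, or None if the key is unknown."""
    presented = headers.get("x-api-key", "")
    for key, consumer in VALID_KEYS.items():
        # compare_digest runs in constant time to resist timing attacks.
        if hmac.compare_digest(presented.encode("utf-8"), key.encode("utf-8")):
            return consumer
    return None
```

Because the key identifies the caller, it pairs naturally with per-consumer rate limiting.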

SSL Termination

SSL Termination (more precisely, TLS termination) is a technique for decrypting HTTPS traffic at the gateway layer: the gateway handles the TLS handshake and encryption/decryption, and backend services only need to process plain HTTP requests. This simplifies certificate management for backend services and lets the gateway inspect and process request content.

reverse-proxy mtls api-gateway

Zero Trust Architecture

Zero Trust is a security model whose core principle is "never trust, always verify". It assumes that neither the internal nor the external network is inherently safe, so every request must have its identity and permissions verified, which minimizes the attack surface and the risk of lateral movement.

mtls oidc api-gateway
