Higress Technical Glossary
Deep dive into core concepts of AI Gateway, API Gateway, and cloud-native technologies. Each term includes clear definitions, practical examples, and Higress use cases.
AI / LLM
Core concepts in artificial intelligence and large language models
Token
Token is the basic unit of text processing in large language models. A model splits input text into a sequence of tokens, each of which may be a word, subword, or character. Token count directly affects API call cost and response time.
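As a rough illustration of how token count drives cost, the sketch below estimates tokens by whitespace splitting and applies a made-up per-1K-token price. Real tokenizers (BPE and similar) split text more finely, so treat this as a crude proxy only.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~1 token per whitespace-separated chunk."""
    return len(text.split())

def estimate_cost(prompt: str, price_per_1k_tokens: float) -> float:
    """Cost = (tokens / 1000) * per-1K price (hypothetical pricing)."""
    return estimate_tokens(prompt) / 1000 * price_per_1k_tokens

cost = estimate_cost("Translate this sentence into French", 0.002)
```

In practice you would use the provider's own tokenizer to count tokens, since estimates based on words can be off by 30% or more for non-English text.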
Large Language Model
LLM (Large Language Model) is a deep learning-based natural language processing model trained on massive text data, capable of understanding and generating human language. Representative products include GPT, Claude, Qwen, DeepSeek, etc.
Prompt
Prompt is the input instruction or question sent by users to large language models. High-quality prompt design (Prompt Engineering) is crucial for obtaining accurate and useful model outputs.
AI Agent
AI Agent is an intelligent system capable of autonomously perceiving the environment, making decisions, and executing tasks. It combines the reasoning capabilities of large language models with the execution capabilities of external tools to automate complex tasks.
AI Hallucination
Hallucination refers to the phenomenon where large language models generate information that seems plausible but is actually inaccurate, fabricated, or unsupported by facts. This occurs because models predict the next token probabilistically rather than verifying facts.
Prompt Engineering
Prompt Engineering is the technique of designing, refining, and optimizing instructions (Prompts) input to AI models to guide them to generate higher quality and more accurate outputs. It includes various strategies such as structured prompts and few-shot guidance.
Model Routing
Model Routing is a technique that dynamically selects the most appropriate LLM based on request complexity, cost requirements, response time, or content type. It can balance cost and performance to achieve optimal utilization of AI resources.
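A minimal sketch of such a router, using a simple length-and-keyword heuristic to pick between a cheap and a capable model. The model names and the complexity markers here are placeholders, not real endpoints; production routers typically use classifiers or cost/latency policies instead.

```python
def route_model(prompt: str) -> str:
    """Route to a hypothetical 'large-model' for complex requests,
    otherwise to a cheaper 'small-model'."""
    complex_markers = ("analyze", "prove", "step by step")
    if len(prompt) > 500 or any(m in prompt.lower() for m in complex_markers):
        return "large-model"   # slower, more capable (placeholder name)
    return "small-model"       # cheaper, faster (placeholder name)
```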
AI Content Safety
AI Content Safety refers to reviewing the inputs (Prompts) and outputs (Responses) of large language models, identifying and blocking illegal, harmful, pornographic, or otherwise sensitive content to ensure AI applications comply with regulatory requirements.
Model Context Protocol
MCP (Model Context Protocol) is an open standard protocol proposed by Anthropic for connecting AI models with external data sources and tools. It defines a unified interface specification, enabling AI applications to securely access various resources.
Retrieval-Augmented Generation
RAG is a technique that combines information retrieval with text generation. It first retrieves relevant documents from a knowledge base, then uses the retrieval results as context input to the LLM to generate answers, effectively reducing model hallucinations and providing up-to-date information.
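The retrieve-then-generate flow can be sketched as below. Keyword overlap stands in for real vector search, and the prompt template is illustrative; actual RAG systems use embeddings and a vector database for retrieval.

```python
def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Toy retrieval: rank documents by word overlap with the query."""
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Assemble retrieved documents as context for the LLM."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"
```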
Embedding
Embedding is a technique for converting data such as text and images into high-dimensional vectors. These vectors can capture semantic information of the data, making semantically similar content close in vector space, which is the foundation for semantic search and RAG.
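"Close in vector space" is usually measured with cosine similarity, which can be computed directly from two vectors:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Semantic search ranks candidate embeddings by this score against the query embedding; real systems use approximate nearest-neighbor indexes rather than brute-force comparison.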
Function Calling
Function Calling is the capability of LLMs to interact with external systems. Models can decide to call predefined functions based on user intent and generate parameters that match function signatures, enabling operations such as querying databases and calling APIs.
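On the application side, the model's structured function-call output must be parsed and dispatched to real code. A minimal sketch, with a stubbed tool and a hypothetical JSON shape (real providers each define their own schema):

```python
import json

# Registry of callable tools the model is allowed to invoke (stub implementation).
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def dispatch(model_output: str) -> str:
    """Parse a model's function-call JSON and execute the matching tool."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

result = dispatch('{"name": "get_weather", "arguments": {"city": "Hangzhou"}}')
```

The tool's return value is normally fed back to the model as a follow-up message so it can compose a natural-language answer.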
Context Window
Context Window is the maximum number of tokens an LLM can process in a single request, covering both the input and the generated output. It determines how much conversation history and reference information the model can "remember". Larger context windows support longer conversations and more reference documents.
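When a conversation outgrows the window, applications commonly drop the oldest messages first. A sketch of that truncation, using a crude word-count token estimate (real code would use the model's tokenizer):

```python
def fit_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages whose total (rough) token count fits."""
    kept, total = [], 0
    for msg in reversed(messages):         # walk newest-first
        tokens = len(msg.split())          # crude token estimate
        if total + tokens > max_tokens:
            break                          # oldest messages get dropped
        kept.append(msg)
        total += tokens
    return list(reversed(kept))            # restore chronological order
```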
AI Gateway
AI Gateway is an API gateway designed specifically for AI applications, providing unified access to large model APIs, protocol conversion, traffic management, security protection, and observability capabilities. It is a core component of enterprise AI infrastructure.
Gateway
API gateway and traffic management terminology
API Gateway
API Gateway is the unified entry point in microservices architecture, responsible for request routing, protocol conversion, authentication and authorization, rate limiting, circuit breaking, monitoring, and logging. It shields clients from backend service complexity and provides a unified API access layer.
Rate Limiting
Rate Limiting is a technique for controlling API request rates by limiting the number of requests per unit time, protecting backend services from overload and ensuring system stability and fair resource allocation. Common algorithms include token bucket, leaky bucket, sliding window, etc.
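The token bucket is the most common of these algorithms: a bucket refills at a steady rate up to a burst capacity, and each request consumes one token. A minimal sketch (the clock is passed in explicitly to keep the example deterministic):

```python
class TokenBucket:
    """Token-bucket rate limiter: allows bursts up to `capacity`,
    sustains `refill_rate` requests per second on average."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_rate = refill_rate  # tokens added per second
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Leaky bucket smooths bursts instead of permitting them, and sliding window counts requests in a moving time interval; the choice depends on how bursty the traffic is allowed to be.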
Load Balancing
Load Balancing distributes traffic across multiple backend service instances to improve system availability and processing capacity. Common strategies include round-robin, weighted round-robin, least connections, consistent hashing, etc.
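Weighted round-robin, for example, can be sketched by expanding each backend according to its weight and cycling through the result:

```python
import itertools

def weighted_round_robin(instances: dict[str, int]):
    """Yield backend names in proportion to their integer weights."""
    expanded = [name for name, weight in instances.items() for _ in range(weight)]
    return itertools.cycle(expanded)

rr = weighted_round_robin({"backend-a": 2, "backend-b": 1})
```

This naive expansion sends a backend's share in a burst; production balancers (e.g. nginx's smooth weighted round-robin) interleave picks so traffic stays evenly spread at every instant.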
Circuit Breaker
Circuit Breaker is a fault tolerance mechanism that automatically "opens" the request chain when downstream services fail, quickly returning error responses to prevent fault propagation. After a cool-down period it enters a "half-open" state to probe recovery, then "closes" again to resume normal calls once the service is healthy.
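The closed, open, and half-open states can be sketched as a small state machine (time is passed in explicitly for determinism; real breakers also sample probe results in the half-open state):

```python
class CircuitBreaker:
    """Minimal circuit breaker: closed -> open after `threshold` failures,
    half-open after `cooldown` seconds, closed again on a success."""

    def __init__(self, threshold: int, cooldown: float):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # timestamp when the breaker opened

    def state(self, now: float) -> str:
        if self.opened_at is None:
            return "closed"
        if now - self.opened_at >= self.cooldown:
            return "half-open"  # allow a trial request through
        return "open"           # fail fast, no call to the backend

    def record_failure(self, now: float):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = now

    def record_success(self):
        self.failures = 0
        self.opened_at = None
```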
Failover
Failover is a mechanism where the system automatically switches to backup resources when a fault is detected. It ensures that requests can be automatically routed to healthy backup services when the primary service is unavailable, ensuring business continuity.
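In its simplest form, failover means trying backends in priority order and returning the first success, as in this sketch (backends are modeled as plain callables; a gateway would also mark failed instances unhealthy):

```python
def call_with_failover(backends, request):
    """Try each backend in priority order; return the first successful result."""
    last_err = None
    for backend in backends:
        try:
            return backend(request)
        except Exception as err:
            last_err = err  # this backend failed; fall through to the next
    raise last_err          # every backend failed
```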
Health Check
Health Check is a mechanism that periodically probes backend service status, detecting whether services are running normally through active or passive methods. Unhealthy instances are automatically removed from the load balancing pool to prevent requests from being routed to faulty nodes.
Reverse Proxy
Reverse Proxy sits in front of backend servers, receiving client requests and forwarding them to the backend, hiding the real server addresses from clients. It can implement load balancing, caching, SSL termination, security protection, and other functions.
Service Discovery
Service Discovery is a mechanism in microservices architecture that automatically detects and locates service instances. Services register with a registry center upon startup, and other services obtain available instance addresses through the registry center to achieve dynamic service calls.
Canary Release
Canary Release is a gradual release strategy that deploys new versions to a small subset of users first, then gradually expands the scope after observing the running status. It reduces release risks and supports quick rollback.
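One common way to implement the "small subset of users" split is deterministic hash-based bucketing, so the same user always lands on the same version. A sketch (the bucketing scheme is illustrative; gateways often route on headers or weights instead):

```python
import hashlib

def choose_version(user_id: str, canary_percent: int) -> str:
    """Deterministically bucket users 0-99 by hashing their ID;
    buckets below `canary_percent` get the new version."""
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < canary_percent else "stable"
```

Because the hash is stable, raising `canary_percent` from 5 to 20 only adds users — nobody who already saw the canary flips back to stable mid-rollout.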
gRPC Remote Procedure Call
gRPC is a modern, high-performance, open-source Remote Procedure Call (RPC) framework developed by Google. It uses Protocol Buffers as the interface description language and underlying serialization format, supporting bidirectional streaming and efficient cross-language calls.
WebSockets
WebSocket is a protocol for full-duplex communication over a single TCP connection. It allows servers to actively push data to clients, enabling real-time interactive web applications.
Observability
Observability helps developers understand the running status of complex systems and quickly locate problems in production environments through three pillars: Metrics, Distributed Tracing, and Logging.
Cloud Native
Kubernetes, containers, and microservices concepts
Kubernetes
Kubernetes is an open-source container orchestration platform that automates deployment, scaling, and management of containerized applications. It provides capabilities such as service discovery, load balancing, storage orchestration, and automatic rollback, making it the de facto standard for cloud-native infrastructure.
Kubernetes Ingress
Ingress is an API object in Kubernetes that manages external access to the cluster, defining HTTP/HTTPS routing rules to route external traffic to Services within the cluster. Ingress Controller is responsible for implementing these routing rules.
Kubernetes Gateway API
Gateway API is Kubernetes' next-generation gateway standard, providing richer routing capabilities and clearer role separation compared to Ingress. It supports multiple protocols such as HTTP, TCP, and gRPC, and is the designated successor to Ingress.
Envoy Proxy
Envoy is CNCF's high-performance edge and service proxy designed for cloud-native applications. It supports dynamic configuration, rich observability, advanced load balancing, and other features, making it a core component of projects like Istio and Higress.
Service Mesh
Service Mesh is an infrastructure layer for inter-microservice communication, handling network communication between services through Sidecar proxies, providing capabilities such as load balancing, service discovery, encryption, and observability, separating these concerns from business code.
Istio Service Mesh
Istio is an open-source service mesh platform providing capabilities such as traffic management, security, and observability. It consists of a control plane (istiod) and a data plane (Envoy proxies), making it the preferred solution for enterprise-grade service meshes.
WebAssembly
WebAssembly is a portable binary instruction format that supports compilation from multiple programming languages. In gateway scenarios, Wasm plugins can safely and efficiently extend gateway functionality, support hot updates, and ensure security through sandbox isolation.
Nacos
Nacos is Alibaba's open-source service discovery and configuration management platform, supporting dynamic service discovery, configuration management, DNS services, and other functions. It is a popular choice for registry centers and configuration centers in microservices architecture.
Sidecar Pattern
Sidecar is a deployment pattern that deploys auxiliary functions (such as proxies, log collection) as independent containers running in parallel with the main application container. This pattern achieves separation of concerns and can enhance functionality without modifying application code.
Control Plane
Control Plane is the "brain" of distributed systems, responsible for managing configurations, formulating routing policies, and monitoring the status of the data plane. It does not handle actual user business traffic but issues commands to the data plane.
Data Plane
Data Plane is responsible for actually processing and forwarding business traffic. It executes specific operations such as routing, filtering, rate limiting, encryption/decryption based on configurations issued by the control plane.
Custom Resource Definition
CRD is Kubernetes' extension mechanism that allows users to define their own API object types. Through CRD, you can manage custom business resources using kubectl just like native Pods and Services.
Security
Authentication, authorization, and security protection terms
JSON Web Token
JWT is a compact, URL-safe token format used to securely transmit information between parties. It consists of three parts: Header, Payload, and Signature, commonly used for authentication and information exchange.
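The three-part structure can be demonstrated with the stdlib alone, signing with HMAC-SHA256 (the HS256 algorithm). This sketch covers only signing and signature verification; real validation must also check claims such as `exp` and `iss`, which libraries handle for you.

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    """Base64url without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(payload: dict, secret: bytes) -> str:
    """Build header.payload.signature, signed with HMAC-SHA256 (HS256)."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    signing_input = f"{header}.{body}".encode()
    sig = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"

def verify_jwt(token: str, secret: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    header, body, sig = token.split(".")
    expected = b64url(
        hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest())
    return hmac.compare_digest(sig, expected)
```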
OAuth 2.0
OAuth 2.0 is an industry-standard authorization framework that allows third-party applications to access user resources with user authorization without sharing user credentials. It defines various authorization flows such as authorization code, implicit, password, and client credentials.
OpenID Connect
OIDC is an identity authentication protocol based on OAuth 2.0, adding an identity layer on top of the OAuth authorization flow. It provides a standardized way to obtain user information and is the foundation of modern Single Sign-On (SSO).
Mutual TLS
mTLS is bidirectional TLS authentication where not only does the server prove its identity to the client, but the client also proves its identity to the server. It provides stronger security guarantees than one-way TLS and is an important component of zero-trust architecture.
Web Application Firewall
WAF is a security protection system for web applications that analyzes HTTP traffic to identify and block common web attacks such as SQL injection, XSS, and CSRF. It is an important line of defense for application security.
API Key
API Key is a simple API authentication method where clients carry pre-assigned keys in requests to prove their identity. It is simple to implement but offers weaker security, making it suitable for internal services or low-risk scenarios.
SSL Termination
SSL Termination is a technique for decrypting HTTPS traffic at the gateway layer, where the gateway handles TLS handshake and encryption/decryption, and backend services only need to process plain HTTP requests. This simplifies certificate management for backend services while allowing the gateway to inspect and process request content.
Zero Trust Architecture
Zero Trust is a security model with the core principle of "never trust, always verify". It assumes that both inside and outside the network are insecure, and every request needs to verify identity and permissions, minimizing attack surface and lateral movement risks.
Get Started with Higress
Now that you understand these core concepts, experience the power of Higress