Octelium as an AI Gateway

Overview

Octelium provides a complete self-hosted, open-source infrastructure for building AI gateways to both SaaS and self-hosted LLM APIs. When used as an AI gateway, Octelium provides the following:

  • A unified, scalable infrastructure that lets all your applications, written in any programming language, securely access any LLM API in a secretless way, without having to manage and distribute API keys and access tokens (read more about secretless access here).
  • Self-host your models via managed containers (read more here). You can see a dedicated example for Ollama here.
  • Centralized identity-based, application-layer (L7) aware ABAC access control on a per-request basis (read more about access control here).
  • Unified, scalable identity management for both HUMAN and WORKLOAD Users (read more here).
  • Request/output sanitization and manipulation with Lua scripts and Envoy ExtProc-compatible interfaces (read more here) to build your ad-hoc rate limiting, semantic caching, guardrail, and DLP logic.
  • OpenTelemetry-native, identity-based, L7-aware visibility and auditing that captures requests and responses including serialized JSON body content.
  • GitOps-friendly declarative, programmable management (read more here).

AI Gateway

A Simple Gateway

This is a simple example where you can have a Gemini API Service, publicly exposed (read more about the public clientless BeyondCorp mode here). First we need to create a Secret for the Gemini API key as follows:

octeliumctl create secret gemini-api-key

Now we create the Service for the Gemini API as follows:

kind: Service
metadata:
  name: gemini
spec:
  mode: HTTP
  isPublic: true
  config:
    upstream:
      url: https://generativelanguage.googleapis.com
    http:
      path:
        addPrefix: /v1beta/openai
      auth:
        bearer:
          fromSecret: gemini-api-key

You can now apply the creation of the Service as follows (read more here):

octeliumctl apply /PATH/TO/SERVICE.YAML

Client Side

Octelium enables authorized Users (read more about access control here) to access the Service both via the client-based mode and publicly via the clientless BeyondCorp mode (read more here). In this guide, we are going to use the clientless mode and access the Service via standard OAuth2 client credentials, so that your workloads, written in any programming language, can access the Service without using any special SDKs or external clients. All you need is to create an OAUTH2 Credential as illustrated here. Now, here is an example written in TypeScript:

import OpenAI from "openai";

import { OAuth2Client } from "@badgateway/oauth2-client";

async function main() {
  const oauth2Client = new OAuth2Client({
    server: "https://<DOMAIN>/",
    clientId: "spxg-cdyx",
    clientSecret: "AQpAzNmdEcPIfWYR2l2zLjMJm....",
    tokenEndpoint: "/oauth2/token",
    authenticationMethod: "client_secret_post",
  });

  const oauth2Creds = await oauth2Client.clientCredentials();

  const client = new OpenAI({
    apiKey: oauth2Creds.accessToken,
    baseURL: "https://gemini.<DOMAIN>",
  });

  const chatCompletion = await client.chat.completions.create({
    messages: [
      { role: "user", content: "How do I write a Golang HTTP reverse proxy?" },
    ],
    model: "gemini-2.0-flash",
  });

  console.log("Result", chatCompletion);
}

main();

Dynamic Routing

You can also route to a particular LLM provider based on the content of the request body (read more about dynamic configuration here). Here is an example:

kind: Service
metadata:
  name: total-ai
spec:
  mode: HTTP
  isPublic: true
  config:
    upstream:
      url: https://generativelanguage.googleapis.com
    http:
      enableRequestBuffering: true
      body:
        mode: JSON
      path:
        addPrefix: /v1beta/openai
      auth:
        bearer:
          fromSecret: gemini-api-key
  dynamicConfig:
    configs:
      - name: openai
        upstream:
          url: https://api.openai.com
        http:
          path:
            addPrefix: /v1
          auth:
            bearer:
              fromSecret: openai-api-key
      - name: deepseek
        upstream:
          url: https://api.deepseek.com
        http:
          path:
            addPrefix: /v1
          auth:
            bearer:
              fromSecret: deepseek-api-key
    rules:
      - condition:
          match: ctx.request.http.bodyMap.model == "gpt-4o-mini"
        configName: openai
      - condition:
          match: ctx.request.http.bodyMap.model == "deepseek-chat"
        configName: deepseek
      # Fallback to the default config
      - condition:
          matchAny: true
        configName: default

For more complex and dynamic routing rules (e.g. message-based routing), you can use the full power of Open Policy Agent (OPA) (read more here).
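To make the rule ordering concrete, here is a minimal TypeScript sketch of the same model-based routing decision. The `RouteRule` type and `selectConfig` helper are illustrative only, not part of Octelium's API; they simply mirror how the first matching rule wins, with a catch-all fallback at the end.

```typescript
// Illustrative sketch of the dynamic routing rules above: each rule
// matches on the request body's "model" field and yields a named
// upstream config, with a catch-all fallback as the last rule.
type RouteRule = {
  match: (body: { model?: string }) => boolean;
  configName: string;
};

const rules: RouteRule[] = [
  { match: (b) => b.model === "gpt-4o-mini", configName: "openai" },
  { match: (b) => b.model === "deepseek-chat", configName: "deepseek" },
  // Fallback to the default config
  { match: () => true, configName: "default" },
];

function selectConfig(body: { model?: string }): string {
  // First matching rule wins, mirroring dynamicConfig.rules ordering.
  return rules.find((r) => r.match(body))!.configName;
}
```

For instance, `selectConfig({ model: "deepseek-chat" })` yields `"deepseek"`, while any unlisted model falls through to `"default"`.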

Access Control

When it comes to access control, Octelium provides rich layer-7 aware, identity-based, context-aware ABAC on a per-request basis: you can control access based on the HTTP request's path, method, and even serialized JSON body content, using policy-as-code with CEL and Open Policy Agent (OPA) (you can read about Policies and access control in detail here). Here is a generic example:

kind: Service
metadata:
  name: my-api
spec:
  mode: HTTP
  config:
    upstream:
      url: https://api.example.com
    http:
      enableRequestBuffering: true
      body:
        mode: JSON
  authorization:
    inlinePolicies:
      - spec:
          rules:
            - effect: ALLOW
              condition:
                all:
                  of:
                    - match: ctx.user.spec.groups.hasAll("dev", "ops")
                    - match: ctx.request.http.bodyMap.messages.size() < 4
                    - match: ctx.request.http.bodyMap.model in ["gpt-3.5-turbo", "gpt-4o-mini"]
                    - match: ctx.request.http.bodyMap.temperature < 0.7

This was just a very simple example of access control for an OpenAI-compliant LLM API. Furthermore, you can use Open Policy Agent (OPA) to create much more complex access control decisions.
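As a rough illustration, the `all.of` condition above behaves as a logical AND over per-request checks. The following TypeScript sketch mirrors that decision; `isAllowed` and `ChatRequest` are hypothetical names for illustration, not an Octelium API.

```typescript
// Sketch of the ALLOW rule above: every check must pass (logical AND).
type ChatRequest = {
  model: string;
  temperature: number;
  messages: { role: string; content: string }[];
};

function isAllowed(userGroups: string[], body: ChatRequest): boolean {
  // ctx.user.spec.groups.hasAll("dev", "ops")
  const hasAll = ["dev", "ops"].every((g) => userGroups.includes(g));
  return (
    hasAll &&
    body.messages.length < 4 &&
    ["gpt-3.5-turbo", "gpt-4o-mini"].includes(body.model) &&
    body.temperature < 0.7
  );
}
```

A request is denied as soon as any one of the checks fails, e.g. a User missing the `ops` Group or a body requesting an unlisted model.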

Request/Response Manipulation

You can also use Octelium's plugins, currently Lua scripts and Envoy's ExtProc, to sanitize and manipulate your requests and responses (read more about HTTP plugins here). Here is an example:

kind: Service
metadata:
  name: safe-gemini
spec:
  mode: HTTP
  isPublic: true
  config:
    upstream:
      url: https://generativelanguage.googleapis.com
    http:
      enableRequestBuffering: true
      body:
        mode: JSON
      path:
        addPrefix: /v1beta/openai
      plugins:
        - name: main
          condition:
            match: ctx.request.http.path == "/chat/completions" && ctx.request.http.method == "POST"
          lua:
            inline: |
              function onRequest(ctx)
                local body = json.decode(octelium.req.getRequestBody())

                if body.temperature > 0.7 then
                  body.temperature = 0.7
                end

                if body.model == "gemini-2.5-pro" then
                  body.model = "gemini-2.5-flash"
                end

                if #body.messages > 4 then
                  octelium.req.exit(400)
                  return
                end

                -- Lua arrays are 1-indexed: overwrite the first message
                body.messages[1].role = "system"
                body.messages[1].content = "You are a helpful assistant that provides concise answers"

                for idx, message in ipairs(body.messages) do
                  if strings.lenUnicode(message.content) > 500 then
                    octelium.req.exit(400)
                    return
                  end
                end

                if body.temperature > 0.4 then
                  local c = http.client()
                  c:setBaseURL("http://guardrail-api.default.svc")
                  local req = c:request()
                  req:setBody(json.encode(body))
                  local resp, err = req:post("/v1/check")
                  if err then
                    octelium.req.exit(500)
                    return
                  end

                  if resp:code() == 200 then
                    local apiResp = json.decode(resp:body())
                    if not apiResp.isAllowed then
                      octelium.req.exit(400)
                      return
                    end
                  end
                end

                -- Check the first user message, which follows the system message
                if strings.contains(strings.toLower(body.messages[2].content), "paris") then
                  local c = http.client()
                  c:setBaseURL("http://semantic-caching.default.svc")
                  local req = c:request()
                  req:setBody(json.encode(body))
                  local resp, err = req:post("/v1/get")
                  if err then
                    octelium.req.exit(500)
                    return
                  end

                  if resp:code() == 200 then
                    local apiResp = json.decode(resp:body())
                    octelium.req.setResponseBody(json.encode(apiResp.response))
                    octelium.req.exit(200)
                    return
                  end
                end

                octelium.req.setRequestBody(json.encode(body))
              end
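The pure sanitization steps in the script above (rejecting long conversations, clamping temperature, downgrading an expensive model) can be sketched as a standalone function. This TypeScript sketch is illustrative only; `sanitize` and `ChatBody` are assumed names, and it assumes an OpenAI-style request body rather than modeling Octelium's plugin API.

```typescript
// Sketch of the request sanitization performed by the Lua plugin:
// reject overly long conversations, clamp temperature, and
// downgrade an expensive model to a cheaper one.
type ChatBody = {
  model: string;
  temperature: number;
  messages: { role: string; content: string }[];
};

function sanitize(body: ChatBody): { status: number; body?: ChatBody } {
  if (body.messages.length > 4) {
    return { status: 400 }; // mirrors octelium.req.exit(400)
  }
  const out: ChatBody = {
    ...body,
    temperature: Math.min(body.temperature, 0.7),
    model: body.model === "gemini-2.5-pro" ? "gemini-2.5-flash" : body.model,
  };
  return { status: 200, body: out };
}
```

Keeping such logic as a pure function of the request body makes it easy to unit-test before wiring it into a plugin.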

Observability

Octelium also provides OpenTelemetry-ready, application-layer (L7) aware visibility and access logging in real time (see an example for HTTP here), including capturing serialized JSON request/response body content (read more here). You can read more about visibility here.

Here are a few more features that you might be interested in:

  • Routing not just by request path, but also by header keys and values and by request body content, including JSON (read more here).
  • Request/response header manipulation (read more here).
  • Cross-Origin Resource Sharing (CORS) (read more here).
  • Secretless access to upstreams and injecting bearer, basic, or custom authentication header credentials (read more here).
  • Application layer-aware ABAC access control via policy-as-code using CEL and Open Policy Agent (read more here).
  • OpenTelemetry-ready, application-layer L7 aware auditing and visibility (read more here).
© 2026 Octelium Labs, LLC. All rights reserved.
Octelium and Octelium logo are trademarks of Octelium Labs, LLC.
WireGuard is a registered trademark of Jason A. Donenfeld