Open Source AI Gateway to your AI LLM Providers

Octelium can act as a scalable, secure AI gateway that provides zero trust access to any AI LLM provider for both HUMAN and WORKLOAD Users (read more here), in both the client-based and the client-less BeyondCorp modes. This simple example exposes a Gemini API Service publicly (read more about the public client-less BeyondCorp mode here).

First, we need to create a Secret for the Gemini API key as follows:

octeliumctl create secret gemini-api-key

Now we create the Service for the Gemini API as follows:

kind: Service
metadata:
  name: gemini
spec:
  mode: HTTP
  isPublic: true
  config:
    upstream:
      url: https://generativelanguage.googleapis.com
    http:
      path:
        addPrefix: /v1beta/openai
      auth:
        bearer:
          fromSecret: gemini-api-key

You can now create the Service by applying the file as follows (read more here):

octeliumctl apply /PATH/TO/SERVICE.YAML

Octelium enables authorized Users (read more about access control here) to access the Service both via the client-based mode and publicly via the client-less BeyondCorp mode (read more here). In this guide, we use the client-less mode and access the Service via the standard OAuth2 client credentials flow. This way, your workloads can be written in any programming language and access the Service without having to use any special SDKs or have access to external clients. All you need is to create an OAUTH2 Credential as illustrated here. Now, here is an example written in TypeScript:

import OpenAI from "openai";

import { OAuth2Client } from "@badgateway/oauth2-client";

async function main() {
  // OAuth2 client pointing at the Cluster's token endpoint.
  const oauth2Client = new OAuth2Client({
    server: "https://<DOMAIN>/",
    clientId: "spxg-cdyx",
    clientSecret: "AQpAzNmdEcPIfWYR2l2zLjMJm....",
    tokenEndpoint: "/oauth2/token",
    authenticationMethod: "client_secret_post",
  });

  // Obtain an access token via the client credentials flow.
  const oauth2Creds = await oauth2Client.clientCredentials();

  // Use the access token as the API key and the Service's public URL as the base URL.
  const client = new OpenAI({
    apiKey: oauth2Creds.accessToken,
    baseURL: "https://gemini.<DOMAIN>",
  });

  const chatCompletion = await client.chat.completions.create({
    messages: [
      { role: "user", content: "How do I write a Golang HTTP reverse proxy?" },
    ],
    model: "gemini-2.0-flash",
  });

  console.log("Result", chatCompletion);
}

main().catch(console.error);
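Since the flow above is plain OAuth2 client credentials, the same calls can be sketched without any SDK at all. The following is a minimal sketch, not a definitive implementation: the helper names `buildTokenRequest` and `chat` are hypothetical, it assumes a runtime with a global `fetch` (for example Node.js 18+), and it assumes the token response carries the standard `access_token` field. The `/chat/completions` path is the one the OpenAI SDK would append to the base URL.

```typescript
// Hypothetical helper names; the token endpoint path and the
// "client_secret_post" style mirror the SDK-based example above.
interface TokenRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build a standard OAuth2 client credentials request with the
// credentials sent in the form-encoded body.
function buildTokenRequest(domain: string, clientId: string, clientSecret: string): TokenRequest {
  const body = new URLSearchParams({
    grant_type: "client_credentials",
    client_id: clientId,
    client_secret: clientSecret,
  });
  return {
    url: `https://${domain}/oauth2/token`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body: body.toString(),
    },
  };
}

// Exchange the credential for an access token, then call the Service
// with that token as a bearer token.
async function chat(domain: string, clientId: string, clientSecret: string, prompt: string): Promise<unknown> {
  const { url, init } = buildTokenRequest(domain, clientId, clientSecret);
  const tokenRes = await fetch(url, init);
  if (!tokenRes.ok) throw new Error(`token request failed: ${tokenRes.status}`);
  // Assumption: the response uses the standard "access_token" field.
  const { access_token } = (await tokenRes.json()) as { access_token: string };

  const res = await fetch(`https://gemini.${domain}/chat/completions`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${access_token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gemini-2.0-flash",
      messages: [{ role: "user", content: prompt }],
    }),
  });
  if (!res.ok) throw new Error(`chat request failed: ${res.status}`);
  return res.json();
}
```

A workload would call `chat("<DOMAIN>", clientId, clientSecret, "...")` with its own OAUTH2 Credential values. Keeping the request construction in a separate pure function makes it easy to port to any other language.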

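You can also route to a specific LLM provider based on the content of the request body (read more about dynamic configuration here). The following example additionally references the Secrets openai-api-key and deepseek-api-key, which need to be created in the same way as gemini-api-key: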

kind: Service
metadata:
  name: total-ai
spec:
  mode: HTTP
  isPublic: true
  dynamicConfig:
    configs:
      - name: gemini
        upstream:
          url: https://generativelanguage.googleapis.com
        http:
          path:
            addPrefix: /v1beta/openai
          auth:
            bearer:
              fromSecret: gemini-api-key
      - name: openai
        upstream:
          url: https://api.openai.com
        http:
          path:
            addPrefix: /v1
          auth:
            bearer:
              fromSecret: openai-api-key
      - name: deepseek
        upstream:
          url: https://api.deepseek.com
        http:
          path:
            addPrefix: /v1
          auth:
            bearer:
              fromSecret: deepseek-api-key
    rules:
      - condition:
          match: ctx.request.http.bodyMap.model == "gpt-4o-mini"
        configName: openai
      - condition:
          match: ctx.request.http.bodyMap.model == "deepseek-chat"
        configName: deepseek
      - condition:
          matchAny: true
        configName: gemini
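To reason about which upstream a given request would reach, the three rules above can be modeled as a small function. This is only an illustrative sketch: it assumes the rules are evaluated in order and that the first matching rule wins (which the placement of the `matchAny` rule last suggests but this guide does not state), and the names `Rule`, `rules`, and `pickConfig` are hypothetical, not part of Octelium.

```typescript
type ConfigName = "gemini" | "openai" | "deepseek";

// The part of the parsed JSON request body that the rules inspect.
interface RequestBody {
  model?: string;
}

interface Rule {
  // Returns true when the rule's condition holds for the request body.
  matches: (body: RequestBody) => boolean;
  configName: ConfigName;
}

// Mirrors the rules of the Service definition above, in the same order.
const rules: Rule[] = [
  { matches: (b) => b.model === "gpt-4o-mini", configName: "openai" },
  { matches: (b) => b.model === "deepseek-chat", configName: "deepseek" },
  { matches: () => true, configName: "gemini" }, // matchAny: true
];

// Assumption: first matching rule wins.
function pickConfig(body: RequestBody): ConfigName {
  for (const rule of rules) {
    if (rule.matches(body)) return rule.configName;
  }
  throw new Error("no rule matched");
}
```

Under these assumptions, a client only changes the `model` field of its request, while the base URL of the `total-ai` Service stays the same for every provider.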

For more complex and dynamic routing rules (e.g. message-based routing), you can use the full power of Open Policy Agent (OPA) (read more here).

When it comes to access control, Octelium provides rich, L7-aware, identity-based, context-aware ABAC on a per-request basis. You can control access based on the HTTP request's path, method, and even serialized JSON body content using policy-as-code with CEL and Open Policy Agent (OPA) (read more about Policies and access control here). Here is a generic example:

kind: Service
metadata:
  name: my-api
spec:
  mode: HTTP
  config:
    upstream:
      url: https://api.example.com
  authorization:
    inlinePolicies:
      - spec:
          rules:
            - effect: ALLOW
              condition:
                all:
                  of:
                    - match: ctx.user.spec.groups.hasAll("dev", "ops")
                    - match: ctx.request.http.bodyMap.messages.size() < 4
                    - match: ctx.request.http.bodyMap.model in ["gpt-3.5-turbo", "gpt-4o-mini"]
                    - match: ctx.request.http.bodyMap.temperature < 0.7
This was just a very simple example of access control for an OpenAI-compliant LLM API. Furthermore, you can use Open Policy Agent (OPA) to create much more complex access control decisions.
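To make the four conditions of the policy above concrete, here is a rough TypeScript model of the same check, for example as a client-side pre-check before sending a request. It is a sketch under stated assumptions rather than how Octelium evaluates policies: the function name `wouldAllow` is hypothetical, and treating a missing `messages`, `model`, or `temperature` field as a non-match is an assumption.

```typescript
// The fields of an OpenAI-style chat request that the policy inspects.
interface ChatRequestBody {
  model?: string;
  temperature?: number;
  messages?: unknown[];
}

// Returns true only when all four conditions of the ALLOW rule hold.
function wouldAllow(userGroups: string[], body: ChatRequestBody): boolean {
  const requiredGroups = ["dev", "ops"];
  const allowedModels = ["gpt-3.5-turbo", "gpt-4o-mini"];
  return (
    // The User belongs to both the "dev" and "ops" Groups.
    requiredGroups.every((g) => userGroups.includes(g)) &&
    // Fewer than 4 messages in the conversation.
    Array.isArray(body.messages) && body.messages.length < 4 &&
    // The model is one of the allowed models.
    typeof body.model === "string" && allowedModels.includes(body.model) &&
    // Assumption: a missing temperature does not satisfy the condition.
    typeof body.temperature === "number" && body.temperature < 0.7
  );
}
```

For instance, a User in both Groups sending one message to `gpt-4o-mini` with a temperature of 0.2 would pass all four checks, while the same request with a temperature of 0.9 would not.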

Octelium also provides OpenTelemetry-ready, application-layer L7 aware visibility and access logging in real time (see an example for HTTP here). You can read more about visibility here.

Here are a few more features that you might be interested in:

Routing not just by request paths, but also by header keys and values, request body content including JSON (read more here).
Request/response header manipulation (read more here).
Cross-Origin Resource Sharing (CORS) (read more here).
Secret-less access to upstreams and injecting bearer, basic, or custom authentication header credentials (read more here).
Application layer-aware ABAC access control via policy-as-code using CEL and Open Policy Agent (read more here).
OpenTelemetry-ready, application-layer L7 aware auditing and visibility (read more here).