Octelium as an MCP Gateway


Overview

Octelium provides a unified, secure, free and open source, scalable, self-hosted infrastructure to build your MCP-based architectures, MCP gateways and modern agentic MCP and A2A meshes at scale. Octelium provides the following:

A unified scalable infrastructure for all your MCP clients, written in any programming language, to securely access all MCP servers running behind NAT anywhere (e.g. private clouds, IoT, your own laptop, etc...), via both client-based as well as clientless access over standard OAuth2 and bearer authentication.
Deploy and scale your containerized SSE/streamable HTTP-based MCP servers in constrained Kubernetes pods managed by the Octelium Cluster (read more about managed containers here).
Centralized identity-based, application-layer (L7) aware access control that is based on the content of JSON-RPC messages (read more about access control here).
Unified, scalable identity management for all your MCP clients.
Request/output sanitization and manipulation of MCP JSON-RPC messages via Lua scripts and Envoy ExtProc plugins (read more here).
OpenTelemetry-native, identity-based, L7 aware visibility and auditing that captures requests and responses including serialized JSON body content.
Seamless horizontal scalability and availability since Octelium operates on top of Kubernetes (read more about how Octelium works here).
GitOps-friendly declarative, programmable management (read more here).

In short, Octelium not only takes care of providing secure access to your MCP servers in any environment behind NAT, it also lets you offload identity management and authentication, L7-aware authorization, MCP server deployment and scaling, input/output MCP message validation and manipulation, and visibility from the codebase of your MCP clients and servers, so that you can focus solely on your business logic.

MCP Gateway

MCP Servers

In this guide we are going to use a very simple MCP server that provides just addition and subtraction operations and is served over the streamable HTTP transport. The MCP server is written in Python via FastMCP as follows:

import asyncio
import os

from fastmcp import FastMCP

mcp = FastMCP("Demo Octelium MCP Server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Return the sum of a and b."""
    return a + b

@mcp.tool()
def subtract(a: int, b: int) -> int:
    """Return a minus b."""
    return a - b

if __name__ == "__main__":
    # Serve over streamable HTTP on all interfaces.
    asyncio.run(
        mcp.run_async(
            transport="streamable-http",
            host="0.0.0.0",
            # Environment variables are strings, so convert the port to an int.
            port=int(os.getenv("PORT", "8080")),
        )
    )

We now explore three ways to use our MCP server as the upstream of an Octelium Service that represents it (read more about Services here):

The first possibility is that the MCP server is running in any internal network behind NAT (e.g. private cloud, your own laptop, IoT, etc...). Here is a simple example where the upstream is running at http://localhost:8080 and is remotely served by a connected octelium client used by the User mcp-01 (read more about serving Services via connected Users here):

kind: Service
metadata:
  name: my-mcp
spec:
  port: 8080
  mode: HTTP
  isPublic: true
  config:
    upstream:
      url: http://localhost:8080
      user: mcp-01

NOTE

The isPublic field enables the public clientless (i.e. BeyondCorp) access mode. Read more here.

The second possibility is to deploy and scale your containerized streamable HTTP-based MCP server and serve it as a Service by reusing the underlying Kubernetes infrastructure that runs the Octelium Cluster (read more about managed containers here). This includes Docker images from private container registries (e.g. a Docker registry, GitHub's ghcr, etc...). Managed containers are deployed and scaled via Octelium and served directly as a Service upstream. Here is an example:

kind: Service
metadata:
  name: my-mcp
spec:
  mode: HTTP
  isPublic: true
  config:
    upstream:
      container:
        port: 8080
        image: ghcr.io/org/my-mcp:1.2.3
        credentials:
          usernamePassword:
            username: ghcr-username
            password:
              fromSecret: ghcr-token
        resourceLimit:
          cpu:
            millicores: 1000
          memory:
            megabytes: 2000

NOTE

Octelium also provides dynamic configuration to route to different upstreams (e.g. multiple MCP server versions) based on identity and/or context (read more here) via policy-as-code on a per-request basis. You can even use the managed container mode to simultaneously deploy multiple containers and route among them (read more here).

The third possibility is that your SSE/streamable HTTP MCP server is listening publicly over the internet but is protected by some L7 credential such as an API key or a bearer access token. Octelium supports secretless access for Users to public MCP servers that are protected by standard bearer access tokens, basic authentication, API keys set in custom headers, or OAuth2 client credentials flows, without having to manage and distribute such credentials to your Users (read more here). Here is a simple example for an MCP server that is protected by bearer authentication:

kind: Service
metadata:
  name: my-mcp
spec:
  mode: HTTP
  isPublic: true
  config:
    upstream:
      url: https://my-mcp.example.com
    http:
      auth:
        bearer:
          fromSecret: my-api-key

Now, whether your MCP server is listening behind NAT, deployed as a managed container, or publicly reachable behind a credential, you can create its Service via the octeliumctl apply command (read more here) as follows:

octeliumctl apply /PATH/TO/SERVICE.YAML

NOTE

If you have many MCP servers that need to be categorized by a certain domain or a functionality, you might want to take a look at Namespaces (read more here) where you can organize your MCP server Services and affect their hostnames as well as access control to a whole set of Services that share a certain purpose or functionality according to your needs.

MCP Clients

Now we move on to the client side to understand how to securely access our MCP server. Octelium provides a client-based mode, where the octelium connect command (run via the octelium CLI or container) gives access to your Services privately over WireGuard/QUIC tunnels (read more here), as well as a clientless access mode (read more here) for your MCP clients to access all your MCP servers. When it comes to user and identity management, Octelium supports two User types:

HUMAN Users, who usually access Services by authenticating to the Cluster via their web browsers through an OpenID Connect, SAML 2.0 or GitHub OAuth2 IdentityProvider (read more here).
WORKLOAD Users, used by non-human entities such as servers, containers and applications written in any programming language. Such Users can authenticate to the Cluster through authentication tokens (read more here), OpenID Connect assertions or OAuth2 client credentials (read more here), or can be issued access tokens directly (read more here).

Clientless Access

The main value of using Octelium is unified, scalable identity management for all your clients: a single standard OAuth2 client credential or bearer token lets an MCP client access all the MCP servers it is authorized for, without using any special SDKs or clients on the client side, or even being aware of the Octelium Cluster's existence at all.

MCP clients can use the standard OAuth2 client credentials flow to obtain a bearer access token and publicly access the MCP server Service. You can read in detail on how to issue an OAuth client credential to access a Service here.
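As a rough sketch of what this looks like from a plain MCP client, the following Python snippet builds a standard OAuth2 client credentials token request and then a bearer-authenticated JSON-RPC tools/call request using only the standard library. The token URL, the Service URL, the /mcp path and the client ID/secret are placeholder assumptions, not values taken from this guide; substitute whatever your own Cluster issues. The requests are only constructed here; sending them is left to urllib.request.urlopen.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholders: replace with the values for your own Cluster.
TOKEN_URL = "https://example.com/oauth2/token"
MCP_URL = "https://my-mcp.example.com/mcp"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build an OAuth2 client credentials token request (RFC 6749, section 4.4)."""
    form = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        }
    ).encode()
    return urllib.request.Request(
        TOKEN_URL,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def build_tool_call(
    access_token: str, name: str, arguments: dict, req_id: int = 1
) -> urllib.request.Request:
    """Build a JSON-RPC 2.0 tools/call request carrying a bearer access token."""
    payload = {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )
```

A real MCP client library would also perform the initialize handshake before calling tools; the sketch only illustrates that, from the client's point of view, nothing beyond a standard token request and an Authorization header is involved.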

NOTE

You can also directly issue access token Credentials (read more here) and use them in standard bearer authentication.

Access Control

Now we move on to access control. Octelium's application-layer (L7) awareness enables you to control access at the HTTP layer based on request paths, headers, methods and, more importantly for MCP, the JSON body of the requests. Here is an example:

kind: Service
metadata:
  name: my-mcp
spec:
  port: 8080
  mode: HTTP
  isPublic: true
  config:
    upstream:
      url: https://mcp.example.com
    http:
      enableRequestBuffering: true
      body:
        mode: JSON
        maxRequestSize: 100000
  authorization:
    inlinePolicies:
      - spec:
          rules:
            - effect: ALLOW
              condition:
                all:
                  of:
                    - match: ctx.request.http.bodyMap.jsonrpc == "2.0"
                    - match: ctx.request.http.bodyMap.method == "tools/call"
                    - match: ctx.request.http.bodyMap.params.name in ["add", "subtract"]
                    - match: ctx.request.http.bodyMap.params.arguments.a < 1000
                    - match: ctx.request.http.bodyMap.params.arguments.b > 1000
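To make the rule's conditions concrete, here is a small Python sketch that mirrors them against a JSON-RPC request body. It is only a local approximation for reasoning about which messages satisfy the rule, not Octelium's policy engine; the function name is_allowed and the treatment of missing or mistyped fields as "no match" are assumptions made for illustration.

```python
def is_allowed(body: dict) -> bool:
    """Return True only if every condition matches (the all/of semantics)."""
    try:
        params = body["params"]
        args = params["arguments"]
        return all(
            [
                body["jsonrpc"] == "2.0",
                body["method"] == "tools/call",
                params["name"] in ["add", "subtract"],
                args["a"] < 1000,
                args["b"] > 1000,
            ]
        )
    except (KeyError, TypeError):
        # Assumption: a missing or mistyped field means the rule does not match.
        return False


matching = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {"name": "add", "arguments": {"a": 4, "b": 2000}},
}
non_matching = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {"name": "add", "arguments": {"a": 4, "b": 7}},
}
```

With the conditions as written, the first body satisfies the rule while the second does not, because b must be greater than 1000.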

You are not restricted to attaching a Policy or an InlinePolicy to the Service. You can define your own Policies and InlinePolicies and attach them to specific Users representing specific MCP clients or hosts. You can also attach your Policies to certain Groups of Users (read more here) or Namespaces (read more here).

Visibility

Octelium provides OpenTelemetry-native, application-layer L7 aware visibility and access logging in real time (see an example for HTTP here) that includes capturing request/response serialized JSON body content (read more here). You can read more about visibility here. Here is an example of a JSON AccessLog (see a full example here) of a request highlighting the request and response details:

{
  "apiVersion": "core/v1",
  "kind": "AccessLog",
  "metadata": {
    // Omitted for brevity
  },
  "entry": {
    // Other entry fields omitted for brevity
    "info": {
      "http": {
        "request": {
          "path": "/mcp",
          "userAgent": "node",
          "method": "POST",
          "bodyBytes": 124,
          "uri": "/mcp",
          "bodyMap": {
            "jsonrpc": "2.0",
            "method": "tools/call",
            "params": {
              "name": "add",
              "_meta": {
                "progressToken": 3
              },
              "arguments": {
                "a": 4,
                "b": 7
              }
            },
            "id": 3
          }
        },
        "response": {
          "code": 200,
          "bodyBytes": 151,
          "contentType": "text/event-stream"
        },
        "httpVersion": "HTTP11"
      }
    }
  }
}
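If you export such AccessLogs as JSON, a few lines of Python are enough to pull out which tool was called and with what arguments. This is a minimal sketch that assumes the nesting shown in the example above (entry, then info, then http); the function name summarize_tool_call and the shape of the returned summary are illustrative choices, not part of Octelium.

```python
from typing import Optional


def summarize_tool_call(access_log: dict) -> Optional[dict]:
    """Extract tool name, arguments and response code from an HTTP AccessLog.

    Returns None when the logged request is not an MCP tools/call message.
    """
    http = access_log.get("entry", {}).get("info", {}).get("http", {})
    body = http.get("request", {}).get("bodyMap", {})
    if body.get("method") != "tools/call":
        return None
    params = body.get("params", {})
    return {
        "tool": params.get("name"),
        "arguments": params.get("arguments", {}),
        "status": http.get("response", {}).get("code"),
    }


sample_log = {
    "kind": "AccessLog",
    "entry": {
        "info": {
            "http": {
                "request": {
                    "bodyMap": {
                        "jsonrpc": "2.0",
                        "method": "tools/call",
                        "params": {"name": "add", "arguments": {"a": 4, "b": 7}},
                        "id": 3,
                    }
                },
                "response": {"code": 200},
            }
        }
    },
}
```

Calling summarize_tool_call(sample_log) yields the tool name add, its arguments and the 200 response code, which is often all an audit report needs.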

Request and Response Manipulation

You can also use Octelium's plugins, currently mainly Lua scripts and Envoy's ExtProc, to sanitize your MCP request parameters, validate them against your own JSON schema, and modify or manipulate responses (read more about HTTP plugins here). Here is an example using Lua:

kind: Service
metadata:
  name: my-mcp
spec:
  port: 8080
  mode: HTTP
  isPublic: true
  config:
    upstream:
      url: https://mcp.example.com
    http:
      plugins:
        - name: check-inputs
          condition:
            match: ctx.request.http.path == "/mcp"
          lua:
            inline: |
              function onRequest(ctx)
                local body = json.decode(octelium.req.getRequestBody())
                -- Inject the caller's identity into the tool arguments.
                body.params.arguments.userUID = ctx.user.metadata.uid
                body.params.arguments.userEmail = ctx.user.spec.email
                -- Map an alias onto the real tool name.
                if body.params.name == "another-add" then
                  body.params.name = "add"
                end

                -- Clamp the arguments to the accepted range.
                if body.params.arguments.a < 0 then
                  body.params.arguments.a = 0
                end

                if body.params.arguments.b > 1000 then
                  body.params.arguments.b = 1000
                end

                if body.params.name == "sub" then
                  local c = http.client()
                  c:setBaseURL("http://check-mcp.default.svc")
                  local req = c:request()

                  -- Fetch a JSON schema and validate the params against it.
                  local schemaResp, err = req:post("/v1/get-json-schema")
                  if err then
                    octelium.req.exit(500)
                    return
                  end

                  if schemaResp:code() == 200 then
                    if not json.isSchemaValid(schemaResp:body(), json.encode(body.params)) then
                      octelium.req.exit(400)
                      return
                    end
                  end

                  -- Ask an external service whether the User may run this call.
                  req:setBody(json.encode(body.params))
                  local resp, err = req:post("/v1/check-subtract-params")
                  if err then
                    octelium.req.exit(500)
                    return
                  end

                  if resp:code() == 200 then
                    local apiResp = json.decode(resp:body())
                    if not apiResp.userIsAllowed then
                      octelium.req.exit(403)
                      return
                    end
                  end
                end

                -- Forward the sanitized body to the upstream.
                octelium.req.setRequestBody(json.encode(body))
              end

Octelium also provides other plugins (read more about HTTP plugins here) such as rate limiting, caching, JSON schema validation and direct response that can be used to control your MCP client requests. Here is another example:

kind: Service
metadata:
  name: my-mcp
spec:
  port: 8080
  mode: HTTP
  isPublic: true
  config:
    upstream:
      url: https://mcp.example.com
    http:
      plugins:
        - name: main-rate-limit
          condition:
            matchAny: true
          ratelimit:
            limit: 100
            window:
              minutes: 2
        - name: validate-call
          condition:
            match: ctx.request.http.bodyMap.params.name == "sub"
          jsonSchema:
            inline: <YOUR_JSON_SCHEMA>
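To illustrate what a limit of 100 requests per 2-minute window means in practice, here is a minimal fixed-window limiter in Python. It is only a conceptual sketch: this guide does not say which windowing algorithm Octelium uses or how it keys requests, so the fixed-window strategy, the per-key counting and the class name FixedWindowLimiter are all assumptions.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per key within each fixed window."""

    def __init__(self, limit: int, window_seconds: float, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._windows = {}  # key -> (window_start, request_count)

    def allow(self, key: str) -> bool:
        now = self.clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.window_seconds:
            # The previous window has expired; start a fresh one.
            start, count = now, 0
        if count >= self.limit:
            self._windows[key] = (start, count)
            return False
        self._windows[key] = (start, count + 1)
        return True


# Mirrors the configuration above: 100 requests per 2 minutes.
limiter = FixedWindowLimiter(limit=100, window_seconds=120)
```

Each call to limiter.allow("some-user") returns True until that key has made 100 requests in the current window, after which it returns False until the window rolls over.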

Here are a few more features that you might be interested in:

Request/response header manipulation (read more here).
Application layer-aware ABAC access control via policy-as-code using CEL and Open Policy Agent (read more here).
Exposing the API publicly for anonymous access (read more here).
OpenTelemetry-ready, application-layer L7 aware auditing and visibility (read more here).
