Octelium enables you to seamlessly provide secure access to Ollama, both via the private client-based mode over WireGuard/QUIC and via the public client-less mode (read more about the client-less BeyondCorp access mode here), for lightweight open models such as Google's Gemma 3, DeepSeek R1, Meta's Llama, and others.
Here is a simple example where you can seamlessly deploy an Ollama server as a managed container and serve it as an Octelium Service (read more about managed containers here):
```yaml
kind: Service
metadata:
  name: ollama
spec:
  port: 11434
  mode: HTTP
  isPublic: true
  config:
    upstream:
      container:
        port: 11434
        image: ollama/ollama
        resourceLimit:
          cpu:
            millicores: 3000
          memory:
            megabytes: 4000
```
The above configuration runs in CPU-only mode. If your underlying Kubernetes installation supports requesting and scheduling GPUs (read more here), you can modify the configuration as follows:
```yaml
kind: Service
metadata:
  name: ollama
spec:
  port: 11434
  mode: HTTP
  isPublic: true
  config:
    upstream:
      container:
        port: 11434
        image: ollama/ollama
        resourceLimit:
          cpu:
            millicores: 3000
          memory:
            megabytes: 4000
          ext:
            # Change this according to your Kubernetes cluster available values
            nvidia.com/gpu: "1"
```
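Before requesting a GPU, you may want to confirm that your Kubernetes nodes actually advertise the `nvidia.com/gpu` extended resource. A quick check, assuming you have `kubectl` access to the underlying cluster:

```bash
# List any nodes advertising the nvidia.com/gpu extended resource;
# no output means the GPU device plugin is not installed or no
# schedulable GPUs are available
kubectl describe nodes | grep -i nvidia.com/gpu
```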
You can now create the Service by applying the configuration as follows (read more here):
```bash
octeliumctl apply /PATH/TO/SERVICE.YAML
```
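Once applied, you can verify that the Service was created. A quick sketch, assuming your octeliumctl version supports listing Services via a `get` subcommand (see `octeliumctl --help` for the exact syntax):

```bash
# List the Cluster's Services to confirm "ollama" is up
octeliumctl get service
```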
You can also expose an Ollama server that's served by a connected User (read more here) as follows:
```yaml
kind: Service
metadata:
  name: ollama
spec:
  port: 11434
  mode: HTTP
  isPublic: true
  config:
    upstream:
      url: http://localhost:11434
      user: ollama-server
```
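For completeness, here is a sketch of what the `ollama-server` User's side might look like: the User connects to the Cluster with the octelium client and runs the Ollama server locally so that the Service can reach it at `localhost:11434`. The exact authentication and connection steps depend on your Cluster setup:

```bash
# Connect to the Cluster as the "ollama-server" User
# (assumes the octelium client is installed and authenticated)
octelium connect

# In another terminal: serve Ollama locally; it listens on
# localhost:11434 by default, matching the upstream URL above
ollama serve
```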
To point the Ollama client (you can download the client here) at the address of our Service running the Ollama server, you need to set the OLLAMA_HOST environment variable accordingly. For client-based access (read more about connecting to the Cluster here), set the environment variable as follows:
```bash
export OLLAMA_HOST=ollama:11434
```
Now, from your machine, you can run a model such as Gemma 3 as follows:
```bash
ollama run gemma3:1b
```
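You can also talk to the Service directly over its private address using Ollama's native REST API. For example, a quick one-off generation request with curl (this assumes you're connected to the Cluster via the octelium client):

```bash
# Send a non-streaming generation request to the Ollama REST API
curl http://ollama:11434/api/generate -d '{
  "model": "gemma3:1b",
  "prompt": "What is Octelium?",
  "stream": false
}'
```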
You can also access Ollama via the client-less BeyondCorp mode from within your applications, written in any programming language, without having to use the octelium client or any special SDK. For example, you can use the OAuth2 client credentials flow (read more here) and use its bearer access token as in the following Python code example:
```python
from ollama import Client

client = Client(
    host='https://ollama.<DOMAIN>',
    headers={'authorization': 'Bearer XXXX'}
)

response = client.chat(model='llama3.2', messages=[
    {
        'role': 'user',
        'content': 'What is Octelium?',
    },
])
```
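For reference, obtaining the bearer token itself follows the standard OAuth2 client credentials flow. A minimal sketch with curl, assuming your Cluster exposes its token endpoint at `https://<DOMAIN>/oauth2/token` (check the Credential docs linked above for the exact endpoint); `<CLIENT_ID>` and `<CLIENT_SECRET>` are placeholders for your own Credential's values:

```bash
# Exchange the client credentials for a bearer access token
curl -X POST https://<DOMAIN>/oauth2/token \
  -d grant_type=client_credentials \
  -d client_id=<CLIENT_ID> \
  -d client_secret=<CLIENT_SECRET>
```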
As an alternative to the OAuth2 client credentials flow Credential, you can also generate an access token Credential and use it directly as a bearer token. Read more here.
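Since the access token is just a bearer token, any plain HTTPS client works too. For example, hitting Ollama's REST API on the public Service address with curl, where `<TOKEN>` is a placeholder for your generated access token:

```bash
# Call the public Service directly, passing the access token as a bearer token
curl https://ollama.<DOMAIN>/api/chat \
  -H 'Authorization: Bearer <TOKEN>' \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "What is Octelium?"}],
    "stream": false
  }'
```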
Here are a few more features that you might be interested in: