In this guide, we provide a faithful, vendor-neutral definition of the term "zero trust" and then unpack that definition in a simpler, more detailed way.
Definition of Zero Trust
Zero trust is a security model: a collection of concepts whose main purpose is to prevent unauthorized access to data, in an environment that is considered fundamentally hostile and potentially already compromised, by enforcing least-privileged access to data. Zero trust dismisses the idea of building defenses and granting trust around static, wide, network-based perimeters, as has long been done by systems that follow the traditional perimeter-based security model such as remote access/corporate VPNs. Instead, trust is continuously evaluated and dynamically granted at the granular level of individual resources, on a per-request basis, via policies that are based mainly on the identity of the user/subject as well as the access context (e.g. time of day, geo-location, application-layer information such as HTTP methods and paths, user/device attributes, and potentially insights collected from external tools such as SIEM or threat intelligence tools).
NIST's Special Publication 800-207 is widely considered the primary vendor-neutral guideline for organizations that seek to implement zero trust, and it defines zero trust as follows:
Zero trust is a cybersecurity paradigm focused on resource protection and the premise that trust is never granted implicitly but must be continually evaluated. Zero trust architecture is an end-to-end approach to enterprise resource and data security that encompasses identity (person and nonperson entities), credentials, access management, operations, endpoints, hosting environments, and the interconnecting infrastructure. The initial focus should be on restricting resources to those with a need to access and grant only the minimum privileges (e.g., read, write, delete) needed to perform the mission. Traditionally, agencies (and enterprise networks in general) have focused on perimeter defense and authenticated subjects are given authorized access to a broad collection of resources once on the internal network. As a result, unauthorized lateral movement within the environment has been one of the biggest challenges for federal agencies.
And then it gives a more formal and concise definition as follows:
A collection of concepts and ideas designed to minimize uncertainty in enforcing accurate, least privilege per-request access decisions in information systems and services in the face of a network viewed as compromised.
MIT Lincoln Laboratory provides a similar concise definition of zero trust as follows:
Zero trust is a set of security principles that treats every component, service and user of a system as continuously exposed to and potentially compromised by an adversary.
The primary goal of zero trust is to protect resources by enforcing the principle of least privilege (PoLP), making access control as granular and as dynamic as possible in order to shrink the attack surface and consequently prevent lateral movement and unauthorized access. Beyond that, zero trust architectures usually also offer clearer and deeper visibility and auditing, better scalability and a better user experience compared to traditional VPNs, especially as your system's needs grow more complex.
It's important to understand that zero trust is a security model as opposed to, for example, a well-defined technical IEEE standard or IETF specification. It provides a set of concepts and paradigms that mainly involve authentication, authorization, networking, secret management and privileged access management, among other disciplines, yet it is not particularly concerned with the low-level details of implementation. In fact, many traditional commercial remote access VPNs already apply a few concepts encouraged by the zero trust model without really being true zero trust architectures themselves.
It is also worth noting that zero trust is not restricted to providing a more secure alternative to remote access VPNs. The zero trust model is not confined to protecting internal and private resources; it can be generalized to protect data in general, including protected public data such as SaaS APIs and databases.
Zero Trust: As Opposed to What?
In order to really understand what the zero trust security model is, we first need to understand what it is not. And in order to do that, we have to understand the traditional security model, typically called the perimeter-based security model. We are going to briefly discuss three cases: creating your own remote access OpenVPN/WireGuard private network, then typical modern commercial remote access/corporate VPNs, and lastly zero trust architectures. As we go from one case to the next, you will see that the conditions upon which access, or "trust", is granted become more and more granular as well as dynamic.
DIY Virtual Private Network
When you set up your own vanilla OpenVPN or WireGuard network to access your private resources, access control is pretty much non-existent beyond authentication via valid VPN credentials. Once users are authenticated via the VPN credentials and are inside the network, they can access any resource within the network "perimeter", as there is no authorization mechanism whatsoever inside that network. Moreover, the credentials required to access such perimeters (i.e. WireGuard or OpenVPN private keys) are typically long-lived in the case of OpenVPN (i.e. typically months or even years) or have no expiration date whatsoever in the case of WireGuard. A malicious actor could obtain such long-lived and unrestricted credentials a few months or even years after they had been issued and use them to access the entire perimeter unhindered.
You can see that trust in this system is granted based on a single condition: simply having the VPN credentials. Trust in this case is all-or-nothing. Moreover, permissions are unlimited within the perimeter, not only in terms of scope but also in terms of context (e.g. time, location, etc.), allowing unconditional access to data and lateral movement across resources to whoever is inside the perimeter.
Modern Remote Access VPNs
In a typical modern commercial corporate/remote access VPN, trust is much more restricted than in the case above. Identity management is usually introduced to the system: users have to authenticate themselves using identity providers that support OpenID Connect or SAML 2.0, optionally with 2FA/MFA methods, to obtain hopefully much shorter-lived credentials. Access control is tightened by introducing authorization at layer-3 using network segmentation, where users are only allowed to access certain smaller perimeters (e.g. subnets, IPs, etc.), usually according to their roles via a role-based access control (RBAC) system.
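To make the contrast with the following sections concrete, here is a minimal, hypothetical sketch of what such layer-3, role-based segmentation ultimately boils down to: the decision is made purely on the destination IP address, with no notion of which resource or request lives behind it. The role names and CIDR ranges are made up for illustration.

```go
package main

import (
	"fmt"
	"net/netip"
)

// roleToSubnets maps a user's role to the network segments it may reach.
// The roles and ranges are hypothetical, for illustration only.
var roleToSubnets = map[string][]netip.Prefix{
	"developer": {netip.MustParsePrefix("10.0.10.0/24")},
	"dba":       {netip.MustParsePrefix("10.0.20.0/24")},
}

// allowPacket is all a layer-3 RBAC system can ultimately decide:
// may this role send packets to this destination IP?
func allowPacket(role string, dst netip.Addr) bool {
	for _, p := range roleToSubnets[role] {
		if p.Contains(dst) {
			return true
		}
	}
	return false
}

func main() {
	dst := netip.MustParseAddr("10.0.20.7")
	// A "dba" may reach the database subnet, but nothing here says
	// which database, which query, or under what context.
	fmt.Println(allowPacket("dba", dst))       // true
	fmt.Println(allowPacket("developer", dst)) // false
}
```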
What is Wrong with VPNs?
While many modern remote access VPNs now offer identity-based access control that limits trust within the system to certain smaller perimeters, compared to the all-or-nothing behavior in the case above, the problem with VPNs in general stems from the simple fact that VPNs operate and control access at the network layer, or layer-3. As will be discussed below, the network layer is simply too low-level and too loose, not only to enforce fine-grained access control in order to prevent unauthorized access and lateral movement, but also to provide clear and meaningful visibility and auditing.
Identification of "Perimeters" and Resources
In order to prevent unauthorized access to data provided by resources (e.g. HTTP-based APIs, databases, etc.), we first have to identify what those resources are so that we can protect them by enforcing access control around them. The real problem stems from the fact that network perimeters and resources are no longer static entities that can be easily identified, isolated and protected at layer-3. Your perimeter is no longer a single static private network like it used to be many years ago, as your resources are becoming more and more scattered across many environments. More importantly, resources have become more and more dynamic and are no longer identified by a single static private IP address like they used to be. You now have microservices that are created and scaled up or down on demand, sometimes automatically without any human intervention, by platforms such as Kubernetes as well as cloud vendors. Lastly, you have protected public SaaS resources served by various SaaS vendors, where the concept of network perimeters completely falls apart.
Granularity of Access Control
Since VPNs operate at layer-3, they can only control access at the network layer. As mentioned before, the network layer is almost never suitable for identifying individual resources; therefore VPNs can, at best, control access around small enough perimeters (i.e. small network subnets or sometimes certain IP addresses) that hopefully represent one or a few static resources, in order to limit lateral movement in case of unauthorized access. Moreover, the lack of application-layer awareness in VPNs makes them unable to extend access control beyond allowing or denying IP packets to actually controlling access at the application layer by decoding and understanding layer-7 specific requests (e.g. controlling access for certain HTTP request paths and methods, certain SSH users, certain database commands and queries, etc.).
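As a rough illustration of this difference in granularity, the sketch below shows the kind of rule that only an application-layer-aware component can evaluate, because it requires decoding the HTTP request rather than merely reading IP and port headers. The role name, hostname and paths are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// allowHTTP is a deliberately simplistic layer-7 rule: read-only access
// to the reports API for analysts. A layer-3 device cannot express this,
// because it never sees the HTTP method or path.
func allowHTTP(role string, r *http.Request) bool {
	readOnly := r.Method == http.MethodGet || r.Method == http.MethodHead
	return role == "analyst" &&
		readOnly &&
		strings.HasPrefix(r.URL.Path, "/api/v1/reports/")
}

func main() {
	get, _ := http.NewRequest(http.MethodGet, "https://reports.internal/api/v1/reports/q3", nil)
	fmt.Println(allowHTTP("analyst", get)) // true: read-only request to an allowed path

	del, _ := http.NewRequest(http.MethodDelete, "https://reports.internal/api/v1/reports/q3", nil)
	fmt.Println(allowHTTP("analyst", del)) // false: same path, but a mutating method
}
```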
Visibility and Auditing
Since VPNs operate at layer-3, there is very little they can do to provide clear visibility and auditing since, as discussed above, they cannot clearly identify resources. VPNs are merely capable of capturing layer-3/layer-4 information such as source and destination IP addresses and ports. Even if the source IP address is tied to the user identity, it's very hard to make use of destination IP addresses, especially in dynamic environments where the IP addresses of resources are not static. Moreover, the lack of application-layer awareness means that VPNs can only understand IP packets as opposed to, for example, HTTP requests, database commands or SSH sessions. This lack of clear request and connection information leads to meaningless and cumbersome-to-manage audit logs, especially at scale.
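To illustrate why this matters for auditing, compare what a layer-3 gateway can record about an access against what an identity-aware, layer-7 component can record for the very same access. The field names below are hypothetical and only meant to show the difference in information content.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// L3FlowRecord is roughly all a VPN gateway can log about an access.
type L3FlowRecord struct {
	SrcIP   string `json:"srcIP"`
	DstIP   string `json:"dstIP"`
	DstPort int    `json:"dstPort"`
}

// L7AuditRecord is what an identity-aware proxy can log for the same access.
type L7AuditRecord struct {
	User      string    `json:"user"`
	Resource  string    `json:"resource"`
	Method    string    `json:"method"`
	Path      string    `json:"path"`
	Decision  string    `json:"decision"`
	Timestamp time.Time `json:"timestamp"`
}

func main() {
	l3, _ := json.Marshal(L3FlowRecord{SrcIP: "10.8.0.14", DstIP: "10.0.20.7", DstPort: 443})
	l7, _ := json.Marshal(L7AuditRecord{
		User:      "alice@example.com",
		Resource:  "reports-api",
		Method:    "GET",
		Path:      "/api/v1/reports/q3",
		Decision:  "allow",
		Timestamp: time.Now(),
	})
	fmt.Println(string(l3)) // who is 10.0.20.7 today? which request was this?
	fmt.Println(string(l7)) // a self-explanatory, identity-tied audit entry
}
```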
Application-layer Credentials
VPNs are great at connecting two remote endpoints to one another through a secure channel. However, that by itself is only one part of the whole process of providing secure access to a resource. Many internal resources such as SSH servers, databases and APIs require their own application-layer specific credentials in order to authenticate and authorize the user and grant them access. To handle those, you have to use additional tools such as secret managers and vaults to store such credentials, or use protocols such as Kerberos. Such techniques not only negatively impact user experience, since they represent an additional step to obtain the resource-specific credential, which by itself can still be too loose in terms of permissions and/or time limits, but they might actually increase the attack surface, since the credentials obtained by these tools might still be mismanaged, stolen, stored in public git repos, etc.
SaaS Resources
Many internal resources that used to be served on-prem and in private clouds have been migrated to public clouds and SaaS services throughout the last decade. This makes the notion of a "network perimeter" even more vague, if not pointless. Such protected public resources are completely out of the direct control of remote access VPNs, and yet they still need their own identity management, access control, credential management and visibility, exactly like the internal resources that are being accessed through remote access VPNs.
Traditional Networking-related Problems
The problems with VPNs do not just end with the inherent flaws related to access control and visibility. You still have decades-old networking-related problems that can only get worse as your system becomes more and more complex. Some examples:
- Routing conflicts between overlapping remote networks are inevitable since private networks mostly use the same IPv4 ranges (i.e. 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16). In a multi-cloud system, sooner or later these remote networks' ranges will be in conflict with one another in the client-side routing table, not to mention that any of these ranges could be in conflict with the client-side routing table itself (see the sketch after this list).
- Trying to make IPv6 downstreams talk to IPv4 upstreams or vice versa requires using NAT64 and DNS64, whose implementations are inconsistent, OS-specific pieces of software.
- Managing a centralized DNS that governs all resources scattered across the different remote networks is a huge undertaking that only gets worse when you have to deal with dynamic resources whose endpoints are constantly changing and are outside your direct control, managed by environments (e.g. cloud providers, Kubernetes clusters, etc.) that are naturally unaware of the existence of the remote access VPN in the first place.
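As a small illustration of the first point above, the sketch below checks whether the private ranges advertised by two hypothetical remote networks would collide in a client's routing table. With multiple remote networks all carving ranges out of the same RFC 1918 space, such collisions become a matter of time.

```go
package main

import (
	"fmt"
	"net/netip"
)

func main() {
	// Hypothetical ranges advertised by two different remote networks.
	officeVPN := netip.MustParsePrefix("10.0.0.0/16")
	cloudVPC := netip.MustParsePrefix("10.0.4.0/22")

	// Both are carved out of the same 10.0.0.0/8 private space,
	// so they overlap and cannot coexist cleanly in one routing table.
	fmt.Println(officeVPN.Overlaps(cloudVPC)) // true
}
```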
Zero Trust Security Model
As mentioned above, the zero trust security model starts with one simple assumption: the environment in which data is being accessed is entirely hostile and cannot be trusted. Therefore, the idea of enforcing access control using static defenses around network perimeters is dropped altogether; instead, access control is enforced at the level of individual resources, where trust is continuously evaluated on a per-request basis and dynamically granted based mainly on the identity of the user as well as the access context. A zero trust architecture (ZTA) must have the following two main characteristics:
- Access control is enforced around resources rather than perimeters: Perimeter-based security is not sufficient, since once the perimeter is breached, trust is implicit and lateral movement is unhindered across a broad and unclear group of resources inside it. The problem is aggravated by the fact that today's dynamic networking involves more clouds, microservices and SaaS-based resources, where perimeters can no longer be clearly identified and isolated.
- Trust is continuously evaluated and dynamically granted on a per-request basis: Trust is not static but rather continuously evaluated on every single access request via dynamic policies that decide whether the access request is allowed or denied based on the identity of the requestor/user. However, identity, while being at the center of the evaluation of such policies, might still not be sufficient to prevent unauthorized access in case of credential theft or misuse within the wrong context. Thus, access control should be made more fine-grained and dynamic by taking as much contextual information as possible into account in order to truly prevent unauthorized access (see the sketch after this list).
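The following is a minimal, hypothetical sketch of what such a per-request decision looks like: the policy takes the verified identity plus whatever contextual attributes are available and returns a decision for this one request only; nothing is cached as standing, network-level trust. The attribute names and the rule itself are made up for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// AccessRequest carries the verified identity and the context of a single request.
type AccessRequest struct {
	User          string
	Groups        []string
	DeviceTrusted bool   // e.g. device posture reported by an MDM/EDR agent
	Country       string // e.g. derived from geo-IP
	Time          time.Time
	Resource      string
	Action        string // e.g. "read", "write"
}

// decide evaluates one request against a (hypothetical) dynamic policy.
// Real architectures would evaluate many such policies, possibly enriched
// with signals from SIEM or threat-intelligence feeds.
func decide(req AccessRequest) bool {
	inGroup := false
	for _, g := range req.Groups {
		if g == "finance" {
			inGroup = true
		}
	}
	workingHours := req.Time.Hour() >= 8 && req.Time.Hour() < 18
	return inGroup &&
		req.DeviceTrusted &&
		req.Country == "DE" &&
		workingHours &&
		req.Resource == "invoices-api" &&
		req.Action == "read"
}

func main() {
	req := AccessRequest{
		User:          "alice@example.com",
		Groups:        []string{"finance"},
		DeviceTrusted: true,
		Country:       "DE",
		Time:          time.Date(2024, 5, 2, 10, 30, 0, 0, time.UTC),
		Resource:      "invoices-api",
		Action:        "read",
	}
	fmt.Println(decide(req)) // true: identity and context both check out for this request
}
```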
Zero Trust Architectures
Probably the most popular ZTA known today is Google's BeyondCorp. This architecture was publicly introduced in 2014 and has somewhat become synonymous with the term zero trust. It is a great manifestation of the philosophy of zero trust, as it assumes that the network perimeter is as hostile as the internet itself. Therefore, applying the above characteristics to any internal resource allows such a resource to be safely exposed to the internet. BeyondCorp is usually used for web-based resources, so that authenticated and authorized users can access internal resources as if they were typical public web resources that require authentication enforced by identity providers such as Okta and Auth0.
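The sketch below is not Google's implementation; it is only a toy illustration of the identity-aware-proxy pattern that BeyondCorp popularized: a proxy sits in front of an internal web app, authenticates every request against an identity provider (stubbed out here), applies a per-request policy, and only then forwards the request upstream. The upstream URL, header name and policy are placeholders.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// identityFromRequest stands in for real OIDC/SAML session verification
// against an identity provider; here it just reads a header for brevity.
func identityFromRequest(r *http.Request) (string, bool) {
	user := r.Header.Get("X-Authenticated-User") // placeholder only
	return user, user != ""
}

// allowed is a placeholder per-request policy (identity plus L7 context).
func allowed(user string, r *http.Request) bool {
	return user == "alice@example.com" && r.Method == http.MethodGet
}

func main() {
	// Hypothetical internal web app that the proxy protects.
	upstream, err := url.Parse("http://internal-app.local:8080")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		user, ok := identityFromRequest(r)
		if !ok {
			http.Error(w, "authentication required", http.StatusUnauthorized)
			return
		}
		if !allowed(user, r) {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		proxy.ServeHTTP(w, r) // forward only authenticated, authorized requests
	})

	// TLS termination omitted for brevity; a real deployment would serve HTTPS.
	log.Fatal(http.ListenAndServe(":8443", handler))
}
```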
Octelium
Octelium is a modern, self-hosted, free and open source, scalable platform for zero trust network access. Octelium is built from the ground up to control access at the resource level using identity-aware proxies, as opposed to at the network level (layer-3) as is the case in traditional remote access VPNs. Octelium is a multi-mode zero trust architecture (ZTA) that simultaneously enables both humans and workloads to access protected resources using both well-known zero trust network access architectures: privately, using the client-based ZTNA mode over WireGuard/QUIC tunnels, which can be used for all kinds of resources as in VPNs, as well as publicly, using the client-less BeyondCorp architecture. Unlike many ZTAs, Octelium is designed to be generic enough to be used as a unified Zero Trust Network Access (ZTNA) platform/BeyondCorp architecture for humans and workloads, a modern L7-aware zero-config VPN, a self-hosted infrastructure for secure tunnels and reverse proxies, a PaaS-like hosting/deployment platform for both secure access as well as anonymous public access, a secure API gateway, an AI gateway to AI LLM providers, as well as a personal infrastructure for a homelab. You can read more about how Octelium works in detail here.
Further Reading
This guide is meant to provide a concise overview of the zero trust model. The concepts of zero trust have evolved since the term first appeared in 2004. You're advised to read more about the subject from different sources to get a clear and unbiased idea of what zero trust is all about.
- Zero Trust Architecture by NIST.
- BeyondCorp: A New Approach to Enterprise Security by Google.
- Zero Trust Maturity Model by CISA.