In this guide, we provide a faithful, vendor-neutral definition of the term "zero trust" and then unpack that definition in a simpler, more detailed way.
Definition of Zero Trust
Zero trust is a security model: a collection of concepts whose main purpose is to prevent unauthorized access to data, in an environment that is considered fundamentally hostile and potentially already compromised, by enforcing least-privileged access to data. Zero trust dismisses the idea of building defenses and granting trust around static, wide, network-based perimeters, as has long been done by systems that follow the traditional perimeter-based security model such as remote access/corporate VPNs. Instead, trust is continuously evaluated and dynamically granted at the granular level of individual resources, on a per-request basis, via policies that are based mainly on the identity of the user/subject as well as the access context (e.g. time of day, geo-location, application-layer information such as HTTP methods and paths, user/device attributes, and potentially insights collected from external tools such as SIEM or threat intelligence tools).
NIST's Special Publication 800-207 is widely considered the primary vendor-neutral guideline for organizations that seek to implement zero trust, and it defines zero trust as follows:
Zero trust is a cybersecurity paradigm focused on resource protection and the premise that trust is never granted implicitly but must be continually evaluated. Zero trust architecture is an end-to-end approach to enterprise resource and data security that encompasses identity (person and nonperson entities), credentials, access management, operations, endpoints, hosting environments, and the interconnecting infrastructure. The initial focus should be on restricting resources to those with a need to access and grant only the minimum privileges (e.g., read, write, delete) needed to perform the mission. Traditionally, agencies (and enterprise networks in general) have focused on perimeter defense and authenticated subjects are given authorized access to a broad collection of resources once on the internal network. As a result, unauthorized lateral movement within the environment has been one of the biggest challenges for federal agencies.
And then it gives a more formal and concise definition as follows:
A collection of concepts and ideas designed to minimize uncertainty in enforcing accurate, least privilege per-request access decisions in information systems and services in the face of a network viewed as compromised.
MIT Lincoln Laboratory provides a similar concise definition of zero trust as follows:
Zero trust is a set of security principles that treats every component, service and user of a system as continuously exposed to and potentially compromised by an adversary.
The primary goal of zero trust is to protect resources by enforcing the principle of least privilege (PoLP), making access control as granular and as dynamic as possible in order to shrink the attack surface and consequently prevent lateral movement and unauthorized access. Beyond that, zero trust architectures usually also offer clearer and deeper visibility and auditing, better scalability and a better user experience compared to traditional VPNs, especially as your system's needs grow more complex.
It's important to understand that zero trust is a security model as opposed to, for example, a well-defined technical IEEE standard or IETF specification. It provides a set of concepts and paradigms that mainly involve authentication, authorization, networking, secret management and privileged access management, among other disciplines, yet it is not particularly concerned with the low-level details of implementation. In fact, many traditional commercial remote access VPNs already apply a few concepts encouraged by the zero trust model without really being true zero trust architectures themselves.
It is also worth noting that zero trust is not restricted to providing a more secure alternative to remote access VPNs. The zero trust model is not confined to protecting internal and private resources; it can be generalized to protect data in general, including protected public data such as SaaS APIs and databases.
Zero Trust: As Opposed to What?
In order to really understand what the zero trust security model is, we first need to understand what it is not. And in order to do that, we have to understand the traditional security model, typically called the perimeter-based security model. We are going to briefly discuss three cases: creating your own remote access OpenVPN/WireGuard private network, then typical modern commercial remote access/corporate VPNs, and lastly zero trust architectures. As we go from one case to the next, you will see that the conditions upon which access, or "trust", is granted become more and more granular as well as dynamic.
DIY Virtual Private Network
When you set up your own vanilla OpenVPN or WireGuard network to access your private resources, access control is pretty much non-existent beyond authentication via valid VPN credentials. Once users are authenticated via the VPN credentials and are inside the network, they can access any resource within the network "perimeter", as there is no authorization mechanism whatsoever inside that network. Moreover, the credentials required to access such perimeters (i.e. WireGuard or OpenVPN private keys) are typically long-lived in the case of OpenVPN (i.e. typically months or even years) or have no expiration date whatsoever in the case of WireGuard. A malicious actor could obtain such long-lived and unrestricted credentials a few months or even years after they had been issued and use them to access the entire perimeter unhindered.
You can see that trust in this system is granted based on a single condition: simply having the VPN credentials. Trust in this case is all-or-nothing. Moreover, permissions are unlimited within the perimeter, not only in terms of scope but also in terms of context (e.g. time, location, etc.), allowing unconditional access to data and lateral movement across resources to whoever is inside the perimeter.
Modern Remote Access VPNs
In a typical modern commercial corporate/remote access VPN, trust is much more restricted than in the case above. Identity management is usually introduced to the system: users have to authenticate themselves using identity providers that support OpenID Connect or SAML 2.0, optionally with 2FA/MFA methods, to obtain hopefully much shorter-lived credentials. Access control is tightened by introducing authorization at layer-3 using network segmentation, where users are only allowed to access certain smaller perimeters (e.g. subnets, IPs, etc.), usually according to their roles via a role-based access control (RBAC) system.
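To make the contrast with the following sections concrete, here is a minimal, hypothetical sketch of what such layer-3, role-based segmentation ultimately boils down to: the decision is made purely on the destination IP address, with no notion of which resource or request lives behind it. The role names and CIDR ranges are made up for illustration.

```go
package main

import (
	"fmt"
	"net/netip"
)

// roleToSubnets maps a user's role to the network segments it may reach.
// The roles and ranges are hypothetical, for illustration only.
var roleToSubnets = map[string][]netip.Prefix{
	"developer": {netip.MustParsePrefix("10.0.10.0/24")},
	"dba":       {netip.MustParsePrefix("10.0.20.0/24")},
}

// allowPacket is all a layer-3 RBAC system can ultimately decide:
// may this role send packets to this destination IP?
func allowPacket(role string, dst netip.Addr) bool {
	for _, p := range roleToSubnets[role] {
		if p.Contains(dst) {
			return true
		}
	}
	return false
}

func main() {
	dst := netip.MustParseAddr("10.0.20.7")
	// A "dba" may reach the database subnet, but nothing here says
	// which database, which query, or under what context.
	fmt.Println(allowPacket("dba", dst))       // true
	fmt.Println(allowPacket("developer", dst)) // false
}
```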
What is Wrong with VPNs?
While many modern remote access VPNs now offer identity-based access control that limits trust within the system to certain smaller perimeters, compared to the all-or-nothing behavior in the case above, the problem with VPNs in general stems from the simple fact that VPNs operate and control access at the network layer, or layer-3. As will be discussed below, the network layer is simply too low-level and too loose, not only to enforce fine-grained access control in order to prevent unauthorized access and lateral movement, but also to provide clear and meaningful visibility and auditing.
Identification of "Perimeters" and Resources
In order to prevent unauthorized access to data provided by resources (e.g. HTTP-based APIs, databases, etc.), we first have to identify what those resources are so that we can protect them by enforcing access control around them. The real problem stems from the fact that network perimeters and resources are no longer static entities that can be easily identified, isolated and protected at layer-3. Your perimeter is no longer a single static private network like it used to be many years ago, as your resources are becoming more and more scattered across many environments. More importantly, resources have become more and more dynamic and are no longer identified by a single static private IP address like they used to be. You now have microservices that are created and scaled up or down on demand, sometimes automatically without any human intervention, by platforms such as Kubernetes as well as cloud vendors. Lastly, you have protected public SaaS resources served by various SaaS vendors, where the concept of network perimeters completely falls apart.
Granularity of Access Control
Since VPNs operate at layer-3, they can only control access at the network layer. As mentioned before, the network layer is almost never suitable for identifying individual resources; therefore VPNs can, at best, control access around small enough perimeters (i.e. small network subnets or sometimes certain IP addresses) that hopefully represent one or a few static resources, in order to limit lateral movement in case of unauthorized access. Moreover, the lack of application-layer awareness in VPNs makes them unable to extend access control beyond allowing or denying IP packets to actually controlling access at the application layer by decoding and understanding layer-7 specific requests (e.g. controlling access for certain HTTP request paths and methods, certain SSH users, certain database commands and queries, etc.).
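As a rough illustration of this difference in granularity, the sketch below shows the kind of rule that only an application-layer-aware component can evaluate, because it requires decoding the HTTP request rather than merely reading IP and port headers. The role name, hostname and paths are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// allowHTTP is a deliberately simplistic layer-7 rule: read-only access
// to the reports API for analysts. A layer-3 device cannot express this,
// because it never sees the HTTP method or path.
func allowHTTP(role string, r *http.Request) bool {
	readOnly := r.Method == http.MethodGet || r.Method == http.MethodHead
	return role == "analyst" &&
		readOnly &&
		strings.HasPrefix(r.URL.Path, "/api/v1/reports/")
}

func main() {
	get, _ := http.NewRequest(http.MethodGet, "https://reports.internal/api/v1/reports/q3", nil)
	fmt.Println(allowHTTP("analyst", get)) // true: read-only request to an allowed path

	del, _ := http.NewRequest(http.MethodDelete, "https://reports.internal/api/v1/reports/q3", nil)
	fmt.Println(allowHTTP("analyst", del)) // false: same path, but a mutating method
}
```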
Visibility and Auditing
Since VPNs operate at layer-3, there is very little they can do to provide clear visibility and auditing since, as discussed above, they cannot clearly identify resources. VPNs are merely capable of capturing layer-3/layer-4 information such as source and destination IP addresses and ports. Even if the source IP address is tied to the user identity, it's very hard to make use of destination IP addresses, especially in dynamic environments where the IP addresses of resources are not static. Moreover, the lack of application-layer awareness means that VPNs can only understand IP packets as opposed to, for example, HTTP requests, database commands or SSH sessions. This lack of clear request and connection information leads to meaningless and cumbersome-to-manage audit logs, especially at scale.
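To illustrate why this matters for auditing, compare what a layer-3 gateway can record about an access against what an identity-aware, layer-7 component can record for the very same access. The field names below are hypothetical and only meant to show the difference in information content.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// L3FlowRecord is roughly all a VPN gateway can log about an access.
type L3FlowRecord struct {
	SrcIP   string `json:"srcIP"`
	DstIP   string `json:"dstIP"`
	DstPort int    `json:"dstPort"`
}

// L7AuditRecord is what an identity-aware proxy can log for the same access.
type L7AuditRecord struct {
	User      string    `json:"user"`
	Resource  string    `json:"resource"`
	Method    string    `json:"method"`
	Path      string    `json:"path"`
	Decision  string    `json:"decision"`
	Timestamp time.Time `json:"timestamp"`
}

func main() {
	l3, _ := json.Marshal(L3FlowRecord{SrcIP: "10.8.0.14", DstIP: "10.0.20.7", DstPort: 443})
	l7, _ := json.Marshal(L7AuditRecord{
		User:      "alice@example.com",
		Resource:  "reports-api",
		Method:    "GET",
		Path:      "/api/v1/reports/q3",
		Decision:  "allow",
		Timestamp: time.Now(),
	})
	fmt.Println(string(l3)) // who is 10.0.20.7 today? which request was this?
	fmt.Println(string(l7)) // a self-explanatory, identity-tied audit entry
}
```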
Application-layer Credentials
VPNs are great at connecting two remote endpoints to one another through a secure channel. However, that by itself is only one part of the whole process of providing secure access to a resource. Many internal resources such as SSH servers, databases and APIs require their own application-layer specific credentials in order to authenticate and authorize the user and grant them access. To handle those, you have to use additional tools such as secret managers and vaults to store such credentials, or use protocols such as Kerberos. Such techniques not only negatively impact user experience, since they represent an additional step to obtain the resource-specific credential, which by itself can still be too loose in terms of permissions and/or time limits, but they might actually increase the attack surface, since the credentials obtained by these tools might still be mismanaged, stolen, stored in public git repos, etc.
SaaS Resources
Many internal resources that used to be served on-prem and in private clouds have been migrated to public clouds and SaaS services throughout the last decade. This makes the notion of a "network perimeter" even more vague, if not pointless. Such protected public resources are completely out of the direct control of remote access VPNs, and yet they still need their own identity management, access control, credential management and visibility, exactly like the internal resources that are being accessed through remote access VPNs.
Traditional Networking-related Problems
The problems with VPNs do not just end with the inherent flaws related to access control and visibility. You still have decades-old networking-related problems that can only get worse as your system becomes more and more complex. Some examples:
- Routing conflicts between overlapping remote networks are inevitable since private networks mostly use the same IPv4 ranges (i.e. 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16). In a multi-cloud system, sooner or later these remote networks' ranges will be in conflict with one another in the client-side routing table, not to mention that any of these ranges could be in conflict with the client-side routing table itself (see the sketch after this list).
- Trying to make IPv6 downstreams talk to IPv4 upstreams or vice versa requires using NAT64 and DNS64, whose implementations are inconsistent, OS-specific pieces of software.
- Managing a centralized DNS that governs all resources scattered across the different remote networks is a huge undertaking that only gets worse when you have to deal with dynamic resources whose endpoints are constantly changing and are outside your direct control, managed by environments (e.g. cloud providers, Kubernetes clusters, etc.) that are naturally unaware of the existence of the remote access VPN in the first place.
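As a small illustration of the first point above, the sketch below checks whether the private ranges advertised by two hypothetical remote networks would collide in a client's routing table. With multiple remote networks all carving ranges out of the same RFC 1918 space, such collisions become a matter of time.

```go
package main

import (
	"fmt"
	"net/netip"
)

func main() {
	// Hypothetical ranges advertised by two different remote networks.
	officeVPN := netip.MustParsePrefix("10.0.0.0/16")
	cloudVPC := netip.MustParsePrefix("10.0.4.0/22")

	// Both are carved out of the same 10.0.0.0/8 private space,
	// so they overlap and cannot coexist cleanly in one routing table.
	fmt.Println(officeVPN.Overlaps(cloudVPC)) // true
}
```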
Zero Trust Security Model
As mentioned above, the zero trust security model starts with one simple assumption: the environment in which data is being accessed is entirely hostile and cannot be trusted. Therefore, the idea of enforcing access control using static defenses around network perimeters is dropped altogether; instead, access control is enforced at the level of individual resources, where trust is continuously evaluated on a per-request basis and dynamically granted based mainly on the identity of the user as well as the access context. A zero trust architecture (ZTA) must have the following two main characteristics:
- Access control is enforced around resources rather than perimeters: Perimeter-based security is not sufficient, since once the perimeter is breached, trust is implicit and lateral movement is unhindered across a broad and unclear group of resources inside it. The problem is aggravated by the fact that today's dynamic networking involves more clouds, microservices and SaaS-based resources, where perimeters can no longer be clearly identified and isolated.
- Trust is continuously evaluated and dynamically granted on a per-request basis: Trust is not static but rather continuously evaluated on every single access request via dynamic policies that decide whether the access request is allowed or denied based on the identity of the requestor/user. However, identity, while being at the center of the evaluation of such policies, might still not be sufficient to prevent unauthorized access in case of credential theft or misuse within the wrong context. Thus, access control should be made more fine-grained and dynamic by taking as much contextual information as possible into account in order to truly prevent unauthorized access (see the sketch after this list).
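The following is a minimal, hypothetical sketch of what such a per-request decision looks like: the policy takes the verified identity plus whatever contextual attributes are available and returns a decision for this one request only; nothing is cached as standing, network-level trust. The attribute names and the rule itself are made up for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// AccessRequest carries the verified identity and the context of a single request.
type AccessRequest struct {
	User          string
	Groups        []string
	DeviceTrusted bool   // e.g. device posture reported by an MDM/EDR agent
	Country       string // e.g. derived from geo-IP
	Time          time.Time
	Resource      string
	Action        string // e.g. "read", "write"
}

// decide evaluates one request against a (hypothetical) dynamic policy.
// Real architectures would evaluate many such policies, possibly enriched
// with signals from SIEM or threat-intelligence feeds.
func decide(req AccessRequest) bool {
	inGroup := false
	for _, g := range req.Groups {
		if g == "finance" {
			inGroup = true
		}
	}
	workingHours := req.Time.Hour() >= 8 && req.Time.Hour() < 18
	return inGroup &&
		req.DeviceTrusted &&
		req.Country == "DE" &&
		workingHours &&
		req.Resource == "invoices-api" &&
		req.Action == "read"
}

func main() {
	req := AccessRequest{
		User:          "alice@example.com",
		Groups:        []string{"finance"},
		DeviceTrusted: true,
		Country:       "DE",
		Time:          time.Date(2024, 5, 2, 10, 30, 0, 0, time.UTC),
		Resource:      "invoices-api",
		Action:        "read",
	}
	fmt.Println(decide(req)) // true: identity and context both check out for this request
}
```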
Zero Trust Architectures
Probably the most popular ZTA known today is Google's BeyondCorp. This architecture was publicly introduced in 2014 and has somewhat become synonymous with the term zero trust. It is a great manifestation of the philosophy of zero trust, as it assumes that the network perimeter is as hostile as the internet itself. Therefore, applying the above characteristics to any internal resource allows such a resource to be safely exposed to the internet. BeyondCorp is usually used for web-based resources, so that authenticated and authorized users can access internal resources as if they were typical public web resources that require authentication enforced by identity providers such as Okta and Auth0.
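The sketch below is not Google's implementation; it is only a toy illustration of the identity-aware-proxy pattern that BeyondCorp popularized: a proxy sits in front of an internal web app, authenticates every request against an identity provider (stubbed out here), applies a per-request policy, and only then forwards the request upstream. The upstream URL, header name and policy are placeholders.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// identityFromRequest stands in for real OIDC/SAML session verification
// against an identity provider; here it just reads a header for brevity.
func identityFromRequest(r *http.Request) (string, bool) {
	user := r.Header.Get("X-Authenticated-User") // placeholder only
	return user, user != ""
}

// allowed is a placeholder per-request policy (identity plus L7 context).
func allowed(user string, r *http.Request) bool {
	return user == "alice@example.com" && r.Method == http.MethodGet
}

func main() {
	// Hypothetical internal web app that the proxy protects.
	upstream, err := url.Parse("http://internal-app.local:8080")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		user, ok := identityFromRequest(r)
		if !ok {
			http.Error(w, "authentication required", http.StatusUnauthorized)
			return
		}
		if !allowed(user, r) {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		proxy.ServeHTTP(w, r) // forward only authenticated, authorized requests
	})

	// TLS termination omitted for brevity; a real deployment would serve HTTPS.
	log.Fatal(http.ListenAndServe(":8443", handler))
}
```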
Octelium
Octelium is a modern, self-hosted, free and open source, scalable platform for zero trust network access. Octelium is built from the ground up to control access at the resource level using identity-aware proxies, as opposed to at the network level (layer-3) as is the case in traditional remote access VPNs. Octelium is a multi-mode zero trust architecture (ZTA) that simultaneously enables both humans and workloads to access protected resources using both well-known zero trust network access architectures: privately, using the client-based ZTNA mode over WireGuard/QUIC tunnels, which can be used for all kinds of resources as in VPNs, as well as publicly, using the client-less BeyondCorp architecture. Unlike many ZTAs, Octelium is designed to be generic enough to be used as a unified Zero Trust Network Access (ZTNA) platform/BeyondCorp architecture for humans and workloads, a modern L7-aware zero-config VPN, a self-hosted infrastructure for secure tunnels and reverse proxies, a PaaS-like hosting/deployment platform for both secure access as well as anonymous public access, a secure API gateway, an AI gateway to AI LLM providers, as well as a personal infrastructure for a homelab. You can read more about how Octelium works in detail here.
Further Reading
This guide is meant to provide a concise overview of the zero trust model. The concepts of zero trust have evolved since the term first appeared in 2004. You're advised to read more about the subject from different sources to get a clear and unbiased idea of what zero trust is all about.
- Zero Trust Architecture by NIST.
- BeyondCorp: A New Approach to Enterprise Security by Google.
- Zero Trust Maturity Model by CISA.