Today, containers are the preferred approach to deploy software or create build environments in CI/CD lifecycles. However, since the emergence of container solutions and environments like Docker and Kubernetes, security researchers have consistently found ways to escape from containers once they are compromised. Most attacks are based on configuration errors. But it is also possible to escalate privileges and escape to the container’s host system by exploiting vulnerabilities in the host’s operating system.
This blog shows how to modify an existing Linux kernel exploit in order to use it for container escapes and how the CrowdStrike Falcon® platform can help to prevent and hunt for similar threats.
Before we outline the modifications required to turn the exploit into a container escape, we first look at what the original exploit achieved.
Valentina Palmiotti published a full exploit for CVE-2021-3490 that can be used to locally escalate privileges to root on affected systems. The vulnerability was rooted in the eBPF subsystem of the Linux kernel and fixed in version 5.10.37. eBPF allows user space processes to load custom programs into the kernel and attach them to so-called events, thus giving user space the ability to observe kernel internals and, in specifically supported cases, to implement custom logic for networking, access control and other tasks. These eBPF programs have to pass a verifier before being loaded, which is supposed to guarantee that the code does not contain loops and does not write to memory outside of its dedicated area. This step should ensure that eBPF programs terminate and are not able to manipulate kernel memory, which would potentially allow attackers to escalate privileges. However, this verifier contained several vulnerabilities in the past. CVE-2021-3490 is one of them and can ultimately be used to achieve a kernel read and write primitive.
Building on the kernel read primitive, it is possible to leak a kernel pointer. eBPF programs can communicate with processes running in user space using so-called “eBPF maps.” Every eBPF map is described by a
The kernel exports pointers to certain variables, objects and functions in a symbol table to make them accessible by kernel modules. This table is called
One clarification has to be made about the above code excerpt: There are two different formats of
Nevertheless, using this technique, the exploit identifies the address of
Namespaces have become a fundamental feature of Linux and are crucial to the idea of container environments. They allow separating system resources between different processes such that one process can observe a completely different set than others. For example, mount namespaces control observable filesystem mount points such that two processes can have different views of the filesystem. This allows a container’s filesystem to have a different root directory than the host. Process ID namespaces on the other side give processes a completely unique process tree. The first process in a process ID namespace always has the identifier (PID) 1. It is considered as the
However, this approach does not work if a container was compromised and the attacker’s goal is to escape into the container’s host environment.
Why This Doesn’t Work in Containers
Linux kernel exploits are an alternative method to escape container environments to the host in case no mistakes in the container configuration were made. They can be used because containers share the host’s kernel and therefore its vulnerabilities, regardless of the Linux distribution the container is based on. However, exploit developers have to pay attention to some obstacles compared to privilege escalation outside of container environments.
First, container solutions are able to restrict the capabilities of processes running inside a container. For example, the capability
Second, on a more practical note, the techniques of the original exploit described above will not work out-of-the-box. As already described, containers rely heavily on namespaces. Because containers typically have their own associated process ID namespace, it is not as straightforward to identify the exploit process running in the container by its PID, because, for example, the exploit may have PID 42 from the container’s perspective but PID 1337 from the host’s perspective. However, the parent namespace can still observe all processes running in child namespaces. Therefore, those processes have a PID in both parent and child namespace. Ultimately, the initial process ID namespace described by
Changes for Container Escapes
It is possible to modify the exploit so that a container escape is conducted and privileges are escalated to
Using this technique it is possible to identify the correct
This allows the attacker to overwrite the correct
One last addition must be made. As stated above, containers normally have limited capabilities. Capabilities are used to restrict the permissions of processes running in containers. To obtain full privileges, the exploit also has to overwrite the capabilities mask of the exploit’s process in the
The technique described in this blog to identify the
Container Escape Mitigations
Detecting this and similar exploits is very hard as they are data-only and misuse only legitimate system calls. The CrowdStrike Falcon platform can assist in preventing attacks using similar techniques for privilege escalation. As a defense-in-depth strategy, the following steps can be taken to harden Linux hosts and container environments to prevent exploitation of CVE-2021-3490 and future attacks.
- Upgrade the kernel version. With a critical kernel vulnerability like CVE-2021-3490, it is paramount that available fixes are applied by upgrading the kernel version.
- Provide only required capabilities to the container. By limiting the capabilities of the container, the root account of the container becomes limited in its capabilities, which significantly reduces the chances of container escape and exploitation of kernel vulnerabilities. For example, to exploit the CVE-2021-3490 using the described technique, the attacker needs CAP_BPF or CAP_SYS_ADMIN granted. Note that privileged containers have those capabilities. Therefore, you should monitor your environment for such containers with CrowdStrike Falcon® Cloud Workload Protection (CWP), as discussed in point 4 below.
- Use a seccomp profile. While Kubernetes does not apply a seccomp profile without configuration, Docker’s default seccomp profiles protect against a number of dangerous system calls that can help attackers to break out of the container environment. Correct Seccomp profiles can help significantly reduce the container attack surface. CVE-2021-3490 requires the
system call to exploit the vulnerability, which is blocked in Docker’s default seccomp profile. Hence, exploitation of CVE-2021-3490 in a container environment using a strong seccomp profile would fail.bpf
- Monitor host and containerized environment for a breach. In case a privileged workload or a host is compromised by attackers, the organization needs state-of-the-art monitoring and detection capabilities to prevent and detect advanced persistent threats (APTs), eCrime and nation-state actors. CrowdStrike can help with this. Falcon Cloud Workload Protection identifies any indicators of misconfiguration (IOMs) in your containerized environment to uncover a weakness. Falcon Cloud Workload Protection prevents and detects malicious activity on your host and containers to prevent and detect — in real time — breaches by eCrime and nation-state adversaries. For example, if a privileged container or a container without a seccomp profile is executed, the following notifications would appear:
Also, Falcon CWP helps to hunt for threats using the eBPF subsystem to escalate privileges by logging if the
Container technology is a good solution to separate and fine-tune resources to different processes. However, while existing solutions add another layer of security due to the restriction of capabilities and available syscalls, the available attack surface inside a container still contains the host’s kernel. Every eased restriction — for example, allowing the use of eBPF — will increase the attack surface. If a threat actor is able to take advantage of a vulnerability inside the host’s kernel and an exploit is available, the host can be compromised, regardless of other security layers and restrictions such as namespaces.
This blog showed exactly that: Not much effort is needed to turn a full exploit chain for a local privilege escalation into one that is able to escape containers as well. The basic rules of network hygiene (patch early and often) not only apply to containers but to the hosts that deploy those in a cloud environment as well. Moreover, solutions such as Docker and Kubernetes can reduce the attack surface drastically if configured properly. CrowdStrike Falcon Cloud Workload Protectioncan assist in identifying and hunting for weaknesses in the deployed configuration that could lead to a compromise.
- Request a free trial of the industry-leading CrowdStrike Falcon platform.
- Read about adversaries tracked by CrowdStrike in 2021 in the 2022 CrowdStrike Global Threat Report and in the 2022 Falcon OverWatch Threat Hunting Report.
- Request a free CrowdStrike Intelligence threat briefing and learn how to stop adversaries targeting your organization.
- Watch an introductory video on the CrowdStrike Falcon console and register for an on-demand demo of the market-leading CrowdStrike Falcon platform in action.