Exploring the Exploitability of “Bad Neighbor”: The Recent ICMPv6 Vulnerability (CVE-2020-16898)


Original text by ZecOps

On Patch Tuesday, October 13, Microsoft published a patch and an advisory for CVE-2020-16898, dubbed “Bad Neighbor”, which was undoubtedly the highlight of the monthly series of patches. The bug has received a lot of attention since it was published as an RCE vulnerability, meaning that a successful exploit could be made wormable. Initially it was graded with a high CVSS score of 9.8/10, though it was later lowered to 8.8.

In the days following the publication, several write-ups and POCs were published. We looked at some of them:

The writeup by pi3 contains details that are not mentioned in the writeup by Quarkslab. It’s important to note that the bug can only be exploited when the source address is a link-local address. That’s a significant limitation, meaning that the bug cannot be exploited over the internet. In any case, both writeups explain the bug in general and then dive into triggering a buffer overflow, causing a system crash, without exploring other options.

We wanted to find out whether something else could be done with this vulnerability, aside from triggering the buffer overflow and causing a blue screen (BSOD).

In this writeup, we’ll share our findings.

The bug in a nutshell

The bug happens in the tcpip!Ipv6pHandleRouterAdvertisement function, which is responsible for handling incoming ICMPv6 packets of the type Router Advertisement (part of the Neighbor Discovery Protocol).

The packet structure is (RFC 4861):

As can be seen from the packet structure, the packet consists of a 16-byte header, followed by a variable number of option structures. Each option structure begins with a type field and a length field, followed by fields specific to the relevant option type.

The bug happens due to an incorrect handling of the Recursive DNS Server Option (type 25, RFC 5006):

The Length field defines the length of the option in units of 8 bytes. The option header is 8 bytes, and each IPv6 address adds an additional 16 bytes, so an option containing n IPv6 addresses should have its length set to 1+2*n, which is always odd. The bug happens when the length is an even number, causing the code to incorrectly interpret the beginning of the next option structure.
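The length arithmetic can be illustrated with a minimal Python sketch (our own helper names, not code from the Windows implementation):

```python
# The RDNSS option is an 8-byte header plus 16 bytes per IPv6 address,
# so a well-formed Length field (in 8-byte units) is always odd: 1 + 2*n.
def rdnss_length_units(num_addresses):
    """Length field value for an RDNSS option carrying num_addresses."""
    return 1 + 2 * num_addresses  # (8 + 16*n) / 8

def rdnss_length_is_valid(length_units):
    """A valid RDNSS length is odd and covers at least one address."""
    return length_units >= 3 and length_units % 2 == 1

assert rdnss_length_is_valid(rdnss_length_units(1))  # length 3: one address
assert not rdnss_length_is_valid(4)  # an even length is what triggers the bug
```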

Visualizing the POC of 0xeb-bp

As a starting point, let’s visualize 0xeb-bp’s POC and get some intuition about what’s going on and why it causes a stack overflow. Here is the ICMPv6 packet as constructed in the source code:

As you can see, the ICMPv6 packet is followed by two Recursive DNS Server options (type 25), and then a 256-byte buffer. The two options have an even length of 4, which triggers the bug.

The tcpip!Ipv6pHandleRouterAdvertisement function that parses the packet does two iterations over the option structures. The first iteration does simple checks such as verifying the length field of the structures. The second iteration actually parses the option structures. Because of the bug, each iteration interprets the packet differently.

Here’s how the first iteration sees the packet:

Each option structure is just skipped according to the length field after doing some basic checks.

Here’s how the second iteration sees it:

This time, in the case of a Recursive DNS Server option, the length field is used to determine the number of IPv6 addresses, which is calculated as follows:

amount_of_addr = (length - 1) / 2

Then, the IPv6 addresses are processed, and the next iteration continues after the last processed IPv6 address, which, for an even length value, happens to fall in the middle of the option structure as the first iteration saw it. This results in processing an option structure that wasn’t validated in the first iteration.

Specifically in this POC, 34 is not a valid length for an option of type 24, but because it wasn’t validated, the processing continues and too many bytes are copied onto the stack, causing a stack overflow. Notably, fragmentation is required to trigger the stack overflow (see the Quarkslab writeup for details).
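The divergence between the two iterations can be sketched in Python (a simulation of the offset arithmetic only, not the Windows parser; the function names are ours):

```python
# Offsets are bytes from the start of the options region.
def first_pass_next_offset(offset, length_units):
    # The first loop always skips exactly Length * 8 bytes.
    return offset + length_units * 8

def second_pass_next_offset(offset, length_units):
    # Ipv6pUpdateRDNSS consumes the 8-byte header plus 16 bytes per parsed
    # address, where the address count is (Length - 1) / 2, rounded down.
    num_addresses = (length_units - 1) // 2
    return offset + 8 + 16 * num_addresses

# Odd length (valid): both passes agree on where the next option starts.
assert first_pass_next_offset(0, 5) == second_pass_next_offset(0, 5) == 40

# Even length (the bug): the second pass stops 8 bytes short, so it resumes
# parsing in the middle of data the first pass validated as a single option.
assert first_pass_next_offset(0, 4) == 32
assert second_pass_next_offset(0, 4) == 24
```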

Zooming out

Now we know how to trigger a stack overflow using CVE-2020-16898, but what are the checks that are made in each of the mentioned iterations? What other checks, aside from the length check, can we bypass using this bug? Which option types are supported, and is the handling different for each of them? 

We didn’t find answers to these questions in any writeup, so we checked it ourselves.

Here are the relevant parts of the Ipv6pHandleRouterAdvertisement function, slightly simplified:


if (!IsLinkLocalAddress(SrcAddress) && !IsLoopbackAddress(SrcAddress))
    // error

// Initialization and other code...

NET_BUFFER NetBuffer = /* ... */;

// First loop
while (NetBuffer->DataLength >= 2)
{
    BYTE TempTypeLen[2];
    BYTE* TempTypeLenPtr = NdisGetDataBuffer(NetBuffer, 2, TempTypeLen, 1, 0);
    WORD OptionLenInBytes = TempTypeLenPtr[1] * 8;
    if (OptionLenInBytes == 0 || OptionLenInBytes > NetBuffer->DataLength)
        // error

    BYTE OptionType = TempTypeLenPtr[0];
    switch (OptionType)
    {
    case 1: // Source Link-layer Address
        // ...
        break;

    case 3: // Prefix Information
        if (OptionLenInBytes != 0x20)
            // error

        BYTE TempPrefixInfo[0x20];
        BYTE* TempPrefixInfoPtr = NdisGetDataBuffer(NetBuffer, 0x20, TempPrefixInfo, 1, 0);
        BYTE PrefixInfoPrefixLength = TempPrefixInfoPtr[2];
        if (PrefixInfoPrefixLength > 128)
            // error
        break;

    case 5: // MTU
        // ...
        break;

    case 24: // Route Information Option
        if (OptionLenInBytes > 0x18)
            // error

        BYTE TempRouteInfo[0x18];
        BYTE* TempRouteInfoPtr = NdisGetDataBuffer(NetBuffer, 0x18, TempRouteInfo, 1, 0);
        BYTE RouteInfoPrefixLength = TempRouteInfoPtr[2];
        if (RouteInfoPrefixLength > 128 ||
            (RouteInfoPrefixLength > 64 && OptionLenInBytes < 0x18) ||
            (RouteInfoPrefixLength > 0 && OptionLenInBytes < 0x10))
            // error
        break;

    case 25: // Recursive DNS Server Option
        if (OptionLenInBytes < 0x18)
            // error

        // Added after the patch - this is the fix
        //if ((OptionLenInBytes - 8) % 16 != 0)
        //    // error
        break;

    case 31: // DNS Search List Option
        if (OptionLenInBytes < 0x10)
            // error
        break;
    }

    NetBuffer->DataOffset += OptionLenInBytes;
    NetBuffer->DataLength -= OptionLenInBytes;
    // Other adjustments for NetBuffer...
}

// Rewind NetBuffer and do other stuff...

// Second loop...
while (NetBuffer->DataLength >= 2)
{
    BYTE TempTypeLen[2];
    BYTE* TempTypeLenPtr = NdisGetDataBuffer(NetBuffer, 2, TempTypeLen, 1, 0);
    WORD OptionLenInBytes = TempTypeLenPtr[1] * 8;
    if (OptionLenInBytes == 0 || OptionLenInBytes > NetBuffer->DataLength)
        // error

    BOOL AdvanceBuffer = TRUE;

    BYTE OptionType = TempTypeLenPtr[0];
    switch (OptionType)
    {
    case 3: // Prefix Information
        BYTE TempPrefixInfo[0x20];
        BYTE* TempPrefixInfoPtr = NdisGetDataBuffer(NetBuffer, 0x20, TempPrefixInfo, 1, 0);
        BYTE PrefixInfoPrefixLength = TempPrefixInfoPtr[2];
        // Lots of code. Assumptions:
        // PrefixInfoPrefixLength <= 128
        break;

    case 24: // Route Information Option
        BYTE TempRouteInfo[0x18];
        BYTE* TempRouteInfoPtr = NdisGetDataBuffer(NetBuffer, 0x18, TempRouteInfo, 1, 0);
        BYTE RouteInfoPrefixLength = TempRouteInfoPtr[2];
        // Some code. Assumptions:
        // RouteInfoPrefixLength <= 128
        // Other, less interesting assumptions about RouteInfoPrefixLength
        break;

    case 25: // Recursive DNS Server Option
        Ipv6pUpdateRDNSS(..., NetBuffer, ...);
        AdvanceBuffer = FALSE;
        break;

    case 31: // DNS Search List Option
        Ipv6pUpdateDNSSL(..., NetBuffer, ...);
        AdvanceBuffer = FALSE;
        break;
    }

    if (AdvanceBuffer)
    {
        NetBuffer->DataOffset += OptionLenInBytes;
        NetBuffer->DataLength -= OptionLenInBytes;
        // Other adjustments for NetBuffer...
    }
}

// More code...

}

As can be seen from the code, only 6 option types are handled in the first loop; the others are ignored. In any case, each header is skipped precisely according to the Length field.

Even fewer option types, only 4, are supported in the second loop. Similarly to the first loop, each header is skipped precisely according to the Length field, but this time with two exceptions: types 24 (Route Information Option) and 25 (Recursive DNS Server Option) have functions that adjust the network buffer pointers by themselves, creating an opportunity for inconsistencies.

That’s exactly what happens with this bug: the Ipv6pUpdateRDNSS function doesn’t adjust the network buffer pointers as expected when the length field is even.

Breaking assumptions

Essentially, this bug allows us to break the assumptions made by the second loop that are supposed to be verified in the first loop. The only relevant option types are the 4 types that appear in both loops; that’s also why we didn’t include the other two in the code of the first loop. One such assumption is the value of the length field, and that’s how the buffer overflow POC works, but let’s revisit them all and see what can be achieved.

  • Option type 3 – Prefix Information
    • The option structure size must be 0x20 bytes. Breaking this assumption is what allows us to trigger the stack overflow, by providing a larger option structure. We can also provide a smaller structure, but that doesn’t have much value in this case.
    • The Prefix Length field value must be at most 128. Breaking this assumption allows us to set the field to an invalid value in the range of 129-255. This can indeed be used to cause an out-of-bounds write, but in all such cases that we could find, the out-of-bounds write happens on the stack in a location which is overwritten later anyway, so causing such out-of-bounds writes has no practical value.

      For example, one such out-of-bounds write happens in tcpip!Ipv6pMakeRouteKey, called by tcpip!IppValidateSetAllRouteParameters.
  • Option type 24 – Route Information Option
    • The option structure size must not be larger than 0x18 bytes. Same implications as for option type 3.
    • The Prefix Length field value must be at most 128. Same implications as for option type 3.
    • The Prefix Length field value must fit the structure option size. That isn’t really interesting since any value in the range 0-128 is handled correctly. The worst thing that could happen here is a small out-of-bounds read.
  • Option type 25 – Recursive DNS Server Option
    • The option structure size must not be smaller than 0x18 bytes. This isn’t interesting, since the size must be at least 8 bytes anyway (the length field is verified to be larger than zero in both loops), and any such structure is handled correctly, even though a size of 8 bytes is not valid according to the specification.
    • The option structure size must be in the form of 8+n*16 bytes. This check was added after fixing CVE-2020-16898.
  • Option type 31 – DNS Search List Option
    • The option structure size must not be smaller than 0x10 bytes. Same implications as for option type 25.

As you can see, there was a slight chance of doing something other than the demonstrated stack overflow by breaking the assumption of the valid prefix length value for option type 3 or 24. Even though it’s literally about smuggling a single bit, sometimes that’s enough. But it looks like this time we weren’t that lucky.

Revisiting the Stack Overflow

Before giving up, we took a closer look at the stack. The POCs that we’ve seen overwrite the stack such that the stack cookie (the __security_cookie value) is corrupted, causing a system crash before the function returns.

We checked whether overwriting anything on the stack could help achieve code execution before the function returns. That could be a local variable in the “Local variables (2)” space, or any variable in the previous frames that might be referenced inside the function. Unfortunately, we came to the conclusion that all the variables in the “Local variables (2)” space are output buffers that are modified before they are accessed, and no data from the previous frames is accessed.

Summary

We conclude with high confidence that CVE-2020-16898 is not exploitable without an additional vulnerability, though it is possible that we missed something. Any insights or feedback are welcome. Even though we weren’t able to exploit the bug, we enjoyed the research, and we hope that you enjoyed this writeup as well.

Preparation toward running Docker on ARM Mac: Building multi-arch images with Docker BuildX


Original text by Akihiro Suda

Today, Apple announced that they will ditch Intel and switch to ARM-based chips. This means that Docker for Mac will only be able to run ARM images natively.

So, will Docker no longer be useful when you want to run the same image on Mac and on x86_64 cloud instances? Nope. Docker will remain useful, as long as the image is built with support for multiple architectures.


If you are still building images only for x86_64, you should switch your images to multi-arch as soon as possible, ahead of the actual release of ARM Mac.

Note: ARM Mac will probably be able to run x86_64 images via an emulator, but the performance penalty will be significant.

Building multi-arch images with Docker BuildX

To build multi-architecture images, you need to use the Docker BuildX plugin (docker buildx build) instead of the well-known docker build command.

If you have already heard about the BuildKit mode (export DOCKER_BUILDKIT=1) of the built-in docker build command, you might be confused now, and you might be wondering whether BuildKit was deprecated. No worries, BuildKit is still alive. Actually, Docker BuildX is built on BuildKit technology as well, but it is significantly enhanced compared to the BuildKit mode of the built-in docker build. Aside from multi-arch builds, Docker BuildX also comes with a lot of innovative features, such as distributed builds on Kubernetes clusters. See my previous post for further information.

Docker BuildX is installed by default if you are using Docker for Mac on macOS. If you are using Docker for Linux on your own Linux VM (or baremetal Linux), you might need to install Docker BuildX separately. See https://github.com/docker/buildx#installing for the installation steps on Linux.

The three options for multi-arch build

Docker BuildX provides three options for building multi-arch images:

  1. QEMU mode (easy, slow)
  2. Cross-compilation mode (difficult, fast)
  3. Remote mode (easy, fast, but needs an extra machine)

The easiest option is the QEMU mode (Option 1), but it incurs a significant performance penalty. So, using cross-compilation (Option 2) is recommended if you don’t mind adapting Dockerfiles for cross-compilation toolchains. If you already have an ARM machine (it doesn’t need to be a Mac, of course), the third option is probably the best choice for you.

Option 1: QEMU mode (easy, slow)

This mode uses QEMU User Mode Emulation for running ARM toolchains in an x86_64 Linux environment, or the opposite after the actual release of ARM Mac: x86_64 toolchains on ARM.

The QEMU integration is already enabled by default on Docker for Mac. If you are using Docker for Linux, you need to enable the QEMU integration using the linuxkit/binfmt tool:

$ uname -sm
Linux x86_64
$ docker run --rm --privileged linuxkit/binfmt:v0.8
$ ls -1 /proc/sys/fs/binfmt_misc/qemu-*
/proc/sys/fs/binfmt_misc/qemu-aarch64
/proc/sys/fs/binfmt_misc/qemu-arm
/proc/sys/fs/binfmt_misc/qemu-ppc64le
/proc/sys/fs/binfmt_misc/qemu-riscv64
/proc/sys/fs/binfmt_misc/qemu-s390x

Then initialize Docker BuildX as follows. This step is required on Docker for Mac as well as on Docker for Linux.

$ docker buildx create --use --name=qemu
$ docker buildx inspect --bootstrap
...
Nodes:
Name: qemu0
Endpoint: unix:///var/run/docker.sock
Status: running
Platforms: linux/amd64, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6

Make sure the command above shows both linux/amd64 and linux/arm64 as the supported platforms.

As an example, let’s try dockerizing a “Hello, world” application (GNU Hello) for these architectures. Here is the Dockerfile:

FROM debian:10 AS build
RUN apt-get update && \
apt-get install -y curl gcc make
RUN curl https://ftp.gnu.org/gnu/hello/hello-2.10.tar.gz | tar xz
WORKDIR hello-2.10
RUN LDFLAGS=-static \
./configure && \
make && \
mkdir -p /out && \
mv hello /out

FROM scratch
COPY --from=build /out/hello /hello
ENTRYPOINT ["/hello"]

This Dockerfile doesn’t need to have anything specific to ARM, because QEMU can cover the architecture differences between x86_64 and ARM.

The image can be built and pushed to a registry using the following command. It took 498 seconds on my MacBook Pro 2016.

$ docker buildx build \
--push -t example.com/hello:latest \
--platform=linux/amd64,linux/arm64 .
[+] Building 498.0s (17/17) FINISHED

The image contains binaries for both x86_64 and ARM. The image can be executed on any machine with these architectures, without enabling QEMU.

$ docker run --rm example.com/hello:latest
Hello, world!
$ ssh me@my-arm-instance
[me@my-arm-instance]$ uname -sm
Linux aarch64
[me@my-arm-instance]$ docker run --rm example.com/hello:latest
Hello, world!

Option 2: Cross-compilation mode (difficult, fast)

The QEMU mode incurs a significant performance penalty for interpreting ARM instructions on x86_64 (or x86_64 instructions on ARM after the actual release of ARM Mac). If the QEMU mode is too slow for your application, consider using the cross-compilation mode instead.

To cross-compile the GNU Hello example, you need to modify the Dockerfile significantly:

FROM --platform=$BUILDPLATFORM debian:10 AS build
RUN apt-get update && \
apt-get install -y curl gcc make
RUN curl -o /cross.sh https://raw.githubusercontent.com/tonistiigi/binfmt/18c3d40ae2e3485e4de5b453e8460d6872b24d6b/binfmt/scripts/cross.sh && chmod +x /cross.sh
RUN curl https://ftp.gnu.org/gnu/hello/hello-2.10.tar.gz | tar xz
WORKDIR hello-2.10
ARG TARGETPLATFORM
RUN /cross.sh install gcc | sh
RUN LDFLAGS=-static \
./configure --host $(/cross.sh cross-prefix) && \
make && \
mkdir -p /out && \
mv hello /out

FROM scratch
COPY --from=build /out/hello /hello
ENTRYPOINT ["/hello"]

This Dockerfile contains two essential variables: BUILDPLATFORM and TARGETPLATFORM.

BUILDPLATFORM is pinned to the host platform ("linux/amd64"). TARGETPLATFORM is conditionally set to the target platforms ("linux/amd64" and "linux/arm64") and is used for setting up cross-compilation toolchains like aarch64-linux-gnu-gcc via the helper script cross.sh. This kind of helper script can be a mess depending on the application’s build scripts, especially in C applications.

For comparison, cross-compiling Go programs is relatively straightforward:

FROM --platform=$BUILDPLATFORM golang:1.14 AS build
ARG TARGETARCH
ENV GOARCH=$TARGETARCH
RUN go get github.com/golang/example/hello && \
go build -o /out/hello github.com/golang/example/hello

FROM scratch
COPY --from=build /out/hello /hello
ENTRYPOINT ["/hello"]

The image can be built and pushed as follows, without enabling QEMU. Cross-compiling GNU Hello took only 95.3 seconds.

$ docker buildx create --use --name=cross
$ docker buildx inspect --bootstrap cross
Nodes:
Name: cross0
Endpoint: unix:///var/run/docker.sock
Status: running
Platforms: linux/amd64, linux/386
$ docker buildx build \
--push -t example.com/hello:latest \
--platform=linux/amd64,linux/arm64 .
[+] Building 95.3s (16/16) FINISHED

Option 3: Remote mode (easy, fast, but needs an extra machine)

The third option is to use a real ARM machine (e.g. an Amazon EC2 A1 instance) for compiling ARM binaries. This option is as easy as Option 1 and yet as fast as Option 2.

To use this option, you need to have an ARM machine accessible via the docker CLI. The easiest way is to use an ssh://<USERNAME>@<HOST> URL as follows:

$ docker -H ssh://me@my-arm-instance info
...
Architecture: aarch64
...

Also, depending on the network configuration, you might want to add ControlPersist settings in ~/.ssh/config for a faster and more stable SSH connection.

ControlMaster auto
ControlPath ~/.ssh/control-%C
ControlPersist yes

This remote ARM instance can be registered to Docker BuildX using the following commands:

$ docker buildx create --name remote --use
$ docker buildx create --name remote \
--append ssh://me@my-arm-instance
$ docker buildx build \
--push -t example.com/hello:latest \
--platform=linux/amd64,linux/arm64 .
[+] Building 113.4s (22/22) FINISHED

The Dockerfile is the same as in Option 1.

FROM debian:10 AS build
RUN apt-get update && \
apt-get install -y curl gcc make
RUN curl https://ftp.gnu.org/gnu/hello/hello-2.10.tar.gz | tar xz
WORKDIR hello-2.10
RUN LDFLAGS=-static \
./configure && \
make && \
mkdir -p /out && \
mv hello /out

FROM scratch
COPY --from=build /out/hello /hello
ENTRYPOINT ["/hello"]

In this example, an Intel Mac is used as the local machine and an ARM machine as the remote. But after the release of an actual ARM Mac, you will do the opposite: use an ARM Mac as the local machine and an x86_64 machine as the remote.

Performance comparison

  1. QEMU mode: 498.0s
  2. Cross-compilation mode: 95.3s
  3. Remote mode: 113.4s

The QEMU mode is the easiest, but taking 498 seconds to compile “hello world” is too slow. The cross-compilation mode is the fastest, but modifying the Dockerfile can be a mess. I suggest using the remote mode whenever possible.

We’re hiring!

NTT is looking for engineers who work in open source communities like the Kubernetes and Docker projects. If you wish to work on such projects, please visit our recruitment page.

To learn more about NTT’s contributions to open source projects, please visit our Software Innovation Center page. We have a lot of maintainers and contributors in several open source projects.

Our offices are located in the downtown area of Tokyo (Tamachi, Shinagawa) and Musashino.