Stack Protector
Preface
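Last time I ran into a heap overflow (see the post About "malloc(): corrupted top size"), and to be honest I still have no good solution for it. Stack overflows, however, can be guarded against with GCC options. I have used them in earlier projects without studying them in detail, so this post takes a deep dive.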
Content
Basic
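First, what is the stack for? It matters a lot. For example, when doing a depth-first or breadth-first tree traversal on LeetCode there are two approaches: iterative and recursive. The iterative approach is tedious, because you have to keep track of which nodes you have visited and which ones come next. The recursive approach is much simpler: set the termination condition in the function and recurse down to the leaf nodes. Why is it so simple? This is the stack at work: it saves the intermediate results, the local variables, and the LR needed to return to the previous level. A corrupted stack is likely to crash the program, and it can also lead to serious security problems.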
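To make the contrast concrete, here is a minimal sketch in C that computes the depth of a binary tree both ways. The `struct node` type, the function names, and the `MAX_PENDING` bound are all made up for this illustration. The recursive version leans on the call stack, while the iterative version has to carry its own bookkeeping arrays.

```c
#include <stddef.h>

/* Hypothetical binary tree node used only for this illustration. */
struct node {
    struct node *left;
    struct node *right;
};

/* Recursive depth: the call stack remembers the locals and where to return. */
int depth_recursive(const struct node *n)
{
    if (n == NULL)
        return 0;
    int l = depth_recursive(n->left);
    int r = depth_recursive(n->right);
    return 1 + (l > r ? l : r);
}

#define MAX_PENDING 64 /* assumed bound on pending nodes for this sketch */

/* Iterative depth: we keep our own stack of (node, depth) pairs. */
int depth_iterative(const struct node *root)
{
    const struct node *pending[MAX_PENDING];
    int depth_of[MAX_PENDING];
    int top = 0, best = 0;

    if (root == NULL)
        return 0;
    pending[top] = root;
    depth_of[top] = 1;
    top++;

    while (top > 0) {
        top--;
        const struct node *n = pending[top];
        int d = depth_of[top];
        if (d > best)
            best = d;
        /* Push children; nodes beyond the bound are skipped in this sketch
         * rather than overflowing our own bookkeeping arrays. */
        if (n->left != NULL && top < MAX_PENDING) {
            pending[top] = n->left;
            depth_of[top] = d + 1;
            top++;
        }
        if (n->right != NULL && top < MAX_PENDING) {
            pending[top] = n->right;
            depth_of[top] = d + 1;
            top++;
        }
    }
    return best;
}
```

Note that the iterative version needs an explicit bounds check on its own array, which is exactly the kind of detail that goes wrong and ends up corrupting a stack.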
GCC Option of Stack protect
The following is copied from the gcc man page.
-fstack-protector
Emit extra code to check for buffer overflows, such as stack smashing attacks. This is done by adding a guard variable to functions with vulnerable objects. This includes functions that call "alloca", and functions with buffers larger than 8 bytes. The guards are initialized when a function is entered and then checked when the function exits. If a guard check fails, an error message is printed and the program exits.
-fstack-protector-all
Like -fstack-protector except that all functions are protected.
-fstack-protector-strong
Like -fstack-protector but includes additional functions to be protected --- those that have local array definitions, or have references to local frame addresses.
-fstack-protector-explicit
Like -fstack-protector but only protects those functions which have the "stack_protect" attribute.
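The GCC manual explains this clearly; the examples below check it against real compiler output. In addition, -fstack-protector-explicit restricts stack protection to specifically marked functions (although in my test environment it appeared to behave like -fstack-protector-all rather than like -fstack-protector). It is not discussed further here.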
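As a small aid when experimenting with these options, the sketch below reports which mode a file was compiled with. It relies on the predefined macros `__SSP__`, `__SSP_ALL__`, `__SSP_STRONG__`, and `__SSP_EXPLICIT__` that GCC sets for the respective options; other compilers may define only some of them. The `PROTECT_EXPLICITLY` macro and both function names are made up for this illustration, and the attribute is only applied when building with GCC.

```c
#include <stddef.h>
#include <string.h>

/* Reports which stack protector mode this file was built with.
 * Relies on GCC's predefined __SSP*__ macros. */
const char *ssp_mode(void)
{
#if defined(__SSP_ALL__)
    return "-fstack-protector-all";
#elif defined(__SSP_STRONG__)
    return "-fstack-protector-strong";
#elif defined(__SSP_EXPLICIT__)
    return "-fstack-protector-explicit";
#elif defined(__SSP__)
    return "-fstack-protector";
#else
    return "none";
#endif
}

/* Only GCC is assumed to know the "stack_protect" attribute here. */
#if defined(__GNUC__) && !defined(__clang__)
#define PROTECT_EXPLICITLY __attribute__((stack_protect))
#else
#define PROTECT_EXPLICITLY
#endif

/* A function marked for protection under -fstack-protector-explicit. */
PROTECT_EXPLICITLY
unsigned checksum4(const unsigned char *src)
{
    unsigned char copy[4];
    unsigned sum = 0;

    memcpy(copy, src, sizeof copy);
    for (size_t i = 0; i < sizeof copy; i++)
        sum += copy[i];
    return sum;
}
```

Printing `ssp_mode()` from a build is a quick way to confirm that the flag you intended actually reached the compiler.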
How does it protect stack
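The principle is simple. The stack grows from high addresses toward low addresses as it stores things such as local variables and the LR, whereas ordinary memory accesses in a program, such as walking through an array, proceed from low addresses to high addresses. Local variables live on the stack, so accessing one, especially an array, can go out of bounds. An out-of-bounds write in particular changes the contents of the stack and with it the program's control flow, with unpredictable results; at that point a crash is actually the best outcome. GCC's stack protector places a canary value between the function's LR and its saved variables. Before the function returns, it checks whether the canary has changed, and calls __stack_chk_fail if it has, which is how stack smashing is detected. A figure is quoted here to help with understanding.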
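The mechanism can be imitated in plain C. The following is only a model, not what the compiler emits: `demo_guard` and `demo_chk_fail` are hypothetical stand-ins for `__stack_chk_guard` and `__stack_chk_fail`, and the "frame" is an ordinary struct, so the overflowing write stays inside one object and the behavior is well defined.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for __stack_chk_guard and __stack_chk_fail. */
const uint64_t demo_guard = 0xdeadbeafa55a5aa5ULL;
int demo_fail_count = 0;

void demo_chk_fail(void)
{
    demo_fail_count++; /* the real handler reports the error and never returns */
}

/* Models one protected function that writes n bytes into a 16-byte buffer.
 * Returns 1 if the canary survived, 0 if the overflow was detected. */
int demo_protected_write(size_t n)
{
    struct {
        unsigned char buff[16];
        uint64_t canary; /* sits right after the buffer in this model */
    } frame;
    unsigned char *bytes = (unsigned char *)&frame;

    frame.canary = demo_guard;        /* prologue: copy the guard into the frame */

    if (n > sizeof frame)             /* keep the model inside its own object */
        n = sizeof frame;
    memset(bytes, 0x41, n);           /* body: may run past the end of buff */

    if (frame.canary != demo_guard) { /* epilogue: compare before returning */
        demo_chk_fail();
        return 0;
    }
    return 1;
}
```

Writing exactly 16 bytes leaves the canary intact, while a 17th byte already changes it and triggers the failure path.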
Example and analysis
The code below was compiled and run in the following environment:
Toolchain: GNU Toolchain for the A-profile Architecture 8.3-2019.02
SoC: ARMv8 Cortex-A series CPU
The examples are as follows:
#include <string.h>

void test_stackprotector(void)
{
    unsigned char buff[16]; /* an array > 8 bytes */
    memset(buff, 0xde, 16);
}
void test_stackprotector_strong(void)
{
    unsigned char buff[4]; /* an array < 8 bytes */
    memset(buff, 0xbf, 4);
}
void test_stackprotector_all(void)
{
    unsigned char tmp; /* local variable, not an array */
    tmp = 0xbd;
}
void test(void)
{
    test_stackprotector();
    test_stackprotector_strong();
    test_stackprotector_all();
}
Set __stack_chk_guard to 0xdeadbeafa55a5aa5, then build with -fstack-protector, -fstack-protector-strong, and -fstack-protector-all in turn, to see how the generated assembly differs and how the stack is laid out at run time.
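The post does not show how the guard was set, so the following is only a sketch of one way to do it on a target that does not get these symbols from a C library. `__stack_chk_guard` and `__stack_chk_fail` are the symbol names the listings in this post reference; everything else, including the choice to spin forever in the handler, is an assumption.

```c
#include <stdint.h>

/* Guard value that the compiler-inserted prologue and epilogue load.
 * The constant is the one used in this post; a real system would
 * normally prefer a value that an attacker cannot predict. */
uintptr_t __stack_chk_guard = (uintptr_t)0xdeadbeafa55a5aa5ULL;

/* Called by the inserted check when the canary no longer matches.
 * It must not return, because the frame it would return into is corrupt. */
__attribute__((noreturn)) void __stack_chk_fail(void)
{
    for (;;) {
        /* assumption: a bare-metal target would log the fault or reset here */
    }
}
```

A fixed, recognizable constant is convenient for spotting the canary in a stack dump, which is what the rest of this post relies on.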
-fstack-protector
000000000000a828 <test_stackprotector>:
a828: a9bd7bfd stp x29, x30, [sp, #-48]!
a82c: 910003fd mov x29, sp
a830: b0000040 adrp x0, 13000 <switch_status>
a834: 91012000 add x0, x0, #0x48
a838: f9400001 ldr x1, [x0]
a83c: f90017e1 str x1, [sp, #40]
a840: d2800001 mov x1, #0x0 // #0
a844: 910063e0 add x0, sp, #0x18
a848: d2800202 mov x2, #0x10 // #16
a84c: 52801bc1 mov w1, #0xde // #222
a850: 97ffea6e bl 5208 <memset>
a854: d503201f nop
a858: b0000040 adrp x0, 13000 <switch_status>
a85c: 91012000 add x0, x0, #0x48
a860: f94017e1 ldr x1, [sp, #40]
a864: f9400000 ldr x0, [x0]
a868: ca000020 eor x0, x1, x0
a86c: f100001f cmp x0, #0x0
a870: 54000040 b.eq a878 <test_stackprotector+0x50> // b.none
a874: 97ffffc2 bl a77c <__stack_chk_fail>
a878: a8c37bfd ldp x29, x30, [sp], #48
a87c: d65f03c0 ret
000000000000a880 <test_stackprotector_strong>:
a880: a9be7bfd stp x29, x30, [sp, #-32]!
a884: 910003fd mov x29, sp
a888: 910063e0 add x0, sp, #0x18
a88c: d2800082 mov x2, #0x4 // #4
a890: 528017e1 mov w1, #0xbf // #191
a894: 97ffea5d bl 5208 <memset>
a898: d503201f nop
a89c: a8c27bfd ldp x29, x30, [sp], #32
a8a0: d65f03c0 ret
000000000000a8a4 <test_stackprotector_all>:
a8a4: d10043ff sub sp, sp, #0x10
a8a8: 12800840 mov w0, #0xffffffbd // #-67
a8ac: 39003fe0 strb w0, [sp, #15]
a8b0: b0000040 adrp x0, 13000 <switch_status>
a8b4: 91170000 add x0, x0, #0x5c0
a8b8: 39403fe1 ldrb w1, [sp, #15]
a8bc: 39000001 strb w1, [x0]
a8c0: d503201f nop
a8c4: 910043ff add sp, sp, #0x10
a8c8: d65f03c0 ret
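In the assembly, only test_stackprotector ends with the canary check and the call to __stack_chk_fail, as expected.
One interesting thing shows up here. The first instruction of each function saves FP and LR, yet it does not store them at the top of the frame. Instead, sp is first moved down by an offset covering the whole frame and FP and LR are stored at the new sp, which differs from the figure referenced above. The reserved stack space above them goes to the local variables. This at least means that a buffer overflow does not clobber the current function's own FP and LR. What is worse, though, is that it corrupts data from earlier stack frames, so an unpredictable problem can surface at some arbitrary later point. Personally I do not think placing FP and LR this way helps much, and it may even make debugging harder. But that is a digression.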
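The frame layout can be written down as a struct, lowest address first, using the offsets read off the -fstack-protector listing of test_stackprotector above. This is only a model: the struct, its field names, and the `last_byte_hits` helper are invented for illustration, and the slot at offset 16 is assumed to be alignment padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of the 48-byte frame of test_stackprotector, lowest address first. */
struct frame_model {
    uint64_t saved_fp;      /* [sp, #0]  from stp x29, x30, [sp, #-48]! */
    uint64_t saved_lr;      /* [sp, #8]  */
    uint64_t unused;        /* [sp, #16] assumed alignment padding */
    unsigned char buff[16]; /* [sp, #24] from add x0, sp, #0x18 */
    uint64_t canary;        /* [sp, #40] from str x1, [sp, #40] */
};

_Static_assert(offsetof(struct frame_model, buff) == 24, "buff offset");
_Static_assert(offsetof(struct frame_model, canary) == 40, "canary offset");
_Static_assert(sizeof(struct frame_model) == 48, "frame size");

/* Where does the last byte of an n-byte write starting at buff land? */
enum hit { HIT_BUFF, HIT_CANARY, HIT_CALLER_FRAME };

enum hit last_byte_hits(size_t n)
{
    size_t end = offsetof(struct frame_model, buff) + n; /* one past the last byte */

    if (end <= offsetof(struct frame_model, canary))
        return HIT_BUFF;
    if (end <= sizeof(struct frame_model))
        return HIT_CANARY;
    return HIT_CALLER_FRAME; /* above this frame: the caller's saved FP and LR */
}
```

Because writes move toward higher addresses, `saved_fp` and `saved_lr` of the current function are never reached; the canary is the first thing hit, and the caller's frame comes after it.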
-fstack-protector-strong
000000000000acd4 <test_stackprotector>:
acd4: a9bd7bfd stp x29, x30, [sp, #-48]!
acd8: 910003fd mov x29, sp
acdc: b0000040 adrp x0, 13000 <__func__.5917+0x18>
ace0: 91212000 add x0, x0, #0x848
ace4: f9400001 ldr x1, [x0]
ace8: f90017e1 str x1, [sp, #40]
acec: d2800001 mov x1, #0x0 // #0
acf0: 910063e0 add x0, sp, #0x18
acf4: d2800202 mov x2, #0x10 // #16
acf8: 52801bc1 mov w1, #0xde // #222
acfc: 97ffe9f9 bl 54e0 <memset>
ad00: d503201f nop
ad04: b0000040 adrp x0, 13000 <__func__.5917+0x18>
ad08: 91212000 add x0, x0, #0x848
ad0c: f94017e1 ldr x1, [sp, #40]
ad10: f9400000 ldr x0, [x0]
ad14: ca000020 eor x0, x1, x0
ad18: f100001f cmp x0, #0x0
ad1c: 54000040 b.eq ad24 <test_stackprotector+0x50> // b.none
ad20: 97ffffc2 bl ac28 <__stack_chk_fail>
ad24: a8c37bfd ldp x29, x30, [sp], #48
ad28: d65f03c0 ret
000000000000ad2c <test_stackprotector_strong>:
ad2c: a9be7bfd stp x29, x30, [sp, #-32]!
ad30: 910003fd mov x29, sp
ad34: b0000040 adrp x0, 13000 <__func__.5917+0x18>
ad38: 91212000 add x0, x0, #0x848
ad3c: f9400001 ldr x1, [x0]
ad40: f9000fe1 str x1, [sp, #24]
ad44: d2800001 mov x1, #0x0 // #0
ad48: 910043e0 add x0, sp, #0x10
ad4c: d2800082 mov x2, #0x4 // #4
ad50: 528017e1 mov w1, #0xbf // #191
ad54: 97ffe9e3 bl 54e0 <memset>
ad58: d503201f nop
ad5c: b0000040 adrp x0, 13000 <__func__.5917+0x18>
ad60: 91212000 add x0, x0, #0x848
ad64: f9400fe1 ldr x1, [sp, #24]
ad68: f9400000 ldr x0, [x0]
ad6c: ca000020 eor x0, x1, x0
ad70: f100001f cmp x0, #0x0
ad74: 54000040 b.eq ad7c <test_stackprotector_strong+0x50> // b.none
ad78: 97ffffac bl ac28 <__stack_chk_fail>
ad7c: a8c27bfd ldp x29, x30, [sp], #32
ad80: d65f03c0 ret
000000000000ad84 <test_stackprotector_all>:
ad84: d10043ff sub sp, sp, #0x10
ad88: 12800840 mov w0, #0xffffffbd // #-67
ad8c: 39003fe0 strb w0, [sp, #15]
ad90: b0000040 adrp x0, 13000 <__func__.5917+0x18>
ad94: 91370000 add x0, x0, #0xdc0
ad98: 39403fe1 ldrb w1, [sp, #15]
ad9c: 39000001 strb w1, [x0]
ada0: d503201f nop
ada4: 910043ff add sp, sp, #0x10
ada8: d65f03c0 ret
Both test_stackprotector and test_stackprotector_strong now end with the canary check and the call to __stack_chk_fail, as expected.
-fstack-protector-all
000000000000c580 <test_stackprotector>:
c580: a9bd7bfd stp x29, x30, [sp, #-48]!
c584: 910003fd mov x29, sp
c588: d0000040 adrp x0, 16000 <switch_status>
c58c: 91012000 add x0, x0, #0x48
c590: f9400001 ldr x1, [x0]
c594: f90017e1 str x1, [sp, #40]
c598: d2800001 mov x1, #0x0 // #0
c59c: 910063e0 add x0, sp, #0x18
c5a0: d2800202 mov x2, #0x10 // #16
c5a4: 52801bc1 mov w1, #0xde // #222
c5a8: 97ffe67c bl 5f98 <memset>
c5ac: d503201f nop
c5b0: d0000040 adrp x0, 16000 <switch_status>
c5b4: 91012000 add x0, x0, #0x48
c5b8: f94017e1 ldr x1, [sp, #40]
c5bc: f9400000 ldr x0, [x0]
c5c0: ca000020 eor x0, x1, x0
c5c4: f100001f cmp x0, #0x0
c5c8: 54000040 b.eq c5d0 <test_stackprotector+0x50> // b.none
c5cc: 97ffffa8 bl c46c <__stack_chk_fail>
c5d0: a8c37bfd ldp x29, x30, [sp], #48
c5d4: d65f03c0 ret
000000000000c5d8 <test_stackprotector_strong>:
c5d8: a9be7bfd stp x29, x30, [sp, #-32]!
c5dc: 910003fd mov x29, sp
c5e0: d0000040 adrp x0, 16000 <switch_status>
c5e4: 91012000 add x0, x0, #0x48
c5e8: f9400001 ldr x1, [x0]
c5ec: f9000fe1 str x1, [sp, #24]
c5f0: d2800001 mov x1, #0x0 // #0
c5f4: 910043e0 add x0, sp, #0x10
c5f8: d2800082 mov x2, #0x4 // #4
c5fc: 528017e1 mov w1, #0xbf // #191
c600: 97ffe666 bl 5f98 <memset>
c604: d503201f nop
c608: d0000040 adrp x0, 16000 <switch_status>
c60c: 91012000 add x0, x0, #0x48
c610: f9400fe1 ldr x1, [sp, #24]
c614: f9400000 ldr x0, [x0]
c618: ca000020 eor x0, x1, x0
c61c: f100001f cmp x0, #0x0
c620: 54000040 b.eq c628 <test_stackprotector_strong+0x50> // b.none
c624: 97ffff92 bl c46c <__stack_chk_fail>
c628: a8c27bfd ldp x29, x30, [sp], #32
c62c: d65f03c0 ret
000000000000c630 <test_stackprotector_all>:
c630: a9be7bfd stp x29, x30, [sp, #-32]!
c634: 910003fd mov x29, sp
c638: d0000040 adrp x0, 16000 <switch_status>
c63c: 91012000 add x0, x0, #0x48
c640: f9400001 ldr x1, [x0]
c644: f9000fe1 str x1, [sp, #24]
c648: d2800001 mov x1, #0x0 // #0
c64c: 12800840 mov w0, #0xffffffbd // #-67
c650: 39005fe0 strb w0, [sp, #23]
c654: d0000040 adrp x0, 16000 <switch_status>
c658: 91170000 add x0, x0, #0x5c0
c65c: 39405fe1 ldrb w1, [sp, #23]
c660: 39000001 strb w1, [x0]
c664: d503201f nop
c668: d0000040 adrp x0, 16000 <switch_status>
c66c: 91012000 add x0, x0, #0x48
c670: f9400fe1 ldr x1, [sp, #24]
c674: f9400000 ldr x0, [x0]
c678: ca000020 eor x0, x1, x0
c67c: f100001f cmp x0, #0x0
c680: 54000040 b.eq c688 <test_stackprotector_all+0x58> // b.none
c684: 97ffff7a bl c46c <__stack_chk_fail>
c688: a8c27bfd ldp x29, x30, [sp], #32
c68c: d65f03c0 ret
Now all three functions end with the canary check and the call to __stack_chk_fail.
Stack Dump
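The three options above only decide which functions get stack protection; the stack layout is similar in each case, so only one case is analyzed here. Below is a dump of the stack (showing only the part where the current function is test_stackprotector; extra code was added to dump the stack, so it does not match the assembly above exactly):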
- no stack protector enabled
00 00 00 00 00 00 00 00
f0 fe 0b 00 00 00 00 00 // stored FP
04 66 00 00 00 00 00 00 // stored LR
de de de de de de de de // buf[16] and set to 0xde
de de de de de de de de
10 ff 0b 00 00 00 00 00 // FP of test()
5c 66 00 00 00 00 00 00 // LR of test()
00 00 00 00 00 00 00 00
- with -fstack-protector
00 00 00 00 00 00 00 00
f0 fe 0b 00 00 00 00 00 // stored FP
e0 66 00 00 00 00 00 00 // stored LR
00 00 00 00 00 00 00 00
7c 60 00 00 00 00 00 00
00 00 60 04 00 00 00 00
de de de de de de de de // buf[16] and set to 0xde
de de de de de de de de
a5 5a 5a a5 af be ad de // stack check guard
10 ff 0b 00 00 00 00 00 // FP of test()
38 67 00 00 00 00 00 00 // LR of test()
00 00 00 00 00 00 00 00
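Each row of the dump lists eight bytes starting at the lowest address, and the target is little-endian, so the bytes read backwards compared with the value. A small helper (the name `le64` is made up here) shows how the guard row maps back to the constant that was set:

```c
#include <stdint.h>

/* Assemble eight bytes, listed lowest address first, into the
 * 64-bit value a little-endian CPU would load from them. */
uint64_t le64(const unsigned char b[8])
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--)
        v = (v << 8) | b[i];
    return v;
}
```

Applied to the row `a5 5a 5a a5 af be ad de` this gives 0xdeadbeafa55a5aa5, the guard value, and the row `38 67 00 00 00 00 00 00` gives 0x6738, the LR of test().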
Conclusion
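As shown above, GCC's stack protector can catch some out-of-bounds accesses, but it also has limitations:
- It only works for overflowing writes; out-of-bounds reads raise no error;
- It only works when the out-of-bounds write reaches the canary; if the canary is unchanged, no error is raised;
- In space, it uses more stack to hold the canary;
- In time, the affected functions gain an extra canary check before returning, which also costs performance.
The first two points need other mechanisms to address. For the last two, there are two ways to reduce the cost:
- Use -fstack-protector-strong, which gives fairly thorough protection while using less stack and fewer checks than -fstack-protector-all;
- Enable the option during development and disable it for production.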
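The first two limitations can be made concrete with a small model. Everything here is hypothetical: `struct model_frame` is a simplified frame (buffer, canary, then the caller's saved LR), and the helper names are invented. The stored LR value 0x6738 is borrowed from the -fstack-protector dump in the Stack Dump section.

```c
#include <stdint.h>
#include <string.h>

#define MODEL_GUARD 0xdeadbeafa55a5aa5ULL /* guard value used in this post */

/* Simplified frame: buffer, canary, then the caller's saved LR above it. */
struct model_frame {
    unsigned char buff[16];
    uint64_t canary;
    uint64_t caller_lr;
};

void model_init(struct model_frame *f)
{
    memset(f->buff, 0, sizeof f->buff);
    f->canary = MODEL_GUARD;
    f->caller_lr = 0x6738;
}

/* The canary comparison is the only thing the protector checks. */
int model_check_passes(const struct model_frame *f)
{
    return f->canary == MODEL_GUARD;
}

/* Write one byte at an arbitrary offset from buff, as a bad index would. */
void model_poke(struct model_frame *f, size_t offset, unsigned char value)
{
    unsigned char *bytes = (unsigned char *)f;

    if (offset < sizeof *f) /* stay inside the model object */
        bytes[offset] = value;
}

/* Read one byte at an arbitrary offset from buff; reads change nothing. */
unsigned char model_peek(const struct model_frame *f, size_t offset)
{
    const unsigned char *bytes = (const unsigned char *)f;

    return offset < sizeof *f ? bytes[offset] : 0;
}
```

A call such as `model_poke(&f, 24, 0x41)` jumps over the canary and corrupts `caller_lr`, and `model_peek(&f, 16)` leaks a canary byte, yet `model_check_passes(&f)` still succeeds in both cases. Only a write that lands on offsets 16 to 23 makes the check fail.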
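As for exploiting the stack for an attack, it is relatively easy in bare-metal programs. On a system that uses an MMU, each application runs only in its own virtual address space and cannot access the kernel or other applications, so attacking the kernel or another application this way is not so easy.
Several popular open-source projects do apply stack protection, for example the Linux kernel and OP-TEE.
In short, the stack protector helps catch stack overflows early and makes debugging easier, so it is well worth using.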
Reference
Stack Smashing Protection
Local Variables on the Stack
Stack Introduction