An Introduction to Reverse Engineering for eBPF Bytecode

Table of Contents

  • Introduction
  • What is eBPF?
  • Let’s try the eBPF Challenge
  • Capturing the Flag
  • Conclusion

Introduction

Hi, I’m Chihiro from the Threat Detection and Response team! Since joining Mercari, Inc. last July, I have been focused on detection engineering for cloud environments, incident response, and development of our own SOAR (Security Orchestration Automation and Response) platform.

Mercari has an official system to support club activities (Bukatsu 部活), and there are various club activities that we can participate in. Lately, I have been taking part in CTF (Capture the Flag) cyber security competitions as part of our club activity. In this blog, I would like to explain how an eBPF program is structured and processed, using as an example a reverse engineering challenge related to eBPF that was interesting for me in a CTF that I have participated in recently.

What is eBPF?

eBPF is a technology designed to run in the Linux kernel space, it is used for packet filtering and tracing and can aid in investigating performance issues. eBPF is indirectly utilized by many projects under the Cloud Native Computing Foundation (CNCF) such as Cilium (a container network interface) and Falco (a container runtime security tool).

eBPF bytecode has a unique instruction set as it is executed sandboxed on a special virtual machine. As a result of this, eBPF bytecode is based on an instruction set that is different from the normal operating system. I will explain the instruction set below but for more information, please refer to the official eBPF Instruction Set documentation.

In general, any programming language has areas to store computed values. In the case of eBPF’s virtual machine there are small-scale memory areas called registers to store computed values. These include 10 general purpose registers:

  • R0: Stores a return value of a function, and exit value for an eBPF program
  • R1 – R5: Stores function arguments
  • R6 – R9: For general purpose usage
  • R10: Stores an address for stack frame

Let’s check out the instructions below. The eBPF instructions have fixed length, similar to the instructions in RISC architecture. An instruction is 64 bits in length. More specifically, the instruction consists of the parts shown below:

Opcode specifies the instructions to be performed. There are various instructions such as moving a value to a destination register, arithmetic operations, and conditional branches. The opcode field consists of smaller parts to represent these specific instructions. The value to be assigned is stored in the Immediate field.

Take a look at the example below. It shows the 64 bit value that represents the instructions to be performed. Please note that the value is written by little-endian so that the left part is lower byte.

b7 01 00 00 44 04 05 1c

The value b7 in the first section is Opcode, and it can be converted into binary as 1011 0111. The lower 3 bits are 111, which represent an instruction such as BPF_ALU64.

The first 4 bits, 1011 are defined as BPF_MOV on BPF_ALU64. This is an instruction to copy the data from the source register to the destination register. The remaining fourth bit represents if the source is used by a register or 32-bit immediate value. If this value is 0, an immediate value is used for the source.

For the second byte, we can represent the value as 0000 0001 in binary. The binary is split into two parts, which are the source register and the destination register. In this example, the destination register is 1 which is the R1 register, and the source register is the R0 register.

However, as we previously stated, the source is an immediate value instead of a register. Therefore, we can take in the instruction that is storing an immediate value, 0x1c050444 to the R1 register.

Let’s try the eBPF Challenge

This challenge is a beginner-friendly reverse engineering challenge from Backdoor CTF. Backdoor CTF has been held since 2013 according to CTFtime.

CTFs involve solving a variety of challenges related to computer science and cyber security. The goal of each challenge is to obtain a flag, which is typically formatted as FLAG{COOL_FLAG_NAME}. In terms of reverse engineering challenges, hidden flags are commonly obtained by analyzing the binary files.

Most of the reverse engineering challenges are about analyzing Linux or Windows executable files. However, sometimes we can see challenges involving other file formats. Therefore, what we should do first is to identify the file type by using the file command. Using this command we can reveal that the file is an eBPF program as follows:

root@6d1def7da3d3:~# file babyebpf.o
babyebpf.o: ELF 64-bit LSB relocatable, eBPF, version 1 (SYSV), not stripped

There are typically two approaches to solve this challenge:

  1. Run the eBPF program
  2. Understand the code within the file using reverse engineering techniques

We will take the latter approach this time for curiosity’s sake!

As we learned earlier, it is hard to analyze all the instructions in the binary file manually. Therefore, we usually use a conversion technique called disassembly, which is a technique to automate these conversions. Disassemble converts the machine code into a mnemonic: a mnemonic is a human-friendly text based instruction.

To disassemble the eBPF bytecode I recommend using the llvm-objdump command. With the -d option, we can disassemble the binary file. The command by default displays the original hexadecimal values along with the mnemonics, but the results would be too verbose. We can use the --no-show-raw-insn flag to hide the hexadecimal part and focus on just the mnemonic instructions.

root@6d1def7da3d3:~# llvm-objdump --no-show-raw-insn -d babyebpf.o

babyebpf.o: file format elf64-bpf

Disassembly of section tp/syscalls/sys_enter_execve:

0000000000000000 <detect_execve>:
       0:   r1 = 0x1c050444
       1:   *(u32 *)(r10 - 0x8) = r1
       2:   r1 = 0x954094701340819 ll
       4:   *(u64 *)(r10 - 0x10) = r1
       5:   r1 = 0x10523251403e5713 ll
       7:   *(u64 *)(r10 - 0x18) = r1
       8:   r1 = 0x43075a150e130d0b ll
      10:   *(u64 *)(r10 - 0x20) = r1
      11:   r1 = 0x0

0000000000000060 <LBB0_1>:
      12:   r2 = 0x0 ll
      14:   r2 += r1
      15:   r2 = *(u8 *)(r2 + 0x0)
      16:   r3 = r10
      17:   r3 += -0x20
      18:   r3 += r1
      19:   r4 = *(u8 *)(r3 + 0x0)
      20:   r2 ^= r4
      21:   *(u8 *)(r3 + 0x0) = r2
      22:   r1 += 0x1
      23:   if r1 == 0x1c goto +0x1 <LBB0_2>
      24:   goto -0xd <LBB0_1>

00000000000000c8 <LBB0_2>:
      25:   r3 = r10
      26:   r3 += -0x20
      27:   r1 = 0x1c ll
      29:   r2 = 0x4
      30:   call 0x6
      31:   r0 = 0x1
      32:   exit

I will briefly explain how to interpret the disassembled code. For example, to express an assignment of the value 10 to the r1 register, we use the notation r1 = 10. When assigning data to a location in memory we use a notation like *(u32*)(r10) = r1. In this example we take the value of the r10 register as an address, and the value of r1 is assigned to the memory pointed to by that address.

Let’s begin by looking at the detect_execve function.

0000000000000000 <detect_execve>:
       0:   r1 = 0x1c050444
       1:   *(u32 *)(r10 - 0x8) = r1
       2:   r1 = 0x954094701340819 ll
       4:   *(u64 *)(r10 - 0x10) = r1
       5:   r1 = 0x10523251403e5713 ll
       7:   *(u64 *)(r10 - 0x18) = r1
       8:   r1 = 0x43075a150e130d0b ll
      10:   *(u64 *)(r10 - 0x20) = r1
      11:   r1 = 0x0

This code assigns 0x1c050444, which is 470090820 in decimal, as an immediate value to the r1 register, then copies the value to the address pointed to by r10-8 in memory. Please note that the r10 register points to the address of the stack frame. Therefore, it means the code assigns the value to a local variable. We can also see similar code next line onwards. Finally, the r1 register is set to 0. The below figure shows the stack layout after these instructions were executed.

Let’s continue to read the disassembled code, and check the bottom of this function first.

0000000000000060 <LBB0_1>:
      12:   r2 = 0x0 ll
      14:   r2 += r1
      15:   r2 = *(u8 *)(r2 + 0x0)
      16:   r3 = r10
      17:   r3 += -0x20
      18:   r3 += r1
      19:   r4 = *(u8 *)(r3 + 0x0)
      20:   r2 ^= r4
      21:   *(u8 *)(r3 + 0x0) = r2
      22:   r1 += 0x1
      23:   if r1 == 0x1c goto +0x1 <LBB0_2>
      24:   goto -0xd <LBB0_1>

We can see an if statement, where the instruction compares the r1 register and 0x1c (28 in decimal). If they are equal, the program jumps to LBB0_2 label. If not, it goes back to the first line of the LBB0_1 label. Through this analysis we can reach the conclusion that these instructions are equivalent to loop statements of a high-level programming language. In fact, before the if statement, we can see the value stored in register r1 is incremented by 1. This shows that the program is using the r1 register as a loop counter.

Let’s read the code again, keeping in mind that there is a loop statement. First, the code assigns 0 to the r2 register, then adds the value of the r1 register to the r2 register. In the beginning, the r1 register is set to 0 as explained in the detect_execve function, so the r2 register will be 0 if it is added. Next, the code stores the data by dereferencing the address in the r2 register.

Let’s take a look at the r3 register. It is copied from the r10 register, which is the stack address. Then 32 is subtracted from the r3 register. 32 is equal to the exact offset from the address of the stack frame to the address of the local variable. Then, the address is added to the value in the r1 register, dereferencing the pointer, storing the data in the local variable to the r4 register. After that, the values stored in r2 and r4 registers are XOR’ed and the result is stored in the r2 register. Finally, the data pointing by the r3 register, which is the data of the local variable, is overwritten by the result.

After all of the above, the value of the r1 register is incremented by 1, and the if statement of the loop is evaluated. Therefore, we can conclude that the code block performs an XOR operation with two data points per byte, then overwrites the data in the local variable. Also, the r1 register is compared to 28, so we can guess that length of the data is expected to be 28 bytes.

But exactly what kind of data is stored in the r2 register? Since we cannot find out the data using only the disassembled code, we continue to investigate the binary file from a different perspective. Generally, binary files embed in themselves interesting strings. We can try to extract those strings with the strings command from GNU Binary Utilities.

root@6d1def7da3d3:~# strings -tx -a babyebpf.o
     5c G   T   {
    148 marinkitagawamarinkitagawama
    16e W>@Q2R
    179 G   T   D
    2a5 .text
    2ab detect_execve.____fmt
    2c1 _version
    2ca .llvm_addrsig
    2d8 detect_execve
    2e6 .reltp/syscalls/sys_enter_execve
    307 _license
    310 baby_ebpf.c
    31c .strtab
    324 .symtab
    32c .rodata
    334 LBB0_2
    33b LBB0_1
    342 .rodata.str1.1

According to the result, we found that there are some interesting strings. Based on the result about data length, marinkitagawamarinkitagawama is the most interesting string of all the strings because the data length is exactly 28 bytes:

root@6d1def7da3d3:~# echo -n marinkitagawamarinkitagawama | wc -c
28

Lastly, we will read the instructions on the LBB0_2 label.

00000000000000c8 <LBB0_2>:
      25:   r3 = r10
      26:   r3 += -0x20
      27:   r1 = 0x1c ll
      29:   r2 = 0x4
      30:   call 0x6
      31:   r0 = 0x1
      32:   exit

In this code block, we should pay attention to the call instruction. The call instruction can execute functions that are local to the eBPF program, but it can also execute functions specified by an integer argument. The mapping between those functions and integer values is defined in the Linux source code; 6 looks like the trace_printk function. Therefore, we get the idea that this code intends to print something. The code also stores the address of the local variable to the r3 register as the third function argument. This allows us to guess that this eBPF program is going to display the XOR encoded or decoded data pointed by the value in the r3 register.

Capturing the Flag

Let’s create a script to emulate the decoding algorithm in order to solve this challenge by what we’ve learned. I usually use the Ruby programming language to solve CTF challenges and below is my solution for this challenge. Any programming language is ok, so feel free to write the script in a programming language of your choice.

#!/usr/bin/env ruby
encoded = [
    0x43075a150e130d0b,
    0x10523251403e5713,
    0x954094701340819,
    0x1c050444
].pack('Q*').chars
key = "marinkitagawamarinkitagawama".chars

key.zip(encoded) do |k, e|
  print (k.ord ^ e.ord).chr
end

The above Ruby script displays the data conducted XOR operation with the value of the local variable assigned and the embedded value in the binary file per byte.

By running this script, we could extract the following flag.

root@6d1def7da3d3:~# ruby solve.rb
flag{1n7r0_70_3bpf_h3h3h3eh}

Conclusion

In this blog, I explained the internals of eBPF along with a reverse engineering challenge. I’m sure there are many people using eBPF indirectly without really realizing it. However, there aren’t many who know about the internal details in depth. While the opportunity to use this knowledge may not come up often, it is still a great skill set to have under your belt when you need to debug and investigate problems at a lower level.

Thank you for reading! I hope this blog will help you.

  • X
  • Facebook
  • linkedin
  • このエントリーをはてなブックマークに追加