
Flare-On 8 2021 Challenge 9 Solution - 09_evil

Hosted by FireEye's FLARE team from 10 September - 22 October

15 min read · drome

Thanks to drome for sharing his knowledge and skills! He completed all 10 challenges, and this series of writeups is by him :)

Official Challenge Site: https://flare-on.com/
Official Challenge Announcement: https://www.fireeye.com/blog/threat-research/2021/08/announcing-the-eighth-annual-flare-on-challenge.html
Official Solutions: https://www.mandiant.com/resources/flare-on-8-challenge-solutions
Official Challenge Binaries: http://flare-on.com/files/Flare-On8_Challenges.zip

09_evil

Mandiant’s unofficial motto is “find evil and solve crime”. Well here is evil but forget crime, solve challenge. Listen kid, RFCs are for fools, but for you we’ll make an exception :)
The challenge has 3 false flags: !t_$uRe_W0u1d_B3_n1ce_huh!@flare-on.com 1s_tHi$_mY_f1aG@flare-on.com N3ver_G0nNa_g1ve_y0u_Up@flare-on.com.
7zip password: flare

There is a single file called evil.exe.

arch     x86
baddr    0x400000
binsz    2964480
bintype  pe
bits     32
canary   false
retguard false
class    PE32
cmp.csum 0x002d7f23
compiled Mon Jul 19 12:22:26 2021
crypto   false
endian   little
havecode true
hdr.csum 0x00000000
laddr    0x0
lang     c
linenum  false
lsyms    false
machine  i386
nx       true
os       windows
overlay  false
cc       cdecl
pic      true
relocs   false
signed   false
sanitize false
static   false
stripped false
subsys   Windows CUI
va       true

Dynamic Analysis

Disassembling it, we see that IDA can’t even define the main function because of bad opcodes. We run it dynamically, and see that it calls 0x623D0, which does a null pointer access at 0x62460. This causes an exception in our debugger, and if we pass it to the process, we get a divide by zero exception at 0x624B0.

SEH handlers

To understand where we can find the exception handlers, we refer to Practical Malware Analysis (PMA)’s Chapter 15 - Obscuring Flow Control - Missing Structured Exception Handlers (page 344).

To find the SEH chain, the OS examines the FS segment register. This register contains a segment selector that is used to gain access to the Thread Environment Block (TEB). The first structure within the TEB is the Thread Information Block (TIB). The first element of the TIB (and consequently the first bytes of the TEB) is a pointer to the SEH chain. The SEH chain is a simple linked list of 8-byte data structures called EXCEPTION_REGISTRATION records.

This explains the following assembly code at the start of main, which adds an additional SEH entry at the top of the SEH chain pointing to sub_A38E4.

.text:00066450 _main:
.text:00066450                 push    ebp
.text:00066451                 mov     ebp, esp
.text:00066453                 push    0FFFFFFFFh
.text:00066455                 push    offset sub_A38E4
.text:0006645A                 mov     eax, large fs:0
.text:00066460                 push    eax
.text:00066461                 mov     large fs:0, esp
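To see how those pushes form the record PMA describes, here is a small Python model of the stack after this prologue runs. It is only an illustration: the stack is a dict of 32-bit words, and the initial ESP, saved EBP, and old fs:0 values are made up. After mov large fs:0, esp, fs:0 points at an 8-byte EXCEPTION_REGISTRATION record whose first DWORD (Next) is the previous chain head and whose second DWORD is the handler; the pushed 0FFFFFFFFh sits just above it (in MSVC-generated frames this is typically a state or try-level value).

```python
# Illustrative model of the SEH prologue in _main; all addresses are made up.
INITIAL_ESP = 0x0019FF00
SAVED_EBP = 0x0019FF40
OLD_CHAIN_HEAD = 0x0019FF70   # previous value of fs:[0]
HANDLER = 0x000A38E4          # offset sub_A38E4

stack = {}                    # address -> 32-bit value

def push(esp, value):
    """Emulate a 32-bit push: decrement ESP by 4, then store the value."""
    esp -= 4
    stack[esp] = value & 0xFFFFFFFF
    return esp

esp = INITIAL_ESP
fs0 = OLD_CHAIN_HEAD
esp = push(esp, SAVED_EBP)    # push ebp (mov ebp, esp does not change ESP)
esp = push(esp, 0xFFFFFFFF)   # push 0FFFFFFFFh
esp = push(esp, HANDLER)      # push offset sub_A38E4
eax = fs0                     # mov eax, large fs:0
esp = push(esp, eax)          # push eax
fs0 = esp                     # mov large fs:0, esp

# EXCEPTION_REGISTRATION record at the new chain head: {Next, Handler}
next_record, handler = stack[fs0], stack[fs0 + 4]
print(f"fs:0 = {fs0:#x}: Next = {next_record:#x}, Handler = {handler:#x}")
```

Walking the chain from here is just a matter of following the Next pointers until the end-of-chain marker 0xFFFFFFFF.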

In the function called by main, at 0x623D0, we see that it does something very similar, just with the handler sub_A3740.

However, when we put breakpoints on those SEH exception handlers and run, the breakpoints are never hit. The only time control returns to our debugger is when the running program hits another exception.

We put a breakpoint at ntdll_KiUserExceptionDispatcher and hit it once we pass control to the application after the first exception.

We cannot step through the exception handling into user mode; the debugger only regains control when the program throws its next exception, as the following debugger output shows:

624B0: Integer divide by zero (exc.code c0000094, tid 6032)
624E6: Integer divide by zero (exc.code c0000094, tid 6032)
6255F: Integer divide by zero (exc.code c0000094, tid 6032)
625AC: Integer divide by zero (exc.code c0000094, tid 6032)
62F70: Priveleged instruction (exc.code c0000096, tid 6032)

__scrt_common_main_seh

__scrt_common_main_seh is the second function called in our entry point function.

By luck, we noticed that sub_654B0 gets called a lot during exceptions. We put a breakpoint there and observe that it gets called at the very start of program execution, even before our main breakpoint is hit.

sub_4820F0 and sub_482130 call sub_654B0 in __scrt_common_main_seh. Observing its behavior, we find that it always returns the address of some API function. Analyzing it statically, we see that sub_654B0 is an API hash resolver using this hash function:

def hash_name(function_name):
    # function_name is a bytes object (e.g. b'socket'), so each b is an int
    hash_value = 64
    for b in function_name:
        hash_value = (b - 0x45523F21 * hash_value) & 0xFFFFFFFF
    return hash_value
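As a quick check of the hash function above, hashing the bytes of socket reproduces 0xd5af7bf3, the value we search the binary for later on. This sketch assumes names are hashed as raw ASCII bytes without a terminating NUL (which is what makes the socket value line up); the short candidate list and the lookup dictionary are only for illustration, and the full version built from DLL export tables follows.

```python
def hash_name(function_name: bytes) -> int:
    """API-name hash used by the resolver (32-bit wraparound arithmetic)."""
    hash_value = 64
    for b in function_name:  # iterating over bytes yields ints
        hash_value = (b - 0x45523F21 * hash_value) & 0xFFFFFFFF
    return hash_value

# Hypothetical candidate names; the real list comes from DLL export tables.
candidates = [b"socket", b"bind", b"recvfrom", b"sendto", b"WSAStartup"]
lookup = {hash_name(name): name for name in candidates}

print(hex(hash_name(b"socket")))   # 0xd5af7bf3
print(lookup.get(0xD5AF7BF3))      # b'socket'
```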

We can repurpose the hash checker we created in Challenge 7 to resolve these API names statically:

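import pefile

def hash_name(function_name):
    # function_name is bytes (pefile export names are bytes)
    hash_value = 64
    for b in function_name:
        hash_value = (b - 0x45523F21 * hash_value) & 0xFFFFFFFF
    return hash_value

entry_export = [pefile.DIRECTORY_ENTRY["IMAGE_DIRECTORY_ENTRY_EXPORT"]]
exports = []
libraries = [
    'ntdll',
    'kernel32',
    'gdi32',
    'user32',
    'comctl32',
    'comdlg32',
    'ws2_32',
    'advapi32',
    'netapi32',
    'ole32',
    'winmm',
    'imm32',
    'bcrypt',
    'wmi',
]

for library in libraries:
    # fast_load defers data directories so only the export table is parsed.
    # evil.exe is 32-bit (its DLLs would come from SysWOW64 on 64-bit Windows),
    # but the export names we hash should be the same in System32.
    pe = pefile.PE(f'C:/windows/system32/{library}.dll', fast_load=True)
    pe.parse_data_directories(directories=entry_export)
    exports += [e.name for e in pe.DIRECTORY_ENTRY_EXPORT.symbols if e.name]
    print(library, len(exports))  # progress: running total of collected names

# Map each hash back to its API name, e.g. hash_to_name[0xD5AF7BF3] == b'socket'
hash_to_name = {hash_name(name): name for name in exports}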

AddVectoredExceptionHandler is resolved by hash in sub_482130, then called in sub_482150, to register the function Handler (at 0x486AD0) as a VEH handler. Both functions are called in succession by 0x4B1ED6 in __scrt_common_main_seh.

This explains why our SEH handlers were never called: __scrt_common_main_seh registers a VEH handler, and because vectored handlers run before the SEH chain is consulted, it handles each exception and transfers control back to the program before the SEH handlers are ever reached.

VEH handler

Handler uses VirtualProtect to change the page permissions to RWX, overwrites the two bytes at EIP + 3 with FF D0 (where EIP is the address of the faulting instruction), sets the EIP in the exception context to that address, and returns EXCEPTION_CONTINUE_EXECUTION, so the program resumes execution at the newly patched instruction.

FF D0 disassembles to call eax. The value of eax comes from the hash function resolver, using the value of ecx at the time of the exception as the hash. For example, before the first exception we have the following code:

.text:0048243E C7 45 E8+                mov     dword ptr [ebp-18h], 66FFF672h
...
.text:0048245B 8B 4D E8                 mov     ecx, [ebp-18h]

This corresponds to GetSystemTime.
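The effect of Handler on one of these sequences can be modeled in a few lines of Python. This is a simplified sketch, not the real code: the bytearray stands in for the RWX page, the base address is made up, and fake_resolver is a hypothetical stand-in for the hash resolver that maps the GetSystemTime hash to an invented address. It uses the xor eax, eax; mov eax, [eax] sequence, where the faulting instruction is the 2-byte mov at offset 2, so EIP + 3 lands on the last two of the three junk bytes.

```python
CALL_EAX = b"\xff\xd0"

def veh_handler(code: bytearray, base: int, eip: int, ecx: int, resolve_api):
    """Simplified model: write 'call eax' at EIP + 3, resolve the API from
    the hash in ECX, and return the (new_eip, eax) the thread resumes with."""
    off = eip - base + 3
    code[off:off + 2] = CALL_EAX      # real handler: VirtualProtect(RWX), then write
    eax = resolve_api(ecx)            # hash resolver, hash taken from ECX
    return eip + 3, eax               # EXCEPTION_CONTINUE_EXECUTION at EIP + 3

# Hypothetical page: xor eax, eax; mov eax, [eax]; junk 74 03 75
base = 0x401000
code = bytearray(b"\x33\xc0\x8b\x00\x74\x03\x75")
faulting_eip = base + 2               # mov eax, [eax] faults because eax == 0
fake_resolver = {0x66FFF672: 0x77001234}.get   # GetSystemTime -> made-up address

new_eip, eax = veh_handler(code, base, faulting_eip, 0x66FFF672, fake_resolver)
print(code.hex(" "), hex(new_eip), hex(eax))
```

After the handler runs, the bytes at the new EIP are FF D0, so execution continues with call eax into the resolved API.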

Control flow obfuscation

Knowing this, we can patch the program so that IDA can define its functions properly. After every exception-triggering instruction there are usually 3 bytes of junk code that confuse IDA.

3 bytes of bad code in IDA

List of exception-triggering instructions and the 3 bytes that follow them:

  • 33 C0 8B 00 / 74 03 75: xor eax, eax; mov eax, [eax]
  • 33 C0 8B 00 / EB FF E8: xor eax, eax; mov eax, [eax]
  • 33 C0 F7 F0 / EB 00 EB: xor eax, eax; div eax
  • 33 C0 F7 F0 / E8 FF D2: xor eax, eax; div eax
  • 33 C0 F7 F0 / 5B 5D C3: xor eax, eax; div eax
  • 33 FF F7 F7 / 33 C0 74: xor edi, edi; div edi
  • 33 F6 F7 F6 / E8 FF D2: xor esi, esi; div esi

Initially we simply patched the last 3 bytes to 90 FF D0 (nop; call eax), but because IDA then saw a call eax with no resolver call before it, still preceded by the exception-causing instruction, the output looked quite bad: decompilation did not work at all, and we had to do a lot of manual analysis of the assembly in graph view and even in plain text view.

Then we realized that each error-producing sequence is 7 bytes, just enough for a 5-byte call to any function we want followed by the 2-byte call eax. Better yet, the surrounding code already sets up ecx and edx the way the API hash resolving function expects its arguments, so if we patched the opcodes accordingly, IDA could even analyze the function stack frames properly.
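The arithmetic behind the 7-byte patch: E8 followed by a 32-bit displacement measured from the end of the 5-byte call (addr + 5), stored little-endian, then FF D0 for call eax. A standalone helper, with a hypothetical call site, shows the same computation the IDAPython script below performs:

```python
import struct

def encode_patch(addr: int, target: int) -> bytes:
    """Build 'call target; call eax' (7 bytes) to be written at addr."""
    rel32 = (target - (addr + 5)) & 0xFFFFFFFF   # two's complement for backward calls
    return b"\xe8" + struct.pack("<I", rel32) + b"\xff\xd0"

API_HASH_FUNCTION = 0x4854B0
patch = encode_patch(0x48245E, API_HASH_FUNCTION)   # hypothetical call site
print(patch.hex(" "))   # e8 4d 30 00 00 ff d0
```

Masking with 0xFFFFFFFF keeps negative displacements (calls to a lower address) in their 32-bit two's-complement form, which is what the CPU expects.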

We wrote an IDAPython script to patch all of these exception-triggering sequences, referring extensively to IDAPython's gruesome documentation at https://hex-rays.com/products/ida/support/idapython_docs/ and https://hex-rays.com/products/ida/support/ida74_idapython_no_bc695_porting_guide.shtml. The base address of the program is now 0x480000 (so, for example, sub_623D0 from the dynamic analysis is sub_4823D0 here, and sub_654B0 is sub_4854B0).

import ida_bytes
import ida_search
import idaapi

TEXT_START = 0x481000
TEXT_END = 0x4C5000

API_HASH_FUNCTION = 0x4854B0  # the API hash resolver (sub_4854B0)

# Exception-triggering prefixes: xor reg, reg followed by mov eax, [eax] or div reg
error_opcodes = [
    "33 C0 8B 00",
    "33 C0 F7 F0",
    "33 FF F7 F7",
    "33 F6 F7 F6",
]

# Junk bytes seen after the prefixes (the last entry is our earlier 90 FF D0 patch)
known_next_3_bytes = [
    b"\x74\x03\x75",
    b"\xeb\xff\xe8",
    b"\xeb\x00\xeb",
    b"\xe8\xff\xd2",
    b"\x33\xc0\x74",
    b"\x5b\x5d\xc3",
    b"\x90\xFF\xD0",
]

for error_opcode in error_opcodes:
    addr = ida_search.find_binary(TEXT_START, TEXT_END, error_opcode, 16, ida_search.SEARCH_DOWN)
    while addr != idaapi.BADADDR:
        print(hex(addr))
        next_bytes = ida_bytes.get_bytes(addr + 4, 3)
        if next_bytes in known_next_3_bytes:
            # call rel32 is relative to the end of the 5-byte call instruction
            call_target = ((API_HASH_FUNCTION - (addr + 5)) & 0xFFFFFFFF).to_bytes(4, byteorder='little')
            # 7 bytes: call API_HASH_FUNCTION; call eax
            ida_bytes.patch_bytes(addr, b'\xe8' + call_target + b'\xff\xd0')
        addr = ida_search.find_binary(addr + 7, TEXT_END, error_opcode, 16, ida_search.SEARCH_DOWN)

We tried to do some manual fixing to correct the function frames, but the graph view and decompiler gave issues, especially with the continuity of the main function after the check for argc. To fix this and get a nice looking IDB flow, we created a copy of evil.exe, applied the patches to it, then reanalyzed it using IDA on the freshly patched copy. This gave us a much nicer looking IDB that allowed us to do static analysis nearly entirely in the decompiler view.

The patched binary also runs somewhat, but running from the start gives us errors because the API hash resolving function fails to resolve for some reason, so most of the subsequent analysis was done statically.

Static Analysis

With the obfuscation method understood, we can move on to analyzing the sample functionality.

main

main calls new several times to allocate a number of structures, then calls sub_4823D0, which performs several anti-debugging tricks.

sub_4823D0 changes the first byte of DbgBreakPoint to 0xC3 (opcode for ret), and the first 14 bytes of DbgUiRemoteBreakin to 6a 00 68 ff ff ff ff B8 <terminate process address> FF D0, which essentially calls TerminateProcess. It also does other things but we didn’t analyze it deeply.
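Decoding those 14 bytes: 6A 00 is push 0 (the exit code), 68 FF FF FF FF is push -1 (the pseudo-handle that GetCurrentProcess returns), B8 imm32 is mov eax, TerminateProcess, and FF D0 is call eax, so the stub amounts to TerminateProcess((HANDLE)-1, 0). A debugger attaching to the running process typically starts a thread at DbgUiRemoteBreakin, so attaching kills the process, while the ret written over DbgBreakPoint keeps it from breaking into the debugger. The builder below only reassembles the stub; the TerminateProcess address is a placeholder, since the real one is resolved at runtime.

```python
import struct

def build_breakin_stub(terminate_process_addr: int) -> bytes:
    """Reassemble the 14-byte overwrite of DbgUiRemoteBreakin."""
    stub = (
        b"\x6a\x00"                                            # push 0          ; uExitCode
        + b"\x68\xff\xff\xff\xff"                              # push 0FFFFFFFFh ; hProcess = (HANDLE)-1
        + b"\xb8" + struct.pack("<I", terminate_process_addr)  # mov eax, TerminateProcess
        + b"\xff\xd0"                                          # call eax
    )
    assert len(stub) == 14
    return stub

stub = build_breakin_stub(0x76A1B2C0)   # placeholder address
print(stub.hex(" "))
```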

main then calls CreateMutexA, puts some values into the 0x1CC-byte object, uses CreateThread to start threads running sub_482E70, sub_482D50, sub_484310, and sub_484680, and then calls WaitForMultipleObjects.

Afterwards main calls closesocket and WSACleanup which strongly suggests that there is network functionality in the executable.

main also references 0x751698, which contains fake flag 1, 1s_tHi$_mY_f1aG@flare-on.com, initialized in sub_481000 which was called as part of the initialization routines in __scrt_common_main_seh.

Jumping around

We know that network functions such as socket, connect, send, and recv are likely called, probably resolved through the hash API resolver, so we can search the binary for their function-name hashes to see which code calls them.

For example, the hash for socket is 0xd5af7bf3, and searching for it gives us 2 hits in sub_483A70. connect, send, and recv have no hits. They might be using UDP so we try recvfrom which gets us a hit in sub_484310, and sendto which gets us a hit in sub_483D40, confirming our suspicions.
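Because the resolver hashes are embedded as 32-bit immediates (for example mov dword ptr [ebp-18h], 66FFF672h earlier), searching for a hash amounts to searching for its little-endian encoding. A minimal offline sketch, assuming the hash is stored as a plain little-endian DWORD; the sample buffer is made up and stands in for the bytes of evil.exe, and the hits are file offsets rather than virtual addresses.

```python
import struct

def hash_name(function_name: bytes) -> int:
    hash_value = 64
    for b in function_name:
        hash_value = (b - 0x45523F21 * hash_value) & 0xFFFFFFFF
    return hash_value

def find_hash_refs(data: bytes, name: bytes) -> list:
    """Return every offset where the hash of name appears as a little-endian DWORD."""
    needle = struct.pack("<I", hash_name(name))
    hits, start = [], 0
    while (pos := data.find(needle, start)) != -1:
        hits.append(pos)
        start = pos + 1
    return hits

# Made-up buffer: mov dword ptr [ebp-18h], imm32 (C7 45 E8 imm32) carrying the socket hash
sample = b"\x90" * 8 + b"\xc7\x45\xe8" + struct.pack("<I", 0xD5AF7BF3) + b"\xc3"
print(find_hash_refs(sample, b"socket"))   # [11]
```

On the real file this would be called as find_hash_refs(open("evil.exe", "rb").read(), b"recvfrom").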

Following cross-references and combining this with what we know about the threads that main creates, we have the following information about those threads:

  • sub_482E70, uses the 0x18 byte object used by the anti-debug function, creates more threads and is a pain to reverse, probably not important
  • sub_482D50, also uses 0x18 byte anti-debug object, also probably unimportant

After that, main calls sub_483A70, which takes argv[1] as the IP address, makes the network startup calls (WSAStartup, socket, and bind), and stores the results in the 0x1CC-byte object. It is not a real object, since it has no methods; it is probably just a struct, so we call it network_struct, keep track of it, and note the 2 threads that use it:

  • sub_484310, which uses the network_struct and calls recvfrom, so this function is probably important and should be more deeply analyzed. We will call this the recv function
  • sub_484680, which is huge, uses the mutex and semaphore in the network_struct, and then calls the function sub_483D40 which calls sendto, and sub_4867A0 which calls CryptDecrypt. This function looks very important. We will call this the crypt_and_sendto function

Traffic format

The recv function calls recvfrom into a 1500-byte buffer, then checks the following:

  • buf[9] == 0x11
  • There is some header at &buf[4 * (buf[0] & 0xF)], where
    • *(word*)(header+4) != 0
    • ntohs(*(word*)(header+4)) is the size of this 8-byte header plus the data after it
    • *(word*)(header+2) == '\x11\x04'
  • buf[6] < 0 (as a signed byte)
  • After the headers comes the payload, where
    • *(dword*)&payload[0] is the “mode”, explained in greater detail below in Modes
    • *(dword*)&payload[4] is the length
    • &payload[8] is the data that gets checked in crypt_and_sendto
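These offsets line up with standard IPv4 and UDP header fields, which suggests the received data starts with the IP header (as it would on a raw socket): buf[0] & 0xF is the IHL (header length in DWORDs), buf[9] is the protocol (0x11 = 17 = UDP), and buf[6] < 0 means the top bit of byte 6 is set, which is the reserved flag bit of the IPv4 header, the "evil bit" of the April Fools' RFC 3514 (fitting the challenge's name and its RFC joke). In the UDP header, offset 2 is the destination port (0x1104, i.e. 4356) and offset 4 is the UDP length. The parser below is our sketch of the checks under that interpretation; it assumes the mode and length DWORDs are little-endian, and the example packet is made up.

```python
import struct

def parse_evil_packet(buf: bytes):
    """Apply the recv thread's checks; return (mode, length, data) or None."""
    if len(buf) < 20 or buf[9] != 0x11:         # IPv4 protocol field: UDP (17)
        return None
    if not buf[6] & 0x80:                        # reserved ("evil") bit must be set
        return None
    ihl = 4 * (buf[0] & 0x0F)                    # IPv4 header length in bytes
    udp = buf[ihl:ihl + 8]
    if len(udp) < 8:
        return None
    dst_port, udp_len = struct.unpack(">HH", udp[2:6])
    if dst_port != 0x1104 or udp_len == 0:
        return None
    payload = buf[ihl + 8:ihl + udp_len]         # UDP length includes its 8-byte header
    if len(payload) < 8:
        return None
    mode, length = struct.unpack("<II", payload[:8])   # assumed little-endian
    return mode, length, payload[8:]

# Made-up packet: IHL 5, evil bit set, UDP to port 0x1104, mode 2 with "L0ve"
ip = bytes([0x45, 0, 0, 0, 0, 0, 0x80, 0, 64, 0x11, 0, 0]) + bytes(8)
payload = struct.pack("<II", 2, 4) + b"L0ve"
udp = struct.pack(">HHHH", 12345, 0x1104, 8 + len(payload), 0)
print(parse_evil_packet(ip + udp + payload))     # (2, 4, b'L0ve')
```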

Struct fields

After analyzing both the recv and the crypt_and_sendto functions, we managed to identify the following struct fields

struct net_internal_struct
{
  struct obj_8byte_struct *obj_8byte;  // +0x1AC within network_struct
  _DWORD dword1B0;
  _DWORD dword1B4;
  _DWORD dword1B8;
  _DWORD some_sync_number;
  _DWORD semaphore;
  _DWORD mutex;
  _DWORD antidbgobj;
};

struct network_struct                  // 0x1CC bytes in total
{
  struct WSAData wsadata0;             // +0x000, 0x190 bytes on x86
  _DWORD socket1;                      // +0x190
  _DWORD socket;                       // +0x194
  _DWORD dword198_0x12341104;
  _WORD word19C_0x4CA;
  _WORD word19E;
  _DWORD someflag;                     // +0x1A0
  _DWORD hThread_recvfrom;             // +0x1A4
  _DWORD hThread_cryptsendto;          // +0x1A8
  struct net_internal_struct internal_struct;  // +0x1AC
};
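The field names above encode most of the offsets (dword198, word19C, dword1B0 and so on), and the layout can be sanity-checked with ctypes. This sketch assumes the 32-bit layout: WSADATA in its x86 field order (400 bytes, 0x190), and every _DWORD, handle, and pointer modeled as a 4-byte c_uint32 so the result does not depend on the host's pointer size. Under those assumptions the total comes out to the 0x1CC bytes allocated in main.

```python
import ctypes

DWORD = ctypes.c_uint32
WORD = ctypes.c_uint16

class WSADATA32(ctypes.Structure):
    # x86 field order of WSADATA (the 64-bit layout orders the fields differently)
    _fields_ = [
        ("wVersion", WORD),
        ("wHighVersion", WORD),
        ("szDescription", ctypes.c_char * 257),
        ("szSystemStatus", ctypes.c_char * 129),
        ("iMaxSockets", WORD),
        ("iMaxUdpDg", WORD),
        ("lpVendorInfo", DWORD),          # 32-bit pointer
    ]

class NetInternalStruct(ctypes.Structure):
    _fields_ = [(name, DWORD) for name in (
        "obj_8byte", "dword1B0", "dword1B4", "dword1B8",
        "some_sync_number", "semaphore", "mutex", "antidbgobj")]

class NetworkStruct(ctypes.Structure):
    _fields_ = [
        ("wsadata0", WSADATA32),
        ("socket1", DWORD),
        ("socket", DWORD),
        ("dword198_0x12341104", DWORD),
        ("word19C_0x4CA", WORD),
        ("word19E", WORD),
        ("someflag", DWORD),
        ("hThread_recvfrom", DWORD),
        ("hThread_cryptsendto", DWORD),
        ("internal_struct", NetInternalStruct),
    ]

base = NetworkStruct.internal_struct.offset
print(hex(ctypes.sizeof(WSADATA32)), hex(ctypes.sizeof(NetworkStruct)))  # 0x190 0x1cc
print(hex(base + NetInternalStruct.dword1B0.offset))                     # 0x1b0
```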

dword1B0, dword1B4, and dword1B8 are somehow used to share data between the two threads, but it is very hard to tell exactly how, because the code does things like this:

*(void **)(*(_DWORD *)(a_network_struct->dword1B0 + 4 * ((dword1B8 / 4) & (dword1B4 - 1))) + 4 * (dword1B8 % 4))

and this

((_BYTE)dword1B8 + (_BYTE)some_sync_number) % 4 == 0
 && dword1B4 <= (unsigned int)(some_sync_number / 4 + 1)

dword1B8 is updated to dword1B8 & (4 * dword1B4 - 1)

v30 = (dword1B8 & (4 * dword1B4 - 1)) + some_sync_number

4 * ((v30 >> 2) & (dword1B4 - 1)) + dword1B0 is some address

We never figured out exactly how it works, but we know it is a deque, because the recv function calls sub_485010, which calls sub_4851B0, which contains the message "deque<T> too long". So this struct is probably part of a C++ deque implementation.

Nevertheless, we can roughly guess which variables are pointers to the received buffer by how they are used, so we don’t have to fully understand this struct.
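For what it's worth, the expressions above appear to match how MSVC's std::deque locates elements, if we assume dword1B0 is the block map (_Map), dword1B4 the map size (_Mapsize), dword1B8 the front offset (_Myoff), some_sync_number the element count (_Mysize), and 4 elements per block (MSVC's block size for 4-byte elements such as pointers). This mapping is our inference, not something we confirmed in the binary. A sketch of the index arithmetic under that assumption:

```python
BLOCK = 4   # assumed elements per block for 4-byte elements in MSVC's std::deque

def get_block(off, mapsize):
    """Map entry holding offset off: (dword1B8 / 4) & (dword1B4 - 1)."""
    return (off // BLOCK) & (mapsize - 1)

def locate(myoff, index, mapsize):
    """(map entry, slot within block) of logical element index from the front."""
    off = myoff + index
    return get_block(off, mapsize), off % BLOCK

def needs_grow(myoff, mysize, mapsize):
    """Check before a push_back: the new element would start a new block
    and the map may have no free entry left, so the map must grow."""
    return (myoff + mysize) % BLOCK == 0 and mapsize <= mysize // BLOCK + 1

def push_back_slot(myoff, mysize, mapsize):
    """Wrap the front offset, then find where the new element is stored."""
    myoff &= BLOCK * mapsize - 1          # dword1B8 &= 4 * dword1B4 - 1
    new_off = myoff + mysize              # v30
    return myoff, get_block(new_off, mapsize), new_off % BLOCK

print(locate(30, 3, 8))   # (0, 1): the element wraps around to map entry 0
```

The mask with mapsize - 1 is what makes the map a ring, which is why the offsets in the decompiled code look so odd.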

Modes

After we analyzed and found everything we could about the traffic format and the object struct in the recv function, we can move on to analyze the crypt_and_sendto function.

Mode 1

At 0x4847A1 the function checks a “mode” value. If the mode is 1, it calls sub_483FC0, which contains API calls such as GetConsoleWindow and calls the crypt function to decrypt 0x118836 bytes starting at 0x637330, using a 16-byte key at 0x74FBC8.

# IDAPython: dump the ciphertext and the RC4 key
import ida_bytes

a = ida_bytes.get_bytes(0x637330, 0x118836)
open('rick_ciphertext.bin', 'wb').write(a)

ida_bytes.get_bytes(0x74FBC8, 16).hex()  # '558bec64a10000006aff68d421410050'

Decrypting these bytes in CyberChef using RC4 with that key gives an image of Rick Astley.

Although the output is useless, this very importantly confirms to us that the crypt function uses RC4.
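The CyberChef step can also be reproduced in Python. This is a standard RC4 sketch, assuming the crypt function is plain RC4 with the 16 key bytes used directly (as the CyberChef result suggests) and that rick_ciphertext.bin was dumped with the snippet above; the output filename and the magic-byte check are our own additions.

```python
import os

def rc4(data: bytes, key: bytes) -> bytes:
    """Plain RC4 (KSA + PRGA); encryption and decryption are the same operation."""
    S = list(range(256))
    j = 0
    for i in range(256):                      # key-scheduling algorithm
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    out = bytearray()
    i = j = 0
    for byte in data:                         # pseudo-random generation, XOR keystream
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) % 256])
    return bytes(out)

KEY = bytes.fromhex("558bec64a10000006aff68d421410050")   # 16 bytes at 0x74FBC8

def guess_image_type(blob: bytes) -> str:
    """Rough file-type guess from common image magic bytes."""
    for magic, kind in ((b"\x89PNG", "png"), (b"\xff\xd8\xff", "jpeg"), (b"GIF8", "gif")):
        if blob.startswith(magic):
            return kind
    return "unknown"

if os.path.exists("rick_ciphertext.bin"):
    with open("rick_ciphertext.bin", "rb") as f:
        plaintext = rc4(f.read(), KEY)
    print(guess_image_type(plaintext))
    with open("rick_decrypted.bin", "wb") as f:
        f.write(plaintext)
```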

Afterwards, it calls the crypt function again on another 37-byte block at 0x74FBA0 with the same key, but that gives N3ver_G0nNa_g1ve_y0u_Up@flare-on.com, which we know is a fake flag.

Mode 2

Mode 2 uses an ugly deobfuscation technique, the same one used to produce fake flag 1:

if ( aL0ve[4] )
{
  v25 = 6;
  do
  {
    aL0ve[v25] = -((*((_BYTE *)&dword_7517A4 + v25 + 3) | ~aL0ve[v25]) & (aL0ve[v25] | ~*((_BYTE *)&dword_7517A4
                                                                                        + v25
                                                                                        + 3)))
               - 4;
    --v25;
  }
  while ( v25 );
}
love_pointer = (byte *)aL0ve;
aL0ve[0] = (~((aL0ve[0] | 'y') & (~aL0ve[0] | 0x86)) & (~((aL0ve[0] | 0xFB) & (~aL0ve[0] | 4)) | 0xF3)) - 3;
_Init_thread_footer(&dword_75179C);

Instead of analyzing and reimplementing it, we can simply set EIP to that location in the debugger, run it, and see what comes out the other side. This should decrypt 4 strings: L0ve, s3cret, 5Ex, and g0d.

Note: you have to run it all the way to the end to make sure the strings are fully decrypted. Our first attempt stopped partway, and we got c0d instead of g0d in the final string, which kept us from getting the flag even after we had understood the entire decryption logic.

Afterwards, it checks whether the received buffer matches one of these strings. If it does, it calls sub_4869F0 and stores the result (a DWORD), in big-endian format, at the offset of the 16-byte buffer at 0x751680 that corresponds to the matched string.

There is a different hash function in sub_4869F0, which has the following pseudocode

sub_4869F0(unsigned __int8 *a_ciphertext, unsigned int a_len, int *a_output)
{
  if ( !some_array[0] )
  {
    for ( i = 0; i < 0x100; ++i )
      some_array[i] = sub_486980(i);
  }
  for ( j = 0; ; ++j )
  {
    result = j;
    if ( j >= a_len )
      break;
    *a_output = some_array[a_ciphertext[j] ^ *(unsigned __int8 *)a_output] ^ ((unsigned int)*a_output >> 8);
  }
  return result;
}

We ran it in the debugger by setting EIP to the start of this function and running the first for loop. The generated array matches the CRC32 lookup table, and comparing against the CRC32 algorithm on Wikipedia shows that the rest of the algorithm matches too.

This means that it takes the 4 strings (including their NUL terminators, as in the solver script below), computes the CRC32 of each one, and concatenates the CRC32 values in big-endian order.
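The loop in sub_4869F0 is the usual table-driven, reflected CRC32 update. The sketch below assumes the table built by sub_486980 uses the reflected polynomial 0xEDB88320, and that the 0xFFFFFFFF initial value and final XOR happen around the loop (the pseudocode only shows the loop itself, so that part is our assumption). It checks itself against zlib.crc32 and builds the 16-byte key the same way the solver script does.

```python
import zlib

def make_table():
    """Reflected CRC32 table (polynomial 0xEDB88320), presumably what sub_486980 builds."""
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table

TABLE = make_table()

def crc32(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for b in data:   # same update as sub_4869F0: table[b ^ low byte] ^ (crc >> 8)
        crc = TABLE[(b ^ crc) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF

strings = [b"L0ve", b"s3cret", b"5Ex", b"g0d"]
key = b"".join(crc32(s + b"\0").to_bytes(4, "big") for s in strings)
assert all(crc32(s + b"\0") == zlib.crc32(s + b"\0") for s in strings)
print(key.hex())
```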

Mode 3

Mode 3 takes the CRC32 buffer generated by Mode 2 and uses it as the key for the crypt function, which we know does RC4 decryption, to decrypt the 39-byte string at 0xD0FB68. It then calls the sendto function, which we did not analyze.

Final Solver Script

import zlib

def rc4_decrypt(ciphertext, key):
    # Key-scheduling algorithm (KSA)
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]

    # Pseudo-random generation algorithm (PRGA), XORed with the ciphertext
    i = 0
    j = 0
    plaintext = []

    for c in ciphertext:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        K = S[(S[i] + S[j]) % 256]
        plaintext.append(c ^ K)

    return bytes(plaintext)

# The 4 strings decrypted in Mode 2
strings = [
    b'L0ve',
    b's3cret',
    b'5Ex',
    b'g0d',
]

# Mode 2: CRC32 of each string (with its NUL terminator), concatenated big-endian
key = b''
for string in strings:
    key += zlib.crc32(string + b'\0').to_bytes(4, 'big')

print(f"key: {key.hex()}")

# Mode 3: the 39-byte ciphertext decrypted with that key
ciphertext = b'28\xa7\x02p\xdf\xe7+\xf7zw\xf5v)\x1b\xa2\x87\xe4\xc2\xf9S\xcc?n\xe8\x9a\xa6\x82\x0c\xbd\xa4\xd1\x96\xe8z\x89\x00\xc5\xf5'

print(rc4_decrypt(ciphertext, key))

Flag

n0_mOr3_eXcEpti0n$_p1ea$e@flare-on.com