Thanks to drome for sharing his knowledge and skills! He completed all 10 challenges and wrote this series of writeups :)
|Official Challenge Site|https://flare-on.com/|
|Official Challenge Announcement|https://www.fireeye.com/blog/threat-research/2021/08/announcing-the-eighth-annual-flare-on-challenge.html|
|Official Challenge Binaries|http://flare-on.com/files/Flare-On8_Challenges.zip|
Mandiant’s unofficial motto is “find evil and solve crime”. Well here is evil but forget crime, solve challenge. Listen kid, RFCs are for fools, but for you we’ll make an exception :)
The challenge has 3 false flags:
There is a single file called
arch x86
baddr 0x400000
binsz 2964480
bintype pe
bits 32
canary false
retguard false
class PE32
cmp.csum 0x002d7f23
compiled Mon Jul 19 12:22:26 2021
crypto false
endian little
havecode true
hdr.csum 0x00000000
laddr 0x0
lang c
linenum false
lsyms false
machine i386
nx true
os windows
overlay false
cc cdecl
pic true
relocs false
signed false
sanitize false
static false
stripped false
subsys Windows CUI
va true
Disassembling it, we see that IDA can’t even define the main function because of bad opcodes. We run it dynamically, and see that it calls
0x623D0, which does a null pointer access at
0x62460. This causes an exception in our debugger, and if we pass it to the process, we get a divide by zero exception at
To understand where we can find the exception handlers, we refer to Practical Malware Analysis (PMA)’s Chapter 15 - Obscuring Flow Control - Missing Structured Exception Handlers (page 344).
To find the SEH chain, the OS examines the FS segment register. This register contains a segment selector that is used to gain access to the Thread Environment Block (TEB). The first structure within the TEB is the Thread Information Block (TIB). The first element of the TIB (and consequently the first bytes of the TEB) is a pointer to the SEH chain. The SEH chain is a simple linked list of 8-byte data structures called EXCEPTION_REGISTRATION records
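The linked-list layout described in that passage can be sketched in Python. This is only an illustration over simulated memory: the record addresses and the second handler address are made up, `read_memory` is a hypothetical stand-in for a debugger's memory read, and only `0x000A38E4` comes from the listing in this writeup.

```python
import struct

END_OF_CHAIN = 0xFFFFFFFF  # value of the Next field in the last record


def walk_seh_chain(read_memory, head):
    """Return handler addresses in the order the chain lists them."""
    handlers = []
    record = head
    while record != END_OF_CHAIN:
        # Each EXCEPTION_REGISTRATION record is 8 bytes: Next, then Handler.
        next_record, handler = struct.unpack("<II", read_memory(record, 8))
        handlers.append(handler)
        record = next_record
    return handlers


def make_reader(memory):
    """Build a hypothetical memory reader backed by a dict of address -> bytes."""
    def read_memory(address, size):
        return memory[address][:size]
    return read_memory


# Simulated stack: fs:[0] would point at the first record (addresses invented).
fake_memory = {
    0x0019FF00: struct.pack("<II", 0x0019FF60, 0x000A38E4),
    0x0019FF60: struct.pack("<II", END_OF_CHAIN, 0x77000000),
}
chain = walk_seh_chain(make_reader(fake_memory), 0x0019FF00)
print([hex(h) for h in chain])
```

The newest handler sits at the head of the list, which is why pushing a new record and writing its address to `fs:[0]` is enough to register it.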
This explains the following assembly code at the start of main, which adds an additional SEH entry at the top of the SEH chain pointing to
.text:00066450 _main:
.text:00066450     push    ebp
.text:00066451     mov     ebp, esp
.text:00066453     push    0FFFFFFFFh
.text:00066455     push    offset sub_A38E4
.text:0006645A     mov     eax, large fs:0
.text:00066460     push    eax
.text:00066461     mov     large fs:0, esp
In the function called by main, at
0x623D0, we see that it does something very similar, just with the handler
However, when we put breakpoints on those SEH exception handlers and run, the breakpoints are never hit. Control only returns to our debugger when the running program hits another exception.
We put a breakpoint at
ntdll_KiUserExceptionDispatcher and hit it once we pass control to the application after the first exception.
From there we can’t step back into user-mode code; control only returns to the debugger when the program throws the next exception, as shown in the following debugger output:
624B0: Integer divide by zero (exc.code c0000094, tid 6032)
624E6: Integer divide by zero (exc.code c0000094, tid 6032)
6255F: Integer divide by zero (exc.code c0000094, tid 6032)
625AC: Integer divide by zero (exc.code c0000094, tid 6032)
62F70: Privileged instruction (exc.code c0000096, tid 6032)
__scrt_common_main_seh is the second function called in our entry point function.
By luck, we noticed that
sub_654B0 gets called a lot during exceptions. We put a breakpoint there and observe that it gets called at the very start of program execution, even before our
main breakpoint is hit.
__scrt_common_main_seh. We observe its behavior and find that it always returns some API functions. Analyzing it statically, we see that
sub_654B0 is some API hash resolver using this hash function
We can repurpose the hash checker we created in Challenge 7 to resolve these API names statically, as such
AddVectoredExceptionHandler is resolved by hash in
sub_482130, then called in
sub_482150, to register the function
0x486AD0) as a VEH handler. Both functions are called in succession by
This explains why our SEH handlers were not being called:
__scrt_common_main_seh registers the VEH handler, which gets called and transfers control back to the program before the SEH handlers are ever reached.
VirtualProtect to change the page permissions to RWX, then changes the two-byte instruction at
EIP + 3 to
FF D0, where
EIP is the address of the faulty code, then changes the
EIP at that context to that address, then returns
EXCEPTION_CONTINUE_EXECUTION which causes the program to continue execution at that new address.
FF D0 disassembles into
call eax, and the value of eax is resolved using the hash function resolver, with the hash being the value of ecx at the time of exception. For example, before the first exception, we have the following code
.text:0048243E C7 45 E8+    mov     dword ptr [ebp-18h], 66FFF672h
...
.text:0048245B 8B 4D E8     mov     ecx, [ebp-18h]
This corresponds to
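The byte patch performed by the handler can be simulated offline. The sketch below is not the handler's actual code: it only models the two effects described above (writing `FF D0` at `EIP + 3` and moving `EIP` there) on a byte buffer. The base address is a made-up value and the page-permission change is left out.

```python
CALL_EAX = bytes([0xFF, 0xD0])  # opcode bytes for `call eax`


def apply_veh_patch(image: bytearray, base: int, fault_eip: int) -> int:
    """Overwrite the two bytes at fault_eip + 3 with `call eax`.

    Returns the address execution would resume at (the patched location).
    """
    target = fault_eip + 3
    offset = target - base
    image[offset:offset + 2] = CALL_EAX
    return target


# xor eax, eax; div eax; followed by three junk bytes (one of the sequences
# seen in this binary). The fault happens on `div eax`, 2 bytes in.
base = 0x482000  # hypothetical address of the sequence
image = bytearray.fromhex("33C0F7F0E8FFD2")
new_eip = apply_veh_patch(image, base, base + 2)
print(hex(new_eip), image.hex())
```

Note that `EIP + 3` lands on the last two bytes of the 7-byte sequence, so the junk byte in between is simply skipped.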
Control flow obfuscation
Knowing this, we can patch the program so that IDA can define it as a function properly. After every exception-triggering instruction there are usually 3 bytes of junk code that confuse IDA's disassembly.
List of exception-triggering instructions and the 3 bytes that follow them
- 33 C0 8B 00 / 74 03 75: xor eax, eax; mov eax, [eax]
- 33 C0 8B 00 / EB FF E8: xor eax, eax; mov eax, [eax]
- 33 C0 F7 F0 / EB 00 EB: xor eax, eax; div eax
- 33 C0 F7 F0 / E8 FF D2: xor eax, eax; div eax
- 33 C0 F7 F0 / 5B 5D C3: xor eax, eax; div eax
- 33 FF F7 F7 / 33 C0 74: xor edi, edi; div edi
- 33 F6 F7 F6 / E8 FF D2: xor esi, esi; div esi
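Before patching, it helps to know where these sequences occur. A minimal sketch of a scanner over raw bytes follows; it is plain Python rather than IDAPython, and the function name and sample buffer are made up for illustration. Only the seven byte patterns come from the list above.

```python
# The seven 7-byte sequences listed above (4 trigger bytes + 3 trailing bytes).
PATTERNS = [bytes.fromhex(p) for p in (
    "33C08B00740375",
    "33C08B00EBFFE8",
    "33C0F7F0EB00EB",
    "33C0F7F0E8FFD2",
    "33C0F7F05B5DC3",
    "33FFF7F733C074",
    "33F6F7F6E8FFD2",
)]


def find_trigger_sequences(data: bytes):
    """Return sorted (offset, pattern_hex) pairs for every match in data."""
    hits = []
    for pattern in PATTERNS:
        start = 0
        while (pos := data.find(pattern, start)) != -1:
            hits.append((pos, pattern.hex()))
            start = pos + 1
    return sorted(hits)


# Made-up buffer containing two of the sequences.
sample = b"\x90" * 3 + PATTERNS[3] + b"\xCC" + PATTERNS[0]
hits = find_trigger_sequences(sample)
print(hits)
```

A raw byte search like this can produce false positives inside data, so each hit is worth checking in the disassembly before it is patched.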
Initially we tried to just patch the last 3 bytes to
90 FF D0, which was
nop; call eax. However, the call looked unrelated to the surrounding code and the exception-triggering instructions were still in place, so the output in IDA looked quite bad: decompilation didn’t even work, and we had to do a lot of manual analysis of the assembly in graph view and even in plain text view.
Then we realized that each exception-triggering sequence is 7 bytes long, which is just enough for a 5-byte call to any function we want followed by the 2 bytes for
call eax. Better yet, the API hash resolving function takes
edx as arguments the way the function sets it up. If we patched the opcodes accordingly, we could even get IDA to analyze the function stack frame properly.
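The 7-byte replacement can be assembled by hand. The sketch below is plain Python, not the IDAPython script used for the actual patching. `build_patch` is a hypothetical helper name and the addresses in the example are made up; the only details taken from the text are the layout (a 5-byte relative call followed by `call eax`) and the opcode bytes.

```python
import struct


def build_patch(site: int, resolver: int) -> bytes:
    """Return 7 bytes: `call resolver` (E8 rel32) followed by `call eax` (FF D0).

    site     -- address of the first byte being overwritten
    resolver -- address of the function to call (e.g. the API hash resolver)
    """
    # rel32 is relative to the address of the instruction after the call.
    rel32 = (resolver - (site + 5)) & 0xFFFFFFFF
    return b"\xE8" + struct.pack("<I", rel32) + b"\xFF\xD0"


# Made-up addresses, one forward call and one backward call.
forward = build_patch(0x1000, 0x2000)
backward = build_patch(0x2000, 0x1000)
print(forward.hex(), backward.hex())
```

Masking with `0xFFFFFFFF` keeps backward calls (negative displacements) encodable as an unsigned 32-bit value.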
We write an IDAPython script to patch all these exception-triggering instructions, referring extensively to IDAPython’s gruesome documentation at https://hex-rays.com/products/ida/support/idapython_docs/ and https://hex-rays.com/products/ida/support/ida74_idapython_no_bc695_porting_guide.shtml. Base address of our program is now
We tried to do some manual fixing to correct the function frames, but the graph view and decompiler gave issues, especially with the continuity of the main function after the check for
argc. To fix this and get a nice looking IDB flow, we created a copy of
evil.exe, applied the patches to it, then reanalyzed it using IDA on the freshly patched copy. This gave us a much nicer looking IDB that allowed us to do static analysis nearly entirely in the decompiler view.
The patched binary also runs somewhat, but running from the start gives us errors because the API hash resolving function fails to resolve for some reason, so most of the subsequent analysis was done statically.
With the obfuscation method understood, we can move on to analyzing the sample functionality.
new a bunch of times to create a bunch of structures, then calls
sub_4823D0 which does a bunch of anti-debugging things.
sub_4823D0 changes the first byte of
0xC3 (opcode for
ret), and the first 14 bytes of
6a 00 68 ff ff ff ff B8 <terminate process address> FF D0, which essentially calls
TerminateProcess. It also does other things but we didn’t analyze it deeply.
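To see why those 14 bytes amount to a `TerminateProcess` call, the stub can be rebuilt byte by byte. This is an illustrative reconstruction, not code from the binary. The address passed in the example is a made-up placeholder for wherever `TerminateProcess` is loaded, and the argument names in the comments follow the documented Windows API prototype.

```python
import struct


def build_terminate_stub(terminate_process_addr: int) -> bytes:
    """Rebuild the 14-byte stub: TerminateProcess((HANDLE)-1, 0)."""
    return (
        b"\x6A\x00"                  # push 0           ; uExitCode
        + b"\x68\xFF\xFF\xFF\xFF"    # push 0FFFFFFFFh  ; hProcess, -1 = current process
        + b"\xB8" + struct.pack("<I", terminate_process_addr)  # mov eax, <address>
        + b"\xFF\xD0"                # call eax
    )


stub = build_terminate_stub(0x76543210)  # hypothetical address
print(len(stub), stub.hex())
```

Any function overwritten with this stub kills the process as soon as it is called.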
main then calls
CreateMutexA and puts some values into the 0x1CC byte object, then calls
CreateThread to the functions
sub_484680, then calls
WSACleanup which strongly suggests that there is network functionality in the executable.
main also references
0x751698, which contains fake flag 1,
1s_tHi$_mY_f1aG@flare-on.com, initialized in
sub_481000 which was called as part of the initialization routines in
We know that network functions like
recv are likely called, and probably resolved using the hash API resolver, so we can do binary searches for their function name hashes in the binary to see which code calls them.
For example, the hash for
0xd5af7bf3, and searching for it gives us 2 hits in
recv have no hits. They might be using UDP so we try
recvfrom which gets us a hit in
sendto which gets us a hit in
sub_483D40, confirming our suspicions.
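The binary search itself is simple to reproduce outside IDA. A minimal sketch, assuming the hash appears as a little-endian 32-bit immediate (as in the `mov dword ptr [ebp-18h], ...` instruction shown earlier): `find_dword` is a hypothetical helper name and the sample bytes are constructed for illustration.

```python
import struct


def find_dword(data: bytes, value: int):
    """Return every offset where value occurs as a little-endian dword."""
    needle = struct.pack("<I", value)
    offsets = []
    start = 0
    while True:
        pos = data.find(needle, start)
        if pos == -1:
            return offsets
        offsets.append(pos)
        start = pos + 1


# Constructed example: mov dword ptr [ebp-18h], 0D5AF7BF3h
sample = bytes.fromhex("C745E8F37BAFD5")
offsets = find_dword(sample, 0xD5AF7BF3)
print(offsets)
```

On the real file, the offsets returned are file offsets and still need to be mapped to virtual addresses before they can be looked up in the IDB.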
Doing some xrefs and combining this with our knowledge of the threads that
main is creating, we have the following information about threads that main creates
sub_482E70, uses the 0x18 byte object used by the anti-debug function, creates more threads and is a pain to reverse, probably not important
sub_482D50, also uses 0x18 byte anti-debug object, also probably unimportant
After that main calls
sub_483A70 which takes
argv as the IP address, and does network startup function calls like
bind, and puts it into the 0x1CC byte object (not a real object because it doesn’t contain methods, probably just a struct), so we will keep track of that object, calling it
network_struct, and take note of the 2 threads that use that struct
sub_484310, which uses the
recvfrom, so this function is probably important and should be more deeply analyzed. We will call this the
sub_484680, which is huge, uses the mutex and semaphore in the
network_struct, and then calls the function
CryptDecrypt. This function looks very important. We will call this the
recv function calls
recvfrom into a 1500-byte buffer, then checks the following:
buf == 0x11
- There is some header at
&buf[4 * (buf & 0xF)]
*(word*)(header+4) != 0
ntohs(*(word*)(header+4)) is the size of the 8-byte header and the buffer afterwards.
(header+2) == '\x11\x04'
buf < 0
- After the headers there is some buf where
*(dword*)&buf is the “mode”, explained in greater detail below in Modes
*(dword*)&buf is the length
&buf is the stuff after that gets checked in
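The header checks above can be sketched as a validator. The byte offsets are missing from the list above, so the ones used here are assumptions based on the standard IPv4 and UDP header layouts, not values confirmed from the binary: protocol at byte 9, the flags byte at byte 6 for the "less than zero" check, and the IHL in the low nibble of byte 0. All function names are hypothetical.

```python
import struct


def passes_checks(packet: bytes) -> bool:
    """Apply the header checks, with ASSUMED standard IPv4/UDP offsets."""
    if len(packet) < 20:
        return False
    header = 4 * (packet[0] & 0x0F)      # assumed: IHL nibble gives the header offset
    if len(packet) < header + 8:
        return False
    if packet[9] != 0x11:                # assumed offset: protocol field, 0x11 = UDP
        return False
    if not packet[6] & 0x80:             # assumed offset: byte is negative as a signed char
        return False
    (length,) = struct.unpack("!H", packet[header + 4:header + 6])
    if length == 0:                      # *(word*)(header+4) != 0
        return False
    return packet[header + 2:header + 4] == b"\x11\x04"


def build_sample_packet(payload: bytes, top_bit: bool = True) -> bytes:
    """Build a made-up packet that is meant to satisfy the checks."""
    ip = bytearray(20)
    ip[0] = 0x45                         # version 4, IHL 5 (20-byte header)
    ip[6] = 0x80 if top_bit else 0x00
    ip[9] = 0x11
    udp = struct.pack("!HHHH", 1234, 0x1104, 8 + len(payload), 0)
    return bytes(ip) + udp + payload


good = build_sample_packet(b"abcd")
bad = build_sample_packet(b"abcd", top_bit=False)
print(passes_checks(good), passes_checks(bad))
```

If the assumed offsets are right, sending such a packet requires a raw socket, because the operating system normally fills in the IP header itself.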
After analyzing both the
recv and the
crypt_and_sendto functions, we managed to identify the following struct fields
dword1B8 are used to share data between the two threads, but it is very hard to tell how exactly because it does stuff like this
*(void **)(*(_DWORD *)(a_network_struct->dword1B0 + 4 * ((dword1B8 / 4) & (dword1B4 - 1))) + 4 * (dword1B8 % 4))
((_BYTE)dword1B8 + (_BYTE)some_sync_number) % 4 == 0 && dword1B4 <= (unsigned int)(some_sync_number / 4 + 1)
dword1B8 is updated to dword1B8 & (4 * dword1B4 - 1)
v30 = dword1B8 & (4 * dword1B4 - 1) + some_sync_number
4 * ((v30 >> 2) & (dword1B4 - 1)) + dword1B0 is some address
We ended up not figuring out exactly how it worked, but we know it is some kind of deque because the
recv function calls
sub_485010 which calls
sub_4851B0 which has the message
deque<T> too long, so this struct is probably part of some C++ deque implementation.
Nevertheless, we can roughly guess which variables are pointers to the received buffer by how they are used, so we don’t have to fully understand this struct.
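One way to read the indexing expression is as a block-map deque lookup, where elements live in fixed-size blocks and a power-of-two array of block pointers is indexed with a mask. The sketch below models that reading only. Treating `dword1B0` as the block map, `dword1B4` as the map size and `dword1B8` as the front offset is an assumption, not something confirmed during the analysis.

```python
BLOCK_SIZE = 4  # elements per block, from the `/ 4` and `% 4` in the expression


def locate(offset: int, map_size: int):
    """Map a logical element offset to (block index, slot within block)."""
    block = (offset // BLOCK_SIZE) & (map_size - 1)  # wraps because map_size is a power of two
    slot = offset % BLOCK_SIZE
    return block, slot


def deque_get(block_map, front_offset: int, index: int = 0):
    """Fetch element `index` of a deque whose first element is at front_offset."""
    block, slot = locate(front_offset + index, len(block_map))
    return block_map[block][slot]


# Made-up map of 4 blocks holding the values 0..15.
block_map = [[4 * b + s for s in range(BLOCK_SIZE)] for b in range(4)]
print(locate(6, 4), deque_get(block_map, 14, 3))
```

Under this reading, the `& (dword1B4 - 1)` mask is what lets the front of the deque wrap around the block map.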
After we analyzed and found everything we could about the traffic format and the object struct in the
recv function, we can move on to analyze the
There is some checking of some “mode” at (code location)
0x4847A1, then if the mode is 1 it goes into
sub_483FC0 which contains API calls like
GetConsoleWindow, and it makes a call to the crypt function, decrypting
0x118836 bytes starting at
0x637330, using a 16 byte key at
Although the output is useless, this very importantly confirms to us that the crypt function uses RC4.
Afterwards they call the crypt function again on another 37-byte block at 0x74FBA0, with the same key, which gives
N3ver_G0nNa_g1ve_y0u_Up@flare-on.com, which we know to be a fake flag.
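For offline decryption of dumped buffers, a pure-Python RC4 is enough. This is the textbook algorithm, not code lifted from the binary. How the sample hands its 16-byte key to the Windows crypto API is not covered here, so the raw key is assumed to be used as-is. The example uses a widely published RC4 test vector rather than data from the challenge.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4: key scheduling, then keystream XOR."""
    s = list(range(256))
    j = 0
    for i in range(256):                      # key-scheduling algorithm
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]

    out = bytearray()
    i = j = 0
    for byte in data:                         # pseudo-random generation + XOR
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


ciphertext = rc4(b"Key", b"Plaintext")
print(ciphertext.hex())
```

RC4 is symmetric, so the same function both encrypts and decrypts.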
Mode 2 uses an ugly deobfuscation technique, the same one used to get fake flag 1.
Instead of analyzing and reimplementing it, we can just debug and set our EIP to that location, and run it, then see what comes out from the other side. This should decrypt 4 strings,
Note: you have to run all the way to the end to make sure it’s fully decrypted. Our first attempt stopped early, so the decryption was only partial and we got
c0d instead of
g0d in the final string, which prevented us from getting the flag even though by then we had understood the entire decryption logic.
Afterwards it does some checks to make sure the received buffer matches one of the strings, then if so calls
sub_4869F0 and puts the result (a DWORD) in one of the offsets of the 16-byte buffer at
0x751680 according to which string it matches, in big endian format.
There is a different hash function in
sub_4869F0, which has the following pseudocode
We ran it in the debugger by setting the IP to the start of this function and running the first for loop, and realized that the generated array matches the CRC32 lookup table. Checking Wikipedia confirmed that the rest of the algorithm matches CRC32 too.
This means that it takes the 4 strings, computes the CRC32 of each, then concatenates the CRC32 values in big endian.
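The key construction can be sketched with the standard library, assuming the function is plain CRC32 as concluded above. The strings in the example are placeholders, not the four decrypted strings from the binary, and `build_key` is a hypothetical helper name. The order of the strings matters because each CRC32 goes into its own slot of the 16-byte buffer.

```python
import struct
import zlib


def build_key(strings) -> bytes:
    """Concatenate the CRC32 of each string as a big-endian dword."""
    return b"".join(
        struct.pack(">I", zlib.crc32(s) & 0xFFFFFFFF) for s in strings
    )


# Placeholder inputs only; substitute the four strings recovered in Mode 2.
placeholders = [b"string-one", b"string-two", b"string-three", b"string-four"]
key = build_key(placeholders)
print(len(key), key.hex())
```

With four inputs the result is 16 bytes, matching the size of the buffer that is later used as the decryption key.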
Mode 3 takes the CRC32 buffer generated by Mode 2 and uses it as the key for the crypt function, which we know performs RC4 decryption, to decrypt the 39-byte string at
0xD0FB68. It then calls the
sendto function, but we didn’t analyze that.
Final Solver Script