Thanks drome for sharing his knowledge and skills! He completed all 10 challenges and this series of writeups is done by him :)
Details | Links |
---|---|
Official Challenge Site | https://flare-on.com/ |
Official Challenge Announcement | https://www.fireeye.com/blog/threat-research/2021/08/announcing-the-eighth-annual-flare-on-challenge.html |
Official Solutions | https://www.mandiant.com/resources/flare-on-8-challenge-solutions |
Official Challenge Binaries | http://flare-on.com/files/Flare-On8_Challenges.zip |
09_evil
Mandiant’s unofficial motto is “find evil and solve crime”. Well here is evil but forget crime, solve challenge. Listen kid, RFCs are for fools, but for you we’ll make an exception :)
The challenge has 3 false flags:
- !t_$uRe_W0u1d_B3_n1ce_huh!@flare-on.com
- 1s_tHi$_mY_f1aG@flare-on.com
- N3ver_G0nNa_g1ve_y0u_Up@flare-on.com
7zip password: flare
There is a single file called evil.exe.
arch x86
baddr 0x400000
binsz 2964480
bintype pe
bits 32
canary false
retguard false
class PE32
cmp.csum 0x002d7f23
compiled Mon Jul 19 12:22:26 2021
crypto false
endian little
havecode true
hdr.csum 0x00000000
laddr 0x0
lang c
linenum false
lsyms false
machine i386
nx true
os windows
overlay false
cc cdecl
pic true
relocs false
signed false
sanitize false
static false
stripped false
subsys Windows CUI
va true
Dynamic Analysis
Disassembling it, we see that IDA can’t even define the main function because of bad opcodes. We run it dynamically and see that it calls 0x623D0, which does a null pointer access at 0x62460. This causes an exception in our debugger, and if we pass it to the process, we get a divide by zero exception at 0x624B0.
SEH handlers
To understand where we can find the exception handlers, we refer to Practical Malware Analysis (PMA)’s Chapter 15 - Obscuring Flow Control - Missing Structured Exception Handlers (page 344).
To find the SEH chain, the OS examines the FS segment register. This register contains a segment selector that is used to gain access to the Thread Environment Block (TEB). The first structure within the TEB is the Thread Information Block (TIB). The first element of the TIB (and consequently the first bytes of the TEB) is a pointer to the SEH chain. The SEH chain is a simple linked list of 8-byte data structures called EXCEPTION_REGISTRATION records
This explains the following assembly code at the start of main, which adds an additional SEH entry at the top of the SEH chain pointing to sub_A38E4.
.text:00066450 _main:
.text:00066450 push ebp
.text:00066451 mov ebp, esp
.text:00066453 push 0FFFFFFFFh
.text:00066455 push offset sub_A38E4
.text:0006645A mov eax, large fs:0
.text:00066460 push eax
.text:00066461 mov large fs:0, esp
In the function called by main, at 0x623D0, we see that it does something very similar, just with the handler sub_A3740.
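To make the record layout concrete, here is a small Python model of the chain that those two prologues build. It is only a sketch: the stack addresses and the outermost handler value are made up for illustration, walk_seh_chain is a hypothetical helper, and only the two handler addresses (sub_A38E4 and sub_A3740) come from the disassembly. The 0xFFFFFFFF terminator is the end-of-chain value conventionally used on 32-bit Windows.

```python
import struct

END_OF_CHAIN = 0xFFFFFFFF  # "next" value of the last record on x86 Windows

def walk_seh_chain(memory: bytes, base: int, head: int):
    """Yield (record_address, handler) for every 8-byte record in the chain.

    memory -- raw bytes of a (toy) stack region
    base   -- virtual address of memory[0]
    head   -- value of fs:[0], i.e. the address of the newest record
    """
    addr = head
    while addr != END_OF_CHAIN:
        # Each EXCEPTION_REGISTRATION record is two little-endian dwords:
        # pointer to the previous record, then the handler address.
        next_rec, handler = struct.unpack_from("<II", memory, addr - base)
        yield addr, handler
        addr = next_rec

# Toy stack; all addresses below are illustrative assumptions.
BASE = 0x0019F000
stack = bytearray(0x100)
struct.pack_into("<II", stack, 0x40, END_OF_CHAIN, 0x77001234)  # oldest record (made-up handler)
struct.pack_into("<II", stack, 0x20, BASE + 0x40, 0x000A38E4)   # pushed by main's prologue
struct.pack_into("<II", stack, 0x08, BASE + 0x20, 0x000A3740)   # pushed by the function at 0x623D0

chain = list(walk_seh_chain(bytes(stack), BASE, head=BASE + 0x08))
for rec, handler in chain:
    print(f"record {rec:#010x} -> handler {handler:#010x}")
```

The newest record comes first, which is why a handler registered deeper in the call stack is consulted before the one registered by main.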
However, when we put breakpoints at those SEH exception handlers and run, we never hit them. The only time control gets passed back to our debugger is when the program runs into another exception.
We put a breakpoint at ntdll_KiUserExceptionDispatcher and hit it once we pass control to the application after the first exception.
From there we can’t step through into user-mode code; we only regain control when the program throws its next exception, as shown in the following debugger output:
624B0: Integer divide by zero (exc.code c0000094, tid 6032)
624E6: Integer divide by zero (exc.code c0000094, tid 6032)
6255F: Integer divide by zero (exc.code c0000094, tid 6032)
625AC: Integer divide by zero (exc.code c0000094, tid 6032)
62F70: Priveleged instruction (exc.code c0000096, tid 6032)
__scrt_common_main_seh
__scrt_common_main_seh is the second function called in our entry point function.
By luck, we noticed that sub_654B0 gets called a lot during exceptions. We put a breakpoint there and observe that it gets called at the very start of program execution, even before our main breakpoint is hit.
sub_4820F0 and sub_482130 call sub_654B0 in __scrt_common_main_seh. We observe its behavior and find that it always returns the address of some API function. Analyzing it statically, we see that sub_654B0 is an API resolver that looks functions up by a hash of their name.
We can repurpose the hash checker we created in Challenge 7 to resolve these API names statically.
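The general shape of such a static lookup can be sketched as follows. This is an illustration only: the hash below is a generic rotate-right-13 additive hash used as a stand-in and is NOT the routine from sub_654B0, so it will not reproduce the constants seen in evil.exe. The names ror13_hash and build_lookup are hypothetical. To use it for real, swap in a reimplementation of the actual hash and feed it the export names of the DLLs of interest.

```python
from typing import Callable, Dict, Iterable

def ror13_hash(name: str) -> int:
    """Stand-in hash (rotate right by 13, then add). Not the one used by evil.exe."""
    h = 0
    for ch in name.encode("ascii"):
        h = ((h >> 13) | (h << 19)) & 0xFFFFFFFF  # 32-bit rotate right by 13
        h = (h + ch) & 0xFFFFFFFF
    return h

def build_lookup(names: Iterable[str],
                 hash_fn: Callable[[str], int]) -> Dict[int, str]:
    """Precompute hash -> API name so constants found in the binary can be reversed."""
    return {hash_fn(name): name for name in names}

# A handful of API names mentioned in this writeup; a real run would use
# every export of kernel32.dll, ntdll.dll, ws2_32.dll and so on.
candidates = ["GetSystemTime", "VirtualProtect", "AddVectoredExceptionHandler",
              "socket", "recvfrom", "sendto"]
table = build_lookup(candidates, ror13_hash)

constant_from_binary = ror13_hash("recvfrom")  # pretend this was read from a mov instruction
print(table.get(constant_from_binary, "<unknown hash>"))
```

Keeping the hash function as a parameter means the same table builder works for any sample once its hash routine has been reimplemented.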
AddVectoredExceptionHandler is resolved by hash in sub_482130, then called in sub_482150, to register the function Handler (at 0x486AD0) as a VEH handler. Both functions are called in succession by the code at 0x4B1ED6 in __scrt_common_main_seh.
This explains why our SEH handlers were not being called: __scrt_common_main_seh registers the VEH handler, which gets called first and transfers control back to the program before the SEH handlers are ever reached.
VEH handler
Handler uses VirtualProtect to change the page permissions to RWX, then overwrites the two bytes at EIP + 3 with FF D0, where EIP is the address of the faulting instruction. It then sets the EIP in the exception context to that patched address and returns EXCEPTION_CONTINUE_EXECUTION, which causes the program to continue execution there.
FF D0 disassembles into call eax, and the value of eax is resolved using the hash function resolver, with the hash being the value of ecx at the time of the exception. For example, before the first exception, we have the following code:
.text:0048243E C7 45 E8+ mov dword ptr [ebp-18h], 66FFF672h
...
.text:0048245B 8B 4D E8 mov ecx, [ebp-18h]
This corresponds to GetSystemTime.
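Putting the pieces together, the effect of Handler on one obfuscated call site can be modelled in a few lines of Python. This is a simulation over a byte buffer, not the handler’s real code: emulate_handler and the dictionary standing in for the resolver are hypothetical, the VirtualProtect step is omitted, and pairing this particular byte sequence with GetSystemTime is an assumption made for the demo. The hash value itself is the one from the listing above.

```python
from typing import Dict, Tuple

CALL_EAX = b"\xFF\xD0"

def emulate_handler(code: bytearray, fault_off: int, ecx: int,
                    resolver: Dict[int, str]) -> Tuple[int, str]:
    """Model one pass through the VEH handler.

    code      -- bytes surrounding the faulting instruction (patched in place)
    fault_off -- offset of the faulting instruction, i.e. EIP in the context
    ecx       -- ECX at the time of the exception, which carries the API hash
    Returns the offset execution resumes at and the API that EAX would point to.
    """
    resume = fault_off + 3
    code[resume:resume + 2] = CALL_EAX  # overwrite two junk bytes with call eax
    target = resolver[ecx]              # stands in for the hash resolver call
    return resume, target

# xor eax, eax; mov eax, [eax] (null pointer read) followed by 3 junk bytes.
site = bytearray.fromhex("33C08B00740375")
known_hashes = {0x66FFF672: "GetSystemTime"}

new_eip, api = emulate_handler(site, fault_off=2, ecx=0x66FFF672,
                               resolver=known_hashes)
print(f"resume at +{new_eip}, call eax -> {api}, bytes now {site.hex(' ')}")
```

The faulting instruction sits at offset 2 and is 2 bytes long, so EIP + 3 skips one junk byte and lands on the last two bytes of the 7-byte sequence.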
Control flow obfuscation
Knowing this, we can patch the program so that IDA can define it as a function properly. After every exception-triggering instruction there are usually 3 bytes of bad code that mess IDA up.
List of exception-triggering instructions and the next 3 bytes:
- 33 C0 8B 00 / 74 03 75: xor eax, eax; mov eax, [eax]
- 33 C0 8B 00 / EB FF E8: xor eax, eax; mov eax, [eax]
- 33 C0 F7 F0 / EB 00 EB: xor eax, eax; div eax
- 33 C0 F7 F0 / E8 FF D2: xor eax, eax; div eax
- 33 C0 F7 F0 / 5B 5D C3: xor eax, eax; div eax
- 33 FF F7 F7 / 33 C0 74: xor edi, edi; div edi
- 33 F6 F7 F6 / E8 FF D2: xor esi, esi; div esi
Initially we tried to just patch the last 3 bytes to 90 FF D0, which is nop; call eax. However, because of the seemingly unrelated call and the exception-causing instructions left in place, the output in IDA looked quite bad, decompilation didn’t even work, and we had to do a lot of manual analysis of the assembly in graph view and even in plain text view.
Then we realized that each exception-triggering sequence is 7 bytes, which is just enough for a 5-byte call to any function we want followed by the 2-byte call eax. Better yet, the API hash resolving function takes ecx and edx as arguments, the same way the obfuscated code sets them up. If we patch the opcodes accordingly, we can even get IDA to analyze the function stack frame properly.
We write an IDAPython script to patch all these exception-triggering instructions, referring extensively to IDAPython’s gruesome documentation at https://hex-rays.com/products/ida/support/idapython_docs/ and https://hex-rays.com/products/ida/support/ida74_idapython_no_bc695_porting_guide.shtml. The base address of our program is now 0x480000.
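The same patching idea can be sketched on a raw byte buffer outside IDA. This is a minimal illustration and not the IDAPython script used for the writeup: patch_triggers is a hypothetical helper, the resolver address is passed in by the caller, and the naive byte search may also match data or the middle of unrelated instructions, so a real run should restrict it to code ranges and review each hit.

```python
import struct

# First 4 bytes of each exception-triggering sequence (3 junk bytes follow).
TRIGGERS = [bytes.fromhex(h) for h in
            ("33C08B00", "33C0F7F0", "33FFF7F7", "33F6F7F6")]
SEQ_LEN = 7

def patch_triggers(image: bytearray, base: int, resolver_va: int) -> int:
    """Rewrite every 7-byte trigger sequence as: call resolver; call eax.

    image       -- bytes of the code section, patched in place
    base        -- virtual address of image[0]
    resolver_va -- virtual address of the API hash resolver
    Returns the number of sequences patched.
    """
    patched = 0
    for trigger in TRIGGERS:
        pos = image.find(trigger)
        while pos != -1 and pos + SEQ_LEN <= len(image):
            call_va = base + pos
            # E8 takes a displacement relative to the end of the 5-byte call.
            rel32 = (resolver_va - (call_va + 5)) & 0xFFFFFFFF
            image[pos:pos + SEQ_LEN] = (b"\xE8" + struct.pack("<I", rel32)
                                        + b"\xFF\xD0")
            patched += 1
            pos = image.find(trigger, pos + SEQ_LEN)
    return patched

# Toy code buffer: 3 nops, one trigger sequence, 2 nops (addresses are made up).
demo = bytearray(b"\x90" * 3 + bytes.fromhex("33C0F7F0EB00EB") + b"\x90" * 2)
count = patch_triggers(demo, base=0x1000, resolver_va=0x2000)
print(count, demo.hex(" "))
```

Because ecx and edx are already loaded before the original trigger, the inserted call sees the same arguments the VEH handler would have used, and the following call eax consumes its return value.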
We tried to do some manual fixing to correct the function frames, but the graph view and decompiler gave issues, especially with the continuity of the main function after the check for argc. To fix this and get a nice looking IDB flow, we created a copy of evil.exe, applied the patches to it, then reanalyzed the freshly patched copy in IDA. This gave us a much nicer looking IDB that allowed us to do static analysis nearly entirely in the decompiler view.
The patched binary also runs somewhat, but running from the start gives us errors because the API hash resolving function fails to resolve for some reason, so most of the subsequent analysis was done statically.
Static Analysis
With the obfuscation method understood, we can move on to analyzing the sample functionality.
main
main calls new a bunch of times to create a bunch of structures, then calls sub_4823D0 which does a bunch of anti-debugging things.
sub_4823D0 changes the first byte of DbgBreakPoint to 0xC3 (opcode for ret), and the first 14 bytes of DbgUiRemoteBreakin to 6a 00 68 ff ff ff ff B8 <terminate process address> FF D0, which essentially calls TerminateProcess. It also does other things but we didn’t analyze it deeply.
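The 14 bytes written over DbgUiRemoteBreakin can be assembled by hand to see why they terminate the process. A small sketch, where remote_breakin_stub is a hypothetical helper and the address passed in is a dummy value rather than the real location of TerminateProcess:

```python
import struct

def remote_breakin_stub(terminate_process_va: int) -> bytes:
    """Build the 14-byte stub that replaces the start of DbgUiRemoteBreakin."""
    return (
        b"\x6A\x00"                                      # push 0           ; uExitCode
        + b"\x68\xFF\xFF\xFF\xFF"                        # push 0FFFFFFFFh  ; hProcess
        + b"\xB8" + struct.pack("<I", terminate_process_va)  # mov eax, <address>
        + b"\xFF\xD0"                                    # call eax
    )

RET = b"\xC3"  # single byte written over the start of DbgBreakPoint

stub = remote_breakin_stub(0x11223344)  # dummy address for illustration
print(len(stub), stub.hex(" "))
```

The handle value 0xFFFFFFFF (-1) is the pseudo-handle for the current process, so the stub amounts to TerminateProcess(GetCurrentProcess(), 0) whenever a debugger tries to attach through the remote break-in thread.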
main then calls CreateMutexA and puts some values into the 0x1CC byte object, then calls CreateThread on the functions sub_482E70, sub_482D50, sub_484310, and sub_484680, then calls WaitForMultipleObjects.
Afterwards main calls closesocket and WSACleanup, which strongly suggests that there is network functionality in the executable.
main also references 0x751698, which contains fake flag 1, 1s_tHi$_mY_f1aG@flare-on.com, initialized in sub_481000, which was called as part of the initialization routines in __scrt_common_main_seh.
Jumping around
We know that network functions like socket, connect, send, and recv are likely called, and probably resolved using the hash API resolver, so we can do binary searches for their function name hashes in the binary to see which code calls them.
For example, the hash for socket is 0xd5af7bf3, and searching for it gives us 2 hits in sub_483A70. connect, send, and recv have no hits. They might be using UDP so we try recvfrom, which gets us a hit in sub_484310, and sendto, which gets us a hit in sub_483D40, confirming our suspicions.
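Such a search is easy to script outside IDA as well. A minimal sketch, assuming the hashes appear as little-endian 32-bit immediates like the one in the mov at 0x48243E; find_dword is a hypothetical helper and the byte blob is made up (in practice, read the contents of evil.exe instead):

```python
import struct
from typing import List

def find_dword(data: bytes, value: int) -> List[int]:
    """Return every offset at which the little-endian dword `value` occurs."""
    needle = struct.pack("<I", value)
    hits = []
    pos = data.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = data.find(needle, pos + 1)
    return hits

SOCKET_HASH = 0xD5AF7BF3  # hash of "socket" from the analysis above

# Toy blob: two nops, then C7 45 E8 <imm32> (mov dword ptr [ebp-18h], imm32).
blob = bytes.fromhex("9090C745E8F37BAFD590")
hits = find_dword(blob, SOCKET_HASH)
print([hex(h) for h in hits])
```

Each file offset returned still has to be mapped back to a virtual address to find the enclosing function.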
Doing some xrefs and combining this with our knowledge of the threads that main is creating, we have the following information about the threads that main creates:
- sub_482E70, uses the 0x18 byte object used by the anti-debug function, creates more threads and is a pain to reverse, probably not important
- sub_482D50, also uses the 0x18 byte anti-debug object, also probably unimportant
After that main calls sub_483A70, which takes argv[1] as the IP address, does network startup function calls like WSAStartup, socket, and bind, and puts the result into the 0x1CC byte object (not a real object because it doesn’t contain methods, probably just a struct). We will keep track of that object, calling it network_struct, and take note of the 2 threads that use that struct:
- sub_484310, which uses the network_struct and calls recvfrom, so this function is probably important and should be more deeply analyzed. We will call this the recv function
- sub_484680, which is huge, uses the mutex and semaphore in the network_struct, and then calls the function sub_483D40, which calls sendto, and sub_4867A0, which calls CryptDecrypt. This function looks very important. We will call this the crypt_and_sendto function
Traffic format
The recv function calls recvfrom into a 1500-byte buffer, then checks the following:
- buf[9] == 0x11
- There is some header at &buf[4 * (buf[0] & 0xF)]
  - *(word*)(header+4) != 0
  - ntohs(*(word*)(header+4)) is the size of the 8-byte header and the buffer afterwards.
  - (header+2) == '\x11\x04'
- buf[6] < 0
- After the headers there is some buf where
  - *(dword*)&buf[0] is the “mode”, explained in greater detail below in Modes
  - *(dword*)&buf[4] is the length
  - &buf[8] is the stuff after that, which gets checked in crypt_and_sendto.
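Read against the IPv4 and UDP header layouts, these checks say: the protocol field is 17 (UDP), the UDP destination port bytes are 11 04 (port 4356), and the top bit of byte 6 is set, which is the reserved IPv4 flag that the April Fools’ RFC 3514 calls the “evil bit”. The sketch below mirrors the checks in Python. It rests on several assumptions that the analysis above does not confirm: that mode and length are little-endian, that length counts only the bytes after the 8-byte mode/length prefix, and that checksums are not verified. parse_packet and build_packet are hypothetical names.

```python
import struct
from typing import Optional, Tuple

UDP_PROTO = 0x11
EXPECTED_PORT = 0x1104  # bytes 11 04 at header+2, i.e. destination port 4356

def parse_packet(buf: bytes) -> Optional[Tuple[int, bytes]]:
    """Apply the checks from the recv function; return (mode, data) or None."""
    if len(buf) < 20 or buf[9] != UDP_PROTO:   # IPv4 protocol must be UDP
        return None
    if not buf[6] & 0x80:                      # buf[6] < 0 as a signed char
        return None
    hdr = 4 * (buf[0] & 0xF)                   # IPv4 header length from IHL
    if len(buf) < hdr + 8:
        return None
    dst_port, udp_len = struct.unpack_from(">HH", buf, hdr + 2)
    if udp_len == 0 or dst_port != EXPECTED_PORT:
        return None
    payload = buf[hdr + 8:hdr + udp_len]       # udp_len includes the 8-byte header
    if len(payload) < 8:
        return None
    mode, length = struct.unpack_from("<II", payload)  # endianness assumed
    return mode, payload[8:8 + length]

def build_packet(mode: int, data: bytes) -> bytes:
    """Build a minimal packet that passes parse_packet (checksums left at 0)."""
    payload = struct.pack("<II", mode, len(data)) + data
    udp = struct.pack(">HHHH", 0x1234, EXPECTED_PORT, 8 + len(payload), 0) + payload
    ip = bytearray(20)
    ip[0] = 0x45                               # version 4, IHL 5
    struct.pack_into(">H", ip, 2, len(ip) + len(udp))
    ip[6] = 0x80                               # reserved flag bit set
    ip[9] = UDP_PROTO
    return bytes(ip) + udp

packet = build_packet(2, b"L0ve")
print(parse_packet(packet))
```

Since the checks look at the IP header, the receiving socket has to deliver raw packets, and a normal UDP client cannot set the reserved flag; sending real traffic would need a raw socket or a packet crafting tool.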
Struct fields
After analyzing both the recv and the crypt_and_sendto functions, we managed to identify several fields of network_struct.
Somehow, dword1B0, dword1B4, and dword1B8 are used to share data between the two threads, but it is very hard to tell how exactly because it does stuff like this
*(void **)(*(_DWORD *)(a_network_struct->dword1B0 + 4 * ((dword1B8 / 4) & (dword1B4 - 1))) + 4 * (dword1B8 % 4))
and this
((_BYTE)dword1B8 + (_BYTE)some_sync_number) % 4 == 0
&& dword1B4 <= (unsigned int)(some_sync_number / 4 + 1)
dword1B8 is updated to dword1B8 & (4 * dword1B4 - 1)
v30 = dword1B8 & (4 * dword1B4 - 1) + some_sync_number
4 * ((v30 >> 2) & (dword1B4 - 1)) + dword1B0 is some address
We ended up not figuring out how it worked exactly, but we know it’s a deque object somehow because the recv function calls sub_485010, which calls sub_4851B0, which has the message deque<T> too long, so this struct is probably part of some C++ deque implementation.
Nevertheless, we can roughly guess which variables are pointers to the received buffer by how they are used, so we don’t have to fully understand this struct.
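For readers who want a mental model anyway, the index arithmetic in those expressions is consistent with a block-based deque: a power-of-two sized map of pointers to fixed blocks of 4 elements, plus an offset of the first element and an element count. The Python sketch below models that arithmetic. The mapping of fields is a guess rather than something confirmed in the binary (dword1B0 as the map, dword1B4 as the map size, dword1B8 as the offset, some_sync_number as the element count), BlockDeque is a hypothetical name, and growth of the map is not modelled.

```python
BLOCK = 4  # elements per block, from the "/ 4" and "% 4" in the decompiled code

class BlockDeque:
    """Toy model of a block-based deque with a power-of-two sized block map."""

    def __init__(self, map_size: int = 8):
        assert map_size > 0 and map_size & (map_size - 1) == 0
        self.map = [None] * map_size  # guess: dword1B0
        self.map_size = map_size      # guess: dword1B4
        self.offset = 0               # guess: dword1B8
        self.size = 0                 # guess: some_sync_number

    def _slot(self, logical_index: int):
        pos = self.offset + logical_index
        # Same shape as map[(pos / 4) & (map_size - 1)][pos % 4].
        return (pos // BLOCK) & (self.map_size - 1), pos % BLOCK

    def push_back(self, item) -> None:
        if self.size >= BLOCK * self.map_size:
            raise OverflowError("map would need to grow (not modelled)")
        block, index = self._slot(self.size)
        if self.map[block] is None:
            self.map[block] = [None] * BLOCK
        self.map[block][index] = item
        self.size += 1

    def pop_front(self):
        if self.size == 0:
            raise IndexError("pop from empty deque")
        block, index = self._slot(0)
        item = self.map[block][index]
        # Same shape as offset & (4 * map_size - 1) in the decompiled code.
        self.offset = (self.offset + 1) & (BLOCK * self.map_size - 1)
        self.size -= 1
        return item

queue = BlockDeque(map_size=2)
for n in range(6):
    queue.push_back(n)
first_four = [queue.pop_front() for _ in range(4)]
for n in range(6, 11):
    queue.push_back(n)          # these wrap around the block map
rest = [queue.pop_front() for _ in range(queue.size)]
print(first_four, rest)
```

Under this reading, one thread would push received buffers at the back while the other pops them from the front, with the mutex and semaphore guarding the shared state.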
Modes
After we analyzed and found everything we could about the traffic format and the object struct in the recv function, we can move on to analyze the crypt_and_sendto function.
Mode 1
There is some checking of some “mode” at (code location) 0x4847A1, then if the mode is 1 it goes into sub_483FC0, which contains API calls like GetConsoleWindow, and it makes a call to the crypt function, decrypting 0x118836 bytes starting at 0x637330, using a 16 byte key at 0x74FBC8.
Although the output is useless, this very importantly confirms to us that the crypt function uses RC4.
Afterwards it calls the crypt function again on another 37 byte block at 0x74FBA0, with the same key, but that gives N3ver_G0nNa_g1ve_y0u_Up@flare-on.com, which we know to be a fake flag.
Mode 2
Mode 2 uses an ugly deobfuscation technique, the same one used to get fake flag 1.
Instead of analyzing and reimplementing it, we can just debug and set our EIP to that location, and run it, then see what comes out from the other side. This should decrypt 4 strings, L0ve, s3cret, 5Ex, and g0d.
Note: you have to run all the way to the end to make sure it’s fully decrypted. Our initial try was partial and we got c0d instead of g0d in the final string, which prevented us from getting the flag even though by then we had understood the entire decryption logic.
Afterwards it does some checks to make sure the received buffer matches one of the strings, then if so calls sub_4869F0 and puts the result (a DWORD) in one of the offsets of the 16-byte buffer at 0x751680 according to which string it matches, in big endian format.
There is a different hash function in sub_4869F0.
We ran it in the debugger by setting the IP to the start of this function, then running the first for loop, and realized that the generated array matches the CRC32 lookup table. Checking Wikipedia, we found that the algorithm matches too.
This means that it computes the CRC32 of each of the 4 strings, then concatenates the CRC32 values in big endian.
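For comparison, the textbook table-driven CRC32 looks like the sketch below: a first loop builds a 256-entry table from the reflected polynomial 0xEDB88320, and a second loop folds each input byte through it. This is the standard algorithm written out for reference, not a decompilation of sub_4869F0, and the names make_crc32_table and crc32 are our own. The result is cross-checked against Python’s zlib.

```python
import zlib

def make_crc32_table() -> list:
    """Build the 256-entry lookup table for the reflected polynomial 0xEDB88320."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table

TABLE = make_crc32_table()

def crc32(data: bytes) -> int:
    """Standard CRC32: initial value and final XOR are both 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF

for word in (b"L0ve", b"s3cret", b"5Ex", b"g0d"):
    matches = crc32(word) == zlib.crc32(word)
    print(word.decode(), f"{crc32(word):08x}", "matches zlib" if matches else "MISMATCH")
```

Comparing the first few table entries (0x00000000, 0x77073096, 0xEE0E612C, ...) with the array seen in the debugger is a quick way to recognise the algorithm.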
Mode 3
Mode 3 takes the CRC32 buffer generated by Mode 2 and uses it as the key for the crypt function, which we know does RC4 decryption, to decrypt the 39 byte string at 0xD0FB68. It then calls the sendto function, but we didn’t analyze that.
Final Solver Script
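The solver computes the CRC32 of each of the 4 strings from Mode 2, concatenates the values in big endian to form the 16-byte RC4 key, and decrypts the 39 byte string from Mode 3 with it.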
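A minimal sketch of such a solver in Python follows. It makes two assumptions worth flagging: the four CRC32 values are placed in the key in the order L0ve, s3cret, 5Ex, g0d, and the strings are hashed without a trailing null byte. The 39 encrypted bytes are not included here, so the demo only round-trips a placeholder; to recover the flag, dump the real bytes from the binary and pass them to rc4 with the derived key. rc4 is a textbook implementation and derive_key is a hypothetical helper.

```python
import struct
import zlib

def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4; encryption and decryption are the same operation."""
    s = list(range(256))
    j = 0
    for i in range(256):                      # key scheduling
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:                         # keystream generation and XOR
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)

WORDS = [b"L0ve", b"s3cret", b"5Ex", b"g0d"]  # order is an assumption

def derive_key(words=WORDS) -> bytes:
    """Concatenate the CRC32 of each word as a big-endian dword (16 bytes total)."""
    return b"".join(struct.pack(">I", zlib.crc32(w) & 0xFFFFFFFF) for w in words)

key = derive_key()
placeholder = b"replace with the 39 encrypted bytes"   # NOT the real ciphertext
roundtrip = rc4(key, rc4(key, placeholder))
print(key.hex(), roundtrip == placeholder)
```

If the decrypted output is not printable, the first things to revisit are the word order and whether every string was fully deobfuscated, as with the c0d versus g0d mistake mentioned earlier.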
Flag
n0_mOr3_eXcEpti0n$_p1ea$e@flare-on.com