Chapter Six focused on code constructs and how analysts can easily identify them when walking through the disassembly in IDA. Let’s take a look at the exercises now.

Exercise 1

Hash  Name
6abde2f83015f066385d27cff6143c44  Lab06-01.exe
536e6f91d4515e30af7afd37f22c213fee152126  Lab06-01.exe
fe30f280b1d0a5e9cef3324c2e8677f55a6202599d489170ece125f3cd843a03  Lab06-01.exe

Question Number 1: What is the major code construct found in the only subroutine called by main?

Let’s get to work. The start function is located at 00401090. It’s best not to get lost in the startup code IDA disassembles here. With some quick tracing, we can identify the main function, which is located at 00401040.

main in turn calls a function at 00401000. That function contains an if construct that branches on the return value of the API call to InternetGetConnectedState.

Function referencing InternetGetConnectedState

Based on the connection’s status, a string offset is pushed to the stack and the function sub_40105F is called.

Question Number 2: What is the subroutine located at 0x40105F?

Sadly, I was unable to answer this question correctly. After hours of tracking the arguments through the stack and stepping through function calls trying to understand it, I gave in. My initial suspicion was indeed that the function would somehow take the string offset pushed before the call (the one chosen based on the output of InternetGetConnectedState) and print it or write it to a file. I saw a few WriteFile calls as well, but the sheer size of the disassembly made it quite difficult to identify the purpose of the function.

The Wild Disassembly

Well, the function was printf. The authors explained that a string offset being pushed to the stack right before a function call is a pretty good indicator that the function could be printing the string. Sadly, IDA didn’t recognize the function as printf, and hours were lost trying to reverse a simple library function.
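With the printf answer in hand, the subroutine at 00401000 can be sketched in C roughly as follows. This is a portable stand-in and not the sample’s source: the real code calls InternetGetConnectedState from WinINet, which I’ve replaced with a plain int parameter, and the names status_message and check_connection, the message texts, and the 1/0 return value are all my own assumptions.

```c
#include <stdio.h>

/* Hypothetical helper: picks the string whose offset would be pushed
 * before the call to sub_40105F. `connected` stands in for the return
 * value of InternetGetConnectedState (nonzero means connected). */
const char *status_message(int connected)
{
    if (connected) {
        return "connected\n";      /* placeholder text, not from the binary */
    } else {
        return "not connected\n";  /* placeholder text, not from the binary */
    }
}

/* Assumed shape of the subroutine: choose a string in an if/else,
 * print it, and hand a 1/0 result back to the caller. */
int check_connection(int connected)
{
    printf("%s", status_message(connected));
    return connected ? 1 : 0;
}
```

The if/else is the part that shows up in IDA as a compare followed by a conditional jump, with each branch pushing a different string offset before the same call.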

Question Number 3: What is the purpose of this program?

Using basic static analysis, we can identify that the program:

  • Identifies the active internet connection status of the computer (and prints it as we’ve seen in the function disassembly above)

Referenced Libraries

Strings

It could potentially be used by other malware to check the connection status of a computer. I’ll avoid digging into this binary any further (primarily due to my lack of interest after the blow from question number 2, haha).

Exercise 2

Hash  Name
c0b54534e188e1392f28d17faff3d454  Lab06-02.exe
bb6f01b1fef74a9cfc83ec2303d14f492a671f3c  Lab06-02.exe
b71777edbf21167c96d20ff803cbcb25d24b94b3652db2f286dcd6efd3d8416a  Lab06-02.exe

Question Number 1: What operation does the first subroutine called by main perform?

The disassembly points to the start function at 004011B0. We can find the main function at 00401130. However, IDA doesn’t label the function properly. If we rename it to main, the arguments should be adjusted automatically.

Main Function

The first function called appears to be sub_401000, which is the same function as in Lab 6-1. It checks whether the system has an active internet connection or not.

Check Internet Status

Question Number 2: What is the subroutine located at 0x40117F?

Heh, the authors sure do love to challenge us. Yes, this time I can successfully understand that this function is indeed a printf call. Why?

  • Arguments before the function call are string offsets (with line endings suggesting they might be printed to console or a file)
  • Format characters like %d or %c
  • Code constructs similar to what we saw in the last exercise

Printf Function

Question Number 3: What does the second subroutine called by main do?

The second subroutine called by main is at the address 00401040.

It appears to establish a connection to practicalmalwareanalysis.com to access the file cc.htm and reads its content 200h (512) bytes at a time.

Establish Connection to URL

The parsing works like this: the program compares the first few characters of the array (content read from the top of the webpage and currently stored in Buffer) against the opening marker of an HTML comment ('<!--'). If a comment is successfully parsed, the following character is stored in a register and treated as the command to be executed.

Parsing an HTML Comment

Question Number 4: What type of code construct is used in this subroutine?

If it’s the second subroutine in question, it uses a series of if-else-if conditionals. After the call to InternetReadFile downloads the HTML file, the first four elements of the buffer (where the file’s content is temporarily placed) go through a series of comparisons to identify a comment. The fifth character is the actual command the program uses to continue execution.
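Here’s a minimal C sketch of that comparison chain, written to mirror the one-byte-at-a-time checks seen in the disassembly. The function name parse_command, the explicit length parameter, and the convention of returning 0 on failure are assumptions of mine, not something recovered from the binary.

```c
#include <stddef.h>

/* Returns the character that follows "<!--" at the start of buf, or 0
 * if the buffer does not begin with an HTML comment. Hypothetical
 * helper; the 0-on-failure convention is an assumption. */
char parse_command(const char *buf, size_t len)
{
    if (len < 5) {
        return 0;               /* not enough bytes for "<!--" plus a command */
    }
    if (buf[0] != '<') {
        return 0;
    } else if (buf[1] != '!') {
        return 0;
    } else if (buf[2] != '-') {
        return 0;
    } else if (buf[3] != '-') {
        return 0;
    }
    return buf[4];              /* fifth byte is treated as the command */
}
```

Each branch compiles down to a single-byte compare and a conditional jump, which is the staircase pattern of cmp/jnz pairs visible in the function.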

PS: I learned how to fix stack variables and change their types from this exercise’s solutions. By default, the comparisons don’t show that the variables being compared are just successive one-byte offsets into the Buffer character array. We can fix this by setting the type of Buffer to a character array of size 200h, which is 512 in decimal (the number of bytes to read is pushed to the stack before the call, so we know Buffer is 512 bytes in size). Once that’s done, IDA does its magic and indexes into the buffer array properly instead of inventing a separate variable for each offset.

Buffer as Character Array of Size 512

Question Number 5: Are there any network-based indicators for this program?

The command acquisition function called by main makes several network calls, from which we can pull the following indicators:

  • C2 domain: practicalmalwareanalysis.com
  • Filename: cc.htm

Question Number 6: What is the purpose of this malware?

The malware can be used to check the connection status of the compromised system as well as acquire commands from the C2 server (based on the provided URL and the file therein) and display them to the console.

Exercise 3

Hash  Name
3f8e2b945deba235fa4888682bd0d640  Lab06-03.exe
d4e234ec4baf7d12dd59c3a9238326819a509a31  Lab06-03.exe
75eb05679a0a988dddf8badfc6d5996cc7e372c73e1023dde59efbaab6ece655  Lab06-03.exe

Question Number 1: Compare the calls in main to Lab 6-2’s main method. What is the new function called from main?

There’s just one additional function call in main here. It’s located at address 00401130.

Main Function

Question Number 2: What parameters does this new function take?

It takes two parameters:

  • lpExistingFilename (which, if we backtrack, is the first element of the argv array, i.e. the program’s own name)
  • Character (the command character returned by the HTML-parsing function we analyzed in the last exercise, used here to select the switch case)

Question Number 3: What major code construct does this function contain?

The major code construct in this function is a switch statement, implemented with a jump table.

Switch Cases

Question Number 4: What can this function do?

Depending on the character input provided to the function, the switch can:

  • Create a directory
  • Copy a file
  • Delete a file
  • Set a Registry key value
  • Sleep (100)
  • Default Case: Print an error
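To make the construct concrete, here’s a hedged C sketch of a switch with the same shape. The enum, the function name dispatch, and the case labels 'a' through 'e' are assumptions for illustration, and the real function calls Windows APIs where this one only returns an action code.

```c
/* Hypothetical action codes, one per behavior listed above. */
enum action {
    ACT_CREATE_DIR,
    ACT_COPY_FILE,
    ACT_DELETE_FILE,
    ACT_SET_REG_VALUE,
    ACT_SLEEP,
    ACT_ERROR
};

/* Maps a command character to an action. The letters are assumed;
 * what matters is that the case values are consecutive. */
enum action dispatch(char command)
{
    switch (command) {
    case 'a': return ACT_CREATE_DIR;
    case 'b': return ACT_COPY_FILE;
    case 'c': return ACT_DELETE_FILE;
    case 'd': return ACT_SET_REG_VALUE;
    case 'e': return ACT_SLEEP;
    default:  return ACT_ERROR;   /* default case: print an error */
    }
}
```

When case values are dense and consecutive like this, compilers commonly subtract the lowest value, bounds-check the result, and jump through a table of addresses, which is what IDA renders as a jump table.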

Question Number 5: Are there any host-based indicators for this malware?

The function with the switch has several host-based indicators which we can use to drive detections. They’re listed below:

  • Directory: C:\Temp\
  • Filename: CC.exe
  • Registry Key: Software\Microsoft\Windows\CurrentVersion\Run
  • Registry Value Name: Malware

Question Number 6: What is the purpose of this malware?

The purpose of this malware is to check for an active network connection, download an HTML file, and parse a comment from it. Then, based on the command from the server, it will create a directory, copy the malware, delete it, add it to the Run registry key for persistence (which is how malware re-executes when the system is rebooted and the user logs in), or sleep.

Exercise 4

Hash  Name
21be74dfafdacaaab1c8d836e2186a69  Lab06-04.exe
5b0afb3069346a8e00b3786af0908783a5f304b4  Lab06-04.exe
cce96e5cb884c565c75960c41f53a7b56cef1a3ff5b9893cd81c390fd0c35ef3  Lab06-04.exe

Question Number 1: What is the difference between the calls made from the main method in Labs 6-3 and 6-4?

First things first: let’s apply the renames and type fixes we learned in the previous labs so we don’t repeat our analysis.

Structural changes can be noticed in the main function. Here’s a list of addresses of functions called from inside of main:

  • 0x00401000 (Checks internet connection status)
  • 0x00401040 (Connect to internet, acquire HTML, parse the command from comment)
  • 0x004012B5 (Printf)
  • 0x00401150 (Switch + Jump-tables)

Question Number 2: What new code construct has been added to main?

A for loop has been added to the main function. It repeats the network call procedure so that a new command is acquired and handed to the switch on every iteration.
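A rough C skeleton of that loop is below. The callback types and the names run_loop, demo_fetch, and demo_handle are hypothetical stand-ins for the routines at 0x00401040 and 0x00401150, and the printf and sleep steps are left out to keep the sketch portable.

```c
/* Hypothetical callback types standing in for the fetch-and-parse
 * routine and the switch routine. */
typedef char (*fetch_fn)(int iteration);
typedef void (*handle_fn)(char command);

/* Runs the fetch/handle pair a fixed number of times and returns how
 * many iterations completed. The loop counter is passed to fetch. */
int run_loop(int iterations, fetch_fn fetch, handle_fn handle)
{
    int i;
    for (i = 0; i < iterations; i++) {
        char command = fetch(i);
        handle(command);
    }
    return i;
}

/* Demo stubs so the sketch can be exercised without a network. */
static int handled_count = 0;
static char demo_fetch(int iteration) { (void)iteration; return 'a'; }
static void demo_handle(char command) { (void)command; handled_count++; }
```

In the disassembly the same three pieces appear as separate blocks: an initialization of the counter, a compare against the limit, and an increment that jumps back to the compare.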

For Loop

Question Number 3: What is the difference between this lab’s parse HTML function and those of the previous labs?

It appears that the user-agent string this time contains a format specifier, %d (used for decimal integers), to perhaps “do” something to the string. After another hour of analysis, it finally clicked that the format specifier might be pointing to something here. Here’s my thought process:

  • %d suggests a format character was taken in; could it be printing the value?
  • printf is already labeled, what function could it be?

Turns out, sprintf is yet another C function which takes a format string but behaves a tad differently than printf. Rather than printing to the console or standard output, the function builds the formatted string and stores it in a buffer. In our case, I’ve changed the function name and labeled the parameters as they should be:

sprintf

Question Number 4: How long will this program run? (Assume that it is connected to the Internet.)

To answer this one, let’s go back to the main function. We see a loop whose counter is initialized to 0 and runs until it reaches 1440. At the end of each iteration, a Sleep call follows with the time set to 60 seconds, so every iteration takes about a minute. So, the final time is 1440 minutes, which is equivalent to 24 hours.
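The arithmetic is easy to mix up between minutes and hours, so here it is spelled out as a tiny C helper. The function name is my own, and it ignores the time spent on the network calls themselves.

```c
/* Total sleep time in seconds for a loop that sleeps once per
 * iteration. Hypothetical helper for checking the arithmetic. */
long total_runtime_seconds(long iterations, long sleep_seconds)
{
    return iterations * sleep_seconds;
}
```

With 1440 iterations and 60 seconds per sleep, this gives 86400 seconds, and 86400 / 3600 is 24 hours.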

Sleep Function

Question Number 5: Are there any new network-based indicators for this malware?

The C2 URL and filename indicators are the same. The only difference here is the new user-agent string, which is generated at run time.

User-agent String: Internet Explorer 7.50/pma%d (%d is the loop counter, i.e. the number of minutes the program has been running)
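As a sketch of how that string gets built, the snippet below formats the loop counter into the user-agent. Note that I’m using snprintf rather than the sample’s sprintf so the write is bounded, the function name build_user_agent is hypothetical, and the exact spacing inside the format string is an assumption on my part.

```c
#include <stdio.h>

/* Writes the user-agent for the given loop counter into out and
 * returns the number of characters snprintf reports. Uses snprintf
 * (bounded) in place of the sprintf seen in the binary. */
int build_user_agent(char *out, size_t out_size, int minute)
{
    /* Format string as quoted in this write-up; spacing is assumed. */
    return snprintf(out, out_size, "Internet Explorer 7.50/pma%d", minute);
}
```

Since the counter changes on every iteration, a detection rule is better off matching the fixed prefix than any single full user-agent value.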

Question Number 6: What is the purpose of this malware?

Now that we’re at the final question of this entire chapter, let’s summarize it.

The malware first checks for an active network connection. If it finds one, it prints the result to the console and then connects to the specified address, pulls the HTML file, parses a comment, and uses the character from within the comment to execute one of several commands through a switch table. What’s unique here is that the user-agent string used for the connection is generated dynamically from the minute of execution rather than being a static string. The malware runs for a total of 24 hours and exits soon after. It’s likely going to continue the infection using the malware it might’ve copied/downloaded to disk.