Over the past few months I have interviewed at many antivirus and other security companies, small and large, for malware analysis and security researcher positions. I have noticed that, unlike programming interviews, the questions asked here are repetitive or closely related to one another. But I haven't been able to find any online resource that covers all of these topics. Hence, I decided to write about the questions that are most commonly asked in interviews for malware/threat researcher positions.
I have tried to answer these questions in as much depth as possible, so if you want to get familiar with a particular topic of malware analysis, you can refer to the related questions here. I am also going to add useful external resources that you can use to learn more about each topic.
One thing I want to mention: since I mostly apply for positions ranging from entry level to a few years of experience, these questions may not be helpful if you are applying for more senior positions. That said, I have added some advanced questions and a few questions of my own that I think interviewers should ask.
After every interview I write down the new questions I faced, and over time I have collected a large stack of questions on different topics, many of which have been asked several times. I regularly face some foolish or irrelevant questions too, which I am also going to add here because you may face them someday. I will also try to update this list regularly so that new questions get added from time to time.
Portable Executable (PE) Header
I think more than 50% of the questions in an interview process are related to the PE header. So, you can easily dominate an interview if you have in-depth knowledge of the PE file header. PE is the Windows executable file format. Like any other file format, it has a header at the start of the file that stores details about the file, known as the PE header.
Resources to learn PE Header
To my knowledge, the best place to learn about the PE header is the Life of Binaries course on the opensecuritytraining.info site.
You can also use the official Windows documentation to get familiar with it.
Relevant Questions
1) Can you give an overview of the PE header?
Variant: Can you explain the PE header?
In questions like this, the interviewer does not expect you to describe every field of the PE header, only the key structures, their use, and maybe a few important fields from each structure. How deep you go is up to you. Here is how I generally answer.
Answer:
First, you have to remember all the header structures inside a PE file.
- DOS Header (_IMAGE_DOS_HEADER)
- NT Header (_IMAGE_NT_HEADERS)
- File Header (_IMAGE_FILE_HEADER) (Inside NT header)
- Optional Header (_IMAGE_OPTIONAL_HEADER) (Inside NT header)
- Section Headers (_IMAGE_SECTION_HEADER) (one for each section)
Short answer:
First comes the DOS header, followed by the NT header. Two other headers are embedded inside the NT header: the File header and the Optional header. The Optional header ends with an array of 16 structures called the data directory. Right after that come the section headers, one for each section present in the PE image.
Long answer:
First comes the DOS header, which contains the e_magic field (set to 0x5A4D, the ASCII characters "MZ"), the e_lfanew field (the file offset of the NT header) and a few other fields that are rarely useful. Then comes the NT header, whose first field is Signature, set to 0x00004550 ("PE" followed by two null bytes). Two headers are embedded inside it. The first is the File header, containing important fields like the number of sections, the file characteristics and more. The second is the Optional header, which has important fields like ImageBase, AddressOfEntryPoint, etc. The last entry in the Optional header is an array of 16 structures. Each structure points to a specific directory required by the loader, like the Debug directory, the Import directory, etc. Finally, there is one section header for each section, such as .text, .rdata and .bss.
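To make the layout described above concrete, here is a minimal sketch in Python that walks from the DOS header to the File header and Optional header. It is not a full parser: the helper names (`build_minimal_headers`, `parse_headers`) are made up for illustration, and the input is a tiny synthetic header blob with assumed values, not a loadable PE file.

```python
import struct

def build_minimal_headers() -> bytes:
    """Build a synthetic header blob for illustration (values are assumptions)."""
    e_lfanew = 0x40
    # 64-byte DOS header: e_magic at offset 0, e_lfanew at offset 0x3C.
    dos = b"MZ" + b"\x00" * 58 + struct.pack("<I", e_lfanew)
    signature = b"PE\x00\x00"
    # File header: Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
    # NumberOfSymbols, SizeOfOptionalHeader, Characteristics (20 bytes total).
    file_header = struct.pack("<HHIIIHH", 0x014C, 3, 0, 0, 0, 0xE0, 0x0102)
    # Start of a PE32 Optional header: Magic, linker versions, SizeOfCode,
    # SizeOfInitializedData, SizeOfUninitializedData, AddressOfEntryPoint,
    # BaseOfCode, BaseOfData, ImageBase.
    optional = struct.pack("<HBBIIIIIII", 0x010B, 14, 0, 0, 0, 0,
                           0x1000, 0x1000, 0x2000, 0x400000)
    return dos + signature + file_header + optional

def parse_headers(data: bytes) -> dict:
    """Read a few key fields, following DOS header -> NT header -> File/Optional header."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ magic in DOS header")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature in NT header")
    fh_off = e_lfanew + 4            # File header follows the 4-byte Signature
    machine, n_sections, _, _, _, opt_size, characteristics = struct.unpack_from(
        "<HHIIIHH", data, fh_off)
    opt_off = fh_off + 20            # Optional header follows the 20-byte File header
    (magic,) = struct.unpack_from("<H", data, opt_off)
    (entry_rva,) = struct.unpack_from("<I", data, opt_off + 16)
    if magic == 0x010B:              # PE32: 4-byte ImageBase at offset 28
        (image_base,) = struct.unpack_from("<I", data, opt_off + 28)
    elif magic == 0x020B:            # PE32+: 8-byte ImageBase at offset 24
        (image_base,) = struct.unpack_from("<Q", data, opt_off + 24)
    else:
        raise ValueError("unexpected Optional header magic")
    return {
        "machine": machine,
        "number_of_sections": n_sections,
        "size_of_optional_header": opt_size,
        "characteristics": characteristics,
        "magic": magic,
        "address_of_entry_point": entry_rva,
        "image_base": image_base,
    }

headers = parse_headers(build_minimal_headers())
print({name: hex(value) for name, value in headers.items()})
```

For real files you would read the bytes from disk instead of building them by hand; the offsets used here are the standard ones for the fields mentioned in the answer.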
2) In which header can you find the AddressOfEntryPoint field?
Variant: In which header is field XYZ present?
These types of questions are something you simply need to memorize.
Answer:
AddressOfEntryPoint is present in the Optional header.
Cheat: If your interview is over the phone or a video call, you can use this cheat sheet for reference. It is also very helpful as a quick reference in day-to-day work.
3) What hex values can the "Magic" field in the Optional header be set to?
Answer:
The Magic field in the Optional header shows whether the file is a PE32 (32-bit) or PE32+ (64-bit) image. Its value is set to 0x010B for PE32 and 0x020B for PE32+ (often loosely called PE64).
It's not necessary to remember these values, but they may be asked sometimes. So it's better to remember the values of only the important fields, like the one above, Machine in the File header and e_magic in the DOS header.
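As a quick memory aid for the values mentioned above, here is a small Python sketch. The lookup tables and the `describe_magic` helper are illustrative names of my own; the two Machine values shown (x86 and x64) are just the most common ones, not a complete list.

```python
import struct

# e_magic in the DOS header: the bytes "MZ" read as a little-endian WORD.
E_MAGIC = struct.unpack("<H", b"MZ")[0]

# Magic in the Optional header.
OPTIONAL_MAGIC = {
    0x010B: "PE32 (32-bit)",
    0x020B: "PE32+ (64-bit)",
}

# Machine in the File header (two common values only).
MACHINE = {
    0x014C: "IMAGE_FILE_MACHINE_I386 (x86)",
    0x8664: "IMAGE_FILE_MACHINE_AMD64 (x64)",
}

def describe_magic(magic: int) -> str:
    """Map the Optional header Magic value to a readable format name."""
    return OPTIONAL_MAGIC.get(magic, "unknown")

print(hex(E_MAGIC))            # 0x5a4d
print(describe_magic(0x010B))  # PE32 (32-bit)
print(describe_magic(0x020B))  # PE32+ (64-bit)
```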
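4) How do you determine the total size of the headers on disk?
Variant: In which header can the SizeOfXYZ field be found?
Most size-related fields of the image (SizeOfCode, SizeOfImage, SizeOfHeaders and so on) are present in the Optional header. The exception is SizeOfOptionalHeader, which is present in the File header.
Answer:
The SizeOfHeaders field in the Optional header gives the total size of the headers on disk: the DOS stub, the PE header and the section headers combined, rounded up to a multiple of FileAlignment.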
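5) How many sections are possible in a PE file?
Answer:
The NumberOfSections field in the File header gives the number of sections in the file. Since it is a WORD value (2 bytes), it can hold any value from 0 to 65,535. Note that Microsoft's PE format documentation states that the Windows loader limits the number of sections to 96.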
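6) How do you know whether a PE file is an executable or a DLL?
Answer:
The Characteristics field in the File header is a set of bit flags that describe the image. The IMAGE_FILE_DLL flag is set if the file is a DLL (dynamic-link library). The IMAGE_FILE_EXECUTABLE_IMAGE flag only indicates that the file is a valid image that can be run, and it is set for DLLs as well. So a file with IMAGE_FILE_EXECUTABLE_IMAGE set and IMAGE_FILE_DLL clear is a regular executable.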
There can be more questions about the Characteristics field in the File header as well as the DllCharacteristics field in the Optional header. A few important flags to remember are:
IMAGE_FILE_LARGE_ADDRESS_AWARE - The application can handle addresses larger than 2 GB.
IMAGE_FILE_DEBUG_STRIPPED - Debugging information has been removed from the image file.
For DllCharacteristics in the Optional header:
IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE - The image can be relocated at load time, i.e. it supports ASLR.
IMAGE_DLLCHARACTERISTICS_NX_COMPAT - The image is compatible with DEP (Data Execution Prevention), so data regions such as the stack are not executable.
IMAGE_DLLCHARACTERISTICS_NO_SEH - The image does not use structured exception handling; no SE handler may be called in this image.
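Since these fields are bit masks, checking a flag is just a bitwise AND. Here is a small Python sketch of decoding the flags discussed above. The flag values come from the Windows headers; the helper names (`decode_flags`, `is_dll`) and the sample values are my own for illustration.

```python
# File header Characteristics flags (subset).
FILE_CHARACTERISTICS = {
    0x0002: "IMAGE_FILE_EXECUTABLE_IMAGE",
    0x0020: "IMAGE_FILE_LARGE_ADDRESS_AWARE",
    0x0200: "IMAGE_FILE_DEBUG_STRIPPED",
    0x2000: "IMAGE_FILE_DLL",
}

# Optional header DllCharacteristics flags (subset).
DLL_CHARACTERISTICS = {
    0x0040: "IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE",
    0x0100: "IMAGE_DLLCHARACTERISTICS_NX_COMPAT",
    0x0400: "IMAGE_DLLCHARACTERISTICS_NO_SEH",
}

def decode_flags(value: int, table: dict) -> list:
    """Return the names of all flags from `table` that are set in `value`."""
    return [name for bit, name in table.items() if value & bit]

def is_dll(characteristics: int) -> bool:
    """A PE file is a DLL when IMAGE_FILE_DLL (0x2000) is set."""
    return bool(characteristics & 0x2000)

# Sample values, chosen by hand for this example.
print(decode_flags(0x2022, FILE_CHARACTERISTICS))
print(is_dll(0x2022), is_dll(0x0102))
print(decode_flags(0x0140, DLL_CHARACTERISTICS))
```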
7) What is the difference between RVA (relative virtual address) and AVA (absolute virtual address)?
Variant: What will the RVA be if ImageBase is 0x400000 and the VA is 0x400100?
Answer:
The AVA (usually just called the VA) is the actual address in the process's virtual memory, whereas the RVA is the address relative to the ImageBase. As a calculation:
RVA = AVA - ImageBase
So for AVA = 0x400100 and ImageBase = 0x400000, the RVA will be 0x100.
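The same calculation as a tiny Python sketch, using the numbers from the variant question. The function names are my own.

```python
def va_to_rva(va: int, image_base: int) -> int:
    """RVA = VA - ImageBase."""
    if va < image_base:
        raise ValueError("VA lies below the image base")
    return va - image_base

def rva_to_va(rva: int, image_base: int) -> int:
    """VA = ImageBase + RVA (the inverse operation)."""
    return image_base + rva

print(hex(va_to_rva(0x400100, 0x400000)))  # 0x100
print(hex(rva_to_va(0x100, 0x400000)))     # 0x400100
```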
8) What is the Import Address Table (IAT) used for?
Answer:
The IAT holds the addresses of the functions that the image imports from other DLLs. Calls to imported functions go through the IAT entries, which the loader fills in when the image is loaded.
Questions related to the IAT are asked a lot. Even for your malware analysis career, you must be familiar with imports and exports. More on this in the next few questions.
9) What is the difference between the IAT and the INT (Import Names Table)?
Answer:
In OptionalHeader.DataDirectory (an array of 16 entries), the second entry (index 1) points to an array of _IMAGE_IMPORT_DESCRIPTOR structures.
There is one _IMAGE_IMPORT_DESCRIPTOR for each DLL that the image imports from. At offset 0x0C in the import descriptor, the Name field holds the RVA of the DLL name (e.g. kernel32.dll).
In the same structure, OriginalFirstThunk points to the Import Names Table and FirstThunk points to the IAT. Both tables are arrays of _IMAGE_THUNK_DATA. Each INT entry holds the RVA of an _IMAGE_IMPORT_BY_NAME structure (a hint plus the function name), or an ordinal for functions imported by ordinal. Once the image is loaded, each IAT entry holds the address of the corresponding function.
So, as an answer, you need to say that the INT points to the names of the functions, whereas the IAT holds the addresses of the functions in memory.
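10) Why do we have two different import tables (IAT and INT) when they both point to the same structures on disk?
Answer:
On disk, before the image is loaded into memory, the addresses of the imported functions are not known yet, so the IAT entries contain the same values as the INT entries, i.e. they refer to the same _IMAGE_IMPORT_BY_NAME structures.
When the image is loaded into memory, the loader resolves the address of each function using the INT entries and overwrites the corresponding IAT entry with that address. The INT stays unchanged, so the function names remain available.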
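The before/after behaviour can be illustrated with a deliberately simplified Python model. This is only a toy: function names stand in for the RVAs of the name structures, and `load_image` and the `exports` dictionary with its addresses are hypothetical, not how the Windows loader is implemented.

```python
def load_image(int_entries: list, exports: dict) -> tuple:
    """Toy model of import resolution.

    int_entries: function names (standing in for INT entries).
    exports: hypothetical map of function name -> address in the loaded DLL.
    Returns (IAT as stored on disk, IAT after loading).
    """
    # On disk the IAT mirrors the INT: both refer to the same names.
    iat_on_disk = list(int_entries)
    # At load time each IAT slot is overwritten with the resolved address,
    # looked up through the matching INT entry. The INT itself is untouched.
    iat_in_memory = [exports[name] for name in int_entries]
    return iat_on_disk, iat_in_memory

# Hypothetical addresses, for illustration only.
kernel32_exports = {"CreateFileW": 0x76001000, "ReadFile": 0x76002000}
import_names = ["CreateFileW", "ReadFile"]

on_disk, in_memory = load_image(import_names, kernel32_exports)
print(on_disk)
print([hex(address) for address in in_memory])
```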
11) What is a TLS callback?
Variant: Suppose you find malware doing malicious activity before its main function (or entry point) runs. Where will you look for its code in the disassembler?
Answer:
TLS callbacks are functions whose addresses are listed in the array referenced by the AddressOfCallBacks field of the TLS directory (the TLS data is generally stored in the .tls section). They are executed when a process or thread is started or stopped. Since the Windows loader runs them while it sets up the first thread of the process, the code in a TLS callback runs even before the program reaches its entry point.
Malware uses these callbacks to hide malicious code or anti-debugging checks. This confuses malware analysts while debugging, since the debugger first breaks at the entry point but the malicious code has already executed.
12) What is the difference between the import table and the export table?
Answer:
You already know about the import table: it lists the functions that the image needs from other DLLs.
The export table contains details about the functions that the image exports for use by other programs.
13) What information does a .pdb file contain?
Variant: How does the disassembler know the names of functions and variables when an image is loaded?
Answer:
A .pdb file contains the debugging information of a program on Windows. This debugging information includes the symbols for variables and functions.
In the Debug Directory (_IMAGE_DEBUG_DIRECTORY), the AddressOfRawData and PointerToRawData fields point to the debug information (as an RVA and a file offset, respectively).
For the CodeView debug type, the path of the .pdb file associated with the image is stored at a particular offset in that debug data.
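14) What is the difference between SizeOfRawData and VirtualSize in a section header?
Answer:
VirtualSize is the total size of a section when it is loaded into memory, whereas SizeOfRawData is the size of the section's data on disk (a multiple of FileAlignment). If VirtualSize is larger than SizeOfRawData, the remainder of the section is zero-filled in memory, which is typical for sections holding uninitialized data.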
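A simplified Python model of how the two sizes interact when a section is mapped. The `map_section` helper is a made-up name, and the model ignores page granularity and alignment, so treat it as an approximation of the idea rather than the loader's exact behaviour.

```python
def map_section(raw_data: bytes, virtual_size: int) -> bytes:
    """Simplified in-memory view of a section.

    raw_data: the SizeOfRawData bytes stored on disk.
    virtual_size: the section's VirtualSize.
    Assumption: bytes beyond VirtualSize are ignored, and any shortfall
    is zero-filled (the documented behaviour when VirtualSize is larger).
    """
    data = raw_data[:virtual_size]
    return data + b"\x00" * (virtual_size - len(data))

# VirtualSize > SizeOfRawData: the tail is zero-filled.
print(map_section(b"\x01\x02", 4))
# VirtualSize < SizeOfRawData: only the first VirtualSize bytes count here.
print(map_section(b"\x01\x02\x03\x04", 3))
```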
15) What is the use of the .reloc section?
Answer:
The .reloc section contains relocation information: the list of locations holding hard-coded addresses that assume the image was loaded at its preferred base address (defined by ImageBase). If the image is loaded at a different address, the loader uses this list to adjust those addresses.
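The fix-up itself is simple arithmetic: add the difference between the actual and the preferred base to every hard-coded address. Below is a minimal Python sketch for 32-bit addresses. The function name and the hand-built image buffer are my own, and parsing the real relocation blocks in .reloc is left out; the offsets to patch are passed in directly.

```python
import struct

def apply_relocations(image: bytearray, fixup_offsets: list,
                      preferred_base: int, actual_base: int) -> bytearray:
    """Patch each 32-bit address at the given offsets by the load delta."""
    delta = actual_base - preferred_base
    for offset in fixup_offsets:
        (value,) = struct.unpack_from("<I", image, offset)
        # Keep the result within 32 bits, as the stored address is a DWORD.
        struct.pack_into("<I", image, offset, (value + delta) & 0xFFFFFFFF)
    return image

# Hand-built example: one hard-coded address, 0x401000, stored at offset 4.
image = bytearray(8)
struct.pack_into("<I", image, 4, 0x401000)

apply_relocations(image, [4], preferred_base=0x400000, actual_base=0x10000000)
print(hex(struct.unpack_from("<I", image, 4)[0]))  # 0x10001000
```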
16) Can the .rsrc section have executable permission?
Answer:
The permissions of a section are defined in the Characteristics field of its section header. So any section's permissions can be changed to make it executable.
Interviewers ask specifically about the .rsrc section because lots of malware (e.g. Stuxnet) embeds its code or a whole binary inside the .rsrc section, even though .rsrc is meant to store icons and other resources.
Let's look at a few miscellaneous PE header related questions.
17) What tool do you usually use to view the PE header?
Answer:
I personally use CFF Explorer and PEView, but your favorites may differ.
18) What does MZ stand for in a PE file?
Answer:
'MZ' stands for Mark Zbikowski, a Microsoft developer who designed the MS-DOS executable file format.
This is the end of the questions related to the PE header. In the next part we will look at assembly language related questions.