Introducing MIDNIGHTTRAIN - A Covert Stage-3 Persistence Framework weaponizing UEFI variables
It is available here: https://github.com/slaeryan/MIDNIGHTTRAIN
Fair warning: This was made as a small weekend project and has received limited testing, so bugs are to be expected. Furthermore, I’m not a professional coder, so get ready to read crappy code! However, I am willing to fix bugs in my spare time, so if you find any, or have improvements to the code, don’t be shy to hit me up!
One of my favourite pastimes is to read APT reports and look for interesting TTPs (Tactics, Techniques and Procedures) used by apex adversaries (read as: state-backed threat groups), in an attempt to later recreate them, or at least a variant of them, in my lab.
So last Friday was no different, except that this time I was going through the CIA Vault7 leaks, specifically the EDB branch documents, when this document describing the theory behind NVRAM variables came to my attention. It immediately piqued my interest and I started digging deeper.
Turns out, these variables are not only readable and writable from User-mode (Ring 3), they are also an awesome place to hide your shit like egress implants, config data, stolen goodies, encryption keys and whatnot. I found that out after watching an enlightening DEFCON talk, thanks to Topher Timzen and Michael Leibowitz.
In the talk, they do give a demo using C# but the attendees are encouraged to figure out their own way to weaponize this technique.
So it got me thinking of various ways to weaponize this, and suddenly I remembered glossing over a report by ESET some time back describing an alleged CIA implant (ironically, again!) named DePriMon, which registered itself as the default print monitor to achieve persistence on the host (hence the name).
That was the birth of the MIDNIGHTTRAIN framework. Over the next two days, I spent time coding it, and then a couple more hours cleaning up and writing this post.
Of NVRAM variables, Print Monitors, Execution Guardrails with DPAPI, Thread Hijacking etc. oh my my!
For the uninitiated readers, don’t be scared of these buzzwords for I can guarantee you that this is absolutely nothing to be scared of and I shall attempt to explain each of these individual components(and the motivation behind it) one by one.
But first, let’s go through these basic concepts.
Initiated readers, feel free to skip this part and move on directly to the framework architecture.
I am not going to bore you with the theory, for that is not my goal. Just know that all modern UEFI machines use these variables to store important boot-time data and various other vendor-specific data in the flash memory. Needless to say, this data will survive a full reinstallation of the Operating System and to quote the CIA, “are invisible to a forensic image of the hard drive”.
Sound like a stealthy place to hide your life’s secrets yet?
What’s more? As easy as it is to write data into firmware variables from User-mode i.e. Ring 3, it is incredibly difficult(if not downright impossible) for the defenders to enumerate the data from the same.
How so, you ask? Well, you’ll see in a bit.
Now, conveniently for us attackers, Microsoft provides us with fully-documented API access to the magical land of firmware variables using:
- SetFirmwareEnvironmentVariable() - To create and set the value of an NVRAM variable
BOOL SetFirmwareEnvironmentVariableA( LPCSTR lpName, LPCSTR lpGuid, PVOID pValue, DWORD nSize );
- GetFirmwareEnvironmentVariable() - To fetch the value of an NVRAM variable
DWORD GetFirmwareEnvironmentVariableA( LPCSTR lpName, LPCSTR lpGuid, PVOID pBuffer, DWORD nSize );
And if you’re wondering what a GUID is: a GUID (Globally Unique Identifier), together with the variable name, is just a way to identify the specific variable in question. Therefore, each variable must have a unique combination of name and GUID.
Now does it make sense why it’s almost impossible to enumerate from Ring 3? Because enumeration would require the exact name and GUID of the variables. Hell, to even verify the existence of a variable, you’d need its specific name and GUID.
Okay, this sounds too good to be true, so what’s the caveat? Can you call these APIs even from a non-elevated context?
Good question, the answer is no!
Using these API functions requires that you are a local admin and that you have a specific privilege available and enabled in the calling token, namely SeSystemEnvironmentPrivilege/SE_SYSTEM_ENVIRONMENT_NAME. This means that our persistence framework won’t install without an Elevated Context (Blue Teams take note!)
I wouldn’t consider this a huge problem for attackers since persistence is typically meant to be a Post-Ex job and could be easily installed after privilege escalation on the host.
But the problem doesn’t end there. The next big caveat is size. How much data can you store in a single NVRAM variable and how many such variables can be created reliably?
That shall solely dictate what can or can’t be used as the payload.
To answer this question, I did some testing in my lab and found that you can create roughly 50 variables, each with a capacity of about 1000 characters, before Windows starts whining with error code 1470 (ERROR_NO_NVRAM_RESOURCES).
Also, now is a good time to point out that it is possible to enumerate these variables from Kernel-mode i.e. Ring 0 using frameworks such as CHIPSEC, or with physical access to the machine using a UEFI shell (Again, Defenders take note!)
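To make the size budget concrete, here is a minimal sketch of the chunk arithmetic implied by those limits. The two constants are the lab figures quoted above and should be treated as machine-specific assumptions; `vars_needed` and `chunk_len` are hypothetical helper names and are not taken from the framework’s source.

```c
#include <stddef.h>

/* Limits from the lab measurements quoted in this post. They are
   assumptions for illustration and will differ between machines. */
#define MAX_VARS        50u
#define CHARS_PER_VAR 1000u

/* Number of variables needed to hold `encoded_len` characters.
   Returns 0 if the input is empty or does not fit in MAX_VARS. */
size_t vars_needed(size_t encoded_len)
{
    if (encoded_len == 0)
        return 0;
    size_t n = (encoded_len + CHARS_PER_VAR - 1) / CHARS_PER_VAR; /* ceiling division */
    return n <= MAX_VARS ? n : 0;
}

/* Length of chunk number `index` (0-based). Every chunk is full-sized
   except possibly the last one; out-of-range indexes yield 0. */
size_t chunk_len(size_t encoded_len, size_t index)
{
    size_t start = index * CHARS_PER_VAR;
    if (start >= encoded_len)
        return 0;
    size_t rest = encoded_len - start;
    return rest < CHARS_PER_VAR ? rest : CHARS_PER_VAR;
}
```

For example, a 2500-character string needs three variables (1000, 1000 and 500 characters), while anything longer than 50,000 characters is rejected outright.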
Once again, I will not bore you with endless theory. But it is important to know a few things. Port Monitors are User-mode DLLs that according to MSDN “are responsible for providing a communications path between the user-mode print spooler and the kernel-mode port drivers that access I/O port hardware”.
These DLLs are loaded by the Print Spooler Service or spoolsv.exe at startup and for that to happen primarily one of the two methods must be followed:
- The fully-qualified pathname of the DLL must be written to
This requires either a manual registry entry or via WinAPI and it allows loading of arbitrary DLLs.
- The second method has a couple of more constraints.
- The DLL must reside in
- Arbitrary DLLs cannot be loaded via this technique (well, they can, but without persistence). The DLL must be written in a special way with some mandatory functions defined, and must export a function named InitializePrintMonitor2 which gets called immediately after the DLL is loaded
Finally, the Port Monitor can be registered via:
AddMonitor() - To install a local port monitor
BOOL AddMonitor( _In_ LPTSTR pName, _In_ DWORD Level, _In_ LPBYTE pMonitors );
What the function does under the hood is add the same registry entries and load the DLL within spoolsv.exe, but without any direct intervention. Readers should take note that if the DLL is not created exactly according to MSDN specifications, the DLL will still be loaded for the current session, but the appropriate registry entries will not be made and the DLL will not load after a reboot, ergo defeating its very purpose.
And to uninstall a Port Monitor:
DeleteMonitor() - To remove a local port monitor
BOOL DeleteMonitor( _In_ LPTSTR pName, _In_ LPTSTR pEnvironment, _In_ LPTSTR pMonitorName );
I have chosen the second way for the framework.
Execution Guardrails with DPAPI
Well, I’m a big fan of putting execution guardrails in my code primarily because of two reasons:
- To prevent the accidental breaking of the rules of engagement. This will ensure that our malcode doesn’t end up being executed on any unintended hosts that are out of scope
- To hinder the efforts of blue teams trying to reverse engineer the implant on non-targeted assets and thwart analysis on automated malware sandboxes
Although, I think in this case the latter is more applicable than the former.
So what’s DPAPI?
DPAPI(Data Protection API) is simply a set of functions provided by Microsoft intended to ensure confidentiality and integrity of locally stored credentials like Browser passwords, WiFi PSKs etc.
This is primarily achieved through the use of two functions:
DPAPI_IMP BOOL CryptProtectData( DATA_BLOB *pDataIn, LPCWSTR szDataDescr, DATA_BLOB *pOptionalEntropy, PVOID pvReserved, CRYPTPROTECT_PROMPTSTRUCT *pPromptStruct, DWORD dwFlags, DATA_BLOB *pDataOut );
DPAPI_IMP BOOL CryptUnprotectData( DATA_BLOB *pDataIn, LPWSTR *ppszDataDescr, DATA_BLOB *pOptionalEntropy, PVOID pvReserved, CRYPTPROTECT_PROMPTSTRUCT *pPromptStruct, DWORD dwFlags, DATA_BLOB *pDataOut );
Apart from the fact that these functions are quite straightforward to use, they provide another benefit.
If we can encrypt a data blob with DPAPI on the target host, that encrypted data cannot be decrypted anywhere else but on the same host machine. This means that if a payload is encrypted directly on a targeted asset, it shall make decryption and ergo execution non-trivial on a non-targeted asset like a sandbox or say a malware analyst’s VM.
I got the inspiration for this from a malware named InvisiMole, technical analysis courtesy of ESET.
You can read in-detail about DPAPI if you’re interested here.
Typical code injection relies on thread injection using the documented CreateRemoteThread(), its lesser-known undocumented cousin RtlCreateUserThread(), or an Nt* equivalent in Ntdll like
What happens in Thread Injection is that a thread is created in the remote process to run our malcode.
Though this remains one of the most popular, easy to implement and stable forms of code injection, it has some disadvantages from an OPSEC perspective. With tools such as Get-InjectedThread, it is quite easy to detect an injected thread in a remote process by spotting missing MEM_IMAGE flags for the memory of the thread start address.
Anyway, this is something that @xpn(Adam Chester) will do a far better job of explaining than me!
The way Thread Hijacking overcomes the obstacle is by not injecting a thread in the first place but instead hijacking an existing thread of the remote process by first suspending it, then redirecting the RIP register to our malcode before resuming the thread again to launch our malcode this time.
This is why it is also fondly known as SiR(Suspend-Inject-Resume) injection. Pretty neat eh?
To accomplish this, we primarily need to perform the following steps(and API calls):
- VirtualAllocEx() - To allocate memory in the target process for our shellcode
- WriteProcessMemory() - To write the shellcode to the allocated memory in the target process
- SuspendThread() - To suspend a thread
- GetThreadContext() - To fetch the current state of the registers for our hijacked thread
- SetThreadContext() - To set the updated state of the registers for our hijacked thread specifically the RIP register now redirected to point to our shellcode
- ResumeThread() - To resume the hijacked thread
There’s an extra VirtualProtectEx() call which I have taken the liberty to add because, for one, allocating RWX memory in a remote process is not taken too kindly by PSPs.
The workaround first allocates RW pages for writing the payload and later changes the page protection to RX before the thread is resumed so that the payload can be executed.
One last thing to note regarding this method is that it is a little unstable and the chances of the target process crashing after the malcode terminates are extremely high! (requires cleanup to fix)
A possible alternative might include creating a new process and hijacking that, but I wanted to avoid a process creation event (Sysmon Event ID 1). Ideally, you should weigh the pros and cons, factoring in your target environment, and edit the code to suit your needs.
Whatever you’ve read till now explains the What?. Now that we have more-or-less understood the individual pieces of the puzzle let’s move forward and assemble the above pieces to solve the puzzle and try to explain the How?.
But first, let us look at a block diagram to help visualize the architecture.
So as we can see underlined in the diagram, this framework consists of two payloads:
- Gremlin - The Port Monitor DLL
- Gargoyle - The Persistence Installer
Both of them are compiled to DLLs and with the Gargoyle payload an extra step is taken to convert it into a PIC(Position Independent Code) blob, big thanks to (@monoxgas)Nick Landers[SBS] for the amazing sRDI project.
This is done to ensure that persistence can be delivered via your favourite C2 framework and installed with inline execution/local execution of shellcode.
Gargoyle is executed in-memory in an Elevated Context and primarily has two objectives to accomplish:
- Figure out if persistence is already installed on the host or not. If not:
- Extract the Gremlin implant DLL from its resource section and copy it to the System32 folder before installing it as a Port Monitor DLL using the above-mentioned method
- Extract the Beaconing shellcode payload from its resource section, encrypt the payload using DPAPI on the target host, Base64URL encode the encrypted payload and divide it into chunks before writing them into as many NVRAM variables as permissible by the flash chip
- If persistence is already installed on the host:
- Delete the
- Delete the payload from the NVRAM variables
- Delete the
This in turn causes spoolsv.exe to load the Gremlin implant if persistence is installed successfully. Gremlin has the following objectives to accomplish:
- Steal a token from winlogon.exe and impersonate it for the current thread (more on this later)
- Check if SeSystemEnvironmentPrivilege/SE_SYSTEM_ENVIRONMENT_NAME is available in the token and enable it if available
- Now, read back the individual chunks from the NVRAM variables and assemble them to get the Base64URL-encoded encrypted payload
- Base64URL decode it to get the encrypted payload byte blob
- Decrypt the blob using DPAPI to get the final payload
- Hijack a thread of explorer.exe to execute our Beaconing payload(
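The reassemble-and-decode step in the list above is plain text processing, and can be sketched in portable C as follows. This is only an illustration of the encoding layer under stated assumptions: `b64url_value` and `base64url_decode_chunks` are hypothetical names rather than functions from the framework, and the chunks are assumed to already be in memory as NUL-terminated strings in the right order.

```c
#include <stddef.h>
#include <stdint.h>

/* Maps one Base64URL character to its 6-bit value, or -1 if invalid.
   Assumes an ASCII-compatible character set. */
static int b64url_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '-') return 62;
    if (c == '_') return 63;
    return -1;
}

/* Decodes unpadded Base64URL text that is spread over `count` chunks,
   treating the chunks as one continuous string. `out` must have room
   for 3 bytes per 4 input characters. Returns the number of bytes
   written, or (size_t)-1 if an invalid character is found. */
size_t base64url_decode_chunks(const char *const *chunks, size_t count,
                               uint8_t *out)
{
    uint32_t acc = 0;  /* bit accumulator, carried across chunk boundaries */
    int bits = 0;      /* number of not-yet-emitted bits in acc */
    size_t o = 0;

    for (size_t c = 0; c < count; c++) {
        for (const char *p = chunks[c]; *p != '\0'; p++) {
            int v = b64url_value(*p);
            if (v < 0)
                return (size_t)-1;
            acc = (acc << 6) | (uint32_t)v;
            bits += 6;
            if (bits >= 8) {
                bits -= 8;
                out[o++] = (uint8_t)(acc >> bits);
            }
        }
    }
    /* Leftover bits (fewer than 8) are discarded; a stricter decoder
       would also reject inputs whose total length is 1 modulo 4. */
    return o;
}
```

Carrying the accumulator across chunk boundaries means the chunk size does not have to be a multiple of four characters.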
Design Considerations and OPSEC Concerns
If you came this far, you must have a lot of questions regarding this framework. Don’t worry, for I shall now attempt to address some of those and discuss some OPSEC concerns. Hopefully, that shall explain the Why?.
First, let’s address the issue with UEFI variables.
By now, it is pretty evident that we can do little with the buffer space offered by one NVRAM variable. Therefore, we need to chunk the payload into the max permissible size and write those individual chunks to as many variables as required and permissible. Like I said before, I have found out from my tests that we can create a maximum of 50 variables and each with a buffer space of 1000 characters. Our next quest is to figure out what encoding scheme to use to store the payload byte blob. It needs to be an efficient one for us to utilize the most out of the buffer space.
hex will take two characters for each byte while Base64 takes 4 characters for every 3 bytes, so it’s more efficient than hex. But can we do any better? Yes, we can! And one of the ways is by using Base64URL, which is a URL-safe variant of Base64 encoding, plus omitting the padding character (=).
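The size comparison can be checked with a few lines of C. The sketch below gives the encoded length for each scheme plus a small unpadded Base64URL encoder following RFC 4648, section 5. All function names here are illustrative assumptions and are not taken from the framework’s source.

```c
#include <stddef.h>
#include <stdint.h>

/* Characters needed to encode n bytes under each scheme. */
size_t hex_len(size_t n)       { return 2 * n; }
size_t base64_len(size_t n)    { return 4 * ((n + 2) / 3); } /* with '=' padding */
size_t base64url_len(size_t n) { return (4 * n + 2) / 3; }   /* padding omitted */

/* Encodes `len` bytes from `in` as unpadded Base64URL into `out`, which
   must hold base64url_len(len) + 1 bytes. Returns the number of
   characters written, not counting the terminating NUL. */
size_t base64url_encode(const uint8_t *in, size_t len, char *out)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
    size_t i = 0, o = 0;

    /* Full 3-byte groups become 4 characters each. */
    while (len - i >= 3) {
        uint32_t v = (uint32_t)in[i] << 16 | (uint32_t)in[i + 1] << 8 | in[i + 2];
        out[o++] = tbl[v >> 18];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = tbl[(v >> 6) & 63];
        out[o++] = tbl[v & 63];
        i += 3;
    }
    /* A 1- or 2-byte tail becomes 2 or 3 characters, with no '=' added. */
    if (len - i == 1) {
        uint32_t v = (uint32_t)in[i] << 16;
        out[o++] = tbl[v >> 18];
        out[o++] = tbl[(v >> 12) & 63];
    } else if (len - i == 2) {
        uint32_t v = (uint32_t)in[i] << 16 | (uint32_t)in[i + 1] << 8;
        out[o++] = tbl[v >> 18];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = tbl[(v >> 6) & 63];
    }
    out[o] = '\0';
    return o;
}
```

Dropping the padding saves at most two characters per blob, so the real gain is over hex. Under the limits quoted in this post (50 variables of 1000 characters each), 50,000 characters of unpadded Base64URL correspond to at most 37,500 bytes. That is an upper bound on the encrypted blob; the usable plaintext is smaller because DPAPI adds its own overhead.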
So what’s the final size of the payload that we can reliably store in NVRAM? It comes as around
It becomes immediately evident to us that Stageless payloads generated out-of-the-box are out of the question with this tiny size limit in today’s age.
So what can we use? Well, Staged payloads generated out-of-the-box should work fine with this framework. But this potentially raises an OPSEC concern since using default Beacon stagers is not recommended. And what about the cases when even the Staged payload crosses the size limit?
Taking into consideration all these factors, I recommend designing a simple native payload stager/loader yourself that fetches the final payload over a network(due to size constraints) and executes it locally. In that case, there would be no need to inject it again since the egress implant is already in the address space of a process from where network activity is not considered unusual by PSPs.
Secondly, some of you might be wondering: if we are touching disk anyway with the Gremlin implant, then why do we need NVRAM variables, and why do we even need a separate persistence payload for that matter? Shouldn’t persistence be a part of the Stage-1 or Stage-2 RAT?
Short Answer: OPSEC
Long Answer: Sure, persistence has to touch disk but we can always minimize the impact of that by controlling what touches the disk and what stays in-memory only. A Stage-1 (Beaconing) or a Stage-2 (Post-Exploitation) RAT on disk is just asking to be caught by AV/EDRs. They have no business being on disk and they should reside in-memory only. But with that comes a problem. If they are in-memory only, how can we possibly achieve persistence with them? The answer is a “relatively-benign” persistence implant that automatically loads at machine startup, which in turn loads the egress implant in-memory. So how does the persistence implant (in our case Gremlin - A Port Monitor DLL) fetch the egress implant? There are possibly two avenues here:
- Either over the network, or better yet
- A stealthy storage place in Windows
One of those covert storage compartments happens to be NVRAM Variables. Some other possible places to hide your shit could be NTFS ADS (Alternate Data Streams), Covert File Systems, Windows Registry Keys, Event Logs etc.
I simply chose UEFI variables because, well, it seemed more fun than the rest, and since (ab)using them for covert data storage requires elevation anyway, I decided to use a Port Monitor DLL as the persistence implant. It is loaded by spoolsv.exe which, if you haven’t noticed, is a SYSTEM process, so it just all fit together nicely :)
One last thing I feel like I should point out is that although spoolsv.exe runs as SYSTEM, it doesn’t have the privilege required to use NVRAM variables in its process token. Ergo, we have to perform token stealing, a.k.a. Token Impersonation, i.e. steal a primary token from a process (winlogon.exe) that has the required privilege in its token (albeit in a disabled state), impersonate it on the calling thread and then attempt to enable the privilege.
Hopefully, with this, I was able to explain the motivation behind each design choice.
Time for screenshots!
And if you’re wondering about edr_console, it’s simply a modified Sysmon EventLog parser and you can get it here
And we successfully caught an incoming
Inspecting loaded modules in
Some pretty suspicious functions in the import table here, wonder what this module is hmmm.
This is the actual Beacon stager shellcode used:
Aaah! That familiar PE DOS stub!
If you’re still reading, I want to thank you for having the patience to read the whole article. I hope you enjoyed reading it as much as I enjoyed designing the framework/writing this blog post.
Keep in mind that the framework has been designed in a very modular structure so that operators can easily mix and match other techniques keeping the architecture same or just treat it as separate modules and use them in their own projects.
Feel free to hit me up if you feel something could be improved/general suggestions/other cool ideas etc.
You can find what I’m up to on my Twitter @slaeryan.
Bene vale operator!
- https://github.com/perturbed-platypus - Big thanks to @TTimzen & @r00tkillah for their wonderful research.
- https://gist.github.com/jthuraisamy/e602d5d870230df3ce00178001f9ac16 - Another PoC thanks to @Jackson_T
- @am0nsec for dropping dem hints regarding the token impersonation.
- Dark Side Ops - 1 - A hands-on approach to implant development from some of the best people in the game, a highly recommended course, period!
- Sektor7 RTO: MalDev Essentials & @reenz0h for getting me initiated into the game and the templates that I still use to this date.
- CIA Vault7 leaks - I have a joke but it is REDACTED.
- @monoxgas for sRDI and being an awesome researcher in general!
- Mr. Base64 - for the review and code improvements. +1 for being a top-level guy! You can find him hanging out here 0x00sec Discord with a bunch of other really cool peeps.