Wizard Spider and Sandworm Evaluation: Detection Categories
The evaluation focuses on articulating how detections occur, rather than assigning scores to vendor capabilities.
For the evaluation, we categorize each detection and capture notes about how it occurred. We organize detections by technique. A technique may have more than one detection if the capability detects it in different ways, and the detections we observe are included in the results. While we make every effort to capture different detections, vendor capabilities may be able to detect procedures in ways that we did not capture. For a detection to be included for a given technique, it must apply to that technique specifically (i.e., a detection that applies to one technique in a Step or Sub-Step does not necessarily apply to all techniques of that Step). We require that proof of each detection be provided to us, but we may not include all detection details in public results, particularly when those details are sensitive.
To determine the appropriate category for a detection, we review the screenshot(s) provided, notes taken during the evaluation, results of follow-up questions to the vendor, and vendor feedback on draft results. We also independently test procedures in a separate lab environment and review open-source tool detections and forensic artifacts. This testing informs what is considered to be a detection for each technique.
After performing detection categorizations, we calibrate the categories across all vendors to look for discrepancies and ensure categories are applied consistently. The decision of what category to apply is ultimately based on human analysis and is therefore subject to discretion and biases inherent in all human analysis, although we do make efforts to hedge against these biases by structuring analysis as described above.
Detections will be tagged with the data source(s) that identify the type of data used to generate them. These tags differentiate similar detections and describe them more precisely (e.g., telemetry from file monitoring versus telemetry from process command-line arguments). The list of possible data source tags will be calibrated by MITRE after execution of the evaluations.
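The way detections are organized (per technique, possibly several per technique, each carrying data source tags) can be pictured as a simple record. The following is a minimal sketch under stated assumptions: the `Detection` record, its fields, the Step label, and the data source tag strings are all hypothetical illustrations, not MITRE's actual results schema, and the real tag list is calibrated by MITRE after execution.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Detection:
    """Hypothetical record for one observed detection of one technique."""
    step: str          # Step or Sub-Step label, e.g. "1.A" (illustrative)
    technique_id: str  # ATT&CK technique ID the detection applies to
    category: str      # detection category, e.g. "Telemetry" or "Technique"
    # Data source tags; these tag names are placeholders, since the real
    # list is calibrated by MITRE after the evaluations are executed.
    data_sources: frozenset = field(default_factory=frozenset)


def group_by_technique(detections):
    """Collect detections per technique; one technique may have several."""
    grouped = defaultdict(list)
    for detection in detections:
        # Each detection is attached only to the technique it applies to;
        # it is not copied to other techniques in the same Step.
        grouped[detection.technique_id].append(detection)
    return dict(grouped)


# Illustrative data: one command-shell technique detected in two ways.
detections = [
    Detection("1.A", "T1059.003", "Telemetry",
              frozenset({"Process Command-Line Arguments"})),
    Detection("1.A", "T1059.003", "Technique",
              frozenset({"Process Command-Line Arguments"})),
    Detection("1.A", "T1083", "Telemetry", frozenset({"File Monitoring"})),
]
by_technique = group_by_technique(detections)
```

Keeping `technique_id` on each record, rather than on the Step, mirrors the rule that a detection counts only for the technique it specifically applies to.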
Not Applicable: The vendor did not have visibility on the system under test. The vendor must state before the evaluation which systems they did not deploy a sensor on for Not Applicable to be in scope for the relevant steps.
Example: No sensor was deployed on the Linux systems within the environment to capture command-line activity, which would have been required to satisfy the detection criteria of the technique under test.
None: No data related to the behavior under test that satisfies the assigned detection criteria was made available within the capability. There are no modifiers, notes, or screenshots included with a None.
Telemetry: Minimally processed data collected by the capability, showing that event(s) specific to the behavior under test occurred and satisfying the assigned detection criteria. Evidence must show definitively that the behavior occurred (did happen vs. may have happened) and be related to the execution mechanism. This data must be visible natively within the tool and can include data retrieved from the endpoint.
Example: Command-line output is produced that shows a certain command was run on a workstation by a given username.
Example: The capability includes a remote shell component that can be used to pull native OS logs from a system suspected of being compromised for further analysis.
General: Processed data specifying that malicious/abnormal event(s) occurred in relation to the behavior under test. No or limited details are provided as to why the action was performed (tactic) or how it was performed (technique).
Example: A detection describing "cmd.exe /c copy cmd.exe sethc.exe" as abnormal/malicious activity, but not stating that it is related to Accessibility Features or otherwise describing more specifically what occurred.
Example: A "Suspicious File" detection triggered upon initial execution of the executable file.
Example: A detection stating that "suspicious activity occurred" in relation to an action, but not providing detail regarding the technique under test.
Tactic: Processed data specifying the ATT&CK Tactic, or an equivalent level of enrichment, for the data collected by the capability. This gives the analyst information on the potential intent of the activity, helping answer the question "why would this be done?" To qualify as a detection, there must be more than a label on the event identifying the ATT&CK Tactic, and the detection must clearly connect a tactic-level description with the technique under test.
Example: A detection called "Malicious Discovery" is triggered on a series of discovery techniques. The detection does not identify the specific type of discovery performed.
Example: A detection describing that persistence occurred but not specifying how persistence was achieved.
Technique: Processed data specifying the ATT&CK Technique or Sub-Technique, or an equivalent level of enrichment, for the data collected by the capability. This gives the analyst information on how the action was performed, helping answer the question "what was done?" (e.g., Accessibility Features or Credential Dumping). To qualify as a detection, there must be more than a label on the event identifying the ATT&CK Technique ID (TID), and the detection must clearly connect a technique-level description with the technique under test.
Example: A "Credential Dumping" detection is triggered with enough detail to show which process originated the behavior against lsass.exe and/or what type of credential dumping occurred.
Example: A "Lateral Movement with Service Execution" detection is triggered, describing which service was launched and which system was targeted.
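The six main categories (Not Applicable, None, Telemetry, General, Tactic, Technique) can be read as a sequence of questions about a detection. The sketch below encodes that reading as a hypothetical `categorize` function. It is a simplification under stated assumptions: the boolean inputs are invented for illustration, technique-level detail is assumed to take precedence over tactic-level detail, and the real decision is human analysis calibrated across vendors rather than a mechanical rule.

```python
from enum import Enum


class Category(Enum):
    NOT_APPLICABLE = "Not Applicable"
    NONE = "None"
    TELEMETRY = "Telemetry"
    GENERAL = "General"
    TACTIC = "Tactic"
    TECHNIQUE = "Technique"


def categorize(*, sensor_deployed: bool, has_data: bool, processed: bool,
               describes_tactic: bool, describes_technique: bool) -> Category:
    """Hypothetical, simplified mapping from analyst findings to a category.

    Real categorization is human analysis calibrated across vendors; this
    only encodes the order in which the definitions narrow the choice.
    """
    if not sensor_deployed:
        return Category.NOT_APPLICABLE   # no visibility on the system under test
    if not has_data:
        return Category.NONE             # nothing satisfies the detection criteria
    if not processed:
        return Category.TELEMETRY        # minimally processed evidence only
    if describes_technique:
        return Category.TECHNIQUE        # "what was done", beyond a bare TID label
    if describes_tactic:
        return Category.TACTIC           # "why would this be done", beyond a label
    return Category.GENERAL              # flagged as malicious/abnormal, little detail


# The "Malicious Discovery" example: processed data with tactic-level detail only.
print(categorize(sensor_deployed=True, has_data=True, processed=True,
                 describes_tactic=True, describes_technique=False).value)
# prints "Tactic"
```

Keyword-only arguments keep each call self-describing, which matters when several booleans are passed side by side.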
Modifier Detection Types
Configuration Change: The configuration of the capability was changed after the start of the evaluation. This may be done to show that additional data can be collected and/or processed. The Configuration Change modifier may be applied with additional modifiers describing the nature of the change, including:
- Data Sources – Changes made to collect new information by the sensor.
- Detection Logic – Changes made to data processing logic.
- UX – Changes related to the display of data that was already collected but not visible to the user.
Example: The sensor is reconfigured to enable the capability to monitor file activity related to data collection. This would be labeled with the modifier Configuration Change-Data Sources.
Example: A new rule is created, a pre-existing rule is enabled, or sensitivities (e.g., blacklists) are changed so that the detection successfully triggers during a retest. These would be labeled with the modifier Configuration Change-Detection Logic.
Example: Data showing account creation is collected on the backend but not displayed to the end user by default. The vendor changes a backend setting to allow Telemetry on account creation to be displayed in the user interface, so a detection of Telemetry and Configuration Change-UX would be given for the Create Account technique.
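A main category and its Configuration Change modifiers combine into a single label, as in the "Telemetry and Configuration Change-UX" case for Create Account. The sketch below shows one way to model that pairing. The `ConfigChange` enum, the `CategorizedDetection` class, and the " + " joining convention are hypothetical choices for illustration, not a format MITRE publishes.

```python
from dataclasses import dataclass, field
from enum import Enum


class ConfigChange(Enum):
    DATA_SOURCES = "Data Sources"        # sensor changed to collect new information
    DETECTION_LOGIC = "Detection Logic"  # data processing logic changed
    UX = "UX"                            # display of already-collected data changed


@dataclass
class CategorizedDetection:
    """Hypothetical pairing of a main category with Configuration Change modifiers."""
    technique: str
    category: str
    config_changes: set = field(default_factory=set)

    def label(self) -> str:
        # The main category comes first, then each Configuration Change
        # sub-modifier in a stable (alphabetical) order.
        parts = [self.category]
        for change in sorted(self.config_changes, key=lambda c: c.value):
            parts.append(f"Configuration Change-{change.value}")
        return " + ".join(parts)


# The account-creation example: Telemetry shown only after a UX setting change.
account = CategorizedDetection("Create Account", "Telemetry", {ConfigChange.UX})
print(account.label())  # prints "Telemetry + Configuration Change-UX"
```

Storing the sub-modifiers as a set lets one detection carry several kinds of change (for example, both new data sources and new detection logic) without duplicates.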
Delayed: The detection is not immediately available to the analyst because some factor slows or defers its presentation to the user; for example, subsequent or additional processing produces the detection for the activity. The Delayed modifier is not applied for normal automated data ingestion and routine processing that take minimal time for data to appear to the user, nor is it applied for range or connectivity issues unrelated to the capability itself. The Delayed modifier will always be applied with modifiers describing the nature of the delay in more detail.
Example: The capability uses machine learning algorithms that trigger a detection on credential dumping after the normal data ingestion period. This detection would receive the Delayed modifier with a description of the additional processing time.
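The conditions for applying the Delayed modifier can be sketched as a small rule: routine ingestion and range or connectivity problems never count, and a delay that does count must carry a description. The `DelayCause` values and the `delayed_modifier` function below are hypothetical illustrations of that rule, not part of any evaluation tooling.

```python
from enum import Enum, auto
from typing import Optional


class DelayCause(Enum):
    ROUTINE_INGESTION = auto()      # normal automated ingestion, minimal time
    RANGE_OR_CONNECTIVITY = auto()  # environment issue unrelated to the capability
    ADDITIONAL_PROCESSING = auto()  # e.g. ML or other later processing by the capability


def delayed_modifier(cause: DelayCause, detail: str) -> Optional[str]:
    """Return a hypothetical Delayed label, or None if Delayed does not apply."""
    if cause in (DelayCause.ROUTINE_INGESTION, DelayCause.RANGE_OR_CONNECTIVITY):
        return None
    # Delayed is always accompanied by more detail on the nature of the delay.
    if not detail:
        raise ValueError("Delayed requires a description of the delay")
    return f"Delayed ({detail})"


# The machine learning example: detection produced after normal ingestion.
print(delayed_modifier(DelayCause.ADDITIONAL_PROCESSING,
                       "ML processing after normal ingestion"))
# prints "Delayed (ML processing after normal ingestion)"
```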