Ransomware Defender Admin Guide

Eyeglass Ransomware Defender Admin Guide



Contents

  1. 1 What's New
  2. 2 Chapter 1 - Introduction to this Guide
    1. 2.1 Overview
    2. 2.2 Supported Automated Actions
    3. 2.3 Prerequisites, Requirements or limits
    4. 2.4 Read this first or cautions
    5. 2.5 Where to go for Support
  3. 3 Abbreviations
  4. 4 Installation Guide
  5. 5 Licensing
  6. 6 Planning and Design
    1. 6.1 Overview
  7. 7 Securing Root user on Isilon
  8. 8 Ransomware Security Signal Events and Detection Overview
  9. 9 Ransomware - Signal Strengths - Summary Explanation
  10. 10 Ransomware - Signal Strengths - Detailed Explanation
  11. 11 Security Event Threat Detector Definitions
  12. 12 Threat Detection Signal Strengths in Eyeglass
    1. 12.1 Example Signal Strength walk through
    2. 12.2 Signal Strength Window Overview
    3. 12.3 Eyeglass Clustered Agent Deployment Options
    4. 12.4 Supported Topologies
      1. 12.4.1 #1 Eyeglass topology --HOT COLD
      2. 12.4.2 #1A Eyeglass topology --HOT COLD - 3rd party auditing compatibility - External CEE Server
      3. 12.4.3 #2 Eyeglass topology --HOT COLD
      4. 12.4.4 #3 Eyeglass topology --HOT COLD
      5. 12.4.5 #4 Eyeglass topology --HOT HOT clusters
  13. 13 Ransomware Defender Eyeglass Clustered Agent (ECA) Architecture
  14. 14 High Availability and Resiliency
    1. 14.1 Cluster Operational Requirements
    2. 14.2 Architectural Data flow of CEE events though the Eyeglass Clustered Agent
  15. 15 Message flow Inside ECA Node
  16. 16 How to Configure and Tune Ransomware Defender Threat Detection and Responses
    1. 16.1 Severity Threat Level Severity Definitions and Responses
    2. 16.2 How to Enable Monitor Mode to baseline User Behaviours
    3. 16.3 How to Tune Threat Detection Rates (Warning, Major and Critical)
    4. 16.4 How to determine threat response settings to meet your Company’s Risk Profile
    5. 16.5 How to Enable Disable Critical Event Detection
    6. 16.6 Eyeglass Security Event Workflow and Operations
      1. 16.6.1 Auto Archive Warning Security Events
      2. 16.6.2 How to change the default Autoarchive timeout
    7. 16.7 Threat Detector Threshold configuration
  17. 17 How to change Threat Detection Settings
    1. 17.1 Ignored List:
      1. 17.1.1 Detected User Security Event Descriptions
      2. 17.1.2 Security Event State Descriptions
      3. 17.1.3 Security Event Possible Action Descriptions
      4. 17.1.4 How to respond to Security Events for warning, major or critical events
        1. 17.1.4.1 Overview of Security Event Triage Process
        2. 17.1.4.2 If Warning
        3. 17.1.4.3 If Major
        4. 17.1.4.4 If Critical
    2. 17.2 Security Event Action State Descriptions
      1. 17.2.1 Warning State
        1. 17.2.1.1 Locked out User State (Critical Severity Threat Detection)
        2. 17.2.1.2 Access Restored State
        3. 17.2.1.3 Delayed Lockout state
        4. 17.2.1.4 Acknowledged State
        5. 17.2.1.5 Archived Event on Event History
    3. 17.3 Rapid Machine to Machine Malware Spreading Attack Defense Overview
    4. 17.4 Rapid Machine to Machine Malware Attack Auto Response Escalation Configuration
    5. 17.5 False Positive Security Event Handling and Configuration Options
      1. 17.5.1 Warning
      2. 17.5.2 Major
      3. 17.5.3 Critical
    6. 17.6 Ignore List setting Procedures
  18. 18 Eyeglass User Lockout Active Directory Planning
  19. 19 Role Based Access
  20. 20 Remote Service Authentication and Protocol
    1. 20.1 Service Registration Monitoring in Eyeglass
      1. 20.1.1 Service States
      2. 20.1.2 Health States
  21. 21 Security Guard - Automated Security Testing
    1. 21.1 Simulated Attack
    2. 21.2 Pre-Requisites
    3. 21.3 Security Guard Lockout Behavior
    4. 21.4 Configuration
  22. 22 How to Run on Demand Security Guard Penetration test
  23. 23 How to Review Security Guard Penetration test history and logs
  24. 24 Ransomware ECA Cluster Operational Procedures
    1. 24.1 Eyeglass Cluster Maintenance Operations
      1. 24.1.1 Cluster OS shutdown or restart
      2. 24.1.2 Cluster Startup
      3. 24.1.3 Cluster IP Change
      4. 24.1.4 Single ECA Node Restart or Host crash Affect 1 or more ECA nodes
    2. 24.2 Eyeglass ECA Cluster Operations
      1. 24.2.1 Stop ECA Database
      2. 24.2.2 ecactl db stop
      3. 24.2.3 Start ECA Database
      4. 24.2.4 ecactl db start
    3. 24.3 Checking ECA database Status:
      1. 24.3.1 ecactl db shell
      2. 24.3.2 Bring Up the ECA Cluster:
      3. 24.3.3 ecactl cluster up
      4. 24.3.4 Bring down the ECA Cluster:
      5. 24.3.5 ecactl cluster down
      6. 24.3.6 Checking events and logs
      7. 24.3.7 Verifying events are being stored in the DB:
      8. 24.3.8 Tail the log of ransomware signal candidates: ecactl logs --follow iglssvc
    4. 24.4 CEE Event Rate Monitoring for Sizing ECA Cluster Compute
      1. 24.4.1 Statistics provided by OneFS for CEE Auditing
      2. 24.4.2 CEE Statistics ISI command examples
      3. 24.4.3 Monitor Cluster Event forwarding Delay to ECA cluster
      4. 24.4.4 Set the CEE Event forwarding start Date
    5. 24.5 How to Generate Support logs on ECA cluster Nodes
      1. 24.5.1 Steps to create an ECA log archive locally on each node
    6. 24.6 Eyeglass ECA Performance Tuning
      1. 24.6.1 vCenter ECA OVA CPU Performance Monitoring
      2. 24.6.2 vCenter OVA CPU limit Increase Procedure
      3. 24.6.3 How to check total ESX host MHZ Capacity
  25. 25 ECA CLI Command Guide
  26. 26 Eyeglass CLI command for Ransomware
    1. 26.1 Root user lockout behaviour
      1. 26.1.1 Security Guard CLI
      2. 26.1.2 Analytics Database Export Tables to CSV
  27. 27 ECA Cluster Disaster Recovery and Failover Supported Configurations
    1. 27.1 Scenario #1 - Production Writeable cluster fails over to DR site
    2. 27.2 Scenario #2 - Production Writeable cluster fails over to DR site and Site 1 ECA cluster impacted
  28. 28 Data Recovery Manager Integration with Ransomware Defender
    1. 28.1 Overview
    2. 28.2 How to launch Data Recovery Manager Request From Ransomware defender action menu
      1. 28.2.1 Prerequisites
      2. 28.2.2 Procedures
  29. 29 Troubleshooting ECA Configuration Issues
    1. 29.1 Issue - no CEE Events are processed
      1. 29.1.1 Issue - Cluster startup can not find Analytics database on Isilon
      2. 29.1.2 Issue - Events being dropped or not committed to Analytics database
      3. 29.1.3 Issue - Monitoring Isilon Audit Event Rate Performance


What's New


Release 1.9.2 adds supportability enhancements and a feature to disable the real-time critical lockout action and use only the time delayed response for security events.  A full description of the features in this release is available here.



Chapter 1 - Introduction to this Guide

Overview


This guide covers configuration, setup and monitoring of Ransomware Defender.  The solution is deployed as a 3 VM cluster that processes Isilon CEE audit data with an active active design for maximum availability to survive hardware or software failures.

The active defense solution monitors for malicious user behaviours consistent with Ransomware encryption of customer files.   Network attached SMB mounts on user workstations expose critical Isilon data.

Three levels of detection are possible (Warning, Major and Critical), with automated defense options increasing with the detection level.

Supported Automated Actions

  1. Timed lockout (Major) - a time delayed lockout of the user account that triggered the security event.  The lockout can be stopped before the timer expires.

  2. Immediate lockout (Critical) - User lockout begins real-time once a critical event is detected

Prerequisites, Requirements or limits


  1. Eyeglass VM installed

  2. Cluster discovery licenses (per node or per cluster) for each cluster to be managed by Eyeglass

  3. Ransomware feature license and a Ransomware agent license for all writeable clusters protected by Ransomware defender

  4. Single host for ECA (Eyeglass Clustered Agent) VM’s OR multiple hosts for high level of availability (refer to installation guide deployment topologies section)

  5. CPU limits applied to ECA cluster object in vcenter

  6. Hardware recommendation (see install guide)


Read this first or cautions


NOTE: It's assumed that all workstations and other entry points for Ransomware are running current antivirus and anti-malware software.   Ransomware Defender is a second line of defense product.

Where to go for Support

https://support.superna.net

Abbreviations

  • CEE: Common Event Enabler - EMC specific event protocol (XML based)

  • ECA: Eyeglass Clustered Agent - the entire Ransomware Defender application stack that runs in separate VMs outside of Eyeglass.



Installation Guide

The install guide covers ECA cluster installation and requirements.   See the guide here.


Licensing

Each writeable cluster requires an agent license and agent maintenance.  A cluster can be monitored by the ECA without a license when it's the cold or DR cluster.   The startup process will automatically assign licenses based on the following criteria.


Registered Isilon clusters licensed for Eyeglass DR qualify to be licensed with Ransomware Defender. This licensing happens as part of the inventory data collection and config sync jobs. Isilon clusters are licensed based on these rules (a sketch of the rule logic follows the list):


  1. The Isilon cluster has at least one enabled syncIQ policy.

  2. The Isilon cluster has no enabled syncIQ policies, no other cluster replicates to it, and it has shares or exports.

  3. The cluster has no enabled syncIQ policy, is the target of a replication, and contains shares or exports not covered by the replication policy.
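
The rule logic above can be summarized in a short sketch (Python, illustrative only; the field names are assumptions, not the product's internal data model):

def qualifies_for_agent_license(c):
    # c is a dict describing one discovered cluster (illustrative fields only)
    if c["enabled_synciq_policies"] > 0:
        return True                                   # Rule 1
    if not c["is_replication_target"] and (c["shares"] > 0 or c["exports"] > 0):
        return True                                   # Rule 2
    if c["is_replication_target"] and c["uncovered_shares_or_exports"] > 0:
        return True                                   # Rule 3
    return False

# Example: a DR-only target whose shares are all covered by the replication
# policy does not consume an agent license.
print(qualifies_for_agent_license({
    "enabled_synciq_policies": 0, "is_replication_target": True,
    "shares": 2, "exports": 0, "uncovered_shares_or_exports": 0}))   # False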



A system alarm will be issued if insufficient licenses exist and more writeable clusters are detected in the CEE event messages.



Screen Shot 2017-01-29 at 7.13.56 PM.png

Planning and Design

Overview

The Ransomware Defender solution for Isilon requires existing Eyeglass DR cluster licenses for each Isilon cluster plus an Eyeglass clustered agent license.


The Eyeglass Ransomware Defender solution is intended to be a last line of defense for critical NAS data stored on Isilon.  A best practice defense should include antivirus software on laptops and workstations, along with email gateway or IDS network solutions.

The intended use case for Ransomware Defender assumes malware has circumvented all existing defenses leaving critical NAS data exposed to attack.


The diagram below shows the traditional approach to security: perimeter defenses whose primary purpose is to ensure malware never enters your network.


The Eyeglass solution builds a new security perimeter inside your network, with active defense to threats.




Securing Root user on Isilon


The root user should never be used to access data on Isilon.  The reason this user is a high security risk is that root has access to all shares even if access has not been granted to the root user.  This means a compromised machine using the root user to access data could encrypt all data on the cluster.


Eyeglass Ransomware Defender offers a mode, configured through the igls CLI, that disables the SMB protocol on the Isilon clusters managed by Eyeglass, to ensure that in the event a Ransomware event is detected the compromised machine does not destroy all data on all clusters.  See the CLI section at the end of this guide for instructions to enable it.


The root user can NOT be locked out with a deny permission which is why SMB protocol disable is the only way to protect data.


NOTE: IF YOU USE RUN AS ROOT ON SHARES YOU ARE EXPOSING DATA TO A VERY HIGH SECURITY RISK SINCE NO LOCKOUT WILL BE POSSIBLE.  THIS IS BECAUSE THE USER SID SENT WHEN AN AD USER ACCESSES DATA WITH RUN AS ROOT ENABLED IS THE ROOT USER, NOT THE ACTUAL AD USER.


We recommend NOT using run as root on shares for the reason above, AND because it fails all security audits of Isilon against industry standards (PCI, HIPAA, FedRAMP, ITSG, etc.).  Remove the run as root option on all shares.


The default setting in Ransomware Defender is to NOT respond with SMB disable when the root user SID has tripped a threat detector.   Before enabling this mode, VERIFY that no run as root shares exist.


This can be done using the Eyeglass cluster configuration report

  1. Login to Eyeglass

  2. Open Reports on Demand Icon

  3. Select Create New Report

    1. Screen Shot 2017-04-28 at 12.46.44 PM.png

  4. Wait until report is finished by viewing running jobs

  5. Select Open/Print option for the finished report from Reports on Demand after running jobs shows the report creation is completed.

  6. Click Cancel on Print option (if using Chrome).

  7. Control-F to search the page option

  8. Search for “run_as_root”

  9. If any Shares are found with this option set see below.

  10. DO NOT ENABLE LOCK ROOT FEATURE.

Ransomware Security Signal Events and Detection Overview


Ransomware Defender is a per-user monitoring solution that operates at Isilon scale.  This means each user's file activity is monitored individually for user behaviours that trigger threat detection patterns.    This provides a zero day solution that identifies and weights patterns of IO without needing definition file based detection.


The weight is called “signal strength” and determines how Eyeglass will respond to the threat.  


Three threat levels are defined:

  1. Warning - No action taken only alarm email sent to administrator

  2. Major - Timed lock of user account in minutes from event

  3. Critical - Immediate lockout of user account


Eyeglass Active Responses to Threats

  1. Lockout action means a deny permission is applied on all shares the user has access to across all managed Isilon clusters (not just the cluster where the event was detected)

  2. Future responses are planned (for example, stop SyncIQ replication to preserve the DR copy of the data, or create snapshots on all shares a user has access to if suspicious events are seen).



Ransomware - Signal Strengths - Summary Explanation


The Signal Strength is a number that represents the peak threat count per user per Signal Strength Threshold level (warning, major, critical).  Each Signal Strength Threshold level has its own threat threshold crossing value and time period that the threshold must be crossed within before the event is treated as a security event.


This can be viewed as a threat rate per minute.  Each time a threat event is received for a given user, 3 different threat-per-minute rates are evaluated for this user.   The calculated rate determines whether the user security response is warning, major or critical.  Each Signal Strength Threshold represents a different rate of threats.    Each new threat event for a user triggers another calculation using the Signal Strength Threshold rates and intervals.


If this calculation results in a warning state changing to major, then the response for that Signal Strength Threshold will be applied.   In this example, moving from warning to major will result in a lockout of the user according to the lockout timer set on the Settings tab.



The above example is for illustration only; do not use these settings as shown without direction from support.   Configuring detection as per the example above would look like the image below:


Sample only, example for illustration only

Screen Shot 2017-06-06 at 7.22.34 AM.png

Ransomware - Signal Strengths - Detailed Explanation


This section describes the Signal Strength calculation in Superna Eyeglass Ransomware Defender.


Security Event Threat Detector Definitions

  • File Event: a discrete CEE event published by Isilon's CEE event stream based on a user action, for example open file, close file, write to or read from a file.

  • Threat Detectors: Logic used by Eyeglass Ransomware Defender to determine if a group of File Events is potentially associated with a Ransomware attack. There are multiple independent threat detectors used by Eyeglass Ransomware Defender during analysis that are assessed in parallel.

  • Signal: Occurrence of one or more File Events that have been flagged by one or more threat detectors as a potential Ransomware Event.

  • Signal Strength: For a given Signal, the number of threat detectors that were triggered. A higher Signal Strength has a higher probability of being a Ransomware Event.

  • Ransomware Event: A collection of signals whose combined Signal Strengths exceeds the user-set threshold in Eyeglass.


Threat Detection Signal Strengths in Eyeglass


Threat detection Signal Strength is a measure of the severity of a user's File Event behaviour.  The higher the count the higher the severity of the detection.  


The Signal Strength is displayed in the Eyeglass Ransomware Defender window Active Events or Event history tab.

Screen Shot 2017-04-20 at 7.25.37 AM.png

The numbers represent the peak of warning, major and critical signal strengths that were recorded in the entire lifetime of the ransomware event.


Example Signal Strength walk through



In the above diagram, if we have settings with the following

  • WARNING: 1 event in 30 minutes

  • MAJOR: 5 events in 10 minutes

  • CRITICAL: 8 events in 5 minutes


We would expect a signal strength of “12 / 10 / 9”, since the peak signals in the whole event’s lifetime are found in the following intervals:




Also note that the blue line counts for 2, since there are two independent threat detectors that contributed to the event.
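
The peak calculation can be pictured with a short sketch (Python, illustrative only and not the shipping detection code) that counts signals inside each severity's interval and keeps the highest count seen:

from datetime import datetime, timedelta

# Interval and threshold settings from the example above.
INTERVALS = {"WARNING": timedelta(minutes=30),
             "MAJOR": timedelta(minutes=10),
             "CRITICAL": timedelta(minutes=5)}
THRESHOLDS = {"WARNING": 1, "MAJOR": 5, "CRITICAL": 8}

def peak_signal_strengths(signal_times):
    # signal_times: sorted timestamps of threat detector signals for one user.
    peaks = {}
    for severity, window in INTERVALS.items():
        peak = 0
        for i, start in enumerate(signal_times):
            # Count signals falling inside the window that begins at this signal.
            count = sum(1 for t in signal_times[i:] if t - start <= window)
            peak = max(peak, count)
        peaks[severity] = peak
    return peaks

def event_severity(peaks):
    # The highest severity whose threshold was crossed wins.
    for level in ("CRITICAL", "MAJOR", "WARNING"):
        if peaks[level] >= THRESHOLDS[level]:
            return level
    return None

base = datetime(2017, 6, 6, 7, 0)
times = [base + timedelta(minutes=m) for m in (0, 1, 2, 3, 4, 30, 31)]
peaks = peak_signal_strengths(times)
print(peaks, event_severity(peaks))  # {'WARNING': 6, 'MAJOR': 5, 'CRITICAL': 5} MAJOR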


Signal Strength Window Overview

When clicking on the Signal Strength in Eyeglass, you can see the threat detectors that contributed to the event. Note that this is not broken down by severity, and represents the total of the Threat Detector types that were tripped throughout the lifetime of the event for this user security event.


Screen Shot 2017-04-07 at 7.23.21 AM.png


Note that the sum of these values can be greater than the peak signal strengths described above, since it’s possible that the lifetime of the ransomware event is greater than the interval for the thresholds.

Eyeglass Clustered Agent Deployment Options


It's best practice to place ECA clusters near the Isilon clusters to reduce latency between the cluster and the ECA instance that is processing the audit log.


The ECA consists of 3 VMs that all receive CEE audit log events from the cluster as three separate endpoints. The supported configurations are outlined below.

Supported Topologies

Blue lines = Service broker communication heartbeat 23457

Orange Lines = Isilon REST API over TLS 8080 and SSH

Green lines = CEE messages from the cluster port 12228

Red lines = SyncIQ replication
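
A quick way to sanity-check that the TCP ports in the legend are reachable from the relevant endpoint is sketched below (Python, illustrative only; the hostnames are placeholders for your own Eyeglass, Isilon and ECA addresses):

import socket

CHECKS = [
    ("eyeglass.example.com", 23457),    # service broker heartbeat
    ("isilon-mgmt.example.com", 8080),  # Isilon REST API over TLS
    ("eca-node1.example.com", 12228),   # CEE messages delivered to the ECA node
]

for host, port in CHECKS:
    try:
        with socket.create_connection((host, port), timeout=3):
            print(f"{host}:{port} reachable")
    except OSError as err:
        print(f"{host}:{port} NOT reachable ({err})")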

#1 Eyeglass topology --HOT COLD




#1A Eyeglass topology --HOT COLD - 3rd party auditing compatibility - External CEE Server


See Section in install guide on configuring external CEE server to work with ECA cluster


#2 Eyeglass topology --HOT COLD






#3 Eyeglass topology --HOT COLD




#4 Eyeglass topology --HOT HOT clusters




Ransomware Defender Eyeglass Clustered Agent (ECA) Architecture


The flow of information through the ECA is shown in the picture below:





High Availability and Resiliency


The ECA cluster is an active active design that offers Matrix Processing of events. This design uses dedicated docker containers that perform a specific function on each node. The solution allows for multiple container failures within a node and between nodes.


The solution allows distribution of event processing at the functional container level on any of the nodes in the cluster.  This allows more than a single point of failover within a node and between nodes.  This ensures processing continues under most common conditions, with greater than 2x HA level of redundancy.



Cluster Operational Requirements


The platform is a robust high performance event processing cluster for threat and audit detection capabilities. The cluster will remain operational as long as 2 of the 3 nodes are running and can reach the HDFS cluster database.





Architectural Data flow of CEE events though the Eyeglass Clustered Agent

How the ECA processes incoming events should be understood when debugging:


  1. The ECA cluster is an active active active solution which means all nodes process and analyze CEE data from the cluster.   

  2. The cluster load balances CEE messages to each node in the cluster.

  3. Each node has CEE listener container, fastanalysis container, Eyeglass service container, ceefilter container, DB container.

  4. Each AD user is hashed and assigned to one node in the cluster so a single user's behaviour patterns can be processed by a single node in the cluster (see the sketch after this list).

  5. If a node goes down, another node takes over the Active Directory user processing for the failed node.
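
A minimal sketch of the per-user partitioning idea in steps 4 and 5 (Python, illustrative only; the real ECA containers implement their own distribution logic):

import hashlib

def node_for_user(user_sid, live_nodes):
    # A stable hash of the AD user SID picks one node, so all of that user's
    # events are analyzed on the same node.
    digest = hashlib.sha256(user_sid.encode()).hexdigest()
    return live_nodes[int(digest, 16) % len(live_nodes)]

nodes = ["eca-node1", "eca-node2", "eca-node3"]
sid = "S-1-5-21-1111-2222-3333-1001"        # placeholder SID
print(node_for_user(sid, nodes))
# If a node fails, calling the same function with the surviving nodes
# deterministically re-assigns that node's users (step 5).
print(node_for_user(sid, ["eca-node1", "eca-node3"]))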







Message flow Inside ECA Node



How to Configure and Tune Ransomware Defender Threat Detection and Responses


Detection of a Ransomware event is handled entirely by the ECA nodes. Eyeglass is responsible for taking action against the user's access to cluster data and for notifying administrators. This section identifies the behaviours the Eyeglass appliance takes when the ECA identifies a threat and how to configure settings that align with your company's security policies and risk tolerance.


Severity Threat Level Severity Definitions and Responses

There are three Signal Strength Threshold levels defined, and Eyeglass will take different action for each:


Threat Level

Eyeglass Action

WARNING

Eyeglass sends an email to notify any subscribed administrators of the threat, but takes no direct action.

MAJOR

Eyeglass begins a “delayed lockout” procedure. It notifies the administrator(s) that a threat has been detected and that the user will be locked out after X minutes, unless the admin logs in and explicitly cancels the action.  This grace period is configurable in the Eyeglass settings.

CRITICAL

The user lock out is immediate, and the Administrator(s) are notified.


How to Enable Monitor Mode to baseline User Behaviours


Monitor mode is used after installation to disable any actions for Major and Critical events and to baseline the environment.  It can also be used to quickly disable user actions if too many false positives are detected.


  1. Open Ransomware Defender window.

  2. Select Settings tab and click the Monitor mode check box and click save.

  3. Screen Shot 2017-04-20 at 8.45.37 PM.png


How to Tune Threat Detection Rates (Warning, Major and Critical)


The Monitor Mode feature is designed to monitor IO from users in an environment before going into production.  This allows threat detection and user behaviour to be observed before entering production mode.  Monitor Mode can be enabled or disabled at any time.


Monitor mode is used to identify service accounts or applications that are writing data at a high rate and should be added to the ignore list, filtering these applications out from normal user data access monitoring.


The setting is enabled from the Settings tab.  With Monitor Mode enabled all events detected are treated as Warning with no lockout actions.

Screen Shot 2017-06-06 at 7.38.04 AM.png


The Statistics screen is used to monitor which user behaviours are being triggered to determine how to set detection for warning, major and critical settings.


These statistics can be submitted to support.superna.net to recommend settings for detection.


How to determine threat response settings to meet your Company’s Risk Profile


The Ransomware Defender product has several options to tune the detection and response to a Ransomware attack.  The more sensitive the detection the more likely a false positive can occur.  Threat response options are outlined below with business impact considerations for each option.  This section should be reviewed to determine how to configure the product in your environment.


Risk tolerance and business impact need to be assessed to determine the best settings for your environment.  The section below outlines the recommendations for each threat detection level.


Threat responses:


  1. Lockout of user account

    1. Timed lockout delay

  2. Disable lockout on critical only.  Turns off immediate lockout and all responses are delayed using Major lockout timer.

  3. Snapshot the file system at the share path.  This can be combined with delayed lockout to create a recovery point and then, after X minutes, lock out the user if the behaviour continues.  (upcoming release)


Threat Level Severity

Action

Business Impact

Warning

No action taken. Email alert is sent

No impact to applications or user access to data

Major

Timed lockout of user. Email alert is sent

Business applications or servers that write data and are not added to the ignore list can be locked out.  


Impact: application down time until restore access completed.


Recommendation: add to ignore list.

Critical

Immediate lockout of user. Email alert is sent

Impact: application down time until restore access completed.  No wait time from detection to lockout for administrators to determine action.


Recommendation: add to ignore list or Disable critical  actions.



How to Enable Disable Critical Event Detection


This option disables the immediate lockout action and uses only the Major timed lockout option.  This is recommended when your risk tolerance requires that a user lockout be reviewed by an administrator, using the timed lockout feature on Major severity detections.


  1. Open Ransomware Defender window.

  2. Select the Settings tab.

  3. Screen Shot 2017-06-06 at 7.38.04 AM.png

  4. Select the checkbox “Critical off mode” and click save

  5. done.

Eyeglass Security Event Workflow and Operations


Under normal working conditions it is normal to see some user behaviours detected as warnings in the Active Events window.  These events stay in active monitoring state for a period of time (settable in the Settings tab) to continue monitoring the user's behaviour for new threat detectors and rates of detection that promote the event to Major or Critical.    


If the user's activity continues to fire threat detectors at or below the Warning rate, the security event will remain in Active monitoring state and will not be Auto Archived.


Auto Archive Warning Security Events


This feature simplifies monitoring of low grade security events.  Warning security events will stay active as long as new threat detectors for this user continue to fire during the auto archive timeout period.  This feature will auto archive the event if no new threat detectors fire for this user’s security event.   The Expires column in the Active Events window can be used to monitor which events will auto archive in X hours or minutes.


How to change the default Autoarchive timeout

Use this procedure to set the time period a warning event will stay visible in the Active Events window before it's archived to the History tab.  A longer time period allows tracking a user's behaviour for longer.

Screen Shot 2017-06-06 at 7.38.04 AM.png

  1. Open the Ransomware Defender window.

  2. Select the Settings tab

  3. Change the auto archive timeout from the default of 3 hours to another value in minutes.  See screenshot for Expiry setting in the Warning section.

  4. Save changes


Detection and Response Configurable Settings

Threat Detector Threshold configuration

Eyeglass allows the administrator to configure the thresholds at which the various events take place.


  • Units are in Candidate Events per user per interval: i.e. the number of files that were affected by a single user, in a given time period.

  • Different thresholds are available for the WARNING, MAJOR, and CRITICAL severities.

  • The MAJOR severity also allows the specification of the grace period (the time between event detection and lockout).  Timed Lockout can be stopped with action menu on an active event.


The figure below shows the settings UI.

Screen Shot 2017-06-06 at 7.38.04 AM.png


How to change Threat Detection Settings

  1. Open Ransomware Icon

  2. Click Settings tab

    1. Screen Shot 2017-05-03 at 7.38.26 AM.png

  3. Change events per user settings to trigger warning, major and critical security events.  

  4. The lower the number the more sensitive the detection becomes.  Changing to a larger number can avoid false positives, depending on IO patterns within your Isilon environment.

  5. The Grace period (minutes) sets how long a Major security event detection will wait before locking out the user named in the security event.  Best practice: set this to a value that ensures an administrator can review the event and determine whether the lockout should occur or be canceled.  It is the response window for reviewing an event before the lockout occurs (see the sketch after this list).

  6. NOTE: Recommended to consult support before making any changes.
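
The grace period in step 5 behaves like a cancellable countdown. A conceptual sketch (Python, illustrative only; lockout() stands in for the real deny-permission job):

import threading

def lockout(user):
    print(f"deny permission applied for {user} on all shares")

def start_delayed_lockout(user, grace_minutes):
    timer = threading.Timer(grace_minutes * 60, lockout, args=(user,))
    timer.start()
    return timer   # keep the handle so the lockout can still be cancelled

t = start_delayed_lockout("DOMAIN\\jsmith", grace_minutes=15)
# "Stop Lockout Timer" in the Actions menu corresponds to cancelling the timer
# before it fires:
t.cancel()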





Ignored List:


Eyeglass allows the administrator to specify paths, users, and client or server ip address to exclude from ransomware processing. The UI Ignore list is shown below:

Screen Shot 2017-05-03 at 7.55.36 AM.png

  1. Using the button next to the title, the administrator can add new Paths (a full path is required, for example /ifs/data/xxx), Active Directory Users (domain\userid or user@domainname) and ignored client IPs in the Sources column.

  2. Sources can be specified in IP/subnet notation to ignore ranges.  Example: 10.0.0.0/8 will ignore all IP addresses in the 10.x range (see the sketch after this list).

  3. NOTE: each ignore column is an OR meaning if ANY of the listed ignore values is found in an audit message it will be dropped before processing.  The first matched ignore list will drop the audit event.
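
The subnet matching described in step 2 can be illustrated with the Python standard library (illustrative only, not the ECA's actual filter code):

import ipaddress

IGNORED_SOURCES = [ipaddress.ip_network("10.0.0.0/8"),
                   ipaddress.ip_network("192.168.50.14/32")]   # sample entries

def source_ignored(client_ip):
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in IGNORED_SOURCES)

print(source_ignored("10.44.7.9"))    # True  - audit event dropped before processing
print(source_ignored("172.16.1.20"))  # False - audit event processed normally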




Detected User Security Event Descriptions


Once a user security event appears in the active events the following table outlines the column definitions and descriptions of each state of the security event.


Screen Shot 2017-04-20 at 7.25.37 AM.png



Column Name

Description

State

Warning - the Warning threat rate threshold was crossed

Delayed Lockout - the Major threat rate threshold was crossed

Locked Out - the Critical threat rate threshold was crossed

Severity

Warning - Threat detector peak rate threshold for this event was crossed

Major - Threat detector peak rate threshold for this event was crossed

Critical - Threat detector peak rate threshold for this event was crossed

Files

A count of files that tripped the threat detectors for this event.  Click to browse the file system path to see the location on disk that the user was accessing.   

  • Two tabs are shown: one is a list of files the user was accessing within the last hour since the event was detected (All Files)

  • Affected Files is the list of files that tripped the threat detectors.

  • All files should be inspected to verify integrity

Signal Strengths

Each number, from left to right, is the warning peak / major peak / critical peak threat rate file count.  This indicates the highest count seen for each severity configured in the Settings tab; the metric is a count per minute.   A higher number for a severity indicates a higher security risk detected for the user behaviour and that more files were involved in the threat detection security event.  When comparing two different security events, higher numbers indicate more files tripped the threat detectors.

User

The domain and user account of the affected user

Detected

Date and time representing the beginning of the security event.   The event will stay until it is auto archived or updated to resolved or unresolved status.

Expires

This will show the time remaining before autoarchive as unresolved is applied to the event.  The autoarchive feature only applies to events detected as Warning and will monitor the event for this time period before archiving the event as unresolved.


OR


If a timed lockout is active the time remaining until a lockout will occur.

Clients

This has a popup link to list the source ip address of the client machine the user was logged into when the signal event was detected.  This assists in finding the client on routers and switches in the environment.     Multiple ip’s can be listed for a client if they are logged into more than one machine.

Actions

Click to bring up the security event history of the event, all previous actions taken and menu to select available actions depending on the state of the security event.



Security Event State Descriptions


A Ransomware event in eyeglass can be in one of the following states:  


State

Description

WARNING

New Ransomware events with a WARNING severity initially have a WARNING state.

DELAYED_LOCKOUT

New Ransomware events with a MAJOR severity initially have a DELAYED_LOCKOUT state. This implies that the user has not yet been locked out, but will be if the event is not acknowledged.

LOCKED_OUT

New Ransomware events with a CRITICAL severity initially have a LOCKED_OUT state.


MAJOR severity events that are not acknowledged before the grace period elapses also have a LOCKED_OUT state.  


WARNING severity events have a LOCKED_OUT state if the Administrator explicitly locks out the user.

ACKNOWLEDGED

A WARNING severity event can be acknowledged to indicate that the admin has seen the event and is monitoring the situation.


MAJOR severity events change to ACKNOWLEDGED when the admin intervenes before the grace period has elapsed.


CRITICAL severity events can never be ACKNOWLEDGED.

ACCESS_RESTORED

An event is in ACCESS_RESTORED state when the Administrator has restored access to a locked out user.

SELF_RECOVERY

An event is in SELF_RECOVERY state when the Administrator has initiated a workflow for the user to recover the affected files.  See Data Recovery section in this guide.

RECOVERED

An event is in RECOVERED state when the user file recovery process is complete.


RECOVERED state events are not listed in the Active Events tab on eyeglass. They are listed in the Event History tab.

UNRESOLVED

An event is in UNRESOLVED state when the Administrator has archived the event, but not explicitly restored access to the user.  


UNRESOLVED state events are not listed in the Active Events tab on eyeglass. They are listed in the Event History tab.

ERROR

An event is in ERROR state when eyeglass has attempted to initiate an action on the Administrator’s behalf, but that action has failed.


Security Event Possible Action Descriptions


The following actions are available to the Administrator at different stages of the Ransomware event lifecycle. The Required States column lists the state that the event must be in for the action to be available. Whenever an action is submitted, a new record is added to the event’s history.


Action

Required States

Result

Comment

ANY

Adds a comment to the event history

Acknowledge

WARNING

Changes the event to ACKNOWLEDGED state.

Stop Lockout Timer

DELAYED_LOCKOUT


Changes the event to ACKNOWLEDGED state. Disables any countdown for the grace period on MAJOR severity events.

Lockout

WARNING,

DELAYED_LOCKOUT

Initiates the procedure on eyeglass to revoke access to the user’s shares. Changes the event to the LOCKED_OUT state.

Restore User Access

LOCKED_OUT

Initiates the procedure on eyeglass to restore access to any shares where access was revoked in the lockout step. Changes the event to ACCESS_RESTORED state.

Initiate Self Recovery

ACKNOWLEDGED,

ACCESS_RESTORED

Launches the eyeglass workflow to allow the user to recover all files associated with this event. This procedure will put the event into the RECOVERED state when it is complete.




See Data Recovery section in this guide.

Mark as recovered

ACKNOWLEDGED,

ACCESS_RESTORED,

SELF_RECOVERY

Allows the admin to manually mark an event as having been recovered. This can happen if the administrator manually restores files, or the user decides that they do not need the encrypted files.

Archive as Unresolved

WARNING,

ACKNOWLEDGED,

LOCKED_OUT,

ACCESS_RESTORED,

SELF_RECOVERY,

ERROR


The administrator can archive an event in nearly any state. The event gets put into event history, and is no longer shown on the active events screen.
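
For scripting or triage checklists, the table above can be captured as a simple lookup (Python, illustrative only; it mirrors the Required States column):

REQUIRED_STATES = {
    "Comment":                ["ANY"],
    "Acknowledge":            ["WARNING"],
    "Stop Lockout Timer":     ["DELAYED_LOCKOUT"],
    "Lockout":                ["WARNING", "DELAYED_LOCKOUT"],
    "Restore User Access":    ["LOCKED_OUT"],
    "Initiate Self Recovery": ["ACKNOWLEDGED", "ACCESS_RESTORED"],
    "Mark as recovered":      ["ACKNOWLEDGED", "ACCESS_RESTORED", "SELF_RECOVERY"],
    "Archive as Unresolved":  ["WARNING", "ACKNOWLEDGED", "LOCKED_OUT",
                               "ACCESS_RESTORED", "SELF_RECOVERY", "ERROR"],
}

def action_allowed(action, event_state):
    states = REQUIRED_STATES[action]
    return "ANY" in states or event_state in states

print(action_allowed("Restore User Access", "LOCKED_OUT"))   # True
print(action_allowed("Acknowledge", "LOCKED_OUT"))           # False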





How to respond to Security Events for warning, major or critical events


Overview of Security Event Triage Process


When a security event is detected, the following steps should be followed to review it and take action.



  1. Review the severity (Warning, Major and Critical).  

    1. If Warning

      1. Review the list of affected files for the user account and IP address by selecting the link in the Files column.

      2. The file list view shows files that triggered the security event and the last hour of files accessed by the user that should be reviewed for possible compromise or data recovery.

      3. Screen Shot 2017-04-20 at 8.55.50 AM.png

      4. If the affected files are the result of normal file operations and not a malicious event, the event can be marked as resolved with the actions menu.

      5. See Action Menu Security Event Actions table below.

      6. Security event closed and moved to Event history tab.

    2. If Major

      1. Review affected files, user name and IP address to locate user in AD and your organization

      2. Review time to lockout timer in Active Events tab which is the time until the lockout will be issued.

        1. If you determine this is a false alarm by contacting the user along with an assessment of the affected files, use the Action Menu to Stop the Lockout timer and then mark security event as Resolved.

        2. See Action Menu Security Event Actions table below.

      3. If you determine it is a malicious security event, you can accelerate the lockout timer by using the Action menu to select Lockout Now.

      4. See Action Menu Security Event Actions table below.

      5. Recovery: Re-image machine or other recovery procedures that your policies require.  Determine which files to be recovered on the Isilon by selecting the files option on the security event.  From this screen you can download a CSV file of trigger files AND files from the last 1 hour of activity.

      6. Restore User Access:  Take this step after it has been determined it is safe to restore access for the user.  The Actions menu can be used to remove the lockout from the user account on all cluster shares the user had access to.  Using the Actions menu, restore user access.

      7. See Action Menu Security Event Actions table below.

      8. The security event will now be in Restored state and can be archived to the Event History tab.  Using the actions menu submit a Mark As resolved action.

      9. See Action Menu Security Event Actions table below.

      10. Done.

    3. If Critical

      1. The security event will have a lockout applied immediately since it is a critical detection.

      2. Recovery: Re-image machine or other recovery procedures that your policies require.  Determine which files to be recovered on the Isilon by selecting the files option on the security event.  From this screen you can download a CSV file of trigger files AND files from the last 1 hour of activity.

      3. Restore User Access:  After it has been determined it is safe to restore access for the user, the Actions menu can be used to remove the lockout from the user account on all cluster shares the user had access to.  Using the Actions menu, restore user access.

      4. See Action Menu Security Event Actions table below.

      5. The security event will now be in Restored state and can be archived to the Event History tab.  Using the actions menu submit a Mark As resolved action.

      6. See Action Menu Security Event Actions table below.

      7. Done.




Security Event Action State Descriptions

Once a user security event appears in the active events tab the following operations are possible by clicking the Actions icon. Each state has several possible actions.  The table below describes the options available for each state of a security event.



State of Event

Possible Actions

Warning State

Screen Shot 2017-04-02 at 1.16.27 PM.png

  1. Comment on the event to update the security response or assessment of the event.  Can be viewed by other administrators that review the security event history.

  2. Archive as Unresolved - Moves the event to the History tab.  

  3. Lockout -  It's possible to lock out the user from the action menu.  This applies a deny permission to all shares stored within the lockout event.

  4. Acknowledged State - An administrator has acknowledged this event but has not marked it as resolved.  In this state the user is not locked out or in a timed lockout state.



Locked out User State (Critical Severity Threat Detection)

Screen Shot 2017-04-02 at 11.45.48 AM.png

  1. Comment on the event to update the security response or assessment of the event.  Can be viewed by other administrators that review the security event history.

  2. Restore User Access - This will reverse the lockout and grant access to the shares that were locked out.  Review the lockout details for a full list of shares and clusters that lockout was applied.

    1. Once Restore User Access is launched, this will start a restore access job (Running Jobs window) and restore access in real time to the list of shares that were locked out.

    2. Verify user has access

    3. Verify a cluster share to confirm restore access was successful

  3. Archive as Unresolved - Leaves the lockout applied and moves the event to the History tab.  Not recommended unless user access is permanently revoked.


Access Restored State

Screen Shot 2017-04-02 at 12.58.43 PM.png

  1. Mark as Recovered - This option appears to allow archiving the security event to the history tab.

  2. Lockout -  From the Access Restored state it's possible to re-lockout the user again from the action menu.  This applies deny permission to all shares stored within the lockout event.

  3. Initiate Self Recovery -  This option will only function if the Cluster Storage Monitor addon is purchased.  It integrates with the Backup Recovery User portal to create secured shares to snapshots and DR data that allow the user to recover data from snapshots.  The temporary shares have a time to live of 2 days by default, after which they are deleted.  The shares are secured only to the user involved in the lockout.  The data recovery request will require approval in the Data Recovery Manager Icon. See Data Recovery section in this guide.  (If licensed)

  4. Comment on the event to update the security response or assessment of the event.  Can be viewed by other administrators that review the security event history.

  5. Restore User Access - (Allows re-running this job in the event a share update failed) This will reverse the lockout and grant access to the shares that were locked out.  Review the lockout details for a full list of shares and clusters where the lockout was applied.

    1. Once Restore User Access is launched, this will start a restore access job (Running Jobs window) and restore access in real time to the list of shares that were locked out.

    2. Verify user has access

    3. Verify a cluster share to confirm restore access was successful

  6. Archive as Unresolved - Leaves the lockout applied and moves the event to the History tab.  Not recommended unless user access is permanently revoked.

Delayed Lockout state

Screen Shot 2017-04-02 at 1.07.57 PM.png

  1. Lockout -  From the Delayed Lockout state it's possible to lock out the user immediately from the action menu instead of waiting for the timer.  This applies a deny permission to all shares stored within the lockout event.

  2. Stop Lockout Timer-  This option can be used to stop the timed lockout.  This would be used when investigation determines the user account should not be locked out.  Run the stop lock option.

  3. The status changes to Acknowledged and no lockout will occur.

  4. Comment on the event to update the security response or assessment of the event.  Can be viewed by other administrators that review the security event history.


Acknowledged State


Screen Shot 2017-04-02 at 1.12.57 PM.png

  1. Comment on the event to update the security response or assessment of the event.  Can be viewed by other administrators that review the security event history.

  2. Archive as Unresolved - Leaves any lockout applied and moves the event to the History tab.  Not recommended unless user access is permanently revoked.

  3. Initiate Self Recovery -  This option will only function if the Cluster Storage Monitor addon is purchased.  It integrates with the Backup Recovery User portal to create secured shares to snapshots and DR data that allow the user to recover data from snapshots.  The temporary shares have a time to live of 2 days by default, after which they are deleted.  The shares are secured only to the user involved in the lockout.  The data recovery request will require approval in the Data Recovery Manager Icon.  (If licensed)

  4. Mark as Recovered - This option appears to allow archiving the security event to the history tab.

Archived Event on Event History

Screen Shot 2017-04-02 at 1.06.01 PM.png

  1. Comment on the event to update the security response or assessment of the event.  Can be viewed by other administrators that review the security event history.








Rapid Machine to Machine Malware Spreading Attack Defense Overview


Ransomware Defender can use multiple cluster detections to elevate the automated response due to the severity of the detection and number of concurrent security events.  Refer to the diagram below






Rapid Machine to Machine Malware Attack Auto Response Escalation Configuration


This feature is designed to protect against a multi user scenario where malware affects many machines in a short period of time and when malware is spreading from machine to machine.  The goal in this scenario is to escalate the response automatically based on the number of concurrent events.   The example below walks through how warning → major → critical response escalation will occur based on settings.  


Best Practice: Set the Warning to Major upgrade count to a higher number, for example 30, and the Major to Critical upgrade count to half of that, for example 15.

  1. Screen Shot 2017-06-06 at 7.38.04 AM.png

    1. The Upgrade to Major and Upgrade to Critical settings raise the severity to that level when the number of lower severity detection events matches or exceeds the number entered (see the sketch after this list).

    2. Example (A): if Upgrade to Major is set to 5, then when 5 separate Warning events are detected they will be auto upgraded to Major and the timed lockout started.  

    3. Example (B): if Upgrade to Critical is set to 5, then when 5 separate Major events are detected they will be auto upgraded to Critical and the immediate lockout response activated.  
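
A sketch of the concurrent-event escalation logic using the Best Practice values above (Python, illustrative only, not the product's implementation):

UPGRADE_TO_MAJOR = 30      # concurrent WARNING events (example value)
UPGRADE_TO_CRITICAL = 15   # concurrent MAJOR events (example value)

def auto_escalation(active_severities):
    # active_severities: severities of the currently active security events.
    warnings = active_severities.count("WARNING")
    majors = active_severities.count("MAJOR")
    if majors >= UPGRADE_TO_CRITICAL:
        return "CRITICAL"   # immediate lockout response activated
    if warnings >= UPGRADE_TO_MAJOR:
        return "MAJOR"      # timed lockout started
    return None             # no auto escalation

print(auto_escalation(["WARNING"] * 30))                  # MAJOR
print(auto_escalation(["MAJOR"] * 15 + ["WARNING"] * 3))  # CRITICAL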



False Positive Security Event Handling and Configuration Options


This section documents how to react to false positive security events.  


Warning

  1. A small number of occasional warning events is detected: no action is needed since no end user action is taken.  Each warning survives for 8 hours by default and each user file action is continuously monitored to determine if the warning should be promoted to a Major or Critical event. Actions: No action is needed; the events will automatically expire after a typical working day and move to the Event History tab.  This ensures a record of the event.

  2. A high number of warning events: the thresholds for the threat detectors should be increased.  Action: Open a case with support (support.superna.net) to get recommended threat detector values to be changed on the Settings tab of the Ransomware Defender window.  Support logs will be required.

    1. Screen Shot 2017-06-06 at 7.38.04 AM.png

Major

  1. A small number of Major events: the user is locked out after the delay timer and it's determined the detection was false.  If a user or application workflow is triggering a lockout, an ignore list entry is recommended.

  2. Follow the steps in “Ignore List setting Procedures”.

  3. High number of Major events.   

    1. Follow the steps in “How to Enable Monitor Mode to baseline User Behaviours”.

    2. NOTE: In this release existing locked out users will need to be restored using action menu and then archived using Action Mark as resolved.

    3. Contact Support with support logs to get adjusted threat detector settings.

Critical

  1. Small number of critical events. Same as Major.  Add to Ignore list and save.

    1. Follow the steps in “Ignore List setting Procedures”.

  2. High number of critical events.

    1. Follow the steps in “How to Enable Monitor Mode to baseline User Behaviours”.   This will disable user actions quickly in the event many users are detected and locked out.

    2. NOTE: In this release existing locked out users will need to be restored using action menu and then archived using Action Mark as resolved.

    3. Contact Support with support logs to get adjusted threat detector settings.



Ignore List setting Procedures


Follow the steps below to add ignore list entries for paths, users, or server/client source IPs.



  1. Open Ransomware Defender window.

    1. Select Ignored List tab

    2. Screen Shot 2017-04-20 at 8.32.53 PM.png

    3. Enter a path, AD user domain\userid, or server or client ip address and save.



Eyeglass User Lockout Active Directory Planning


The lockout process identifies all shares the user has access based on searching all shares in all access zones on all clusters managed by Eyeglass.  This list of shares will have a real-time deny permission added to the share for the affected user.
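
Conceptually, the lockout walks every share in every access zone on every managed cluster and adds a deny entry for the affected user. A minimal sketch (Python, illustrative only; the data structure is a stand-in, not the Eyeglass or OneFS API):

def lock_out_user(user_sid, clusters):
    # clusters: {cluster: {zone: [share dicts with 'trustees' and 'deny' sets]}}
    for zones in clusters.values():
        for shares in zones.values():
            for share in shares:
                # "Everyone" shares are locked regardless of the user's domain.
                if user_sid in share["trustees"] or "Everyone" in share["trustees"]:
                    share["deny"].add(user_sid)

clusters = {
    "prod-cluster": {
        "System": [{"name": "hr-data", "trustees": {"Everyone"}, "deny": set()}],
    },
}
lock_out_user("S-1-5-21-1111-2222-3333-1001", clusters)
print(clusters["prod-cluster"]["System"][0]["deny"])   # the user SID is now denied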


A special case is handled for the “Everyone” well known group, and it should be understood how it operates in multi-domain Active Directory configurations.


Two scenarios can exist with AD domains on Isilon clusters.  


Scenario #1:

  • The first is parent and child AD domains that are members of the same forest and a trust relationship exists.

Scenario #2:

  • The second scenario covers two domains that are not members of the same forest and no trust relationship exists between the domains


How the Everyone well known group is handled, if applied to a share, is shown below for each scenario: a lockout permission is applied regardless of which domain the user is located in.  This is required since Eyeglass has no way to know whether the domains trust each other.  This solution ensures all Everyone shares are locked out, which is more secure than skipping some shares.


Reference the diagram below.








Role Based Access


  1. To create a dedicated role to perform administration and monitoring for Ransomware events

  2. Open User Roles Icon

  3. Assign the Ransomware Defender role to a user or group

  4. Screen Shot 2017-03-30 at 9.00.36 PM.png


Remote Service Authentication and Protocol

Eyeglass can communicate with multiple ransomware defender endpoints. Each endpoint must have a unique API token, generated by the Superna Eyeglass REST API window:


Once a token has been generated for a specific ECA, it can be used in that ECA’s startup command for authentication, along with the location of eyeglass.


Communication with the ECA is bidirectional at the start (ECA -> Eyeglass for security events).  Eyeglass will query the analytics database and test database access on regular interval.   


The ECA should (a conceptual sketch follows this list):

  1. Heartbeat

  2. Notify eyeglass of any detected ransomware threats

  3. Periodically send statistics on processed events.

  4. Periodically poll for updated ransomware definitions, thresholds, and Ignore list settings.
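
A conceptual sketch of those responsibilities as a periodic loop (Python, illustrative only; the real ECA containers implement this internally, and the intervals and function bodies here are placeholders):

import time

def send_heartbeat():        print("heartbeat sent to Eyeglass")            # 1
def notify_threat(event):    print(f"threat reported: {event}")             # 2
def send_statistics():       print("processed-event statistics sent")       # 3
def poll_settings():         print("thresholds and ignore list refreshed")  # 4

for cycle in range(3):                  # bounded only so the sketch terminates
    send_heartbeat()
    if cycle % 2 == 0:
        send_statistics()
        poll_settings()
    time.sleep(1)                       # placeholder interval
# Threat notifications are event driven rather than periodic, e.g.:
notify_threat({"user": "DOMAIN\\jsmith", "severity": "MAJOR"})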




Service Registration Monitoring in Eyeglass


Eyeglass icon “Manage Services” displays all registered ECA’s and CA UIM probes operating remotely from the Eyeglass appliance.  The screenshot below shows 3 ECA nodes registered and the health of each process running inside the node.



Service States

  1. Active:  Has checked in with heartbeat

  2. In-Active: Has failed to heartbeat, no longer processing

Health States

  1. Up - running and up time in days

  2. Down - not running


The Delete icon per service registration should not be used unless directed by support. This will remove the registration from the remote service.

Security Guard - Automated Security Testing


Ransomware Defender monitors cluster IO for suspicious user behaviour.  Under normal day to day conditions no actions are required since alerts are sent in the event of a warning, major or critical security event.


The Security Guard feature simulates a Ransomware attack on a daily basis to validate that all components are functioning, including alerting and lockout of user sessions.  Once configured, administrators get daily updates that Ransomware Defender is actively monitoring and responding to Ransomware events.


This offers you the highest level of confidence that your environment is ready in the event a malicious virus is inside your network and finds shares to attack data.


The feature will create a honeypot share named igls-honeypot in the System zone of each cluster managed with a Ransomware agent license key.   The feature can simulate an attack on demand or on a scheduled interval.


Simulated Attack

  1. Creates share automatically secured to the service account.

  2. Share name igls-honeypot

  3. Creates test files using a well known extension to trigger a simulated attack response from the Ransomware Defender Clustered Agent (see the sketch after this list)

  4. Verifies the user lockout occurs by checking that files cannot be written to the share

  5. Initiates recovery of the user and verifies access to the share again

  6. Reports success and failure per step

  7. Emails administrator results
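
A conceptual sketch of steps 3 and 4 (Python, illustrative only; the share path and file extension are placeholders, and the real Security Guard runs this logic internally against the igls-honeypot share):

import os

HONEYPOT = r"\\cluster-smb\igls-honeypot"   # placeholder UNC path to the honeypot share
EXTENSION = ".encrypted"                    # placeholder "well known" ransomware extension

def write_test_files(count=5):
    # Step 3: write files whose extension should trip the threat detectors.
    for i in range(count):
        with open(os.path.join(HONEYPOT, f"sg-test-{i}{EXTENSION}"), "w") as f:
            f.write("security guard simulated attack\n")

def lockout_verified():
    # Step 4: once the service account is locked out, a write should be denied.
    try:
        with open(os.path.join(HONEYPOT, "sg-verify.txt"), "w") as f:
            f.write("should fail after lockout\n")
        return False
    except PermissionError:
        return True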


Pre-Requisites

  1. System zone must have an AD provider

  2. A user account created in Active Directory within the System zone AD provider. This user is not special in any way and should be a normal user; the home directory does not matter.

  3. System zone must be enabled in the audit configuration on the Isilon cluster


Security Guard Lockout Behavior


  1. The user does not need to be added to any shares. The Security guard will create its own share in system zone called igls-honeypot and add the service account user to the share.

  2. If you add the service account user to other shares, only the igls-honeypot share will have files written during the execution of a simulated attack.

  3. Additional shares that have the service account added to the share permissions WILL have the service account access locked out during simulated attacks.



Configuration

  1. Open the Ransomware Defender window on the desktop and select the Security guard

Screen Shot 2017-03-30 at 8.51.49 PM.png

Ransomware Defender Security Guard Configuration


  1. Enter active directory service account and password from system zone authentication provider. Example domain\userid or user@domain.

  2. Enable Security Guard in Settings

  3. Set interval to schedule simulated attacks

  4. Select check box of each cluster to simulate the attack

  5. Submit saves settings

  6. Run now tests Security guard on demand.


How to Run on Demand Security Guard Penetration test


  1. Open the Ransomware Icon

  2. Select Security guard tab

  3. Select each licensed cluster to test

  4. Select Run Now

  5. Screen Shot 2017-03-31 at 5.36.50 PM.png

  6. Open Jobs icon

  7. Running jobs tab to monitor progress

  8. Screen Shot 2017-03-31 at 5.32.38 PM.png



How to Review Security Guard Penetration test history and logs


  1. Open the Ransomware Defender window

  2. Select Security Guard tab

  3. Select each licensed cluster to test

  4. Select run now

  5. Screen Shot 2017-03-31 at 5.36.50 PM.png

  6. Click Open link to review results

Ransomware ECA Cluster Operational Procedures


Eyeglass Cluster Maintenance Operations


Note:  Restart of the OS will not auto start up the cluster post boot.  Follow steps in this section for cluster OS shutdown, restart and boot process.

Cluster OS shutdown or restart


  1. To correctly shut down the cluster:

  2. Login as admin via ssh on the master node (Node 1)

  3. ecactl cluster down (wait until all nodes are down)

  4. Now shut down the OS on the nodes by logging in to each node via ssh

  5. ssh to each node

  6. Type sudo -s (enter admin password)

  7. Type shutdown


Cluster Startup

  1. ssh to the master node (node 1)

  2. Login as admin user

  3. ecactl cluster up

  4. Verify the boot messages show the user tables exist and the signal table exists (this step verifies the connection to the analytics database over HDFS on startup)

  5. Verify cluster is up

  6. ecactl cluster status (verify containers and table exist in the output)

  7. Done.


Cluster IP Change


  1. To correctly change  the cluster node ip addresses

  2. Login as admin via ssh on the master node (Node 1)

  3. ecactl cluster down (wait until completely down)

  4. Sudo to root

    1. sudo -s (enter admin password)

    2. Type yast

    3. Navigate to Networking to change the IP address on the interface

    4. Screen Shot 2017-04-25 at 4.50.15 PM.png


    5. Screen Shot 2017-04-25 at 4.50.35 PM.png

    6. Screen Shot 2017-04-25 at 4.50.44 PM.png

    7. Screen Shot 2017-04-25 at 4.51.16 PM.png

    8. Screen Shot 2017-04-25 at 4.51.24 PM.png

    9. The screenshots above show the IP, DNS and router settings

    10. Save and exit yast

    11. Repeat on all nodes in the cluster

  5. Once the changes are complete, verify network connectivity with ping and DNS resolution with nslookup

  6. Edit the file with ‘ nano /opt/superna/eca/eca-env-common.conf ’ on the master node (Node 1)

    1. Edit the IP addresses of each node to match the new settings (see the example file excerpt below)

    2. export ECA_LOCATION_NODE_1=x.x.x.x

    3. export ECA_LOCATION_NODE_2=x.x.x.x2

    4. export ECA_LOCATION_NODE_3=x.x.x.x3

    5. Control X to exit and save

  7. Start cluster up

    1. From master node (Node 1)

    2. ecactl cluster up (verify boot messages look as expected)

  8. Eyeglass /etc/hosts file validation

    1. Once the ECA cluster is up

    2. Login to Eyeglass as admin via ssh

    3. Type cat /etc/hosts

    4. Verify the new IP addresses assigned to the ECA cluster nodes are present in the hosts file.

    5. If they are not correct, edit the hosts file and correct the IP addresses for each node.

  9. Login to Eyeglass and open the Manage Services window.  Verify active ECA nodes are detected as Active and Green.

  10. Done
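
A hedged example of the edited eca-env-common.conf excerpt (the IP addresses are examples only; use the new address assigned to each node):

# /opt/superna/eca/eca-env-common.conf (excerpt)
export ECA_LOCATION_NODE_1=172.22.1.18
export ECA_LOCATION_NODE_2=172.22.1.19
export ECA_LOCATION_NODE_3=172.22.1.20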


Single ECA Node Restart or Host Crash Affecting 1 or More ECA Nodes


Use this procedure when restarting one ECA node, which under normal conditions should not be done unless directed by support.  The other use case is when a host running an ECA VM is restarted for maintenance, causing a node to leave the cluster and need to rejoin.



  1. On the master node

  2. Login via ssh as admin

  3. Type command :  ecactl cluster refresh  (this command will re-integrate this node back into the cluster and check access to database tables on all nodes)

  4. Verify output

  5. Now type: ecactl db shell

  6. type : status

  7. Verify no dead servers are listed

  8. If no dead servers are listed, continue to the next step

  9. Login to Eyeglass GUI, check Managed Services and verify all nodes are green.

  10. Cluster node integration procedure completed (a consolidated session is sketched below).
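
A minimal rejoin-check sketch, combining the commands above (host name is an example):

ssh admin@eca-node01        # master node; log in as the admin user
ecactl cluster refresh      # re-integrates the restarted node and checks database table access on all nodes
ecactl db shell             # opens the HBase shell
status                      # inside the shell; confirm the output reports 0 dead servers
exit                        # leave the HBase shell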

Eyeglass ECA Cluster Operations


Stop ECA Database


  • ecactl db stop

eca-node01:~ # ecactl db stop

Warning: Permanently added '[localhost]:2200' (ECDSA) to the list of known hosts.

stopping hbase...................

db_node_1.eca_superna_local: Warning: Permanently added '[db_node_1.eca_superna_local]:2200,[172.18.0.2]:2200' (ECDSA) to the list of known hosts.

db_node_1.eca_superna_local: stopping zookeeper.

db_node_2.eca_superna_local: Warning: Permanently added '[db_node_2.eca_superna_local]:2200,[172.22.1.19]:2200' (ECDSA) to the list of known hosts.

db_node_2.eca_superna_local: stopping zookeeper.

db_node_3.eca_superna_local: Warning: Permanently added '[db_node_3.eca_superna_local]:2200,[172.22.1.20]:2200' (ECDSA) to the list of known hosts.

db_node_3.eca_superna_local: stopping zookeeper.

eca-node01:~ #





Start ECA Database

  • ecactl db start

eca-node01:~ # ecactl db start

Warning: Permanently added '[localhost]:2200' (ECDSA) to the list of known hosts.

db_node_3.eca_superna_local: Warning: Permanently added '[db_node_3.eca_superna_local]:2200,[172.22.1.20]:2200' (ECDSA) to the list of known hosts.

db_node_3.eca_superna_local: starting zookeeper, logging to /opt/hbase/bin/../logs/hbase-root-zookeeper-db_node_3.eca_superna_local.out

db_node_2.eca_superna_local: Warning: Permanently added '[db_node_2.eca_superna_local]:2200,[172.22.1.19]:2200' (ECDSA) to the list of known hosts.

db_node_2.eca_superna_local: starting zookeeper, logging to /opt/hbase/bin/../logs/hbase-root-zookeeper-db_node_2.eca_superna_local.out

db_node_1.eca_superna_local: Warning: Permanently added '[db_node_1.eca_superna_local]:2200,[172.18.0.2]:2200' (ECDSA) to the list of known hosts.

db_node_1.eca_superna_local: starting zookeeper, logging to /opt/hbase/bin/../logs/hbase-root-zookeeper-db_node_1.eca_superna_local.out

starting master, logging to /opt/hbase/bin/../logs/hbase-root-master-db_node_1.eca_superna_local.out

db_node_3.eca_superna_local: Warning: Permanently added '[db_node_3.eca_superna_local]:2200,[172.22.1.20]:2200' (ECDSA) to the list of known hosts.

db_node_3.eca_superna_local: starting regionserver, logging to /opt/hbase/bin/../logs/hbase-root-regionserver-db_node_3.eca_superna_local.out

db_node_2.eca_superna_local: Warning: Permanently added '[db_node_2.eca_superna_local]:2200,[172.22.1.19]:2200' (ECDSA) to the list of known hosts.

db_node_2.eca_superna_local: starting regionserver, logging to /opt/hbase/bin/../logs/hbase-root-regionserver-db_node_2.eca_superna_local.out

db_node_1.eca_superna_local: Warning: Permanently added '[db_node_1.eca_superna_local]:2200,[172.18.0.2]:2200' (ECDSA) to the list of known hosts.

db_node_1.eca_superna_local: starting regionserver, logging to /opt/hbase/bin/../logs/hbase-root-regionserver-db_node_1.eca_superna_local.out

db_node_3.eca_superna_local: Warning: Permanently added '[db_node_3.eca_superna_local]:2200,[172.22.1.20]:2200' (ECDSA) to the list of known hosts.

db_node_3.eca_superna_local: starting master, logging to /opt/hbase/bin/../logs/hbase-root-master-db_node_3.eca_superna_local.out

db_node_2.eca_superna_local: Warning: Permanently added '[db_node_2.eca_superna_local]:2200,[172.22.1.19]:2200' (ECDSA) to the list of known hosts.

db_node_2.eca_superna_local: starting master, logging to /opt/hbase/bin/../logs/hbase-root-master-db_node_2.eca_superna_local.out

eca-node01:~ #



Checking ECA database Status:


ecactl db shell

2017-01-24 21:29:39,191 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

HBase Shell; enter 'help<RETURN>' for list of supported commands.

Type "exit<RETURN>" to leave the HBase Shell

Version 1.2.4, r67592f3d062743907f8c5ae00dbbe1ae4f69e5af, Tue Oct 25 18:10:20 CDT 2016


hbase(main):001:0> status

1 active master, 2 backup masters, 3 servers, 0 dead, 1.6667 average load


hbase(main):002:0>



hbase(main):002:0> status 'detailed'

version 1.2.4

0 regionsInTransition

active master:  db_node_1.eca_superna_local:16000 1485293148048

2 backup masters

   db_node_2.eca_superna_local:16000 1485293150704

   db_node_3.eca_superna_local:16000 1485293150606

master coprocessors: []

3 live servers

   db_node_2.eca_superna_local:16020 1485293149046

       requestsPerSecond=0.0, numberOfOnlineRegions=1, usedHeapMB=20, maxHeapMB=955, numberOfStores=2, numberOfStorefiles=2, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=0, writeRequestsCount=0, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[]

       "user,,1485281658079.c6d7bca7da467b5a0d0ebee99c1811f9."

           numberOfStores=2, numberOfStorefiles=2, storefileUncompressedSizeMB=0, lastMajorCompactionTimestamp=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=0, writeRequestsCount=0, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, completeSequenceId=-1, dataLocality=0.0

   db_node_3.eca_superna_local:16020 1485293149077

       requestsPerSecond=0.0, numberOfOnlineRegions=2, usedHeapMB=26, maxHeapMB=955, numberOfStores=3, numberOfStorefiles=3, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=16, writeRequestsCount=4, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[MultiRowMutationEndpoint]

       "file,,1485281646872.c6968096ee56fd2214675f939f06e470."

           numberOfStores=2, numberOfStorefiles=2, storefileUncompressedSizeMB=0, lastMajorCompactionTimestamp=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=0, writeRequestsCount=0, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, completeSequenceId=-1, dataLocality=0.0

       "hbase:meta,,1"

           numberOfStores=1, numberOfStorefiles=1, storefileUncompressedSizeMB=0, lastMajorCompactionTimestamp=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=16, writeRequestsCount=4, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, completeSequenceId=-1, dataLocality=0.0

   db_node_1.eca_superna_local:16020 1485293149663

       requestsPerSecond=0.0, numberOfOnlineRegions=2, usedHeapMB=22, maxHeapMB=955, numberOfStores=3, numberOfStorefiles=3, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=4, writeRequestsCount=0, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[]

       "hbase:namespace,,1485281642137.d63ebb0b372206aa32b632867b4fcc56."

           numberOfStores=1, numberOfStorefiles=1, storefileUncompressedSizeMB=0, lastMajorCompactionTimestamp=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=4, writeRequestsCount=0, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, completeSequenceId=-1, dataLocality=0.0

       "signal,,1485281668216.a4bbcac83e4857b17789f374c69fee92."

           numberOfStores=2, numberOfStorefiles=2, storefileUncompressedSizeMB=0, lastMajorCompactionTimestamp=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=0, writeRequestsCount=0, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, completeSequenceId=-1, dataLocality=0.0

0 dead servers


hbase(main):003:0>




Bring Up the ECA Cluster:


ecactl cluster up

eca-node01:~ # ecactl cluster up

Starting services on all cluster nodes.


Starting service containers on node: 172.22.1.18

Creating network "eca_superna_local" with driver "bridge"

Creating db_node_1

Creating eca_rmq_1

executing: docker-compose -f /opt/superna/eca/docker-compose.yml up -d rmq db


Starting service containers on node: 172.22.1.19

Creating network "eca_superna_local" with driver "bridge"

Creating db_node_2

Creating eca_rmq_1

executing: docker-compose -f /opt/superna/eca/docker-compose.yml up -d rmq db


Starting service containers on node: 172.22.1.20

Creating network "eca_superna_local" with driver "bridge"

Creating db_node_3

Creating eca_rmq_1

executing: docker-compose -f /opt/superna/eca/docker-compose.yml up -d rmq db


Starting database service on node 172.22.1.18

Warning: Permanently added '[localhost]:2200' (ECDSA) to the list of known hosts.

db_node_1.eca_superna_local: Warning: Permanently added '[db_node_1.eca_superna_local]:2200,[172.18.0.2]:2200' (ECDSA) to the list of known hosts.

db_node_1.eca_superna_local: starting zookeeper, logging to /opt/hbase/bin/../logs/hbase-root-zookeeper-db_node_1.eca_superna_local.out

db_node_3.eca_superna_local: Warning: Permanently added '[db_node_3.eca_superna_local]:2200,[172.22.1.20]:2200' (ECDSA) to the list of known hosts.

db_node_3.eca_superna_local: starting zookeeper, logging to /opt/hbase/bin/../logs/hbase-root-zookeeper-db_node_3.eca_superna_local.out

db_node_2.eca_superna_local: Warning: Permanently added '[db_node_2.eca_superna_local]:2200,[172.22.1.19]:2200' (ECDSA) to the list of known hosts.

db_node_2.eca_superna_local: starting zookeeper, logging to /opt/hbase/bin/../logs/hbase-root-zookeeper-db_node_2.eca_superna_local.out

starting master, logging to /opt/hbase/bin/../logs/hbase-root-master-db_node_1.eca_superna_local.out

db_node_3.eca_superna_local: Warning: Permanently added '[db_node_3.eca_superna_local]:2200,[172.22.1.20]:2200' (ECDSA) to the list of known hosts.

db_node_3.eca_superna_local: starting regionserver, logging to /opt/hbase/bin/../logs/hbase-root-regionserver-db_node_3.eca_superna_local.out

db_node_2.eca_superna_local: Warning: Permanently added '[db_node_2.eca_superna_local]:2200,[172.22.1.19]:2200' (ECDSA) to the list of known hosts.

db_node_2.eca_superna_local: starting regionserver, logging to /opt/hbase/bin/../logs/hbase-root-regionserver-db_node_2.eca_superna_local.out

db_node_1.eca_superna_local: Warning: Permanently added '[db_node_1.eca_superna_local]:2200,[172.18.0.2]:2200' (ECDSA) to the list of known hosts.

db_node_1.eca_superna_local: starting regionserver, logging to /opt/hbase/bin/../logs/hbase-root-regionserver-db_node_1.eca_superna_local.out

db_node_3.eca_superna_local: Warning: Permanently added '[db_node_3.eca_superna_local]:2200,[172.22.1.20]:2200' (ECDSA) to the list of known hosts.

db_node_3.eca_superna_local: starting master, logging to /opt/hbase/bin/../logs/hbase-root-master-db_node_3.eca_superna_local.out

db_node_2.eca_superna_local: Warning: Permanently added '[db_node_2.eca_superna_local]:2200,[172.22.1.19]:2200' (ECDSA) to the list of known hosts.

db_node_2.eca_superna_local: starting master, logging to /opt/hbase/bin/../logs/hbase-root-master-db_node_2.eca_superna_local.out

Connection to 172.22.1.18 closed.


Checking for the existence of a schema in the db...


2017-01-24 21:35:06,091 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

file table already exists


Checking for the user table...

2017-01-24 21:35:18,118 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

user table already exists


Checking for the signal table...

2017-01-24 21:35:22,786 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

signal table already exists


Initializing application


Starting application on node 172.22.1.18

db_node_1 is up-to-date

eca_rmq_1 is up-to-date

Creating eca_iglssvc_1

Creating eca_ceefilter_1

Creating eca_cee_1

Creating eca_fastanalysis_1

executing: docker-compose -f /opt/superna/eca/docker-compose.yml up -d


Starting application on node 172.22.1.19

db_node_2 is up-to-date

eca_rmq_1 is up-to-date

Creating eca_fastanalysis_1

Creating eca_cee_1

Creating eca_ceefilter_1

Creating eca_iglssvc_1

executing: docker-compose -f /opt/superna/eca/docker-compose.yml up -d


Starting application on node 172.22.1.20

db_node_3 is up-to-date

eca_rmq_1 is up-to-date

Creating eca_ceefilter_1

Creating eca_cee_1

Creating eca_fastanalysis_1

Creating eca_iglssvc_1

executing: docker-compose -f /opt/superna/eca/docker-compose.yml up -d


eca-node01:~ #



Bring down the ECA Cluster:

ecactl cluster down


eca-node01:~ # ecactl cluster down

Stopping services on all cluster nodes.


Stopping service containers on node: 172.22.1.18

Stopping eca_fastanalysis_1 ... done

Stopping eca_iglssvc_1 ... done

Stopping eca_cee_1 ... done

Stopping eca_ceefilter_1 ... done

Stopping eca_rmq_1 ... done

Stopping db_node_1 ... done

Removing eca_fastanalysis_1 ... done

Removing eca_iglssvc_1 ... done

Removing eca_cee_1 ... done

Removing eca_ceefilter_1 ... done

Removing eca_rmq_1 ... done

Removing db_node_1 ... done

Removing network eca_superna_local

executing: docker-compose -f /opt/superna/eca/docker-compose.yml down



Stopping service containers on node: 172.22.1.19

Stopping eca_iglssvc_1 ... done

Stopping eca_ceefilter_1 ... done

Stopping eca_cee_1 ... done

Stopping eca_fastanalysis_1 ... done

Stopping eca_rmq_1 ... done

Stopping db_node_2 ... done

Removing eca_iglssvc_1 ... done

Removing eca_ceefilter_1 ... done

Removing eca_cee_1 ... done

Removing eca_fastanalysis_1 ... done

Removing eca_rmq_1 ... done

Removing db_node_2 ... done

Removing network eca_superna_local

executing: docker-compose -f /opt/superna/eca/docker-compose.yml down



Stopping service containers on node: 172.22.1.20

Stopping eca_iglssvc_1 ... done

Stopping eca_cee_1 ... done

Stopping eca_fastanalysis_1 ... done

Stopping eca_ceefilter_1 ... done

Stopping eca_rmq_1 ... done

Stopping db_node_3 ... done

Removing eca_iglssvc_1 ... done

Removing eca_cee_1 ... done

Removing eca_fastanalysis_1 ... done

Removing eca_ceefilter_1 ... done

Removing eca_rmq_1 ... done

Removing db_node_3 ... done

Removing network eca_superna_local

executing: docker-compose -f /opt/superna/eca/docker-compose.yml down


eca-node01:~ #




Checking events and logs

  • Verifying events are being stored in the DB:

eca-node01:~ # ecactl db shell

2017-01-24 21:39:58,157 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

HBase Shell; enter 'help<RETURN>' for list of supported commands.

Type "exit<RETURN>" to leave the HBase Shell

Version 1.2.4, r67592f3d062743907f8c5ae00dbbe1ae4f69e5af, Tue Oct 25 18:10:20 CDT 2016


hbase(main):001:0> count 'file'

145 row(s) in 0.4820 seconds


=> 145

hbase(main):002:0> count 'signal'

3 row(s) in 0.0260 seconds


=> 3

hbase(main):003:0> count 'user'

145 row(s) in 0.0950 seconds


=> 145

hbase(main):004:0>



  • Tail the log of ransomware signal candidates: ecactl logs --follow iglssvc

eca-node01:~ # ecactl logs --follow iglssvc

Attaching to eca_iglssvc_1

iglssvc_1       | 2017-01-24 21:35:30,068 IglsSvcAnalysisModule:63 INFO : Service base module starting up

iglssvc_1       | 2017-01-24 21:35:31,092 IglsSvcAnalysisModule:78 INFO : Service base module initialized




CEE Event Rate Monitoring for Sizing ECA Cluster Compute


To assist with managing the event rates processed by the ECA, OneFS provides statistics that help measure performance.


Run these commands to measure the audit event rate, and use the installation guide to choose a single or multi VMware host configuration based on that rate.


Statistics provided by OneFS for CEE Auditing


These statistics can be used with the examples below to gather stats from the cluster.

node.audit.cee.export.rate (key statistic for ECA sizing)

node.audit.cee.export.total (debug event to ECA message flow)

node.audit.events.logged.rate (not needed for ECA sizing but indicates the cluster logging rate)

node.audit.events.logged.total (not needed for ECA sizing but indicates the cluster logging rate)



CEE Statistics ISI command examples

  1. (OneFS 8) isi statistics query current --stats=node.audit.cee.export.rate --nodes=all  (get the event rate on all nodes)

  2. (OneFS 7) isi statistics query --stats=node.audit.cee.export.rate --nodes=all  (get the event rate on all nodes)

  3. (OneFS 8) isi statistics query current --stats=node.audit.cee.export.total --nodes=all (get the events total on all nodes)

  4. (OneFS 7) isi statistics query --stats=node.audit.cee.export.total --nodes=all (get the events total on all nodes)


Monitor Cluster Event forwarding Delay to ECA cluster


  1. isi audit progress global view (this command shows the oldest unsent protocol audit event for the cluster).  A large gap between the logged event time stamp and the sent event time stamp indicates the cluster forwarding rate is not keeping up with the rate of audit events.  An EMC SR should be opened if this condition persists.


Set the CEE Event forwarding start Date


This command can be used to set the forwarding start point if auditing has been enabled without a CEE endpoint configured.   Only use it if directed by Support.

  1. The following updates the pointer to forward only events newer than the specified date and time (for example, Nov 19, 2014 at 2 pm in the OneFS 7.x command below)

    1. OneFS 7.x: isi audit settings modify --cee-log-time "Protocol@2014-11-19 14:00:00"

    2. OneFS 8.X: isi audit settings global modify --cee-log-time "Protocol@2017-05-01 17:01:30"


How to Generate Support logs on ECA cluster Nodes


The support process does not change with Eyeglass and ECA clusters.  A one-time step must be completed on the Eyeglass appliance to support ssh-less login from Eyeglass to each ECA node so that support logs are collected as part of the normal Eyeglass backup archive creation process.  Each ECA node also has a backup script to create a local log archive.


Steps to create an ECA log archive locally on each node


Use this process if the Eyeglass full backup creation fails to collect logs from the ECA cluster.  This is a fallback log collection procedure and is not required under normal conditions.

  1. Each ECA node also has a script that collects logs and can be run independently.

  2. /opt/superna/eca/scripts/log_bundle.sh

  3. Running this script outputs the location of a log zip file that can be copied from the node with SCP.

  4. Logs are placed in the /tmp/eca_logs directory

  5. Done (an example session is sketched below)
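
A hedged example session (the archive file name and copy destination are hypothetical):

/opt/superna/eca/scripts/log_bundle.sh                  # prints the path of the generated archive
ls /tmp/eca_logs                                        # locate the zip file
scp /tmp/eca_logs/<archive>.zip user@workstation:/tmp/  # copy it off the node; destination is an example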

Eyeglass ECA Performance Tuning


The ECA cluster workload is mostly CPU intensive.

  1. If the average CPU utilization of the ECA cluster, as measured from vCenter, is 75% or greater, it is recommended to increase the CPU limit applied by default on the ECA cluster.



The default ECA OVA cluster reservation provides 12000 MHz shared across all the VMs.

Screen Shot 2017-05-03 at 6.56.06 PM.png

vCenter ECA OVA CPU Performance Monitoring

  1. To determine if the ECA MHz limit should be increased:

  2. Using vCenter select the OVA cluster Performance tab

  3. Screen Shot 2017-05-03 at 7.22.00 PM.png

  4. As shown above, the average MHz usage is 221, well below the 12000 MHz limit.  No change would be required until the average CPU usage reaches 9000 MHz or greater.   The screenshot shows spikes in CPU, but the average CPU is the statistic to use.

  5. To increase the limit follow procedure below.


vCenter OVA CPU limit Increase Procedure

  1. If it is determined that an increase to a new value is required, it is recommended to increase by 25% and monitor again.  Example: 12000 MHz * 25% = 3000 MHz additional, for a new limit of 15000 MHz.

  2. Select the Resource Allocation tab on the OVA

  3. Screen Shot 2017-05-03 at 7.27.50 PM.png

  4. Change the Limit value from 12000 to 15000 to increase by 25% and click ok.

  5. Screen Shot 2017-05-03 at 7.29.12 PM.png

  6. Click ok to apply the settings.



How to check total ESX host MHz Capacity


  1. Get the ESX host Summary tab CPU capacity

  2. Screen Shot 2017-05-03 at 7.36.29 PM.png

  3. 2.699 GHz * 1000 = 2699 MHz per core; 2699 MHz * 16 cores = 43,184 MHz of total capacity.  

  4. This example host is using 6136 MHz of the total 43,184 MHz capacity, so there is plenty of unused CPU capacity available on this host.





ECA CLI Command Guide


The following table outlines each CLI command and its purpose.



CLI Command: ecactl cluster <command>

Function:

  up - bring up the cluster across all nodes

  down - bring down the cluster across all nodes

  status - get the status of all processes and the connection to the HDFS database

  refresh - use this command on an ECA node when it was restarted and needs to rejoin an existing cluster


CLI Command: ecactl components upgrade self (run on each node)

Function: upgrades the cluster scripts on each node


CLI Command: ecactl components upgrade eca (run on the master node)

Function: upgrades the containers on the master node; backup nodes detect the available upgrade on startup and auto update


CLI Command: ecactl logs --follow iglssvc (other services are rmq, fastanalysis, cee, ceefilter)

Function: tails the Eyeglass agent log on a node, used for debugging





Eyeglass CLI command for Ransomware



Use this command to enable a cluster-wide shutdown of all SMB IO if the root Isilon user is detected with ransomware behaviour.  NOTE: The root user should never be used to access data, since it can access all shares regardless of the permissions set on the share.  This means no lockout is possible for the root user using deny permissions.

Root user lockout behaviour

igls admin lockroot --lock_root

igls admin lockroot --lock_root true  (when set, a root user SID detected for ransomware will disable cluster SMB)

igls admin lockroot --lock_root false  (to disable and take no action if a root user security event is detected)



Security Guard CLI


The Security Guard run interval can be changed in the UI or with the following CLI commands.


igls admin sched  (list schedules and get the ID)

igls admin sched set --id SecurityGuard --interval 15M (run the Security Guard job every 15 minutes)



Analytics Database Export Tables to CSV


This option allows extraction of rows from the analytics database for support purposes, or any other requirement to view rows in the analytics tables.  This CLI command executes a query against the analytics database and converts the result to a CSV file.


igls rswsignals dumpSignalTable --since='2017-05-01-08:01' --until='2017-05-02-09:30' --csv=signaltable.csv --eca=ECA2

(note: --csv and --eca are optional; --csv names the CSV file and --eca selects which ECA installation to query if multiple exist.  Use the ECA name set during configuration of the ECA cluster.)


igls rswsignals supports 2 different tables to select from:


dumpUserTable -- stores user information based on SID

dumpSignalTable -- stores detected threat records by user SID and which detector was triggered



The output will be stored in the following path:  /srv/www/htdocs/archive/


Sample output (a parsing example follows the sample). NOTE: fields are delimited with #


'Mon May 01 08:04:05 EDT 2017'#'matched a testing signature for Superna Eyeglass Security Guard'#'S-1-5-21-2947135865-3844123249-188779117-1133'#'[THREAT_DETECTOR_06]'

'Mon May 01 08:04:05 EDT 2017'#'matched a testing signature for Superna Eyeglass Security Guard'#'S-1-5-21-2947135865-3844123249-188779117-1133'#'[THREAT_DETECTOR_06]'

'Mon May 01 08:03:59 EDT 2017'#'matched a testing signature for Superna Eyeglass Security Guard'#'S-1-5-21-2947135865-3844123249-188779117-1133'#'[THREAT_DETECTOR_06]'




ECA Cluster Disaster Recovery and Failover Supported Configurations


This section covers ECA cluster availability and failover.


Scenario #1 - Production Writeable cluster fails over to DR site

In this scenario, refer to the diagram below with an ECA cluster monitoring clusters at Site 1.  The clusters are replicating in a Hot Cold configuration with Eyeglass.


Requirements

  1. Single Agent Ransomware license key floats to DR cluster


Overview of Failover

  1. Before failover, CEE audit data is sent from the Hot site.  

  2. The Cold site is configured but does not send any events, since there is no writeable data on this cluster and no agent license key.

  3. After failover to the Cold site, the Cold cluster will send CEE audit events to the ECA cluster at Site 1.  The Site 1 ECA cluster will process the events and, if any security event is detected, it will be sent to the Eyeglass appliance at the Cold site.

  4. Eyeglass will fail over the ECA cluster agent license to the writeable cluster automatically by detecting the Cold cluster SyncIQ policies as enabled and writeable.  This can be confirmed from the Ransomware Defender icon Statistics tab: check the Licensed cluster section to verify the writeable cluster is listed correctly.

  5. Screen Shot 2017-04-21 at 3.02.00 PM.png







Scenario #2 - Production Writeable cluster fails over to DR site and Site 1 ECA cluster impacted


In this scenario the ECA cluster site is impacted.

  1. To recover from this scenario it is necessary to deploy a new ECA cluster at the Cold site and treat this as a new install to get it configured.

  2. This would be needed in the event that the ECA cluster at the Hot site is down for a long period of time.



Data Recovery Manager Integration with Ransomware Defender


Overview

This feature integrates the Data Recovery Manager feature that is part of the Cluster Storage Monitor add-on licensed product.   It allows an end user to recover files that were compromised by a security event by triggering a data recovery job that is customized to the user's shares that stored the compromised files.


How to launch Data Recovery Manager Request From Ransomware defender action menu


Prerequisites


  1. Cluster Storage Monitor license key

  2. For detailed configuration and setup of Data Recovery Management portal and integration requirements review the Cluster Storage Monitor admin guide.

Procedures

  1. Initiate File Recovery: From the actions menu of a security event select Self Recovery

    1. Screen Shot 2017-04-22 at 12.49.10 PM.png

  2. Complete Version Selection of share(s): When the Data Recovery Management window appears, it lists versions of the shares detected for this user.  The versions are based on snapshots and DR copies of the listed share on the local or remote cluster.  NOTE: You can select one or more shares to add to the request.

    1. Screen Shot 2017-04-22 at 10.02.20 AM.png

  3. After selecting the versions using the checkbox for each SmartConnect name (NOTE: each SmartConnect name and its list of shares is a separate request; only select the shares that require data recovery), click the Request Access button.

    1. Screen Shot 2017-04-22 at 10.01.48 AM.png

    2. Enter the user's AD login using the syntax domain\userid (reference the security event UserID in the Ransomware Defender Active Events tab).  

      1. Enter UPN AD login credentials

      2. Enter the user's email address

      3. Add a comment to be sent to the user’s email and Click Request.

  4. Monitor and Approve Pending Data Recovery Requests: The request has been submitted to the Data Recovery Management Pending Requests tab to be approved.  (NOTE: Role-based access allows a separate admin user to review and approve data recovery requests; consult the Cluster Storage Monitor documentation.)

    1. Screen Shot 2017-04-22 at 10.02.34 AM.png

    2. Approving Data Recovery: Click the Approve icon to have the request processed and generate a temporary share secured to the user affected by the security event.   This will create a temporary share on the selected version of the share(s) with a time to live of 2 days (default setting), and email the user the UNC path to access the recovery share(s).

    3. Screen Shot 2017-04-22 at 10.09.07 AM.png

  5. Temporary Share: The share created has a syntax of share name - UserID@domain name-#  (where # is the number of the share created for this request).

    1. Screen Shot 2017-04-22 at 10.06.34 AM.png

    2. You can see in the screenshot above the share is created on a snapshot path and secured to the user in the request.

  6. User Access: User can access the read-only version of data on the temporary share to retrieve files that were compromised by the security event.

    1. Screen Shot 2017-04-22 at 10.14.29 AM.png

  7. NOTE:  You can wait for expiry of the recovery to auto delete the shares, or open the Data Recovery Management icon and select the Requests History tab.

  8. Click the Alarm Clock icon Screen Shot 2017-04-22 at 7.33.38 PM.png to complete the recovery before the expiry.  This action will delete the temporary shares created for this user recovery.

  9. Screen Shot 2017-04-22 at 7.31.45 PM.png

  10. Recovery Process Completed.


Troubleshooting ECA Configuration Issues

This section covers how to troubleshoot cluster or detection issues for testing.


Issue - no CEE Events are processed

  1. On each node in the cluster, check that CEE events are arriving at the cluster node

  2. ssh to each node and run the command ‘ecactl logs --follow cee’

  3. cee_1           | Apr 24 16:37:32 2017 [EMC CEE]: CTransport+::DispatchEvent(): Got a CEPP_HEARTBEAT request

  4. cee_1           | Apr 24 16:37:34 2017 [EMC CEE]: CTransport+::DispatchEvent(): Got a CEPP_HEARTBEAT request

  5. cee_1           | Apr 24 16:37:35 2017 [EMC CEE]: CTransport+::DispatchEvent(): Got a CEPP_HEARTBEAT request

  6. cee_1           | Apr 24 16:37:42 2017 [EMC CEE]: CTransport+::DispatchEvent(): Got a CEPP_HEARTBEAT request

  7. cee_1           | Apr 24 16:37:44 2017 [EMC CEE]: CTransport+::DispatchEvent(): Got a CEPP_HEARTBEAT request

  8. cee_1           | Apr 24 16:37:45 2017 [EMC CEE]: CTransport+::DispatchEvent(): Got a CEPP_HEARTBEAT request

  9. cee_1           | Apr 24 16:37:52 2017 [EMC CEE]: CTransport+::DispatchEvent(): Got a CEPP_HEARTBEAT request



Issue - Cluster startup cannot find the Analytics database on Isilon

  1. On cluster startup, the output shown in the image below indicates whether the analytics database was found

  2. You can also run the command

    1. ecactl cluster status

  3. Screen Shot 2017-02-22 at 1.54.37 PM.png


Issue - Events being dropped or not committed to Analytics database

Use this procedure to check the RabbitMQ queues if events are not being processed fast enough.  A queue depth that keeps growing indicates slow commits to the analytics database.  Monitor with these commands over a period of time.


  1. On an ECA node, log in to the RabbitMQ container

    1. ecactl containers exec rmq /bin/bash

    2. rabbitmqctl -p /eyeglass list_queues

    3. rabbitmqctl -p /eyeglass list_bindings

    4. rabbitmqctl -p /eyeglass list_queues  name messages messages_ready messages_unacknowledged

Run the first two commands the first time you log in, and run the last command periodically.

This will show you the depth of the queues for the different exchanges.



Issue - Monitoring Isilon Audit Event Rate Performance


Use this procedure to check each node in the cluster for an uncommitted audit event backlog that has not been sent to CEE endpoints.


  1. isi_for_array "isi_audit_progress -t protocol CEE_FWD"

    1. Output example


tme-sandbox-4# isi_for_array "isi_audit_progress -t protocol CEE_FWD"

tme-sandbox-4: Last consumed event time: '2016-10-21 19:59:54'

tme-sandbox-4: Last logged event time:   '2017-01-17 17:05:32'

tme-sandbox-6: Last consumed event time: '2016-10-21 19:59:54'

tme-sandbox-6: Last logged event time:   '2017-01-12 20:25:10'

tme-sandbox-5: Last consumed event time: '2016-07-22 18:26:25'

tme-sandbox-5: Last logged event time:   '2017-01-12 21:50:17'


If the last logged and last consumed times match, there are no performance issues on the node.

If the last consumed time stamp is earlier than the last logged time stamp, calculate the difference in minutes to determine the “Lag” between when an event is logged and when it is sent to CEE endpoints (a calculation sketch follows below).

NOTE: A large lag indicates performance issues and also means detection of an event will be delayed by this lag time.
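
A minimal lag-calculation sketch, using the tme-sandbox-4 time stamps from the sample output above (assumes GNU date on a Linux workstation):

logged=$(date -d '2017-01-17 17:05:32' +%s)      # last logged event time
consumed=$(date -d '2016-10-21 19:59:54' +%s)    # last consumed (sent) event time
echo "$(( (logged - consumed) / 60 )) minutes of lag"   # roughly 126,000 minutes, about 88 days of lag in this sample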