Eyeglass Clustered Agent Admin Guide


Administration Guide



Abstract:

This guide covers configuration, setup and monitoring of ECA clusters.

August, 2017



Contents

  1. 1 Administration Guide
    1. 1.1 Abstract:
  2. 2 What's New
  3. 3 Chapter 1 - Introduction to this Guide
    1. 3.1 Overview
    2. 3.2 Eyeglass Clustered Agent Deployment Options
    3. 3.3 Supported Topologies
      1. 3.3.1 #1 Eyeglass topology --HOT COLD
      2. 3.3.2 #1A Eyeglass topology --HOT COLD - 3rd party auditing compatibility - External CEE Server
      3. 3.3.3 #2 Eyeglass topology --HOT COLD
      4. 3.3.4 #3 Eyeglass topology --HOT COLD
      5. 3.3.5 #4 Eyeglass topology --HOT HOT clusters
  4. 4 Ransomware Defender or Easy Auditor Eyeglass Clustered Agent (ECA) Architecture
  5. 5 High Availability and Resilience
    1. 5.1 Cluster Operational Requirements
    2. 5.2 Architectural Data flow of CEE events through the Eyeglass Clustered Agent
  6. 6 Message flow Inside ECA Node
  7. 7 Eyeglass User Lockout Active Directory Planning
    1. 7.1 Scenario #1:
    2. 7.2 Scenario #2:
  8. 8 Role Based Access
  9. 9 Remote Service Authentication and Protocol
    1. 9.1 Service Registration Monitoring in Eyeglass
      1. 9.1.1 Service States
      2. 9.1.2 Health States
  10. 10 ECA Cluster Operational Procedures
    1. 10.1 Eyeglass Cluster Maintenance Operations
      1. 10.1.1 Cluster OS shutdown or restart
      2. 10.1.2 Cluster Startup
      3. 10.1.3 Cluster IP Change
      4. 10.1.4 Single ECA Node Restart or Host crash Affect 1 or more ECA nodes
    2. 10.2 Eyeglass ECA Cluster Operations
      1. 10.2.1 Stop ECA Database
      2. 10.2.2 ecactl db stop
      3. 10.2.3 Start ECA Database
      4. 10.2.4 ecactl db start
    3. 10.3 Checking ECA database Status:
      1. 10.3.1 ecactl db shell
      2. 10.3.2 Bring Up the ECA Cluster:
      3. 10.3.3 ecactl cluster up
      4. 10.3.4 Bring down the ECA Cluster:
      5. 10.3.5 ecactl cluster down
      6. 10.3.6 Checking events and logs
      7. 10.3.7 Verifying events are being stored in the DB:
      8. 10.3.8 Tail the log of ransomware signal candidates: ecactl logs --follow iglssvc
    4. 10.4 CEE Event Rate Monitoring for Sizing ECA Cluster Compute
      1. 10.4.1 Statistics provided by OneFS for CEE Auditing
      2. 10.4.2 CEE Statistics ISI command examples
      3. 10.4.3 Monitor Cluster Event forwarding Delay to ECA cluster
      4. 10.4.4 Set the CEE Event forwarding start Date
    5. 10.5 How to Generate Support logs on ECA cluster Nodes
      1. 10.5.1 Steps to create an ECA log archive locally on each node
    6. 10.6 Eyeglass ECA Performance Tuning
      1. 10.6.1 vCenter ECA OVA CPU Performance Monitoring
      2. 10.6.2 vCenter OVA CPU limit Increase Procedure
      3. 10.6.3 How to check total ESX host MHZ Capacity
  11. 11 ECA CLI Command Guide
  12. 12 Eyeglass CLI command for ECA
    1. 12.1 Root user lockout behaviour
      1. 12.1.1 Security Guard CLI
      2. 12.1.2 Analytics Database Export Tables to CSV
  13. 13 ECA Cluster Disaster Recovery and Failover Supported Configurations
    1. 13.1 Scenario #1 - Production Writeable cluster fails over to DR site
    2. 13.2 Scenario #2 - Production Writeable cluster fails over to DR site and Site 1 ECA cluster impacted
  14. 14 Troubleshooting ECA Configuration Issues
    1. 14.1 Issue - no CEE Events are processed
      1. 14.1.1 Issue - Cluster startup can not find Analytics database on Isilon
      2. 14.1.2 Issue - Events being dropped or not committed to Analytics database
      3. 14.1.3 Issue - Monitoring Isilon Audit Event Rate Performance



What's New



Chapter 1 - Introduction to this Guide

Overview













Eyeglass Clustered Agent Deployment Options


It's best practice to place ECA clusters near Isilon clusters to reduce latency between the cluster and the ECA instance that is processing the audit log.  


The ECA consists of three VMs that all receive CEE audit log events from the cluster as three separate endpoints. The supported configurations are outlined below.

Supported Topologies

Blue lines = Service broker communication heartbeat (port 23457)

Orange lines = Isilon REST API over TLS (port 8080) and SSH

Green lines = CEE messages from the cluster (port 12228)

Red lines = SyncIQ replication

#1 Eyeglass topology --HOT COLD




#1A Eyeglass topology --HOT COLD - 3rd party auditing compatibility - External CEE Server


See the section in the Install Guide on configuring an external CEE server to work with the ECA cluster.


#2 Eyeglass topology --HOT COLD



#3 Eyeglass topology --HOT COLD

#4 Eyeglass topology --HOT HOT clusters




Ransomware Defender or Easy Auditor Eyeglass Clustered Agent (ECA) Architecture

The flow of information through the ECA is shown in the picture below:




High Availability and Resilience


The ECA cluster is an active-active design that offers Matrix Processing of events. This design uses dedicated Docker containers that each perform a specific function on every node. The solution allows for multiple container failures within a node and between nodes.

The solution allows distribution of event processing at the functional container level on any of the nodes in the cluster.  This provides more than single-point-of-failure protection within a node and between nodes, ensuring processing continues under most common failure conditions with greater than 2x HA redundancy.



Cluster Operational Requirements


The platform is a robust, high-performance event processing cluster for threat and audit detection. The cluster will remain operational as long as 2 of the 3 nodes are running and can reach the HDFS cluster database.

Architectural Data flow of CEE events through the Eyeglass Clustered Agent

How the ECA processes incoming events should be understood before debugging:


  1. The ECA cluster is an active-active-active solution, which means all nodes process and analyze CEE data from the cluster.

  2. The cluster load balances CEE messages to each node in the cluster.

  3. Each node has a CEE listener container, fastanalysis container, Eyeglass service container, CEE filter container and DB container.

  4. Each user in AD is hashed and assigned to one node in the cluster so that a single user's behaviour patterns are processed by a single node in the cluster.

  5. If a node goes down, another node takes over the Active Directory user processing for the failed node.  (A quick way to confirm the per-node containers are running is shown after this list.)


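A quick way to confirm the per-node containers described above are running is to list them directly on a node; a minimal sketch (the container names assume the defaults shown in the cluster up output later in this guide, and sudo may not be required depending on how the admin user is configured):

# On any ECA node: list the running ECA containers and their status
sudo docker ps --format '{{.Names}}\t{{.Status}}'

# Or use the ECA CLI to check all nodes and the database connection at once
ecactl cluster status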





Message flow Inside ECA Node





Eyeglass User Lockout Active Directory Planning

The lockout process identifies all shares the user has access permissions on by searching all shares in all access zones on all clusters managed by Eyeglass.  A real-time deny permission is then added to each share in this list for the affected user.

A special case is handled for the “Everyone” well-known group; how it operates in multi-domain Active Directory configurations should be understood.

Two scenarios can exist with AD domains on Isilon clusters.  


Scenario #1:

  • The first is parent and child AD domains that are members of the same forest and a trust relationship exists.

Scenario #2:

  • The second scenario covers two domains that are not members of the same forest, where no trust relationship exists between the domains.


If the “Everyone” well-known group is applied to a share, the handling in each scenario is shown below: a lockout permission is applied regardless of which domain the user is located in.  This is required since Eyeglass has no way to know whether the domains trust each other.  This solution ensures all shares using “Everyone” are locked out, which is more secure than skipping some shares.

Reference the diagram below.








Role Based Access

  1. Create a dedicated role to perform administration and monitoring of Ransomware Defender or Easy Auditor events

  2. Open the User Roles icon

  3. Assign the Ransomware Defender or Easy Auditor role to a user or group



Remote Service Authentication and Protocol

Eyeglass can communicate with multiple Ransomware Defender or Easy Auditor endpoints. Each endpoint must have a unique API token, generated in the Superna Eyeglass REST API window:


Once a token has been generated for a specific ECA, it can be used in that ECA’s startup command for authentication, along with the location of Eyeglass.


Communication with the ECA is bidirectional (ECA -> Eyeglass for security events).  Eyeglass will query the analytics database and test database access at regular intervals.


The ECA should:

  1. Heartbeat

  2. Notify Eyeglass of any detected threats

  3. Periodically send statistics on processed events.

  4. Periodically poll for updated Ransomware definitions, thresholds, and Ignore list settings.




Service Registration Monitoring in Eyeglass


The Eyeglass “Manage Services” icon displays all registered ECAs and CA UIM probes operating remotely from the Eyeglass appliance.  The screenshot below shows 3 ECA nodes registered and the health of each process running inside each node.



Service States

  1. Active:  Has checked in with heartbeat

  2. In-Active: Has failed to heartbeat, no longer processing

Health States

  1. Up - running and up time in days

  2. Down - not running


The Delete icon per service registration should not be used unless directed by support. This will remove the registration from the remote service.


ECA Cluster Operational Procedures


Eyeglass Cluster Maintenance Operations


Note:  Restarting the OS will not automatically start the cluster after boot.  Follow the steps in this section for the cluster OS shutdown, restart and boot process.

Cluster OS shutdown or restart


To correctly shut down the cluster (a consolidated sketch of this sequence follows the list):

  1. Login as admin via ssh on the master node (Node 1)

  2. ecactl cluster down (wait until all nodes are down)

  3. Now shut down the OS on each node from an ssh login to each node

  4. ssh to each node

  5. Type sudo -s (enter the admin password)

  6. Type shutdown

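A minimal consolidated sketch of the shutdown sequence above (run the OS shutdown on each node in turn):

# On the master node (Node 1), logged in as admin
ecactl cluster down      # wait until all nodes report down

# Then on each ECA node, logged in as admin
sudo -s                  # enter the admin password when prompted
shutdown                 # shuts down the OS on this node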

Cluster Startup

  1. ssh to the master node (Node 1)

  2. Login as the admin user

  3. ecactl cluster up

  4. Verify the boot messages show that the user table and signal table exist (this step verifies the connection to the analytics database over HDFS on startup)

  5. Verify the cluster is up

  6. ecactl cluster status (verify containers and tables exist in the output; see the sketch after this list)

  7. Done.

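A minimal sketch of the startup and verification commands above:

# On the master node (Node 1), logged in as admin
ecactl cluster up        # watch the boot messages for the user and signal table checks
ecactl cluster status    # verify the containers are running and the tables exist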

Cluster IP Change


To correctly change the cluster node IP addresses (a verification sketch follows this procedure):

  1. Login as admin via ssh on the master node (Node 1)

  2. ecactl cluster down (wait until completely down)

  3. Sudo to root and change the IP address with yast:

    1. sudo -s (enter the admin password)

    2. Type yast

    3. Navigate to networking to change the IP address on the interface (the yast screens cover the IP, DNS and router settings)

    4. Save and exit yast

    5. Repeat on all nodes in the cluster

  4. Once the changes are completed, verify network connectivity with ping and DNS with nslookup

  5. Edit the node addresses with ‘nano /opt/superna/eca/eca-env-common.conf’ on the master node (Node 1):

    1. Edit the IP addresses of each node to match the new settings

    2. export ECA_LOCATION_NODE_1=x.x.x.x

    3. export ECA_LOCATION_NODE_2=x.x.x.x2

    4. export ECA_LOCATION_NODE_3=x.x.x.x3

    5. Control-X to exit and save

  6. Start the cluster up:

    1. From the master node (Node 1)

    2. ecactl cluster up (verify the boot messages look as expected)

  7. Eyeglass /etc/hosts file validation:

    1. Once the ECA cluster is up

    2. Login to Eyeglass as admin via ssh

    3. Type cat /etc/hosts

    4. Verify the new IP addresses assigned to the ECA cluster are present in the hosts file.

    5. If they are not correct, edit the hosts file and correct the IP addresses for each node.

  8. Login to Eyeglass and open the Manage Services window.  Verify active ECA nodes are detected as Active and Green.

  9. Done

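A minimal sketch of the post-change verification, using placeholder values (x.x.x.x stands for a new node IP and eyeglass.example.com is a hypothetical DNS name; substitute your own):

# On each ECA node: confirm connectivity and DNS after the change
ping -c 3 x.x.x.x                # another ECA node's new IP (placeholder)
nslookup eyeglass.example.com    # hypothetical name; confirms DNS resolution works

# On the Eyeglass appliance, logged in as admin: confirm the hosts file lists the new ECA IPs
cat /etc/hosts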

Single ECA Node Restart or Host crash Affect 1 or more ECA nodes


Use this procedure when restarting a single ECA node, which under normal conditions should not be done unless directed by support.  The other use case is when a host running an ECA VM is restarted for maintenance and a node leaves the cluster and needs to rejoin.



  1. On the master node

  2. Login via ssh as admin

  3. Type the command:  ecactl cluster refresh  (this command re-integrates the node back into the cluster and checks access to the database tables on all nodes)

  4. Verify the output

  5. Now type: ecactl db shell

  6. At the hbase prompt type: status

  7. Verify no dead servers are listed

  8. If there are no dead servers, login to the Eyeglass GUI, check Manage Services and verify all nodes are green.

  9. Cluster node re-integration procedure completed.  (A minimal sketch of these commands follows.)

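A minimal sketch of the rejoin check described above:

# On the master node, logged in as admin
ecactl cluster refresh   # re-integrates the restarted node and checks table access on all nodes
ecactl db shell          # opens the hbase shell
# at the hbase prompt, type:
#   status
# and confirm the output reports 0 dead servers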
Eyeglass ECA Cluster Operations


Stop ECA Database


  • ecactl db stop

eca-node01:~ # ecactl db stop

Warning: Permanently added '[localhost]:2200' (ECDSA) to the list of known hosts.

stopping hbase...................

db_node_1.eca_superna_local: Warning: Permanently added '[db_node_1.eca_superna_local]:2200,[172.18.0.2]:2200' (ECDSA) to the list of known hosts.

db_node_1.eca_superna_local: stopping zookeeper.

db_node_2.eca_superna_local: Warning: Permanently added '[db_node_2.eca_superna_local]:2200,[172.22.1.19]:2200' (ECDSA) to the list of known hosts.

db_node_2.eca_superna_local: stopping zookeeper.

db_node_3.eca_superna_local: Warning: Permanently added '[db_node_3.eca_superna_local]:2200,[172.22.1.20]:2200' (ECDSA) to the list of known hosts.

db_node_3.eca_superna_local: stopping zookeeper.

eca-node01:~ #





Start ECA Database

  • ecactl db start

eca-node01:~ # ecactl db start

Warning: Permanently added '[localhost]:2200' (ECDSA) to the list of known hosts.

db_node_3.eca_superna_local: Warning: Permanently added '[db_node_3.eca_superna_local]:2200,[172.22.1.20]:2200' (ECDSA) to the list of known hosts.

db_node_3.eca_superna_local: starting zookeeper, logging to /opt/hbase/bin/../logs/hbase-root-zookeeper-db_node_3.eca_superna_local.out

db_node_2.eca_superna_local: Warning: Permanently added '[db_node_2.eca_superna_local]:2200,[172.22.1.19]:2200' (ECDSA) to the list of known hosts.

db_node_2.eca_superna_local: starting zookeeper, logging to /opt/hbase/bin/../logs/hbase-root-zookeeper-db_node_2.eca_superna_local.out

db_node_1.eca_superna_local: Warning: Permanently added '[db_node_1.eca_superna_local]:2200,[172.18.0.2]:2200' (ECDSA) to the list of known hosts.

db_node_1.eca_superna_local: starting zookeeper, logging to /opt/hbase/bin/../logs/hbase-root-zookeeper-db_node_1.eca_superna_local.out

starting master, logging to /opt/hbase/bin/../logs/hbase-root-master-db_node_1.eca_superna_local.out

db_node_3.eca_superna_local: Warning: Permanently added '[db_node_3.eca_superna_local]:2200,[172.22.1.20]:2200' (ECDSA) to the list of known hosts.

db_node_3.eca_superna_local: starting regionserver, logging to /opt/hbase/bin/../logs/hbase-root-regionserver-db_node_3.eca_superna_local.out

db_node_2.eca_superna_local: Warning: Permanently added '[db_node_2.eca_superna_local]:2200,[172.22.1.19]:2200' (ECDSA) to the list of known hosts.

db_node_2.eca_superna_local: starting regionserver, logging to /opt/hbase/bin/../logs/hbase-root-regionserver-db_node_2.eca_superna_local.out

db_node_1.eca_superna_local: Warning: Permanently added '[db_node_1.eca_superna_local]:2200,[172.18.0.2]:2200' (ECDSA) to the list of known hosts.

db_node_1.eca_superna_local: starting regionserver, logging to /opt/hbase/bin/../logs/hbase-root-regionserver-db_node_1.eca_superna_local.out

db_node_3.eca_superna_local: Warning: Permanently added '[db_node_3.eca_superna_local]:2200,[172.22.1.20]:2200' (ECDSA) to the list of known hosts.

db_node_3.eca_superna_local: starting master, logging to /opt/hbase/bin/../logs/hbase-root-master-db_node_3.eca_superna_local.out

db_node_2.eca_superna_local: Warning: Permanently added '[db_node_2.eca_superna_local]:2200,[172.22.1.19]:2200' (ECDSA) to the list of known hosts.

db_node_2.eca_superna_local: starting master, logging to /opt/hbase/bin/../logs/hbase-root-master-db_node_2.eca_superna_local.out

eca-node01:~ #



Checking ECA database Status:


  • ecactl db shell

2017-01-24 21:29:39,191 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

HBase Shell; enter 'help<RETURN>' for list of supported commands.

Type "exit<RETURN>" to leave the HBase Shell

Version 1.2.4, r67592f3d062743907f8c5ae00dbbe1ae4f69e5af, Tue Oct 25 18:10:20 CDT 2016


hbase(main):001:0> status

1 active master, 2 backup masters, 3 servers, 0 dead, 1.6667 average load


hbase(main):002:0>



hbase(main):002:0> status 'detailed'

version 1.2.4

0 regionsInTransition

active master:  db_node_1.eca_superna_local:16000 1485293148048

2 backup masters

   db_node_2.eca_superna_local:16000 1485293150704

   db_node_3.eca_superna_local:16000 1485293150606

master coprocessors: []

3 live servers

   db_node_2.eca_superna_local:16020 1485293149046

       requestsPerSecond=0.0, numberOfOnlineRegions=1, usedHeapMB=20, maxHeapMB=955, numberOfStores=2, numberOfStorefiles=2, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=0, writeRequestsCount=0, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[]

       "user,,1485281658079.c6d7bca7da467b5a0d0ebee99c1811f9."

           numberOfStores=2, numberOfStorefiles=2, storefileUncompressedSizeMB=0, lastMajorCompactionTimestamp=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=0, writeRequestsCount=0, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, completeSequenceId=-1, dataLocality=0.0

   db_node_3.eca_superna_local:16020 1485293149077

       requestsPerSecond=0.0, numberOfOnlineRegions=2, usedHeapMB=26, maxHeapMB=955, numberOfStores=3, numberOfStorefiles=3, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=16, writeRequestsCount=4, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[MultiRowMutationEndpoint]

       "file,,1485281646872.c6968096ee56fd2214675f939f06e470."

           numberOfStores=2, numberOfStorefiles=2, storefileUncompressedSizeMB=0, lastMajorCompactionTimestamp=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=0, writeRequestsCount=0, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, completeSequenceId=-1, dataLocality=0.0

       "hbase:meta,,1"

           numberOfStores=1, numberOfStorefiles=1, storefileUncompressedSizeMB=0, lastMajorCompactionTimestamp=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=16, writeRequestsCount=4, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, completeSequenceId=-1, dataLocality=0.0

   db_node_1.eca_superna_local:16020 1485293149663

       requestsPerSecond=0.0, numberOfOnlineRegions=2, usedHeapMB=22, maxHeapMB=955, numberOfStores=3, numberOfStorefiles=3, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=4, writeRequestsCount=0, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[]

       "hbase:namespace,,1485281642137.d63ebb0b372206aa32b632867b4fcc56."

           numberOfStores=1, numberOfStorefiles=1, storefileUncompressedSizeMB=0, lastMajorCompactionTimestamp=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=4, writeRequestsCount=0, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, completeSequenceId=-1, dataLocality=0.0

       "signal,,1485281668216.a4bbcac83e4857b17789f374c69fee92."

           numberOfStores=2, numberOfStorefiles=2, storefileUncompressedSizeMB=0, lastMajorCompactionTimestamp=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=0, writeRequestsCount=0, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, completeSequenceId=-1, dataLocality=0.0

0 dead servers


hbase(main):003:0>




Bring Up the ECA Cluster:


  • ecactl cluster up

eca-node01:~ # ecactl cluster up

Starting services on all cluster nodes.


Starting service containers on node: 172.22.1.18

Creating network "eca_superna_local" with driver "bridge"

Creating db_node_1

Creating eca_rmq_1

executing: docker-compose -f /opt/superna/eca/docker-compose.yml up -d rmq db


Starting service containers on node: 172.22.1.19

Creating network "eca_superna_local" with driver "bridge"

Creating db_node_2

Creating eca_rmq_1

executing: docker-compose -f /opt/superna/eca/docker-compose.yml up -d rmq db


Starting service containers on node: 172.22.1.20

Creating network "eca_superna_local" with driver "bridge"

Creating db_node_3

Creating eca_rmq_1

executing: docker-compose -f /opt/superna/eca/docker-compose.yml up -d rmq db


Starting database service on node 172.22.1.18

Warning: Permanently added '[localhost]:2200' (ECDSA) to the list of known hosts.

db_node_1.eca_superna_local: Warning: Permanently added '[db_node_1.eca_superna_local]:2200,[172.18.0.2]:2200' (ECDSA) to the list of known hosts.

db_node_1.eca_superna_local: starting zookeeper, logging to /opt/hbase/bin/../logs/hbase-root-zookeeper-db_node_1.eca_superna_local.out

db_node_3.eca_superna_local: Warning: Permanently added '[db_node_3.eca_superna_local]:2200,[172.22.1.20]:2200' (ECDSA) to the list of known hosts.

db_node_3.eca_superna_local: starting zookeeper, logging to /opt/hbase/bin/../logs/hbase-root-zookeeper-db_node_3.eca_superna_local.out

db_node_2.eca_superna_local: Warning: Permanently added '[db_node_2.eca_superna_local]:2200,[172.22.1.19]:2200' (ECDSA) to the list of known hosts.

db_node_2.eca_superna_local: starting zookeeper, logging to /opt/hbase/bin/../logs/hbase-root-zookeeper-db_node_2.eca_superna_local.out

starting master, logging to /opt/hbase/bin/../logs/hbase-root-master-db_node_1.eca_superna_local.out

db_node_3.eca_superna_local: Warning: Permanently added '[db_node_3.eca_superna_local]:2200,[172.22.1.20]:2200' (ECDSA) to the list of known hosts.

db_node_3.eca_superna_local: starting regionserver, logging to /opt/hbase/bin/../logs/hbase-root-regionserver-db_node_3.eca_superna_local.out

db_node_2.eca_superna_local: Warning: Permanently added '[db_node_2.eca_superna_local]:2200,[172.22.1.19]:2200' (ECDSA) to the list of known hosts.

db_node_2.eca_superna_local: starting regionserver, logging to /opt/hbase/bin/../logs/hbase-root-regionserver-db_node_2.eca_superna_local.out

db_node_1.eca_superna_local: Warning: Permanently added '[db_node_1.eca_superna_local]:2200,[172.18.0.2]:2200' (ECDSA) to the list of known hosts.

db_node_1.eca_superna_local: starting regionserver, logging to /opt/hbase/bin/../logs/hbase-root-regionserver-db_node_1.eca_superna_local.out

db_node_3.eca_superna_local: Warning: Permanently added '[db_node_3.eca_superna_local]:2200,[172.22.1.20]:2200' (ECDSA) to the list of known hosts.

db_node_3.eca_superna_local: starting master, logging to /opt/hbase/bin/../logs/hbase-root-master-db_node_3.eca_superna_local.out

db_node_2.eca_superna_local: Warning: Permanently added '[db_node_2.eca_superna_local]:2200,[172.22.1.19]:2200' (ECDSA) to the list of known hosts.

db_node_2.eca_superna_local: starting master, logging to /opt/hbase/bin/../logs/hbase-root-master-db_node_2.eca_superna_local.out

Connection to 172.22.1.18 closed.


Checking for the existence of a schema in the db...


2017-01-24 21:35:06,091 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

file table already exists


Checking for the user table...

2017-01-24 21:35:18,118 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

user table already exists


Checking for the signal table...

2017-01-24 21:35:22,786 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

signal table already exists


Initializing application


Starting application on node 172.22.1.18

db_node_1 is up-to-date

eca_rmq_1 is up-to-date

Creating eca_iglssvc_1

Creating eca_ceefilter_1

Creating eca_cee_1

Creating eca_fastanalysis_1

executing: docker-compose -f /opt/superna/eca/docker-compose.yml up -d


Starting application on node 172.22.1.19

db_node_2 is up-to-date

eca_rmq_1 is up-to-date

Creating eca_fastanalysis_1

Creating eca_cee_1

Creating eca_ceefilter_1

Creating eca_iglssvc_1

executing: docker-compose -f /opt/superna/eca/docker-compose.yml up -d


Starting application on node 172.22.1.20

db_node_3 is up-to-date

eca_rmq_1 is up-to-date

Creating eca_ceefilter_1

Creating eca_cee_1

Creating eca_fastanalysis_1

Creating eca_iglssvc_1

executing: docker-compose -f /opt/superna/eca/docker-compose.yml up -d


eca-node01:~ #



Bring down the ECA Cluster:

  • ecactl cluster down


eca-node01:~ # ecactl cluster down

Stopping services on all cluster nodes.


Stopping service containers on node: 172.22.1.18

Stopping eca_fastanalysis_1 ... done

Stopping eca_iglssvc_1 ... done

Stopping eca_cee_1 ... done

Stopping eca_ceefilter_1 ... done

Stopping eca_rmq_1 ... done

Stopping db_node_1 ... done

Removing eca_fastanalysis_1 ... done

Removing eca_iglssvc_1 ... done

Removing eca_cee_1 ... done

Removing eca_ceefilter_1 ... done

Removing eca_rmq_1 ... done

Removing db_node_1 ... done

Removing network eca_superna_local

executing: docker-compose -f /opt/superna/eca/docker-compose.yml down



Stopping service containers on node: 172.22.1.19

Stopping eca_iglssvc_1 ... done

Stopping eca_ceefilter_1 ... done

Stopping eca_cee_1 ... done

Stopping eca_fastanalysis_1 ... done

Stopping eca_rmq_1 ... done

Stopping db_node_2 ... done

Removing eca_iglssvc_1 ... done

Removing eca_ceefilter_1 ... done

Removing eca_cee_1 ... done

Removing eca_fastanalysis_1 ... done

Removing eca_rmq_1 ... done

Removing db_node_2 ... done

Removing network eca_superna_local

executing: docker-compose -f /opt/superna/eca/docker-compose.yml down



Stopping service containers on node: 172.22.1.20

Stopping eca_iglssvc_1 ... done

Stopping eca_cee_1 ... done

Stopping eca_fastanalysis_1 ... done

Stopping eca_ceefilter_1 ... done

Stopping eca_rmq_1 ... done

Stopping db_node_3 ... done

Removing eca_iglssvc_1 ... done

Removing eca_cee_1 ... done

Removing eca_fastanalysis_1 ... done

Removing eca_ceefilter_1 ... done

Removing eca_rmq_1 ... done

Removing db_node_3 ... done

Removing network eca_superna_local

executing: docker-compose -f /opt/superna/eca/docker-compose.yml down


eca-node01:~ #




Checking events and logs

  • Verifying events are being stored in the DB:

eca-node01:~ # ecactl db shell

2017-01-24 21:39:58,157 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

HBase Shell; enter 'help<RETURN>' for list of supported commands.

Type "exit<RETURN>" to leave the HBase Shell

Version 1.2.4, r67592f3d062743907f8c5ae00dbbe1ae4f69e5af, Tue Oct 25 18:10:20 CDT 2016


hbase(main):001:0> count 'file'

145 row(s) in 0.4820 seconds


=> 145

hbase(main):002:0> count 'signal'

3 row(s) in 0.0260 seconds


=> 3

hbase(main):003:0> count 'user'

145 row(s) in 0.0950 seconds


=> 145

hbase(main):004:0>



  • Tail the log of ransomware signal candidates: ecactl logs --follow iglssvc

eca-node01:~ # ecactl logs --follow iglssvc

Attaching to eca_iglssvc_1

iglssvc_1       | 2017-01-24 21:35:30,068 IglsSvcAnalysisModule:63 INFO : Service base module starting up

iglssvc_1       | 2017-01-24 21:35:31,092 IglsSvcAnalysisModule:78 INFO : Service base module initialized




CEE Event Rate Monitoring for Sizing ECA Cluster Compute


To assist with managing the event rates processed by the ECA, OneFS provides statistics that help measure performance.


Run these commands to measure the audit event rate, and use the Installation Guide to help choose a single or multi VMware host configuration based on the event rate (a sampling-loop sketch follows the command examples below).


Statistics provided by OneFS for CEE Auditing


These statistics can be used with the examples below to get stats from the cluster.

Node.audit.cee.export.rate (key stat for ECA sizing)

Node.audit.cee.export.total (for debugging event to ECA message flow)

Node.audit.events.logged.rate (not needed for ECA sizing but indicates the cluster logging rate)

Node.audit.events.logged.total (not needed for ECA sizing but indicates the cluster logging rate)



CEE Statistics ISI command examples

  1. (onefs 8) isi statistics query current --stats=node.audit.cee.export.rate --nodes=all  (get event rate on all nodes)

  2. (onefs 7) isi statistics query --stats=node.audit.cee.export.rate --nodes=all  (get event rate on all nodes)

  3. (onefs 8) isi statistics query current --stats=node.audit.cee.export.total --nodes=all (get events total on all nodes)

  4. (onefs 7) isi statistics query --stats=node.audit.cee.export.total --nodes=all (get events total on all nodes)

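To watch the event rate over time rather than taking a single sample, the OneFS 8 command above can be run in a simple loop from an ssh session on the cluster; a minimal sketch, assuming an sh-compatible shell (the interval and sample count are illustrative):

# Sample node.audit.cee.export.rate on all nodes every 30 seconds, 10 times
i=0
while [ $i -lt 10 ]; do
    isi statistics query current --stats=node.audit.cee.export.rate --nodes=all
    sleep 30
    i=$((i+1))
done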

Monitor Cluster Event forwarding Delay to ECA cluster


  1. isi audit progress global view (this command allows one to see the oldest unsent protocol audit event for the cluster).  A large gap in time between the logged event and the sent event time stamps indicates the cluster forwarding rate is not keeping up with the rate of audit events.  An EMC SR should be opened if this condition persists.

  2. Command to run :

    1. isi_for_array isi_audit_progress -t protocol CEE_FWD

  3. Sample output:

    1. prod-cluster-8-1: Last consumed event time: '2017-08-23 18:48:22'

    2. prod-cluster-8-1: Last logged event time:   '2017-08-23 18:48:22'



Set the CEE Event forwarding start Date


This command can be used to set the current forwarding position if auditing has been enabled without a CEE endpoint configured.  Only use if directed by Support.

  1. The following updates the pointer to forward only events newer than the specified date and time (for example, Nov 19, 2014 at 2 PM in the OneFS 7.x command below)

    1. OneFS 7.x: isi audit settings modify --cee-log-time "Protocol@2014-11-19 14:00:00"

    2. OneFS 8.X: isi audit settings global modify --cee-log-time "Protocol@2017-05-01 17:01:30"


How to Generate Support logs on ECA cluster Nodes


The support process does not change with Eyeglass and ECA clusters.  A step must be completed one time on the Eyeglass appliance to enable password-less ssh login from Eyeglass to each ECA node so that support logs are collected in the normal Eyeglass backup archive creation process.  Each ECA node also has a backup script to create a local log archive.


Steps to create an ECA log archive locally on each node


Use this process if the Eyeglass-created full backup fails to collect logs from the ECA cluster.  This is a fallback log creation procedure and is not required under normal conditions.

  1. Each ECA node has a script that can collect logs and can be run independently (a sketch follows this list).

  2. /opt/superna/eca/scripts/log_bundle.sh

  3. The script outputs the location of a log zip file that can be copied off the node with SCP.

  4. Logs are placed in /tmp/eca_logs

  5. Done

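A minimal sketch of collecting and copying an archive from one node (the destination host and path in the scp command are examples only; the exact archive name is printed by the script):

# On the ECA node, logged in as admin
/opt/superna/eca/scripts/log_bundle.sh        # creates the archive and prints its location under /tmp/eca_logs

# Copy the archive off the node with scp (example destination)
scp /tmp/eca_logs/*.zip admin@<destination-host>:/tmp/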
Eyeglass ECA Performance Tuning


The ECA cluster workload is mostly CPU intensive.

  1. If the average CPU utilization of the ECA cluster, as measured from vCenter, is 75% or greater, it is recommended to increase the CPU limit applied by default to the ECA cluster.



The default ECA OVA cluster reservation provides 12000 MHz shared across all the VMs.


vCenter ECA OVA CPU Performance Monitoring

  1. To determine whether the ECA MHz limit should be increased, use vCenter and select the OVA cluster Performance tab.

[Screenshot: vCenter OVA cluster Performance tab showing average CPU usage in MHz]

  2. As shown in the Performance tab above, the average MHz usage is 221, well below the 12000 MHz limit.  No change is required until the average CPU usage shows 9000 MHz or greater (75% of the 12000 MHz limit).  The chart shows spikes in CPU, but the average CPU value is the statistic to use.

  3. To increase the limit, follow the procedure below.


vCenter OVA CPU limit Increase Procedure

  1. If it is determined that an increase is required, it is recommended to increase the limit by 25% and monitor again.  Example: 12000 MHz * 25% = 3000 MHz, for a new limit of 15000 MHz.

  2. Select the Resource Allocation tab on the OVA.

  3. Change the Limit value from 12000 to 15000 to increase by 25%.

  4. Click OK to apply the settings.



How to check total ESX host MHZ Capacity


  1. Get the CPU capacity from the ESX host Summary tab.

[Screenshot: ESX host Summary tab showing CPU capacity]

  2. 2.699 GHz * 1000 = 2699 MHz per core; 2699 MHz * 16 cores = 43,184 MHz of total capacity.

  3. This example host is using 6136 MHz of the total 43,184 MHz capacity, so there is plenty of unused CPU capacity available on this host.





ECA CLI Command Guide


The following table outlines each CLI command and its purpose.



CLI Command / Function

ecactl cluster <command>

  up - Bring up the cluster across all nodes

  down - Bring down the cluster across all nodes

  status - Gets the status of all processes and the connection to the HDFS database

  refresh - Use this command on an ECA node when it was restarted and needs to rejoin an existing cluster

ecactl components upgrade self (run on each node)

  Upgrades the cluster scripts on each node

ecactl components upgrade eca (run on the master node)

  Upgrades the containers on the master node; backup nodes detect that an upgrade is available on startup and auto update

ecactl logs --follow iglssvc (other services are rmq, fastanalysis, cee, ceefilter)

  Tails the Eyeglass agent log on a node, used for debugging



Eyeglass CLI command for ECA



Use this command to enable SMB shutdown if the root Isilon user is detected with ransomware behaviour.  This is a cluster-wide shutdown of all SMB IO.  NOTE: The root user should never be used to access data since it can access all shares regardless of the permissions set on the share.  This means no lockout is possible for the root user using deny permissions.

Root user lockout behaviour

igls admin lockroot --lock_root

igls admin lockroot --lock_root true  (when set, a root user SID detected for ransomware will disable cluster SMB)

igls admin lockroot --lock_root false  (to disable and take no action if the root user has a security event detected)



Security Guard CLI


The schedule interval that Security Guard runs with can be changed using the commands below.


igls admin sched  (list schedules and get the ID)

igls admin sched set --id SecurityGuard --interval 15M (run the Security Guard job every 15 minutes)



Analytics Database Export Tables to CSV


This option allows extraction of rows from the analytics database for support purposes or other requirements to view rows in the analytics tables.  This CLI command executes a query against the analytics database and converts the result to a CSV file.


igls rswsignals dumpSignalTable --since='2017-05-01-08:01' --until='2017-05-02-09:30' --csv=signaltable.csv --eca=ECA2

(note: --csv and --eca are optional; --csv names the CSV file and --eca selects which ECA installation to query if multiple exist.  The ECA name used in the configuration of the ECA cluster should be used.)


igls rswsignals supports 2 different tables to select from:


dumpUserTable -- stores user information based on SID

dumpSignalTable -- stores detected threat records by user SID and which detector was triggered



The output will be stored in the following path:  /srv/www/htdocs/archive/


Sample output (NOTE: fields are delimited with #; a parsing sketch follows the sample):


'Mon May 01 08:04:05 EDT 2017'#'matched a testing signature for Superna Eyeglass Security Guard'#'S-1-5-21-2947135865-3844123249-188779117-1133'#'[THREAT_DETECTOR_06]'

'Mon May 01 08:04:05 EDT 2017'#'matched a testing signature for Superna Eyeglass Security Guard'#'S-1-5-21-2947135865-3844123249-188779117-1133'#'[THREAT_DETECTOR_06]'

'Mon May 01 08:03:59 EDT 2017'#'matched a testing signature for Superna Eyeglass Security Guard'#'S-1-5-21-2947135865-3844123249-188779117-1133'#'[THREAT_DETECTOR_06]'

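A minimal sketch for splitting the '#'-delimited export into columns with awk (the file name follows from the --csv example above and the documented archive path):

# Print the timestamp and detector columns from the exported signal table
awk -F'#' '{ print $1, $4 }' /srv/www/htdocs/archive/signaltable.csv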



ECA Cluster Disaster Recovery and Failover Supported Configurations


This section covers ECA cluster availability and failover.


Scenario #1 - Production Writeable cluster fails over to DR site

In this scenario, refer to the diagram below with an ECA cluster monitoring clusters at Site 1.  Clusters are replicating in a Hot Cold configuration with Eyeglass.


Requirements

  1. Single Agent license key floats to DR cluster


Overview of Failover

  1. Before failover, CEE audit data is sent from the Hot site.

  2. The Cold site is configured but not sending any events, since there is no writeable data on this cluster and no agent license key.

  3. After failover to the Cold site, the Cold cluster will send audit events to the ECA cluster at Site 1.  The Site 1 ECA cluster will process events, and if any security event is detected it will be sent to the Eyeglass appliance at the Cold site.

  4. Eyeglass will fail over the ECA cluster agent license to the writeable cluster automatically by detecting that the Cold cluster SyncIQ policies are enabled and writeable.  This can be confirmed by looking at the Ransomware Defender or Easy Auditor icon Statistics tab and checking the Licensed cluster section to verify the writeable (now Hot) cluster is listed correctly.








Scenario #2 - Production Writeable cluster fails over to DR site and Site 1 ECA cluster impacted


In this scenario the ECA cluster site is impacted.

  1. To recover from this scenario it is necessary to deploy a new ECA cluster at the Cold site and treat this as a new install to be configured.

  2. This would be needed in the event that the ECA cluster at the Hot site is down for a long period of time.



Troubleshooting ECA Configuration Issues

This section covers how to troubleshoot cluster or detection issues for testing.


Issue - no CEE Events are processed

  1. On each node in the ECA cluster, check that CEE events are arriving at the node.

  2. ssh to each node and run the command ‘ecactl logs --follow cee’

  3. Heartbeat requests similar to the following indicate the cluster is reaching the CEE listener:

cee_1           | Apr 24 16:37:32 2017 [EMC CEE]: CTransport+::DispatchEvent(): Got a CEPP_HEARTBEAT request

cee_1           | Apr 24 16:37:34 2017 [EMC CEE]: CTransport+::DispatchEvent(): Got a CEPP_HEARTBEAT request

cee_1           | Apr 24 16:37:35 2017 [EMC CEE]: CTransport+::DispatchEvent(): Got a CEPP_HEARTBEAT request

cee_1           | Apr 24 16:37:42 2017 [EMC CEE]: CTransport+::DispatchEvent(): Got a CEPP_HEARTBEAT request

cee_1           | Apr 24 16:37:44 2017 [EMC CEE]: CTransport+::DispatchEvent(): Got a CEPP_HEARTBEAT request

cee_1           | Apr 24 16:37:45 2017 [EMC CEE]: CTransport+::DispatchEvent(): Got a CEPP_HEARTBEAT request

cee_1           | Apr 24 16:37:52 2017 [EMC CEE]: CTransport+::DispatchEvent(): Got a CEPP_HEARTBEAT request



Issue - Cluster startup can not find Analytics database on Isilon

  1. On cluster startup, the boot messages indicate whether the analytics database on Isilon was found (the cluster up output earlier in this guide shows the file, user and signal table checks).

  2. The following command can also be run:

    1. ecactl cluster status



Issue - Events being dropped or not committed to Analytics database

Use this procedure to check the RabbitMQ queues if events are not being processed fast enough.  A queue depth that keeps growing indicates slow commits to the analytics database.  Monitor with these commands over a period of time.


  1. On an ECA node, login to the RabbitMQ container:

    1. ecactl containers exec rmq /bin/bash

    2. rabbitmqctl -p /eyeglass list_queues

    3. rabbitmqctl -p /eyeglass list_bindings

    4. rabbitmqctl -p /eyeglass list_queues  name messages messages_ready messages_unacknowledged

  2. Run the first two commands the first time you login.

  3. Run the last command periodically (a monitoring-loop sketch follows this list).

  4. This will show you the depth of the queues for the different exchanges.

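A minimal sketch of monitoring the queue depth over time from inside the rmq container shell opened above (the interval is illustrative):

# Inside the rmq container (ecactl containers exec rmq /bin/bash)
while true; do
    date
    rabbitmqctl -p /eyeglass list_queues name messages messages_ready messages_unacknowledged
    sleep 60
done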


Issue - Monitoring Isilon Audit Event Rate Performance


Use this procedure to check each cluster node's backlog of uncommitted audit events that have not been sent to the CEE endpoints.


  1. isi_for_array "isi_audit_progress -t protocol CEE_FWD"

    1. Output example


tme-sandbox-4# isi_for_array "isi_audit_progress -t protocol CEE_FWD"

tme-sandbox-4: Last consumed event time: '2016-10-21 19:59:54'

tme-sandbox-4: Last logged event time:   '2017-01-17 17:05:32'

tme-sandbox-6: Last consumed event time: '2016-10-21 19:59:54'

tme-sandbox-6: Last logged event time:   '2017-01-12 20:25:10'

tme-sandbox-5: Last consumed event time: '2016-07-22 18:26:25'

tme-sandbox-5: Last logged event time:   '2017-01-12 21:50:17'


If the last logged and last consumed times match, there are no performance issues on the node.

If the last consumed time lags behind the last logged time (an earlier date and time stamp), calculate the difference in minutes to determine the “Lag” between an event being logged and when it is sent to the CEE endpoints (a calculation sketch follows).

NOTE: A large lag indicates performance issues and also means detection of an event will be delayed by this lag time.
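A minimal sketch of calculating the lag in minutes from the two timestamps, run on a Linux host such as an ECA node (GNU date is assumed; the timestamp values are examples taken from the output above):

# Lag between the last logged and last consumed event times, in minutes
logged="2017-01-17 17:05:32"
consumed="2016-10-21 19:59:54"
echo $(( ( $(date -d "$logged" +%s) - $(date -d "$consumed" +%s) ) / 60 )) minutes of lag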