RunBookRobot Admin Guide

Eyeglass Runbook Robot Guide



Introduction to this Guide


The purpose of this guide is to assist you in configuring and testing DR readiness using the Eyeglass Runbook Robot.  


Overview


To get the most from the Eyeglass readiness features, the Eyeglass Runbook Robot lets customers set up continuous DR testing between pairs of clusters. It also allows application failover logic to be tested by creating configuration data and copying data into the Robot Access Zone.

The feature runs in a test Access Zone to avoid any impact on production Access Zones. It uses a specially named Access Zone and SyncIQ policy so that there is no conflict with production Access Zone and policy data.

The following validations run daily. The DR Dashboard is updated, and any failures are sent as critical events. These daily Runbook Robot validation tests are your best indicator that your cluster is ready for a failover.

  1. API access to both clusters is functioning - Validated

  2. API access allows creation of export, share, quota - Validated

  3. NFS mount of data external to the cluster functions - Validated

  4. DNS resolution for SmartConnect: Eyeglass configures itself to use the SmartConnect service IP on the source cluster as its DNS resolver, which verifies SmartConnect zone functionality when data is mounted - Validated

  5. SyncIQ policy replication completes between source and destination cluster when data is written to the source - Validated

  6. Configuration replication of test configuration from source to destination - Validated

  7. SyncIQ failover to target cluster - Validated

  8. Test data access on target cluster post failover - Validated

  9. Verify data integrity of the test data on target cluster - Validated

  10. Configuration Sync of quotas from source to target on failover - Validated

  11. Delete of Quotas on source cluster - Validated

  12. SyncIQ Failback from target to source cluster  - Validated

Setting up a Robot DR job is a simple process that has 2 modes of operation.

  1. Basic DR Robot -  This mode only requires a SyncIQ policy with a certain pre-defined name to exist. This allows Eyeglass to use this policy to automate writing data, failing over and failing back between the clusters the policy is configured to use.

    1. DR Coverage:  This basic mode does not test all the possible functions exhaustively and only validates basic DR readiness checks.

  2. Advanced DR Robot - This mode requires the same SyncIQ policy to be created in an Access Zone with a pre-defined name AND to be the only policy in the Access Zone.

    1. DR Coverage:  Full API, DNS, smartconnect zone aliasing, data access, mounting of data, SyncIQ, Data replication, configuration creation, and configuration replication.

Advanced DR DFS mode Robot - This mode only requires a single policy to exist and can be used with DFS mode. It uses the same configuration as the Advanced Robot, but lets DFS-only customers configure DFS clients to test write access to the robot data, in order to test DFS switching and failover operations. It still writes data over an NFS export, but shares can also be created to present a DFS folder for the robot data, to make sure the DFS folder is accessible daily.

    1. DR Coverage: mounting of data, SyncIQ, data replication, configuration creation, configuration replication.

    2. Validates DFS switching and AD configuration in DFS, if configured.

    3. The only configuration change is enabling DFS mode on the Robot policy after creation. See the DFS guide for how to enable DFS mode on existing policies.

If Zone Readiness is not green, the Robot job will not run.  If an error occurs during the Robot DR job the job will be stopped at the failed step.  Use the Failover Log to learn about failed steps.
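
The run behavior described above (no run unless Zone Readiness is green, and a stop at the first failed step) can be pictured with a small Python sketch. This is only an illustration of the control flow, not Eyeglass code: the function name run_robot_job, the status strings and the step names in the demo are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# A step is a (name, action) pair; the action returns True on success.
# Both the type alias and the function below are hypothetical illustrations.
Step = Tuple[str, Callable[[], bool]]


def run_robot_job(zone_ready: bool, steps: List[Step]) -> Tuple[str, Optional[str]]:
    """Run steps in order and stop at the first failure.

    Returns (status, failed_step), where status is "not_run", "failed" or "ok".
    """
    if not zone_ready:
        # Zone Readiness is not green: the job does not start at all.
        return "not_run", None
    for name, action in steps:
        if not action():
            # The job stops here; later steps are never attempted.
            return "failed", name
    return "ok", None


if __name__ == "__main__":
    demo_steps = [
        ("Failover Preparation", lambda: True),
        ("SyncIQ Policy Failover", lambda: False),  # simulated failure
        ("Failover Validation", lambda: True),
    ]
    print(run_robot_job(True, demo_steps))
```

In the demo, the second step fails, so the third step is never attempted. In Eyeglass itself, the failed step is the one to look up in the Failover Log.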


Runbook DR Robot FAQs


The following are some frequently asked questions about configuring the Runbook Robot:

  1. Does the robot support Access Zones? Yes, see the Advanced setup.

  2. Can I create a Robot SyncIQ policy in the System zone? Yes.

  3. Does the Runbook Robot support SMB access when mounting the cluster? No, not at this time.

  4. Can I copy data into the Robot policy folder to get more data replicated? Yes, this can be used to test failover with Eyeglass as well.




Where to go for Help


This guide is designed to help you through configuration and DR testing. Should you have any issues that are not addressed in this guide, Superna offers support in several forms: online, voicemail, email, or live online chat.

  1. The support site provides online ticket submission and case tracking.  Support Site link - support.superna.net 

  2. Leave a voicemail at 1 (855) 336-1580

    1. (You must leave your customer name, email, a description of the question or issue, and the primary contact for your company with an account in our system. We will assign the case to the primary contact for email follow-up.)

  3. Email eyeglasssupport@superna.net

  4. This is also how to download license keys.

  5. You can also raise a case directly from the Eyeglass desktop using the Help button. Search for your issue, and if you want to raise a case or get a question answered, click Leave us a message and enter your name, email and appliance ID. A case is opened directly from Eyeglass.

 http://site.superna.net/_/rsrc/1472870726155/support/LeaveUsAMessage.png?height=200&width=167

  6. Or get support using chat, M-F 9-5 EDT (if the chat box is empty, we are not online yet).

  7. Eyeglass Live Chat

  8. You should also review our support agreement here.


Basic DR Robot Configuration


The basic option requires a non-production, Eyeglass-specific SyncIQ policy to exist. The Runbook Robot exercises failover and failback of the data for this SyncIQ policy. This configuration does not exercise all of the automation required for failover, but it is quick and easy to set up.

Prerequisites


  1. A SyncIQ policy in any Access Zone, BUT only policies with the “EyeglassRunbookRobot” prefix will execute the failover logic.

    1. This type of Robot policy will not fail over SmartConnect names, aliases or SPNs.

    2. This is designed to ensure that no SmartConnect Zones used by non-Robot policies are failed over in the Access Zone.

  2. The Basic Robot is intended to operate on non-production data. The steps below include directions to create the Runbook Robot specific SyncIQ policy on the cluster for the Basic Runbook Robot operation.

  3. If you create shares, exports or quotas in the path of the Robot SyncIQ policy, they will be failed over as well, using normal configuration sync jobs. This is a good way to test the whole failover of the configuration. The shares and exports can have any name.

Note: Eyeglass will modify the Description, Root Clients and Map Root User settings as required for Robot operation on an existing export with the same path as the Robot SyncIQ policy root path.

  4. Quotas will be failed over during the robot execution and deleted on the source automatically, if quotas have been created within the SyncIQ policy used for the robot.

  5. The Eyeglass hostname in the Network Card Setup and in the Network Settings (set up using yast during initial Eyeglass appliance configuration) must be identical.


Configuration Steps for Basic DR Robot


The steps are the following:

  1. Log into the Isilon Source cluster via OneFS.

  2. Click on Data Protection, then SyncIQ, then on the Policies tab.

  3. Click on + Create a SyncIQ Policy.

  4. In the Settings section of the Create SyncIQ Policy window, enter EyeglassRunbookRobot-XYZ (where XYZ is a random string) in the Policy Name field.

IMPORTANT: This name format must be followed exactly in order for the Basic Runbook Robot job to execute.

  5. In the Source Cluster section, select a Source Root Directory by clicking Browse next to the box and navigating to the desired folder.

IMPORTANT: This should be a directory containing NON production data as it will be failed over by default once a day.



  6. In the Target Cluster section, under Target Host, enter the SmartConnect zone name of the target cluster used for SyncIQ replication, and under Target Directory, enter the path on the target cluster where the data will be replicated.


  7. Do not enter or modify anything in the Target Snapshots and Advanced Settings sections.

  8. Click on Create Policy. The policy will appear in the SyncIQ Policies chart.


  9. Click on More and select Start Job on the newly created policy.

  10. Ensure that the policy completes successfully.

  11. Log into Eyeglass.

  12. Open the Jobs window and select the Job Definitions section.

  13. Under the Configuration Replication jobs, look for a Job Name that resembles SourceClusterName_EyeglassRunbookRobot-XYZ (where XYZ is the random string chosen when creating the policy). Make sure its State is OK.

Notes:

  1. This Job will be created after Eyeglass discovery has found the SyncIQ Policy created in the previous steps.  If the Job is not present, wait for the next Eyeglass discovery cycle and then check again.

  2. Inventory runs with every Eyeglass Replication cycle (every 5 minutes by default), so you may need to wait for the next cycle for the new Job to appear in the Jobs window.
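
The naming rules in the steps above can be checked before waiting for a discovery cycle. The following Python sketch is an illustration only: the helper names are hypothetical, and it assumes the EyeglassRunbookRobot- prefix is matched case-sensitively and must be followed by at least one character, which the guide implies but does not state outright.

```python
# Prefix taken from the policy name format given in this guide.
ROBOT_POLICY_PREFIX = "EyeglassRunbookRobot-"


def is_robot_policy_name(policy_name: str) -> bool:
    """Check the EyeglassRunbookRobot-XYZ format.

    Assumption: the match is case-sensitive and XYZ must not be empty.
    """
    return (
        policy_name.startswith(ROBOT_POLICY_PREFIX)
        and len(policy_name) > len(ROBOT_POLICY_PREFIX)
    )


def expected_job_name(source_cluster: str, policy_name: str) -> str:
    """Build the Configuration Replication job name to look for in Eyeglass."""
    if not is_robot_policy_name(policy_name):
        raise ValueError(f"not a Runbook Robot policy name: {policy_name!r}")
    return f"{source_cluster}_{policy_name}"


if __name__ == "__main__":
    # "ClusterA" and "a1b2" are made-up example values.
    print(expected_job_name("ClusterA", "EyeglassRunbookRobot-a1b2"))
```

If the job name produced this way does not appear in the Job Definitions section after a replication cycle, the policy name is the first thing to re-check.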






  14. If the SyncIQ Policy Name that was created has been entered properly, a new type of Job category will appear in Eyeglass: Failover: Runbook Robot (AUTOMATIC).

  15. By default, the Job will be queued to run at midnight every day.

    1. If you want to test it after creating the policy, select the job in the Jobs window and use a bulk action to run it now.

  16. When the Runbook Robot Job runs, a job will appear in the Running Jobs section with the name Runbook Robot, followed by a time stamp.

  17. The Runbook Robot job will run a three-stage Failover Test, which consists of a Failover Preparation, SyncIQ Policy Failover, and Failover Validation. Once this is done, it will run a Failover Cleanup.

  18. The Failover Preparation component consists of the following steps:

    1. The job will create a new Eyeglass export or update an existing Eyeglass export with the following parameters:

      1. Description: Superna Eyeglass Runbook Robot export. Appliance ID:

      2. Root Clients: The Eyeglass IP address used

      3. Map Root User settings

      4. Directory Paths: The Source Root Directory chosen when creating the SyncIQ Policy

    2. This export is then mounted on the Eyeglass appliance.

    3. Data is then written to the export, which is dependent on the SyncIQ Policy Path and timestamp.

    4. The export is then unmounted.

      1. To view the exports created: open OneFS with the IP address of the source cluster, click on the Protocols tab, then click on UNIX Sharing (NFS), followed by the NFS Exports tab.

  19. Once the preparation is successfully completed, Eyeglass performs a SyncIQ Policy Failover.

  20. On completion of the failover, Runbook Robot will run a Failover Validation in order to make sure that the data was failed over without issues. The Failover Validation component consists of the following steps:

    1. The export created is mounted on the Eyeglass appliance using the IP address of the failover target cluster.

    2. The timestamp is read to confirm it is identical to what was written during the Failover Preparation step.

    3. A new timestamp is written to the Failover.ts file.

    4. This new timestamp is then read to confirm it has been written correctly.

    5. This export is then unmounted from the Eyeglass appliance.

  21. Once the Failover Test has been completed, the Runbook Robot job will run a Failover Cleanup, which consists of unmounting the export from the Eyeglass appliance.

  22. The default Robot job runs every day at midnight and executes a failover of the Robot policy as follows:

    1. Creates the export.

    2. Mounts the cluster and writes test data using the export created.

    3. Fails over the policy.

    4. Runs the policy.

    5. Mounts the data on the synced export on the target.

    6. Unmounts.

    7. Moves the schedule on the policy to the target.

    8. Runs resync prep on the source.

    9. Goes to sleep until it is time to fail back.

  23. Methods to verify it was successful:

    1. Check the DR Dashboard policies tab and verify it is green.

    2. Check the SyncIQ policy status on both clusters and make sure the policy moves from one cluster to the other each day.

    3. Make sure the test file exists with a current date and time stamp on the active cluster (Hint: look in the root of the policy path for the test file).

    4. If a quota was applied to the robot policy path, make sure the quota moved to the target cluster AND was deleted on the source cluster (Eyeglass moves policies on failover).

    5. Review the Failover Log (DR Assistant / Failover History / Open log file). The Failover Log may also be downloaded from the Failover Log Viewer using the Download File link.
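
The Failover Validation steps described above amount to a read, write and read-back check on a timestamp file. The Python sketch below illustrates that idea under stated assumptions: the Failover.ts file name comes from this guide, but the helper names are hypothetical, a temporary directory stands in for the mounted NFS export, and it is assumed that the preparation step writes to the same file.

```python
import tempfile
from pathlib import Path

TIMESTAMP_FILE = "Failover.ts"  # file name taken from this guide


def write_timestamp(mount_dir: Path, stamp: str) -> None:
    """Write a timestamp into the (mounted) export directory."""
    (mount_dir / TIMESTAMP_FILE).write_text(stamp)


def read_timestamp(mount_dir: Path) -> str:
    """Read the timestamp back from the (mounted) export directory."""
    return (mount_dir / TIMESTAMP_FILE).read_text()


def validate_failover(mount_dir: Path, expected: str, new_stamp: str) -> bool:
    """Confirm the replicated timestamp, then write and read back a new one."""
    if read_timestamp(mount_dir) != expected:
        return False  # data on the target does not match what was written
    write_timestamp(mount_dir, new_stamp)
    return read_timestamp(mount_dir) == new_stamp


if __name__ == "__main__":
    # A temporary directory stands in for the export mounted on the appliance.
    with tempfile.TemporaryDirectory() as tmp:
        mount = Path(tmp)
        write_timestamp(mount, "2015-08-18T00:00:00")  # Failover Preparation
        print(validate_failover(mount, "2015-08-18T00:00:00", "2015-08-18T00:05:00"))
```

The same check can be done by hand: mount the export from the active cluster and confirm that the test file carries a current date and time stamp.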

Advanced DR Robot Configuration


This option exercises all Eyeglass failover automation and more closely follows the steps of an Access Zone failover and failback operation. It takes more time to set up, but offers the highest level of confidence that everything required for failover is in place for production Access Zones.


Prerequisites


  1. A dedicated IP pool added as a member of the Robot Access Zone.

    1. Create a SmartConnect Zone name of your choosing on the source and target cluster IP pools.

  2. A SyncIQ policy in an Access Zone with an “EyeglassRunbookRobot” prefixed name, AND only one such policy can exist in this Access Zone. If any other SyncIQ policies are detected, the Robot will disable itself and stop functioning. This is designed to ensure that no production data ends up being failed over in the Access Zone.

  3. A share or export created in the path of the Robot SyncIQ policy. It will be failed over as well, using normal configuration sync jobs. This is a good way to test the whole failover of configuration. The shares and exports can have any name.

Note: Eyeglass will modify the Description, Root Clients and Map Root User settings as required for Robot operation on an existing export with the same path as the Robot SyncIQ policy root path.

  4. Quotas will be failed over during the robot execution and deleted on the source automatically, if quotas have been created within the SyncIQ policy used for the robot.

  5. The Zone Readiness task must be enabled (the default state is disabled). The Zone Readiness job can be enabled from the Jobs window, or from the webshell CLI with: igls admin schedules set --id Readiness --enabled true

  6. The Zone Readiness task must have run, and the Zone Readiness status for the Robot Zone must be OK. Configuration Replication for the Robot job must then have run successfully.

  7. The Eyeglass hostname in the Network Card Setup and in the Network Settings (set up using yast during initial Eyeglass appliance configuration) must be identical.


Preparation and planning instructions for Zone Readiness can be found here.
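
The policy rule in the prerequisites (exactly one Robot-prefixed SyncIQ policy in the Robot Access Zone, and no other policies) is easy to get wrong when a zone is reused. The Python sketch below only illustrates that rule: the function name is hypothetical, the list of policy names would have to be gathered from the cluster by other means, and the case-sensitive prefix match is an assumption.

```python
from typing import Iterable

ROBOT_PREFIX = "EyeglassRunbookRobot"  # prefix named in this guide


def robot_zone_is_valid(policy_names: Iterable[str]) -> bool:
    """True when the zone holds exactly one policy and it is Robot-prefixed.

    Assumption: the prefix match is case-sensitive.
    """
    names = list(policy_names)
    return len(names) == 1 and names[0].startswith(ROBOT_PREFIX)


if __name__ == "__main__":
    # Policy names below are made-up examples.
    print(robot_zone_is_valid(["EyeglassRunbookRobot-01"]))
    print(robot_zone_is_valid(["EyeglassRunbookRobot-01", "prod-homedirs"]))
```

The second call models the case in which another SyncIQ policy has been added to the zone, which is the condition that causes the Robot to disable itself.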


Configuration Steps for the Advanced DR Robot


  1. Create an Access Zone with a name beginning with “EyeglassRunbookRobot” on both the source and target clusters to be tested for DR. (Note: more than one pair of clusters can be tested with the Runbook Robot.)


  2. Create a SyncIQ policy with the well-known name “EyeglassRunbookRobot-xxxx”, where xxxx is a number or string of your choosing. Run the policy once it has been created.

    1. Use a path that is a child of the Access Zone base path. Any path will do.

    2. See the Basic setup above for detailed steps.


  3. Create a subnet pool (example: robotpool) and make it a member of the Access Zone created in Step #1 (example: “EyeglassRunbookRobot”).

Note: The IP address space used should be reachable by Eyeglass so that it can mount the NFS export. Create this pool on the source and target clusters.

    1. Example: subnet0:robotpool

  4. Now create the mapping alias for the robot to move SmartConnect Zones from one pool to another (see the detailed section on hints for explanations).

    1. Example on source cluster: isi network modify pool --name=subnet0:robotpool --add-zone-aliases=igls-01

    2. Example on target cluster: isi network modify pool --name=subnet0:robotpool --add-zone-aliases=igls-01

NOTE: In this release, for the Access Zone Robot to show up in the Runbook Robot jobs, some configuration data needs to exist in the Access Zone. We recommend creating a share with no permissions, located under the path of the SyncIQ policy created within the Access Zone root path. Example: “Robotshare”.

  5. After inventory runs (5 minutes default interval), the Runbook Robot Jobs section should show the SyncIQ policy created in this section.

  6. To update Zone Readiness (default interval 6 hours), run the Zone Readiness job manually. (See the igls CLI in the admin manual to change the schedule of any job.)

  7. After the Zone Readiness job has completed, wait for the next Configuration Replication cycle to complete.

  8. Open the DR Dashboard and select Zone Readiness to view the Robot Zone Readiness status.

  9. If there are errors related to the robot zone, click on the link to see which areas are not set up correctly, and correct the errors by reviewing the documentation.

  10. To view the SmartConnect to pool mapping, click the View Map link. A zone correctly mapped with alias hints will look like the image below.

Robot Access Zone with mapping example

Screen Shot 2015-08-17 at 9.23.17 PM.png

  11. Confirm that your robot zone is set up with all Zone Readiness status indicators green (all of them must be green).

  12. The default Robot job runs every day at midnight and executes the Access Zone based failover Robot policies, which moves all networking, SPN updates (delete, add) and aliases on subnet pools using the mapping hints. It also executes any Runbook Robot policies in other Access Zones, such as System.

Robot jobs are configured without Continue on Error, so the initial validation check must not have any errors.

  13. On each daily run, the Robot job fails over the Robot policy as follows:

    1. Creates the export.

    2. Mounts the cluster and writes test data using the export created.

    3. Fails over the policy.

    4. Runs the policy.

    5. Deletes SPNs on the source.

    6. Renames the SmartConnect alias on the source with igls-original.

    7. Creates the SmartConnect alias on the target, based on the mapping set up beforehand.

    8. Creates new SPNs against the new cluster machine account.

    9. Mounts the data on the synced export on the target.

    10. Unmounts.

    11. Moves the schedule on the policy to the target.

    12. Runs resync prep on the source.

    13. Goes to sleep until it is time to fail back.

  14. Methods to verify it was successful:

    1. Check the DR Dashboard policies tab and verify it is green.

    2. Check the SyncIQ policy status on both clusters and make sure the policy moves from one cluster to the other each day.

    3. Make sure the test file exists with a current date and time stamp on the active cluster (Hint: look in the root of the policy path for the test file).

    4. If a quota was applied to the robot policy path, make sure the quota moved to the target cluster AND was deleted on the source cluster (Eyeglass moves policies on failover).

    5. Review the Failover Log (DR Assistant / Failover History / Open log file). The Failover Log may also be downloaded from the Failover Log Viewer using the Download File link.

  15. Check the DR Dashboard Zone Readiness screen to make sure the Robot ran successfully and shows Failover or green, depending on which cluster the policies are active on.

NOTE: Initially after failover, Zone Readiness will show an error for Eyeglass Configuration Replication and Zone Configuration Replication until the next Configuration Replication task and Zone Readiness task have completed.

  16. You can also check the SmartConnect zone alias on each cluster and look for the igls-original prefix on the SmartConnect Zone name that was failed over.

Screen Shot 2015-08-18 at 7.33.06 AM.png

    1. The image above shows that this cluster's SmartConnect pool zone name was renamed on failover, which indicates that the dr.ad1.test zone has been moved to the production cluster based on the alias hints mapping. Cool!

  17. The other way to check status is to see which cluster has the enabled SyncIQ policy for the Robot zone.
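
The two status checks above (the igls-original prefix on renamed zone names, and which cluster holds the enabled policy) can be expressed as simple filters. The Python sketch below is illustrative only: both helper names are hypothetical, the zone names and cluster names in the demo are made up, and the inputs would have to be collected from each cluster separately.

```python
from typing import Dict, Iterable, List, Optional

RENAMED_PREFIX = "igls-original"  # prefix named in this guide


def renamed_zone_names(zone_names: Iterable[str]) -> List[str]:
    """Return the SmartConnect zone names that carry the failover rename prefix."""
    return [name for name in zone_names if name.startswith(RENAMED_PREFIX)]


def active_cluster(policy_enabled: Dict[str, bool]) -> Optional[str]:
    """Return the one cluster whose Robot policy is enabled, else None."""
    enabled = [cluster for cluster, is_on in policy_enabled.items() if is_on]
    return enabled[0] if len(enabled) == 1 else None


if __name__ == "__main__":
    # Hypothetical zone names; the exact renamed format is an assumption.
    print(renamed_zone_names(["igls-original-robot.example.com", "prod.example.com"]))
    print(active_cluster({"clusterA": False, "clusterB": True}))
```

A result of None from active_cluster means that the policy is enabled on neither cluster or on both, either of which is worth investigating in the Failover Log.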






Advanced Settings


How to Change the Robot Scheduled interval


See administration guide link


Manual Export Create for Runbook Robot


In some cases, automatic creation of the NFS export on each run of the robot job should be disabled, so that the export can be created manually.

Follow these steps only when directed by support.

  1. Open the Eyeglass shell from the main menu.

  2. Enter the command: igls adv runbookrobot set --createExport=false

  3. You must then manually create an export on the Robot policy path, with the root client set to the IP address of the Eyeglass appliance.

  4. Configuration sync will sync the export once it has been created on the cluster with the enabled policy.




Multiple Robot Feature Support (Hub and Spoke)


Testing multi-site replication topology:

A -> B

A -> C

set it up as follows:

cluster A

policy 1 = source path 1 to cluster B

policy 2 = source path 2 to cluster C

Topology  Access Zone path  Access Zone name        Policy source path   Policy target path
A → B     ifs/AZ1/          EyeglassRunbookRobot-1  Clstr A: ifs/AZ1/P1  Clstr B: ifs/AZ1/P1
A → C     ifs/AZ2/          EyeglassRunbookRobot-2  Clstr A: ifs/AZ2/P2  Clstr C: ifs/AZ2/P2



Hints pool mapping setup:


Topology  Source        Target
A → B     igls-01-prod  igls-01-dr
A → C     igls-02-prod  igls-02-dr









Multiple Robot Feature Support (Chain)


Testing multi-site replication topology:

A -> B

B -> C

Set it up as follows:

Cluster A

policy 1 = source path 1 to cluster B

Cluster B

policy 2 = source path 2 to cluster C

Topology  Access Zone path  Access Zone name        Policy source path   Policy target path
A → B     ifs/AZ1/          EyeglassRunbookRobot-1  Clstr A: ifs/AZ1/P1  Clstr B: ifs/AZ1/P1
B → C     ifs/AZ2/          EyeglassRunbookRobot-2  Clstr B: ifs/AZ2/P2  Clstr C: ifs/AZ2/P2



Hints pool mapping setup:


Topology  Source        Target
A → B     igls-01-prod  igls-01-dr
B → C     igls-02-prod  igls-02-dr




Run the multi-site Runbook job:

  1. Complete the prerequisite steps before executing the Runbook Robot DR automation. Details can be found in the Prerequisites sections of this document.

  2. Create a subnet pool (robot-pool) for the Runbook Access Zone, and set the mapping alias on the robot pool to map SmartConnect Zones between the source and target robot-pool. More information on best practices for creating the Runbook Robot policy and other configuration (such as Zone Readiness and igls hints aliases) can be found in the sections above, or in this link for Runbook Access Zone advanced configuration.

  3. Enable the Runbook job, then run the Configuration Replication job.

  4. Run the Runbook failover and check the running job to follow the failover steps.









Multiple Robot Feature Support (Multiple Instances on Cluster Pairs)


Replicating in pairs:

Cluster A -> Cluster B

Cluster C -> Cluster D

One Eyeglass appliance managing four clusters.

Set it up as follows: use a different Access Zone name and path on each pair.

Summary of setup:


Cluster    Access Zone             Path              SyncIQ Policy                  Mapping Hint on Robot Subnet Pool
Cluster A  EyeglassRunbookRobot-1  /ifs/data/robot   EyeglassRunbookRobot-Puru      igls-robot-source8002
                                                     Source path: /ifs/data/robot
                                                     Target path: /ifs/data/robot
Cluster B  EyeglassRunbookRobot-1  /ifs/data/robot   -                              igls-robot-target8002
Cluster C  EyeglassRunbookRobot-2  /ifs/data/robot2  EyeglassRunbookRobot-Robot2    igls-robot1-gbisi01
                                                     Source path: /ifs/data/robot2
                                                     Target path: /ifs/data/robot2
Cluster D  EyeglassRunbookRobot-2  /ifs/data/robot2  -                              igls-robot1-gbisi02


Example:

cluster A (Source-8002)

policy 1 (EyeglassRunbookRobot-Puru) = source path: /ifs/data/robot to target path: /ifs/data/robot

Cluster C (Sourcerobot7201)

policy 1 (EyeglassRunbookRobot-Robot2) = source path: /ifs/data/robot2 to target path: /ifs/data/robot2



Detailed configuration:


  1. Create an Access Zone with a name beginning with “EyeglassRunbookRobot” on all four clusters (i.e. the two source and two target clusters to be tested for DR).

Note: The Access Zone names and base paths used for the two source clusters must be unique and must not overlap.
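
The uniqueness rule in the note above can be checked before creating the zones. The Python sketch below is a minimal illustration: the helper names are hypothetical, and it treats two base paths as overlapping when they are equal or when one is a parent directory of the other, which is an assumed reading of "not overlapping".

```python
from itertools import combinations
from pathlib import PurePosixPath
from typing import List, Tuple


def paths_overlap(a: str, b: str) -> bool:
    """True when the paths are equal or one is a parent directory of the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents


def check_robot_zones(zones: List[Tuple[str, str]]) -> List[str]:
    """Check (zone name, base path) pairs; an empty result means no conflicts."""
    problems = []
    for (name_a, path_a), (name_b, path_b) in combinations(zones, 2):
        if name_a == name_b:
            problems.append(f"duplicate Access Zone name: {name_a}")
        if paths_overlap(path_a, path_b):
            problems.append(f"overlapping base paths: {path_a} and {path_b}")
    return problems


if __name__ == "__main__":
    # Values taken from the summary table in this section.
    source_zones = [
        ("EyeglassRunbookRobot-1", "/ifs/data/robot"),
        ("EyeglassRunbookRobot-2", "/ifs/data/robot2"),
    ]
    print(check_robot_zones(source_zones))
```

Note that /ifs/data/robot and /ifs/data/robot2 do not overlap under this check, because the comparison is made on whole path components rather than on string prefixes.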


  2. Create a SyncIQ policy on each of the two source clusters. See the Basic setup above for detailed steps.

  3. Create a subnet pool and make it a member of the Access Zone.

Note: The IP addresses used should be reachable by Eyeglass so that it can mount the NFS export. Create this pool on all four clusters.

  4. Dual DNS delegation needs to be done for the SmartConnect Zone name of both source clusters. Make sure the name resolves. (See the detailed section on dual delegation.)

  5. Now create the mapping alias for the robot to move SmartConnect Zones from one pool to another (see the detailed section on hints for explanations).

    1. Example on source cluster A: isi network pools modify groupnet0.subnet0.pool4 --add-sc-dns-zone-aliases=igls-robot-source8002

    2. Example on source cluster C: isi network modify pool --name=subnet0:prod-robot --add-zone-aliases=igls-robot1-gbisi01

    3. Example on target cluster B: isi network pools modify groupnet0.subnet0.pool4 --add-sc-dns-zone-aliases=igls-robot-target8002

    4. Example on target cluster D: isi network modify pool --name=subnet0:dr-robot --add-zone-aliases=igls-robot1-gbisi02

  6. After inventory runs (5 minutes default interval), the robot jobs appear in the “userdisabled” state, so you need to enable them and run them.

  7. To update Zone Readiness, see the Advanced setup above for detailed steps.

  8. Open the DR Dashboard and select Zone Readiness to view the Robot Zone Readiness status.


  9. After verifying the Zone Readiness status, run the Failover: Runbook Robot (AUTOMATIC) job to start the failover.

  10. After failover, you can check the SmartConnect zone alias on each cluster and look for the igls-original prefix on the SmartConnect Zone name that was failed over. See the Advanced setup above for detailed steps.

  11. Check the Failover History.

  12. Done.