Access Zone Failover Guide

Eyeglass Access Zone Failover Guide

Product Name - Superna Eyeglass

Revision Changes to this Document - Version 2.0


Contents

  1. 1 Product Name - Superna Eyeglass
    1. 1.1 Revision Changes to this Document - Version 2.0
  2. 2 Introduction to this Guide
    1. 2.1 Overview
    2. 2.2 What's New with Access Zone Failover
      1. 2.2.1 Release 1.8
      2. 2.2.2 Release 1.6
      3. 2.2.3 Release 1.7
      4. 2.2.4 Release 2.0
    3. 2.3 Multi Site Access Zone Failover Guide
    4. 2.4 Help
  3. 3 How to Setup and Configure Access Zone - Overview Video
  4. 4 Requirements for Eyeglass Assisted Access Zone Failover
    1. 4.1 Cluster Version Requirements - May Block Failover
    2. 4.2 Access Zone Requirements - Blocks Failover
    3. 4.3 SyncIQ Policy Requirements - Blocks Failover
    4. 4.4 DFS Mode Requirements
    5. 4.5 Shares / Exports / NFS Alias Requirements
    6. 4.6 Eyeglass SmartConnect Requirements - Blocks Failover
    7. 4.7 Eyeglass Failover Mapping Hints Requirements - Blocks Failover
    8. 4.8 Failover Target Cluster Requirements - Block Failover
    9. 4.9 Eyeglass Quota Job Requirements - Will not Block Failover
  5. 5 Unsupported Data Replication Topology
  6. 6 Recommendations for Eyeglass Assisted Access Zone Failover
    1. 6.1 Shares / Exports / NFS Alias Recommendations
    2. 6.2 Service Principal Name Recommendations
    3. 6.3 SyncIQ Policy Recommendations - Does not Block Failover
  7. 7 Preparing your Clusters for Eyeglass Assisted Access Zone Failover
    1. 7.1 This is required when the Eyeglass Service Account is being used to execute CLI commands that require root privileges (see the following section for detail)
    2. 7.2 Update Isilon sudoer file for Eyeglass Service Account
    3. 7.3 Active Directory Machine Account Service Principal Name (SPN) Delegation
      1. 7.3.1 What is an SPN?  
      2. 7.3.2 What’s the risk if I don’t fix SPN’s?
      3. 7.3.3 Delegate SPN
    4. 7.4 Configure Eyeglass Subnet IP Pool Mapping Hints
      1. 7.4.1 Zone Aliases for Failover Overview  
      2. 7.4.2 When and how to NOT failover an IP pool SmartConnect name using hint: igls-ignore
        1. 7.4.2.1 Syntax of Igls-ignore:
        2. 7.4.2.2 When to apply ignore hints:
      3. 7.4.3 How to create Mapping Hints for IP pools between source and target clusters following best practise naming convention
      4. 7.4.4 OneFS 7 Example igls hints for user data
      5. 7.4.5 OneFS 8 Example igls hints for user data
        1. 7.4.5.1 Data Pool mapping example - Prod
        2. 7.4.5.2 Data Pool mapping example - DR
    5. 7.5 IGLS examples for ignore option on IP Pools
      1. 7.5.1 OneFS 7 Example igls hints for ignore SyncIQ pool
      2. 7.5.2 Example: Mapping with hints configured correctly will display the mapping
      3. 7.5.3 OneFS 8 Example igls hints for ignore SyncIQ pool
        1. 7.5.3.1 SyncIQ Ignore hint replication Pool
        2. 7.5.3.2 DFS Ignore example -  Prod
        3. 7.5.3.3 DFS Ignore example - DR
      4. 7.5.4 Hot-Cold Replication Topology Mapping Examples
        1. 7.5.4.1 Example: Single Access Zone, Single IP Pool for Access, Single IP Pool for SyncIQ
          1. 7.5.4.1.1 Before Mapping
          2. 7.5.4.1.2 After Mapping
        2. 7.5.4.2 Example: Multiple Access Zone, Multiple IP Pool for Access, Single IP Pool for SyncIQ
          1. 7.5.4.2.1 Before Mapping
          2. 7.5.4.2.2 After Mapping
        3. 7.5.4.3 Example: Single Access Zone containing DFS, Single IP Pool for Access, Single IP Pool for SyncIQ
          1. 7.5.4.3.1 Before Mapping
          2. 7.5.4.3.2 After Mapping
  8. 8 Isilon Administration for Clusters Configured for Eyeglass Assisted Access Zone Failover
    1. 8.1 SPN Management
    2. 8.2 Post Failover Automation
  9. 9 Failover Planning and Checklist
  10. 10 Monitoring DR Readiness for Eyeglass Assisted Failover
    1. 10.1 Access Zone DR Readiness Validation
    2. 10.2 Runbook Robot (Automate DR Testing on a schedule)
      1. 10.2.1 Overview
      2. 10.2.2 Run Book Robot Failover Coverage
  11. 11 Operational Steps for Eyeglass Assisted Access Zone Failover
    1. 11.1 Access Zone Workflow Steps  Overview
    2. 11.2 Access Zone Execution Steps
  12. 12 Post Access Zone Failover Manual Steps
    1. 12.1 Test Dual Delegation
    2. 12.2 DNS Smartconnect name tests
    3. 12.3 Check for SPN Errors
      1. 12.3.1 Example Logs for SPN Error:
    4. 12.4 Refreshing SMB connection after Failover completed
    5. 12.5 Refreshing NFS connection after Failover completed
  13. 13 Post Access Zone Failover Checklist
    1. 13.1 SyncIQ Policy Updates
    2. 13.2 Quota Updates
    3. 13.3 SPN Updates
    4. 13.4 SmartConnect Zone Updates
      1. 13.4.1 Example: SmartConnect Zone Update
        1. 13.4.1.1 FAILOVER from CLUSTER 1 to CLUSTER 2
        2. 13.4.1.2 FAILOVER AGAIN - CLUSTER 2 to CLUSTER 1
  14. 14 IP Pool Failover
    1. 14.1 IP Pool Failover
      1. 14.1.1 Prerequisites
      2. 14.1.2 Configuration Diagram
      3. 14.1.3 Policy - Pool Assignment
      4. 14.1.4 Policy - Pool Mapping Diagram
      5. 14.1.5 Configuration Steps:
      6. 14.1.6 Example of IP Pool Failover
        1. 14.1.6.1 Pool Failover Source (Pool2) ⇒ Target (Pool2)
        2. 14.1.6.2 Pool Failback  Target  (Pool2) ⇒ Source (Pool2)
    2. 14.2 Fan-In IP Pool Failover
      1. 14.2.1 Fan-In configuration Diagram
      2. 14.2.2 SyncIQ Policy to Pool Assignments
      3. 14.2.3 SyncIQ Policies - Pools Mapping Diagram
      4. 14.2.4 Example of Pool Failover
        1. 14.2.4.1 Pool Failover Source01 (Pool 2) ⇒ Target (Pool 2)
        2. 14.2.4.2 Pool Failover Source02 (Pool 2) ⇒ Target (Pool 4)
        3. 14.2.4.3 Pool Failback  Target  (Pool 2) ⇒ Source01 (Pool 2)
        4. 14.2.4.4 Pool Failback  Target  (Pool 4) ⇒ Source02 (Pool 2)
  15. 15 APPENDIX 1 - Controlled Failover Option Results Summary

Introduction to this Guide

The purpose of this document is to act as a guide to Access Zone Failover.

Overview

Access Zones keep all configuration data separated, including authentication providers. This also provides segmentation for business units or application tiers. Access Zones allow data, configuration, IP pools and SmartConnect Zones to be associated with DNS delegations for data access.

With Access Zone failover, all SyncIQ policies, all Access Zones, all SMB Shares, all NFS Exports and all Quotas are failed over as a unit. Eyeglass can then alarm, detect and correct SPN entries automatically without the user being required to know in advance which SmartConnect zones match which SPN share mounts.

Shared filesystems with NFS and SMB that MUST failover together will benefit from Access Zone failover.  If only NFS failover is required, failover per SyncIQ policy will meet your needs with less pre-configuration.  Since NFS requires unmount and remount of data, it's just as easy to change the mount path name.

What's New with Access Zone Failover

Release 1.8

Time skew validation was added to check time differences between nodes and between Eyeglass and the clusters.  This validation has an acceptable range that will not trigger a warning.  It verifies that SyncIQ operations between clusters are not affected by differences in time between the clusters.  It runs during Zone Readiness and checks the time on each node in all clusters.
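As a rough illustration of the kind of check described above, the following Python sketch compares node clock readings against a reference time and flags any that fall outside an acceptable range. The function name, the input format and the 60 second threshold are assumptions made for illustration only; this guide does not state the actual range or how Eyeglass implements the validation.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: the guide does not state the real acceptable range.
MAX_SKEW = timedelta(seconds=60)

def find_time_skew(reference, node_times, max_skew=MAX_SKEW):
    """Return {node: skew} for nodes whose clock differs from the reference by more than max_skew."""
    skewed = {}
    for node, node_time in node_times.items():
        skew = abs(node_time - reference)
        if skew > max_skew:
            skewed[node] = skew
    return skewed

# Example readings (made-up node names and times).
eyeglass_time = datetime(2016, 11, 23, 21, 9, 0)
node_times = {
    "cluster1-node1": datetime(2016, 11, 23, 21, 9, 5),
    "cluster2-node1": datetime(2016, 11, 23, 21, 14, 0),
}
skewed = find_time_skew(eyeglass_time, node_times)
for node, skew in skewed.items():
    print(f"WARNING: {node} differs from the reference by {skew}")
```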




Release 1.6

As of release 1.6, multi site replication is possible along with fully automated 3 site failover.

This new option allows A to B and A to C site replication from the same source access zone.  This provides a DR choice to failover to B site or C site depending on the DR event.  In addition, this allows for failover and failback operations from C back to A or from B back to A.

This feature uses triple site DNS delegation of SmartConnect zones. It extends dual delegation to 3 NS records and allows the DNS name space to failover from site A to B or C, and back if needed.

This extends SyncIQ to allow the highest data and site location availability, along with the flexibility of “one button failover” to more than a single site.

Release 1.7

As of release 1.7 and beyond Access Zone Failover will restrict the number of parallel Job requests to the Isilon cluster for the Run SyncIQ Policy data sync step based on cluster version:

  • OneFS 7.2 - 5 parallel job requests (OneFS 7.x clusters have a limit of 5 concurrent policies).  Eyeglass will monitor the progress of each Job and submit a new request as previously submitted requests are completed.

  • OneFS 8    - parallel job requests limit based on Eyeglass appliance configuration (default 10)
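The throttling behaviour described above can be pictured with a small Python sketch that keeps at most a fixed number of jobs in flight and starts the next one as soon as an earlier one completes. This is only an illustration of bounded parallelism: `run_policy` is a hypothetical placeholder, not an Eyeglass or OneFS API, and the limits are simply the values quoted in the list above.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Limits quoted in the text above; the OneFS 8 value is the stated Eyeglass default.
PARALLEL_LIMITS = {"7": 5, "8": 10}

def run_policy(policy_name):
    """Hypothetical stand-in for 'run the SyncIQ policy and wait for it to finish'."""
    time.sleep(0.01)
    return f"{policy_name}: finished"

def run_policies(policy_names, onefs_major):
    """Run all policies with no more than the allowed number in flight at once."""
    limit = PARALLEL_LIMITS[onefs_major]
    # The executor starts a queued job as soon as a running one completes.
    with ThreadPoolExecutor(max_workers=limit) as pool:
        return list(pool.map(run_policy, policy_names))

results = run_policies([f"policy{i}" for i in range(12)], "7")
print(len(results), "policies completed")
```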

Release 2.0

This release offers a new option to failover using Access Zone dual DNS failover.  This option is called IP pool failover and offers more flexibility to failover data within an Access Zone.

Customers using only the System zone now have a new option to partially failover data at the IP pool level.

This new option allows Active-Active clusters with active data within a single Access Zone on 2 clusters.

The failover logic and all previous requirements are exactly the same, including dual delegation, SPN delegation and igls hints.

The new requirement is that SyncIQ policies are mapped in the DR Dashboard Zone Readiness UI to one and ONLY one IP pool.  A single pool can have more than one SyncIQ policy mapped to it.  SyncIQ policies that are mapped should be protecting ONLY the SmartConnect names assigned to the pool.  When the pool is failed over, only the mapped policies will be selected for failover.

NOTE:

  1. Zone Readiness will now validate each pool’s readiness for failover independently.

  2. Policies must be mapped to at least 1 pool to be failed over

  3. The entire access zone can still be failed over in DR Assistant by selecting the zone and all pools within the zone will failover.
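The mapping rule above (each policy maps to exactly one pool, while a pool may carry several policies) can be expressed as a short consistency check. The Python sketch below is an illustration under assumed inputs: the policy names, the `subnet:pool` names and the list-of-pairs format are all hypothetical, and this is not how Zone Readiness is implemented.

```python
def validate_policy_pool_mapping(mappings, all_policies):
    """mappings: list of (policy, pool) pairs. Returns a list of problem descriptions."""
    pools_by_policy = {}
    for policy, pool in mappings:
        pools_by_policy.setdefault(policy, set()).add(pool)
    problems = []
    for policy in all_policies:
        pools = pools_by_policy.get(policy, set())
        if len(pools) == 0:
            problems.append(f"{policy}: not mapped to any pool")
        elif len(pools) > 1:
            problems.append(f"{policy}: mapped to more than one pool {sorted(pools)}")
    return problems

def policies_for_pool(mappings, pool):
    """Policies that would be selected when the given pool is failed over."""
    return sorted(policy for policy, mapped_pool in mappings if mapped_pool == pool)

mappings = [
    ("policy-a", "subnet0:pool1"),
    ("policy-b", "subnet0:pool1"),   # a pool may carry several policies
    ("policy-c", "subnet0:pool2"),
    ("policy-c", "subnet0:pool3"),   # invalid: one policy mapped to two pools
]
problems = validate_policy_pool_mapping(
    mappings, ["policy-a", "policy-b", "policy-c", "policy-d"]
)
selected = policies_for_pool(mappings, "subnet0:pool1")
print(problems)
print(selected)
```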





Multi Site Access Zone Failover Guide

For detailed instructions on how to configure automated 3 site failover, see the Multi Site Failover Guide.  This guide should be read, understood and implemented with the 2nd site first; then add the 3rd site Access Zone failover setup.

Based on extensive testing for safe failovers, the make writeable and resync prep steps are serialized.  A new parallel flag allows these steps to run 10 threads wide, but the run does not stop on failures.  Use with caution; see the Failover Design Guide for configuration.

Help

This document is designed as a guide for Access Zone Failover. Should you have any issues that are not addressed in this guide, Superna offers support in several forms: online, voicemail, email, or live online chat.

  1. The support site provides online ticket submission and case tracking.  Support Site link - support.superna.net

  2. Leave a voicemail at 1 (855) 336-1580.  In order to provide service you must have an account in our system.  When calling in, leave the customer name, email, a description of the question or issue, and the primary contact for your company.  We will assign the case to the primary contact for email followup.

  3. Email eyeglasssupport@superna.net

  4. To download license keys please go to the following  license keys.

  5. You can also raise a case directly from the Eyeglass desktop using the help button.  Search for your issue and, if you want to raise a case or get a question answered, click “leave us a message” and enter your name, email and appliance ID.  A case is opened directly from Eyeglass.

  6. Or get support using chat, M-F 9-5 EDT  (empty box?  we are not online yet)

  7. Eyeglass Live Chat

  8. You should also review our support agreement here.

How to Setup and Configure Access Zone - Overview Video

The following video provides an overview tutorial on how to set up and configure Eyeglass Access Zone:

Eyeglass Access Zone How To Setup and Configure Overview

Requirements for Eyeglass Assisted Access Zone Failover

The requirements in this section must be met in order to initiate an Eyeglass Access Zone Assisted Failover.  Failure to meet some of these conditions may block the Access Zone Failover.

Cluster Version Requirements - May Block Failover

Clusters participating in an Access Zone Failover must be running the supported Isilon Cluster version for this feature.  See the Feature Release Compatibility matrix in the Eyeglass Release notes specific to your Eyeglass version found here.

Access Zone Requirements - Blocks Failover

For an Access Zone failover with Eyeglass, the Access Zone must meet the following requirements:

Release | System Access Zone failover (only zone on the cluster) | System Access Zone and other non-System Access Zones with failover
1.4 update 1 to 1.5.4 | Supported | Not supported
1.6.0 and later | Supported | Supported


  • Access Zone must exist on Target Cluster with same name and authentication providers (NOTE: Access Zone Sync Jobs in Eyeglass can be used to create Access Zones automatically)

  • Access Zone must be associated with at least one subnet IP pool

  • Access Zone must be associated with one or more SyncIQ policies

    • SyncIQ policies associated to Access Zone by path - the SyncIQ policy source path must be the same or below the Access Zone base path

  • In Active-Active data replication topology, there is a dedicated Access Zone for each replication direction.

    • Failover with Eyeglass of an Access Zone for Active-Active data replication topology that is shared by SyncIQ Policies on both clusters is NOT SUPPORTED as there is no “partial” fail back path to only failback the subset of the SyncIQ policies that were originally failed over.
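The path rule in the list above (the SyncIQ policy source path must be the same as, or below, the Access Zone base path) is easy to get wrong with a plain string prefix test. The following Python sketch shows one way to compare whole path components instead. The paths are hypothetical examples and the function name is an assumption; it is not taken from Eyeglass.

```python
from pathlib import PurePosixPath

def is_at_or_below(policy_source_path, zone_base_path):
    """True when the policy source path equals the zone base path or sits beneath it."""
    policy = PurePosixPath(policy_source_path)
    base = PurePosixPath(zone_base_path)
    return policy == base or base in policy.parents

same = is_at_or_below("/ifs/data/zone1", "/ifs/data/zone1")
below = is_at_or_below("/ifs/data/zone1/projects", "/ifs/data/zone1")
above = is_at_or_below("/ifs/data", "/ifs/data/zone1")
# A string prefix test would wrongly accept this sibling directory.
sibling = is_at_or_below("/ifs/data/zone10", "/ifs/data/zone1")
print(same, below, above, sibling)
```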


Example: Unsupported

Example: Supported

SyncIQ Policy Requirements - Blocks Failover

For an Access Zone failover with Eyeglass, it is required that the SyncIQ policy(s) identified as part of the Access Zone (based on Access Zone base path and SyncIQ Policy source path) meet the following requirements:

  • All OneFS SyncIQ Policy(s) must have the same Target Cluster provisioned (release 1.5.4 and earlier)

  • In Release 1.6 and later multi site replication allows an Access Zone policy to have a 3rd cluster target for multi site failover requirements.   This can be a simple policy based failover, OR fully automated Access Zone Multi site failover policy if configured.

  • OneFS SyncIQ Policy(s) Target Host SmartConnect Zone must be associated with a pool that is NOT going to be failed over

  • SyncIQ Policy(s) source root directory must be at or below the Access Zone Base Directory


Example: Unsupported



Example: Supported



DFS Mode Requirements

DFS mode does not require SmartConnect zone names to failover.  If you have DFS mode SyncIQ policies that fall within the Access Zone root path, it is required to have a dedicated subnet:pool with SmartConnect Zones that are used for UNC paths for DFS folder targets.  Please refer to the Eyeglass Microsoft DFS Mode Failover Guide here for more details on DFS mode setup and requirements.

Note:

DFS enabled policies that also protect NFS exports will require either a separate SmartConnect Zone for NFS export data access, to take advantage of Access Zone automation, or manual steps / post failover scripting to update NFS client mounts.

Shares / Exports / NFS Alias Requirements

Exports with multiple paths, where all paths are not associated with the same SyncIQ Policy, are not supported for Access Zone failover.  This will result in an “A unique readiness data …” error in Zone Readiness, and the Overall Status cannot be displayed.
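To make the export rule concrete, the Python sketch below checks whether every path of a multi-path export falls under one and the same SyncIQ policy source path. The policy names and paths are hypothetical, nested policies are not handled, and treating a path that no policy protects as a failure is an assumption of this sketch rather than something stated in this guide.

```python
from pathlib import PurePosixPath

def covering_policy(path, policy_source_paths):
    """Return the first policy whose source path is at or above `path`, else None."""
    target = PurePosixPath(path)
    for name, source in policy_source_paths.items():
        src = PurePosixPath(source)
        if target == src or src in target.parents:
            return name
    return None

def export_is_supported(export_paths, policy_source_paths):
    """True only when every path of the export is covered by one and the same policy."""
    covering = {covering_policy(p, policy_source_paths) for p in export_paths}
    # Assumption of this sketch: an unprotected path (None) also counts as a failure.
    return len(covering) == 1 and None not in covering

policies = {"policy-a": "/ifs/data/zone1/a", "policy-b": "/ifs/data/zone1/b"}
ok = export_is_supported(["/ifs/data/zone1/a/x", "/ifs/data/zone1/a/y"], policies)
bad = export_is_supported(["/ifs/data/zone1/a/x", "/ifs/data/zone1/b/z"], policies)
print(ok, bad)
```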

Eyeglass SmartConnect Requirements - Blocks Failover

  • For an Access Zone failover with Eyeglass, the SmartConnect Zone FQDN must not exceed 50 characters.

  • The Pool “SmartConnect service subnet” must be provisioned with the same Subnet that the Pool was created in
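The 50 character limit above is simple to pre-check before failover. The following Python sketch lists any SmartConnect Zone names that exceed it; the zone names are made up for the example and the function name is an assumption.

```python
MAX_FQDN_LENGTH = 50  # limit stated in the requirement above

def fqdns_too_long(zone_names, limit=MAX_FQDN_LENGTH):
    """Return the SmartConnect Zone FQDNs that exceed the limit."""
    return [name for name in zone_names if len(name) > limit]

zones = [
    "data.example.com",
    "a" * 40 + ".example.com",   # 52 characters, over the limit
]
too_long = fqdns_too_long(zones)
print(too_long)
```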

Eyeglass Failover Mapping Hints Requirements - Blocks Failover

For an Access Zone failover with Eyeglass, a mapping between subnet pools on the Source and Target cluster is required to ensure that data is accessed from the correct SmartConnect Zone IP and node pool on the Target cluster after failover.  

This mapping is used to failover SmartConnect zones from IP pool to IP pool and for failback.

The Eyeglass mapping hints have the following requirements:  (NOTE: This is a one time setup process but needs to be repeated if new IP pools are created)

  • Eyeglass Mapping Hints are simple SmartConnect aliases, created with the isi CLI or the UI, that map IP pools and SmartConnect names for failover between pairs of clusters.  See the mapping hints examples section.

  • Every subnet IP pool associated with the Access Zone being failed over is required to have a mapping PER IP pool  in the Access Zone.  DR Dashboard will raise an error if mapping hints are not found or incorrectly created.

  • Eyeglass mapping hints on both Source and Target cluster IP pools are created in UNIQUE pairs. See IGLS mapping hints section for syntax and examples.

  • Incorrectly mapped pools will be alarmed in Access Zone Readiness in the DR Dashboard Access Zone Readiness Tab.

  • DFS mode does not require SmartConnect zone names to failover.  If you have DFS mode SyncIQ policies in the Access Zone, a dedicated subnet:pool with Eyeglass igls-ignore hint applied is required to retain SmartConnect zones on source and target clusters.  

  • The Subnet Pool which is used in the SyncIQ Policy Restrict Source Nodes option must NOT have an Eyeglass mapping hint and must have an igls-Ignore Hint applied (NOTE: If misconfigured the SmartConnect zone used by SyncIQ would failover and would impact failback operations and SyncIQ replication.)

  • The Subnet Pool associated with the Target Host property of a SyncIQ policy must NOT have an Eyeglass mapping hint and must have an Ignore Hint applied.  This is the pool on the target cluster used for SyncIQ replication.  NOTE: Failure to apply this hint can affect failback operations and SyncIQ replication if the SmartConnect name is failed over by Eyeglass.  See the mapping hints examples section.

  • Ignore hints are simply an alias with the name “igls-ignore”.  NOTE: It is best practise to ensure unique hints by using a naming format that includes the cluster name, for example igls-ignore-clustername.  This allows Eyeglass to match on igls-ignore while keeping the hint unique, which avoids SPN collision in Active Directory if the SmartConnect alias is added to AD with the check or repair ISI command on the cluster.  Since hints are SmartConnect aliases, they can be inserted into the AD machine account, but they are not required for Kerberos authentication since they are not used to mount shares.  See Active Directory Machine Account Service Principal Name (SPN) Delegation in this document for detail.


Example: Unsupported




Example: Supported


Failover Target Cluster Requirements - Block Failover

For an Access Zone failover with Eyeglass, it is required that the Isilon Cluster, that is the target of the failover, be IP reachable by Eyeglass with the required ports open.

Eyeglass Quota Job Requirements - Will not Block Failover

For an Access Zone failover with Eyeglass, there are no Eyeglass Quota Job state requirements.  Quotas will be failed over whether Eyeglass Quota Job is in Enabled or Disabled state.

Unsupported Data Replication Topology

Replication topology with shares or NFS aliases with the same name on both clusters, and protected by different SyncIQ policies, is not supported.  Configuration Replication will overwrite the path of the share / alias on one cluster.  Failover would attempt to have 2 SyncIQ policies on the same cluster with the same source path, and failover will not succeed.  Note:  This is an invalid DR configuration because duplicate shares point to different data.  It will not be possible to failover successfully, with or without Eyeglass.

Recommendations for Eyeglass Assisted Access Zone Failover

The conditions outlined in the following section are highly recommended to ensure that all automated Access Zone failover steps can be completed.  If any one of these conditions is not met, it will result in a Warning.

  • Warnings will NOT block Eyeglass Assisted Access Zone Failover, but additional manual steps may be required post failover to complete the failover.

  • Errors Will block Eyeglass Assisted Access Zone Failover

Shares / Exports / NFS Alias Recommendations

  • All shares, exports and aliases should be created in the Access Zone that is being failed over.  It is not supported to have shares, exports and aliases with a path that is outside (higher in the file system than) the Access Zone base path.

    • Impact - Data Access Outage: The policy will not be selected for Failover based on path matching the Access Zone base path, resulting in data that will NOT be failed over with the Access Zone.

    • Access Zone Readiness in the DR Dashboard shows which policies have been matched to the Access Zone and should be verified to ensure all expected SyncIQ policies are present.



Example: Share / Export / NFS Alias Configuration RECOMMENDED





Example #1: Share / Export / NFS Alias Configuration NOT RECOMMENDED


Example #2: Share / Export / NFS Alias Configuration NOT RECOMMENDED


  • Eyeglass Configuration Replication Jobs for the SyncIQ Policies in the Access Zone being failed over should have been completed without error

    • Impact - Data Access Outage: Any missing or incorrect share / export / NFS alias information will prevent client access to data on the Target cluster.  These configuration items will have to be corrected manually on the Target Cluster.

Service Principal Name Recommendations

For optimal Access Zone failover with Eyeglass where Access Zone contains SMB shares that are directly mounted using SmartConnect Zones, the following is recommended:

  • Setup delegation for SPN add and delete to allow Eyeglass to automatically update SPNs based on SmartConnect changes made during failover (see Eyeglass Isilon Edition Administration Guide)

    • Impact: With no SPN updates, SMB share client authentication may not complete.  SPNs will have to be updated manually for both the Source and Target cluster to enable Kerberos authentication again. NTLM fallback authentication should be verified with Active directory

SyncIQ Policy Recommendations - Does not Block Failover

SyncIQ Policy last run should have been successful:

  • OneFS SyncIQ Job(s)  should have been run at least once

  • OneFS SyncIQ Job(s) for last run should have been successful

  • OneFS SyncIQ Job(s) should not be in a Paused or Cancelled state

  • Impact: depending on its current status, a SyncIQ Policy MAY NOT be able to be run by Eyeglass assisted failover if the above recommendations have not been met.  If it does not run, you will incur data loss during failover.

    • Example 1: SyncIQ Policy has an error state.  If it cannot be run from OneFS, it will also not be able to run from Eyeglass.

    • Example 2: SyncIQ Policy is paused.  Eyeglass failover cannot RESUME a paused SyncIQ Policy - this must be resumed from OneFS
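The recommendations above amount to a small decision table on the state of a policy's last job. The Python sketch below shows that logic in isolation. The state strings ("success", "failed", "paused", "cancelled") and the use of `None` for "never run" are hypothetical labels chosen for this example; they are not OneFS or Eyeglass API values.

```python
def policy_warnings(last_job_state):
    """Return warnings for a policy given the (hypothetical) state of its last job.

    None means the policy has never been run.
    """
    if last_job_state is None:
        return ["policy has never been run"]
    if last_job_state in ("paused", "cancelled"):
        return [f"policy is {last_job_state}"]
    if last_job_state != "success":
        return ["last run was not successful"]
    return []

for state in (None, "success", "paused", "failed"):
    print(state, policy_warnings(state))
```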

You must investigate these errors and understand their impact to your failover solution.

Isilon does not support SyncIQ Policy with excludes (or includes) for failover

  • Impact: Not a supported configuration for failback.

Isilon best practices recommend that SyncIQ Policies utilize the Restrict Source Nodes option to control which nodes replicate between clusters.

  • Impact: If the subnet pool used for data replication is not controlled, all nodes in the cluster can replicate data from all IP pools.  This makes it hard to manage bandwidth and requires that all nodes have access to the WAN.

Eyeglass failover will skip failover for any SyncIQ policies in the Access Zone which are in the disabled state.  

  • Impact - Data Access Outage: If SyncIQ Policies are disabled, the associated filesystem will be writeable on the source and will NOT failover.  Data on the source cluster will most likely not be reachable by clients, because the networking and SmartConnect Zone required for data access will have been failed over and the source SmartConnect zone is renamed to ensure clients cannot mount it.

Preparing your Clusters for Eyeglass Assisted Access Zone Failover

The steps described in this section are required to prepare your system for Eyeglass Assisted Access Zone Failover:

  • Active Directory Machine Account Service Principal Name (SPN) Delegation

This is required to avoid Eyeglass requiring direct access to Active Directory to synchronize the Service Principal Names (SPN) for production or DR cluster computer accounts.  Service principal names are used by Kerberos authentication and machine accounts, and a new SPN name pair is created each time a new SmartConnect Zone Alias is created.

  • Mapping of SmartConnect Zones between source and target cluster

This is required so that Eyeglass can create SmartConnect Zone names and aliases on your DR cluster automatically in the event of a DR failover.

  • Update Isilon sudoer file for Eyeglass Service Account

This is required when the Eyeglass Service Account is being used to execute CLI commands that require root privileges (see the following section for detail).

Update Isilon sudoer file for Eyeglass Service Account

Eyeglass Access Zone Failover requires some CLI commands that must run with root level access.  Many customers also run the cluster in STIG or compliance mode for Smartlock WORM features, where the root user account is not allowed to log in and run commands.  The “SPN machine account maintenance before and after cluster failover” command requires elevated permissions for this user across the cluster nodes.  See the Eyeglass Service Account guide (minimum permissions) for details on how to add sudo privileges to the Eyeglass cluster service account.

Active Directory Machine Account Service Principal Name (SPN) Delegation

What is an SPN?  

An SPN is used in Kerberos authentication from clients to network services for file serving.  It is formed from SmartConnect Zone names and has two forms: the short NetBIOS name form and the fully qualified (URL style) name form used by Kerberos.

Example:

When a client connects to \\data.example.com\sharename, the authentication request to Active Directory uses the SPN to authenticate.  Kerberos is the default authentication request, and it uses the URL based (fully qualified) name.

HOST/data

HOST/data.example.com
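As a small illustration of how the two forms relate, the Python sketch below derives both HOST SPN strings from a SmartConnect Zone name. The SPNs are written with the standard service/host separator. The function name is an assumption for this example, and the sketch only builds strings; it does not query or modify Active Directory.

```python
def host_spns(smartconnect_fqdn):
    """Return the short-name and fully qualified HOST SPNs for a SmartConnect Zone name."""
    short_name = smartconnect_fqdn.split(".", 1)[0]
    return [f"HOST/{short_name}", f"HOST/{smartconnect_fqdn}"]

spns = host_spns("data.example.com")
print(spns)
```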

What’s the risk if I don’t fix SPN’s?

Without SPN values set on the cluster machine accounts, Kerberos authentication will fail, but many Windows clients will fall back on NTLM authentication automatically (NTLM fallback can be disabled in the domain for higher level of security).  NTLM is a legacy authentication protocol and considered less secure than Kerberos.

The Eyeglass solution aims at removing manual steps wherever possible.  All prerequisites ensure manual steps are not required during a DR event.  The Eyeglass solution also has a goal to remove dependencies between groups within IT, to reduce the potential for communication issues impacting DR failover.    

SPN Delegation is a one time setup that achieves simplified DR automation and is required to use the Eyeglass Access Zone failover feature.  SPNs related to Source Cluster Zone and SmartConnect Zone AD providers will be deleted to avoid SPN collision in AD.  During normal operating conditions, Eyeglass will audit SPNs on the source and destination clusters to ensure they are correct and will remediate prior to any failover.

Delegate SPN

The steps outlined in “How to - Delegation of Cluster Machine Accounts with Active Directory” are required for each cluster machine account, for each AD provider that is added to each cluster.  For example, four different AD providers for different domains will require four delegations to be created, as per below, to each machine account name.  Typically the cluster name is used when the cluster joins Active Directory.

This one time setup avoids the need for AD administrative permissions during failover operations, and allows the SPN values of the source and destination clusters to be managed by Eyeglass.  This reduces the risk of SPN authentication failures and ensures proper cluster self management of the SPN values required for Kerberos authentication for SmartConnect zones and aliases.

Note: Superna Eyeglass only manages SPN related to HOST.  SPNs related to HDFS or NFS are not updated and will need to be manually repaired post failover.




Configure Eyeglass Subnet IP Pool Mapping Hints

This section covers why mapping hints are configured and how to configure mappings between IP pools for failover.  IP pools serve data through SmartConnect names; the IP pool used to serve the name on failover is predetermined using IP pool mapping hints.

Zone Aliases for Failover Overview  

Access Zone failover depends on dual DNS delegation to ensure no steps are required in DNS during a failover.  The target cluster Subnet IP Pools require a SmartConnect Zone name set in the OneFS UI (it must be set in the UI and not as an alias).

This SmartConnect name is not mounted or accessed on the DR cluster, but it is required so that the 2nd name server record in DNS can be set up.

The diagram below explains how we recommend SmartConnect Zone names be entered to simplify the failover of a SmartConnect Zone name.

We recommend entering the source IP pool SmartConnect Zone name into the OneFS UI on the mapped target IP pool with a prefix applied, in the form “igls-original-<source cluster SmartConnect zone name>”.

NOTE: The target IP pool MUST have a SmartConnect value, or dual delegation responses will not function as expected.  The recommended value is as shown above.  A blank or missing SmartConnect name is NOT supported (there is no validation in DR Dashboard to check whether the target cluster SmartConnect name is set correctly).

This simplifies failover visually in the OneFS UI, and the name is renamed on failover without an extra alias being created for failover purposes.  Failover swaps the name from one side to the other.
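The naming convention and the swap described above can be pictured with a few lines of Python. This is a sketch of the recommended convention only: the function names are assumptions, and the swap is shown as a plain exchange of the two names, which is a simplification of whatever Eyeglass actually does on the clusters.

```python
PREFIX = "igls-original-"

def placeholder_name(source_zone_name):
    """Recommended SmartConnect name to enter on the mapped target pool."""
    return PREFIX + source_zone_name

def names_after_failover(source_name, target_name):
    """Return (source_name, target_name) after the names trade places on failover."""
    if target_name != placeholder_name(source_name):
        raise ValueError("target pool does not follow the igls-original- convention")
    return target_name, source_name

before = ("data.example.com", placeholder_name("data.example.com"))
after = names_after_failover(*before)
print(before, "->", after)
```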



When and how to NOT failover an IP pool SmartConnect name using hint: igls-ignore

The igls-ignore hint applied to an IP pool tells Eyeglass not to process the SmartConnect name and aliases found on this IP pool.

Syntax of Igls-ignore:

This mapping hint can also be made unique using igls-ignore-xxxx, where xxxx can be a unique value that self documents the purpose of the ignore.  Examples below.

  • igls-ignore-repl  (documents ignore on the SmartConnect IP pool used for SyncIQ)

  • igls-ignore-dfsprod (documents the prod hint for the DFS pool used for DFS clients)

  • igls-ignore-mgmtclst1 (documents the management FQDN pool used to manage the cluster)

When to apply ignore hints:

  • For SyncIQ IP pools that are used for target host

  • For DFS IP pools, so that no DNS updates are done for DFS target folders; this also avoids the need for SPN updates

  • For IP pools used for cluster management.  When clusters are added to Eyeglass with this FQDN, an ignore hint is required so that Eyeglass will not lose access to the cluster during failover or failback.  The Zone Readiness screen validates that the FQDN used to add the cluster has an igls-ignore hint applied.  This is a blocking condition for failover.
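A minimal Python sketch of how a pool could be classified as ignored from its aliases, based on the igls-ignore prefix described above. The pool names and aliases are hypothetical, and matching case-insensitively is an assumption of this example rather than documented Eyeglass behaviour.

```python
IGNORE_PREFIX = "igls-ignore"

def is_ignore_hint(alias):
    """True when the alias is igls-ignore or igls-ignore-<anything>."""
    # Case-insensitive matching is an assumption of this sketch.
    return alias.lower().startswith(IGNORE_PREFIX)

def pool_is_ignored(aliases):
    """A pool is skipped when any of its aliases is an ignore hint."""
    return any(is_ignore_hint(alias) for alias in aliases)

pools = {
    "subnet0:replication": ["igls-ignore-repl"],
    "subnet0:dfs": ["igls-ignore-dfsprod"],
    "subnet0:userdata": ["igls-01-prod"],
}
ignored = sorted(name for name, aliases in pools.items() if pool_is_ignored(aliases))
print(ignored)
```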





How to create Mapping Hints for IP pools between source and target clusters following best practise naming convention

Subnet:Pool Failover mapping between the Source and Target Cluster is done to ensure that data is accessed from the correct SmartConnect Zone IP and node pool on the Target cluster.  Mapping of IP address Pools should be completed after installation and will be audited by  Eyeglass as part of the Failover Readiness validation.  This is done using a SmartConnect Zone alias.

The hint alias on the IP pool is of the form igls-xxx, where “xxx” can be any string. We recommend numbers to keep it simple. For example, igls-01-pool-name-prod is a self-explanatory name, and the matching DR hint would be igls-01-pool-name-dr.

NOTE: We also recommend this syntax to avoid SPN collisions. Eyeglass does not inject these hints, but an admin could run isi commands that do; there is no harm if they are present in the AD computer account. For example, “igls-01-prod” and “igls-01-DR” are matching hints, since only the “igls-xx” part needs to match. The form “igls-xx-some-unique-string” therefore lets the trailing “some-unique-string” make each hint unique while the hints still match.

Eyeglass requires that the user decide which network pools are partnered during failover.  Create the identical alias hint on the source network pools and their target network pools.

  • A hint is a prefixed zone alias that tells Eyeglass which source cluster network pool should fail over to which target SmartConnect subnet pool.  Eyeglass detects missing hints and raises an alarm so that they can be corrected.

  • An ignore hint is used to identify the SmartConnect Zone(s) used for SyncIQ replication.  It is Isilon best practice to have a dedicated SmartConnect Zone for this purpose, and avoid using this zone name to mount data with clients.   During failover there is no need to failover the SmartConnect Zone used for SyncIQ.   Eyeglass needs to know which zone name should be ignored during failover and readiness job assessment of Access Zones and SmartConnect Zones.
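The matching rule described above can be sketched in Python. This is only an illustration of the naming convention, not Eyeglass code: the function names are hypothetical, and two points are assumptions made for the sketch, namely that the "igls-xx" part is the first two hyphen-separated tokens and that comparison is case-insensitive (suggested by the “igls-01-prod” / “igls-01-DR” example).

```python
IGNORE_PREFIX = "igls-ignore"
ORIGINAL_PREFIX = "igls-original-"


def classify_alias(alias: str) -> str:
    """Classify a SmartConnect zone alias by its igls prefix (hypothetical helper)."""
    name = alias.lower()
    if name == IGNORE_PREFIX or name.startswith(IGNORE_PREFIX + "-"):
        return "ignore"      # pool is skipped during failover
    if name.startswith(ORIGINAL_PREFIX):
        return "renamed"     # zone name renamed by a previous failover
    if name.startswith("igls-"):
        return "mapping"     # pairs a source pool with a target pool
    return "other"           # ordinary SmartConnect name or alias


def match_key(hint: str) -> str:
    """Return the 'igls-xx' part of a mapping hint (assumed: first two tokens)."""
    parts = hint.lower().split("-")
    if len(parts) < 2 or parts[0] != "igls" or not parts[1]:
        raise ValueError(f"not an igls hint: {hint!r}")
    return "-".join(parts[:2])


def hints_match(source_hint: str, target_hint: str) -> bool:
    """True when both aliases are mapping hints that share the same 'igls-xx' part."""
    if classify_alias(source_hint) != "mapping":
        return False
    if classify_alias(target_hint) != "mapping":
        return False
    return match_key(source_hint) == match_key(target_hint)
```

With this sketch, hints_match("igls-01-prod", "igls-01-DR") is true, while two igls-ignore hints never form a mapping pair.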

To add the mapping alias to the Isilon cluster, SSH to the cluster, log in as root, and execute the following commands (note that subnet and pool names are case sensitive):

  • Get the list of pools: isi network list pools -v

  • Add the hint: isi network modify pool --name=<subnet:poolname> --add-zone-aliases <hint>

In the example below, we will execute the command on the source and target cluster to map pools to each other:

OneFS 7 Example igls hints for user data

  • Prod Cluster  isi network modify pool --name=subnet0:exampleProd --add-zone-aliases igls-01-prod

  • DR Cluster  isi network modify pool --name=subnet0:exampleDR --add-zone-aliases igls-01-dr
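When several pools need hints, it can help to generate the command lines rather than type them. The sketch below only formats strings in the OneFS 7 form used in the examples above; it does not run anything on a cluster, and the function name is hypothetical. Always check the generated command against the CLI reference for your OneFS release before using it.

```python
def add_hint_command(subnet: str, pool: str, hint: str) -> str:
    """Format the OneFS 7 style command shown in this guide for adding a hint alias."""
    if not hint.startswith("igls-"):
        raise ValueError(f"mapping hints must start with 'igls-': {hint!r}")
    # Subnet and pool names are case sensitive, so they are used exactly as given.
    return f"isi network modify pool --name={subnet}:{pool} --add-zone-aliases {hint}"


if __name__ == "__main__":
    # One command per cluster; run each on the cluster that owns the pool.
    print("Prod:", add_hint_command("subnet0", "exampleProd", "igls-01-prod"))
    print("DR:  ", add_hint_command("subnet0", "exampleDR", "igls-01-dr"))
```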

OneFS 8 Example igls hints for user data

Data Pool mapping example - Prod

Example of a data access igls hint applied to an Access Zone IP pool on the Prod cluster. NOTE: the hint is igls-marketing-marketingprod. The trailing marketingprod identifies the cluster the hint is applied to and is not used to match the pools.



Data Pool mapping example - DR

Example of a data access igls hint applied to an Access Zone IP pool on the DR cluster. NOTE: the hint is igls-marketing-marketingdr. The trailing marketingdr identifies the cluster the hint is applied to and is not used to match the pools.



IGLS examples for ignore option on IP Pools

In the examples below, the command is executed on the source and target clusters for pools that should be ignored for failover because they are dedicated to SyncIQ replication or to DFS within the Access Zone:

OneFS 7 Example igls hints for ignore SyncIQ pool


  • Prod Cluster  isi network modify pool --name=subnet0:siqProd --add-zone-aliases igls-ignore

  • DR Cluster  isi network modify pool --name=subnet0:siqDR --add-zone-aliases igls-ignore

Example: Mapping with hints configured correctly will display the mapping


OneFS 8 Example igls hints for ignore SyncIQ pool







SyncIQ Ignore hint replication Pool

This pool is used by SyncIQ for replication (to restrict the source and/or target hosts used for replication). The igls-ignore- prefix causes the pool to be ignored; the prod8 suffix makes the hint unique and identifies the cluster the hint is applied to, using the cluster name prod.



DFS Ignore example -  Prod

This is another ignore hint, used for a Prod cluster pool that protects DFS-mounted data in the Access Zone. This IP pool and its SmartConnect names should not fail over, so it uses the ignore hint igls-ignore-dfs01, where igls-ignore- is the part that is matched and dfs01 makes the hint unique.



DFS Ignore example - DR

This is another ignore hint, used for a DR cluster pool that protects DFS-mounted data in the Access Zone. This IP pool and its SmartConnect names should not fail over, so it uses the ignore hint igls-ignore-dfs02, where igls-ignore- is the part that is matched and dfs02 makes the hint unique.




Hot-Cold Replication Topology Mapping Examples

Example: Single Access Zone, Single IP Pool for Access, Single IP Pool for SyncIQ

Before Mapping










After Mapping

Example: Multiple Access Zone, Multiple IP Pool for Access, Single IP Pool for SyncIQ


Before Mapping

After Mapping

Example: Single Access Zone containing DFS, Single IP Pool for Access, Single IP Pool for SyncIQ


Before Mapping









After Mapping


Isilon Administration for Clusters Configured for Eyeglass Assisted Access Zone Failover

SPN Management

With SPN Delegation configured as required in preparation for Eyeglass Assisted Access Zone Failover, Eyeglass will create SPNs related to SmartConnect Zones and Alias detected.  No manual SPN management is required.  

IMPORTANT:

Eyeglass does not create SPNs for any SmartConnect Zone or SmartConnect Zone Alias prefixed with igls.  An SPN check on clusters configured with Eyeglass mapping hints will report missing SPNs for these SmartConnect Zones and Aliases.  This is expected, as they are not used for cluster access.  DO NOT EXECUTE SPN REPAIR: it will fail if executed on both clusters because of the conflict created by having identical mapping hints on both clusters.

IMPORTANT:

Eyeglass does not remove “extra” SPNs that do not correspond to detected SmartConnect Zones and Aliases.  If these SPNs need to be removed, this must be done manually.

Post Failover Automation

Many failover scenarios depend on extra steps performed on devices, software, and infrastructure external to the NAS cluster.  Using the Eyeglass script engine, these tasks can now be automated with output captured and appended to Eyeglass failover logs.  For example:

  • DNS updates post failover for SmartConnect zone CNAME editing

  • NFS host mount and remount automation

  • DNS server cache flushing

  • Application bring up and down logic to start applications post failover

  • Send alerts or emails

  • Run API commands on third-party equipment, for example a load balancer, switch, router or firewall

Please refer to the Eyeglass Admin Guide Script Engine Overview for more details.

Failover Planning and Checklist

Failover planning includes extended preparation beyond storage layer failover steps. A full Failover Plan will also take into account  clients, application owners and any dependent systems such as DNS and Active Directory.    The following link is a Failover Planning Checklist to help you develop your own Failover plan ( Failover Planning and Checklist ).

Monitoring DR Readiness for Eyeglass Assisted Failover

In addition to the Assisted Failover functionality, Eyeglass also provides the following features to monitor your Access Zone DR Readiness:

  • Access Zone DR Readiness Validation

  • Runbook Robot

Access Zone DR Readiness Validation

The DR Dashboard Zone Readiness tab provides a per Access Zone summary of the key networking, Kerberos SPN, and SmartConnect subnet:pool information, along with the SyncIQ status and Configuration Replication validations performed to assess readiness for failover.  The status of each is combined to provide an overall DR Status.  Zone Readiness is updated every 15 minutes by default (see "igls cli commands" in the Eyeglass Isilon Edition Administrative Guide to change this schedule).

This information provides the best indicator of DR readiness for failover and allows administrators to check status on each component of failover, identify status, errors and correct them, in order to get each access zone configured and ready for failover.

By default, the Failover Readiness job that populates this information is disabled.  Instructions to enable this job can be found under Managing Eyeglass Jobs.


If all of the Access Zone Requirements and Recommendations pass validation, the DR Dashboard status for the Access Zone is green indicating that the Access Zone is safe to failover.   

If any of the Access Zone Requirements do NOT pass validation, the DR Dashboard status for the Access Zone is red indicating that the Access Zone is NOT ready to failover. In this state the DR Assistant will block you from starting the failover.  Eyeglass will also issue a System Alarm for any of these conditions.

If any of the Access Zone Recommendations do NOT pass validation, the DR Dashboard status for the Access Zone is orange (Warning) indicating that the Access Zone can be failed over but there may be some additional manual steps required to complete the failover. In this state the DR Assistant will allow you to start the failover.  Eyeglass will also issue a System Alarm for any of these conditions.
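The three status rules above reduce to a small decision function. The following Python sketch is an illustration of that logic only; the function names and the boolean-list inputs are hypothetical and are not part of any Eyeglass API.

```python
from typing import Iterable


def zone_dr_status(requirements_ok: Iterable[bool],
                   recommendations_ok: Iterable[bool]) -> str:
    """Combine per-check results into the overall Access Zone status colour."""
    if not all(requirements_ok):
        return "red"      # a failed Requirement blocks failover
    if not all(recommendations_ok):
        return "orange"   # Warning: failover allowed, manual steps may be needed
    return "green"        # everything passed validation


def failover_allowed(status: str) -> bool:
    """The DR Assistant blocks failover only when the status is red."""
    return status in ("green", "orange")
```

Note that a failed Requirement takes precedence: a zone with both a failed Requirement and a failed Recommendation is red, not orange.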

Additional information for Zone Readiness can be found in the Eyeglass Admin guide here.

IMPORTANT:

If you make a change to your environment, the following Eyeglass tasks must run before the Zone Readiness will be updated:

  • Configuration Replication

  • Failover Readiness

IMPORTANT:

Readiness is NOT assessed for an Access Zone in the Failed Over state.   The DR Dashboard therefore reports readiness from the currently active cluster to the DR target cluster ONLY.  Readiness in the reverse (failback) direction is not assessed until after failover to the target cluster.



Runbook Robot (Automate DR Testing on a schedule)

Overview

Many organizations schedule DR tests during maintenance windows and weekends, only to find out that the DR procedures did not work, or documentation needed to be updated.  The Eyeglass Run Book Robot feature automates DR run book procedures that would normally be scheduled in off peak hours, and avoids down time to validate DR procedures, providing Failover and Failback automation tests with reporting.

This level of automation provides a high level of confidence that your Isilon storage is ready for failover with all of the key functions executed on a daily basis.   In addition to automating failover and failback, Eyeglass operates as a cluster witness. Eyeglass uses access zone mount paths to mount storage on both source and destination clusters the same way the cluster users and machines mount storage externally.

Run Book Robot Failover Coverage

The following validations are all performed on a daily basis,  and the DR dashboard updated along with any failures sent as critical events. This is the best indicator that your cluster is ready for a failover.

  • API access to both clusters is functioning - Validated

  • API access allows creation of export, share, quota - Validated

  • NFS mount of data external to the cluster functions - Validated

  • DNS resolution for SmartConnect: Eyeglass configures itself to use the SmartConnect service IP on the source as its DNS resolver, in order to verify SmartConnect zone functionality when mounting data - Validated

  • SyncIQ policy replication completes between source and destination cluster when data is written to the source - Validated

  • Configuration replication of test configuration from source to destination - Validated

  • SyncIQ failover to target cluster - Validated

  • Test data access on target cluster post failover - Validated

  • Verify data integrity of the test data on target cluster - Validated

  • Configuration Sync of quotas from source to target on failover - Validated

  • Delete Quotas on source cluster - Validated

  • SyncIQ Failback from target to source cluster  - Validated


Refer to the Eyeglass RunBookRobot Admin Guide for instructions on setting up and running the Runbook Robot.

Operational Steps for Eyeglass Assisted Access Zone Failover

Access Zone Workflow Steps  Overview


For detailed steps consult the failover guide table here.


Access Zone Execution Steps


For detailed steps on execution and monitoring consult the Failover Design Guide.


Post Access Zone Failover Manual Steps

Test Dual Delegation

Updating DNS is not required with dual delegation.   If nslookup testing verifies that the response comes from the target cluster, no extra steps are required.  If not, check that the delegation records in DNS correctly use the target cluster SSIP.

See the following reference for details on Dual Delegation: Geographic Highly Available Storage solution with Eyeglass Access Zone Failover and Dual Delegation

DNS Smartconnect name tests

Verify that DNS Updates were completed correctly:

  1. SSH to the Eyeglass appliance as admin (admin@<Eyeglass IP address>)

  2. nslookup [enter]

  3. server x.x.x.x [enter] (IP of the subnet service IP on the TARGET cluster)

  4. Enter the SmartConnect zone name from the SOURCE cluster that was failed over, for example somesmartconnect.zone.name [enter]

  5. The expected response is an IP address from the TARGET cluster IP pool that was mapped

  6. If the response is from the target cluster IP pool, failover of the SmartConnect delegation to the TARGET cluster is correct

  7. Repeat the server x.x.x.x command using the production DNS server that has the modified CNAME

  8. Repeat the above tests using the production DNS server, for example “server y.y.y.y” (where y.y.y.y is the IP address of the updated SOA primary DNS server where the delegation record was changed)

  9. Verify that the output returns an IP address from the TARGET cluster IP pool that was mapped
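Steps 5, 6 and 9 come down to checking whether the address returned by nslookup falls inside the mapped TARGET pool. A minimal Python sketch of that check is shown below. The function name is hypothetical, and the address ranges in the usage note are documentation addresses, not values from any real cluster; substitute the IP ranges configured on your target pool.

```python
import ipaddress
from typing import Iterable, Tuple


def ip_in_pool(ip: str, ranges: Iterable[Tuple[str, str]]) -> bool:
    """Return True if ip lies inside any (low, high) inclusive range of the pool."""
    addr = ipaddress.ip_address(ip)
    for low, high in ranges:
        # Ranges are inclusive at both ends, as in a pool's IP range definition.
        if ipaddress.ip_address(low) <= addr <= ipaddress.ip_address(high):
            return True
    return False
```

For example, with a target pool range of 192.0.2.10 to 192.0.2.20, an nslookup answer of 192.0.2.15 passes the check, while an answer of 192.0.2.50 indicates that the name is still being resolved elsewhere.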


Check for SPN Errors

An Access Zone Failover with status of SUCCESS may still have SPN deletions which could create errors.  After the Access Zone Failover, check the Failover Logs for SPN errors and check for SPN Alarms (a single SPN error will result in multiple SPN alarms).

Example Logs for SPN Error:

2015-11-28              20:44:46::266          ERROR     ds-72-87-51            step "Delete SPN OTT-az1-example.ad1.test for: DS-72-87-50$" result : FAILURE:      {"AD1.TEST": [{ "Error": "Yes", "Service Principal Name": "'HOST/OTT-az1-example.ad1.test' doesn't exist on DS-72-87-50$.", "Domain": "AD1.TEST"}]};{"AD1.TEST": [{ "Error": "Yes", "Service Principal Name": "'HOST/OTT-az1-example' doesn't exist on DS-72-87-50$.", "Domain": "AD1.TEST"}]};

2015-11-28              20:44:51::887          ERROR     ds-72-87-51            step "Create SPN OTT-az1-example.ad1.test" result : FAILURE:     {"AD1.TEST": [{ "LdapError": "LdapError Failed to modify attribute 'servicePrincipalName' [19Constraint violation]", "Error": "Yes", "Domain": "AD1.TEST"}]};{"AD1.TEST": [{ "LdapError": "LdapError Failed to modify attribute 'servicePrincipalName' [19Constraint violation]", "Error": "Yes", "Domain": "AD1.TEST"}]};

2015-11-28              20:44:51::892          INFO Raised alarm: CRITICAL Failed to delete or repair SPNs during failover http://goo.gl/i5GR8g#TOC-SCA0037

If this error has occurred, manually delete and/or create the SPNs for the SmartConnect Zone names or alias names using the ADSIedit AD tool.  The failover log lists all of the names that require manual action.

The Microsoft ADSIedit tool is the simplest method to make computer account SPN changes post failover.  This tool requires Microsoft permissions to the computer account for the cluster being edited.  Consult Microsoft documentation on ADSIedit usage.
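To collect the names that need manual attention, the failover log can be scanned for failed SPN steps. The Python sketch below assumes the log line layout shown in the example logs above (a quoted step name of the form "Delete SPN <name> ..." or "Create SPN <name>" followed by "result : FAILURE"); other log layouts would need a different pattern, and the function name is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Matches: step "Delete SPN <name> for: <account>" result : FAILURE
#     and: step "Create SPN <name>" result : FAILURE
SPN_FAILURE = re.compile(
    r'step "(Delete|Create) SPN (\S+)(?: for: [^"]*)?"\s*result\s*:\s*FAILURE'
)


def failed_spn_steps(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return unique (action, SmartConnect name) pairs for failed SPN steps, in order."""
    found: List[Tuple[str, str]] = []
    for line in lines:
        match = SPN_FAILURE.search(line)
        if match:
            item = (match.group(1), match.group(2))
            if item not in found:
                found.append(item)
    return found
```

Each returned pair names an SPN operation to repeat by hand, for example ("Delete", "OTT-az1-example.ad1.test").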


Refreshing SMB connection after Failover completed

This section describes steps to refresh an SMB connection post failover and DNS update.

  1. If client was connected to the share during the failover, unmount the share (disconnect).

  2. Remount the share (connect). Check SPN’s on the target cluster machine account.

  3. Test read/write against newly mounted shares.

  4. If step 3 fails, the original connection information is likely cached on the client machine.  In this case the data continues to be available, but it is Read-Only and writes fail. To remove the cached connection information, open a Windows Command Prompt and type the command:

ipconfig /flushdns

  5. Test read/write against the newly mounted share again.

  6. If step 5 fails, it is likely a DNS issue. Check the upstream DNS servers and verify with nslookup.

Refreshing NFS connection after Failover completed

Please refer to SyncIQ Policy Failover Guide section for steps to refresh an NFS connection post failover and DNS update.

Link here

Post Access Zone Failover Checklist

The following sections outline what can be checked post Access Zone failover to verify execution of all of the steps.

IMPORTANT:

If the failover was done with the Controlled failover option unchecked, then some steps on the failover SOURCE cluster will not have been executed.  This is outlined in Appendix 1 in this document.


SyncIQ Policy Updates

On the failover SOURCE cluster (the cluster you failed over FROM), for the SyncIQ Policies in the Access Zone that were failed over:

  • SyncIQ Policies are Disabled in OneFS

  • Eyeglass configuration replication jobs related to these SyncIQ Policies are in Policy Disabled state

  • SyncIQ Policies in OneFS have their schedule set to manual

On the failover TARGET cluster (the cluster you failed over TO), for the SyncIQ Policies in the Access Zone that were failed over:

  • SyncIQ Policies are Enabled in OneFS

  • The corresponding Eyeglass Configuration Replication Jobs are also in Enabled state

  • SyncIQ Policies have same schedule that was originally set for the policy on the failover SOURCE cluster
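The checks in the two lists above can be written down as a simple validation routine. The Python sketch below is illustrative only: how the policy state is obtained from OneFS is out of scope, and the function name, the role strings and the "manual" schedule value are assumptions made for the example.

```python
from typing import List


def check_policy(role: str, enabled: bool, schedule: str,
                 original_schedule: str) -> List[str]:
    """Return a list of problems for one failed-over SyncIQ policy (empty if OK)."""
    problems: List[str] = []
    if role == "source":
        # On the cluster failed over FROM: disabled, schedule set to manual.
        if enabled:
            problems.append("policy should be Disabled on the failover SOURCE cluster")
        if schedule != "manual":
            problems.append("schedule should be manual on the failover SOURCE cluster")
    elif role == "target":
        # On the cluster failed over TO: enabled, original schedule restored.
        if not enabled:
            problems.append("policy should be Enabled on the failover TARGET cluster")
        if schedule != original_schedule:
            problems.append("schedule should match the original SOURCE schedule")
    else:
        raise ValueError(f"role must be 'source' or 'target', got {role!r}")
    return problems
```

An empty list means the policy is in the expected post-failover state for that cluster.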

NOTE: If you have Eyeglass INITIALSTATE property set to disabled for AUTO jobs (check this using the command in the Eyeglass Admin Guide here), the Eyeglass Configuration Replication job for the mirror SyncIQ Policy created during the first failover will be in User Disabled state.  This job should be enabled following the instructions in the Eyeglass Admin Guide here.

Quota Updates

After the failover, there should be no quotas on the failover SOURCE cluster for the SyncIQ Policies in the Access Zone that were failed over.  On the failover TARGET cluster you should find all quotas for the SyncIQ Policies in the Access Zone that were failed over.

SPN Updates

After the failover, there should be SPNs for all SmartConnect Zones and SmartConnect Zone Aliases related to the subnet pools associated with the Access Zone that was failed over.

Note: SPNs are not created for SmartConnect Zones or SmartConnect Zone Aliases that are prefixed with “igls”.

Note: SPNs are not created for HDFS or NFS.  These will need to be repaired manually.

IMPORTANT:

Due to an Isilon issue, it may occur that executing the SPN repair step results in an error (both for execution by Eyeglass and from the Isilon command line directly).  In this case the SPNs will have to be repaired manually after which the SPN repair command will resume as expected.

Use the ADSIedit tool to verify the machine account SPN’s


SmartConnect Zone Updates

The following changes can be checked post failover for the SmartConnect Zones and aliases related to the subnet pools associated with the Access Zone that was failed over:

  1. Eyeglass creates a SmartConnect Zone alias on the failover TARGET cluster with the same name as the SmartConnect Zone on the partner IP pool of the failover SOURCE cluster

  2. Eyeglass renames the failover SOURCE cluster SmartConnect Zone by adding the prefix “igls-original”

  3. The alias for the failover TARGET SmartConnect Zone is removed from the failover SOURCE cluster

  4. After failover is completed, the DNS admin or post-failover scripting updates the DNS entry for the SmartConnect Zone name to use the SmartConnect Service IP address of the failover TARGET cluster.

Example: SmartConnect Zone Update

Initial Mapping Setup:

FAILOVER from CLUSTER 1 to CLUSTER 2


| Subnet and Pool | Cluster 1 | Cluster 2 |
|---|---|---|
| subnet1:Prod | Eyeglass renames SmartConnect Zone prod.example.com to igls-original-prod.example.com | |
| subnet1:DR | | Eyeglass creates SmartConnect Zone alias prod.example.com |
| subnet0:synciq-prod | no changes | |










FAILOVER AGAIN - CLUSTER 2 to CLUSTER 1

| Subnet and Pool | Cluster 1 | Cluster 2 |
|---|---|---|
| subnet1:Prod | Eyeglass renames SmartConnect Zone igls-original-prod.example.com to prod.example.com. Eyeglass creates SmartConnect Zone alias dr.example.com | |
| subnet1:DR | | Eyeglass renames SmartConnect Zone dr.example.com to igls-original-dr.example.com. Eyeglass removes SmartConnect Zone alias prod.example.com created by previous failover |
| subnet0:synciq-prod | no changes | |
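The renaming behaviour in the two failover examples above can be modelled with a few lines of Python. This is a simplified sketch of the naming changes only, written from the example tables; it is not how Eyeglass is implemented. The Pool class and fail_over function are hypothetical, and igls mapping hints are treated as ordinary aliases that pass through unchanged.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

PREFIX = "igls-original-"


@dataclass
class Pool:
    zone: str                                  # SmartConnect Zone name
    aliases: List[str] = field(default_factory=list)


def fail_over(source: Pool, target: Pool) -> Tuple[Pool, Pool]:
    """Return the (source, target) pools as they look after one failover."""
    # If a previous failover renamed the target's zone, it gets its name back.
    if target.zone.startswith(PREFIX):
        target_zone = target.zone[len(PREFIX):]
    else:
        target_zone = target.zone
    # Source: zone renamed with the prefix, alias from the previous failover removed.
    new_source = Pool(PREFIX + source.zone,
                      [a for a in source.aliases if a != target_zone])
    # Target: gains an alias with the source's SmartConnect Zone name.
    new_target = Pool(target_zone, target.aliases + [source.zone])
    return new_source, new_target
```

Calling fail_over once with prod.example.com and dr.example.com reproduces the first table; calling it again with the roles swapped reproduces the second.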












IP Pool Failover

This feature is available in Eyeglass version  2.0

IP Pool Failover

Prerequisites

IP Pool Failover is available in Eyeglass release 2.0. It requires multiple non-overlapping IP pools to be configured on the same access zone.

Configuration Diagram

The following diagram illustrates the basic configuration for the case of 1 Source cluster and 1 Target cluster.

For this IP Pool failover example, two non-overlapping IP pools are configured on the source cluster and two on the target cluster. All IP pools are assigned to the System access zone.

SMB share

  • s-smb01 is configured on Source cluster

SMB shares are managed through Microsoft DFS server.

NFS export:

  • /ifs/data/s-nfs01 export is configured on Source cluster


SyncIQ Policies:

The following SyncIQ Policies are configured:

| SyncIQ Policy | Data | From | To |
|---|---|---|---|
| s01-t01-synciq01 | SMB | Source | Target |
| s01-t01-synciq02 | NFS | Source | Target |


Policy - Pool Assignment

The following table illustrates the SyncIQ Policies to Pools  assignments

| From: Cluster | From: Pool | To: Cluster | To: Pool | Policy | Data |
|---|---|---|---|---|---|
| Source | Pool 1 | Target | Pool 1 | s01-t01-synciq01 | SMB (DFS) |
| Source | Pool 2 | Target | Pool 2 | s01-t01-synciq02 | NFS |


Policy - Pool Mapping Diagram

The following diagram shows the IP Pools and Policies mapping setup between Source and Target clusters

The igls mapping hints including igls-ignore  also need to be configured  for this dual IP Pool setup.

Based on that diagram, the following is the example of the igls and igls-ignore hint mappings:


| From: Cluster | From: Pool | From: SmartConnect Name / Alias (igls hints) | To: Cluster | To: Pool | To: SmartConnect Name / Alias (igls hints) |
|---|---|---|---|---|---|
| Source | Pool 1 | cluster20-s01.ad1.test / igls-ignore-c20s01 | Target | Pool 1 | cluster21-s01.ad1.test / igls-ignore-c21s01 |
| Source | Pool 2 | cluster20-s02.ad1.test / igls-s02-c20s02 | Target | Pool 2 | cluster21-s02.ad1.test / igls-s02-c21s02 |

As also can be seen in the following diagram:

Configuration Steps:

  1. Configure the required  igls and igls-ignore mapping hints.

  2. Verify the DR Dashboard - Access Zone Failover Readiness that Network Mappings have been configured correctly.

  3. Configure the Advanced Network Mapping to map the Policies to the Pools as specified in the Policies - Pool Mapping table. To assign the policy to the pool, click a Pool under the Pool Name column (Smartconnect/IP Pools section) and then drag and drop the correct policy from the Available Policies section (under the Policy Name column) to each pool. Click Save button to save the modification. Note: please ensure the policies are mapped to the correct pools.

Example:

Once assigned we can see the following mappings:


The IP Pool that has the igls-ignore alias is listed in the SmartConnect/IP Pools section with Failover Status: NOT APPLICABLE.




  4. Verify in the DR Dashboard that the DR Failover Status and SmartConnect/IP Pool Failover Readiness Status show no errors.

Example:



  5. Pool or Access Zone Failover:

    1. To perform IP Pool failover, in the DR Assistant Failover Wizard select failover type as Access Zone Failover, and then select a specific IP pool to be failed over (one that does not have NOT APPLICABLE status).

Example:

    2. To fail over the entire Access Zone, select the Access Zone name instead of the IP Pool from the Access Zone column.

Example:

  6. DFS or Policy Failover can be performed as normal with the DR Assistant Wizard.
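As noted in the steps above, a pool carrying an igls-ignore alias is listed with Failover Status NOT APPLICABLE and cannot be selected for IP Pool failover. The Python sketch below expresses that rule over a plain dictionary of pool names to aliases. The data layout and function names are hypothetical, and the case-insensitive comparison is an assumption.

```python
from typing import Dict, List


def has_ignore_hint(aliases: List[str]) -> bool:
    """True if any alias on the pool is an igls-ignore hint."""
    return any(
        a.lower() == "igls-ignore" or a.lower().startswith("igls-ignore-")
        for a in aliases
    )


def selectable_pools(pools: Dict[str, List[str]]) -> List[str]:
    """Return the pool names that can be chosen for IP Pool failover."""
    return [name for name, aliases in pools.items()
            if not has_ignore_hint(aliases)]
```

For the source cluster in this example, only Pool 2 (hint igls-s02-c20s02) is selectable; Pool 1 (hint igls-ignore-c20s01) is not.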



Example of IP Pool Failover

The following examples illustrate the changes of the SmartConnect zone names and alias names  for IP Pool failover in the following sequences:

  1. Pool Failover Source (Pool 2) to Target (Pool 2)

  2. Pool Failback Target (Pool 2) to Source (Pool 2)

Pool Failover Source (Pool2) ⇒ Target (Pool2)

The following table and diagram illustrate the SmartConnect zone names and alias names after performing Failover from Source (Pool 2) to Target  (Pool 2).

| From: Cluster | From: Pool | From: SmartConnect Name / Alias (igls hints) | To: Cluster | To: Pool | To: SmartConnect Name / Alias (igls hints) |
|---|---|---|---|---|---|
| Source | Pool 1 | cluster20.ad1.test / igls-ignore-c20s01 | Target | Pool 1 | cluster21.ad1.test / igls-ignore-c21s01 |
| Source | Pool 2 | igls-original-cluster20-s02.ad1.test / igls-s02-c20s02 | Target | Pool 2 | cluster21-s02.ad1.test / igls-s02-c21s02, cluster20-s02.ad1.test |


Pool Failback  Target  (Pool2) ⇒ Source (Pool2)

The following table and diagram illustrate the SmartConnect zone names and alias names after performing Failback from Target (Pool 2) to Source  (Pool 2).

| From: Cluster | From: Pool | From: SmartConnect Name / Alias (igls hints) | To: Cluster | To: Pool | To: SmartConnect Name / Alias (igls hints) |
|---|---|---|---|---|---|
| Target | Pool 1 | cluster21.ad1.test / igls-ignore-c21s01 | Source | Pool 1 | cluster20.ad1.test / igls-ignore-c20s01 |
| Target | Pool 2 | igls-original-cluster21-s02.ad1.test / igls-s02-c21s02 | Source | Pool 2 | cluster20-s02.ad1.test / igls-s02-c20s02, cluster21-s02.ad1.test |



Fan-In IP Pool Failover

Fan-In configuration Diagram

The following diagram gives an example of Fan-In configuration topology

This configuration consists of 3 clusters,   2 of them are the source clusters  and third  one is  the target cluster. For this example:

  • Cluster20 is the Source01 Cluster

  • Cluster31 is the Source02 Cluster

  • Cluster21 is the Target Cluster


For the IP Pool failover setup, two non-overlapping IP pools are configured on each source cluster and four non-overlapping IP pools are configured on the target cluster. All IP pools are assigned to the System access zone.

SMB shares

  • source01-smb01 is configured on Source01 cluster

  • source02-smb01 is configured on Source02 cluster

SMB shares are managed through Microsoft DFS server.

NFS exports:

  • /ifs/data/source01-nfs01 export is configured on Source01 cluster

  • /ifs/data/source02-nfs01 export is configured on Source02 cluster

SyncIQ Policies:

The following SyncIQ Policies are configured:

| SyncIQ Policy | Data | From | To |
|---|---|---|---|
| s01-t01-synciq01 | SMB | Source01 | Target |
| s01-t01-synciq02 | NFS | Source01 | Target |
| s02-t01-synciq03 | SMB | Source02 | Target |
| s02-t01-synciq04 | NFS | Source02 | Target |


SyncIQ Policy to Pool Assignments

The following table illustrates the SyncIQ Policies to Pools assignments

| From: Cluster | From: Pool | To: Cluster | To: Pool | Policy | Data |
|---|---|---|---|---|---|
| Source01 | Pool 1 | Target | Pool 1 | s01-t01-synciq01 | SMB (DFS) |
| Source01 | Pool 2 | Target | Pool 2 | s01-t01-synciq02 | NFS |
| Source02 | Pool 1 | Target | Pool 3 | s02-t01-synciq03 | SMB (DFS) |
| Source02 | Pool 2 | Target | Pool 4 | s02-t01-synciq04 | NFS |



SyncIQ Policies - Pools Mapping Diagram

The following diagram shows the SyncIQ Policies and Pools  mapping setup


The igls mapping hints including igls-ignore  also need to be configured  for this dual IP Pool setup.

Based on the above diagram, the following is the example of the igls and igls-ignore mapping hints:


| From: Cluster | From: Pool | From: SmartConnect Name / Alias (igls hints) | To: Cluster | To: Pool | To: SmartConnect Name / Alias (igls hints) |
|---|---|---|---|---|---|
| Source01 | Pool 1 | cluster20.ad1.test / igls-ignore-c20s01 | Target | Pool 1 | cluster21.ad1.test / igls-ignore-c21s01 |
| Source01 | Pool 2 | cluster20-s02.ad1.test / igls-s02-c20s02 | Target | Pool 2 | cluster21-s02.ad1.test / igls-s02-c21s02 |
| Source02 | Pool 1 | cluster31.ad1.test / igls-ignore-c31s01 | Target | Pool 3 | cluster21-s03.ad1.test / igls-ignore-c21s03 |
| Source02 | Pool 2 | cluster31-s02.ad1.test / igls-s04-c31s02 | Target | Pool 4 | cluster21-s04.ad1.test / igls-s04-c21s04 |


The mapping also can be seen in the following diagram:


Configuration Steps:

  1. Configure the required  igls and igls-ignore mapping hints.

  2. Verify from Access Zone Failover Readiness that Network Mappings have been configured correctly.

  3. Configure the Advanced Network Mapping to map the Policies to the Pools as specified in the Policies - Pool Mapping table. To assign the policy to the pool, click a Pool under the Pool Name column (Smartconnect/IP Pools section) and then drag and drop the correct policy from the Available Policies section (under the Policy Name column) to each pool. Click Save button to save the modification. Note: please ensure the policies are mapped to the correct pools.

Example:


  4. Verify that the DR Failover Status and SmartConnect/IP Pool Failover Readiness Status show no errors.


Example:


  5. Pool or Access Zone Failover:

    1. To perform IP Pool failover, select failover type as Access Zone Failover in the DR Assistant Failover Wizard, and then select a specific IP pool to be failed over (one that does not have NOT APPLICABLE status).

Example:

    2. To fail over the entire Access Zone, select the Access Zone name instead of the IP Pool.

Example:

  6. DFS or Policy Failover can be performed as normal with the DR Assistant Wizard.




Example of Pool Failover

The following examples illustrate the changes of the SmartConnect zone names and alias names  for IP Pool failover in the following sequences:

  1. Pool Failover Source01 (Pool 2) to Target (Pool 2)

  2. Pool Failover Source02 (Pool 2) to Target (Pool 4)

  3. Pool Failback Target (Pool 2) to Source01 (Pool 2)

  4. Pool Failback Target (Pool 4) to Source02 (Pool 2)

Pool Failover Source01 (Pool 2) ⇒ Target (Pool 2)

The following table and diagram illustrate the SmartConnect zone names and alias names after performing Failover from Source01 (Pool 2) to Target  (Pool 2).

| From: Cluster | From: Pool | From: SmartConnect Name / Alias (igls hints) | To: Cluster | To: Pool | To: SmartConnect Name / Alias (igls hints) |
|---|---|---|---|---|---|
| Source01 | Pool 1 | cluster20.ad1.test / igls-ignore-c20s01 | Target | Pool 1 | cluster21.ad1.test / igls-ignore-c21s01 |
| Source01 | Pool 2 | igls-original-cluster20-s02.ad1.test / igls-s02-c20s02 | Target | Pool 2 | cluster21-s02.ad1.test / igls-s02-c21s02, cluster20-s02.ad1.test |
| Source02 | Pool 1 | cluster31.ad1.test / igls-ignore-c31s01 | Target | Pool 3 | cluster21-s03.ad1.test / igls-ignore-c21s03 |
| Source02 | Pool 2 | cluster31-s02.ad1.test / igls-s04-c31s02 | Target | Pool 4 | cluster21-s04.ad1.test / igls-s04-c21s04 |


Pool Failover Source02 (Pool 2) ⇒ Target (Pool 4)

The following table and diagram illustrate the SmartConnect zone names and alias names after performing Failover from Source02 (Pool 2) to Target  (Pool 4).

| From: Cluster | From: Pool | From: SmartConnect Name / Alias (igls hints) | To: Cluster | To: Pool | To: SmartConnect Name / Alias (igls hints) |
|---|---|---|---|---|---|
| Source01 | Pool 1 | cluster20.ad1.test / igls-ignore-c20s01 | Target | Pool 1 | cluster21.ad1.test / igls-ignore-c21s01 |
| Source01 | Pool 2 | igls-original-cluster20-s02.ad1.test / igls-s02-c20s02 | Target | Pool 2 | cluster21-s02.ad1.test / igls-s02-c21s02, cluster20-s02.ad1.test |
| Source02 | Pool 1 | cluster31.ad1.test / igls-ignore-c31s01 | Target | Pool 3 | cluster21-s03.ad1.test / igls-ignore-c21s03 |
| Source02 | Pool 2 | igls-original-cluster31-s02.ad1.test / igls-s04-c31s02 | Target | Pool 4 | cluster21-s04.ad1.test / igls-s04-c21s04, cluster31-s02.ad1.test |


Pool Failback  Target  (Pool 2) ⇒ Source01 (Pool 2)

The following table and diagram illustrate the SmartConnect zone names and alias names after performing Failback from Target (Pool 2) to Source01 (Pool 2).


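In each group of rows, the first row shows the SmartConnect zone name and the rows below it show the aliases on that pool.

| From: Cluster | From: Pool | From: SmartConnect Name / Alias (igls hints) | To: Cluster | To: Pool | To: SmartConnect Name / Alias (igls hints) |
|---|---|---|---|---|---|
| Target | Pool 1 | cluster21.ad1.test | Source01 | Pool 1 | cluster20.ad1.test |
| | | igls-ignore-c21s01 | | | igls-ignore-c20s01 |
| Target | Pool 2 | igls-original-cluster21-s02.ad1.test | Source01 | Pool 2 | cluster20-s02.ad1.test |
| | | igls-s02-c21s02 | | | igls-s02-c20s02 |
| | | | | | cluster21-s02.ad1.test |
| Target | Pool 3 | cluster21-s03.ad1.test | Source02 | Pool 1 | cluster31.ad1.test |
| | | igls-ignore-c21s03 | | | igls-ignore-c31s01 |
| Target | Pool 4 | cluster21-s04.ad1.test | Source02 | Pool 2 | igls-original-cluster31-s02.ad1.test |
| | | igls-s04-c21s04 | | | igls-s04-c31s02 |
| | | cluster31-s02.ad1.test | | | |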
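
The failback of the Pool 2 pair can be modelled as a small sketch. It assumes, based only on the names listed in this section, that failback restores the original pool's zone name, removes that name from the aliases of the pool being failed back from, and then applies the `igls-original-` rename in the opposite direction. The function name `fail_back_pool` and the dict layout are hypothetical.

```python
ORIGINAL_PREFIX = "igls-original-"


def fail_back_pool(active: dict, original: dict) -> None:
    """Model the name changes when failing back from `active` to `original`.

    Each pool is a dict with "zone" (SmartConnect zone name) and
    "aliases" (list of alias names). Both dicts are modified in place.
    """
    if not original["zone"].startswith(ORIGINAL_PREFIX):
        raise ValueError("original pool is not in a failed-over state")

    # 1. Restore the original pool's client-facing zone name ...
    restored = original["zone"][len(ORIGINAL_PREFIX):]
    original["zone"] = restored
    # ... and stop serving that name from the active pool.
    active["aliases"].remove(restored)

    # 2. Hand the active pool's zone name to the original pool as an alias,
    #    then rename the active pool's zone with the prefix.
    original["aliases"].append(active["zone"])
    active["zone"] = ORIGINAL_PREFIX + active["zone"]


# State of the two Pool 2 entries after the earlier failover.
target_pool2 = {
    "zone": "cluster21-s02.ad1.test",
    "aliases": ["igls-s02-c21s02", "cluster20-s02.ad1.test"],
}
source01_pool2 = {
    "zone": "igls-original-cluster20-s02.ad1.test",
    "aliases": ["igls-s02-c20s02"],
}

fail_back_pool(target_pool2, source01_pool2)
print(target_pool2)
print(source01_pool2)
```

After the call, both dicts hold the Pool 2 names listed above for Target and Source01. The Pool 4 pair is not passed to the function, so it stays in its failed-over state.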



Pool Failback Target (Pool 4) ⇒ Source02 (Pool 2)

The following table and diagram illustrate the SmartConnect zone names and alias names after performing Failback from Target (Pool 4) to Source02 (Pool 2).


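In each group of rows, the first row shows the SmartConnect zone name and the rows below it show the aliases on that pool.

| From: Cluster | From: Pool | From: SmartConnect Name / Alias (igls hints) | To: Cluster | To: Pool | To: SmartConnect Name / Alias (igls hints) |
|---|---|---|---|---|---|
| Target | Pool 1 | cluster21.ad1.test | Source01 | Pool 1 | cluster20.ad1.test |
| | | igls-ignore-c21s01 | | | igls-ignore-c20s01 |
| Target | Pool 2 | igls-original-cluster21-s02.ad1.test | Source01 | Pool 2 | cluster20-s02.ad1.test |
| | | igls-s02-c21s02 | | | igls-s02-c20s02 |
| | | | | | cluster21-s02.ad1.test |
| Target | Pool 3 | cluster21-s03.ad1.test | Source02 | Pool 1 | cluster31.ad1.test |
| | | igls-ignore-c21s03 | | | igls-ignore-c31s01 |
| Target | Pool 4 | igls-original-cluster21-s04.ad1.test | Source02 | Pool 2 | cluster31-s02.ad1.test |
| | | igls-s04-c21s04 | | | igls-s04-c31s02 |
| | | | | | cluster21-s04.ad1.test |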
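
One way to sanity-check a planned set of names is to confirm that no SmartConnect zone name or alias appears on more than one pool, since a name served by two clusters could resolve to either. The sketch below is an assumption-laden illustration, not an Eyeglass feature: `find_duplicate_names` is a hypothetical helper, and the data is transcribed by hand from the names listed above for the state after both failbacks.

```python
from collections import defaultdict


def find_duplicate_names(pools):
    """Return {name: [pool labels]} for names found on more than one pool.

    Names are compared case-insensitively, as DNS names are.
    """
    seen = defaultdict(list)
    for label, names in pools.items():
        for name in names:
            seen[name.lower()].append(label)
    return {name: labels for name, labels in seen.items() if len(labels) > 1}


# Zone name followed by aliases for each pool, after both failbacks.
final_state = {
    "Target/Pool 1": ["cluster21.ad1.test", "igls-ignore-c21s01"],
    "Source01/Pool 1": ["cluster20.ad1.test", "igls-ignore-c20s01"],
    "Target/Pool 2": ["igls-original-cluster21-s02.ad1.test", "igls-s02-c21s02"],
    "Source01/Pool 2": ["cluster20-s02.ad1.test", "igls-s02-c20s02",
                        "cluster21-s02.ad1.test"],
    "Target/Pool 3": ["cluster21-s03.ad1.test", "igls-ignore-c21s03"],
    "Source02/Pool 1": ["cluster31.ad1.test", "igls-ignore-c31s01"],
    "Target/Pool 4": ["igls-original-cluster21-s04.ad1.test", "igls-s04-c21s04"],
    "Source02/Pool 2": ["cluster31-s02.ad1.test", "igls-s04-c31s02",
                        "cluster21-s04.ad1.test"],
}

print(find_duplicate_names(final_state))  # {}
```

An empty result means every name in the transcribed state lives on exactly one pool.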





APPENDIX 1 - Controlled Failover Option Results Summary

The following table indicates which failover steps are executed based on whether or not the Controlled failover option was selected when the Access Zone failover was initiated.


| Steps | Description | Executed on | Access Zone | Controlled Failover selected | Controlled Failover NOT selected |
|---|---|---|---|---|---|
| 1 - Ensure that there is no live access to data | Check for open files. If open files are found, decide whether to fail over or wait for them to be closed. | Source | Manual | Not applicable - manual step | Not applicable - manual step |
| 2 - Begin Failover | Initiate Failover from Eyeglass | Eyeglass | Manual | Not applicable - manual step | Not applicable - manual step |
| 3 - Validation | Wait for other Eyeglass Failover jobs to complete | Eyeglass | Automated by Eyeglass | Step Executed | Step Executed |
| 4 - Synchronize data | Run all OneFS SyncIQ policy jobs related to the Access Zone being failed over | Source | Automated by Eyeglass (all policies in the Access Zone) | Step Executed | Step NOT Executed |
| 5 - Synchronize configuration (shares/export/alias) | Run Eyeglass configuration replication (see note 1) | Eyeglass | Automated by Eyeglass (based on matching Access Zone base path) | Step Executed | Step Executed based on last known data in Eyeglass. *If you do not want this, uncheck the “Config Sync” option to skip this step. |
| 6 - Synchronize quota(s) | Run Eyeglass Quota Jobs related to the SyncIQ Policy or Access Zone being failed over | Eyeglass | Automated by Eyeglass (based on matching Access Zone base path) | Step Executed | Step Executed based on last known data in Eyeglass |
| 7 - Record schedule for SyncIQ policies being failed over | Get the schedule associated with the SyncIQ policies being failed over on OneFS | Source | Automated by Eyeglass | Step Executed | Step NOT Executed |
| 8 - Prevent SyncIQ policies being failed over from running | Set the schedule on the SyncIQ policy(s) to manual on the source cluster | Source | Automated by Eyeglass | Step Executed | Step NOT Executed |
| 9 - Provide write access to data on target | Allow writes to SyncIQ policy(s) related to failover (see note 2) | Target | Automated by Eyeglass (only for policies that match the Access Zone base path) | Step Executed | Step Executed |
| 10 - Disable SyncIQ on source and make active on target | Resync prep SyncIQ policy related to failover (creates MirrorPolicy on target) from OneFS | Source | Automated by Eyeglass | Step Executed | Step NOT Executed |
| 11 - Set proper SyncIQ schedule on target | Set the schedule on the MirrorPolicy (Target), using the schedule recorded in step 7 from OneFS, for the policy(s) related to the failover | Target | Automated by Eyeglass | Step Executed | Step NOT Executed |
| 12 - Remove quotas on directories that are target of SyncIQ (Isilon best practice) | Delete all quotas on the source for all the policies | Source | Automated by Eyeglass | Step Executed | Step NOT Executed |
| 13 - Change SmartConnect Zone on Source so that it is not resolved by clients | Rename SmartConnect Zones and Aliases (Source) | Source | Automated by Eyeglass (requires that IP pool hints are configured; see docs) | Step Executed | Step NOT Executed |
| 14 - Avoid SPN Collision | Sync SPNs in all AD providers to current SmartConnect zone names and aliases (Source) | Source | Automated by Eyeglass (AD delegation must be completed as per install docs) | Step Executed | Step Executed |
| 15 - Move SmartConnect zone to Target | Add source SmartConnect zone(s) as alias(es) on Target | Target | Automated by Eyeglass (requires that IP pool hints are configured; see docs) | Step Executed | Step Executed |
| 16 - Update SPN to allow for authentication against target | Sync SPNs in all AD providers to current SmartConnect zone names and aliases (Target) | Target | Automated by Eyeglass (requires that IP pool hints are configured; see docs) | Step Executed | Step Executed |
| 17 - Repoint DNS to the Target cluster IP address | Update DNS delegations for all SmartConnect Zones that are members of the Access Zone | DNS | Potentially automated by Eyeglass (see Script Engine docs; example: Infoblox Guide) | Not applicable - manual or scripted step | Not applicable - manual or scripted step |
| 18 - Refresh session to pick up DNS change | Remount the SMB share(s) | SMB Client Machines | Manual on clients. NOTE: DNS servers and clients cache DNS entries, so the client or intermediate DNS servers must be touched to clear their DNS caches (on Windows, ipconfig /flushdns). | Not applicable - manual step | Not applicable - manual step |

  1. Initiates Eyeglass Configuration Replication task for all Eyeglass jobs

  2. SyncIQ does NOT modify the ACL (access control settings on the file system). It locks the file system. The output of ls -l will be identical on both source and target.
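
The Appendix 1 table can also be read as a simple lookup: given whether Controlled Failover was selected, which automated steps run? The sketch below transcribes the "Step Executed" and "Step NOT Executed" entries by hand. The names `AUTOMATED_STEPS` and `steps_executed` and the shortened step labels are hypothetical, and the manual or scripted steps (1, 2, 17 and 18) are left out.

```python
# (step number, short label, runs when Controlled Failover is NOT selected)
# Every step listed here runs when Controlled Failover IS selected.
AUTOMATED_STEPS = [
    (3, "Validation", True),
    (4, "Synchronize data", False),
    (5, "Synchronize configuration", True),   # uses last known data in Eyeglass
    (6, "Synchronize quota(s)", True),        # uses last known data in Eyeglass
    (7, "Record SyncIQ schedule", False),
    (8, "Prevent SyncIQ policies from running", False),
    (9, "Provide write access on target", True),
    (10, "Resync prep (creates mirror policy)", False),
    (11, "Set SyncIQ schedule on target", False),
    (12, "Remove quotas on source", False),
    (13, "Rename SmartConnect zones on source", False),
    (14, "Sync SPNs (source)", True),
    (15, "Add SmartConnect aliases on target", True),
    (16, "Sync SPNs (target)", True),
]


def steps_executed(controlled: bool) -> list:
    """Return the step numbers Eyeglass executes for the given option."""
    return [
        number
        for number, _label, runs_when_not_controlled in AUTOMATED_STEPS
        if controlled or runs_when_not_controlled
    ]


print(steps_executed(controlled=True))   # [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
print(steps_executed(controlled=False))  # [3, 5, 6, 9, 14, 15, 16]
```

Steps 4, 7, 8, 10, 11, 12 and 13 are the ones the table marks as not executed when Controlled Failover is not selected.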