Multi Site Failover Guide

Contents

  1 Overview
    1.1 Video How to - Overview Multi site Access Zone Failover
    1.2 Overview of Multi Site DR and Continuous Availability
    1.3 Overview of Multi Site Access Zone DR and Continuous Availability
      1.3.1 Pre-requisites
    1.4 Supported Access Zone Failover Operations
  2 Logical Diagram of Multi Site Failover
    2.1 Initial Configuration / Before Failover Diagram
    2.2 Failover A ⇒ B
    2.3 Failback B ⇒ A
    2.4 Failover A ⇒ C
    2.5 Failback C ⇒ A
    2.6 Eyeglass Access Zone Failover Steps
    2.7 Eyeglass Access Zone Failback Steps
  3 Access Zone Failover - SyncIQ Configuration for 3 site
    3.1 SyncIQ Policy multi Site support Matrix
  4 Multi Site Failover and Failback Behaviours
  5 Zone Readiness
    5.1 Zone Readiness - Initial Configuration / before Failover / after Failback
      5.1.1 Network Mapping - Initial Configuration: A ⇒ B Zone01
      5.1.2 Network Mapping - Initial Configuration: A ⇒ B Zone03
      5.1.3 Network Mapping - Initial Configuration: A ⇒ C Zone01
      5.1.4 Network Mapping - Initial Configuration: A ⇒ C Zone03
    5.2 Before Failback B ⇒ A
      5.2.1 Zone Readiness
      5.2.2 [Network Mapping - Before Failback B ⇒ A]  B ⇒ A Zone01
      5.2.3 [Network Mapping - Before Failback B ⇒ A]  B ⇒ A Zone03
    5.3 Before Failback C ⇒ A
      5.3.1 Zone Readiness
      5.3.2 [Network Mapping - Before Failback C ⇒ A]  C ⇒ A Zone01
      5.3.3 [Network Mapping - Before Failback C ⇒ A]  C ⇒ A Zone03
    5.4 Summary of Network Mappings
      5.4.1 Initial Configuration / Before Failover / After Failback:
      5.4.2 After Failover
  6 Operation Failover or Failback Procedures
    6.1 Failover from A to B Procedure:
    6.2 Failback from B to A Procedure:
    6.3 Failover from A to C Procedure:
    6.4 Failback from C to A Procedure
  7 3 Site DFS Mode Failover
    7.1 Overview
    7.2 Video How to - Overview Multi site DFS mode Failover
    7.3 Configuration
      7.3.1 DFS Mode Initial Configuration / Before Failover Diagram
      7.3.2 DFS Mode Failover A ⇒ B Diagram
      7.3.3 DFS Mode Failback B ⇒ A Diagram
      7.3.4 DFS Mode Failover A ⇒ C Diagram
      7.3.5 DFS Mode Failback C ⇒ A Diagram
      7.3.6 Eyeglass DFS Mode Failover Steps
      7.3.7 Eyeglass DFS Mode Failback Steps
      7.3.8 DFS Configuration
        7.3.8.1 Source Cluster (Site A)
        7.3.8.2 Target Cluster #1 (Site B)
        7.3.8.3 Target Cluster #2 (Site C)
    7.4 DFS Readiness
      7.4.1 DFS Readiness - Initial Configuration / Before Failover
      7.4.2 DFS Readiness - Before Failback B ⇒ A
      7.4.3 DFS Readiness - Before Failback C ⇒ A
      7.4.4 Share Names and DFS Paths
        7.4.4.1 Initial Configuration / Before Failover
        7.4.4.2 After Failover / After Failback
    7.5 DFS Mode Failover and Failback Procedures
      7.5.1 DFS Mode Failover from A to B Procedure:
      7.5.2 DFS Mode Failback from B to A Procedure:
      7.5.3 DFS Mode Failover from A to C Procedure:
      7.5.4 DFS Mode Failback from C to A Procedure:

Overview


The highest level of data protection comes from multi-site replication, where a source cluster's data is replicated to clusters at 2 different sites. Typically, 2 clusters are in a metro location and the 3rd cluster is outside the power grid failure zone.

The goal of this failover design is to provide a choice of sites to fail over to, with fully automated failover between sites. In addition, failing back and then failing over again to the same or a different site is possible, but may require some manual steps to remove SyncIQ policies that would otherwise block the failover.

In this configuration, Eyeglass can be used with the Access Zone failover solution to protect the data in an access zone and allow a choice of failover target: target cluster 1 or target cluster 2 (at the 3rd site). This solution operates at the access zone level, so one or more access zones can be 3-site protected while other access zones are only 2-site protected.

Video How to - Overview Multi site Access Zone Failover

Superna Eyeglass Multi Site Fully automated Access Zone failover walk through

Overview of Multi Site DR and Continuous Availability


This solution supports fully automated multi-site failover and failback in both Access Zone mode and DFS mode. This document covers Access Zone and DFS mode in separate sections.




Overview of Multi Site Access Zone DR and Continuous Availability

This solution allows one or more independent access zones to have 2 possible replication targets and sites to fail over to. This offers maximum data protection and full automation of DNS mount path failover for SMB and NFS.

Pre-requisites

  1. DNS requires triple Name Server delegation records. This is similar to dual delegation: the SmartConnect zones involved in the 3-site failover must now have 3 NS records, pointing to the Subnet Service IPs of each cluster and subnet involved in the failover.

  2. Igls-hints must be configured on all 3 clusters for each access zone.

  3. It is recommended to apply the SmartConnect zone placeholder on all target clusters, using igls-original-<smartconnect zone name on source cluster>.

  4. To keep the setup simple, replicate the same data to sites B and C using overlapping source path policies that replicate to clusters B and C, with A as the source cluster.

  5. Ensure igls-hints are globally unique, as in the example below. This avoids SPN collision issues on AD machine accounts but still allows pools to be matched on the igls-xxxx- prefix.

    1. A cluster - igls-poolx-Cluster-A

    2. B cluster - igls-poolx-Cluster-B

    3. C cluster - igls-poolx-Cluster-C

  6. Use the same AD provider in all access zones.

  7. Ensure all clusters are in the same AD forest.

  8. NOTE: When following AD Delegation, you MUST extend the delegation to the 3rd cluster and make sure each cluster is given the Write Service Principal Name permission to its own computer account AND to the machine accounts of the other 2 clusters.
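
The hint naming rule in prerequisite 5 can be checked mechanically before a failover is attempted. The following is a minimal sketch in plain Python, not an Eyeglass API: the function name check_hints and the dictionary layout are assumptions made for illustration, and the hint values are the example names from the list above.

```python
from collections import Counter

def check_hints(hints_by_cluster):
    """Return a list of problems found in a {cluster: hint} mapping (hypothetical helper)."""
    problems = []
    # Every hint must be globally unique across the clusters.
    counts = Counter(hints_by_cluster.values())
    for hint, n in counts.items():
        if n > 1:
            problems.append(f"duplicate hint: {hint}")
    # Hints should share the same igls-<pool> stem so that pools can be matched.
    stems = {"-".join(h.split("-")[:2]) for h in hints_by_cluster.values()}
    if len(stems) > 1:
        problems.append(f"hints do not share one stem: {sorted(stems)}")
    return problems

hints = {
    "Cluster-A": "igls-poolx-Cluster-A",
    "Cluster-B": "igls-poolx-Cluster-B",
    "Cluster-C": "igls-poolx-Cluster-C",
}
print(check_hints(hints))
```

An empty list means the three example hints are unique and share the igls-poolx stem.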

Supported Access Zone Failover Operations

  • A to B or A to C

    • If B is Active Cluster:

      • A to C Access Zone failover not supported

      • A to C per SyncIQ Policy failover supported (manual steps for networking)

    • If C is Active Cluster:

      • A to B Access Zone failover not supported

      • A to B per SyncIQ Policy failover supported (manual steps for networking)
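
The support rules above can be written as a small lookup. This is only an illustrative sketch: the function name failover_mode and the A/B/C labels are assumptions, and the case of failing over to the cluster that is already active is not covered by the list, so the sketch returns None for it.

```python
def failover_mode(active_cluster, target):
    """Return the supported way to fail over from A to `target`.

    active_cluster: "A", "B" or "C" (the cluster currently serving data).
    target: "B" or "C".
    """
    if active_cluster not in ("A", "B", "C") or target not in ("B", "C"):
        raise ValueError("unknown cluster label")
    if active_cluster == "A":
        # Initial state: access zone failover to either target is supported.
        return "access zone failover"
    if active_cluster != target:
        # The other target is already active: only per-policy failover applies.
        return "per SyncIQ policy failover (manual steps for networking)"
    # The list above does not cover failing over to the already active cluster.
    return None

print(failover_mode("A", "B"))
print(failover_mode("B", "C"))
```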



Logical Diagram of Multi Site Failover


Initial Configuration / Before Failover Diagram

The following diagram displays the initial configuration (before failover) for 3-site failover.

  • Site A: Primary Site

  • Site B: Secondary Site #1

  • Site C: Secondary Site #2



Failover A ⇒ B

The following diagram illustrates the workflow for failover from A to B. Take note of step P1 (the preparation step performed before initiating Eyeglass Access Zone Failover); refer to the procedure section for details.

Refer to the Eyeglass Access Zone Failover Steps table for the list of the numbered steps shown in this diagram.

 


Failback B ⇒ A

The following diagram illustrates the workflow for failback from B to A. Refer to the Eyeglass Access Zone Failback Steps table for the list of the numbered steps shown in this diagram.


Failover A ⇒ C

The following diagram illustrates the workflow for failover from A to C. Take note of step P1 (the preparation step performed before initiating Eyeglass Access Zone Failover); refer to the procedure section for details.

Refer to the Eyeglass Access Zone Failover Steps table for the list of the numbered steps shown in this diagram.



Failback C ⇒ A

The following diagram illustrates the workflow for failback from C to A. Refer to the Eyeglass Access Zone Failback Steps table for the list of the numbered steps shown in this diagram.



Eyeglass Access Zone Failover Steps

This table lists the Eyeglass Access Zone Failover steps with numbers as shown in the above Failover diagrams.

No | Step
---|---
P1 | Preparation. Failover A ⇒ B: ensure there are no existing mirror policies from C to A; if any exist, delete them before initiating failover from A to B. Failover A ⇒ C: ensure there are no existing mirror policies from B to A; if any exist, delete them before initiating failover from A to C.
1 | Ensure that there is no live access to data
2 | Begin Failover
3 | Validation
4 | Synchronize data
5 | Synchronize configuration (shares/export/alias)
6 | Change SmartConnect zone on source so that it is no longer resolved by clients
7 | Avoid SPN collision
8 | Move SmartConnect zone to target
9 | Update SPN to allow for authentication against target
10 | Repoint DNS to the target cluster - DNS triple delegation
11 | Record schedule for SyncIQ policies being failed over
12 | Prevent SyncIQ policies being failed over from running
13 | Provide write access to data on target
14 | Disable SyncIQ on source and make active on target
15 | Set proper SyncIQ schedule on target
16 | Synchronize quota(s)
17 | Remove quotas on directories that are target of SyncIQ (Isilon best practice)
18 | Refresh session to pick up DNS change (use post failover script)


Eyeglass Access Zone Failback Steps

This table lists the Eyeglass Access Zone Failback steps with numbers as shown in the above Failback diagrams.


No | Step
---|---
1 | Ensure that there is no live access to data
2 | Begin Failback
3 | Validation
4 | Synchronize data
5 | Synchronize configuration (shares/export/alias)
6 | Change SmartConnect zone on source (Secondary Cluster) so that it is no longer resolved by clients
7 | Avoid SPN collision
8 | Move SmartConnect zone to target (Primary Cluster)
9 | Update SPN to allow for authentication against target
10 | Repoint DNS to the target cluster - DNS triple delegation
11 | Record schedule for SyncIQ policies being failed back
12 | Prevent SyncIQ policies being failed back from running
13 | Provide write access to data on target
14 | Disable SyncIQ on source (Secondary Cluster) and make active on target (Primary Cluster)
15 | Set proper SyncIQ schedule on target (Primary Cluster)
16 | Synchronize quota(s)
17 | Remove quotas on directories that are target of SyncIQ (on Secondary Cluster) (Isilon best practice)
18 | Refresh session to pick up DNS change (use post failover script)



Access Zone Failover - SyncIQ Configuration for 3 site


The source cluster SyncIQ policy path for target cluster 1 must fall inside the access zone. A second SyncIQ policy can use the same source path (recommended). See the configuration matrix below.

Not Recommended

The unsupported configuration below is an example where failover moves networking (SmartConnect zones) to the target cluster but leaves some data behind. After failover, both SmartConnect zones would fail over together, leaving part of the namespace and data stranded because the SyncIQ policies only failed over a portion of the data.

SyncIQ Policy multi Site support Matrix

Access Zone Path for these examples is /ifs/data/AZ1

Unsupported configuration

Source cluster Target cluster 1 Target cluster 2

SyncIQ policy 1 /ifs/data/AZ1/data X

SyncIQ policy 2 /ifs/data/AZ1/marketing X    

Smartconnect Zone 1 data.example.com X X

Smartconnect Zone 2 marketing.example.com X X


A supported configuration requires that all data and all DNS name space fails over together to achieve fully automated access zone failover.  

Recommended Policy Configuration

Source cluster Target cluster 1 Target cluster 2

SyncIQ policy 1 /ifs/data/AZ1/data X

SyncIQ policy 2 /ifs/data/AZ1/marketing X

SyncIQ policy 3     /ifs/data/AZ1/data X

SyncIQ policy 4 /ifs/data/AZ1/marketing X

Smartconnect Zone 1 data.example.com X X

Smartconnect Zone 2 marketing.example.com X X
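
The rule that all data must fail over together can be expressed as a check that every target cluster receives the same set of source paths. The sketch below is an illustration only. The flattened matrices above do not show which target column each X belongs to, so the policy-to-target assignment used here (policies 1 and 2 to target cluster 1 and policies 3 and 4 to target cluster 2 in the recommended case; one policy per target in the unsupported case) is an assumption, as are the function names.

```python
def paths_by_target(policies):
    """Group SyncIQ source paths by target cluster.

    policies: list of (policy name, source path, target cluster) tuples.
    """
    covered = {}
    for _name, source_path, target in policies:
        covered.setdefault(target, set()).add(source_path)
    return covered

def all_targets_cover_same_paths(policies):
    """True when every target cluster receives the same set of source paths."""
    path_sets = list(paths_by_target(policies).values())
    return all(s == path_sets[0] for s in path_sets)

# Assumed reading of the recommended matrix: both paths go to both targets.
recommended = [
    ("policy 1", "/ifs/data/AZ1/data", "target1"),
    ("policy 2", "/ifs/data/AZ1/marketing", "target1"),
    ("policy 3", "/ifs/data/AZ1/data", "target2"),
    ("policy 4", "/ifs/data/AZ1/marketing", "target2"),
]
# Assumed reading of the unsupported matrix: each path goes to one target only.
unsupported = [
    ("policy 1", "/ifs/data/AZ1/data", "target1"),
    ("policy 2", "/ifs/data/AZ1/marketing", "target2"),
]
print(all_targets_cover_same_paths(recommended))
print(all_targets_cover_same_paths(unsupported))
```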




Multi Site Failover and Failback Behaviours



Operation | Direction | Supported | Requires Manual Step Prior to Initiating Eyeglass Access Zone Failover
---|---|---|---
Failover | A ⇒ B | Yes | Yes - refer to the Failover A ⇒ B diagram and procedure
Failback | B ⇒ A | Yes | No - refer to the Failback B ⇒ A diagram and procedure
Failover | A ⇒ C | Yes | Yes - refer to the Failover A ⇒ C diagram and procedure
Failback | C ⇒ A | Yes | No - refer to the Failback C ⇒ A diagram and procedure



Zone Readiness

This section gives an example of the Zone Readiness status and network mapping for the Source-Target#1 and Source-Target#2 pairs.

For the purpose of this example we use the following names:


Site | Cluster Name
---|---
A (Source) | cluster20
B (Target#1) | cluster21
C (Target#2) | cluster31


There are two access zones on all three clusters: zone01 and zone03.

Zone Readiness - Initial Configuration / before Failover / after Failback

This is the Zone Readiness for the initial configuration / before failover / after failback state. As the figure shows, both Source-Target pairs (A - B and A - C) are listed in the DR Dashboard's Zone Readiness window.

This shows that a failover choice can be made to any target cluster in the green OK state (Warning status is also allowed).


Network Mapping - Initial Configuration: A ⇒ B Zone01

Network Mapping - Initial Configuration: A ⇒ B Zone03

Network Mapping - Initial Configuration: A ⇒ C Zone01

Network Mapping - Initial Configuration: A ⇒ C Zone03


This table shows the SmartConnect Zone Name and SmartConnect Alias Name mappings for this Initial Configuration / before failover / after failback states:

Source - Target Pair | SyncIQ Direction | Zone Name | SmartConnect Zone Name (Source Cluster) | SmartConnect Zone Name (Target Cluster) | SmartConnect Alias Mapping (Source Cluster) | SmartConnect Alias Mapping (Target Cluster)
---|---|---|---|---|---|---
cluster20 - cluster21 | A ⇒ B | zone01 | cluster20-z01.ad1.test | igls-original-cluster20-z01.ad1.test | igls-zone01p-cluster20 | igls-zone01p-cluster21
cluster20 - cluster21 | A ⇒ B | zone03 | cluster20-z03.ad1.test | igls-original-cluster20-z03.ad1.test | igls-zone03p-cluster20 | igls-zone03p-cluster21
cluster20 - cluster31 | A ⇒ C | zone01 | cluster20-z01.ad1.test | igls-original-cluster20-z01.ad1.test | igls-zone01p-cluster20 | igls-zone01p-cluster31
cluster20 - cluster31 | A ⇒ C | zone03 | cluster20-z03.ad1.test | igls-original-cluster20-z03.ad1.test | igls-zone03p-cluster20 | igls-zone03p-cluster31



Before Failback B ⇒ A

Zone Readiness

This is the Zone Readiness for the state before failback from B to A. As the figure shows, only the TargetB (cluster21) - SourceA (cluster20) pairs are listed as available in the DR Dashboard's Zone Readiness window. The other pairs (SourceA (cluster20) - TargetB (cluster21) and SourceA (cluster20) - TargetC (cluster31)) are shown as FAILED OVER.

[Network Mapping - Before Failback B ⇒ A]  B ⇒ A Zone01

[Network Mapping - Before Failback B ⇒ A]  B ⇒ A Zone03

This table shows the SmartConnect Zone Name and SmartConnect Alias Name mappings for this Before Failback B ⇒ A state:

Source - Target Pair | SyncIQ Direction | Zone Name | SmartConnect Zone Name (Source Cluster) | SmartConnect Zone Name (Target Cluster) | SmartConnect Alias Mapping (Source Cluster) | SmartConnect Alias Mapping (Target Cluster)
---|---|---|---|---|---|---
cluster20 - cluster21 | A ⇒ B | zone01 | STATUS: FAILED OVER | | |
cluster20 - cluster21 | A ⇒ B | zone03 | STATUS: FAILED OVER | | |
cluster20 - cluster31 | A ⇒ C | zone01 | STATUS: FAILED OVER | | |
cluster20 - cluster31 | A ⇒ C | zone03 | STATUS: FAILED OVER | | |
cluster21 - cluster20 | B ⇒ A | zone01 | cluster20-z01.ad1.test | igls-original-cluster20-z01.ad1.test | igls-zone01p-cluster21 | igls-zone01p-cluster20
cluster21 - cluster20 | B ⇒ A | zone03 | cluster20-z03.ad1.test | igls-original-cluster20-z03.ad1.test | igls-zone03p-cluster21 | igls-zone03p-cluster20


Before Failback C ⇒ A

Zone Readiness

This is the Zone Readiness for the state before failback from C to A. As the figure shows, only the TargetC (cluster31) - SourceA (cluster20) pairs are listed as available in the DR Dashboard's Zone Readiness window. The other pairs (SourceA (cluster20) - TargetB (cluster21) and SourceA (cluster20) - TargetC (cluster31)) are shown as FAILED OVER.

[Network Mapping - Before Failback C ⇒ A]  C ⇒ A Zone01

[Network Mapping - Before Failback C ⇒ A]  C ⇒ A Zone03


This table shows the SmartConnect Zone Name and SmartConnect Zone Alias Name mappings for Before Failback C ⇒ A state:

Source - Target Pair | SyncIQ Direction | Zone Name | SmartConnect Zone Name (Source Cluster) | SmartConnect Zone Name (Target Cluster) | SmartConnect Alias Mapping (Source Cluster) | SmartConnect Alias Mapping (Target Cluster)
---|---|---|---|---|---|---
cluster20 - cluster21 | A ⇒ B | zone01 | STATUS: FAILED OVER | | |
cluster20 - cluster21 | A ⇒ B | zone03 | STATUS: FAILED OVER | | |
cluster20 - cluster31 | A ⇒ C | zone01 | STATUS: FAILED OVER | | |
cluster20 - cluster31 | A ⇒ C | zone03 | STATUS: FAILED OVER | | |
cluster31 - cluster20 | C ⇒ A | zone01 | cluster20-z01.ad1.test | igls-original-cluster20-z01.ad1.test | igls-zone01p-cluster31 | igls-zone01p-cluster20
cluster31 - cluster20 | C ⇒ A | zone03 | cluster20-z03.ad1.test | igls-original-cluster20-z03.ad1.test | igls-zone03p-cluster31 | igls-zone03p-cluster20


Summary of Network Mappings

Based on the above example, the following table summarizes the network mappings with zone names and zone alias names:

Initial Configuration / Before Failover / After Failback:


State | Access Zone | Name | Cluster20 (A) | Cluster21 (B) | Cluster31 (C)
---|---|---|---|---|---
Initial Config | zone01 | Zone Name | cluster20-z01.ad1.test | igls-original-cluster20-z01.ad1.test | igls-original-cluster20-z01.ad1.test
Initial Config | zone01 | Zone Alias Hint | igls-zone01p-cluster20 | igls-zone01p-cluster21 | igls-zone01p-cluster31
Initial Config | zone03 | Zone Name | cluster20-z03.ad1.test | igls-original-cluster20-z03.ad1.test | igls-original-cluster20-z03.ad1.test
Initial Config | zone03 | Zone Alias Hint | igls-zone03p-cluster20 | igls-zone03p-cluster21 | igls-zone03p-cluster31


After Failover

This table shows the zone names and zone alias names  after failover A ⇒ B / after failover A ⇒ C:


State | Access Zone | Name | Cluster20 (A) | Cluster21 (B) | Cluster31 (C)
---|---|---|---|---|---
After Failover A ⇒ B | zone01 | Zone Name | igls-original-cluster20-z01.ad1.test | cluster20-z01.ad1.test | igls-original-cluster20-z01.ad1.test
After Failover A ⇒ B | zone01 | Zone Alias | igls-zone01p-cluster20 | igls-zone01p-cluster21 | igls-zone01p-cluster31
After Failover A ⇒ B | zone03 | Zone Name | igls-original-cluster20-z03.ad1.test | cluster20-z03.ad1.test | igls-original-cluster20-z03.ad1.test
After Failover A ⇒ B | zone03 | Zone Alias | igls-zone03p-cluster20 | igls-zone03p-cluster21 | igls-zone03p-cluster31
After Failover A ⇒ C | zone01 | Zone Name | igls-original-cluster20-z01.ad1.test | igls-original-cluster20-z01.ad1.test | cluster20-z01.ad1.test
After Failover A ⇒ C | zone01 | Zone Alias | igls-zone01p-cluster20 | igls-zone01p-cluster21 | igls-zone01p-cluster31
After Failover A ⇒ C | zone03 | Zone Name | igls-original-cluster20-z03.ad1.test | igls-original-cluster20-z03.ad1.test | cluster20-z03.ad1.test
After Failover A ⇒ C | zone03 | Zone Alias | igls-zone03p-cluster20 | igls-zone03p-cluster21 | igls-zone03p-cluster31
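
The pattern in the summary tables is that the active cluster holds the real SmartConnect zone name while every other cluster holds the igls-original- placeholder, and the alias hints never move. The following sketch reproduces the Zone Name rows from that pattern. It is an illustration only: the function zone_names is a hypothetical helper, not something Eyeglass exposes.

```python
PLACEHOLDER_PREFIX = "igls-original-"

def zone_names(zone_name, clusters, active):
    """Return the SmartConnect zone name expected on each cluster.

    The active cluster holds the real name; the others hold the placeholder.
    """
    return {
        cluster: zone_name if cluster == active else PLACEHOLDER_PREFIX + zone_name
        for cluster in clusters
    }

clusters = ["cluster20", "cluster21", "cluster31"]
# State after failover A => B for zone01: cluster21 is active.
after_ab = zone_names("cluster20-z01.ad1.test", clusters, active="cluster21")
for cluster, name in after_ab.items():
    print(cluster, name)
```

Passing active="cluster20" gives the initial configuration row, and active="cluster31" gives the row for the state after failover A ⇒ C.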


Operation Failover or Failback Procedures

It is recommended to name the SyncIQ policies used for multi-site replication (e.g. to replicate from Site A to Site B and also from Site A to Site C) so that the names reflect the Source-Target pairs.

The following table is an example with 2 SyncIQ policies per Source-Target pair:


SyncIQ Policy Name | SyncIQ Pairs
---|---
AB-synciq-01, AB-synciq-02 | A and B
AC-synciq-01, AC-synciq-02 | A and C


This name format helps us identify the Source-Target pair of each policy when choosing which access zone to fail over.
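
With a naming convention like this, the Source-Target pair can be read back from the policy name. The sketch below assumes the exact pattern used in the example table (two site letters, the literal -synciq-, and a number); the regular expression and function names are illustrative and would need adjusting for any other convention.

```python
import re

# Assumed pattern: <source site><target site>-synciq-<number>, e.g. AB-synciq-01.
POLICY_NAME = re.compile(r"^([A-C])([A-C])-synciq-(\d+)$")

def parse_policy_name(name):
    """Return (source site, target site, index), or None if the name does not match."""
    match = POLICY_NAME.match(name)
    if match is None:
        return None
    return match.group(1), match.group(2), int(match.group(3))

def policies_for_pair(names, source, target):
    """Select the policy names that belong to one Source-Target pair."""
    selected = []
    for name in names:
        parsed = parse_policy_name(name)
        if parsed and parsed[:2] == (source, target):
            selected.append(name)
    return selected

names = ["AB-synciq-01", "AB-synciq-02", "AC-synciq-01", "AC-synciq-02"]
print(policies_for_pair(names, "A", "C"))
```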

Failover from A to B Procedure:

  1. Before initiating Eyeglass Access Zone Failover from A to B, we need to ensure that there are no existing SyncIQ mirror policies from C to A. The recovery resync prep step of the A to B failover creates mirror policies from B to A with the same mirror target paths as the C to A mirror policies (the mirror target paths overlap). This makes the B to A mirror policies unrunnable, and the Eyeglass failover job will fail. If C to A mirror policies exist, delete them first. Refer to step P1 in the failover workflow diagrams.

  2. Then we can perform the Eyeglass Access Zone Failover as normal. In the DR Assistant wizard, after we select the source cluster (Cluster A, named cluster20 in this example), the next wizard screen displays the list of available failover options based on Source-Target-Zone pairs.

  3. We need to be careful to select the correct target cluster (A to B or A to C). In this case we want to fail over from A to B, so select a zone from the cluster20 (source) - cluster21 (target) pairs.

  4. The next screen displays a warning that this wizard will only perform access zone failover from A to B. The other policy on the same access zone (A to C) will not be failed over.

  5. Proceed with the Access Zone Failover as normal. Refer to the Eyeglass Access Zone Failover Guide for details.

  6. Repeat the same procedure to fail over other access zones from A to B.
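
The check in step 1 comes down to finding existing mirror policies whose target path overlaps the target path of a mirror policy the failover is about to create. The sketch below shows that overlap test in plain Python. It does not query a cluster: the list of existing mirror policies is supplied by hand, and the policy name AC-synciq-01_mirror and the paths are hypothetical values chosen for illustration.

```python
from pathlib import PurePosixPath

def paths_overlap(a, b):
    """True when the two paths are equal or one contains the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents

def blocking_mirror_policies(existing_mirrors, new_mirror_target_paths):
    """Return names of existing mirror policies that overlap a planned mirror target path.

    existing_mirrors: list of (policy name, target path on cluster A) tuples.
    """
    return [
        name
        for name, path in existing_mirrors
        if any(paths_overlap(path, planned) for planned in new_mirror_target_paths)
    ]

# Hypothetical leftover C to A mirror policy from an earlier A to C failover.
existing = [("AC-synciq-01_mirror", "/ifs/data/AZ1/data")]
# Target path of the B to A mirror policy that the A to B failover would create.
planned = ["/ifs/data/AZ1/data"]
print(blocking_mirror_policies(existing, planned))
```

Any name returned is a mirror policy that would need to be deleted before starting the failover.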


Failback from B to A Procedure:

  1. We can perform the Eyeglass Access Zone Failback as normal.

  2. In the DR Assistant wizard, after we select the source cluster (Cluster B, named cluster21 in this example), the next wizard screen only displays the failover options from Cluster B (cluster21) to Cluster A (cluster20).

  3. Select the access zone to be failed back. The next screen does not display any warning about other SyncIQ policies that will not be failed back.

  4. Proceed with the Access Zone Failback as normal. Refer to the Eyeglass Access Zone Failover Guide for details.

  5. Repeat the same procedure to fail back other access zones from B to A.

Failover from A to C Procedure:

  1. Before initiating Eyeglass Access Zone Failover from A to C, we need to ensure that there are no existing SyncIQ mirror policies from B to A. The recovery resync prep step of the A to C failover creates mirror policies from C to A with the same mirror target paths as the B to A mirror policies (the mirror target paths overlap). This makes the C to A mirror policies unrunnable, and the Eyeglass failover job will fail. If B to A mirror policies exist, delete them first. Refer to step P1 in the failover workflow diagrams.

  2. Then we can perform the Eyeglass Access Zone Failover as normal. In the DR Assistant wizard, after we select the source cluster (Cluster A, named cluster20 in this example), the next wizard screen displays the list of available failover options based on Source-Target-Zone pairs.

  3. We need to be careful to select the correct target cluster (A to B or A to C). In this case we want to fail over from A to C, so select a zone from the cluster20 (source) - cluster31 (target) pairs.

  4. The next screen displays a warning that this wizard will only perform access zone failover from A to C. The other policy on the same access zone (A to B) will not be failed over.

  5. Proceed with the Access Zone Failover as normal. Refer to the Eyeglass Access Zone Failover Guide for details.

  6. Repeat the same procedure to fail over other access zones from A to C.

Failback from C to A Procedure

  1. We can perform the Eyeglass Access Zone Failback as normal.

  2. In the DR Assistant wizard, after we select the source cluster (Cluster C, named cluster31 in this example), the next wizard screen only displays the failover options from Cluster C (cluster31) to Cluster A (cluster20).

  3. Select the access zone to be failed back. The next screen does not display any warning about other SyncIQ policies that will not be failed back.

  4. Proceed with the Access Zone Failback as normal. Refer to the Eyeglass Access Zone Failover Guide for details.

  5. Repeat the same procedure to fail back other access zones from C to A.


3 Site DFS Mode Failover


This section explains the configuration, failover and failback workflows for 3-site DFS Mode Failover with Eyeglass for Isilon. As explained in the previous sections of this document, there are 3 sites in this setup: Site A (Source), Site B (Target #1) and Site C (Target #2).


Overview


This solution offers a simple way to protect data with 2 target sites, with clients automatically redirected to the correct site.

  • No DNS change

  • No SPN changes

  • Quotas follow shares as required to each site on failover and failback

  • 3 DFS targets per folder

  • Highest availability option for data with zero touch failover between sites

Video How to - Overview Multi site DFS mode Failover

  1. Superna Eyeglass 3 site failover Isilon with DFS and open files walk through

Configuration

For 3-site DFS Mode Failover, we need to configure the DFS Target Folder to have 3 referrals, one to each of the 3 Isilon clusters. Data in the SMB folders referred to by the DFS Target Folder is replicated from Site A to Site B, and also from Site A to Site C, using Isilon SyncIQ replication.

DFS Mode Initial Configuration / Before Failover Diagram

This diagram displays the initial configuration for this 3 Sites DFS Mode Failover.


DFS Mode Failover A ⇒ B Diagram

This diagram shows the failover workflow from A to B. Take note of step P1 (the preparation step performed before initiating Eyeglass DFS Mode Failover); refer to the procedure section for details.

Refer to the Eyeglass DFS Mode Failover Steps table for the list of the numbered steps shown in this diagram.



DFS Mode Failback B ⇒ A Diagram

This diagram shows the failback workflow from B to A.

Refer to the Eyeglass DFS Mode Failback Steps table for the list of the numbered steps shown in this diagram.

DFS Mode Failover A ⇒ C Diagram

This diagram shows the failover workflow from A to C. Take note of step P1 (the preparation step performed before initiating Eyeglass DFS Mode Failover); refer to the procedure section for details.

Refer to the Eyeglass DFS Mode Failover Steps table for the list of the numbered steps shown in this diagram.

DFS Mode Failback C ⇒ A Diagram

This diagram shows the failback workflow from C to A.

Refer to the Eyeglass DFS Mode Failback Steps table for the list of the numbered steps shown in this diagram.


Eyeglass DFS Mode Failover Steps

This table lists the Eyeglass DFS Mode Failover steps with numbers as shown in the above Failover diagrams.


No | Steps
---|---
P1 | Preparation step. Failover A ⇒ B: ensure there are no existing mirror policies from C to A; if any exist, delete them before initiating failover from A to B. Failover A ⇒ C: ensure there are no existing mirror policies from B to A; if any exist, delete them before initiating failover from A to C.
1 | Ensure that there is no live access to data
2 | Begin Failover
3 | Validation
4 | Synchronize data
5 | Synchronize configuration (shares/export/alias)
6 | Rename shares
7 | Record schedule for SyncIQ policies being failed over
8 | Prevent SyncIQ policies being failed over from running
9 | Provide write access to data on target
10 | Disable SyncIQ on source and make active on target
11 | Set proper SyncIQ schedule on target
12 | Synchronize quota(s)
13 | Remove quotas on directories that are target of SyncIQ (Isilon best practice)
14 | Refresh SMB session to pick up DFS change (sequence listed below)

Step 14 sequence:

  1. The SMB client accesses a domain-based namespace (e.g. \\ad1.test\dfs01\z02-smb01). The SMB client computer sends a query to AD to discover the list of root targets for the namespace.

  2. The AD controller returns the list of root targets defined for the requested namespace.

  3. The SMB client selects a root target from the referral list and sends a query to the root server for the requested link.

  4. The DFS root server constructs the list of folder targets in the referral.

    1. Failover A ⇒ B:

      1. The SMB share(s) on Cluster-A are not active (renamed with the igls-dfs- prefix).

      2. The SMB share(s) on Cluster-C are not active (deleted).

      3. The active path is to Cluster-B (renamed to the actual name). The DFS root server sends this referral information to the client.

    2. Failover A ⇒ C:

      1. The SMB share(s) on Cluster-A are not active (renamed with the igls-dfs- prefix).

      2. The SMB share(s) on Cluster-B are not active (deleted).

      3. The active path is to Cluster-C (renamed to the actual name). The DFS root server sends this referral information to the client.

  5. The SMB client tries to establish a connection to the selected target (the active target in the list).

  6. The Isilon cluster with the active target responds to this SMB connection.
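
The effect of the share renaming on referral resolution can be sketched as follows: only the cluster whose share carries the actual name is reachable, so that is where the client ends up. This is a simplified model for illustration and not how a DFS client is implemented. The function active_referral is hypothetical, and the host and share names are taken from the DFS example used in this guide.

```python
def active_referral(share_name, shares_by_host):
    """Return the UNC path of the single cluster whose share carries the actual name.

    shares_by_host: {SmartConnect host: share name on that cluster, or None if deleted}.
    """
    active = [host for host, name in shares_by_host.items() if name == share_name]
    if len(active) != 1:
        raise ValueError(f"expected exactly one active target, found {len(active)}")
    return "\\\\" + active[0] + "\\" + share_name

# Final state after failover A => B: A renamed, B active, C deleted.
after_failover_ab = {
    "cluster07-z02.ad1.test": "igls-dfs-z02-smb01",
    "cluster08-z02.ad1.test": "z02-smb01",
    "cluster06-z02.ad1.test": None,
}
print(active_referral("z02-smb01", after_failover_ab))
```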


Eyeglass DFS Mode Failback Steps

This table lists the Eyeglass DFS Mode Failback steps with numbers as shown in the above Failback diagrams.


No | Steps
---|---
1 | Ensure that there is no live access to data
2 | Begin Failback
3 | Validation
4 | Synchronize data
5 | Synchronize configuration (shares/export/alias)
6 | Rename shares
7 | Record schedule for SyncIQ policies being failed back
8 | Prevent SyncIQ policies being failed back from running
9 | Provide write access to data on target
10 | Disable SyncIQ on source and make active on target
11 | Set proper SyncIQ schedule on target
12 | Synchronize quota(s)
13 | Remove quotas on directories that are target of SyncIQ (Isilon best practice)
14 | Refresh SMB session to pick up DFS change (sequence listed below)

Step 14 sequence:

  1. The SMB client accesses a domain-based namespace (e.g. \\ad1.test\dfs01\z02-smb01). The SMB client computer sends a query to AD to discover the list of root targets for the namespace.

  2. The AD controller returns the list of root targets defined for the requested namespace.

  3. The SMB client selects a root target from the referral list and sends a query to the root server for the requested link.

  4. The DFS root server constructs the list of folder targets in the referral.

    1. Failback B ⇒ A:

      1. The SMB share(s) on Cluster-B are not active (renamed with the igls-dfs- prefix).

      2. The SMB share(s) on Cluster-C are not active (renamed with the igls-dfs- prefix).

      3. The active path is to Cluster-A (renamed to the actual name). The DFS root server sends this referral information to the client.

    2. Failback C ⇒ A:

      1. The SMB share(s) on Cluster-C are not active (renamed with the igls-dfs- prefix).

      2. The SMB share(s) on Cluster-B are not active (renamed with the igls-dfs- prefix).

      3. The active path is to Cluster-A (renamed to the actual name). The DFS root server sends this referral information to the client.

  5. The SMB client tries to establish a connection to the selected target (the active target in the list).

  6. The Isilon cluster with the active target responds to this SMB connection.



DFS Configuration

Configure the DFS Target Folder to have 3 referrals - Site A, Site B and Site C.

For example, we have configured the DFS Target Folder to have these three referrals:

  1. Source (Site A): \\cluster07-z02.ad1.test\z02-smb01

  2. Target#1 (Site B):  \\cluster08-z02.ad1.test\z02-smb01

  3. Target#2 (Site C): \\cluster06-z02.ad1.test\z02-smb01

We have also configured the following target priority referral ordering:

Source Cluster (Site A)

Target Cluster #1 (Site B)


Target Cluster #2 (Site C)

DFS Readiness

This section explains the different states of DFS Readiness for 3-site DFS Mode failover and failback.

For the purpose of this example we use the following names:


Site | Cluster Name
---|---
A (Primary / Source) | cluster07
B (Secondary#1 / Target #1) | cluster08
C (Secondary#2 / Target #2) | cluster06

DFS Readiness - Initial Configuration / Before Failover

This is the DFS Readiness for the initial configuration / before failover state. As the figure shows, both Source-Target pairs (A - B and A - C) are listed in the DR Dashboard's DFS Readiness window.

This shows that a DFS Mode failover choice can be made to any target cluster in the green OK state (Warning status is also allowed).


DFS Readiness - Before Failback B ⇒ A

This DFS Readiness is for the state before Failback from B to A.

Warning: As shown in the DR Dashboard, both the AB mirror policy and the AC policy have DR Status OK. For failback from B to A, we need to carefully select Cluster B as the source. Do not select Cluster A as the source, as this would run the operation as a failover from A to C.



DFS Readiness - Before Failback C ⇒ A

This DFS Readiness is for the state before Failback from C to A.

Warning: As shown in the DR Dashboard, both the AC mirror policy and the AB policy have DR Status OK. For failback from C to A, we need to carefully select Cluster C as the source. Do not select Cluster A as the source, as this would run the operation as a failover from A to B.


Share Names and DFS Paths

Based on the above example, the following table describes the SMB Share Names and DFS Paths for various states:

Initial Configuration / Before Failover

| | Cluster07 (A) | Cluster08 (B) | Cluster06 (C) |
|---|---|---|---|
| Share Name | z02-smb01 | igls-dfs-z02-smb01 | igls-dfs-z02-smb01 |
| DFS Path Resolves to | \\cluster07-z02.ad1.test\z02-smb01 | | |
| Access Status | 0 ( ACTIVE TARGETSET ) | 0xc00000cc ( TARGETSET ) | 0xc00000cc |
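
The Access Status cells combine a numeric code with optional flags. The sketch below splits such a cell into its parts. The code 0 is the Windows NTSTATUS value STATUS_SUCCESS and 0xC00000CC is STATUS_BAD_NETWORK_NAME. Treating the table values as NTSTATUS codes is an assumption of this example, and `parse_access_status` is a hypothetical helper, not part of any tool mentioned in this guide.

```python
import re

# NTSTATUS values seen in the table, with their standard Windows names.
STATUS_NAMES = {
    0x00000000: "STATUS_SUCCESS",
    0xC00000CC: "STATUS_BAD_NETWORK_NAME",
}


def parse_access_status(text):
    """Split an Access Status cell into (code, status name, flags)."""
    match = re.fullmatch(r"\s*(0x[0-9a-fA-F]+|\d+)\s*(?:\(([^)]*)\))?\s*", text)
    if match is None:
        raise ValueError(f"unrecognised access status: {text!r}")
    code = int(match.group(1), 0)  # base 0 accepts both "0" and "0x..."
    flags = (match.group(2) or "").split()
    return code, STATUS_NAMES.get(code, "UNKNOWN"), flags


print(parse_access_status("0 ( ACTIVE TARGETSET )"))
print(parse_access_status("0xc00000cc ( TARGETSET )"))
```

Read this way, the target that carries the un-prefixed share name reports success, while a target where that share name is not present reports a bad network name.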


After Failover / After Failback

| State | | Cluster07 (A) | Cluster08 (B) | Cluster06 (C) |
|---|---|---|---|---|
| After Failover A => B | Share Name | igls-dfs-z02-smb01 | z02-smb01 | *1 |
| | DFS Path Resolves to | | \\cluster08-z02.ad1.test\z02-smb01 | |
| | Access Status | 0xc00000cc ( TARGETSET ) | 0 ( ACTIVE TARGETSET ) | 0xc00000cc |
| After Failback B => A | Share Name | z02-smb01 | igls-dfs-z02-smb01 | igls-dfs-z02-smb01 *2 |
| | DFS Path Resolves to | \\cluster07-z02.ad1.test\z02-smb01 | | |
| | Access Status | 0 ( ACTIVE TARGETSET ) | 0xc00000cc ( TARGETSET ) | 0xc00000cc |
| After Failover A => C | Share Name | igls-dfs-z02-smb01 | *3 | z02-smb01 |
| | DFS Path Resolves to | | | \\cluster06-z02.ad1.test\z02-smb01 |
| | Access Status | 0xc00000cc ( TARGETSET ) | 0xc00000cc ( TARGETSET ) | 0 (ACTIVE) |
| After Failback C => A | Share Name | z02-smb01 | igls-dfs-z02-smb01 *4 | igls-dfs-z02-smb01 |
| | DFS Path Resolves to | \\cluster07-z02.ad1.test\z02-smb01 | | |
| | Access Status | 0 ( ACTIVE TARGETSET ) | 0xc00000cc ( TARGETSET ) | 0xc00000cc |


Remarks for Intermediate and Final States:

*1: States:

  1. Immediately after Failover A ⇒ B has completed: igls-dfs-z02-smb01 (Intermediate State)

  2. After the 1st cycle of Configuration Replication (A ⇒ C) following failover A ⇒ B: igls-dfs-igls-dfs-z02-smb01 (Intermediate State)

  3. After the 2nd cycle of Configuration Replication (A ⇒ C) following failover A ⇒ B: <empty>, the SMB share is deleted (Final State)


*2: States:

  1. Immediately after Failback B ⇒ A has completed: <empty>, the SMB share is not yet created (Intermediate State)

  2. After the 1st cycle of Configuration Replication (A ⇒ C) following failback B ⇒ A: igls-dfs-z02-smb01 (Final State)



*3: States:

  1. Immediately after Failover A ⇒ C has completed: igls-dfs-z02-smb01 (Intermediate State)

  2. After the 1st cycle of Configuration Replication (A ⇒ B) following failover A ⇒ C: igls-dfs-igls-dfs-z02-smb01 (Intermediate State)

  3. After the 2nd cycle of Configuration Replication (A ⇒ B) following failover A ⇒ C: <empty>, the SMB share is deleted (Final State)


*4: States:

  1. Immediately after Failback C ⇒ A has completed: <empty>, the SMB share is not yet created (Intermediate State)

  2. After the 1st cycle of Configuration Replication (A ⇒ B) following failback C ⇒ A: igls-dfs-z02-smb01 (Final State)


Based on this table, after a failover it takes 2 cycles of Configuration Replication before the SMB share name on the 3rd cluster reaches its final state.

After a failback, it takes 1 cycle of Configuration Replication before the SMB share name on the 3rd cluster reaches its final state.
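
The progression described in remarks *1 to *4 can be written down as a small lookup, which makes the waiting times explicit. This is a sketch of the states listed in this guide, not of Eyeglass internals. The names `THIRD_CLUSTER_STATES`, `third_cluster_share` and `cycles_to_final` are hypothetical, and the sketch assumes that the share name stays unchanged once the final state is reached.

```python
PREFIX = "igls-dfs-"
SHARE = "z02-smb01"

# Share name on the 3rd cluster, indexed by the number of Configuration
# Replication cycles completed since the operation (from remarks *1 to *4).
THIRD_CLUSTER_STATES = {
    "failover": [PREFIX + SHARE, PREFIX + PREFIX + SHARE, None],  # None = share deleted
    "failback": [None, PREFIX + SHARE],  # None = share not yet created
}


def third_cluster_share(operation, cycles_completed):
    """Return the share name on the 3rd cluster, or None if no share exists."""
    if cycles_completed < 0:
        raise ValueError("cycles_completed must not be negative")
    states = THIRD_CLUSTER_STATES[operation]
    # Assumption: after the final state, further cycles change nothing.
    return states[min(cycles_completed, len(states) - 1)]


def cycles_to_final(operation):
    """Number of replication cycles needed to reach the final state."""
    return len(THIRD_CLUSTER_STATES[operation]) - 1


print(cycles_to_final("failover"), cycles_to_final("failback"))
```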


DFS Mode Failover and Failback Procedures

It is recommended to name the SyncIQ Policies used for multi site replication (e.g. replicating from Site A to Site B and also from Site A to Site C) so that the names reflect the Source-Target pairs.

The following table is an example:


| SyncIQ Policy Name | SyncIQ Pairs |
|---|---|
| AB-synciq-01 | A and B |
| AC-synciq-01 | A and C |


This name format helps identify which SyncIQ pair we want to fail over.
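
One benefit of such a convention is that the pair can be read back from the name mechanically. The sketch below assumes the exact pattern of the example names (two site letters, the literal "-synciq-" and a sequence number). The pattern and the helper names are illustrative assumptions, and real policy names may follow a different scheme.

```python
import re

# Pattern for the example naming convention above, e.g. "AB-synciq-01":
# two site letters (source, target), a fixed label and a sequence number.
POLICY_NAME = re.compile(r"^(?P<source>[A-Z])(?P<target>[A-Z])-synciq-(?P<seq>\d+)$")


def parse_policy_name(name):
    """Return (source site, target site, sequence number) for a policy name."""
    match = POLICY_NAME.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow the <pair>-synciq-<nn> convention")
    return match["source"], match["target"], int(match["seq"])


def policies_for_pair(names, source, target):
    """Filter policy names down to those belonging to one Source-Target pair."""
    return [n for n in names if parse_policy_name(n)[:2] == (source, target)]


print(policies_for_pair(["AB-synciq-01", "AC-synciq-01"], "A", "C"))
```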

DFS Mode Failover from A to B Procedure:

  1. Before initiating an Eyeglass DFS Mode Failover from A to B, ensure that no SyncIQ Mirror Policies from C to A exist. The recovery resync prep step of this failover creates Mirror Policies from B to A with the same Mirror Target Paths as the C to A policies (the Mirror Target Paths overlap). This would make the Mirror Policies from B to A unrunnable, and the Eyeglass Failover Job would fail. If any C to A Mirror Policies exist, delete them first. Refer to step P1 in the Failover workflow diagrams.

  2. Perform the Eyeglass DFS Mode Failover as per normal. In the DR Assistant Wizard, after the source cluster is selected (Cluster A, in this example cluster07), the next wizard screen displays the list of available Failover options based on Source-Target pairs (A to B or A to C).

  3. Take care to select the correct Target Cluster (A to B or A to C). In this case we want to fail over from A to B, so select the AB Source-Target pair.

  4. The next screen validates whether the failover configuration is valid.

  5. Proceed with the DFS Mode Failover as per normal. Refer to the Eyeglass DFS Mode Failover Guide for details.
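
The pre-check in step 1 can be expressed as a filter over the existing mirror policies. The sketch below is illustrative only: `MirrorPolicy` is a hypothetical record type, the policy names in the example are made up, and obtaining the actual list of mirror policies from the clusters is outside the scope of this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MirrorPolicy:
    """Hypothetical record of a SyncIQ mirror policy (not an Eyeglass type)."""
    name: str
    source_site: str
    target_site: str


def blocking_mirror_policies(mirror_policies, failover_target):
    """Return mirror policies that must be deleted before failing over from A.

    A failover A => B is blocked by mirror policies C => A, and a failover
    A => C is blocked by mirror policies B => A.
    """
    return [
        p for p in mirror_policies
        if p.target_site == "A" and p.source_site not in ("A", failover_target)
    ]


# Illustrative names only.
existing = [
    MirrorPolicy("AC-synciq-01_mirror", "C", "A"),
    MirrorPolicy("AB-synciq-01_mirror", "B", "A"),
]
print([p.name for p in blocking_mirror_policies(existing, "B")])
```

An empty result means the pre-check passes for the chosen target; any returned policy would need to be deleted first, as described in step 1.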



DFS Mode Failback from B to A Procedure:

  1. Perform the Eyeglass DFS Mode Failback as per normal.

  2. In the DR Assistant Wizard, make sure to select cluster B (name: cluster08) as the source cluster. At this stage (after Failover A to B and before Failback from B to A) there are 2 available operations, as also displayed in the DR Dashboard DFS Readiness. Do not select cluster A (name: cluster07) as the source, as this will lead to a Failover from A to C instead.

  3. After the correct source cluster is selected (Cluster B, in this example cluster08), the next wizard screen displays only the Failback option from Cluster B (cluster08) to Cluster A (cluster07).

  4. Select the AB mirror policy to fail back. The next screen validates whether the failover configuration is valid.

  5. Proceed with the DFS Mode Failback as per normal. Refer to the Eyeglass DFS Mode Failover Guide for details.


DFS Mode Failover from A to C Procedure:


  1. Before initiating an Eyeglass DFS Mode Failover from A to C, ensure that no SyncIQ Mirror Policies from B to A exist. The recovery resync prep step of this failover creates Mirror Policies from C to A with the same Mirror Target Paths as the B to A policies (the Mirror Target Paths overlap). This would make the Mirror Policies from C to A unrunnable, and the Eyeglass Failover Job would fail. If any B to A Mirror Policies exist, delete them first. Refer to step P1 in the Failover workflow diagrams.

  2. Perform the Eyeglass DFS Mode Failover as per normal. In the DR Assistant Wizard, after the source cluster is selected (Cluster A, in this example cluster07), the next wizard screen displays the list of available Failover options based on Source-Target pairs (A to B or A to C).

  3. Take care to select the correct Target Cluster (A to B or A to C). In this case we want to fail over from A to C, so select the AC Source-Target pair.

  4. The next screen validates whether the failover configuration is valid.

  5. Proceed with the DFS Mode Failover as per normal. Refer to the Eyeglass DFS Mode Failover Guide for details.
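
Step 1 of this procedure rests on the Mirror Target Paths of two sets of policies overlapping. A minimal sketch of such an overlap test follows, treating two paths as overlapping when they are equal or one contains the other. The /ifs paths in the example are hypothetical, and the exact overlap rule applied by SyncIQ is not stated in this guide, so this definition is an assumption.

```python
from pathlib import PurePosixPath


def paths_overlap(path_a, path_b):
    """True if the two paths are equal or one is an ancestor of the other."""
    a, b = PurePosixPath(path_a), PurePosixPath(path_b)
    return a == b or a in b.parents or b in a.parents


def overlapping_targets(new_target_paths, existing_target_paths):
    """Return (new, existing) pairs of mirror target paths that overlap."""
    return [
        (new, old)
        for new in new_target_paths
        for old in existing_target_paths
        if paths_overlap(new, old)
    ]


# Hypothetical paths: the new C to A mirror would target the same directory
# on Cluster A as an existing B to A mirror.
print(overlapping_targets(["/ifs/data/z02"], ["/ifs/data/z02", "/ifs/other"]))
```

Comparing whole path components rather than string prefixes avoids false matches between sibling directories such as /ifs/data/z0 and /ifs/data/z02.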



DFS Mode Failback from C to A Procedure:

  1. Perform the Eyeglass DFS Mode Failback as per normal.

  2. In the DR Assistant Wizard, make sure to select cluster C (name: cluster06) as the source cluster. At this stage (after Failover A to C and before Failback from C to A) there are 2 available operations, as also displayed in the DR Dashboard DFS Readiness. Do not select cluster A (name: cluster07) as the source, as this will lead to a Failover from A to B instead.

  3. After the correct source cluster is selected (Cluster C, in this example cluster06), the next wizard screen displays only the Failback option from Cluster C (cluster06) to Cluster A (cluster07).

  4. Select the AC mirror policy to fail back. The next screen validates whether the failover configuration is valid.

  5. Proceed with the DFS Mode Failback as per normal. Refer to the Eyeglass DFS Mode Failover Guide for details.