Geographic Highly Available Storage solution with Eyeglass Access Zone Failover and Dual Delegation



Let's review Access Zone failover with Eyeglass at a high level before explaining how “Dual Delegation” works. Failover moves the SmartConnect zones that are applied to IP pools: for each SmartConnect zone that existed on the source cluster IP pool, an alias is created on the target Isilon IP pool. Once this is completed, DNS delegation comes into play. Delegation relies on NS (name server) records that point at the Subnet Service IP (SSIP) servicing the IP pool involved in the failover.

The delegation is then updated so that SmartConnect zone lookups are forwarded to the newly active cluster's Subnet Service IP, which answers for the newly created alias on the IP pool.


Skip the reading and watch the 5-minute video on how to set up Dual Delegation with Microsoft DNS (other DNS vendors work as well; the concept is the same).


How to configure Dual Delegation with Superna Eyeglass


Continue reading below for details and how it works.


Eyeglass automates this process during failover, but it also renames the source cluster SmartConnect zone (preserving it for failback operations). What remains is a simple breadcrumb zone prefixed with igls-original-, followed by whatever the zone name was. This rename has a second benefit: the source cluster will no longer resolve queries for the original zone name that are sent to the source cluster Subnet Service IP.
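The naming convention can be illustrated with a small sketch. The helper names are hypothetical and are not an Eyeglass API; only the igls-original- prefix comes from the description above.

```python
# Illustrative only: models the igls-original- naming convention.
# All function names here are hypothetical, not part of Eyeglass.
IGLS_PREFIX = "igls-original-"


def preserved_name(zone: str) -> str:
    """Return the breadcrumb name a source-cluster zone carries after failover."""
    return IGLS_PREFIX + zone


def is_preserved(zone: str) -> bool:
    """True if the zone name carries the igls-original- prefix."""
    return zone.startswith(IGLS_PREFIX)


def original_name(zone: str) -> str:
    """Strip the prefix to recover the zone name that would be restored on failback."""
    return zone[len(IGLS_PREFIX):] if is_preserved(zone) else zone


if __name__ == "__main__":
    renamed = preserved_name("userdata.ad1.test")
    print(renamed)
    print(original_name(renamed))
```

Because the renamed zone no longer matches the delegated name, a lookup for userdata.ad1.test finds no matching zone on the source cluster.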

(see diagram below) DNS resolution before failover, where typically only a single NS record points at the Primary cluster.

image


(see diagram below) DNS resolution AFTER manually updating the NS record to point at the Service IP of the secondary cluster.


image

Key Questions

  1. Can two NS records be added to the delegation ahead of time, one for the Primary and one for the Secondary Subnet Service IP, to simplify failover and remove the manual DNS step after failover?

    1. Answer: Yes, this is possible, and it removes yet another step in the failover when using Access Zone Failover.

  2. Can dual delegation be used without Eyeglass?

    1. Answer: No. Without Eyeglass, this configuration cannot be used, because the Primary cluster SmartConnect zone needs to be manually removed or renamed. Eyeglass turns this into an atomic, automated process during failover, when the Secondary cluster file system is made writeable (the only time you need this functionality).

  3. Will Dual Delegation support SMB and NFS failovers?

    1. Answer: Yes, both protocols benefit from this feature. SMB handles it better, since retrying a failed mount only requires clicking on the drive letter or accessing the UNC path again to trigger a new name-to-IP lookup. NFS clients require an unmount and remount, since they mount an IP address after name resolution has completed, even when the mount is defined in /etc/fstab with an FQDN. Script Engine can now be focused entirely on host-side automation without needing any DNS updates.


How to set up dual delegation?


Simple: delegate the SmartConnect zone with two NS records, #1 pointing to the Primary cluster SSIP and #2 to the Secondary cluster SSIP that answers DNS queries for the IP pool.

Screen Shot 2016-01-03 at 5.28.04 PM.png
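For DNS servers that are managed through zone files rather than a GUI, the same delegation could be expressed as two NS records plus address records for the name server hosts. The sketch below only renders such lines in standard master-file syntax. The host names ssip-primary and ssip-secondary and the Secondary SSIP address are hypothetical, treating 172.31.1.200 as the Primary SSIP is an assumption, and the TTL of 0 follows the assumption stated later in this article.

```python
import ipaddress


def delegation_records(zone, name_servers, ttl=0):
    """Render NS records plus A records for a dually delegated zone.

    name_servers: list of (host_name, ssip) tuples, one per cluster.
    Applying the same TTL to the A records is a simplification.
    """
    fqdn = zone.rstrip(".") + "."
    ns_lines, glue_lines = [], []
    for host, ssip in name_servers:
        ipaddress.IPv4Address(ssip)  # raises ValueError on a malformed address
        host_fqdn = host.rstrip(".") + "."
        ns_lines.append(f"{fqdn} {ttl} IN NS {host_fqdn}")
        glue_lines.append(f"{host_fqdn} {ttl} IN A {ssip}")
    return ns_lines + glue_lines


if __name__ == "__main__":
    servers = [
        ("ssip-primary.ad1.test", "172.31.1.200"),    # assumed Primary SSIP
        ("ssip-secondary.ad1.test", "172.31.2.200"),  # hypothetical Secondary SSIP
    ]
    for line in delegation_records("userdata.ad1.test", servers):
        print(line)
```

Both NS records exist before any failover, so nothing in DNS has to change when the active cluster changes.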


Let's review how this works.  The following two diagrams show the DNS setup before and after failover with Eyeglass.




How does this work with DNS?

Answer:  

  1. The DNS server can issue queries for the SmartConnect zone userdata.ad1.test to either name server record.

  2. If a query is sent to the Secondary cluster before failover, the Isilon answers the query with return code 5 (Refused). NOTE: dual delegation requires a SmartConnect name to exist on the target IP pool. We recommend applying the syntax igls-original-<production cluster smartconnect name>. Also NOTE: detection of the igls-original prefix on an access zone pool will update the DR dashboard with FAILEDOVER state.

  3. This tells the DNS server to re-issue the query to the second name server record to satisfy it.

  4. Since the Primary cluster is configured for this SmartConnect zone, it will answer the query with one of the IP addresses in the IP pool, as expected.

  5. The DNS server returns the IP address provided to the client that issued the query.

  6. Done.

Failover Steps

  1. Same as above, except that Eyeglass has disabled the Primary cluster SmartConnect zone name by renaming it with the prefix igls-original-xxxx, so the Primary cluster responds with DNS return code 5 (Refused).

  2. The DNS server re-issues the query to the Secondary cluster SSIP to get the query answered.

  3. The above all assumes a TTL of 0 on the NS records to avoid caching name-to-IP address mappings, which works with the exception of Linux mount commands.

  4. If a real DR event occurs, as opposed to a controlled failover where both clusters are reachable, the last step is to ensure the Primary cluster is not IP reachable (in a partial disaster) and to prevent it from coming up again after the DR event, so that it does not answer DNS queries again. Alternatively, simply remove its entry from DNS. This is standard practice but is mentioned for completeness.
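The query flow before and after failover can be modeled with a toy simulation. This is a sketch under stated assumptions, not how Eyeglass or OneFS is implemented: the Ssip class, the fail_over function, and the pool IP addresses are all hypothetical, and a real DNS server chooses the order of NS records itself.

```python
NOERROR, REFUSED = 0, 5
PREFIX = "igls-original-"


class Ssip:
    """Toy stand-in for a cluster's Subnet Service IP (hypothetical model)."""

    def __init__(self, name, zones, pool_ip):
        self.name = name
        self.zones = set(zones)
        self.pool_ip = pool_ip

    def query(self, zone):
        # Answer only for zone names configured on this cluster's IP pool.
        if zone in self.zones:
            return NOERROR, self.pool_ip
        return REFUSED, None


def resolve(zone, name_servers):
    """Try each NS target in order; move on when one answers REFUSED."""
    for ns in name_servers:
        rcode, ip = ns.query(zone)
        if rcode == NOERROR:
            return ns.name, ip
    return None, None


def fail_over(source, target, zone):
    """Rename the zone on the source and add it as an alias on the target."""
    source.zones.discard(zone)
    source.zones.add(PREFIX + zone)
    target.zones.add(zone)


if __name__ == "__main__":
    zone = "userdata.ad1.test"
    primary = Ssip("primary", [zone], "172.31.1.201")             # hypothetical pool IP
    secondary = Ssip("secondary", [PREFIX + zone], "172.31.2.201")  # hypothetical pool IP
    print(resolve(zone, [secondary, primary]))  # before failover
    fail_over(primary, secondary, zone)
    print(resolve(zone, [primary, secondary]))  # after failover
```

In both runs the first name server tried answers Refused, and the query is satisfied by whichever cluster currently holds the zone name.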


Let's review the Wireshark traces below of a failover, from the DNS point of view of a Linux client.


  1. Linux client 172.31.1.101 issues a query for userdata.ad1.test.

     Screen Shot 2016-01-03 at 5.44.58 PM.png

  2. The DNS server sends the query to the Primary cluster SSIP (no longer the active cluster).

     Screen Shot 2016-01-03 at 5.49.12 PM.png

  3. The Primary cluster (IP 172.31.1.200), with its igls-original renamed SmartConnect zone, answers the query from the DNS server (172.16.80.6) with return code 5 (Refused), as described in the failover steps.

     Screen Shot 2016-01-03 at 5.51.12 PM.png

  4. The DNS server re-issues the query to the second NS record, in this case the Secondary cluster SSIP.

     Screen Shot 2016-01-03 at 5.57.06 PM.png

  5. The Secondary cluster responds with a new IP address from the target IP pool mapped by Eyeglass for failover.

     Screen Shot 2016-01-03 at 5.58.51 PM.png

  6. The client mount can now succeed, using the new IP address on the Secondary cluster IP pool to mount shares or exports.

  7. Done.
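A client-side check along the same lines can confirm which cluster a name currently resolves to. The sketch below is an assumption-laden example: the pool ranges in POOLS are hypothetical placeholders that would need to be replaced with the real IP pool ranges, and the function names are illustrative. It relies only on the Python standard library.

```python
import ipaddress
import socket

# Hypothetical IP pool ranges; replace with the real pool ranges of each cluster.
POOLS = {
    "primary": ipaddress.ip_network("172.31.1.200/29"),
    "secondary": ipaddress.ip_network("172.31.2.200/29"),
}


def classify(ip, pools=POOLS):
    """Return the name of the cluster whose pool contains ip, or 'unknown'."""
    addr = ipaddress.ip_address(ip)
    for cluster, network in pools.items():
        if addr in network:
            return cluster
    return "unknown"


def active_cluster(name, resolver=socket.gethostbyname, pools=POOLS):
    """Resolve name through the system resolver and classify the answer."""
    return classify(resolver(name), pools)
```

Calling active_cluster("userdata.ad1.test") on a client before and after a failover should move from "primary" to "secondary" once the client is no longer using a cached address. The resolver parameter exists so the lookup can be replaced in tests.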







DNS Return codes

| DNS Return Message | DNS Response Code | Function |
|---|---|---|
| NOERROR | RCODE:0 | DNS query completed successfully |
| FORMERR | RCODE:1 | DNS query format error |
| SERVFAIL | RCODE:2 | Server failed to complete the DNS request |
| NXDOMAIN | RCODE:3 | Domain name does not exist |
| NOTIMP | RCODE:4 | Function not implemented |
| REFUSED | RCODE:5 | The server refused to answer the query |
| YXDOMAIN | RCODE:6 | Name that should not exist, does exist |
| YXRRSET | RCODE:7 | RRset that should not exist, does exist |
| NXRRSET | RCODE:8 | RRset that should exist, does not exist |
| NOTAUTH | RCODE:9 | Server not authoritative for the zone |
| NOTZONE | RCODE:10 | Name not in zone |
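When reading a packet capture by hand, the return code can be extracted directly from the DNS header: it is the low four bits of the 16-bit flags field that follows the 16-bit message ID. A minimal sketch, with hypothetical function names, covering only the codes most relevant to dual delegation:

```python
import struct

# Subset of return codes relevant to the failover flow described in this article.
RCODE_NAMES = {
    0: "NOERROR",
    1: "FORMERR",
    2: "SERVFAIL",
    3: "NXDOMAIN",
    4: "NOTIMP",
    5: "REFUSED",
}


def rcode_of(message: bytes) -> int:
    """Extract the RCODE from a raw DNS message (12-byte header required)."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    _msg_id, flags = struct.unpack("!HH", message[:4])
    return flags & 0x000F  # RCODE occupies the low 4 bits of the flags field


def rcode_name(message: bytes) -> str:
    """Return the symbolic name of the RCODE, or a numeric fallback."""
    code = rcode_of(message)
    return RCODE_NAMES.get(code, f"RCODE:{code}")


if __name__ == "__main__":
    # Header of a response with flags 0x8185: QR=1, RD=1, RA=1, RCODE=5.
    header = struct.pack("!HHHHHH", 0x1234, 0x8185, 1, 0, 0, 0)
    print(rcode_name(header))
```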