RPO Trending and Reporting

Eyeglass Isilon Edition Recovery Point Objective Trending and Reporting

Contents

  1. RPO Monitoring What’s New
  2. Recovery Point Objective Key Features
  3. Check if RPO Trending and Reporting License is installed
  4. Setup RPO Target by Cluster
    4.1 Setup Email Notification
  5. RPO Calculations
    5.1 Maximum Age of Unreplicated Data
    5.2 Job Duration
    5.3 Amount of Replicated Data
    5.4 Recovery Point Analysis
    5.5 Average Data Transfer Rate in Mb per second
    5.6 SyncIQ Jobs Troubleshooting
  6. RPO Summary & Compliance Email Report
    6.1 How to Generate the RPO on Demand and specify time period
      6.1.1 Generate report from the cli
      6.1.2 Generate on UI and change the reporting period
  7. Example Report:
    7.1 Number of Jobs
    7.2 Total Amount of Replicated Data in GB
    7.3 Percentage of Jobs Violating RPO
    7.4 Average Amount of Replicated Data in GB
    7.5 Average Job Duration in minutes
    7.6 Average Data Transfer Rate in Mb per second
    7.7 Recovery Point Analysis
    7.8 SyncIQ Jobs Troubleshooting
    7.9 SyncIQ Job RPO Chart
    7.10 Data Transfer Chart
    7.11 Am I meeting my RPO business target on my Isilon cluster?
    7.12 Are my SyncIQ Policies performing as expected?
    7.13 How far back can I recover my business data?
    7.14 How much bandwidth is used and when is it used?
    7.15 How to use the Average Data Transfer Rate Analysis?
    7.16 Per SyncIQ and Cluster Wide Graph and data Examples
    7.17 Example Cluster Wide RPO CSV file data
    7.18 Example Per SyncIQ RPO CSV file data
  8. Advanced Settings
    8.1 Disable Screenshots for SyncIQ Job Report
    8.2 Modify SyncIQ Job Report Schedule
    8.3 Custom Schedule
    8.4 Modify SyncIQ Job Troubleshooting Thresholds
    8.5 To change the Troubleshooting Thresholds:
    8.6 Modify SyncIQ Job Report Time Range



RPO Monitoring What’s New


As of Release 1.9, Eyeglass now includes:

  • The RPO and backup monitoring report now tracks failed SyncIQ jobs. The report includes a 30 day and last 24 hour count, per policy and per cluster, of all failed SyncIQ jobs that started but finished with an error code.

  • This allows the report to track the percentage of completed SyncIQ jobs when used in a backup monitoring solution.


As of Release 1.8, Eyeglass now includes:

  • On demand report generation CLI and GUI option

  • Total GB transferred in reporting period as well as Avg GB transferred per Job in reporting period

  • Advanced Settings for Report Screenshot Enable/Disable, Report Time Range, Transfer Rate Troubleshooting Threshold, Interval Troubleshooting Threshold


As of Release 1.6, Eyeglass now includes:

  • All new features are automatically enabled after upgrade

    • Per SyncIQ policy RPO reporting in the reports

    • CSV file with per cluster and per policy data, for import into Excel or other reporting tools

    • Automatic 30 day rolling average graph of data loss in minutes per policy, attached to the email report as png files for simple review and inclusion in reports or PowerPoint

The solution now allows simple reporting for business units where one or more SyncIQ policies are specific to a business unit that requires an SLA on the DR service for the files it consumes on a cluster.

A new section has been added to this guide with examples of the Per Cluster and Per Policy CSV output and the images that are now generated automatically every night.


Recovery Point Objective Key Features

With the Eyeglass Isilon Edition Recovery Point Objective (RPO) Trending and Reporting feature you will have the data to answer these questions:

  • Am I meeting my RPO business target on my Isilon cluster?

Eyeglass solution: Daily email with your RPO business target compliance by cluster over last 24 hours and last 30 days in a simple and easy to read summary report.


  • Are my SyncIQ Policies performing as expected?

Eyeglass solution: Daily email with SyncIQ policies by cluster where job interval or transfer rate do not meet historical performance over the last 30 days.


  • How do I optimize my SyncIQ schedule to lower my RPO to match my WAN bandwidth and data change rate?

Eyeglass solution: Tune your SyncIQ replication by tracking your data change rate with deep dive graphing. Time of day, week and month trending helps you decide whether to lower or raise your replication schedule so that SyncIQ achieves shorter replication cycles.


  • How far back can I recover my business data?

Eyeglass solution: Graph and trend your business's recovery point graphed as age of data in minutes in the past per SyncIQ policy.


  • How much bandwidth is used and when is it used?

Eyeglass solution: Graph and trend GB data transferred per SyncIQ Policy to help with WAN bandwidth planning and quality of service at the network layer, to ensure your critical business data has adequate bandwidth to meet your RPO objectives.


Getting Started

Follow these 2 easy steps to get RPO Trending and Reporting up and running on your Eyeglass appliance:

  1. Setup RPO target by Cluster

  2. Setup Email Notification for daily reports to be enabled


Check if RPO Trending and Reporting License is installed

Eyeglass Isilon Edition RPO Trending and Reporting requires a separate feature license.  Open the Manage Licenses window to check your licenses.  If you see the license type “Isilon RPO Reporting”, you are licensed for this feature.





If you do not have the Isilon RPO Reporting license, please contact your Eyeglass sales representative.


Setup RPO Target by Cluster


To do the analysis, Eyeglass requires you to enter your RPO target by cluster in minutes.  This is the target RPO that your company would like to achieve.  Eyeglass will calculate actual RPO achieved and provide comparison to the target entered here in daily emailed reports.


Note: This will be an average RPO for the entire cluster.


For a new cluster, the RPO is entered on the Add Isilon Cluster window in the Maximum RPO Value field.  Enter the RPO target in minutes.




To update the RPO value for an existing Cluster, open the Inventory View window, right click on the cluster you want to update and select Edit.




In the Edit Isilon Cluster window that opens enter the new RPO target in minutes in the Maximum RPO Value field and then Submit to save your changes.




Setup Email Notification


To receive the daily RPO compliance email, your Eyeglass appliance must have an email server and recipient email addresses configured.  Please refer to the Eyeglass Isilon Quickstart Guide for details on setting up Email Notification.



RPO Calculations


General Notes

Data for RPO calculations are based on the SyncIQ Job Reports that OneFS generates each time a SyncIQ Policy Job is run.  These reports are collected as follows:

  • when an Isilon cluster is provisioned in Eyeglass the last 10 reports for each SyncIQ Policy are collected

  • once every 5 minutes, the last 10 reports for each SyncIQ Policy are collected

  • reports for failed jobs are not collected and are not included in the statistics

  • for a cancelled SyncIQ Job, the RPO calculation starts when the cancelled Job was started and ends when the next successful Job for that SyncIQ Policy is completed

  • if a SyncIQ Policy is deleted in OneFS, Eyeglass summary statistics include reports for the deleted policy until they are no longer relevant for the reported timeframe


Maximum Age of Unreplicated Data

The last completion time of a SyncIQ policy does not by itself represent your recovery point.  The “age” plot assumes that the data change rate is the same as in the last SyncIQ policy report, and that this changed data is in flight but not yet successfully replicated.


Example: Last successful completion of a SyncIQ policy was 10 minutes ago and the job took 5 minutes to replicate.  

For this example, the Maximum Age of Unreplicated Data is calculated as the 10 minutes since the Job was last successfully completed + the 5 minutes it would take to replicate the data (assumed to be the same as last time Job was successfully executed).  So, in this example a DR event would result in 15 minutes of data being lost.
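The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration of the calculation as described in this section, not Eyeglass code; the function name and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def max_age_of_unreplicated_data(last_completion, last_duration, now):
    """Time since the last successful completion plus the duration of that job.

    Hypothetical helper: it assumes, as the text does, that the next job
    would take as long as the last successful one.
    """
    return (now - last_completion) + last_duration


# Values from the example: last success 10 minutes ago, job took 5 minutes.
now = datetime(2016, 12, 16, 0, 0)
age = max_age_of_unreplicated_data(
    last_completion=now - timedelta(minutes=10),
    last_duration=timedelta(minutes=5),
    now=now,
)
print(age.total_seconds() / 60)  # 15.0
```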


Job Duration

The Job Duration reported in Eyeglass is the Duration that is reported in the OneFS SyncIQ Report for a policy Job.


Amount of Replicated Data

The Amount of Replicated Data reported in Eyeglass is the Total Data that is reported in the OneFS SyncIQ Report for a Policy Job.


Recovery Point Analysis

Using the target RPO and average SyncIQ Job duration, data transfer rate and data change rate for a cluster, Eyeglass will calculate a recommended SyncIQ Job interval to reduce the number of Jobs which violate the target RPO.


Average Data Transfer Rate in Mb per second

The Average Data Transfer Rate reported in Eyeglass is the Total Data that is reported in the OneFS SyncIQ Report for the Cluster Policy Jobs divided by the total duration of all Jobs for the reported time period.
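As a rough sketch of this calculation, the snippet below divides the total data by the total duration across all jobs. The input units (bytes and seconds) and the reading of Mb as 10^6 bits are assumptions made for the illustration; the function name is hypothetical.

```python
def average_transfer_rate_mbps(jobs):
    """Total data divided by total duration, in megabits per second.

    jobs: list of (bytes_transferred, duration_seconds) tuples.
    Assumption: 1 Mb = 1,000,000 bits.
    """
    jobs = list(jobs)
    total_bytes = sum(b for b, _ in jobs)
    total_seconds = sum(s for _, s in jobs)
    if total_seconds == 0:
        return None  # nothing to average ("no data")
    return total_bytes * 8 / 1_000_000 / total_seconds


# Two jobs moving the same amount of data at different speeds.
rate = average_transfer_rate_mbps([(125_000_000, 10), (125_000_000, 40)])
print(rate)  # 40.0
```

Note that this is not the same as averaging the per-job rates: the two jobs above ran at 100 and 25 Mb per second, whose simple mean would be 62.5.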


SyncIQ Jobs Troubleshooting

SyncIQ Policy Interval and Transfer rate are assessed against the average SyncIQ Policy interval and transfer rate over the last 30 days.  

Note:

Interval is calculated as the difference between the start times of 2 consecutive SyncIQ Jobs. The interval may not be the same as the schedule, for example if a Job takes longer to run than its scheduled interval.


The following anomalies are reported in the daily email report by policy:

  • the number of jobs where the transfer rate is at least 50% below the average transfer rate (the measured difference factor, i.e. 50%, is configurable; please refer to the Advanced Settings section of this document for details)

  • the number of SyncIQ Policy jobs that did not run in the last 24 hours but should have run based on the 30 day average interval

  • the number of jobs where the interval is greater than double the average interval (the measured difference factor, i.e. 2x or double, is configurable; please refer to the Advanced Settings section of this document for details)
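The interval calculation and the two factor-based checks above might look like the following sketch. The function names are hypothetical, the default factor of 2 mirrors the defaults described in this guide, and the strict comparisons are an assumption. The check for jobs that did not run at all is not covered here.

```python
from datetime import datetime


def job_intervals(start_times):
    """Minutes between the start times of consecutive jobs."""
    ordered = sorted(start_times)
    return [(b - a).total_seconds() / 60 for a, b in zip(ordered, ordered[1:])]


def count_slow_jobs(rates, average_rate, rate_factor=2):
    """Jobs whose transfer rate is below average_rate / rate_factor."""
    return sum(1 for r in rates if r < average_rate / rate_factor)


def count_long_intervals(intervals, average_interval, interval_factor=2):
    """Intervals longer than average_interval * interval_factor."""
    return sum(1 for i in intervals if i > average_interval * interval_factor)


starts = [
    datetime(2016, 12, 15, 0, 0),
    datetime(2016, 12, 15, 0, 10),
    datetime(2016, 12, 15, 0, 20),
    datetime(2016, 12, 15, 1, 0),
]
intervals = job_intervals(starts)
print(intervals)  # [10.0, 10.0, 40.0]

# Illustrative 30 day averages: a 15 minute interval and 100 Mb per second.
print(count_long_intervals(intervals, average_interval=15))  # 1
print(count_slow_jobs([100, 40, 90], average_rate=100))      # 1
```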



RPO Summary & Compliance Email Report

Prerequisites:

  • RPO target configured for each cluster managed by Eyeglass

  • Eyeglass email notification configured

  • Isilon RPO Feature License


How to Generate the RPO on Demand and specify time period


Follow these steps to generate a report on demand for the current reporting period (24 hours), or to change the date range that the report covers, and have the report emailed.

Generate report from the cli

  1. Follow the admin guide CLI guide to execute the report generation

  2. See admin guide http://documentation.superna.net/eyeglass-isilon-edition/igls-administration/eyeglass-administration-guide#TOC-RPO-Reporting-CLI-commands

Generate on UI and change the reporting period

  1. [Screenshots: generating the report from the Eyeglass UI and changing the reporting period]


Details

The RPO Summary and Compliance Email Report is sent out each day at midnight.  It contains the following information for each cluster managed by Eyeglass:


Field

Description

RPO

The RPO target in minutes as entered into Eyeglass for the cluster.

Number of Jobs

The total number of SyncIQ Policy Jobs that ran over the previous 24 hours and the previous 30 days by cluster.

Total Amount of Replicated Data in GB

Total amount of replicated data in GB per cluster.


This statistic is provided based on the total number of SyncIQ Jobs that ran over the previous 24 hours and the previous 30 days for each cluster managed by Eyeglass.

Percentage of Jobs Violating RPO

Percentage of SyncIQ Jobs where the maximum age of the unreplicated data has been calculated to be greater than the RPO target for the cluster.


This statistic is calculated based on the total number of SyncIQ Jobs that ran over the previous 24 hours and the previous 30 days for each cluster managed by Eyeglass.

Average Amount of Replicated Data in GB

Average amount of replicated data in GB per SyncIQ Job.


This statistic is provided based on the total number of SyncIQ Jobs that ran over the previous 24 hours and the previous 30 days for each cluster managed by Eyeglass.

Average Job Duration in minutes

Average Job Duration in minutes.


This statistic is provided based on the total number of SyncIQ Jobs that ran over the previous 24 hours and the previous 30 days for each cluster managed by Eyeglass.

Average Data Transfer Rate in Mb per second

Average Data Transfer Rate in Mb per second

This statistic is provided based on the total number of SyncIQ Jobs that ran over the previous 24 hours and the previous 30 days for each cluster managed by Eyeglass.

Recovery Point Analysis - Diagnostics

Diagnostics provides a recommendation for SyncIQ Policy schedule such that the RPO target for the cluster can be better met.

SyncIQ Job Troubleshooting

  • Summary of any SyncIQ Policies where the Job interval or transfer rate does not meet the historical average over the last 30 days.  This may indicate a networking issue or a resource constraint on the cluster that is impacting replication performance.  Consider adding more nodes to the SyncIQ pool used for replication or increasing the number of worker threads.

  • Jobs that failed to run in the last 24 hours (no job report) are also captured on this list and should be investigated on the cluster to determine why the job failed to run.



Example Report:

SyncIQ Jobs Report 2016-12-16 00:00:00 UTC

Number of Jobs

Cluster       last 24 hours    last 30 days
ds-sim-8-1    12               23
ds-sim-8-2    0                21

Total Amount of Replicated Data in GB

Cluster       last 24 hours     last 30 days
ds-sim-8-1    less than 0.01    less than 0.01
ds-sim-8-2    no data           less than 0.01

Percentage of Jobs Violating RPO

Cluster       RPO    last 24 hours    last 30 days
ds-sim-8-1    3      100              43
ds-sim-8-2    5      no data          23

Average Amount of Replicated Data in GB

Cluster       last 24 hours    last 30 days
ds-sim-8-1    0.00             0.00
ds-sim-8-2    no data          0.00

Average Job Duration in minutes

Cluster       last 24 hours    last 30 days
ds-sim-8-1    0.18             0.13
ds-sim-8-2    no data          0.14

Average Data Transfer Rate in Mb per second

Cluster       last 24 hours    last 30 days
ds-sim-8-1    0.00             0.00
ds-sim-8-2    no data          0.00

Recovery Point Analysis

Cluster       Diagnostics
ds-sim-8-2    (none)
ds-sim-8-1    Policy accesszone1: setting to run job at the interval 2 minutes per 24 hour period will lower the RPO violation rate.

Refer to the Eyeglass documentation for the explanation of the calculation of diagnostics

Eyeglass Documentation

SyncIQ Jobs Troubleshooting

ds-sim-8-2

Policy Name                    Job ID    Detected Problems
EyeglassRunbookRobot_mirror              No SyncIQ jobs have been run for the last 24 hours. The average 30 days interval between jobs is 320.14minutes.

ds-sim-8-1

Policy Name                    Job ID    Detected Problems
test                                     Found 43 jobs that have the transfer rate lower than the policy average rate 193.92..

Refer to the Eyeglass documentation for the explanation of the calculation of diagnostics

Eyeglass Documentation

** average transfer rate in Mb/s


The Charts

Pre-requisites

  • RPO target configured for each cluster managed by Eyeglass

  • Isilon RPO Feature License


SyncIQ Job RPO Chart


The Eyeglass Job Duration Chart plots, in minutes over time, the Maximum Age of Unreplicated Data for each SyncIQ Job successfully run and, on the same chart, the Job Duration for each of those Jobs. Up to 6 SyncIQ Policies can be graphed on the same chart.


To generate the SyncIQ Job RPO Chart:

  1. Login to the Eyeglass web page

  2. Open the DR Dashboard.

  3. Select the checkbox for the policy of interest.

  4. For a multi-Job chart, up to 5 additional policies can be selected.

  5. Select the Generate SyncIQ Job Charts button.

  6. Select the From and To date and time in the Report Time Range Setting window.

  7. Select the Launch SyncIQ Job RPO Chart button.

  8. The SyncIQ Job RPO chart opens.

    • Maximum age of Unreplicated Data plotted in minutes corresponding to each SyncIQ Job successfully executed (diamond marker)

    • Job Duration plotted in minutes corresponding to each SyncIQ Job successfully executed (circle marker)

    • Each policy plotted in a different colour

    • mouse over a data point to see the details




  9. Select a cluster in the right-hand table to add the RPO target for that cluster to the graph.  Points above the RPO target have not met the target and are displayed in red.  You can use the RPO slider at the bottom of the chart to see the effect of changing the RPO target.





Data Transfer Chart


The Eyeglass Data Transfer Chart plots the total amount of data replicated per SyncIQ Job successfully run in GB over time. Up to 6 SyncIQ Policies can be selected to be graphed on the same chart.  


To generate the Data Transfer Chart:

  1. Login to the Eyeglass web page

  2. Open the DR Dashboard.

  3. Select the checkbox for the policy of interest.

  4. For a multi-Job chart, up to 5 additional policies can be selected.

  5. Select the Generate SyncIQ Job Charts button.

  6. Select the From and To date and time in the Report Time Range Setting window.

  7. Select the Launch SyncIQ Job Data Transfer Chart  button.

  8. The SyncIQ Job Data Transfer chart opens.

    • Total data transferred is plotted in GB for each job executed for the selected policies in the selected time period

    • Each policy plotted in a different colour

    • mouse over a data point to see the details





Am I meeting my RPO business target on my Isilon cluster?


Use the Eyeglass daily RPO compliance email to quickly and easily gain a view of whether or not you are meeting your business targets.  For each cluster that Eyeglass is managing, you will receive an email comparing the last 24 hours to the last 30 days.  To get the details you can use the Eyeglass graphs to find out why RPO times might be increasing outside your targets.


The summary report gives the percentage of SyncIQ jobs that failed to meet the RPO target, which provides a quick view of compliance with your targets per cluster.
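The percentage in the summary report can be illustrated with a short sketch. It assumes that a job violates the RPO when its maximum age of unreplicated data is greater than the cluster target, as described in the report field definitions; the function name and sample values are hypothetical.

```python
def percent_violating_rpo(max_ages_minutes, rpo_target_minutes):
    """Percentage of jobs whose maximum data age exceeds the RPO target."""
    if not max_ages_minutes:
        return None  # the report would show "no data"
    violating = sum(1 for age in max_ages_minutes if age > rpo_target_minutes)
    return 100 * violating / len(max_ages_minutes)


# Four jobs measured against a 5 minute RPO target.
ages = [2, 4, 6, 8]
print(percent_violating_rpo(ages, rpo_target_minutes=5))  # 50.0
```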


Are my SyncIQ Policies performing as expected?

Use the Eyeglass daily email to quickly and easily see whether your SyncIQ Policies are performing as expected.  The SyncIQ Jobs Troubleshooting section highlights by cluster:

  • which SyncIQ Policies did not run that should have run in the last 24 hours

  • which SyncIQ Policies had jobs where the transfer rate was lower than the policy average rate over the last 30 days

  • which SyncIQ Policies had jobs where the interval is greater than the policy average interval over the last 30 days


How far back can I recover my business data?

Use the Eyeglass Job Duration Chart to analyse the maximum age of your data per SyncIQ policy over a selected time period.  The maximum age number is calculated based on the last time the data was successfully replicated to the remote cluster plus the time it would take to replicate the same amount of data.  



This chart can be customized:

  • start and end date and time

  • 1 to 6 SyncIQ policies on the same graph

  • moving RPO scale to show the days and times at which the SyncIQ data “Age” exceeded your cluster target.  These are shown as red dots above the dotted line.

  • To find your worst case RPO value within the time period, slide the RPO slider to the right until no red points remain, so that 100% of the plotted data is below the target.  This value is the worst case data age exposure.




How much bandwidth is used and when is it used?

Use the Eyeglass Job Data Transfer Chart to analyse the amount of data transferred per SyncIQ policy over a selected time period.  This chart can be customized:

  • start and end date and time

  • 1 to 6 SyncIQ policies on the same graph



How to use the Average Data Transfer Rate Analysis?

This provides an overall view of the average WAN rate in Mbps that is required to maintain the RPO shown in the reports.  If the goal is to lower the RPO, this number can give the WAN or network team input on the network load currently required to meet current RPO levels.

Network QoS or SyncIQ threads per node can be reviewed and adjusted to increase network throughput.  Use this report summary to track improvements in WAN throughput.


Per SyncIQ and Cluster Wide Graph and data Examples

Example #1 of 30 day Per SyncIQ graph

Note:  The file name is <name of the SyncIQ policy>.png




Example #2 of 30 day Per SyncIQ graph

Note:  The file name is <name of the SyncIQ policy>.png





Example Cluster Wide RPO CSV file data


Example Per SyncIQ RPO CSV file data


Advanced Settings

Run SyncIQ Job Report On-Demand

To run the SyncIQ Job Report On-Demand:

  1. ssh to the Eyeglass appliance

  2. Login as the admin user

  3. Enter the command

igls adv runreports

The time at which the command is run is used as the reference time for the report and its associated calculations.

Disable Screenshots for SyncIQ Job Report

To disable screenshots for the SyncIQ Job Report:

  1. ssh to the Eyeglass appliance

  2. Login as the admin user

  3. Enter the command

igls adv skipscreenshots set --skip=true

(To enable screenshots: igls adv skipscreenshots set --skip=false)

Modify SyncIQ Job Report Schedule

Standard Schedule

To change the SyncIQ Job Report Schedule to a standard schedule ( 1M 2M 3M 4M 5M 6M 10M 15M 20M 30M 1H 2H 3H 4H 6H 8H 12H 1D 7D 31D):

Note: Default schedule is 1D (once every 24 hours at midnight)

  1. ssh to the Eyeglass appliance

  2. Login as the admin user

  3. Enter the command

igls admin schedules set --id InventoryReport --interval <interval>

Example

igls admin schedules set --id InventoryReport --interval 12H
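When scripting this change, it can help to check the interval against the list of standard values before building the command. The helper below is a hypothetical convenience, not part of Eyeglass; it only assembles the command string shown above and does not run it.

```python
# Standard schedule values listed in this section.
STANDARD_INTERVALS = set(
    "1M 2M 3M 4M 5M 6M 10M 15M 20M 30M 1H 2H 3H 4H 6H 8H 12H 1D 7D 31D".split()
)


def schedule_command(interval):
    """Return the igls command for a standard interval, or raise ValueError."""
    if interval not in STANDARD_INTERVALS:
        raise ValueError(f"{interval!r} is not a standard schedule interval")
    return f"igls admin schedules set --id InventoryReport --interval {interval}"


print(schedule_command("12H"))
```

An interval that is not in the list, such as 9H, raises an error here; a schedule like that would need the custom schedule procedure instead.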


Custom Schedule

To change the SyncIQ Job Report Schedule to a custom schedule (for example, to change schedule to run at 09:00 every day)

  1. ssh to the Eyeglass appliance

  2. sudo su to root user (default admin password 3y3gl4ss)

  3. cd /opt/superna/sca/data

  4. make a backup of the file we are going to edit

    1. cp sync.xml sync.xml.bak

  5. vi sync.xml

  6. Update the line that starts with the tag <InventoryReport so that the cron string matches the schedule on which you would like the report to run.  The example below runs the report daily at 09:00

    1. <InventoryReport IsConfigurable="true" Label="Eyeglass Reports">0 9 * * *</InventoryReport>

  7. Save your changes

  8. Restart the Eyeglass sca service

    1. systemctl restart sca


Modify SyncIQ Job Troubleshooting Thresholds

SyncIQ Job Troubleshooting Thresholds are the factors used when comparing SyncIQ Job Report data to the 30 day average data to determine whether a Troubleshooting notification should be posted.  There are 2 thresholds configured:

  1. Transfer Rate Threshold

The 24 hour Transfer Rate Troubleshooting notice is posted when the 24 hour Transfer Rate is less than the 30 day Average Transfer Rate / Transfer Rate Threshold.  By default the Transfer Rate Threshold is 2.  Thus the SyncIQ Job Troubleshooting notice is posted when the 24 hour Transfer Rate is less than 50% of the average Transfer Rate over the last 30 days.

  2. Interval Threshold

The 24 hour Interval Troubleshooting notice is posted when the 24 hour Interval is greater than the 30 day Average Interval * Interval Threshold.  By default the Interval Threshold is 2.  Thus SyncIQ Job Troubleshooting notice is posted when the 24 hour Interval is more than double the average Interval over the last 30 days.

To change the Troubleshooting Thresholds:

  1. ssh to the Eyeglass appliance

  2. sudo su to root user (default admin password 3y3gl4ss)

  3. cd /opt/superna/sca/data

  4. make a backup of the file we are going to edit

    1. cp system.xml system.xml.bak

  5. vi system.xml

  6. To update the Transfer Rate Threshold, edit the line that starts with the tag <transferatethld>

Example below to change to 3 - meaning Troubleshooting message only posted when Transfer Rate is less than ⅓ of the average 30 day Transfer Rate

<transferatethld>3</transferatethld>

  7. To update the Interval Threshold, edit the line that starts with the tag <intervalthld>

Example below to change to 5 - meaning Troubleshooting message only posted when Interval is greater than 5 times the average 30 day Interval

<intervalthld>5</intervalthld>

  8. Save your changes

  9. Restart the Eyeglass sca service

    1. systemctl restart sca
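After editing, the threshold values can be read back to confirm the change. The sketch below parses a small XML fragment; the <system> wrapper element and the overall file layout are assumptions, since the real structure of system.xml is not shown in this guide. Only the two tag names come from the steps above.

```python
import xml.etree.ElementTree as ET


def read_thresholds(xml_text):
    """Return the two troubleshooting thresholds found anywhere in the XML."""
    root = ET.fromstring(xml_text)
    values = {}
    for tag in ("transferatethld", "intervalthld"):
        element = next(root.iter(tag), None)  # first match at any depth
        has_value = element is not None and element.text
        values[tag] = float(element.text) if has_value else None
    return values


# Hypothetical fragment: the <system> root is an assumption for this example.
sample = """<system>
  <transferatethld>3</transferatethld>
  <intervalthld>5</intervalthld>
</system>"""

thresholds = read_thresholds(sample)
print(thresholds)  # {'transferatethld': 3.0, 'intervalthld': 5.0}

# With a 30 day average rate of 90 Mb per second, a notice is posted below:
print(90 / thresholds["transferatethld"])  # 30.0
```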


Modify SyncIQ Job Report Time Range

SyncIQ Job Report Time Range is the number of hours of SyncIQ Job Report data that is analysed, counting back from the time the report is run. By default, the last 24 hours from the time the report was run are analysed.  To customize the Report Time Range:

  1. ssh to the Eyeglass appliance

  2. sudo su to root user (default admin password 3y3gl4ss)

  3. cd /opt/superna/sca/data

  4. make a backup of the file we are going to edit

    1. cp system.xml system.xml.bak

  5. vi system.xml

  6. Edit the line that starts with the tag <reporttimerange>

Example below to change the Report Time Range to 12 hours (value in hours)

<reporttimerange>12</reporttimerange>

  7. Save your changes

  8. Restart the Eyeglass sca service

    1. systemctl restart sca
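To see which period a given Report Time Range covers, the window can be worked out from the report run time. This is a minimal sketch of the description above, with a hypothetical function name and a default of 24 hours.

```python
from datetime import datetime, timedelta


def report_window(run_time, report_time_range_hours=24):
    """Return (start, end) of the period analysed by a report run at run_time."""
    return run_time - timedelta(hours=report_time_range_hours), run_time


# A report run at midnight with the Report Time Range set to 12 hours.
start, end = report_window(datetime(2016, 12, 16, 0, 0), report_time_range_hours=12)
print(start, "to", end)  # 2016-12-15 12:00:00 to 2016-12-16 00:00:00
```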