One of the most ignored but most important pieces of your IT infrastructure is backup and disaster recovery (DR). It is often overlooked or pushed to the back burner because it costs money and you don't see an immediate return. Because the apparent ROI is low, it becomes a low priority when it should be one of the highest.
Despite the use of multiple SVMs, there remains a singular point of vulnerability - the storage system. In the event of a catastrophe such as an asteroid strike on the datacentre, the entire cluster could become inoperable, and your clients would be left without service unless you have made provisions for disaster recovery.
NetApp tackles tenant DR by using SVM-DR
Storage Virtual Machines (SVMs) run Data ONTAP and act as tenants in a cluster, representing individual divisions, companies, or test/prod environments. Because the storage system underneath them is still a single point of failure, one disaster can take every tenant down at once and cause significant downtime for clients. This is where SVM-DR comes into play, allowing for disaster recovery at a granular SVM level.
SVM-DR does the following:
- Leverages NetApp SnapMirror to replicate data to a secondary site and the new Configuration Replication Service (CRS) application to replicate SVM configuration, including CIFS/SMB shares, network information, NFS exports, etc.
- Allows two versions of SVM DR: Identity Preserving and Identity Discarding.
  - Identity Preserving replicates the primary SVM's configuration and allows for a change to that identity in a failover scenario. This is useful for DR on the same physical campus/site (two separate buildings).
  - Identity Discarding allows for a different network configuration on a secondary SVM and brings it online as its own identity. This is useful for DR to a different geographical location in the world.
SVM-DR works by creating an SVM DR relationship/schedule, initialising the SnapMirror, ensuring updates are successful, and testing DR. In a test or real failover scenario, the SnapMirror is broken to allow for R/W operations. Depending on the identity type, the old identity is either preserved or discarded, and the SVM DR destination goes from dp-destination to default. Once the source site is back up, a resync/flip-resync occurs, where data written to the DR destination gets synced back to the source to ensure a current copy of data and configuration.
How it works
The flow of operation in SVM-DR is essentially:
- Create SVM DR relationship/schedule
- Initialise the SnapMirror
- Ensure updates are successful
- Test DR
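To make the first three steps a little more concrete, here is a minimal Python sketch that only builds the ONTAP CLI command strings in order. It is an illustration, not a tested runbook: the function name, the SVM names and the `hourly` schedule are made up for the example, and it assumes the DR SVM already exists as a dp-destination SVM and that the two SVMs are peered. The `snapmirror` commands and flags are written as I understand the ONTAP CLI, so check them against the command reference for your release. The DR test itself (the break) is left out here.

```python
def svm_dr_setup_commands(source_svm, dr_svm, schedule="hourly", identity_preserve=True):
    """Return ONTAP CLI command strings for setting up SVM-DR, in order.

    Hypothetical helper: it only formats strings and never talks to a cluster.
    """
    # A trailing colon with no volume name refers to the whole SVM.
    src = f"{source_svm}:"
    dst = f"{dr_svm}:"
    preserve = "true" if identity_preserve else "false"

    return [
        # 1. Create the SVM DR relationship and attach a schedule.
        f"snapmirror create -source-path {src} -destination-path {dst} "
        f"-identity-preserve {preserve} -schedule {schedule}",
        # 2. Initialise the SnapMirror (baseline transfer).
        f"snapmirror initialize -destination-path {dst}",
        # 3. Trigger an update, then check that it succeeded.
        f"snapmirror update -destination-path {dst}",
        f"snapmirror show -destination-path {dst}",
    ]


# Example with made-up SVM names.
for command in svm_dr_setup_commands("svm1", "svm1_dr"):
    print(command)
```

Passing `identity_preserve=False` switches the same sequence to the Identity Discarding variant; nothing else in the order of operations changes.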
When we test (or do a real failover) to DR, the following happens:
- SnapMirror break; break means we can now do R/W operations
- SnapMirror goes from SnapMirrored to broken-off
- Depending on identity type, we either preserve or discard old identity
- SVM DR destination goes from dp-destination to default
- Once the source site is back up, we can do a resync/reverse-resync
- With the reverse-resync, data written to the DR destination gets synced back to the source to ensure we have a current copy of data and config; this uses a new SVM DR relationship
- After we’re synced up, the original SVM-DR relationship is re-established
- The reverse resync SnapMirror gets broken off and removed
- SVM-DR destination changes from default to dp-destination
- SnapMirror goes from broken-off to SnapMirrored
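The state changes in the failover and failback steps above can be sketched as a tiny state machine. This is a toy model in Python, not an ONTAP API: the class, its method names and the `reverse_relationship` flag are invented for illustration, and only the state names (SnapMirrored, broken-off, dp-destination, default) come from the steps listed here.

```python
from dataclasses import dataclass


@dataclass
class SvmDrPair:
    """Toy model of one SVM-DR relationship (hypothetical, for illustration)."""
    mirror_state: str = "snapmirrored"    # state of the original source -> DR mirror
    dr_subtype: str = "dp-destination"    # subtype of the DR SVM
    reverse_relationship: bool = False    # temporary DR -> source relationship

    def failover(self):
        """SnapMirror break: the DR SVM becomes read/write."""
        if self.mirror_state != "snapmirrored":
            raise RuntimeError("relationship is not in the snapmirrored state")
        self.mirror_state = "broken-off"
        self.dr_subtype = "default"

    def reverse_resync(self):
        """Source is back: sync DR changes to the source over a new relationship."""
        if self.mirror_state != "broken-off":
            raise RuntimeError("fail over before running a reverse-resync")
        self.reverse_relationship = True

    def failback(self):
        """Remove the reverse relationship and re-establish the original one."""
        if not self.reverse_relationship:
            raise RuntimeError("run a reverse-resync before failing back")
        self.reverse_relationship = False
        self.dr_subtype = "dp-destination"
        self.mirror_state = "snapmirrored"


# Walk through one full test cycle and print the state after each step.
pair = SvmDrPair()
for step in (pair.failover, pair.reverse_resync, pair.failback):
    step()
    print(step.__name__, pair)
```

The guards in each method mirror the ordering in the list: you cannot reverse-resync a relationship that was never broken, and you cannot fail back until the reverse relationship exists.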