
Dell EMC SC Series Synchronous Replication and Live Volume


Structure

  • 1.1 Features of SC Series synchronous replication
  • 1.2 Synchronous replication requirements
  • 2.1 Replication methods
  • 3.1 Modes of operation
  • 3.2 Minimal recopy
  • 3.3 Asynchronous replication capabilities
  • 3.4 Multiple replication topologies
  • 3.5 Live Volume
  • 3.6 Dell Storage Manager recommendations
  • 3.7 Dell Storage Manager DR recovery
  • 3.8 Support for VMware vSphere Site Recovery Manager
  • 4.1 Overview
  • 4.2 High consistency
  • 4.3 High availability
  • 4.4 Remote database replicas
  • 4.5 Disaster recovery
  • 5.1 Reference architecture
  • 5.2 Proxy data access
  • 5.3 Live Volume ALUA
  • 5.4 Live Volume connectivity requirements
  • 5.5 Replication and Live Volume attributes
  • 6.1 Primary and secondary Live Volume
  • 7.1 MPIO policies for Live Volume
  • 8.1 Path Selection Policies (PSP)
  • 8.2 Round Robin with Live Volume ALUA
  • 8.3 Fixed
  • 8.4 Single-site MPIO configuration
  • 8.5 Multi-site MPIO configuration
  • 8.6 VMware vMotion and Live Volume
  • 8.7 vSphere Metro Storage Cluster
  • 8.8 Live Volume automatic failover
  • 8.9 vMSC storage presentation
  • 8.10 Tiebreaker service
  • 8.11 Common automatic failover scenarios
  • 8.12 Detailed failure scenarios
  • 8.13 Live Volume automatic restore
  • 8.14 VMware DRS/HA and Live Volume
  • 8.15 vSphere Metro Storage Cluster and Live Volume requirements
  • 8.16 VMware and Live Volume managed replication
  • 9.1 MPIO
  • 9.2 Round Robin
  • 9.3 Round Robin with Subset (ALUA)
  • 9.4 Windows Server support limitations with Live Volume ALUA
  • 9.5 Failover Only
  • 9.6 Uniform server mappings with Live Volume and Round Robin
  • 9.7 Hyper-V and Live Volume
  • 9.8 SCVMM/SCOM and Performance and Resource Optimization (PRO)
  • 9.9 Live Volume and Cluster Shared Volumes
  • 9.10 Live Volume automatic failover for Microsoft
  • 9.11 Live Volume with SQL Server
  • 10.1 Live Volume and Synchronous Replication
  • 10.2 Live Volume managed replication
  • 10.3 Live Volume automatic failover
  • 10.4 Live Volume and Linux MPIO
  • 10.5 Live Volume with ALUA
  • 10.6 Identify parent SC Series arrays for Linux storage paths
  • 10.7 Use cases
  • 11.1 Zero-downtime SAN maintenance and data migration
  • 11.2 Storage migration for virtual machine migration
  • 11.3 Disaster avoidance and disaster recovery
  • 11.4 On-demand load distribution
  • 11.5 Cloud computing
  • 11.6 Replay Manager and Live Volume
  • A.1 Related resources

Content

Features of SC Series synchronous replication

Mode migration: Existing replications may be migrated to an alternate type without rebuilding the replication or reseeding data.

Live Volume support: Live Volumes may leverage any available type of replication offered with SC Series storage, including both modes of synchronous (high consistency or high availability) and asynchronous.

Live Volume managed replication: Live Volume allows an additional synchronous or asynchronous replication to a third SC Series array that can be DR activated using Dell™ Storage Manager (DSM).

In the case of an unexpected outage affecting the primary Live Volume, users can manually promote the secondary Live Volume to take over as the primary using DSM. This process ensures continuity of service and minimizes downtime during critical situations.

Live Volume automatic failover: In the event an unplanned outage occurs impacting availability of a primary Live Volume, the secondary Live Volume can be promoted to the primary Live Volume role automatically.

Live Volume automatic restore: After Live Volume automatic failover has occurred, Live Volume pairs may be automatically repaired after the impacted site becomes available.

Synchronous replication requirements

Replicating volumes between SC Series systems requires a combination of software, licensing, storage, and fabric infrastructure. The following sections itemize each requirement.

Dell™ Storage Manager (DSM) 2018 or newer is required to leverage all available replication and Live Volume features.

Dell Storage Center OS (SCOS) 7.3 or newer is required to leverage all available replication and Live Volume features.

Replication licensing, encompassing synchronous and asynchronous replication, is required for every SC Series array involved in volume replication. Furthermore, a Live Volume license is necessary for utilizing all Live Volume features. Notably, Dell EMC SC All-Flash storage arrays, including the SC5020F and SC7020F, come with replication and Live Volume licensing included.

SC Series systems enable array-based replication through Fibre Channel (FC) or iSCSI connectivity, without necessitating a dedicated network; however, implementing isolation for enhanced performance or security is advisable. Synchronous replication demands higher bandwidth and lower latency than asynchronous replication, as applications and end users are sensitive to the effects of increased latency.

Data replication is a crucial strategy for ensuring data protection and availability, driven by the need to manage significant data growth, reduce backup windows, and enhance disaster recovery solutions. As traditional backup methods became less effective due to increasing data volumes and availability challenges, the demand for continuous data protection (CDP) emerged, particularly in the context of e-commerce and high transaction rates. Replication serves multiple purposes, including disaster recovery, high availability, minimizing transaction loss, and providing a flexible environment for development and testing. Ultimately, effective data protection is vital for safeguarding an organization's reputation by ensuring the security of end-user data.

Replication methods

There are two prominent replication methods widely recognized today: asynchronous and synchronous. The SC Series arrays offer a versatile range of replication techniques that fall under these two categories.

Synchronous replication ensures zero data loss and data consistency by requiring that write I/O commitments are made at both the source and destination before a successful write acknowledgment is sent to the storage host and application. If a write cannot be committed at either location, it will not proceed, maintaining consistency. In the event of a write failure, an error is communicated back to the storage host, allowing application error handling to determine the next steps for the pending transaction. This method alone provides continuous data protection (CDP), and when combined with hardware redundancy, application clustering, and failover resiliency, it achieves continuous availability for applications and data.

Synchronous replication ensures data consistency, but any issues affecting the source or destination storage, or the replication link, can lead to increased latency and reduced availability for applications. This also applies to Live Volumes utilizing synchronous replication. Therefore, it is crucial to properly size the performance of both source and destination storage, as well as the replication bandwidth and any upstream infrastructure that supports the storage.

Figure 1 demonstrates the write I/O pattern sequence with synchronous replication:

1. The application or server sends a write request to the source volume.

2. The write I/O is mirrored to the destination volume.

3. The mirrored write I/O is committed to the destination volume.

4. The write commit at the destination is acknowledged back to the source.

5. The write I/O is committed to the source volume.

6. Finally, the write acknowledgement is sent to the application or server.

The process is repeated for each write I/O requested by the application or server.
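
The sequence above can be condensed into a short sketch. This is an illustrative model only (the Volume class and synchronous_write helper are hypothetical names, not an SC Series API); the essential property is that the application is acknowledged only after the write is committed at both arrays.

```python
# Illustrative model of the synchronous write path (steps 1-6 above).
# Volume and synchronous_write are hypothetical names, not an SC Series API.

class Volume:
    def __init__(self, name):
        self.name = name
        self.blocks = []

    def commit(self, data):
        self.blocks.append(data)  # simulate a committed write
        return True

def synchronous_write(source, destination, data):
    """Acknowledge the application only after BOTH commits succeed."""
    if not destination.commit(data):   # steps 2-4: mirror, commit, acknowledge
        # A failed destination commit fails the whole write; the application's
        # error handling decides what happens to the pending transaction.
        raise IOError("write not committed at destination")
    source.commit(data)                # step 5: commit at source
    return "ack"                       # step 6: acknowledge to the application

src, dst = Volume("source"), Volume("destination")
synchronous_write(src, dst, b"transaction-001")
```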

Asynchronous replication achieves data protection by replicating data from source storage to destination storage, but it differs from synchronous replication in the method and frequency of this process. In asynchronous replication, writes are committed only at the source, and an acknowledgment is sent to the storage host and application. The committed writes are then accumulated and replicated to the destination volume in batches at scheduled intervals, where they are subsequently committed.

Asynchronous replication in SC Series storage is linked to the source volume's replication schedule, allowing new snapshots created on the source volume to be replicated to the destination volume. Snapshots can be generated automatically based on a schedule or manually through various integration tools, occurring on a per-volume basis. This enables volumes to follow independent replication schedules or share schedules with others using the same snapshot profile. Known as point-in-time replication, this method leverages volume snapshots and ensures that asynchronously replicated transactions do not experience delays from write committals at the destination, thereby preventing application or transaction latency at the source volume.

Figure 2 demonstrates the write I/O pattern sequence with asynchronous replication:

1. The application or server sends a write request to the source volume.

2. The write I/O is committed to the source volume.

3. Finally, the write acknowledgement is sent to the application or server.

The process is repeated for each write I/O requested by the application or server.

4. Periodically, a batch of write I/Os that have already been committed to the source volume are transferred to the destination volume.

5. The write I/Os are committed to the destination volume.

6. A batch acknowledgement is sent to the source.
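
By contrast, a minimal sketch of the asynchronous path shows the acknowledgment happening before any replication; everything still sitting in the pending batch is the exposure if the source is lost. AsyncReplica and its methods are hypothetical names used only for illustration.

```python
# Illustrative model of the asynchronous write path: the application is
# acknowledged immediately (steps 1-3), while committed writes accumulate
# and ship to the destination in batches (steps 4-6). Hypothetical names.

class AsyncReplica:
    def __init__(self):
        self.source_blocks = []
        self.destination_blocks = []
        self.pending = []        # committed at source, not yet replicated

    def write(self, data):
        self.source_blocks.append(data)   # step 2: commit at source
        self.pending.append(data)
        return "ack"                      # step 3: acknowledge immediately

    def replicate_batch(self):
        # Steps 4-6: ship the accumulated batch and commit it remotely.
        # Whatever remains in self.pending at any instant is the potential
        # data loss (RPO exposure) if the source fails before the next batch.
        self.destination_blocks.extend(self.pending)
        self.pending.clear()

replica = AsyncReplica()
replica.write(b"txn-1")
replica.replicate_batch()
```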

The SC Series storage features semi-synchronous replication, which allows application transactions to be sent to the replication destination nearly instantly, provided that the replication link and destination storage can handle the current data change rate. Unlike synchronous replication, write I/O is confirmed at the source volume, sending an acknowledgment to the storage host and application without ensuring the write I/O is committed at the destination. This replication method is set up in Dell Storage Manager by establishing asynchronous replication between two volumes and enabling the Replicate Active Snapshot option. An Active Snapshot represents newly written or updated data that has not yet been frozen. While semi-synchronous replication provides a recovery point objective (RPO) similar to synchronous replication without adding application latency, it cannot guarantee RPO or prevent data loss during unplanned outages.

Figure 3 demonstrates the write I/O pattern sequence with semi-synchronous replication:

1. The application or server sends a write request to the source volume.

2. The write I/O is committed to the source volume.

3. The write acknowledgement is sent to the application or server.

The process is repeated for each write I/O requested by the application or server.

Each write I/O operation involves a parallel process that includes sending the write request to the destination, committing the write I/O at the destination, and sending a write acknowledgment of the mirror copy back to the source array.

The commits at the source and destination volumes are not guaranteed to be in lockstep with each other.

Semi-synchronous replication write I/O sequence
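
A minimal sketch of this pattern (hypothetical names, illustrative only): the source commit and application acknowledgment proceed without waiting, while the mirror to the destination runs in parallel and may lag.

```python
# Illustrative model of semi-synchronous replication: the acknowledgment
# does not wait for the mirror, so destination commits may lag the source.

import threading

def semi_synchronous_write(source, destination, data):
    source.append(data)                 # commit at source
    mirror = threading.Thread(target=destination.append, args=(data,))
    mirror.start()                      # mirror in parallel, no waiting
    return "ack"                        # acknowledge without a destination commit

src, dst = [], []
semi_synchronous_write(src, dst, b"txn")   # dst may briefly trail src
```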

SC Series storage supports a wide variety of replication features. Each feature is outlined in the following sections.

Modes of operation

Recent advancements in synchronous replication for SC Series arrays include the ability to select replication modes on a per-volume basis. Users can configure synchronous replication in either high consistency or high availability mode, allowing for tailored behavior based on specific needs.

Synchronous replications established before SCOS 6.3 are classified as legacy after upgrading to SCOS 6.3 or later; new legacy replications cannot be created, and existing ones cannot utilize the new features available in the updated version. To transition a legacy synchronous replication to either synchronous high consistency or synchronous high availability replication, it is necessary to delete the legacy replication and recreate it after ensuring that both source and destination SC Series arrays are running SCOS 6.3 or newer. It is important to note that this process will lead to temporary data inconsistency between the replication source and destination volumes until the initial and journaled replication is fully completed.

Synchronous high consistency mode adheres strictly to the storage industry's specifications for synchronous replication, ensuring data consistency between source and destination volumes unless replication is paused by an administrator. Latency can adversely affect applications if the replication link or destination volume cannot keep up with the data being replicated. If write transactions cannot be committed to the destination volume, they will also fail on the source volume, potentially leading to application failures when a threshold of write failures is reached. Therefore, addressing application latency and ensuring high availability are crucial considerations in storage designs that implement synchronous replication in high consistency mode.

Synchronous high availability mode modifies traditional synchronous replication by easing the strict requirements of high consistency mode. When the replication link and the destination storage can handle the write throughput, this mode operates similarly to high consistency mode, ensuring that data is consistently committed at both the source and destination volumes. However, any additional latency in the replication link or destination storage will manifest as application latency at the source volume.

High availability mode prioritizes data availability over data consistency: if the replication link or destination storage becomes unavailable or experiences high latency, the SC Series array removes the dual write commitment requirement at the destination volume. This allows application write transactions to proceed without delay, in contrast to high consistency mode, where write I/O would be halted or slowed. When the SC Series array enters this out-of-date state, inconsistent write I/O is journaled at the source volume. Once the destination volume is accessible within an acceptable latency threshold, the journaled I/O is flushed and committed, ensuring that both volumes are synchronized and consistent. After this process, application latency at the source volume returns to normal, highlighting the importance of balancing application latency and data consistency in designs that utilize synchronous replication in high availability mode.

High availability mode synchronous replication in an out-of-date state

In SCOS 6.5 and newer, users can seamlessly migrate replications between different modes, such as from asynchronous to synchronous high consistency, or from synchronous high availability to asynchronous, without the need to destroy and rebuild the replication and destination replica volumes. This mode migration feature not only saves significant time and replication bandwidth but also reduces risk to data availability during the transition. Additionally, it preserves predefined disaster recovery settings in Dell Storage Manager linked to restore points and replica volumes. For these compelling reasons, leveraging this feature is highly recommended.

Note: This feature is compatible with all replication modes except legacy synchronous replication.

Minimal recopy

Synchronous replications in high availability mode permit write access to the source volume if the destination volume is unavailable or lagging. A journaling mechanism tracks write I/O to maintain consistency between the source and destination volumes. Prior to SCOS 6.3, legacy replication required complete re-replication of the source data when the destination became available after being offline. However, the minimal recopy feature now enables replication of only the changed data from the journal, significantly reducing recovery time, minimizing data inconsistency risks, and conserving replication link bandwidth. This feature is also utilized in high consistency mode if the destination volume is unavailable during initial synchronization or if replication is paused by an administrator.

Flushing journaled writes to the destination volume to regain volume consistency
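
The core of minimal recopy can be sketched in a few lines. This is a conceptual illustration under assumed names (a journal modeled as a set of logical block addresses), not SC Series firmware: while out of date, only the addresses of changed extents are journaled, and resynchronization replays just those extents instead of the whole volume.

```python
# Conceptual sketch of minimal recopy: journal changed extents while the
# destination is unreachable, then resync only those extents. Illustrative.

journal = set()      # logical block addresses written while out of date

def write_out_of_date(source, lba, data):
    source[lba] = data
    journal.add(lba)             # record the change, not the whole volume

def resync(source, destination):
    for lba in sorted(journal):  # flush only the journaled extents
        destination[lba] = source[lba]
    journal.clear()              # source and destination consistent again

src, dst = {}, {}
write_out_of_date(src, 42, b"new-data")
resync(src, dst)
```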

Asynchronous replication capabilities

Synchronous replication has seen numerous improvements over time and now includes key features that were previously associated only with asynchronous replication.

SC Series synchronous replication now supports replication of snapshots, allowing all snapshot data to be automatically replicated from the source to the destination. This improvement provides customers with greater flexibility in recovery options, as they can access multiple historical restore points. Additionally, the integration of snapshot functionality with synchronous replication and consistency groups ensures snapshot interval consistency across replicated volumes. In high consistency mode, this guarantees snapshot consistency.

In high availability mode, snapshot consistency is highly likely.

Note: Consistent snapshots may be created for asynchronous and synchronous replications. However, consistent snapshots are not supported with Live Volumes.

Synchronous replications set to high consistency or high availability mode can be paused without affecting the availability of applications dependent on the source volume. Pausing can help reduce bandwidth utilization on the replication link, allowing other processes to take priority when bandwidth is shared. Additionally, pausing replication is beneficial when anticipating scheduled outages of the replication link or fabric.

Multiple replication topologies

Dell enhances synchronous replication capabilities with support beyond just two SC Series volumes located at the same or different sites. Users can choose from two distinct topologies or opt for a hybrid combination of both for greater flexibility.

The mixed topology, or 1-to-N configuration (with N=2 in SCOS 6.5), enables a source volume to be replicated to two destination volumes, allowing one replication to be either synchronous or asynchronous while the other remains asynchronous. This setup, limited by the value of N, is useful for safeguarding data across multiple locations, providing flexible recovery options when data restoration is required.

If the source volume of a replication becomes unavailable, replication stops.

For recovery purposes, the replica can be activated and mapped by Dell Storage Manager to a storage host (for instance, at a disaster recovery site).

In a mixed topology, a replica volume can replicate to another replica, either asynchronously or synchronously, without the need to reseed most of the data that both volumes already possessed before the original source volume became unavailable. This setup is particularly beneficial when multiple disaster recovery sites are in place.

After DR activation, a replica volume can be replicated to another replica with efficiency

The cascade topology facilitates the chaining of asynchronous replications to synchronous or asynchronous replication destination volumes, ensuring immediate reprotection for recovery sites. This flexible approach, akin to mixed topology, allows for diverse data recovery and business continuity options, whether within the same data center or at a remote location. Common applications include creating replicas of Microsoft® SQL Server® or Oracle® databases for parallel testing, development, or QA environments.

A hybrid topology can also be created by combining the mixed and cascade topology types. This configuration is adaptable to virtually any replica or data protection need a business may have.

Live Volume

The Live Volume feature, detailed further in this document, relies on replication technology. In SCOS versions before 6.5, it was only compatible with asynchronous replication. Starting with SCOS 6.5, Live Volume works with both asynchronous and synchronous replication.

In addition, Live Volume supports many of the current synchronous replication features, such as modes of operation and mode migration.

In SCOS 6.5 and newer, data recovery from a secondary Live Volume is enhanced, offering increased speed, ease, and flexibility when the primary Live Volume is unavailable. Users can promote secondary Live Volumes to the primary role, maintaining volume identity and storage host mappings, or alternatively, recover data by creating a new View Volume and mapping it to one or more storage hosts.

SCOS 6.7 introduced automatic failover for Live Volumes, streamlining the recovery process during unplanned outages. This feature operates similarly to the Preserve Live Volume function but is fully automated, enabling rapid recovery within seconds and at scale.

Live Volume automatic failover ensures high availability for configured volumes during unplanned outages, but the risk to volume availability increases if another unexpected event occurs. If the initial outage is minor and the site can be restored, the automatic restore feature will return the Live Volume to a redundant state without requiring administrator intervention. It is important to note that a Live Volume role swap does not take place during this process; a secondary Live Volume that became the primary during automatic failover will continue to function as the primary after the restore.

Live Volume managed replication is a supplementary replication method that utilizes the primary Live Volume as its source. This replication can be synchronous or asynchronous, based on the configuration of the Live Volume. To ensure data integrity and consistency, the Live Volume managed replication continues to follow the primary Live Volume through role swaps or failovers.

Live Volume managed replication before and after swap role or failover

Dell Storage Manager recommendations

Dell Storage Manager periodically checks the status of replication and records its progress toward completeness.

In the event of a source site failure, DSM recommends whether the destination replica can safely be used for recovery. With high consistency synchronous replication, data between the source and destination remains consistent, so recovery from the replica is considered safe.

With high availability synchronous replication, the consistency of data between source and destination volumes depends on the replication status at the time of failure. If replication is in sync when the failure occurs, the destination replica volume is data consistent and DSM deems it safe for recovery.

If synchronous replication is out of date, journaled transactions on the source volume have not been replicated to the destination, leaving an inconsistent destination replica that is not recommended for use. In such cases, data recovery options include utilizing a data-consistent snapshot as the recovery point or continuing with the inconsistent replica. While the most recent transactions will be lost at the destination, recovering from a snapshot offers a precise, known point in time for recovery.

Dell Storage Manager DR recovery

Synchronous replication volumes are supported in the scope of the DSM predefined disaster recovery and DR activation features, allowing users to apply the same test and activation processes used with asynchronously replicated volumes to synchronously replicated volumes. Dell Storage Manager offers these features at no cost to SC Series customers, making it a valuable and economical solution for improving recovery time objectives. However, DR settings cannot be predefined for Live Volumes, and Live Volume restore points cannot be test activated.

Support for VMware vSphere Site Recovery Manager

Standard asynchronous or synchronous (either mode) replication types can be leveraged by VMware® vSphere® Site Recovery Manager (SRM) protection groups, recovery plans, and reprotection.

SRM version 6.1 introduced support for stretched storage with Live Volume in DSM 2016 R1, with deployment configurations detailed in the Dell EMC SC Series Best Practices with VMware Site Recovery Manager document. For additional insights on use cases and the integration of stretched storage with SRM, refer to the VMware Site Recovery Manager Administration documentation.

Data replication is a powerful tool that becomes truly effective when aligned with specific business use cases. This section explores various scenarios where synchronous replication can be utilized to achieve organizational objectives.

Overview

Array-based replication is essential for ensuring high availability and disaster recovery in upper-tier applications, enabling effective image or file-level backup and recovery. It also serves as a valuable development tool for creating data copies in near or remote locations for application development and testing. Asynchronous replication strikes a favorable balance between recovery point objective (RPO) and recovery time objective (RTO) service level agreements, making it a cost-effective solution that avoids the need for expensive infrastructure like dark fiber, additional networking hardware, or extra storage. Consequently, it is commonly employed between data centers over longer distances.

An increasing number of designs prioritize data loss prevention, with synchronous replication being the method that ensures zero transaction loss. The following sections explore examples of synchronous replication, emphasizing high consistency for complete data integrity and high availability for scenarios with more flexible data consistency needs.

High consistency

Synchronous replication is essential for preventing data loss and ensuring data consistency between the source and destination replica volumes. It offers significant data protection advantages for both proactive and reactive scenarios. For more detailed information on the operational characteristics of synchronous high consistency replication, refer to section 3.

Virtualized server workloads in data centers are encapsulated into a few files representing virtual BIOS, hardware resources, and disks, which facilitate data access. The I/O profile varies based on the virtual machine's role and the applications it hosts. Virtual machines excel in replication due to their portable and hardware-independent compute resources, allowing for seamless migration between sites with minimal effort. This mobility, coupled with storage replication, enables relocation for load balancing or disaster recovery while maintaining high consistency through synchronous replication. Consequently, when migrating vSphere or Microsoft® Hyper-V® virtual machines to a new host or cluster, data consistency is assured, ensuring that the virtual machine's contents remain synchronized at both the source and destination sites. For further details on disaster recovery, refer to section 4.5.

Dell Storage Manager does not support predefined disaster recovery (DR) plans with Live Volumes. However, it does allow predefined DR plans using standard asynchronous or synchronous volume replications, as well as managed cascading or hybrid asynchronous replications originating from a Live Volume.

High consistency synchronous with consolidated vSphere or Hyper-V sites

A replication link or destination volume issue in St Paul results in a VM outage in Minneapolis

4.2.2 Microsoft SQL Server and Oracle Database/Oracle RAC

Database servers and clusters in critical environments are engineered for high availability, large throughput, and low latency data access for application servers and users. Unlike virtual machines, the primary focus for database servers is the protection of database volumes rather than the operating system, which is less critical for data recovery. Booting from SAN and replicating that SAN volume to a compatible remote site can significantly improve recovery time objectives (RTO). In performance-oriented designs, critical data is often distributed across multiple volumes, necessitating application or instance isolation. High consistency mode ensures that, unless replication is paused, the write order at the destination mirrors that of the source, thereby maintaining consistency across volumes.

High consistency synchronous with databases

A replication link or destination volume issue in St Paul results in database outage in Minneapolis

In summary, high consistency use cases can effectively integrate with virtualization and database platforms, offering significant advantages such as data consistency and zero transaction loss. It is crucial that the infrastructure facilitating synchronous replication between sites operates at optimal performance levels. For high consistency scenarios, this infrastructure must be highly redundant and resilient to outages; otherwise, any slowness or downtime in the replication link or destination site will directly impact the source site and the end-user applications.

When choosing a replication type, it's crucial to consider that maintaining strong connectivity between two sites, especially over long distances, can be costly.

Stakeholders often express concerns about the potential impact on application availability due to failures at secondary sites or connection issues, leading them to prefer asynchronous replication. However, SC Series storage provides high availability synchronous replication, offering customers greater flexibility than traditional synchronous replication solutions.

High availability

Organizations often favor asynchronous replication due to its cost-effectiveness and reduced risk of application outages when destination storage becomes unavailable. SC Series arrays offer high availability synchronous replication, ensuring data consistency during normal operation. In the event of unexpected issues affecting the replication link or destination storage, production application connectivity at the source remains unaffected, providing a level of flexibility not typically associated with traditional synchronous replication. This approach combines the advantages of both synchronous and asynchronous methods, while SC Series storage automatically adjusts to changes in destination replica availability. For more information on high availability synchronous replication operational characteristics, refer to section 3.

Encapsulated virtual machines are replicated in a data-consistent manner similar to high consistency mode replication. However, if the replication link or destination replica volume experiences high latency or becomes unavailable, writes are committed and journaled at the source volume instead of failing. This allows applications to continue operating, albeit with a temporary loss of data consistency while the destination volume is inaccessible.

Using high availability mode instead of high consistency mode does not inherently enable designs to extend over greater distances without addressing application latency. High availability mode remains a type of synchronous replication, distinct from asynchronous replication. Increasing the distance between sites typically leads to higher latency, which will be evident in applications at the source side as long as high availability replication remains synchronized.

Finally, if virtual machines are deployed in a configuration that spans multiple volumes, consider using Replay Manager or consistency groups. Replay Manager is covered in section 11.6.

High availability synchronous with consolidated vSphere or Hyper-V sites

A replication link or destination volume issue in St Paul results in no VM outage in Minneapolis

Note: Consistent snapshots may be created for asynchronous and synchronous replications. However, consistent snapshots are not supported with Live Volumes.

4.3.2 Microsoft SQL Server and Oracle Database/Oracle RAC

The behavioral differences between high availability and high consistency modes are minimal until extreme latency or outages affect destination volume availability. When high availability synchronous replication becomes out of date, write I/O at the source volume is journaled, leading to inconsistency in the destination volume. Recovery from this inconsistency may or may not be acceptable, depending on the situation. Dell Storage Manager offers guidance on the safety of recovering from the active snapshot on the destination volume, assessing its data consistency level. If inconsistency is detected, it is recommended to revert to the most recent consistent snapshot associated with the destination volume, highlighting a newer synchronous replication capability: snapshot replication.

When dealing with storage hosts that have application data distributed across multiple volumes, such as virtual machines with various disk files or database servers with isolated data and logs, ensuring snapshot consistency is crucial for creating reliable restore points. To achieve this, it is essential to quiesce all volumes of a dataset and capture snapshots simultaneously, for instance at 8:00, to maintain data consistency across all related volumes. This can be effectively managed using Replay Manager, particularly for Microsoft products through VSS integration, or by organizing volumes into consistency groups.

To ensure consistency across snapshots, a snapshot profile with a Consistent Creation Method is established and applied to all relevant volumes containing the dataset. For virtual machines, this includes volumes for virtual disks such as C: and D: drives or Linux mount points like / and /tmp. In the case of Microsoft SQL Server, the volumes represent system databases, application databases, transaction logs, and tempdb. For Oracle databases, the dataset should encompass all volumes related to the database, including data, indexes, data dictionaries, temporary files, control files, and online redo logs, with the option to include offline redo logs and OCR files or voting disks for Oracle RAC. While separate volumes for hot dumps, archived redo logs, or boot from SAN may exist, they typically do not need to be included in the consistency group with the primary database files.

Creating a consistency group in Unisphere Central for SC Series

Note: Consistent snapshots may be created for asynchronous and synchronous replications. However, consistent snapshots are not supported with Live Volumes.
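
The quiesce-all-then-snapshot sequence that a consistency group performs can be sketched as follows. This is a conceptual illustration with stand-in classes (Vol, quiesced, and snapshot are hypothetical, not the DSM or Replay Manager API): writes are held on every volume in the dataset before the first snapshot is taken, so all snapshots share a single point in time.

```python
# Conceptual sketch of a consistent snapshot across a multi-volume dataset.
# Vol, quiesced, and snapshot are stand-ins, not a DSM or Replay Manager API.

from contextlib import contextmanager, ExitStack

class Vol:
    def __init__(self, name):
        self.name = name

    @contextmanager
    def quiesced(self):
        # freeze incoming writes ... then thaw on exit
        yield self

    def snapshot(self, label):
        return f"{self.name}@{label}"

def consistent_snapshot(volumes, label):
    with ExitStack() as stack:
        for vol in volumes:                 # hold writes on ALL volumes first...
            stack.enter_context(vol.quiesced())
        # ...so every snapshot in the set shares one point in time
        return [vol.snapshot(label) for vol in volumes]

print(consistent_snapshot([Vol("data"), Vol("log"), Vol("tempdb")], "0800"))
```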

Replay Manager is an effective solution for achieving application-consistent snapshots across multiple volumes, particularly beneficial for users of Microsoft Windows, SQL Server, Exchange, Hyper-V, or VMware vSphere. With its robust storage integration and VSS awareness, Replay Manager can create these snapshots and facilitate their replication in both synchronous and asynchronous modes.

After achieving data consistency across volumes with Replay Manager or consistency groups, the resulting snapshots are replicated to the destination volume. These snapshots act as historical restore points, facilitating high availability mode recovery, disaster recovery, and remote replicas, which are explored in the following sections.

High availability synchronous with databases

A replication link or destination volume issue in St Paul results in no database outage in Minneapolis

Remote database replicas

Organizations using Microsoft SQL Server or Oracle databases often create database clones to ensure minimal disruption to production systems and end users. These copies serve various purposes, such as providing a separate environment for application developers to test code, allowing DBA staff to evaluate index changes and troubleshoot performance issues, and supporting I/O intensive queries or reporting. Utilizing SC Series storage snapshots and View Volumes effectively addresses the need for local database replicas on the same SC Series array.

When storing a replica on a different array, whether in the same building or a different geographic region, it is essential to use replication or portable volume to seed data remotely and refresh it as needed. For developer or DBA testing, asynchronous replication may suffice, but synchronous replication is crucial for reporting to ensure the data is current when the reporting database is refreshed. It is important to decide in advance between high consistency mode for zero data loss and the more flexible high availability mode, fully understanding the implications of each option.

SC Series snapshots, along with asynchronous and synchronous replication, optimize space and bandwidth efficiency for storage and replication links. Only modified data is captured in a snapshot and replicated to remote SC Series arrays. The Minneapolis data center utilizes high availability synchronous replication to ensure data consistency, mitigating the risk of internal reporting database discrepancies without causing production outages for the organization.

Database replicas distributed in a mixed topology

Disaster recovery

As data footprints expand and the need for efficient backup solutions increases, organizations are increasingly migrating to online storage-based data protection strategies. Legacy tape-based processes, once seen as cost-effective, are being replaced by more affordable and efficient online storage options. Data replication, whether within or between sites, serves as the backbone for scalable data protection strategies, allowing various vendor tools to enhance recovery processes. The SC Series support for multiple replication topologies adds flexibility for disaster recovery, particularly for businesses with distributed site architectures. Understanding two key disaster recovery metrics is essential for effective business continuity planning.

The recovery point objective (RPO) defines the maximum acceptable data loss, expressed in time, from which data can be restored. It is established through negotiations with business units and is a critical component of a disaster recovery plan. To meet RPO targets, organizations must select the right type of data replication, ensure that replication is up to date, and be familiar with the necessary tools and processes for recovering data from the specified restore point.

The recovery time objective (RTO) is the maximum time permitted to restore a functional production environment after a disruption. Similar to the RPO, the RTO is established in collaboration with business units and included in disaster recovery plans and service level agreements (SLAs). Achieving the desired RTO can differ across data centers, but it fundamentally relies on process efficiency and the use of automation tools. Additionally, replication plays a critical role in meeting RTO targets, particularly in large-scale operations.

Utilizing replication allows organizations to effectively target aggressive RPOs and RTOs. While the data footprint and growth rate may increase steadily, achievable RPO and RTO goals remain stable as long as the replication infrastructure, comprising network, fabric, and storage, can scale to accommodate the volume of data being replicated and its rate of change, as the sizing sketch below illustrates.
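
A quick back-of-envelope check makes this concrete. The numbers here are assumptions chosen for illustration, not figures from this document: if the link cannot sustain the change rate, the replication backlog, and therefore the effective RPO, grows without bound.

```python
# Back-of-envelope replication sizing with assumed numbers: can the link
# sustain the data change rate that the RPO target depends on?

change_rate_gb_per_hour = 50    # assumption: average volume change rate
link_mbps = 200                 # assumption: usable replication bandwidth

required_mbps = change_rate_gb_per_hour * 8 * 1024 / 3600   # GB/h -> Mb/s
print(f"sustained need: {required_mbps:.0f} Mb/s of {link_mbps} Mb/s available")
# ~114 Mb/s: the link keeps up, so an asynchronous RPO is bounded by the
# snapshot/replication interval rather than by a growing backlog.
```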

Replicating the file objects that comprise a virtual machine leverages the inherent encapsulation and portability of VMs, allowing for seamless relocation to any compatible hypervisor. This enables rapid registration and activation of virtual machines, contrasting sharply with traditional disaster recovery methods that require extensive rebuilding and configuration at recovery sites. In the event of a disaster, virtual machines, along with their pre-configured applications, can be quickly integrated into the hypervisor's inventory and powered on, significantly reducing recovery time and helping to meet targeted RTOs. Furthermore, when these VMs are activated, they already contain the latest application data from the most recent replication, fulfilling the RPO of the disaster recovery plan.

Virtualization and replication combined meet aggressive RTOs and RPOs

4.5.2 Microsoft SQL Server and Oracle Database/Oracle RAC

In the recovery process of IT infrastructure, database servers are prioritized alongside critical components like Microsoft Active Directory®, LDAP, DNS, WINS, and DHCP. Classified as tier 1 assets, these database servers are the first to be restored in a disaster recovery (DR) plan, followed by application tier servers and finally the application front end, which can be accessed through client desktops or a load-balanced web portal.

RTO is a crucial metric in testing and implementing a live business continuity plan, where all steps in a DR plan are predefined and executed systematically, with some actions possibly occurring in parallel. A successful recovery of database servers is essential from the outset of the DR plan, as it enables the activation of application and web servers that are critically linked to these databases. The impact of a shared database server increases with the number of databases it hosts, as this leads to a wider effect on dependent applications and front-end tiers.

Industry analysis indicates that data is expanding rapidly across various sectors, presenting significant challenges in data protection despite advancements in technology. As data growth influences technological and strategic shifts, maintaining SLAs, RTOs, and RPOs becomes increasingly complex, especially since these metrics were often established when data volumes were much smaller. For instance, restoring 10 TB of data from tape may not meet a 24-hour RTO, as the reliance on tape storage results in longer seek times and increases the risk of recovery failure due to potential tape issues. Consequently, data replication emerges as a crucial strategy for achieving RTO targets amidst these challenges.

Intra-volume consistency is crucial in distributed virtual machine disk or database volume architectures. High consistency replication mode ensures data consistency across all replicated volumes at different sites. However, this comes with the trade-off of added application latency at the source site, and of application downtime if the destination volume becomes unavailable or exceeds latency thresholds.

For disaster recovery (DR) purposes, high availability mode is often more appealing than high consistency mode. It provides data consistency under ideal conditions and tolerates latency or unavailability at the destination during synchronization. However, consistency may be compromised while the destination volume is unavailable, although the uptime of production applications is not impacted.

To ensure consistency in high availability environments, utilize VSS-integrated Replay Manager snapshots or consistency groups with synchronous replication, especially in the multi-volume relationships typical of SQL Server and Oracle setups. Although this approach may not guarantee active snapshot consistency across all volumes, the subsequent frozen snapshots replicated to the remote array are expected to maintain consistency across those volumes.

4.5.3 Preparing and executing volume and data recovery

Using the right hypervisor, tools, and automation in a disaster recovery (DR) plan makes powering on a virtual machine straightforward. Additionally, setting up volumes and launching database servers is significantly faster than traditional methods, particularly when booting database servers from SAN (BFS) volumes on comparable hardware at the DR site.

When accessing volumes at the disaster recovery (DR) site, it is crucial to determine the purpose of the access: whether it is a validation test of the DR plan or a response to a declared disaster. Understanding this distinction is essential, especially when the site is an active replication destination target.

Destination volumes at the disaster recovery (DR) site cannot be mounted for read/write access to a storage host, irrespective of the asynchronous or synchronous mode or topology in use. For situations related to Live Volume, see section 5 for further details.

To effectively test a DR plan, present view volumes created from snapshots of each test volume to the storage hosts. Both asynchronous and synchronous replications support snapshots and view volumes in high consistency or high availability mode. These features are crucial during DR testing, as they ensure ongoing replication between source and destination volumes, thereby maintaining the RPO in the event of a real disaster occurring during the test. However, if a disaster is declared and the Activate Disaster Recovery feature is invoked, replication from the source to the destination must be halted, especially if the active volume at the destination is designated for data recovery in the DR plan.
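
The distinction between testing and activation can be sketched as follows. The classes and method names are stand-ins for illustration, not the DSM API: a test maps a view volume built from a snapshot so replication keeps running, whereas activation would stop replication and present the replica itself.

```python
# Conceptual sketch of a DR test: map a view volume built from a snapshot
# so source->destination replication (and the RPO) is undisturbed.
# Classes and methods are stand-ins, not the Dell Storage Manager API.

class Snapshot:
    def __init__(self, name):
        self.name = name

    def create_view_volume(self, label):
        return f"view:{self.name}:{label}"   # writable copy of the snapshot

class ReplicaVolume:
    def latest_snapshot(self):
        return Snapshot("frozen-0800")       # most recent consistent point

def dr_test(replica, host_mappings):
    view = replica.latest_snapshot().create_view_volume("dr-test")
    host_mappings.append(view)               # hosts test against the view,
    return view                              # never the live replica itself

print(dr_test(ReplicaVolume(), []))
```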

Proxy data access

An SC Series Live Volume consists of two replication-enabled volumes: a primary Live Volume and a secondary Live Volume. A Live Volume can be accessed through either array participating in its replication, but the primary Live Volume can only be active on one array at a time. All read and write operations for the Live Volume are handled by the array that hosts the primary Live Volume. If a server connects to the Live Volume via uniform or non-uniform paths to the secondary Live Volume array, the I/O requests are routed through the Fibre Channel or iSCSI replication link to the primary Live Volume system.

A mapped server utilizes proxy access to connect to a Live Volume through the secondary Live Volume system, reaching the primary Live Volume system. For acceptable performance, this proxy data access requires a replication link between the two arrays with sufficient bandwidth and low enough latency to meet the I/O and latency demands of the application data access.

Proxy data access through the Secondary Live Volume

Live Volume ALUA

The Live Volume Asymmetric Logical Unit Access (ALUA) feature, introduced in SCOS 7.3 and DSM 2018, adheres to the T10 SCSI-3 SPC-3 specification for Microsoft Windows, Hyper-V, and vSphere. This feature operates on a per-Live-Volume basis, designating MPIO paths to the primary Live Volume as optimized and those to the secondary Live Volume as non-optimized. When paired with an ALUA-aware Round Robin path selection policy (PSP), it ensures that optimized paths are used for read and write I/O when available, while non-optimized paths serve as backups. In the event of a Live Volume role swap or automatic failover, the SC Series array promptly communicates ALUA path state changes to the storage host. This feature facilitates the use of Round Robin PSP with Live Volume in both uniform and non-uniform storage presentations, offering easy deployment, minimal administrative overhead, and effective utilization of storage ports, fabric, and controllers.

Uniform Live Volume with ALUA and Round Robin PSP
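
The path selection behavior ALUA enables can be sketched in a few lines. This is an illustrative model of Round Robin restricted to the optimized subset (the path records are hypothetical examples, not actual host output): I/O rotates across active/optimized paths to the primary array and falls back to non-optimized paths only when no optimized path remains.

```python
# Illustrative model of Round Robin over the ALUA-optimized path subset.
# Path records are hypothetical examples, not actual host output.

from itertools import cycle

def usable_paths(paths):
    optimized = [p for p in paths if p["alua"] == "active-optimized"]
    # Fall back to non-optimized (secondary-array) paths only if needed.
    return optimized or [p for p in paths if p["alua"] == "active-non-optimized"]

paths = [
    {"name": "vmhba1:C0:T0:L1", "alua": "active-optimized"},      # primary array
    {"name": "vmhba2:C0:T0:L1", "alua": "active-optimized"},
    {"name": "vmhba1:C0:T1:L1", "alua": "active-non-optimized"},  # secondary array
]
round_robin = cycle(usable_paths(paths))    # I/O rotates across the subset
print(next(round_robin)["name"], next(round_robin)["name"])
```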

ALUA functionality can be activated on Live Volumes when both SC Series arrays are running SCOS 7.3 or later. For new Live Volumes, select the Report Non-optimized Paths option when creating the Live Volume.

Select Report Non-optimized Paths for the secondary Live Volume

For pre-existing Live Volumes, once both SC Series arrays are upgraded to SCOS 7.3, a banner is displayed allowing these Live Volumes to be upgraded to support ALUA capability.

Banner in DSM 2018 showing Live Volumes can be upgraded for ALUA capability

Dell Storage Manager 2018 and later versions guide users through the upgrade process, allowing all Live Volumes or a specific subset to be upgraded. This flexibility accommodates situations where not all storage host operating system types fulfill the necessary requirements for the upgrade.

Microsoft Windows, Hyper-V, or vSphere Live Volumes can be upgraded for ALUA capability

The wizard's next step allows users to unmap and remap secondary Live Volume mappings to the storage host, with the process varying based on the operating system of the storage host.

For the storage host to recognize ALUA state changes on the Live Volume, administrators may need to reset volume mappings or reboot the storage host during the upgrade. It is crucial to understand the implications of these actions; testing has shown that vSphere hosts require a reboot. Additionally, the default workflow does not reset the secondary server mappings.

Reset the secondary Live Volume mappings or reboot the storage host as necessary

After the Live Volume is upgraded for ALUA capability, the last step is to choose whether or not to Report Non-optimized Paths. Non-optimized paths are reported by default, but if the storage host operating system does not support ALUA, users can still upgrade the Live Volume for ALUA support while keeping the feature disabled from the storage host's perspective.

Report Non-optimized Paths option

Live Volumes upgraded for ALUA capability, or created in SCOS 7.3 with native ALUA support, cannot be downgraded or have their ALUA functionality removed. Nevertheless, users can disable the ALUA behavior at any time by editing the Live Volume and unchecking the Report Non-Optimized Paths option.

Live Volume connectivity requirements

Live Volume connectivity requirements differ based on the intended use, particularly when migrating workloads. Specifically, the requirements change depending on whether the virtual machines are powered on or off during the migration process.

When considering Live Volume configurations outside of automatic failover setups like vSphere Metro Storage Cluster or Microsoft Server/Hyper-V clusters, there are no strict limitations on bandwidth or latency. However, to enable data access between SC Series arrays, a high-bandwidth, low-latency replication link is essential. Many operating systems and applications perform best with disk latency below 10 ms, although noticeable performance issues may not arise until latency exceeds 25 ms. Since some applications are particularly sensitive to latency, a scenario where the primary data center experiences 5 ms of latency while the inter-data-center connection averages 30 ms could result in a total storage latency of 35 ms or more when writing data. This latency may be acceptable for certain applications but could pose challenges for others.

Utilizing Live Volume proxy communication or synchronous replication is best served by site-to-site replication connectivity with consistent bandwidth and minimal latency. The required bandwidth largely depends on the volume of changed data needing replication and the other traffic sharing the same network. However, if a site does not intend to proxy data access between arrays and uses asynchronous replication, latency is less critical.

To enhance performance and security, it is advisable to implement dedicated VLANs or fabrics to separate IP-based storage traffic from general-purpose LAN traffic, particularly when extending across data centers. Although this is not mandatory for Live Volume, it remains a widely accepted best practice for managing IP-based storage systems.

To support live migration with hypervisor virtualization products like VMware vSphere, Microsoft Hyper-V, and Citrix XenServer, a site needs a minimum 1 Gb connection between servers, with latency of 10 ms or less for Metro vMotion or live migration. For standard vMotion, latency must be 5 ms or less between the source and destination hosts.

High-speed fiber connectivity is essential for inter-data-center and campus environments, allowing speeds of up to 16 Gb with multi-mode fiber, and 1 Gb with dark single-mode fiber over distances of up to 60 miles. This connectivity is crucial for minimizing latency when implementing Live Volume with synchronous replication. It is particularly recommended for live migrating virtual machine workloads between arrays and is necessary for synchronous Live Volume with automatic failover in vSphere Metro Storage Cluster configurations or Microsoft Windows Server/Hyper-V clusters.

For optimal performance when using Live Volume over low-bandwidth, high-latency replication links, it is advisable to manually control swap role activities. This involves shutting down the application at site A, executing a Live Volume swap role, and subsequently restarting the application at the remote site. This approach minimizes storage proxy traffic and allows for a pause in replication I/O, enabling the replication to catch up and facilitating a swift swap role. To manage swap role activities manually, ensure the Automatically Swap Roles option is deselected in the Live Volume configuration. In cases of high latency, asynchronous replication is recommended to prevent negative impacts on application performance, while automatic failover for Live Volume supports synchronous replication exclusively in high availability mode.

Replication and Live Volume attributes

Once a Live Volume is created, additional attributes can be viewed and modified in Unisphere Central for SC Series as depicted in Figure 40.

Live Volume utilizes standard SC Series storage replicated volumes, allowing each volume to be configured for asynchronous, synchronous high availability, or synchronous high consistency replication. It is important to note that automatic failover for Live Volume requires synchronous high availability mode. Additional details on these features are available in this document and the Dell Storage Manager Administrator's Guide.

Type: This refers to asynchronous or synchronous replication.

Sync Mode refers to the method of synchronous replication, which can prioritize either high consistency or high availability. Sync Mode is not applicable to asynchronous Live Volumes.

Sync Status indicates the state of synchronous replication, which can be either Current or Out Of Date. When the status is Current, the source and destination volumes are consistent, with any cumulative latency observed at the primary Live Volume application. Conversely, an Out Of Date status means that the data between the two volumes is inconsistent, with changes tracked in a journal until synchronization is restored. Sync Status does not apply to asynchronous Live Volumes.

The Deduplication feature replicates only the modified portions of the snapshot history on the source volume, minimizing replication traffic and bandwidth usage. Although this process is more processor-intensive, it can significantly optimize data transfer. However, if the connection has adequate bandwidth, Dell Storage advises disabling Deduplication for Live Volumes to conserve controller CPU resources for other essential tasks.

Replicate Active Snapshot: It is recommended that Replicate Active Snapshot be enabled for asynchronous Live Volumes, since replicating data in real time significantly reduces the time needed for a Live Volume swap role. When configured with synchronous replication modes, the Active Snapshot is effectively replicated in real time, provided that the replication remains in sync (HA mode) and is not paused by an administrator (HA and HC modes).

The "Replicate Storage to Lower Tier" feature is automatically activated for new Live Volumes, allowing users to replicate data to a lower storage tier by default Users often choose to replicate the initial Live Volume to the lowest tier and then disable this option after the initial replication is complete This approach helps conserve tier 1 storage capacity, particularly beneficial when utilizing SSD or 15K drives For additional details on Data Progression with Live Volume, refer to the section on Data Progression and Live Volume.

QoS Nodes manage egress traffic shaping during replication from the primary to the secondary Live Volume. The secondary QoS Node does not handle ingress traffic shaping, but it becomes relevant for egress shaping after a role swap, when the secondary becomes the primary Live Volume. These nodes apply specifically to replication traffic and do not govern the Live Volume proxy traffic between arrays. To prevent congestion on the replication link, especially if it shares bandwidth with other traffic, throttling may be necessary; however, this can introduce latency for applications reliant on the Live Volume. Therefore, careful sizing of replication links and QoS Nodes is vital, considering factors such as data volume per Live Volume, change rates, application latency needs, and other services sharing the link. This is particularly critical for synchronous replication, where the latency between sites in a vSphere Metro Storage Cluster should not exceed 10 ms.

In a scenario where a 20 Gbps replication link is utilized for all inter-data-center traffic, setting a replication Quality of Service (QoS) limit at 10 Gbps allocates half of the bandwidth for replication tasks. This configuration ensures that non-Live-Volume replication traffic can still access a fair portion of the bandwidth. However, it may lead to application latency if synchronous replication traffic surpasses the 10 Gbps cap (roughly 1,250 MB/s).
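
The arithmetic behind that ceiling, using the same assumed numbers:

```python
# QoS sizing arithmetic for the example above (assumed numbers): a 10 Gb/s
# QoS node on a shared 20 Gb/s link caps replication at 1,250 MB/s. Synchronous
# write traffic beyond the cap queues, and queuing appears as application latency.

link_gbps = 20
qos_gbps = 10
cap_mb_per_s = qos_gbps * 1000 / 8          # 10 Gb/s = 1250 MB/s (decimal units)
headroom_gbps = link_gbps - qos_gbps        # left for all other traffic
print(f"replication ceiling: {cap_mb_per_s:.0f} MB/s; other traffic: {headroom_gbps} Gb/s")
```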

Editing a Live Volume with QoS nodes

To ensure optimal performance and reliability in Live Volume managed replications, do not share common QoS Nodes between a single SC Series source and multiple SC Series destinations. For example, when volume A is synchronously replicating to volume B and asynchronously replicating to volume C, each replication should utilize a separate, independent QoS Node. This practice helps maintain the integrity and efficiency of the replication process.

A Live Volume provides additional attributes that control the behavior of the Live Volume and are listed as follows:

When the "Swap Roles Automatically" feature is enabled, the primary Live Volume will be automatically transferred to the array experiencing the highest I/O load, provided it meets the necessary conditions for a swap The Live Volume system collects I/O samples every 30 seconds to assess primary access, whether from servers directly connected to the primary array or those using a secondary array Automated role-swap decisions are made based on the most recent ten samples, representing a total of five minutes This process operates continuously on the primary Live Volume array, independent of the 30-minute delay timer.

The autoswap design intelligently manages the movement of the Live Volume primary role, ensuring that role swaps do not occur rapidly back and forth between arrays.

The "Min Amount Before Swap" attribute defines the minimum data accessed from a secondary system for a Live Volume If access to the Live Volume from a secondary array is infrequent, it may be beneficial to consider transferring the primary system to that array, in which case a minimal value should be set This value is calculated by dividing the read/write access by the seconds-per-sample value A sample is deemed satisfactory if the secondary array access exceeds this specified threshold.

The Min Secondary Percent Before Swap attribute refers to the percentage of total access to a Live Volume coming from a secondary array, evaluated on a per-sample basis against the Min Secondary Percent for Swap threshold, typically 60%. A sample qualifies if the secondary array accounts for more than the set threshold of access to the Live Volume. The SC Series array collects samples every 30 seconds, retaining the latest ten samples (equating to five minutes) for evaluation. Consequently, for the swap condition to be satisfied, the secondary Live Volume must demonstrate higher I/O than the primary system in six out of ten samples, meeting the 60% requirement.

The Min Time As Primary Before Swap setting includes a default timer of 30 minutes, which prevents autoswaps from occurring immediately after a role swap. This delay allows the SC Series array to wait before reassessing autoswap conditions, thereby minimizing the risk of thrashing in dynamic environments where primary access points may change frequently. Additionally, it ensures stability in scenarios where a Live Volume is utilized by applications running on servers at both primary and secondary sites.
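
Taken together, the autoswap attributes describe a sampling decision that can be sketched as follows. The threshold values and units here are assumptions for illustration; the logic mirrors the description above (30-second samples, last ten kept, per-sample thresholds, 6-of-10 qualification, 30-minute hold-down), not actual SCOS firmware.

```python
# Illustrative model of the autoswap decision: 30-second samples, keep the
# last ten (five minutes), qualify a sample against both thresholds, and
# swap only after 6 of 10 qualify and the 30-minute hold-down has elapsed.
# Threshold values and units are assumptions, not SCOS defaults.

from collections import deque

MIN_AMOUNT_BEFORE_SWAP = 100     # assumed units: KB/s of secondary access per sample
MIN_SECONDARY_PERCENT = 60       # percent of total access from the secondary array
MIN_MINUTES_AS_PRIMARY = 30      # hold-down after the last role swap

samples = deque(maxlen=10)       # ten 30-second samples = five minutes of history

def record_sample(secondary_kbps, total_kbps):
    qualifies = (secondary_kbps > MIN_AMOUNT_BEFORE_SWAP and
                 secondary_kbps / total_kbps * 100 > MIN_SECONDARY_PERCENT)
    samples.append(qualifies)

def should_swap(minutes_as_primary):
    if minutes_as_primary < MIN_MINUTES_AS_PRIMARY:
        return False                               # delay timer still running
    return len(samples) == 10 and sum(samples) >= 6

for _ in range(10):
    record_sample(secondary_kbps=500, total_kbps=700)   # secondary dominates
print(should_swap(minutes_as_primary=45))               # True
```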

Failover Automatically indicates whether the Live Volume is configured to fail over automatically and remain available during an unplanned outage impacting the primary Live Volume.
