
Oracle over NFS on NetApp storage – 2 – Primary and DR site solution design

In today’s article, we take another look at deploying Oracle DB over the NFS protocol on NetApp storage. The article is based on practical experience with deploying SAP on an Oracle (10g) back end, as well as on experience with Oracle DB deployments (10g and 11g). Most of it comes from documentation I created for my last customer, who was implementing a DR solution between two sites, with a third site for Backup and Recovery. This article is part of the Oracle over NFS on NetApp series. Throughout the series we introduce our approach to projects, using this Oracle deployment as a showcase and giving you a deep insight into it. In this second episode we look at the solution design and volume layout, and at how we put them together to meet the RPO and RTO described in the article Oracle over NFS on NetApp storage – 1 – Analysis.

As mentioned in the first episode, the customer has three sites, all of them colocation sites rented from a service provider. Between the primary and Disaster Recovery sites there is a DWDM ring with a Round Trip Delay of approximately 0.56 ms. Between the primary site and the Backup and Recovery site there is an MPLS connection with a Round Trip Delay of approximately 16 ms, and between the Disaster Recovery site and the Backup and Recovery site there is another MPLS connection with a Round Trip Delay of approximately 12 ms.

SnapMirror design

In order to meet the defined Recovery Point Objective, we need to use either NetApp MetroCluster or Synchronous SnapMirror. Since availability is a priority for our customer, we will use Synchronous SnapMirror, which gives better error isolation and saves the investment in Fibre Channel equipment.

The connectivity between our primary site and the Disaster Recovery site, 10Gb/s Ethernet with a Round Trip Delay (RTD) of approximately 0.56 ms, allows us to run synchronous replication between the two sites. Data from the primary site will also be replicated to the Backup and Recovery site with asynchronous SnapMirror at the QTree level, which allows us to keep a different number of snapshots on the production and Backup and Recovery sites.

 

SnapMirror layout

Disks sizing, aggregates and volumes (Primary and DR site)

For an optimal balance between performance and cost, we will use SAS drives on the primary and Disaster Recovery sites and SATA drives on the Backup and Recovery site. The customer currently has a NetApp FAS3170HA on the primary and Disaster Recovery sites, used for Windows home folders. The currently installed capacity is 7 DS4243 disk shelves with 600GB SAS spindles. Due to the high IOPS load of this database, we will also use SAS drives, but we will create bigger RAID groups out of smaller drives.

The spindle limit for the NetApp FAS3170HA is 840, and the 7 currently installed disk shelves with 24 spindles each make 168 spindles, so we can add another 672 spindles. For performance we will use the smallest SAS drives, 300GB SAS. As estimated in our analysis, we will need approximately 6.57TB of storage in the first month. However, the customer purchases storage every six months, so we need 6.99TB for the first six months plus an additional 10% reserve as a safety margin. Altogether this makes 7.689TB.
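The arithmetic above can be re-checked with a few lines of Python. This is only a sketch that restates the figures from the text; the variable names are mine and not part of any NetApp tooling.

```python
# Spindle headroom and six-month capacity target, using the figures from the text.
MAX_SPINDLES = 840            # spindle limit of the FAS3170HA
SHELVES_INSTALLED = 7         # DS4243 shelves already in place
DISKS_PER_SHELF = 24
CAPACITY_SIX_MONTHS_TB = 6.99
SAFETY_MARGIN = 0.10          # 10% reserve

installed = SHELVES_INSTALLED * DISKS_PER_SHELF
headroom = MAX_SPINDLES - installed
target_tb = CAPACITY_SIX_MONTHS_TB * (1 + SAFETY_MARGIN)

print(f"installed spindles: {installed}")
print(f"free spindle slots: {headroom}")
print(f"capacity target:    {target_tb:.3f} TB")
```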

Our data will be distributed across two aggregates: one will contain the Oracle data files, and the other will contain the rest. Our version of ONTAP is 7.3.5.1P4, so only 32-bit aggregates are available. To get the maximum aggregate size and performance, we can scale up to three RAID groups in each aggregate, with a RAID group size of 19 spindles.

The aggregate for Oracle data files will need 6155.66GB. For 4 days of hourly snapshots we will need approximately 4*120*0.5 = 240GB of additional disk space. The total is 6395.66GB, and with an additional safety margin of 10% we scale to 7.04TB for Oracle data files. This means we will need two RAID groups of 19 spindles each.
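As a rough cross-check of the data aggregate sizing, the same calculation can be sketched in Python. The snapshot formula is copied as given in the text. The per-RAID-group capacity assumes that RAID-DP takes two parity disks out of each 19-disk group and ignores right-sizing and file system overhead, which is a simplification on my part.

```python
import math

# Figures from the text; names are illustrative only.
DATA_FILES_GB = 6155.66
SNAPSHOT_GB = 4 * 120 * 0.5   # formula as given in the text: 240 GB for 4 days
SAFETY_MARGIN = 0.10
RG_SIZE = 19                  # spindles per RAID group
PARITY_DISKS = 2              # RAID-DP: two parity disks per RAID group
DISK_GB = 300

total_gb = DATA_FILES_GB + SNAPSHOT_GB
required_gb = total_gb * (1 + SAFETY_MARGIN)

# Simplified data capacity of one RAID group (no right-sizing, no overhead).
rg_data_gb = (RG_SIZE - PARITY_DISKS) * DISK_GB
rg_count = math.ceil(required_gb / rg_data_gb)

print(f"required: {required_gb:.2f} GB")
print(f"data capacity per RAID group: {rg_data_gb} GB")
print(f"RAID groups needed: {rg_count}")
```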

The aggregate for the rest of the Oracle data will need 666.86GB for Oracle archive logs, 4GB for control files, 4GB for online redo logs and 32GB for Oracle binaries. Altogether, including the 10% safety margin, this makes 782GB of storage. This means we will need one RAID group of 19 spindles.

| Aggregate name | RG size [disks] | Disk size [GB] | RG data size [GB] | Aggregate size RAW [GB] |
|---|---|---|---|---|
| aggr_oradata | 19 | 300 | 5100 | 10200 |
| aggr_oramisc | 19 | 300 | 5100 | 5100 |

We will need 3 times 19 spindles plus at least 2 spare spindles of this disk type and size to enable the Disk Maintenance Center, which makes 59 spindles. Each shelf holds 24 disks, so we need 59/24 ≈ 2.46 disk shelves, which means 3 disk shelves need to be ordered for each of the primary and Disaster Recovery sites.
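The shelf count follows directly from rounding up, as this small Python sketch shows. The constants are the ones named in the text; the variable names are illustrative.

```python
import math

RAID_GROUPS = 3        # two for aggr_oradata, one for aggr_oramisc
RG_SIZE = 19           # spindles per RAID group
SPARES = 2             # minimum spares for the Disk Maintenance Center
DISKS_PER_SHELF = 24   # DS4243

spindles = RAID_GROUPS * RG_SIZE + SPARES
shelves_exact = spindles / DISKS_PER_SHELF
shelves_to_order = math.ceil(shelves_exact)

print(f"spindles needed: {spindles}")
print(f"shelves (exact): {shelves_exact:.2f}")
print(f"shelves to order per site: {shelves_to_order}")
```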

Volumes

We will need several volumes: one each for Oracle binaries and configuration files, Oracle archive logs, Oracle temporary data files, Oracle redo logs, Oracle control files and Oracle data files. In a RAC environment, a volume for Oracle cluster files would also need to be added. In our case we are going to have an Oracle Single Instance on two nodes (active/passive) at each site. We will have the following volumes in our vFiler at each site. The _m_ in a name indicates that the volume is mirrored between the primary and Disaster Recovery sites, and wr01 is the database SID. vf_wr01 and vf_wr01dr are the names of the vFilers: the first is on the primary site, the second on the Disaster Recovery site.

For QSM (QTree SnapMirror), we will also create a QTree on top of each volume:

Mirrored volumes

| Primary site name | DR site name | QTree name | Volume size [GB] | Aggregate |
|---|---|---|---|---|
| vf_wr01_oraarch_m_wr01 | vf_wr01dr_oraarch_m_wr01 | arch | 700 | aggr_oramisc |
| vf_wr01_oralogtemp_m_wr01 | vf_wr01dr_oralogtemp_m_wr01 | temp | 64 | aggr_oramisc |
| vf_wr01_oraredo_m_wr01 | vf_wr01dr_oraredo_m_wr01 | log | 4 | aggr_oramisc |
| vf_wr01_oractrl_m_wr01 | vf_wr01dr_oractrl_m_wr01 | ctrl | 4 | aggr_oramisc |
| vf_wr01_oradata_m_wr01 | vf_wr01dr_oradata_m_wr01 | data | 7689 | aggr_oradata |

Site-specific volumes

| Primary site name | DR site name | QTree name | Volume size [GB] | Aggregate |
|---|---|---|---|---|
| vf_wr01_oracle_wr01 | vf_wr01dr_oracle_wr01 | bin | 32 | aggr_oramisc |
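The naming convention used for these volumes is regular enough to be generated. The following Python sketch derives the volume names and QTree paths from the vFiler name, a purpose tag, the mirror flag and the SID. The helper functions are hypothetical and only illustrate the convention; the /vol/<volume>/<qtree> path form is the usual way a QTree is addressed on the filer.

```python
SID = "wr01"
VFILERS = {"primary": "vf_wr01", "dr": "vf_wr01dr"}

# (purpose tag, QTree name, mirrored between primary and DR)
VOLUMES = [
    ("oraarch", "arch", True),
    ("oralogtemp", "temp", True),
    ("oraredo", "log", True),
    ("oractrl", "ctrl", True),
    ("oradata", "data", True),
    ("oracle", "bin", False),
]


def volume_name(vfiler, purpose, mirrored, sid=SID):
    """Build <vfiler>_<purpose>[_m]_<sid>; _m_ marks a mirrored volume."""
    parts = [vfiler, purpose]
    if mirrored:
        parts.append("m")
    parts.append(sid)
    return "_".join(parts)


def qtree_path(vfiler, purpose, qtree, mirrored):
    """Return the QTree path on top of the volume."""
    return f"/vol/{volume_name(vfiler, purpose, mirrored)}/{qtree}"


for site, vfiler in VFILERS.items():
    for purpose, qtree, mirrored in VOLUMES:
        print(site, qtree_path(vfiler, purpose, qtree, mirrored))
```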

Even though NetApp and Oracle recommend multiplexing log files and control files, we are using RAID-DP and consider the probability of volume loss low, so our logs and control files are located on only one volume, with no cross-volume multiplexing. We will have two redo log groups, both located on the same volume, /vol/vf_wr01_oralogtemp_m_wr01:

  • $ORACLE_HOME/Redo_Grp1
  • $ORACLE_HOME/Redo_Grp2

Our control files will not even be multiplexed on the same volume. We will just set the CONTROL_FILES parameter to $ORACLE_HOME/control_file1 on the filer volume /vol/vf_wr01_oractrl_m_wr01.

Volume options

As per the NetApp Best Practice Guidelines for Oracle®, we will disable automatic snapshots on all volumes using vol options <volume name> nosnap on, and hide the .snapshot directory using vol options <volume name> nosnapdir on. For performance reasons, we will also disable access time updates on all volumes using vol options <volume name> no_atime_update on.

There is also the minra option, which disables the Read Ahead Cache on a volume. For certain volumes, namely those for Oracle binaries and configuration files, Oracle archive logs, Oracle temporary data files, Oracle redo logs and Oracle control files, a Read Ahead Cache makes no sense, so we will disable it using vol options <volume name> minra on.

For the last volume, Oracle data files, whether it makes sense to enable the minra option depends on the profile of your application. Read Ahead Cache helps in situations where the database performs a lot of index or full table scans, so it is worth experimenting with the minra option on the Oracle data files volume. In our case we will leave the minra option off, so the Read Ahead Cache stays enabled for the Oracle data volume.
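To keep the options consistent across volumes, the commands can be generated rather than typed by hand. The Python sketch below only prints the vol options command lines discussed in this section for the primary site volumes; it does not connect to a filer. The helper function is hypothetical, and the data volume is given minra off because Read Ahead Cache is meant to stay enabled there.

```python
# Primary site volumes and whether read-ahead should be minimised (minra on).
VOLUME_MINRA = {
    "vf_wr01_oracle_wr01": True,
    "vf_wr01_oraarch_m_wr01": True,
    "vf_wr01_oralogtemp_m_wr01": True,
    "vf_wr01_oraredo_m_wr01": True,
    "vf_wr01_oractrl_m_wr01": True,
    "vf_wr01_oradata_m_wr01": False,  # keep Read Ahead Cache for data files
}

# Options switched on for every volume, as described in the text.
COMMON_OPTIONS = ("nosnap", "nosnapdir", "no_atime_update")


def vol_option_commands(volume, minra):
    """Return the 'vol options' command lines for one volume."""
    commands = [f"vol options {volume} {opt} on" for opt in COMMON_OPTIONS]
    commands.append(f"vol options {volume} minra {'on' if minra else 'off'}")
    return commands


for volume, minra in VOLUME_MINRA.items():
    for command in vol_option_commands(volume, minra):
        print(command)
```

The printed lines can be reviewed and then pasted into the console of the vFiler that owns the volumes; the same list with the vf_wr01dr prefix would apply to the Disaster Recovery site.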

Primary and DR site BoM

At each of the primary and DR sites, we need to have the following equipment available:

  • NetApp FAS3170HA
  • 3x DS4243 Disk Shelf with 24 300GB SAS drives
  • 2x 10Gb/s enabled switch for storage network
  • 2x 1Gb/s or 10Gb/s enabled switch for database access network
  • 2x 10Gb/s enabled servers with sufficient memory and CPU

In our case, we used two Cisco Catalyst 6500 switches for both the access and storage networks. For resilience, these switches have the Cisco Catalyst Virtual Switching System (VSS) enabled and configured. As servers we are using Cisco UCS C460 M2 machines with 2x 10-core E7-4800 series CPUs and 256GB of memory.

Logical design for Primary and DR site

Next time

In the next episode, we will create aggregates and volumes, provision a vFiler for Secure Multi-Tenancy and configure virtual interfaces on the filer side.


Marek Stopka

Marek Stopka is a Senior Consultant at STOPKA Consulting s.r.o. and his specialities include Business Resilience, Business Continuity, Disaster Recovery and Data Storage.
