Wednesday, October 24, 2007

Oracle RAC design example

A Case of RAC Database Design and Setup on Sun Platform

This article describes a case of RAC database design and setup on the Sun platform with Data Guard, which offers high availability at a relatively low cost.

A Case of RAC Database Design and Setup on Sun Platform
0. Introduction
1. Cluster Hardware Architecture
1.1 Hardware Diagram
1.2 Servers
1.3 Cluster Interconnect
1.4 Public Network
1.5 Shared Disk Storage
2. Software Architecture
2.1 Software Diagram
2.2 Operating System
2.3 Cluster File Systems
2.3.1 Local File System
2.3.2 OCR and Voting Disk
2.3.3 ASM
2.4 Cluster Software
2.4.1 Sun Clusterware
2.4.2 Oracle Clusterware
3. Oracle RAC Database Setup
3.1 Oracle RAC Database Software Installation
3.2 Create RAC Database
3.3 Data Guard Setup
3.3.1 Configure Data Guard Broker
3.3.2 Creating a Physical Standby Database
3.3.3 Configure the Primary Database for Data Guard
4. References


0. Introduction

Oracle RAC (Real Application Clusters) enables multiple instances, linked by an interconnect, to share access to a single Oracle database. It provides high availability, scalability, and redundancy. However, a RAC-only architecture without Data Guard still cannot protect the database against events such as shared storage failure, site failure, or rolling maintenance. This article describes a case of RAC database design with Data Guard that provides a high level of availability while keeping the setup cost relatively low.

1. Cluster Hardware Architecture

1.1 Hardware Diagram





Figure 1 Hardware Architecture

The RAC system, designed for a financial institution, consists of a primary site and a secondary site. The primary site contains a two-node RAC database with shared storage, and the secondary site contains a single-instance physical standby with the same configuration as the primary database, except that it is not clustered. The primary site and the standby site are connected by an existing leased line. Data Guard transfers redo data from the primary site to the secondary site.

The IP addresses assigned to each NIC are listed below:




NIC Name IP Address Assigned
node1_priv_eth1 192.168.3.1
node2_priv_eth1 192.168.3.2
node1_priv_eth2 192.168.4.1
node2_priv_eth2 192.168.4.2
node1_pub_eth0 203.117.80.52
node2_pub_eth0 203.117.80.53
node1_vip 203.117.80.54
node2_vip 203.117.80.55
dr_eth0 203.117.123.15

Table 1 The IP addresses assigned to each NIC



1.2 Servers

RAC can be installed on almost all operating systems, including Macintosh, Linux, Windows, OpenVMS, UNIX, and OS/390; certified hardware configurations can be checked in Reference [1]. Each node must use the same hardware and operating system configuration. In our system, we use Sun servers because of their good compatibility with Oracle software.


Server Sun Fire V490
OS Version Sun Solaris 10
CPU 2 x 1.5 GHz
Memory 8 GB
Hard disk 4 x 146 GB with RAID 1+0
NICs 2 Gigabit Ethernet adapters and 1 100 Mbps Ethernet adapter

Table 2 Server Configuration





A Sun StorEdge LTO-2 tape drive is attached to the standby server for scheduled backups.



1.3 Cluster Interconnect

Cluster members communicate with the other nodes in the cluster through one or more physically independent networks called private networks. The set of private networks in the cluster is referred to as the high-speed interconnect and is used for Cache Fusion traffic. It can be any high-bandwidth connection, such as Gigabit Ethernet, ATM, or SCI. The cluster interconnect should have a higher bandwidth than the public network.


In our system, we use Gigabit Ethernet for our two-node Oracle 10g RAC cluster, which is much cheaper and even faster than a Fibre Channel network. Each cluster node is connected to a Gigabit Ethernet switch through two redundant Gigabit Ethernet adapters to avoid a single point of failure.




1.4 Public Network

Public network adapters attach nodes to the public networks, providing client access to the cluster.

To maintain high availability, each cluster node is assigned a virtual IP address (VIP) as its logical interface eth0:1. In the event of a node failure, the failed node's VIP can be reassigned to a surviving node, allowing applications to continue accessing the database through the same IP address.



1.5 Shared Disk Storage

Oracle RAC relies on a shared disk architecture. The database files, online redo logs, and control files must be accessible to each node in the cluster. The shared disks also store the Oracle Cluster Registry (OCR) and the voting disk. There are a variety of ways to configure shared storage, including direct-attached disks (typically SCSI over copper or fiber), Storage Area Networks (SAN), and Network Attached Storage (NAS).

In our system, two Sun StorEdge T3 disk arrays were mirrored together (hardware RAID 1+0) and connected to V490 Servers via SCSI connections.

Sun Cluster requires a quorum device on a cluster file system or raw device. Quorum is a mechanism that determines node participation in the cluster by using a dynamic voting algorithm, in order to maintain data and resource integrity. In our system, /dev/did/rdsk/d2 was configured as the quorum device.

At the standby site, a disk array with the same configuration (except for mirroring) is connected to the Sun server.



2. Software Architecture

2.1 Software Diagram



Figure 2 Software Architecture

2.2 Operating System


Sun Solaris 10 has been certified for Oracle RAC, as per Reference [1]. After installation, the operating system needs to be properly configured as follows:

A. Installing the necessary software packages;
B. Setting kernel parameters for shared memory (see the sketch after this list);
C. Configuring the network for both the private and public networks;
D. Establishing accounts with the proper security;
E. Configuring storage devices.
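
As an illustration of step B, the shared memory limits can be placed in /etc/system; the parameter names below are the standard Solaris ones, but the values are examples only and must be sized to the server's 8 GB of memory (on Solaris 10 the same limits can instead be set through project resource controls):

* /etc/system -- illustrative shared memory settings for Oracle (example values)
set shmsys:shminfo_shmmax=4294967295
set shmsys:shminfo_shmmni=100
set semsys:seminfo_semmni=100
set semsys:seminfo_semmsl=256

A reboot is required before /etc/system changes take effect.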

The operating system should be 64-bit in order to use large memory for the SGA; therefore we run the 64-bit sparcv9 kernel in our system.
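
Whether the 64-bit kernel is running can be verified with the Solaris isainfo utility; on a 64-bit SPARC system the output is expected to be as shown:

# isainfo -kv
64-bit sparcv9 kernel modules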


2.3 Cluster File Systems


2.3.1 Local File System


The local hard disks have been mirrored with RAID 1+0. We use Solstice DiskSuite (called Solaris Volume Manager in Solaris 10) to create the local file systems as below:



Resource/File System Minimum Required Current Size Usage
physical memory > 500 MB 8 GB OS processes and oracle instance
swap >1GB 16 GB OS to swap pages out of physical memory
/tmp >500 MB 5 GB OS temporary files
Metadata >20 MB 50 MB state database replica for Solstice Disk Suite
root (/) >100 MB 5 GB root mount point
/var >100 MB 5 GB OS log files/messages
/usr >100 MB 10 GB OS binaries and library
/opt >100 MB 20 GB OS and cluster software
/u01 >4 GB 40 GB Oracle base and CRS base

Table 3 Local Filesystem Layout


where /u01 contains node-specific directories and files such as:
• RDBMS binaries
• CRS binaries
• Alert and trace files
• Listener configuration files

The advantage of keeping the Oracle binaries and init.ora files local is that when Oracle is patched or upgraded, the data service does not need to be shut down on all nodes at once. Furthermore, local devices have a performance benefit over global devices in the cluster.

In addition to the local file systems above, the Sun Cluster installer created a 512 MB /globaldevices file system on each node, which is used by the Sun Cluster scinstall utility for the global devices namespace.



2.3.2 OCR and Voting Disk




The Oracle Cluster Registry (OCR) contains database configuration data, and the Oracle Cluster Ready Services voting disk contains cluster membership information. Both need to be shared across all RAC nodes.

In our system, we created two shared raw partitions of 200 MB each on /dev/did/rdsk/d3, one for the OCR and one for the voting disk, then changed their ownership to oracle:dba and their permissions to 660. These two partitions are accessible from all nodes.
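
A minimal sketch of that ownership and permission change, assuming (hypothetically) that slices s1 and s2 of the DID device hold the OCR and the voting disk; it has to be performed on every node:

# run as root on each node; slice numbers are illustrative
chown oracle:dba /dev/did/rdsk/d3s1 /dev/did/rdsk/d3s2
chmod 660 /dev/did/rdsk/d3s1 /dev/did/rdsk/d3s2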






2.3.3 ASM


Oracle Automatic Storage Management (ASM) in Oracle Database 10g provides the services of a file system and logical volume manager in a platform-independent manner. ASM has the following advantages:

• Easy for DBAs to manage;
• I/O evenly distributed across disks for load balancing;
• Fault tolerance.


We planned the ASM disks as below:


A. Choose ASM disk LUNs (logical unit numbers)


We created symbolic links /asmdisks/PRODDB/vdiskn (n = 0, 1, 2, 3, ...) pointing to the /dev/did/rdsk/d* devices.
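
For example, the first link could be created as follows on each node (the DID device name and slice are hypothetical; the underlying raw device itself must be owned by oracle:dba with mode 660):

# run as root on each node; the device name is illustrative
mkdir -p /asmdisks/PRODDB
ln -s /dev/did/rdsk/d4s0 /asmdisks/PRODDB/vdisk0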


B. Define Disk groups

We plan the disk groups below for the ASM instances created in a later step (3.2 Create RAC Database). The redundancy for files in each disk group is set to External, since the hardware RAID storage already provides mirroring.


Diskgroup   Disks                                              Usage
DG_DAT      /asmdisks/PRODDB/vdisk0, vdisk1, vdisk2, vdisk3    data files
DG_REDO1    /asmdisks/PRODDB/vdisk4                            control file 1, redo file 1, spfile
DG_REDO2    /asmdisks/PRODDB/vdisk5                            control file 2, redo file 2
DG_RECO     /asmdisks/PRODDB/vdisk6, vdisk7                    Flash Recovery Area

Table 4 ASM Disk Groups
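
In our setup these disk groups are created through DBCA in section 3.2, but for illustration, DG_DAT could equally be created manually from an ASM instance with SQL such as the following (a sketch; it assumes ASM_DISKSTRING includes '/asmdisks/PRODDB/*'):

-- connect to the ASM instance as SYSDBA
CREATE DISKGROUP DG_DAT EXTERNAL REDUNDANCY
  DISK '/asmdisks/PRODDB/vdisk0',
       '/asmdisks/PRODDB/vdisk1',
       '/asmdisks/PRODDB/vdisk2',
       '/asmdisks/PRODDB/vdisk3';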



2.4 Cluster Software



2.4.1 Sun Clusterware


We use Sun Cluster to support Oracle CRS (Cluster Ready Services) because Sun Cluster 3.1 is seamlessly integrated with Solaris and has been certified for RAC.

Sun Cluster v3 software and its patches need to be installed on all nodes. To set up Sun Cluster, the following installation tasks need to be completed:

A. Install the Cluster Control Panel on the administrative console;
B. Install Solaris software on each node;
C. Establish the cluster;
D. Configure the cluster;
E. Install the Sun Management Center cluster module;
F. Install Solstice DiskSuite and storage management software;
G. Install patches.

For more details on the installation steps, refer to [2], "Sun Cluster Software Installation Guide for Solaris OS".


After the installation, the RAC framework resource group needs to be registered and configured in order to enable Sun Cluster support for RAC; refer to [3] for more details.
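
The registration follows the pattern below; this is only a sketch, and the exact resource types and options for a configuration like ours should be taken from [3]:

# sketch only -- verify resource types and options against [3]
scrgadm -a -t SUNW.rac_framework
scrgadm -a -t SUNW.rac_udlm
scrgadm -a -g rac-framework-rg -y maximum_primaries=2 -y desired_primaries=2
scrgadm -a -j rac_framework-rs -g rac-framework-rg -t SUNW.rac_framework
scrgadm -a -j rac_udlm-rs -g rac-framework-rg -t SUNW.rac_udlm -y resource_dependencies=rac_framework-rs
scswitch -Z -g rac-framework-rg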




2.4.2 Oracle Clusterware

After Sun Cluster 3.1 is installed, an operating-system-dependent clusterware layer, Oracle CRS (Cluster Ready Services, introduced in Oracle RAC 10g Release 1), needs to be installed to provide the HA support services. CRS supports services and workload management and helps maintain the continuous availability of the services. CRS also manages resources such as the virtual IP (VIP) address for each node and the Global Services Daemon. In RAC 10g Release 2, CRS was renamed Oracle Clusterware.

For RAC on Sun Cluster, Oracle's UNIX Distributed Lock Manager (UDLM) is used to coordinate shared concurrent access to the database. Because the Solaris DLM and the Oracle DLM are different, the Oracle UDLM package needs to be installed on each node in the cluster, from the /racpatch directory of the installation CD, before Cluster Ready Services is installed.


Oracle Clusterware is set up by running the runInstaller command from the clusterware directory on node 1 (for example, /cdrom/crs/runInstaller); the installer then pushes the software to node 2. The following tasks will be completed:

A. Install CRS under $CRS_HOME;
B. Define the cluster name and the public and private node names;
C. Define the network interface names and subnets to be used as the cluster interconnect;
D. Define the locations of the OCR and voting disk, choosing the option for Oracle to multiplex the OCR and voting disk.


After completion, the following components will be installed:
• Cluster Manager
• Node Monitor
• Interconnect(IPC) Software
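
After the installation, the health of the Clusterware stack can be checked from either node with the standard 10g utilities, for example:

# check that the CSS, CRS and EVM daemons are healthy
$CRS_HOME/bin/crsctl check crs
# list the cluster nodes known to Oracle Clusterware
$CRS_HOME/bin/olsnodes -n
# show the state of CRS-managed resources (VIPs, GSD, ONS)
$CRS_HOME/bin/crs_stat -t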


3. Oracle RAC Database Setup

3.1 Oracle RAC Database Software Installation


Oracle RAC 10g Release 2 software is the heart of the RAC database. In our system, the binaries are located on the local file system of each node.

Oracle software can be downloaded from :
http://www.oracle.com/technology/software


Start the runInstaller command from the DB directory of the Oracle Database 10g Release 2 (10.2) installation media;

On the Selected Nodes page, provide the node names SGEPRDDB01A and SGEPRDDB01B;

On the Install Type page, select Enterprise Edition so that components such as Data Guard are included in the installation;

On the Database Control page, enable Enterprise Manager Grid Control for database management;

On the Data Storage Option page, select Automatic Storage Management (ASM) and provide the disk partition locations "/asmdisks/PRODDB/*";

On the Public Network Interfaces page, assign the public VIP addresses for each node:



Node NIC VIP
node1(SGEPRDDB01A) eth0 203.117.80.54
node2(SGEPRDDB01B) eth0 203.117.80.55

Table 5 Public VIP Addresses


After the installation completes, the latest patch set (10.2.0.3) should be applied.


For the standby database, Oracle software binaries were installed without cluster settings.




3.2 Create RAC Database


We can create the RAC database either with DBCA or with manual scripts. Oracle recommends using DBCA because its preconfigured databases optimize the environment for features such as ASM, and it also configures the RAC environment for services and the cluster administration tools.

A. Create the database and its instances

Before starting DBCA, the TNS listeners need to be configured. This is done by running $ORACLE_HOME/bin/netca, selecting "Cluster Configuration" and the nodes SGEPRDDB01A and SGEPRDDB01B, then adding the listener LISTENERS_PRODDB for both nodes, LISTENER_PRODDB1 for SGEPRDDB01A only, and LISTENER_PRODDB2 for SGEPRDDB01B only.
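
These listener names are referenced later by the REMOTE_LISTENER and LOCAL_LISTENER initialization parameters. A sketch of the corresponding tnsnames.ora entries on both nodes, assuming the default listener port 1521 and the VIP addresses from Table 1:

# port 1521 is an assumption
LISTENERS_PRODDB =
  (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 203.117.80.54)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = 203.117.80.55)(PORT = 1521)))

LISTENER_PRODDB1 =
  (ADDRESS = (PROTOCOL = TCP)(HOST = 203.117.80.54)(PORT = 1521))

LISTENER_PRODDB2 =
  (ADDRESS = (PROTOCOL = TCP)(HOST = 203.117.80.55)(PORT = 1521))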

The DBCA GUI is started with the $ORACLE_HOME/bin/dbca command from SGEPRDDB01A only. On the Welcome page, select the Oracle Real Application Clusters (RAC) database option;

On the Node Selection page, select both nodes SGEPRDDB01A and SGEPRDDB01B;

On the Management Options page, select OEM with Grid control option;

On the Storage Options page, select Automatic Storage Management (ASM), create new ASM instances on both nodes, and then create the disk groups DG_DAT, DG_REDO1, DG_REDO2, and DG_RECO as defined in Section 2.3.3 above.

On the Database File Locations page, select OMF (Oracle-Managed Files);

On the Recovery Configuration page, define the flash recovery area as DG_RECO;

On the Database Services page, add the services "PRODDB1_SERV" and "PRODDB2_SERV" using the VIPs of both nodes;

On the Initialization Parameters page, define the initialization parameters as follows:

Cluster-Wide Parameters for Database "PRODDB":

*.CLUSTER_DATABASE=TRUE
*.CLUSTER_DATABASE_INSTANCES=2
*.REMOTE_LISTENER=LISTENERS_PRODDB
*.DB_NAME=PRODDB
*.DB_UNIQUE_NAME=PRODDB
*.LARGE_POOL_SIZE=32M
*.DB_BLOCK_SIZE=8192
*.DB_CACHE_SIZE=800M
*.SHARED_POOL_SIZE=500M
*.SORT_AREA_SIZE=1M
*.DB_CREATE_FILE_DEST=+DG_DAT
*.DB_RECOVERY_FILE_DEST=+DG_RECO
*.UNDO_MANAGEMENT=AUTO


Instance Specific Parameters for Instance "PRODDB1" and "PRODDB2":


PRODDB1.INSTANCE_NAME=PRODDB1
PRODDB1.INSTANCE_NUMBER=1
PRODDB1.LOCAL_LISTENER=LISTENER_PRODDB1
PRODDB1.THREAD=1
PRODDB1.UNDO_TABLESPACE=UNDOTBS_PRODDB1

PRODDB2.INSTANCE_NAME=PRODDB2
PRODDB2.INSTANCE_NUMBER=2
PRODDB2.LOCAL_LISTENER=LISTENER_PRODDB2
PRODDB2.THREAD=2
PRODDB2.UNDO_TABLESPACE=UNDOTBS_PRODDB2


After the Summary dialog, DBCA completes the following tasks:

• Creates an operative RAC database and its instances
• Creates the RAC data dictionary views
• Configures the network for the cluster database
• Starts the listeners and database instances and then starts the high availability services


We also need to register the RAC database and listeners as Sun Cluster resources to enable automated startup/shutdown of the RAC instances by Sun Cluster; refer to [3] for detailed recommendations.
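
Independently of the Sun Cluster registration, the Oracle-side registration created by DBCA can be verified with srvctl, for example:

# confirm that the database, instances and services are registered with CRS
srvctl status database -d PRODDB
srvctl status service -d PRODDB
srvctl config database -d PRODDB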




3.3 Data Guard Setup


3.3.1 Configure Data Guard Broker

The Oracle Data Guard broker centralizes the creation, maintenance, and monitoring of Data Guard configurations. Although it is included with the Enterprise Edition installation, the broker configuration files need to be set up [4].

Two copies of the configuration file are maintained for each database so that there is always a record of the last known valid state of the configuration. The files can be kept on a Cluster File System (CFS), in ASM disk groups, or on raw devices. In our system, we create them in an ASM disk group:

ALTER SYSTEM SET DG_BROKER_CONFIG_FILE1 = '+DG_DAT/BROKER/DR1.DAT' SCOPE=BOTH;
ALTER SYSTEM SET DG_BROKER_CONFIG_FILE2 = '+DG_DAT/BROKER/DR2.DAT' SCOPE=BOTH;
ALTER SYSTEM SET DG_BROKER_START=TRUE;
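
With the broker started on both the primary and the standby (i.e., once the standby from section 3.3.2 exists), the configuration itself can be created and enabled from DGMGRL. A sketch, assuming net service names PRODDB and STANDBYDB resolve from the primary host (the configuration name is arbitrary):

DGMGRL> CONNECT sys/password@PRODDB
DGMGRL> CREATE CONFIGURATION 'PRODDB_DG' AS
          PRIMARY DATABASE IS 'PRODDB'
          CONNECT IDENTIFIER IS PRODDB;
DGMGRL> ADD DATABASE 'STANDBYDB' AS
          CONNECT IDENTIFIER IS STANDBYDB
          MAINTAINED AS PHYSICAL;
DGMGRL> ENABLE CONFIGURATION;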


3.3.2 Creating a Physical Standby Database


A. Back up the primary database with RMAN, including all datafiles plus archived log files, into a staging area and transfer the backups to the standby server [5];
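
A sketch of such a backup, assuming a hypothetical staging directory /stage that is afterwards transferred to the same path on the standby server:

RMAN> BACKUP DATABASE FORMAT '/stage/db_%U';
RMAN> BACKUP CURRENT CONTROLFILE FOR STANDBY FORMAT '/stage/stby_ctl_%U';
RMAN> BACKUP ARCHIVELOG ALL FORMAT '/stage/arc_%U';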

B. Copy the initialization parameter file from the primary database to the standby database, rename it, and change the log-related parameters.

On the primary database, Oracle Data Guard uses archiver processes (ARCn) or the log writer process (LGWR) to collect transaction redo data and transmit it to standby destinations. In our system, we use archiver processes (ARCn) to archive redo data, and we also use the fetch archive log (FAL) client and server for automatic gap resolution. The initialization parameters for the standby database are changed as follows:



*.DB_NAME=PRODDB
*.DB_UNIQUE_NAME=STANDBYDB
*.LOG_ARCHIVE_CONFIG='DG_CONFIG=(STANDBYDB,PRODDB)'
*.LOG_ARCHIVE_DEST_STATE_1=ENABLE
*.LOG_ARCHIVE_DEST_1='LOCATION=USE_DB_RECOVERY_FILE_DEST'
*.LOG_ARCHIVE_DEST_STATE_2=ENABLE
*.LOG_ARCHIVE_DEST_2='SERVICE=PRODDB VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=PRODDB'
*.FAL_SERVER='PRODDB1_SERV','PRODDB2_SERV'
*.FAL_CLIENT='STANDBYDB_SERV'
*.DB_CREATE_FILE_DEST=+DG_DAT
*.DB_RECOVERY_FILE_DEST=+DG_RECO
*.DB_RECOVERY_FILE_DEST_SIZE=200G
*.DB_FILE_NAME_CONVERT='+DG_DAT/PRODDB','+DG_DAT/STANDBYDB','+DG_RECO/PRODDB','+DG_RECO/STANDBYDB'
*.LOG_FILE_NAME_CONVERT='+DG_REDO1/PRODDB','+DG_REDO1/STANDBYDB','+DG_REDO2/PRODDB','+DG_REDO2/STANDBYDB','+DG_RECO/PRODDB','+DG_RECO/STANDBYDB'
*.STANDBY_ARCHIVE_DEST='LOCATION=USE_DB_RECOVERY_FILE_DEST'
*.STANDBY_FILE_MANAGEMENT=AUTO




C. Duplicate the Primary Database with RMAN

With the modified initialization parameter file, start the standby database in NOMOUNT state, then duplicate the primary database as a standby database with the RMAN command "duplicate target database for standby";
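
A sketch of that duplication, run from the standby host with the backup pieces available under the same path as on the primary (the connection strings are illustrative):

rman target sys/password@PRODDB auxiliary /
RMAN> DUPLICATE TARGET DATABASE FOR STANDBY;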

Connect to the standby database and create the standby redo logs to support the standby role. The standby redo logs must be the same size as the primary database online redo logs. The recommended number of standby redo log groups is [6]:
(maximum number of logfiles per thread + 1) * maximum number of threads

This example uses two online log files for each thread, so the number of standby redo logs should be (2 + 1) * 2 = 6; that is, one more standby redo log group per thread than there are online log groups.
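
A sketch of creating those six groups on the standby, assuming (hypothetically) 100 MB online redo logs; the size must match the primary's actual online log size, and OMF places the members in DB_CREATE_FILE_DEST:

-- three standby redo log groups per thread; the size is an assumed example
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 SIZE 100M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 SIZE 100M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 SIZE 100M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 2 SIZE 100M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 2 SIZE 100M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 2 SIZE 100M;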



D. Start Redo Apply

Start redo apply by mounting database and issuing command "alter database recover managed standby database using current logfile disconnect;"



3.3.3 Configure the Primary Database for Data Guard


The following needs to be done on the primary database:


A. Set log related Initialization Parameters

*.DB_NAME=PRODDB
*.DB_UNIQUE_NAME=PRODDB
*.SERVICE_NAMES=PRODDB
*.LOG_ARCHIVE_CONFIG='DG_CONFIG=(PRODDB,STANDBYDB)'
*.LOG_ARCHIVE_DEST_STATE_1=ENABLE
*.LOG_ARCHIVE_DEST_1='LOCATION=USE_DB_RECOVERY_FILE_DEST'
*.LOG_ARCHIVE_DEST_STATE_2=ENABLE
*.LOG_ARCHIVE_DEST_2='SERVICE=STANDBYDB VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=STANDBYDB'
*.FAL_SERVER='STANDBYDB_SERV'
PRODDB1.FAL_CLIENT='PRODDB1_SERV'
PRODDB2.FAL_CLIENT='PRODDB2_SERV'
*.DB_CREATE_FILE_DEST=+DG_DAT
*.DB_RECOVERY_FILE_DEST=+DG_RECO
*.DB_RECOVERY_FILE_DEST_SIZE=200G
*.DB_FILE_NAME_CONVERT ='+DG_DAT/STANDBYDB','+DG_DAT/PRODDB','+DG_RECO/STANDBYDB','+DG_RECO/PRODDB'
*.LOG_FILE_NAME_CONVERT='+DG_REDO1/STANDBYDB','+DG_REDO1/PRODDB','+DG_REDO2/STANDBYDB','+DG_REDO2/PRODDB','+DG_RECO/STANDBYDB','+DG_RECO/PRODDB'
*.STANDBY_FILE_MANAGEMENT=AUTO


B. Add standby redo log file groups, in addition to the archived redo log files, to support higher protection modes and future role transitions.


C. Enable Fast-start failover

The Fast-Start Failover feature [4] minimizes downtime by automatically and quickly failing over to the standby database without requiring manual intervention to execute the failover; when the failed primary reconnects to the configuration, it is automatically reconfigured as a new standby database.


In our system, the observer is located on the standby server. We can enable fast-start failover as follows:

• Set the FastStartFailoverTarget and FastStartFailoverThreshold properties and enable fast-start failover:

DGMGRL> EDIT CONFIGURATION SET PROPERTY FASTSTARTFAILOVERTARGET = 'STANDBYDB';
DGMGRL> EDIT CONFIGURATION SET PROPERTY FASTSTARTFAILOVERTHRESHOLD = 45;
DGMGRL> ENABLE FAST_START FAILOVER;

• Start the Observer from the standby server.
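
A sketch of starting the observer on the standby host (the connection string is illustrative); the DGMGRL session must stay running, so in practice it is started in the background or from a wrapper script:

DGMGRL> CONNECT sys/password@STANDBYDB
DGMGRL> START OBSERVER;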


The following database conditions will trigger a fast-start failover[4]:

• Broken network connection between the observer and the primary database;
• Instance failures;
• Shutdown abort;
• Offline datafiles.






4. References

[1] Certify - Product Selection: Real Application Clusters
https://metalink.oracle.com/metalink/plsql/cert_views.platform_selection?p_html_source=0&p_group_no=1041

[2] Sun Cluster Software Installation Guide for Solaris OS
http://docs.sun.com/app/docs/doc/819-0420

[3] Sun Cluster Data Service for Oracle Real Application Clusters Guide for Solaris OS
http://docs.sun.com/app/docs/doc/819-0583

[4] Oracle® Data Guard Broker 10g Release 2 (10.2), Oracle Corporation

[5] MAA / Data Guard 10g Setup Guide –Creating a Single Instance Physical Standby for a RAC Primary, Oracle Maximum Availability Architecture White Paper, April 2006

[6] Oracle® Data Guard Concepts and Administration, 10g Release 2 (10.2), Oracle Corporation

[7] Oracle Maximum Availability Architecture website on OTN http://www.oracle.com/technology/deploy/availability/htdocs/maa.htm
