One of the most efficient ways to become familiar with Oracle Real Application Clusters (RAC) 10g technology is to have access to an actual Oracle RAC 10g cluster. There's no better way to understand its benefits (including fault tolerance, security, load balancing, and scalability) than to experience them directly.
Unfortunately, for many shops, the price of the hardware required for a typical production RAC configuration makes this goal impossible. A small two-node cluster can cost from US$10,000 to well over US$20,000. That cost would not even include the heart of a production RAC environment, typically a storage area network, which can start at US$8,000.
For those who want to become familiar with Oracle RAC 10g without a major cash outlay, this guide provides a low-cost alternative to configuring an Oracle RAC 10g Release 2 system using commercial off-the-shelf components and downloadable software at an estimated cost of US$1,200 to US$1,800. The system involved comprises a dual-node cluster (each with a single processor) running Linux (CentOS 4.2 or Red Hat Enterprise Linux 4) with a shared disk storage based on IEEE1394 (FireWire) drive technology. (Of course, you could also consider building a virtual cluster on a VMware Virtual Machine, but the experience won't quite be the same!)
Please note that this is not the only way to build a low-cost Oracle RAC 10g system. I have seen other solutions that utilize an implementation based on SCSI rather than FireWire for shared storage. In most cases, SCSI will cost more than our FireWire solution where a typical SCSI card is priced around US$70 and an 80GB external SCSI drive will cost US$700-US$1,000. Keep in mind that some motherboards may already include built-in SCSI controllers.
It is important to note that this configuration should never be run in a production environment and that it is not supported by Oracle or any other vendor. In a production environment, fibre channel (the high-speed serial-transfer interface that can connect systems and storage devices in either point-to-point or switched topologies) is the technology of choice. FireWire offers a low-cost alternative to fibre channel for testing and development, but it is not ready for production.
The Oracle9i and Oracle 10g Release 1 guides used raw partitions for storing files on shared storage, but here we will make use of the Oracle Cluster File System Release 2 (OCFS2) and Oracle Automatic Storage Management (ASM) feature. The two Linux servers will be configured as follows:
| Oracle Database Files | ||||
| RAC Node Name | Instance Name | Database Name | $ORACLE_BASE | File System / Volume Manager for DB Files |
| linux1 | orcl1 | orcl | /u01/app/oracle | ASM |
| linux2 | orcl2 | orcl | /u01/app/oracle | ASM |
| Oracle Clusterware Shared Files | ||||
| File Type | File Name | Partition | Mount Point | File System |
| Oracle Cluster Registry | /u02/oradata/orcl/OCRFile | /dev/sda1 | /u02/oradata/orcl | OCFS2 |
| CRS Voting Disk | /u02/oradata/orcl/CSSFile | /dev/sda1 | /u02/oradata/orcl | OCFS2 |
Note that with Oracle Database 10g Release 2 (10.2), Cluster Ready Services, or CRS, is now called Oracle Clusterware.
The Oracle Clusterware software will be installed to /u01/app/oracle/product/crs on each of the nodes that make up the RAC cluster. However, the Clusterware software requires that two of its files—the Oracle Cluster Registry (OCR) file and the Voting Disk file—be shared with all nodes in the cluster. These two files will be installed on shared storage using OCFS2. It is possible (but not recommended by Oracle) to use RAW devices for these files; however, it is not possible to use ASM for these two Clusterware files.
The Oracle Database 10g Release 2 software will be installed into a separate Oracle Home, namely /u01/app/oracle/product/10.2.0/db_1, on each of the nodes that make up the RAC cluster. All the Oracle physical database files (data, online redo logs, control files, archived redo logs), will be installed to different partitions of the shared drive being managed by ASM. (The Oracle database files can just as easily be stored on OCFS2. Using ASM, however, makes the article that much more interesting!)
Note: This article is designed to work only as documented, with absolutely no substitutions. Separate guides cover Oracle RAC 10g Release 1 with RHEL 3 and the previously published Oracle9i RAC version of this configuration.
Oracle RAC, introduced with Oracle9i, is the successor to Oracle Parallel Server (OPS). RAC allows multiple instances to access the same database (storage) simultaneously. It provides fault tolerance, load balancing, and performance benefits by allowing the system to scale out, and at the same time, because all nodes access the same database, the failure of one instance will not cause the loss of access to the database.
At the heart of Oracle RAC is a shared disk subsystem. All nodes in the cluster must be able to access all of the data, redo log files, control files and parameter files for all nodes in the cluster. The data disks must be globally available to allow all nodes to access the database. Each node has its own redo log and control files but the other nodes must be able to access them in order to recover that node in the event of a system failure.
One of the bigger differences between Oracle RAC and OPS is the presence of Cache Fusion technology. In OPS, a request for data between nodes required the data to be written to disk first, and then the requesting node could read that data. With cache fusion, data is passed along a high-speed interconnect using a sophisticated locking algorithm.
Not all clustering solutions use shared storage. Some vendors use an approach known as a federated cluster, in which data is spread across several machines rather than shared by all. With Oracle RAC 10g, however, multiple nodes use the same set of disks for storing data. With Oracle RAC, the data files, redo log files, control files, and archived log files reside on shared storage on raw-disk devices, a NAS, a SAN, ASM, or on a clustered file system. Oracle's approach to clustering leverages the collective processing power of all the nodes in the cluster and at the same time provides failover security.
For more background about Oracle RAC, visit the Oracle RAC Product Center on OTN.
Fibre Channel is one of the most popular solutions for shared storage. As I mentioned previously, Fibre Channel is a high-speed serial-transfer interface used to connect systems and storage devices in either point-to-point or switched topologies. Protocols supported by Fibre Channel include SCSI and IP.
Fibre Channel configurations can support as many as 127 nodes and have a throughput of up to 2.12 gigabits per second. Fibre Channel, however, is very expensive; the switch alone can start at US$1,000 and high-end drives can reach prices of US$300. Overall, a typical Fibre Channel setup (including cards for the servers) costs roughly US$8,000.
A less expensive alternative to Fibre Channel is SCSI. SCSI technology provides acceptable performance for shared storage, but for administrators and developers who are used to GPL-based Linux prices, even SCSI can come in over budget at around US$2,000 to US$5,000 for a two-node cluster.
Another popular solution is the Sun NFS (Network File System) found on a NAS. It can be used for shared storage but only if you are using a network appliance or something similar. Specifically, you need servers that guarantee direct I/O over NFS, TCP as the transport protocol, and read/write block sizes of 32K.
Developed by Apple Computer and Texas Instruments, FireWire is a cross-platform implementation of a high-speed serial data bus. With its high bandwidth, long distances (up to 100 meters in length) and high-powered bus, FireWire is being used in applications such as digital video (DV), professional audio, hard drives, high-end digital still cameras and home entertainment devices. Today, FireWire operates at transfer rates of up to 800 megabits per second, while next-generation FireWire calls for a theoretical bit rate of 1,600 Mbps and then up to a staggering 3,200 Mbps. That's 3.2 gigabits per second. This speed will make FireWire indispensable for transferring massive data files and for even the most demanding video applications, such as working with uncompressed high-definition (HD) video or multiple standard-definition (SD) video streams.
The following chart shows speed comparisons of the various types of disk interfaces. For each interface, I provide the maximum transfer rates in kilobits (kb), kilobytes (KB), megabits (Mb), megabytes (MB), and gigabits (Gb) per second. As you can see, the capabilities of IEEE1394 compare very favorably with other available disk interface technologies.
| Disk Interface | Kb/s | KB/s | Mb/s | MB/s | Gb/s |
| Serial | 115 | 14.375 | 0.115 | 0.014 | |
| Parallel (standard) | 920 | 115 | 0.92 | 0.115 | |
| USB 1.1 | | | 12 | 1.5 | |
| Parallel (ECP/EPP) | | | 24 | 3 | |
| SCSI-1 | | | 40 | 5 | |
| SCSI-2 (Fast SCSI / Fast Narrow SCSI) | | | 80 | 10 | |
| ATA/100 (parallel) | | | 100 | 12.5 | |
| IDE | | | 133.6 | 16.7 | |
| Fast Wide SCSI (Wide SCSI) | | | 160 | 20 | |
| Ultra SCSI (SCSI-3 / Fast-20 / Ultra Narrow) | | | 160 | 20 | |
| Ultra IDE | | | 264 | 33 | |
| Wide Ultra SCSI (Fast Wide 20) | | | 320 | 40 | |
| Ultra2 SCSI | | | 320 | 40 | |
| FireWire 400 - IEEE1394(a) | | | 400 | 50 | |
| USB 2.0 | | | 480 | 60 | |
| Wide Ultra2 SCSI | | | 640 | 80 | |
| Ultra3 SCSI | | | 640 | 80 | |
| FireWire 800 - IEEE1394(b) | | | 800 | 100 | |
| Serial ATA - (SATA) | | | 1200 | 150 | 1.2 |
| Wide Ultra3 SCSI | | | 1280 | 160 | 1.28 |
| Ultra160 SCSI | | | 1280 | 160 | 1.28 |
| Ultra Serial ATA 1500 | | | 1500 | 187.5 | 1.5 |
| Ultra320 SCSI | | | 2560 | 320 | 2.56 |
| FC-AL Fibre Channel | | | 3200 | 400 | 3.2 |
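The byte figures in the chart follow from the bit figures by dividing by eight. A minimal sketch of that arithmetic is shown here; the helper name `mbits_to_mbytes` is my own and is not part of the original guide.

```shell
#!/bin/sh
# Convert a transfer rate in megabits per second (Mb/s) to megabytes
# per second (MB/s). One byte is eight bits, so divide by 8.
# mbits_to_mbytes is an illustrative helper name, not a standard tool.
mbits_to_mbytes() {
    awk -v mbits="$1" 'BEGIN { printf "%g\n", mbits / 8 }'
}

mbits_to_mbytes 400   # FireWire 400 (IEEE1394a)
mbits_to_mbytes 800   # FireWire 800 (IEEE1394b)
```

The two calls correspond to the FireWire rows of the chart (50 and 100 MB per second).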
The hardware we will use to build our example Oracle RAC 10g environment comprises two Linux servers and components that you can purchase at any local computer store or over the Internet.
| Server 1 - (linux1) | |
| Dimension 2400 Series | US$620 |
| 1 - Ethernet LAN Card | US$20 |
| 1 - FireWire Card | US$30 |
| Server 2 - (linux2) | |
| Dimension 2400 Series | US$620 |
| 1 - Ethernet LAN Card | US$20 |
| 1 - FireWire Card | US$30 |
| Miscellaneous Components | |
| FireWire Hard Drive | US$280 |
| 1 - Extra FireWire Cable | US$20 |
| 1 - Ethernet hub or switch (Used for interconnect int-linux1 / int-linux2) | US$25 |
| 4 - Network Cables | 4 x US$5 |
| Total | US$1,685 |
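As a quick sanity check on the total, the prices listed above can be added up with plain shell arithmetic. This is only a sketch; the variable names are mine.

```shell
#!/bin/sh
# Prices in US$ taken from the hardware list above.
server=620; nic=20; firewire_card=30
drive=280; firewire_cable=20; hub=25; network_cable=5

# Each server needs the base machine, one extra NIC and one FireWire card.
per_server=$((server + nic + firewire_card))

# Two servers plus the shared components (four network cables).
total=$((2 * per_server + drive + firewire_cable + hub + 4 * network_cable))

echo "Per server: US\$${per_server}"
echo "Total:      US\$${total}"
```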
Now that we know the hardware that will be used in this example, let's take a conceptual look at what the environment looks like:
Figure 1 Architecture
This section provides a summary of the screens used to install the Linux operating system. This guide is designed to work with the Red Hat Enterprise Linux 4 AS/ES (RHEL4) operating environment. As an alternative, I used CentOS 4.2 for this article: a free and stable version of the RHEL4 operating environment.
For more detailed installation instructions, it is possible to use the manuals from Red Hat Linux. I would suggest, however, that the instructions I have provided below be used for this configuration.
Before installing the Linux operating system on both nodes, you should have the FireWire and two NIC interfaces (cards) installed.
Also, before starting the installation, ensure that the FireWire drive (our shared storage drive) is NOT connected to either of the two servers. You may also choose to connect both servers to the FireWire drive and simply turn the power off to the drive.
Download the following ISO images for CentOS 4.2:
After downloading and burning the CentOS images (ISO files) to CD, insert CentOS Disk #1 into the first server (linux1 in this example), power it on, and answer the installation screen prompts as noted below. After completing the Linux installation on the first node, perform the same Linux installation on the second node while substituting the node name linux2 for linux1 and using the different IP addresses where appropriate.
Boot Screen
The first screen is the CentOS Enterprise Linux boot screen.
At the boot: prompt, hit [Enter] to start the installation process.
Media Test
When asked to test the CD media, tab over to [Skip] and hit
[Enter]. If there
were any errors, the media burning software would have warned us. After several
seconds, the installer should then detect the video card, monitor, and mouse.
The installer then goes into GUI mode.
Welcome to CentOS Enterprise Linux
At the welcome screen, click [Next] to continue.
Language / Keyboard Selection
The next two screens prompt you for the Language and Keyboard
settings. Make the appropriate selections for your configuration.
Installation Type
Choose the [Custom] option and click [Next] to continue.
Disk Partitioning Setup
Select [Automatically partition] and click [Next] to continue.
If there were a previous installation of Linux on this machine, the next screen will ask if you want to "remove" or "keep" old partitions. Select the option to [Remove all partitions on this system]. Also, ensure that the [hda] drive is selected for this installation. I also keep the checkbox [Review (and modify if needed) the partitions created] selected. Click [Next] to continue.
You will then be prompted with a dialog window asking if you really want to remove all partitions. Click [Yes] to acknowledge this warning.
Partitioning
The installer will then allow you to view (and modify if needed)
the disk partitions it automatically selected. In almost all cases, the
installer will choose 100MB for /boot, double the amount of
RAM for swap, and the rest going to the root (/) partition. I like to
have a minimum of 1GB for swap. For the purpose of this install,
I will accept all automatically preferred sizes. (Including
2GB for swap since I have 1GB of RAM installed.)
Starting with RHEL 4, the installer will create the same disk configuration as just noted but will create it using the Logical Volume Manager (LVM). For example, it will partition the first hard drive (/dev/hda for my configuration) into two partitions: one for the /boot partition (/dev/hda1) and the remainder of the disk dedicated to an LVM volume group named VolGroup00 (/dev/hda2). The LVM Volume Group (VolGroup00) is then partitioned into two LVM partitions: one for the root filesystem (/) and another for swap. I basically check that it created at least 1GB of swap. Since I have 1GB of RAM installed, the installer created 2GB of swap. That said, I just accept the default disk layout.
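Once the system is up, the "at least 1GB of swap" check can be repeated from the command line. The following is a minimal sketch, assuming the usual /proc/meminfo layout in which SwapTotal is reported in kB; the helper name `swap_check` is hypothetical.

```shell
#!/bin/sh
# swap_check [FILE] prints "ok" if a meminfo-style file reports at least
# 1GB (1048576 kB) of swap, "low" if it reports less, and "unknown" if no
# SwapTotal line is found. FILE defaults to /proc/meminfo.
# swap_check is an illustrative helper name, not a standard command.
swap_check() {
    awk '/^SwapTotal:/ {
             status = ($2 >= 1048576) ? "ok" : "low"
             print status
             found = 1
         }
         END { if (!found) print "unknown" }' "${1:-/proc/meminfo}"
}

# Only query the live system when /proc/meminfo is available.
if [ -r /proc/meminfo ]; then
    swap_check
fi
```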
Boot Loader Configuration
The installer will use the GRUB boot loader by default.
To use the GRUB boot loader, accept all default values and click [Next] to continue.
Network Configuration
I made sure to install both NIC interfaces (cards) in each of the
Linux machines before starting the operating system installation.
This screen should have successfully detected each of the network
devices.
First, make sure that each of the network devices is checked as [Active on boot]. The installer may choose not to activate eth1.
Second, [Edit] both eth0 and eth1 as follows. You may choose to use different IP addresses for both eth0 and eth1 and that is OK. If possible, try to put eth1 (the interconnect) on a different subnet than eth0 (the public network):
eth0:
- Uncheck the option to [Configure using DHCP]
- Leave the [Activate on boot] checked
- IP Address: 192.168.1.100
- Netmask: 255.255.255.0
eth1:
- Uncheck the option to [Configure using DHCP]
- Leave the [Activate on boot] checked
- IP Address: 192.168.2.100
- Netmask: 255.255.255.0
Continue by setting your hostname manually. I used "linux1" for the first node and "linux2" for the second one. Finish this dialog off by supplying your gateway and DNS servers.
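The advice to keep eth1 on a different subnet than eth0 comes down to comparing network addresses, that is, the IP address ANDed with the netmask. A small sketch of that check, using the example addresses from this guide and a hypothetical helper named `network_of`:

```shell
#!/bin/sh
# network_of IP NETMASK prints the network address (IP AND NETMASK).
# network_of is an illustrative helper name, not a standard command.
network_of() {
    old_ifs=$IFS
    IFS=.
    set -- $1 $2        # $1..$4 = IP octets, $5..$8 = netmask octets
    IFS=$old_ifs
    echo "$(($1 & $5)).$(($2 & $6)).$(($3 & $7)).$(($4 & $8))"
}

public_net=$(network_of 192.168.1.100 255.255.255.0)    # eth0 on linux1
private_net=$(network_of 192.168.2.100 255.255.255.0)   # eth1 on linux1

if [ "$public_net" = "$private_net" ]; then
    echo "eth0 and eth1 share subnet $public_net"
else
    echo "eth0 is on $public_net, eth1 is on $private_net"
fi
```

If you chose different addresses or netmasks, substitute them in the two `network_of` calls.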
Firewall
On this screen, make sure to select [No firewall]
and click [Next] to continue. You may be prompted
with a warning dialog about not setting the firewall.
If this occurs, simply hit [Proceed] to continue.
Additional Language Support/Time Zone
The next two screens allow you to select additional language support
and time zone information.
In almost all cases, you can accept the defaults.
Set Root Password
Select a root password and click [Next] to continue.
Package Group Selection
Scroll down to the bottom of this screen and select
[Everything] under the "Miscellaneous" section. Click
[Next] to continue.
Please note that the installation of Oracle does not require all Linux packages to be installed. My decision to install all packages was for the sake of brevity. Please see Section 15 ("Check RPM Packages for Oracle 10g Release 2") for a more detailed look at the critical packages required for a successful Oracle installation.
Note that with some RHEL4 distributions, you will not get the "Package Group Selection" screen by default. There, you are asked to simply "Install default software packages" or "Customize software packages to be installed". Select the option to "Customize software packages to be installed" and click [Next] to continue. This will then bring up the "Package Group Selection" screen. Now, scroll down to the bottom of this screen and select [Everything] under the "Miscellaneous" section. Click [Next] to continue.
About to Install
This screen is basically a confirmation screen. Click [Next]
to start the installation. During the installation process,
you will be asked to switch disks to Disk #2, Disk #3, and then Disk #4.
Click [Continue] to start the installation process.
Note that with CentOS 4.2, the installer will ask to switch to Disk #2, Disk #3, Disk #4, Disk #1, and then back to Disk #4.
Graphical Interface (X) Configuration
With most RHEL4 distributions (not the case with CentOS 4.2), when the installation
is complete, the installer will attempt to detect
your video hardware. Ensure that the installer has detected
and selected the correct video hardware (graphics card and monitor) to
properly use the X Windows server. You will continue with the X
configuration in the next several screens.
Congratulations
And that's it. You have successfully installed CentOS Enterprise Linux
on the first node (linux1). The installer will eject the CD
from the CD-ROM drive. Take out the CD and click [Exit] to reboot
the system.
When the system boots into Linux for the first time, it will prompt you with another Welcome screen. The following wizard allows you to configure the date and time, add any additional users, test the sound card, and install any additional CDs. The only screen I care about is the time and date (and if you are using CentOS 4.x, the monitor/display settings). As for the others, simply run through them as there is nothing additional that needs to be installed (at this point anyway!). If everything was successful, you should now be presented with the login screen.
Perform the same installation on the second node
After completing the Linux installation on the first node, repeat the above
steps for the second node (linux2). When configuring the machine name
and networking, be sure to enter the proper values. For my installation,
this is what I configured for linux2:
First, make sure that each of the network devices is checked as [Active on boot]. The installer may choose not to activate eth1.
Second, [Edit] both eth0 and eth1 as follows:
eth0:
- Uncheck the option to [Configure using DHCP]
- Leave the [Activate on boot] checked
- IP Address: 192.168.1.101
- Netmask: 255.255.255.0
eth1:
- Uncheck the option to [Configure using DHCP]
- Leave the [Activate on boot] checked
- IP Address: 192.168.2.101
- Netmask: 255.255.255.0
Continue by setting your hostname manually. I used
"linux2" for the second node.
Finish this dialog off by supplying your gateway and
DNS servers.
Note: Although we configured several of the network settings during the Linux installation, it is important to not skip this section as it contains critical steps that are required for the RAC environment.
Introduction to Network Settings
During the Linux O/S install you already configured the IP address and host name for each of the nodes. You now need to configure the /etc/hosts file as well as adjust several of the network settings for the interconnect. I also include instructions for enabling Telnet and FTP services.
Each node should have one static IP address for the public network and one static IP address for the private cluster interconnect. The private interconnect should only be used by Oracle to transfer Cluster Manager and Cache Fusion related data. Although it is possible to use the public network for the interconnect, this is not recommended as it may cause degraded database performance (reducing the amount of bandwidth for Cache Fusion and Cluster Manager traffic). For a production RAC implementation, the interconnect should be at least gigabit Ethernet and be used only by Oracle.
Configuring Public and Private Network
In our two-node example, you need to configure the network on both nodes
for access to the public network as well as their private interconnect.
The easiest way to configure network settings in RHEL4 is with the Network Configuration program. This application can be started from the command-line as the root user account as follows:
# su -
# /usr/bin/system-config-network &
Do not use DHCP naming for the public IP address or the interconnects; you need static IP addresses!
Using the Network Configuration application, you need to configure both NIC devices as well as the /etc/hosts file. Both of these tasks can be completed using the Network Configuration GUI. Notice that the /etc/hosts settings are the same for both nodes.
Our example configuration will use the following settings:
| Server 1 (linux1) | |||
| Device | IP Address | Subnet | Purpose |
| eth0 | 192.168.1.100 | 255.255.255.0 | Connects linux1 to the public network |
| eth1 | 192.168.2.100 | 255.255.255.0 | Connects linux1 (interconnect) to linux2 (int-linux2) |
| /etc/hosts | |||
127.0.0.1        localhost loopback

# Public Network - (eth0)
192.168.1.100    linux1
192.168.1.101    linux2

# Private Interconnect - (eth1)
192.168.2.100    int-linux1
192.168.2.101    int-linux2

# Public Virtual IP (VIP) addresses for - (eth0)
192.168.1.200    vip-linux1
192.168.1.201    vip-linux2
| Server 2 (linux2) | |||
| Device | IP Address | Subnet | Purpose |
| eth0 | 192.168.1.101 | 255.255.255.0 | Connects linux2 to the public network |
| eth1 | 192.168.2.101 | 255.255.255.0 | Connects linux2 (interconnect) to linux1 (int-linux1) |
| /etc/hosts | |||
127.0.0.1        localhost loopback

# Public Network - (eth0)
192.168.1.100    linux1
192.168.1.101    linux2

# Private Interconnect - (eth1)
192.168.2.100    int-linux1
192.168.2.101    int-linux2

# Public Virtual IP (VIP) addresses for - (eth0)
192.168.1.200    vip-linux1
192.168.1.201    vip-linux2
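Because both nodes must carry the same six host entries, it can be worth checking the file on each node before moving on. The following sketch lists any expected name that is missing; the function name `missing_hosts` is my own, and the list of names is taken from the example configuration above.

```shell
#!/bin/sh
# missing_hosts FILE prints every expected host name that has no entry
# in the hosts-format FILE (comments are ignored). It prints nothing
# when all six names from this guide are present.
# missing_hosts is an illustrative helper name.
missing_hosts() {
    for name in linux1 linux2 int-linux1 int-linux2 vip-linux1 vip-linux2; do
        awk -v want="$name" '
            { sub(/#.*/, "") }                       # strip comments
            { for (i = 2; i <= NF; i++) if ($i == want) found = 1 }
            END { exit found ? 0 : 1 }' "$1" || echo "$name"
    done
}
```

Running `missing_hosts /etc/hosts` on linux1 and on linux2 should print nothing on either node.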
In the screenshots below, only node 1 (linux1) is shown. Be sure to make all the proper network settings to both nodes.
Figure 2 Network Configuration Screen, Node 1 (linux1)
Figure 3 Ethernet Device Screen, eth0 (linux1)
Figure 4 Ethernet Device Screen, eth1 (linux1)
Figure 5: Network Configuration Screen, /etc/hosts (linux1)
When the network is configured, you can use the ifconfig command to verify everything is working. The following example is from linux1:
$ /sbin/ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:0D:56:FC:39:EC
inet addr:192.168.1.100 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::20d:56ff:fefc:39ec/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:835 errors:0 dropped:0 overruns:0 frame:0
TX packets:1983 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:705714 (689.1 KiB) TX bytes:176892 (172.7 KiB)
Interrupt:3
eth1 Link encap:Ethernet HWaddr 00:0C:41:E8:05:37
inet addr:192.168.2.100 Bcast:192.168.2.255 Mask:255.255.255.0
inet6 addr: fe80::20c:41ff:fee8:537/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:9 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:546 (546.0 b)
Interrupt:11 Base address:0xe400
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:5110 errors:0 dropped:0 overruns:0 frame:0
TX packets:5110 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:8276758 (7.8 MiB) TX bytes:8276758 (7.8 MiB)
sit0 Link encap:IPv6-in-IPv4
NOARP MTU:1480 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
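If you would rather not read the full listing by eye, the address bound to one interface can be pulled out with awk. This is a sketch that assumes the older net-tools output format shown above (an "inet addr:" field on an indented continuation line); `iface_addr` is a hypothetical helper name.

```shell
#!/bin/sh
# iface_addr IFACE reads ifconfig output on standard input and prints
# the IPv4 address of IFACE. It relies on interface names starting in
# column one and detail lines being indented, as ifconfig prints them.
# iface_addr is an illustrative helper name.
iface_addr() {
    awk -v want="$1" '
        /^[^ \t]/ { current = $1 }          # a new interface block starts
        current == want && /inet addr:/ {
            sub(/.*inet addr:/, "")         # drop everything up to the address
            print $1
            exit
        }'
}
```

For example, `/sbin/ifconfig -a | iface_addr eth1` on linux1 should report the interconnect address configured earlier.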
About Virtual IP
Why is there a Virtual IP (VIP) in 10g? Why does it just return a dead connection when its primary node fails?
It's all about availability of the application. When a node fails, the VIP associated with it is supposed to be automatically failed over to some other node. When this occurs, two things happen.
This means that when the client issues SQL to the node that is now down, or traverses the address list while connecting, rather than waiting on a very long TCP/IP time-out (~10 minutes), the client receives a TCP reset. In the case of SQL, this is ORA-3113. In the case of connect, the next address in tnsnames is used.
Going one step further is making use of Transparent Application Failover (TAF). With TAF successfully configured, it is possible to completely avoid ORA-3113 errors altogether! TAF will be discussed in more detail in Section 28 ("Transparent Application Failover - (TAF)").
Without using VIPs, clients connected to a node that died will often wait a 10-minute TCP timeout period before getting an error. As a result, you don't really have a good HA solution without using VIPs (Source - Metalink Note 220970.1).
Confirm the RAC Node Name is Not Listed in Loopback Address
Ensure that the node names (linux1 or linux2) are not included for the loopback address in the /etc/hosts file. If the machine name is listed in the loopback address entry as below:
127.0.0.1 linux1 localhost.localdomain localhost
it will need to be removed as shown below:
127.0.0.1 localhost.localdomain localhost
If the RAC node name is listed for the loopback address, you will receive the following error during the RAC installation:
ORA-00603: ORACLE server session terminated by fatal error
or
ORA-29702: error occurred in Cluster Group Service operation
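The loopback check can also be scripted so that it is easy to repeat on both nodes. A minimal sketch, assuming a standard hosts-file layout; the function name `loopback_has_name` is hypothetical.

```shell
#!/bin/sh
# loopback_has_name FILE NAME succeeds (exit status 0) if NAME appears
# on a 127.0.0.1 line of the hosts-format FILE, and fails otherwise.
# loopback_has_name is an illustrative helper name.
loopback_has_name() {
    awk -v want="$2" '
        { sub(/#.*/, "") }                           # strip comments
        $1 == "127.0.0.1" {
            for (i = 2; i <= NF; i++) if ($i == want) bad = 1
        }
        END { exit bad ? 0 : 1 }' "$1"
}
```

A call such as `loopback_has_name /etc/hosts linux1 && echo "remove linux1 from the loopback entry"` prints the warning only when the node name needs to be removed.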
Adjusting Network Settings
With Oracle 9.2.0.1 and later, Oracle makes use of UDP as the default protocol on Linux for inter-process communication (IPC), such as Cache Fusion and Cluster Manager buffer transfers between instances within the RAC cluster.
Oracle strongly suggests adjusting the default and maximum send buffer size (SO_SNDBUF socket option) to 256KB, and the default and maximum receive buffer size (SO_RCVBUF socket option) to 256KB.
The receive buffers are used by TCP and UDP to hold received data until it is read by the application. With TCP, the receive buffer cannot overflow because the peer is not allowed to send data beyond the buffer size window. UDP has no such flow control, so datagrams will be discarded if they don't fit in the socket receive buffer, which means a fast sender can overwhelm the receiver.
The default and maximum window size can be changed in the /proc file system without reboot:
# su - root
# sysctl -w net.core.rmem_default=262144
net.core.rmem_default = 262144
# sysctl -w net.core.wmem_default=262144
net.core.wmem_default = 262144
# sysctl -w net.core.rmem_max=262144
net.core.rmem_max = 262144
# sysctl -w net.core.wmem_max=262144
net.core.wmem_max = 262144
The above commands made the changes to the already running OS. You should now make the above changes permanent (for each reboot) by adding the following lines to the /etc/sysctl.conf file for each node in your RAC cluster:
# Default setting in bytes of the socket receive buffer
net.core.rmem_default=262144

# Default setting in bytes of the socket send buffer
net.core.wmem_default=262144

# Maximum socket receive buffer size which may be set by using
# the SO_RCVBUF socket option
net.core.rmem_max=262144

# Maximum socket send buffer size which may be set by using
# the SO_SNDBUF socket option
net.core.wmem_max=262144
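Since the same four lines have to be present on every node, a short check of the file can catch a missed or mistyped entry. The sketch below reports any of the four settings that is not 262144 (256KB) in a sysctl.conf-style file; the function name `check_buffers` is my own.

```shell
#!/bin/sh
# check_buffers FILE prints one line for each of the four socket buffer
# settings that is missing from FILE or not set to 262144 bytes. It
# prints nothing when all four are correct. If a key appears more than
# once, the last occurrence is used.
# check_buffers is an illustrative helper name.
check_buffers() {
    for key in net.core.rmem_default net.core.wmem_default \
               net.core.rmem_max net.core.wmem_max; do
        value=$(awk -F= -v k="$key" '
            { sub(/#.*/, ""); gsub(/[ \t]/, "") }    # drop comments and blanks
            $1 == k { v = $2 }
            END { print v }' "$1")
        [ "$value" = "262144" ] || echo "$key is '${value}', expected 262144"
    done
}
```

Run it as `check_buffers /etc/sysctl.conf` on each node in the cluster.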
Enabling Telnet and FTP Services
Linux is configured to run the Telnet and FTP server, but by default, these services are disabled. To enable the Telnet service, log in to the server as the root user account and run the following commands:
# chkconfig telnet on
# service xinetd reload
Reloading configuration: [ OK ]
Starting with the Red Hat Enterprise Linux 3.0 release (and in CentOS), the FTP server (wu-ftpd) is no longer available with xinetd. It has been replaced with vsftpd, which can be started from /etc/init.d/vsftpd as in the following:
# /etc/init.d/vsftpd start
Starting vsftpd for vsftpd: [ OK ]
If you want the vsftpd service to start and stop when recycling (rebooting) the machine, you can create the following symbolic links:
# ln -s /etc/init.d/vsftpd /etc/rc3.d/S56vsftpd
# ln -s /etc/init.d/vsftpd /etc/rc4.d/S56vsftpd
# ln -s /etc/init.d/vsftpd /etc/rc5.d/S56vsftpd
Perform the following kernel upgrade and FireWire modules install on all nodes in the cluster!
The next step is to obtain and install a new Linux kernel and the FireWire modules that support the use of IEEE1394 devices with multiple logins. This will require two separate downloads and installs: one for the new RHEL4 kernel and a second one that includes the supporting FireWire modules.
In a previous version of this guide, I included the steps to download a patched version of the Linux kernel (source code) and then compile it. Thanks to Oracle's Linux Projects Development Team, this is no longer a requirement. Oracle now provides a pre-compiled kernel for RHEL4 (which also works with CentOS!) that can simply be downloaded and installed. The instructions for downloading and installing the kernel and supporting FireWire modules are included in this section. Before going into the details of how to perform these actions, however, let's take a moment to discuss the changes that are required in the new kernel.
While FireWire drivers already exist for Linux, they often do not support shared storage. Typically when you log on to an OS, the OS associates the driver to a specific drive for that machine alone. This implementation simply will not work for our RAC configuration. The shared storage (our FireWire hard drive) needs to be accessed by more than one node. You need to enable the FireWire driver to provide nonexclusive access to the drive so that multiple servers (the nodes that comprise the cluster) will be able to access the same storage. This goal is accomplished by removing the bit mask that identifies the machine during login in the source code, resulting in nonexclusive access to the FireWire hard drive. All other nodes in the cluster log in to the same drive during their logon session, using the same modified driver, so they too have nonexclusive access to the drive.
Our implementation describes a dual-node cluster (each node with a single processor), with each server running CentOS Enterprise Linux. Keep in mind that the process of installing the patched Linux kernel and supporting FireWire modules will need to be performed on both Linux nodes. CentOS Enterprise Linux 4.2 includes kernel 2.6.9-22.EL #1. We will need to download the OTN-supplied 2.6.9-11.0.0.10.3.EL #1 Linux kernel and the supporting FireWire modules from the following two URLs:
Download one of the following files for the new RHEL 4 Kernel:
kernel-2.6.9-11.0.0.10.3.EL.i686.rpm - (for single processor)
or
kernel-smp-2.6.9-11.0.0.10.3.EL.i686.rpm - (for multiple processors)
Download one of the following files for the supporting FireWire Modules:
oracle-firewire-modules-2.6.9-11.0.0.10.3.EL-1286-1.i686.rpm - (for single processor)
or
oracle-firewire-modules-2.6.9-11.0.0.10.3.ELsmp-1286-1.i686.rpm - (for multiple processors)
Install the new RHEL 4 kernel, as root:
# rpm -ivh --force kernel-2.6.9-11.0.0.10.3.EL.i686.rpm - (for single processor)
or
# rpm -ivh --force kernel-smp-2.6.9-11.0.0.10.3.EL.i686.rpm - (for multiple processors)
Installing the new kernel using RPM will also update your GRUB (or lilo) configuration with the appropriate stanza and default boot option. There is no need to modify your boot loader configuration after installing the new kernel.
Note: After installing the new kernel, do not proceed to install the supporting FireWire modules at this time! A reboot into the new kernel is required before the FireWire modules can be installed.
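The choice between the two kernel packages depends only on how many processors the node has. As a rough sketch (the function name kernel_rpm_for is hypothetical, and the processor count is assumed to be readable from /proc/cpuinfo), the selection could be scripted like this:

```sh
#!/bin/sh
# Hypothetical helper: print the kernel RPM file name for a given CPU count.
kernel_rpm_for() {
    if [ "$1" -gt 1 ]; then
        echo "kernel-smp-2.6.9-11.0.0.10.3.EL.i686.rpm"
    else
        echo "kernel-2.6.9-11.0.0.10.3.EL.i686.rpm"
    fi
}

# Count processors on this node; fall back to 1 if the count is unavailable.
cpus=$(grep -c '^processor' /proc/cpuinfo 2>/dev/null) || cpus=1
echo "Suggested package: $(kernel_rpm_for "$cpus")"
```

The same test applies later when choosing between the EL and ELsmp builds of the FireWire modules package.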
Reboot into the new Linux kernel:
At this point, the new RHEL4 kernel is installed. You now need to reboot into the new Linux kernel:
# init 6
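Once the node is back up, it is worth confirming that it really booted the new kernel before installing the FireWire modules. The following is a minimal sketch; the function name is_otn_kernel is hypothetical, and it assumes that uname -r reports the release exactly as it appears in the package names (ending in EL or ELsmp):

```sh
#!/bin/sh
# Hypothetical helper: succeed if the given kernel release string is the
# OTN-supplied kernel (single-processor "EL" or multiprocessor "ELsmp" build).
is_otn_kernel() {
    case "$1" in
        2.6.9-11.0.0.10.3.EL|2.6.9-11.0.0.10.3.ELsmp) return 0 ;;
        *) return 1 ;;
    esac
}

if is_otn_kernel "$(uname -r)"; then
    echo "Running the OTN-supplied kernel"
else
    echo "Still on $(uname -r); check the default entry in the boot loader"
fi
```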
Install the supporting FireWire modules, as root:
After booting into the new RHEL 4 kernel, you need to install the supporting FireWire modules package by running either of the following:
# rpm -ivh oracle-firewire-modules-2.6.9-11.0.0.10.3.EL-1286-1.i686.rpm - (for single processor)
- OR -
# rpm -ivh oracle-firewire-modules-2.6.9-11.0.0.10.3.ELsmp-1286-1.i686.rpm - (for multiple processors)
Add module options:
Add the following line to /etc/modprobe.conf:
options sbp2 exclusive_login=0
It is vital that the exclusive_login parameter of the Serial Bus Protocol module (sbp2) be set to zero to allow multiple hosts to log in to and access the FireWire disk concurrently.
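Because a missing or mistyped option here only shows up later as a node that cannot log in to the drive, a quick check of the file can save time. This is a sketch only; the function name has_shared_login is hypothetical, and it simply looks for an uncommented sbp2 options line carrying exclusive_login=0:

```sh
#!/bin/sh
# Hypothetical helper: succeed if the given modprobe configuration file has
# an uncommented "options sbp2 ..." line that sets exclusive_login=0.
has_shared_login() {
    grep -Eq '^[[:space:]]*options[[:space:]]+sbp2[[:space:]].*exclusive_login=0([[:space:]]|$)' "$1" 2>/dev/null
}

conf=/etc/modprobe.conf
if has_shared_login "$conf"; then
    echo "$conf: sbp2 exclusive_login=0 is set"
else
    echo "$conf: sbp2 exclusive_login=0 not found"
fi
```

Run it on every node, since the option must be present on each one.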
Perform the above tasks on the second Linux server:
With the new RHEL4 kernel and supporting FireWire modules installed on the first Linux server, move on to the second Linux server and repeat the same tasks in this section on it.
Connect FireWire drive to each machine and boot into the new kernel:
After performing the above tasks on both nodes in the cluster, power down both Linux machines:
===============================
# hostname
linux1
# init 0
===============================
# hostname
linux2
# init 0
===============================
After both machines are powered down, connect each of them to the back of the FireWire drive. Power on the FireWire drive. Finally, power on each Linux server and be sure to boot each machine into the new kernel.
Note: RHEL4 users will be prompted during the boot process on both nodes at the "Probing for New Hardware" section for your FireWire hard drive. Simply select the option to "Configure" the device and continue the boot process. If you are not prompted during the "Probing for New Hardware" section for the new FireWire drive, you will need to run the following commands and reboot the machine:
# modprobe -r sbp2
# modprobe -r sd_mod
# modprobe -r ohci1394
# modprobe ohci1394
# modprobe sd_mod
# modprobe sbp2
# init 6
Loading the FireWire stack:
In most cases, the loading of the FireWire stack will already be configured in the /etc/rc.sysinit file. The commands that are contained within this file that are responsible for loading the FireWire stack are:
# modprobe sbp2
# modprobe ohci1394
In older versions of Red Hat, this was not the case and these commands would have to be manually run or put within a startup file. With Red Hat Enterprise Linux 3 and later, these commands are already put within the /etc/rc.sysinit file and run on each boot.
Check for SCSI Device:
After each machine has rebooted, the kernel should automatically detect the disk as a SCSI device (/dev/sdXX). This section provides several commands that should be run on all nodes in the cluster to verify that the FireWire drive was successfully detected and is being shared by all nodes in the cluster.
For this configuration, I performed the above procedures on both nodes at the same time. When complete, I shut down both machines, started linux1 first, and then linux2. The following commands and results are from my linux2 machine. Again, make sure that you run the following commands on all nodes to ensure both machines can log in to the shared drive.
Let's first check to see that the FireWire adapter was successfully detected:
# lspci
00:00.0 Host bridge: Intel Corporation 82845G/GL[Brookdale-G]/GE/PE DRAM Controller/Host-Hub Interface (rev 01)
00:02.0 VGA compatible controller: Intel Corporation 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #3 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 81)
00:1f.0 ISA bridge: Intel Corporation 82801DB/DBL (ICH4/ICH4-L) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801DB (ICH4) IDE Controller (rev 01)
00:1f.3 SMBus: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 01)
00:1f.5 Multimedia audio controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller (rev 01)
01:04.0 Ethernet controller: Linksys NC100 Network Everywhere Fast Ethernet 10/100 (rev 11)
01:06.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)
01:09.0 Ethernet controller: Broadcom Corporation BCM4401 100Base-T (rev 01)
Second, let's check to see that the modules are loaded:
# lsmod |egrep "ohci1394|sbp2|ieee1394|sd_mod|scsi_mod"
sd_mod                 17217  0
sbp2                   23948  0
scsi_mod              121293  2 sd_mod,sbp2
ohci1394               35784  0
ieee1394              298228  2 sbp2,ohci1394
Third, let's make sure the disk was detected and an entry was made by the kernel:
# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: Maxtor   Model: OneTouch II      Rev: 023g
  Type:   Direct-Access                    ANSI SCSI revision: 06
Now let's verify that the FireWire drive is accessible for multiple logins and shows a valid login:
# dmesg | grep sbp2
sbp2: $Rev: 1265 $ Ben Collins <bcollins@debian.org>
ieee1394: sbp2: Maximum concurrent logins supported: 2
ieee1394: sbp2: Number of active logins: 0
ieee1394: sbp2: Logged into SBP-2 device
From the above output, you can see that the FireWire drive I have can support concurrent logins by up to 2 servers. It is vital that you have a drive where the chipset supports concurrent access for all nodes within the RAC cluster.
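The comparison between the login limit reported by the drive and the number of nodes can also be done mechanically. The sketch below assumes the kernel message has exactly the wording shown above; the function name max_sbp2_logins and the nodes variable are hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: read kernel messages on stdin and print the value of
# the first "Maximum concurrent logins supported" line (empty if none).
max_sbp2_logins() {
    sed -n 's/.*Maximum concurrent logins supported: \([0-9][0-9]*\).*/\1/p' | head -n 1
}

nodes=2   # number of nodes in this cluster
max=$(dmesg 2>/dev/null | max_sbp2_logins)
if [ -n "$max" ] && [ "$max" -ge "$nodes" ]; then
    echo "Drive supports $max concurrent logins; enough for $nodes nodes"
else
    echo "Could not confirm that the drive supports $nodes concurrent logins"
fi
```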
One other test I like to perform is to run a quick fdisk -l from each node in the cluster to verify that it is really being picked up by the OS. Your drive may show that the device does not contain a valid partition table, but this is OK at this point of the RAC configuration.
# fdisk -l

Disk /dev/hda: 40.0 GB, 40000000000 bytes
255 heads, 63 sectors/track, 4863 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1          13      104391   83  Linux
/dev/hda2              14        4863    38957625   8e  Linux LVM

Disk /dev/sda: 300.0 GB, 300090728448 bytes
255 heads, 63 sectors/track, 36483 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1       36483   293049666    c  W95 FAT32 (LBA)
Rescan SCSI bus no longer required:
In older versions of the kernel, I would need to run the rescan-scsi-bus.sh script in order to detect the FireWire drive. The purpose of this script was to create the SCSI entry for the node by using the following command:
echo "scsi add-single-device 0 0 0 0" > /proc/scsi/scsi
With RHEL3 and RHEL4, this step is no longer required and the disk should be detected automatically.
Troubleshooting SCSI Device Detection:
If you are having trouble detecting the SCSI device with any of the procedures above, you can try the following:
# modprobe -r sbp2
# modprobe -r sd_mod
# modprobe -r ohci1394
# modprobe ohci1394
# modprobe sd_mod
# modprobe sbp2
You may also want to unplug any USB devices connected to the server. The system may not be able to recognize your FireWire drive if you have a USB device attached!
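Since the unload and reload order matters and the sequence may need repeating on each node, it can be wrapped in a small function. This is only a sketch: the function name reload_firewire_stack and the DRY_RUN variable are hypothetical, and by default the commands are printed rather than executed.

```sh
#!/bin/sh
# Hypothetical helper: unload and reload the FireWire/SCSI modules in the
# order used in this section. With DRY_RUN=1 (the default here) the commands
# are only printed; set DRY_RUN=0 and run as root to execute them.
reload_firewire_stack() {
    run=
    [ "${DRY_RUN:-1}" = "1" ] && run=echo
    for m in sbp2 sd_mod ohci1394; do
        $run modprobe -r "$m" || return 1
    done
    for m in ohci1394 sd_mod sbp2; do
        $run modprobe "$m" || return 1
    done
}

reload_firewire_stack
```

Stopping at the first failing modprobe keeps a half-unloaded stack from being reloaded blindly.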
Perform the following tasks on all nodes in the cluster!
You will be using OCFS2 to store the files required to be shared for the Oracle Clusterware software. When using OCFS2, the UID of the UNIX user oracle and the GID of the UNIX group dba should be identical on all machines in the cluster. If either the UID or the GID is different, the files on the OCFS2 file system may show up as "unowned" or may even be owned by a different user. For this article, I will use 175 for the oracle UID and 115 for the dba GID.
Create Group and User for Oracle
Let's continue our example by creating the Unix dba group and oracle user account along with all appropriate directories.
# mkdir -p /u01/app
# groupadd -g 115 dba
# useradd -u 175 -g 115 -d /u01/app/oracle -s /bin/bash -c "Oracle Software Owner" -p oracle oracle
# chown -R oracle:dba /u01
# passwd oracle
# su - oracle
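To confirm that every node ended up with the same IDs, the numeric values can be read back from the account files. The following is a sketch under the assumption that the accounts are local (stored in /etc/passwd and /etc/group); the function name id_of is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: look up a numeric ID by name in passwd- or
# group-format text (colon-separated, numeric ID in the third field).
# $1 = user or group name, $2 = file to search
id_of() {
    awk -F: -v name="$1" '$1 == name { print $3; exit }' "$2"
}

uid=$(id_of oracle /etc/passwd)
gid=$(id_of dba /etc/group)
if [ "$uid" = "175" ] && [ "$gid" = "115" ]; then
    echo "oracle UID and dba GID match the values used in this article"
else
    echo "Mismatch: oracle UID='$uid' dba GID='$gid'"
fi
```

Running it on each node and comparing the output is enough to catch a node where the IDs differ.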
Note: When you are setting the Oracle environment variables for each RAC node, be sure to assign each RAC node a unique Oracle SID! For this example, I used:
....................................
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
alias ls="ls -FA"
# User specific environment and startup programs
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=$ORACLE_BASE/product/10.2.0/db_1
export ORA_CRS_HOME=$ORACLE_BASE/product/crs
export ORACLE_PATH=$ORACLE_BASE/common/oracle/sql:.:$ORACLE_HOME/rdbms/admin
# Each RAC node must have a unique ORACLE_SID. (i.e. orcl1, orcl2,...)
export ORACLE_SID=orcl1
export PATH=.:${PATH}:$HOME/bin:$ORACLE_HOME/bin
export PATH=${PATH}:/usr/bin:/bin:/usr/bin/X11:/usr/local/bin
export PATH=${PATH}:$ORACLE_BASE/common/oracle/bin
export ORACLE_TERM=xterm
export TNS_ADMIN=$ORACLE_HOME/network/admin
export ORA_NLS10=$ORACLE_HOME/nls/data
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$ORACLE_HOME/oracm/lib
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/lib:/usr/lib:/usr/local/lib
export CLASSPATH=$ORACLE_HOME/JRE
export CLASSPATH=${CLASSPATH}:$ORACLE_HOME/jlib
export CLASSPATH=${CLASSPATH}:$ORACLE_HOME/rdbms/jlib
export CLASSPATH=${CLASSPATH}:$ORACLE_HOME/network/jlib
export THREADS_FLAG=native
export TEMP=/tmp
export TMPDIR=/tmp
....................................
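If the same .bash_profile is copied to every node, the ORACLE_SID line has to be edited by hand on each one. One possible alternative, shown here only as a sketch, derives the SID from the host name. It assumes host names end in the node number (linux1, linux2), as in this article; the function name sid_for_host is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: map a host name ending in a node number (linux1,
# linux2, ...) to an instance name (orcl1, orcl2, ...). Fails if the host
# name does not end in digits.
sid_for_host() {
    host=${1%%.*}    # strip any domain part
    num=$(printf '%s\n' "$host" | sed -n 's/^.*[^0-9]\([0-9][0-9]*\)$/\1/p')
    [ -n "$num" ] || return 1
    echo "orcl$num"
}

# Only export ORACLE_SID if a node number could be derived.
ORACLE_SID=$(sid_for_host "$(hostname)") && export ORACLE_SID
```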
Create Mount Point for OCFS2 / Clusterware
Finally, create the mount point for the OCFS2 filesystem that will be used to store the two Oracle Clusterware shared files. These commands will need to be run as the "root" user account:
$ su -
# mkdir -p /u02/oradata/orcl
# chown -R oracle:dba /u02
Ensure Adequate temp Space for OUI
Note: The Oracle Universal Installer (OUI) requires up to 400MB of free space in the /tmp directory.
You can check the available space in /tmp by running df -k /tmp. The following commands report the configured swap space:
# cat /proc/swaps
Filename                                Type            Size    Used    Priority
/dev/mapper/VolGroup00-LogVol01         partition       2031608 0       -1
-OR-
# cat /proc/meminfo | grep SwapTotal
SwapTotal:     2031608 kB
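Since the requirement is about free space in /tmp itself, a small check based on df can test it directly. This is a sketch only; the function name free_mb is hypothetical, and the 400MB threshold is the figure quoted in the note above:

```sh
#!/bin/sh
# Hypothetical helper: print the free space, in whole megabytes, of the
# file system that holds the given directory (POSIX df output format).
free_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

need=400
have=$(free_mb /tmp)
if [ -n "$have" ] && [ "$have" -ge "$need" ]; then
    echo "/tmp has ${have}MB free; enough for the installer"
else
    echo "/tmp has ${have:-unknown}MB free; less than ${need}MB"
fi
```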
If for some reason you do not have enough space in /tmp, you can temporarily create space in another file system and point your TEMP and TMPDIR to it for the duration of the install. Here are the steps to do this:
# su -
# mkdir /<AnotherFilesystem>/tmp
# chown root.root /<AnotherFilesystem>/tmp
# chmod 1777 /<AnotherFilesystem>/tmp
# export TEMP=/<AnotherFilesystem>/tmp # used by Oracle
# export TMPDIR=/<AnotherFilesystem>/tmp # used by Linux programs
# like the linker "ld"
When the installation of Oracle is complete, you can remove the temporary directory using the following:
# su -
# rmdir /<AnotherFilesystem>/tmp
# unset TEMP
# unset TMPDIR
Create the following partitions on only one node in the cluster!
The next step is to create the required partitions on the FireWire (shared) drive. As I mentioned previously, you will use OCFS2 to store the two files to be shared for Oracle's Clusterware software. You will then create three ASM volumes; two for all physical database files (data/index files, online redo log files, control files, SPFILE, and archived redo log files) and one for the Flash Recovery Area.
The following table lists the individual partitions that will be created on the FireWire (shared) drive and what files will be contained on them.
| Oracle Shared Drive Configuration | |||||
| File System Type | Partition | Size | Mount Point | ASM Diskgroup Name | File Types |
| OCFS2 | /dev/sda1 | 1GB | /u02/oradata/orcl | | Oracle Cluster Registry File - (~100MB); CRS Voting Disk - (~20MB) |
| ASM | /dev/sda2 | 50GB | ORCL:VOL1 | +ORCL_DATA1 | Oracle Database Files |
| ASM | /dev/sda3 | 50GB | ORCL:VOL2 | +ORCL_DATA1 | Oracle Database Files |
| ASM | /dev/sda4 | 100GB | ORCL:VOL3 | +FLASH_RECOVERY_AREA | Oracle Flash Recovery Area |
| Total | | 201GB | | | |
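The total in the table is simply the sum of the four partition sizes, and it has to fit on the drive. As a quick sanity check (the variable names are hypothetical, and 300GB is the nominal size of the example drive used in this article):

```sh
#!/bin/sh
# Planned partition sizes in GB, taken from the table above.
ocfs2=1; vol1=50; vol2=50; vol3=100
total=$((ocfs2 + vol1 + vol2 + vol3))

disk=300   # nominal capacity of the example FireWire drive, in GB
echo "Total allocated: ${total}GB, unallocated: $((disk - total))GB"
```

With a smaller drive, scale the ASM volumes down and rerun the sum before partitioning.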
Create All Partitions on FireWire Shared Storage
As shown in the table above, my FireWire drive shows up as the SCSI device /dev/sda. The fdisk command is used for creating (and removing) partitions. For this configuration, we will be creating four partitions: one for Oracle's Clusterware shared files and the other three for ASM (to store all Oracle database files and the Flash Recovery Area). Before creating the new partitions, it is important to remove any existing partitions (if they exist) on the FireWire drive:
# fdisk /dev/sda
Command (m for help): p

Disk /dev/sda: 300.0 GB, 300090728448 bytes
255 heads, 63 sectors/track, 36483 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1       36483   293049666    c  W95 FAT32 (LBA)

Command (m for help): d
Selected partition 1

Command (m for help): p

Disk /dev/sda: 300.0 GB, 300090728448 bytes
255 heads, 63 sectors/track, 36483 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-36483, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-36483, default 36483): +1G

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (124-36483, default 124): 124
Last cylinder or +size or +sizeM or +sizeK (124-36483, default 36483): +50G

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 3
First cylinder (6204-36483, default 6204): 6204
Last cylinder or +size or +sizeM or +sizeK (6204-36483, default 36483): +50G

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Selected partition 4
First cylinder (12284-36483, default 12284): 12284
Last cylinder or +size or +sizeM or +sizeK (12284-36483, default 36483): +100G

Command (m for help): p

Disk /dev/sda: 300.0 GB, 300090728448 bytes
255 heads, 63 sectors/track, 36483 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1         123      987966   83  Linux
/dev/sda2             124        6203    48837600   83  Linux
/dev/sda3            6204       12283    48837600   83  Linux
/dev/sda4           12284       24442    97667167+  83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
After creating all required partitions, you should now inform the kernel of the partition changes by running the following commands as the root user:
# partprobe
# fdisk -l /dev/sda

Disk /dev/sda: 300.0 GB, 300090728448 bytes
255 heads, 63 sectors/track, 36483 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1         123      987966   83  Linux
/dev/sda2             124        6203    48837600   83  Linux
/dev/sda3            6204       12283    48837600   83  Linux
/dev/sda4           12284       24442    97667167+  83  Linux
(Note: The FireWire drive and partitions created will be exposed as a SCSI device.)
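The Blocks column can be cross-checked against the cylinder ranges using the geometry fdisk prints (255 heads x 63 sectors = 16065 sectors of 512 bytes per cylinder). The sketch below does that arithmetic; the function name cyl_blocks is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: size in 1K blocks of a partition spanning the given
# cylinder range. Each cylinder holds 16065 sectors of 512 bytes, and two
# such sectors make one 1K block (integer division drops a trailing half).
cyl_blocks() {
    echo $(( ($2 - $1 + 1) * 16065 / 2 ))
}

cyl_blocks 124 6203       # /dev/sda2: prints 48837600
cyl_blocks 12284 24442    # /dev/sda4: prints 97667167
```

Both values agree with the listing above (fdisk marks the dropped half block on /dev/sda4 with a "+"). The first partition comes out slightly smaller than this formula gives because fdisk starts it after the first track of the disk.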