RAC: Real Application Clusters introduction
How Oracle RAC works:
In a RAC environment all servers use and manage the same database, the goal being to spread the workload across every server. On the hardware side you need at least two servers plus a shared storage device; on the software side you need two things: cluster software and the RAC option of the Oracle database. All servers should run the same operating system.
When a client sends a request to the listener of one server, that server, according to the configured load-balancing policy, may hand the request to its local RAC component or forward it to the RAC component on another server. After the request has been processed, RAC accesses the shared storage through the cluster software.
Logically, every node that joins the cluster runs its own instance, and all of these instances access the same database. The nodes communicate with each other through the cluster software's communication layer. To reduce I/O there is a global cache service, so every database instance keeps an identical copy of the database cache.
Characteristics of RAC:
Each node's instance has its own SGA, background processes, redo logs and undo tablespace
All nodes share the same datafiles and controlfiles
Oracle Cache Fusion:
- Guarantees cache coherency

- Reduces shared-disk I/O
 
How Cache Fusion works:
- One of the nodes reads a block from the shared database into its db cache

- That block is then cross-copied into the db cache of the other nodes

- When the cached block is modified on any node, the modification is propagated between the nodes' caches

- To keep the storage consistent, the final result of the modifications is eventually written to disk
 
Clusterware components:
There are four services:
- crsd - Cluster Resource Services

- cssd - Cluster Synchronization Services

- evmd - Event Management Services

- oprocd - Node monitoring
 
Three kinds of resources:
- VIP - Virtual IP address

- OCR - Oracle Cluster Registry, which records information about each node

- Voting Disk - Establishes quorum; the arbitration mechanism that decides cluster membership when several nodes could write to the shared storage at the same time, so that conflicts (for example in a split-brain situation) are avoided
 
RAC components:
RAC adds extra background processes that maintain the database:
- LMS - Global Cache Service Process

- LMD - Global Enqueue Service Daemon

- LMON - Global Enqueue Service Monitor

- LCK0 - Instance Enqueue Process
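
These processes can be seen on every node of a running cluster. A minimal check (a sketch; the instance name racdb1 is the one used later in this post):

```bash
# List the RAC-specific background processes of the local instance (instance name assumed to be racdb1)
ps -ef | grep -E 'ora_(lms|lmd|lmon|lck)[0-9a-z]*_racdb1' | grep -v grep
```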
 
Planning
Topology diagram:
  
IP planning

| Node | Public IP (bond0) | Heartbeat (eth2) | Private IP | System | Hostname | Memory |
|------|-------------------|------------------|------------|--------|----------|--------|
| RAC1 | 192.168.100.241/24 | 192.168.90.1/24 | eth3: 192.168.80.1/24 | CentOS 6.4 | rac1.example.com | 3G |
| RAC2 | 192.168.100.242/24 | 192.168.90.2/24 | eth3: 192.168.80.2/24 | CentOS 6.4 | rac2.example.com | 3G |
| Storage | - | - | bond0: 192.168.80.3/24 | CentOS 6.4 | iscsi | 512M |
Storage disk planning

| Storage component | File system | Volume size | ASM diskgroup name | ASM redundancy | ASM disk |
|-------------------|-------------|-------------|--------------------|----------------|----------|
| OCR / Voting disk | ASM | 2G | +CRS | External | DISK1 |
| Database files | ASM | 40G | +RACDB_DATA | External | DISK2 |
| Fast Recovery Area | ASM | 40G | +FRA | External | DISK3 |
Pre-installation preparation
Environment: CentOS 6.4
   Oracle 11.2.0.4
Attachments:
hosts
cvuqdisk-1.0.9-1.rpm
pdksh-5.2.14-1.i386.rpm
plsql + instantclient_11_2 installation package
Setting up the back-end storage with iSCSI
```
[root@iscsi ~]
Target 1: iqn.disk1
Target 2: iqn.disk2
Target 3: iqn.disk3
[root@iscsi ~]
```
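
A minimal sketch of how the three targets could be exported with scsi-target-utils on CentOS 6 (the backing devices /dev/vg0/disk1-3 are assumptions; only the IQNs iqn.disk1-3 come from the listing above):

```bash
# /etc/tgt/targets.conf on the storage node; backing devices are examples
cat >> /etc/tgt/targets.conf <<'EOF'
<target iqn.disk1>
    backing-store /dev/vg0/disk1
</target>
<target iqn.disk2>
    backing-store /dev/vg0/disk2
</target>
<target iqn.disk3>
    backing-store /dev/vg0/disk3
</target>
EOF
service tgtd restart
chkconfig tgtd on
# List the exported targets
tgtadm --lld iscsi --mode target --op show | grep Target
```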
After attaching the LUNs on rac1 and rac2, partition them
```
[root@rac1 software]
Disk /dev/sdb: 2147 MB, 2147483648 bytes
Disk /dev/sdc: 42.9 GB, 42949672960 bytes
Disk /dev/sdd: 42.9 GB, 42949672960 bytes
[root@rac1 software]
```
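
On the RAC nodes the LUNs are attached with the open-iscsi initiator before partitioning; a sketch (the portal address 192.168.80.3 is the storage IP from the planning table):

```bash
# Discover and log in to the targets exported by the storage node
yum -y install iscsi-initiator-utils      # if not installed yet (assumption)
iscsiadm -m discovery -t sendtargets -p 192.168.80.3
iscsiadm -m node -p 192.168.80.3 -l
# The LUNs should now show up as /dev/sdb, /dev/sdc and /dev/sdd
fdisk -l | grep '^Disk /dev/sd'
```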
```
[root@rac1 ~]
rac1.example.com
[root@rac1 ~]
[root@rac2 ~]
rac2.example.com
[root@rac2 ~]
[root@rac1 ~]
PING rac2 (192.168.100.242) 56(84) bytes of data.
64 bytes from rac2 (192.168.100.242): icmp_seq=1 ttl=64 time=0.622 ms
64 bytes from rac2 (192.168.100.242): icmp_seq=2 ttl=64 time=0.369 ms
```
 
Name resolution needs to be configured on both rac1 and rac2
```
[root@rac1 ~]
192.168.100.241	rac1	rac1.example.com
192.168.100.242	rac2	rac2.example.com
192.168.100.244	rac1-vip
192.168.100.245	rac2-vip
192.168.100.246	racscan
192.168.90.1	rac1-priv
192.168.90.2	rac2-priv
192.168.80.1	rac1-s
192.168.80.2	rac2-s
192.168.80.3	iscsi
[root@rac1 ~]
```
Disable the firewall and SELinux
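On CentOS 6 this can be done as follows (a sketch; the change in the SELinux config file only takes effect after a reboot, so `setenforce 0` is used for the running system):

```bash
# Stop iptables now and keep it off after reboot
service iptables stop
chkconfig iptables off
# Switch SELinux to permissive for the running system and disable it permanently
setenforce 0
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
```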
Requirements: at least 2 GB of RAM; swap should be 1x to 1.5x the amount of RAM for machines with up to 16 GB of RAM, and 16 GB for machines with more than 16 GB of RAM.
```
[root@rac1 software]
Linux rac1 2.6.32-358.el6.x86_64
[root@rac1 ~]
[root@rac1 ~]
[root@rac1 ~]
tmpfs                  10G     0   10G   0% /dev/shm
[root@rac1 ~]
```
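
The outputs above correspond to a kernel-version check and a look at memory, swap and /dev/shm; a sketch of the usual checks (the exact commands run in the original post are assumptions):

```bash
uname -r                          # kernel version
grep MemTotal /proc/meminfo       # at least 2 GB of RAM
grep SwapTotal /proc/meminfo      # swap sized as described above
df -h /dev/shm                    # tmpfs must be large enough for MEMORY_TARGET
```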

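Before setting the shell limits below, the OS groups and the grid and oracle users have to exist on both nodes. A sketch of this step (group names follow the usual 11gR2 role separation; the GIDs/UIDs are examples, not taken from the original post):

```bash
# Oracle inventory, DBA and ASM groups
groupadd -g 501 oinstall
groupadd -g 502 dba
groupadd -g 503 oper
groupadd -g 504 asmadmin
groupadd -g 505 asmdba
groupadd -g 506 asmoper
# Grid Infrastructure owner and database software owner
useradd -u 501 -g oinstall -G asmadmin,asmdba,asmoper,dba grid
useradd -u 502 -g oinstall -G dba,oper,asmdba oracle
passwd grid
passwd oracle
```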
```
[root@rac1 ~]
oracle soft nproc 2047
oracle hard nproc 16384
oracle soft nofile 1024
oracle hard nofile 65536
oracle soft stack 10240
grid soft nproc 2047
grid hard nproc 16384
grid soft nofile 1024
grid hard nofile 65536
grid soft stack 10240
[root@rac1 ~]
[root@rac1 ~]
session    required    pam_limits.so
[root@rac1 ~]
[root@rac1 ~]
fs.aio-max-nr = 1048576
fs.file-max = 6815744
kernel.shmall = 2097152
kernel.shmmax = 4294967295
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
net.ipv4.ip_local_port_range = 9000 65500
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
[root@rac1 ~]
```
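
The three snippets above are the shell limits appended to /etc/security/limits.conf, the pam_limits line added to /etc/pam.d/login, and the kernel parameters appended to /etc/sysctl.conf (the standard locations for these settings). After editing /etc/sysctl.conf the parameters can be loaded without a reboot:

```bash
sysctl -p              # apply the values from /etc/sysctl.conf
sysctl kernel.shmmax   # spot-check one of them
```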
 
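The directories referenced by the grid and oracle profiles below also have to be created and owned correctly. A sketch (the paths match the .bash_profile entries that follow; the ownership and permissions are the usual ones, not taken from the original post):

```bash
mkdir -p /u01/app/grid /u01/app/11.2.0/grid      # grid ORACLE_BASE and ORACLE_HOME
mkdir -p /u01/app/oracle                          # oracle ORACLE_BASE
chown -R grid:oinstall /u01
chown -R oracle:oinstall /u01/app/oracle
chmod -R 775 /u01
```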
 
```
[root@rac1 ~]
[oracle@rac1 ~]$ tail -n 8 .bash_profile
export ORACLE_BASE=/u01/app/oracle
export ORACLE_SID=racdb1
export ORACLE_HOME=$ORACLE_BASE/product/11.2.0/db_1
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
export PATH=$ORACLE_HOME/bin:$PATH
export ORACLE_UNQNAME=racdb
umask 022
[oracle@rac1 ~]$ source .bash_profile
[oracle@rac1 ~]$ echo $ORACLE_HOME
/u01/app/oracle/product/11.2.0/db_1
[oracle@rac1 ~]$ echo $ORACLE_BASE
/u01/app/oracle
[oracle@rac1 ~]$
[grid@rac1 ~]$ tail -n 7 .bash_profile
export ORACLE_BASE=/u01/app/grid
export ORACLE_SID=+ASM1
export ORACLE_HOME=/u01/app/11.2.0/grid
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
export PATH=$ORACLE_HOME/bin:$PATH
umask 022
[grid@rac1 ~]$ source .bash_profile
[grid@rac1 ~]$ echo $ORACLE_BASE
/u01/app/grid
[grid@rac1 ~]$ echo $ORACLE_HOME
/u01/app/11.2.0/grid
[grid@rac1 ~]$
[grid@rac2 ~]$ grep SID .bash_profile
export ORACLE_SID=+ASM2
[grid@rac2 ~]$
[oracle@rac2 ~]$ grep SID .bash_profile
export ORACLE_SID=racdb2
[oracle@rac2 ~]$
```
Repeat the same steps on the second node.
 
```
[grid@rac1 sshsetup]$ pwd
/software/grid/sshsetup
[grid@rac1 sshsetup]$ ./sshUserSetup.sh -user grid -hosts rac2.example.com -advanced -exverify -confirm -noPromptPassphrase
[grid@rac1 sshsetup]$
```
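
A quick way to confirm the user equivalence before letting the installer test it (a sketch; neither command should prompt for a password):

```bash
# Run as grid on rac1
ssh rac2 date
ssh rac2.example.com date
```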
 

  
Check whether the situation shown below occurs; if it does, the SSH connectivity test will fail with error INS-06006.



  
Install cvuqdisk so that the shared disks can be discovered (install it on both rac1 and rac2)
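The usual way to install the package shipped with the Grid media is (a sketch; CVUQDISK_GRP set to oinstall is an assumption):

```bash
export CVUQDISK_GRP=oinstall
cd /software
rpm -ivh cvuqdisk-1.0.9-1.rpm
rpm -qa | grep cvuqdisk
```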
Bind the ASM disks with udev
```
declare -i num=0
for i in b c d; do
  let num=$num+1
  echo "KERNEL==\"sd*\", BUS==\"scsi\", PROGRAM==\"/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/\$name\", RESULT==\"`/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/sd$i`\", NAME=\"asm-disk$num\", OWNER=\"grid\", GROUP=\"asmadmin\", MODE=\"0660\"" >> /etc/udev/rules.d/12-oracle-asmdevices.rules
done
```
 
```
[root@rac1 ~]
Starting udev:                                             [  OK  ]
[root@rac1 ~]
```
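
The output above comes from reloading udev; a sketch of the step and a follow-up check:

```bash
start_udev                 # reload the rules on CentOS/RHEL 6
ls -l /dev/asm-disk*       # the devices should be owned by grid:asmadmin
```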
 
```
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="1IET_00010001", NAME="asm-disk1", OWNER="grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="1IET_00020002", NAME="asm-disk2", OWNER="grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="1IET_00030003", NAME="asm-disk3", OWNER="grid", GROUP="asmadmin", MODE="0660"
```
 

After the disks have been bound this way, Linux no longer exposes them as /dev/sdb, /dev/sdc and /dev/sdd. After binding the disks on rac1, run the same script on rac2 as well, making sure the PROGRAM/RESULT values match those of node 1.
Note: if the group of the devices bound by udev is not asmadmin, change it manually, otherwise dbca will report an error when creating the database, as shown below:

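A sketch of the manual fix (device names come from the udev rules above):

```bash
chown grid:asmadmin /dev/asm-disk1 /dev/asm-disk2 /dev/asm-disk3
ls -l /dev/asm-disk*
```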
After the change:
  
Install Grid Infrastructure
Log in as the grid user
```
[grid@rac1 grid]$ export DISPLAY=192.168.100.251:0.0
[grid@rac1 grid]$ ./runInstaller
```



  
racscan must be resolvable in /etc/hosts, otherwise the installer reports:
```
[INS-40718] Single Client Access Name (SCAN):RACSCAN1 could not be resolved.
```
Note that racscan is not an address that actually exists on the network at this point.
  
Both Test and Setup pass.
  
```
[INS-40912] Virtual host name: rac1-vip is assigned to another system on the network.
```
 
Reference: https://community.oracle.com/thread/2594182
If the error above appears, check the Virtual IP entries in /etc/hosts and ping them; the VIP addresses must not already be in use by another machine on the network.
  





  
In the grid user's environment, $ORACLE_HOME must not be a subdirectory of $ORACLE_BASE, otherwise the installer reports:
```
ORACLE 11G RAC [INS-32026] The Software Location specified should not
```

  
Because udev binding is used here, this warning can be ignored.

  
```
[root@rac1 ~]
Changing permissions of /u01/app/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.
Changing groupname of /u01/app/oraInventory to oinstall.
The execution of the script is complete.
[root@rac1 ~]
[root@rac1 ~]
```
 
If running the script above produces the error below:

Workaround:
Run /u01/app/11.2.0/grid/root.sh again and it completes without problems.
The execution went as follows:
Execution log of /u01/app/11.2.0/grid/root.sh on rac1
Run both of the scripts above (/u01/app/oraInventory/orainstRoot.sh and /u01/app/11.2.0/grid/root.sh) on rac1 and on rac2.
If the following appears during the "Install Product" phase, it can be ignored:
  
If this error appears but the racscan address (192.168.100.246) can be pinged, it can be ignored.


  
```
[grid@rac1 grid]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[grid@rac1 grid]$ olsnodes -n
rac1	1
rac2	2
[grid@rac1 grid]$ crsctl check ctss
CRS-4701: The Cluster Time Synchronization Service is in Active mode.
CRS-4702: Offset (in msec): 0
[grid@rac1 grid]$ srvctl status asm -a
ASM is running on rac2,rac1
ASM is enabled.
[grid@rac1 grid]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
  Version                  :          3
  Total space (kbytes)     :     262120
  Used space (kbytes)      :       2592
  Available space (kbytes) :     259528
  ID                       : 1975731354
  Device/File Name         :       +CRS
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
  Cluster registry integrity check succeeded
  Logical corruption check bypassed due to non-privileged user
[grid@rac1 grid]$ crsctl query css votedisk
--  -----    -----------------                --------- ---------
 1. ONLINE   65b9ef5913044fdbbff0d1b75e91172e (/dev/asm-disk1) [CRS]
Located 1 voting disk(s).
[grid@rac1 grid]$
```
 
Create the ASM disk groups
Log in as the grid user
```
[grid@rac1 ~]$ export DISPLAY=192.168.100.251:0.0
[grid@rac1 ~]$ xhost +
access control disabled, clients can connect from any host
[grid@rac1 ~]$ asmca
```


  
Install the database software
Log in as the oracle user
```
[oracle@rac1 ~]$ export DISPLAY=192.168.100.251:0.0
[oracle@rac1 ~]$ xhost +
access control disabled, clients can connect from any host
[oracle@rac1 ~]$ cd /software/database
[oracle@rac1 database]$ ./runInstaller
```


  
Set up SSH user equivalence for the oracle user (in the same way as for the grid user earlier)
```
[oracle@rac1 sshsetup]$ ./sshUserSetup.sh -user oracle -hosts rac2.example.com -advanced -exverify -confirm -noPromptPassphrase
```






  
Run the script below on node rac1 and node rac2
```
[root@rac1 ~]
Performing root user operation for Oracle 11g

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /u01/app/oracle/product/11.2.0/db_1

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Finished product-specific root actions.
[root@rac1 ~]
```

```
[root@rac1 ~]
[grid@rac1 ~]$ crs_stat -t -v
Name           Type           R/RA   F/FT   Target    State     Host
----------------------------------------------------------------------
ora.CRS.dg     ora....up.type 0/5    0/     ONLINE    ONLINE    rac1
ora.FRA.dg     ora....up.type 0/5    0/     ONLINE    ONLINE    rac1
ora....ER.lsnr ora....er.type 0/5    0/     ONLINE    ONLINE    rac1
ora....N1.lsnr ora....er.type 0/5    0/0    ONLINE    ONLINE    rac1
ora....DATA.dg ora....up.type 0/5    0/     ONLINE    ONLINE    rac1
ora.asm        ora.asm.type   0/5    0/     ONLINE    ONLINE    rac1
ora.cvu        ora.cvu.type   0/5    0/0    ONLINE    ONLINE    rac1
ora.gsd        ora.gsd.type   0/5    0/     OFFLINE   OFFLINE
ora....network ora....rk.type 0/5    0/     ONLINE    ONLINE    rac1
ora.oc4j       ora.oc4j.type  0/1    0/2    ONLINE    ONLINE    rac1
ora.ons        ora.ons.type   0/3    0/     ONLINE    ONLINE    rac1
ora....SM1.asm application    0/5    0/0    ONLINE    ONLINE    rac1
ora....C1.lsnr application    0/5    0/0    ONLINE    ONLINE    rac1
ora.rac1.gsd   application    0/5    0/0    OFFLINE   OFFLINE
ora.rac1.ons   application    0/3    0/0    ONLINE    ONLINE    rac1
ora.rac1.vip   ora....t1.type 0/0    0/0    ONLINE    ONLINE    rac1
ora....SM2.asm application    0/5    0/0    ONLINE    ONLINE    rac2
ora....C2.lsnr application    0/5    0/0    ONLINE    ONLINE    rac2
ora.rac2.gsd   application    0/5    0/0    OFFLINE   OFFLINE
ora.rac2.ons   application    0/3    0/0    ONLINE    ONLINE    rac2
ora.rac2.vip   ora....t1.type 0/0    0/0    ONLINE    ONLINE    rac2
ora.scan1.vip  ora....ip.type 0/0    0/0    ONLINE    ONLINE    rac1
[grid@rac1 ~]$
```
Create the database
Create the database (log in as the oracle user):
```
[oracle@rac1 ~]$ export DISPLAY=192.168.100.251:0.0
[oracle@rac1 ~]$ xhost +
access control disabled, clients can connect from any host
[oracle@rac1 ~]$ dbca
```








  
If the SGA and PGA memory settings are too small, the following error appears while the database is being created:
```
ORA-00838: Specified value of MEMORY_TARGET is too small, needs to be at least 1408M
ORA-01078: failure in processing system parameters
```
Reference: http://yfshare.blog.51cto.com/8611708/1671927
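
A quick sanity check before retrying dbca (a sketch; on Linux, MEMORY_TARGET also cannot exceed the size of /dev/shm):

```bash
free -m            # physical memory and swap
df -h /dev/shm     # must be at least as large as MEMORY_TARGET
```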







  
If the error above appears at around 85% of the installation, the Fast Recovery Area has run out of space.

Workaround: delete unneeded archived log files, or set a larger db_recovery_file_dest_size.
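A sketch of enlarging the recovery area (the 20G value is only an example):

```bash
# Run as the oracle user on one of the nodes
sqlplus / as sysdba <<'EOF'
SHOW PARAMETER db_recovery_file_dest_size
ALTER SYSTEM SET db_recovery_file_dest_size=20G SCOPE=BOTH SID='*';
EOF
```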
  
Check the cluster status
```
srvctl status database -d racdb
crs_stat -t -v
```


  
```
[oracle@rac1 ~]$ lsnrctl status

LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 11-JUL-2015 01:07:17

Copyright (c) 1991, 2013, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.4.0 - Production
Start Date                10-JUL-2015 15:38:11
Uptime                    0 days 9 hr. 29 min. 6 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/rac1/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.244)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.241)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "racdb" has 1 instance(s).
  Instance "racdb1", status READY, has 1 handler(s) for this service...
Service "racdbXDB" has 1 instance(s).
  Instance "racdb1", status READY, has 1 handler(s) for this service...
The command completed successfully
[oracle@rac1 ~]$
[oracle@rac1 ~]$ sqlplus / as sysdba
SQL> select status from v$instance;

STATUS
------------
OPEN

SQL>
```
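
lsnrctl and v$instance above only show the local node; to confirm that both instances are open, the cluster-wide view can be queried (a small sketch):

```bash
# Run as the oracle user on either node
sqlplus -s / as sysdba <<'EOF'
SELECT inst_id, instance_name, host_name, status FROM gv$instance ORDER BY inst_id;
EOF
```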
 
Configure the EM console
If the following problem appears when starting the EM web console, rebuild EM with emca -config dbcontrol db.

```
[oracle@rac1 ~]$ emca -deconfig dbcontrol db -cluster
```
Output of deconfiguring the cluster EM
emctl commands
Rebuild EM
Log in as the oracle user
```
[oracle@rac1 ~]$ export DISPLAY=192.168.100.251:0.0
[oracle@rac1 ~]$ xhost +
access control disabled, clients can connect from any host
[oracle@rac1 ~]$ dbca
```










Note:
```
[oracle@rac1 ~]$ emctl status dbconsole
Environment variable ORACLE_UNQNAME not defined. Please set ORACLE_UNQNAME to database unique name.
[oracle@rac1 ~]$
```
If the error above appears, the oracle user's ORACLE_UNQNAME environment variable is not set. If the error below appears instead, ORACLE_UNQNAME is set to the wrong value.

Correct setting:
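The correct setting corresponds to the ORACLE_UNQNAME line already present in the oracle user's .bash_profile shown earlier, for example:

```bash
export ORACLE_UNQNAME=racdb
emctl status dbconsole
```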


By default the Oracle RAC cluster starts automatically at boot; how long that takes depends on the configuration.
Configure PL/SQL Developer
Use PL/SQL Developer to log in to the Oracle RAC database
Install PL/SQL Developer plus the Oracle instant client (ora10client)
plsql + instantclient_11_2 installation package
C:\Ora10InstantClient\network\admin\tnsnames.ora (create a network/admin directory under the ora10client installation directory, and copy $ORACLE_HOME/network/admin/tnsnames.ora from the RAC server into that admin directory)
```
[root@rac1 ~]
192.168.100.246	racscan
[root@rac1 ~]
```
Format: change HOST = racscan to the address that racscan resolves to.
```
RACDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.100.246)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = racdb)
    )
  )
```
Open PL/SQL Developer -> Tools -> Preferences -> Connection


  
This article comes from the "Jack Wang Blog": http://www.yfshare.vip/2017/04/12/部署Oracle-RAC集群/