Building the simplest HACMP environment on 43P-140s (Part 2)
To build an OPS environment on the two 140s, we first need a working concurrent environment. Still using the hardware described earlier, configure HACMP as follows; the HACMP software version is HACMP ESCRM 4.4.1.
1. Service adapters
For a concurrent environment, the traditional approach uses three resource groups: two cascading RGs, each containing only a service IP address, and one concurrent RG containing the concurrent VG. In fact, Oracle's current HACMP configuration recommendations for OPS and RAC are much simpler: set the adapter directly to the service address, with no standby or boot adapters at all. So I set the built-in adapters of the two 140s to nodea_svc and nodeb_svc, and defined only those two adapters in the HACMP topology.
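As a sketch of that setup (the interface name, addresses, and netmask below are assumptions for illustration, not values from this exercise), the service addresses could be assigned like this:

```shell
# On nodea: put the service address directly on the built-in adapter
# (assumed to be en0; addresses are hypothetical).
mktcpip -h nodea_svc -a 10.1.1.1 -m 255.255.255.0 -i en0
# On nodeb:
mktcpip -h nodeb_svc -a 10.1.1.2 -m 255.255.255.0 -i en0
# Then define only these two adapters as service adapters in the
# HACMP topology, with no boot or standby entries.
```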
2. Concurrent VG
The manual says that in concurrent mode the concurrent VG must be built on SSA or RAID disks, but here the shared disk is just a single plain SCSI disk. Can that work? With that question in mind, I pressed on. First, create sharevg. For SSA disks the VG can be created as concurrent capable; for RAID disks, concurrent capable cannot be set to yes, and concurrent sharing is instead handled by HACMP itself. Accordingly, sharevg was created without the concurrent capable option.
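A sketch of that VG setup (the disk name hdisk1 and the PP size are assumptions, not from this exercise):

```shell
# On nodea: create sharevg on the shared SCSI disk, without the
# concurrent-capable option; -n keeps it from auto-varying on at boot,
# since HACMP should control activation.
mkvg -n -y sharevg -s 16 hdisk1
varyoffvg sharevg            # release the disk so nodeb can import the VG
# On nodeb: import the same VG definition from the shared disk.
importvg -y sharevg hdisk1
chvg -a n sharevg            # disable auto-varyon on this node as well
varyoffvg sharevg
```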
3. Concurrent RG
With sharevg configured and the two nodes synchronized, I created a concurrent-mode RG in HACMP containing only sharevg. I did not configure an application server; the goal was to validate the concurrent environment first, and the app can always be added after Oracle is installed.
4. The critical part
Now for the critical part of the whole exercise. After the HACMP synchronization completed cleanly on both sides, HA was started on both nodes. Since the adapters were set to service addresses from the start, there is no boot-to-service address swap to observe. Checking with lsvg -o whether the shared VG had been varied on showed that it had not, and hacmp.out contained the following error:
...
cl_raid_vg[97] cl_raid_vg[97] lsdev -Cc disk -l hdisk1 -F type
DEVTYPE=scsd
cl_raid_vg[103] grep -qw scsd /usr/es/sbin/cluster/diag/clconraid.dat
cl_raid_vg[106] THISTYPE=disk
cl_raid_vg[106] [[ -z ]]
cl_raid_vg[116] FIRSTTYPE=disk
cl_raid_vg[123] [[ disk = array ]]
cl_raid_vg[128] exit 1
cl_mode3[166] cl_log 485 cl_mode3: Failed concurrent varyon of sharevg
cl_log[50] version=1.9
cl_log[92] SYSLOG_FILE=/usr/es/adm/cluster.log
*******
Aug 1 2003 17:42:24 !!!!!!!!!! ERROR !!!!!!!!!!
*******
Aug 1 2003 17:42:24 cl_mode3: Failed concurrent varyon of sharevg because it is not made up of known RAID devices.
cl_mode3[168] STATUS=1
cl_mode3[217] exit 1
...
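For reference, the state checks described above can be done with commands along these lines (the hacmp.out path assumes the default location):

```shell
lsvg -o                       # list varied-on volume groups; sharevg is missing here
tail -f /tmp/hacmp.out        # follow event-script output while the cluster starts
grep cl_mode3 /tmp/hacmp.out  # jump to the failed concurrent varyon messages
```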
So the manual wasn't lying. But that is no reason to give up: HACMP is, after all, implemented through scripts and events, so it was time to tamper with the scripts a little.
The HACMP directory .../utils holds many of the runtime scripts; the one relevant to our problem is cl_mode3. The full script follows (posted so everyone can take a look too):
#!/bin/ksh
# IBM_PROLOG_BEGIN_TAG
# This is an automatically generated prolog.
#
#
#
# Licensed Materials - Property of IBM
#
# (C) COPYRIGHT International Business Machines Corp. 1990,2001
# All Rights Reserved
#
# US Government Users Restricted Rights - Use, duplication or
# disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
#
# IBM_PROLOG_END_TAG
# @(#)27 1.9 src/43haes/usr/sbin/cluster/events/utils/cl_mode3.sh, hacmp.events, 43haes_rmo2, rmo2s01b 5/31/01 16:36:46
###################
#
# COMPONENT_NAME: EVENTUTILS
#
# FUNCTIONS: none
#
###################
###################
#
# Name: cl_mode3
#
# Returns:
# 0 - All of the volume groups are successfully varIEd on/changed mode
# 1 - varyonvg/mode change of at least one volume group failed
# 2 - Zero arguments were passed
#
# This function will place the volume groups passed in as arguments in
# the designated mode .
#
# Arguments: -s Varyon volume group in mode 3 with sync
# -n Varyon volume group in mode 3 without sync
#
# Environment: VERBOSE_LOGGING, PATH
#
###################
PROGNAME=$(basename $0)
export PATH="$($(dirname $0)/../../utilities/cl_get_path all)"
[[ "$VERBOSE_LOGGING" = "high" ]] && set -x
[[ "$VERBOSE_LOGGING" = "high" ]] && version="1.9"
HA_DIR="$(cl_get_path)"
if (( $# < 2 )) ; then
# Caller used incorrect syntax
cl_echo 204 "usage: $PROGNAME [-n | -s] volume_groups_to_varyon" $PROGNAME
exit 2
fi
if [[ $1 = "-n" ]] ; then # sync or no sync
SYNCFLAG="-n"
else
SYNCFLAG="" # LVM default is "sync"
fi
if [[ -z $EMULATE ]] ; then
EMULATE="REAL"
fi
STATUS=0
set -u
# Get volume groups, past the sync|nosync flag
shift
for vg in $*
do
VGID=$(/usr/sbin/getlvodm -v $vg)
# Check to see if this volume group is already vary'd on
if lsvg -o | fgrep -s -x "$vg" ;then
# Note this and keep going. This could happen legitimately on a
# node up after a forced down.
# Find out if it's vary'd on in concurrent mode
if [[ 0 = $(lqueryvg -g $VGID -C) ]] ;then
# No, it's not. Now, find out if it's defined as concurrent capable
if [[ 0 = $(lqueryvg -g $VGID -X) ]] ;then
# We get here in the case where the volume group is
# vary'd on, but not in concurrent mode, and is not
# concurrent capable. This would be the case for a SCSI
# RAID disk used in concurrent mode.
if ! cl_raid_vg $vg ;then
# This volume group is not made up of known RAID devices
cl_log 485 "$PROGNAME: Failed concurrent varyon of $vg
because it is not made up of known RAID devices." $PROGNAME $vg
STATUS=1
fi
continue
else
# For some obscure reason, the volume group that
# we want to vary on in concurrent mode is
# already vary'd on, in non-concurrent mode.
cl_echo 200 "$PROGNAME: Volume Group $vg in non-concurrent mode." $PROGNAME $vg
# Try to recover by varying it off, to be vary'd on in
# concurrent mode below.
if [[ $EMULATE = "REAL" ]] ;then
if ! varyoffvg $vg
then
# Unable to vary off the volume group - probably because
# it's in use. Note error and keep going
cl_log 203 "$PROGNAME: Failed varyonvg $SYNCFLAG -c of $vg." $PROGNAME $SYNCFLAG $vg
STATUS=1
continue
fi
else
cl_echo 3020 "NOTICE The following command was not executed "
echo "varyoffvg $vg"
fi
# At this point, the volume group was vary'd off. The
# flow takes over below, and vary's on the volume group
# in concurrent mode.
fi
else
# Since the volume group is already vary'd on in
# concurrent mode, there is really nothing more to do
# with it. Go on to the next one.
continue
fi
fi
# Find out whether LVM thinks this volume group is concurrent
# capable. Note that since the volume group is not vary'd on at this
# point in time, we have to look directly at the VGDA on the
# hdisks in the volume group.
export MODE
for HDISK in $(/usr/sbin/getlvodm -w $VGID | cut -d" " -f2) ;do
# Check each of the hdisks for a valid mode value. Stop at the
# first one we find.
if MODE=$(lqueryvg -p $HDISK -X) ;then
break
fi
done
if [[ -z $MODE ]] ;then
# If we couldn't pull a valid mode indicator off of any disk in
# the volume group, there is no chance whatsoever that LVM
# will be able to vary it on. Give up on this one.
cl_log 203 "$PROGNAME: Failed varyonvg $SYNCFLAG -c of $vg." $PROGNAME $SYNCFLAG $vg
STATUS=1
elif [[ $MODE = "0" ]] ;then
# LVM thinks that this is not a concurrent capable
# volume group. This is the expected result if this is
# a RAID device treated as a concurrent device
# Check to make sure that this is a known RAID device
if cl_raid_vg $vg ;then
# If this is a known RAID device, attempt to vary it on
# with no reserve, to simulate concurrent mode
if ! convaryonvg $vg ;then
# It was not possible to vary on this volume
# group. Note error and keep going.
STATUS=1
fi
else
# This volume group is not made up of known RAID devices
cl_log 485 "$PROGNAME: Failed concurrent varyon of $vg
because it is not made up of known RAID devices." $PROGNAME $vg
STATUS=1
fi
elif [[ $MODE = "32" ]] ;then
# LVM thinks that this volume group is defined as concurrent
# capable, for the group services based concurrent mode
# try to varyon in concurrent with appropriate sync option
if [[ $EMULATE = "REAL" ]] ;then
if ! varyonvg $SYNCFLAG -c $vg ;then
cl_log 203 "$PROGNAME: Failed varyonvg $SYNCFLAG -c of $vg." $PROGNAME $SYNCFLAG $vg
# note error and keep going
STATUS=1
fi
else
cl_echo 3020 "NOTICE The following command was not executed "
echo "varyonvg $SYNCFLAG -c $vg"
fi
else
# Anything else ("1" or "16", depending on the level of LVM)
# indicates that LVM thinks this volume group is
# defined as concurrent capable, for the covert channel based
# concurrent mode.
if cl_raid_vg $vg ;then
# SCSI attached RAID devices are reported as concurrent capable.
# If that is what we have here, try the appropriate varyon
if ! convaryonvg $vg ;then
# It was not possible to vary on this volume
# group. Note error and keep going.
STATUS=1
fi
else
# It's not a concurrent capable RAID device. The only remaining
# supported choice is covert channel based concurrent mode.
if [[ $EMULATE = "REAL" ]] ;then
if ! varyonvg $SYNCFLAG -c $vg ;then
cl_log 203 "$PROGNAME: Failed varyonvg $SYNCFLAG -c of $vg." $PROGNAME $SYNCFLAG $vg
# note error and keep going
STATUS=1
fi
else
cl_echo 3020 "NOTICE The following command was not executed "
echo "varyonvg $SYNCFLAG -c $vg"
fi
fi
fi
done
exit $STATUS
A read through the script makes the crux clear: the shared disk is not recognized by this script as a RAID device, so it returns 1. So let's make a simple change at the very end of the script:
# add for 140 ha escrm
STATUS=0
exit $STATUS
The hope is that this fools HA. Note that the same script must be modified on both nodes.
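To see why appending those two lines works, here is a small portable sketch (simulate_cl_mode3 is a hypothetical stand-in, not the real AIX script): it mimics the script's exit path after the RAID check has failed, and shows how resetting STATUS right before the final exit masks the failure.

```shell
# Hypothetical stand-in for cl_mode3's exit path (not the AIX script itself).
simulate_cl_mode3() {
  STATUS=1      # the failed cl_raid_vg check set STATUS=1, as seen in hacmp.out
  # --- the two lines appended at the end of the script ---
  STATUS=0
  exit $STATUS
}
# Run in a subshell so 'exit' terminates the simulation, not this shell:
( simulate_cl_mode3 )
echo "cl_mode3 exit status: $?"   # prints "cl_mode3 exit status: 0"
```

HA's event processing only sees the script's exit status, which is why forcing it to 0 is enough to let cluster startup continue.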
5. Restart HA
Very gratifying: HA started successfully, and lsvg -l sharevg shows the same contents on both nodes.