TWSP
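TSRI:RANK=SMALL;
f. If the CPT connection cannot be established, reboot the active side to force a switchover, then try the CPT connection again.

4.2 One node down

Powering off is the last resort and is normally not done.
a. Telnet to the node.
b. Connect to the node through the local console and follow "CLUSTER STOP" (4.5) below.
c. If both of the above fail, log in to the active node and run fcc_reset other.

4.3 Charging failure caused by an AP2 problem

In R10 the GOH & RTR method is used.

Check CDH and the destination set on the AP:
    cdhdsls -l BILLDESTSET
    cdhls -l billingdest    !check destination!
    cdhls -p billingdest    !path list!

Use cdhver to verify that the connection to the destination is working:
    cdhver billingdest

Check the interface between the CP and the AP:
    <APAMP;
    AP MAINTENANCE DATA
    DIRECTORY ADDRESS DATA
    AP  NODE  LAN  IP             PORT   STATUS   CATEGORY
    1   A     1    192.168.169.1  14000  PASSIVE
    1   A     2    192.168.170.1  14000  ACTIVE
    1   B     2    192.168.170.2  14000  ACTIVE
    1   B     1    192.168.169.2  14000  PASSIVE
    2   A     1    192.168.169.3  14000  ACTIVE
    2   A     2    192.168.170.3  14000  PASSIVE
    2   B     2    192.168.170.4  14000  ACTIVE
    2   B     1    192.168.169.4  14000  PASSIVE

Confirm that the CP-AP state is OPEN:
    CHOPP;
    CHOIP;

Check the SAE on the CP:
    SAAEP:SAE=500,BLOCK=CHOP;

Check the message store on AP2:
    msdls -m CHS -s cp0ex -a

Check the subfiles:
    Y:\ACS\data\RTR\CHS_cp0ex\dataFiles\Reported>

Check the AFP function:
    afpls -ls rtrfiles

Transfer manually:
    afpfti -f rtrfiles

4.4 Process death

Use cluster res to check whether the process is ONLINE, OFFLINE, or FAILURE. There are two cases.

On one side only:
a. Check whether the dead or stopped process is on the active or the passive side. If it is on the active side, switch sides (prcboot) so that the faulty node becomes passive.
b. Start the process manually:
    cluster res <process name> /on /wait

On both sides:
a. Restart the current passive side to try to bring the dead process up.
b. Log in again to the restarted node. If the cluster is healthy, check whether the process is online; if it is still offline, try to start it manually.
c. If it is online, restart the executive (active) side to trigger a failover, so that the former passive side becomes active. Then follow the "on one side only" procedure above.

4.5 Cluster stop

This is similar to a node down.
a. How to tell that the cluster is down: prcstate shows the state as UNDEFINED.
b. Check the cluster services: use net start to check that the services have started correctly, first clussvc, then ACS_PRC_LBB.
c. If these services are already running, try prcboot again.
d. If the services are not running, try to start them manually:
    net start "Cluster Server"
    net start "ACS_PRC_LBB"
e. If they start normally, the cluster should come up shortly afterwards.
f. If they cannot be started, reboot the node again (prcboot). If that still does not help, call an Ericsson engineer.

4.6 RAID handling

The most likely alarm is "MIRROR NOT REDUNDANT".

a. How to check: raidutil -L logical shows the RAID status.

In the normal case, on the executive side, we should see:
    Logical View
    Address    Type               Manufacturer/Model   Capacity  Status
    ---------------------------------------------------------------------------
    d0b0t0d0   RAID 1 (Mirrored)  DPT RAID-1           17522MB   Optimal
    d0b0t0d0   Disk Drive (DASD)  FUJITSU MAP3367NP    17522MB   Optimal
    d0b1t0d0   Disk Drive (DASD)  FUJITSU MAP3367NP    17522MB   Optimal
    d0b0t1d0   RAID 1 (Mirrored)  DPT RAID-1           17522MB   Optimal
    d0b0t1d0   Disk Drive (DASD)  FUJITSU MAP3367NP    17522MB   Optimal
    d0b1t1d0   Disk Drive (DASD)  FUJITSU MAP3367NP    17522MB   Optimal
    d0b0t2d0   RAID 1 (Mirrored)  DPT RAID-1           17522MB   Optimal
    d0b0t2d0   Disk Drive (DASD)  FUJITSU MAP3367NP    17522MB   Optimal
    d0b1t2d0   Disk Drive (DASD)  FUJITSU MAP3367NP    17522MB   Optimal

On the passive side, we should see:
    Logical View
    Address    Type               Manufacturer/Model   Capacity  Status
    ---------------------------------------------------------------------------
    d0b0t0d0   RAID 1 (Mirrored)  DPT RAID-1           17522MB   Drive Failed
    d0b0t0d0   Disk Drive (DASD)  FUJITSU MAP3367NP    0MB       Optimal
    d0b1t0d0   Disk Drive (DASD)  FUJITSU MAP3367NP    0MB       Optimal
    d0b0t1d0   RAID 1 (Mirrored)  DPT RAID-1           17522MB   Drive Failed
    d0b0t1d0   Disk Drive (DASD)  FUJITSU MAP3367NP    0MB       Optimal
    d0b1t1d0   Disk Drive (DASD)  FUJITSU MAP3367NP    0MB       Optimal
    d0b0t2d0   RAID 1 (Mirrored)  DPT RAID-1           17522MB   Drive Failed
    d0b0t2d0   Disk Drive (DASD)  FUJITSU MAP3367NP    0MB       Optimal
    d0b1t2d0   Disk Drive (DASD)  FUJITSU MAP3367NP    0MB       Optimal

If the output does not match the above, consider repairing the RAID.

b. For a disk in the degraded state, the output looks like this:
    Logical View
    Address    Type               Manufacturer/Model   Capacity  Status
    ---------------------------------------------------------------------------
    d0b0t0d0   RAID 1 (Mirrored)  DPT RAID-1           17522MB   Degraded
    d0b1t0d0   Disk Drive (DASD)  FUJITSU MAN3184MP    17522MB   Optimal
    d0b0t0d0   Disk Drive (DASD)  FUJITSU MAN3184MP    17522MB   Failed drive
    d0b0t1d0   RAID 1 (Mirrored)  DPT RAID-1           17522MB   Degraded
    d0b1t1d0   Disk Drive (DASD)  FUJITSU MAN3184MP    17522MB   Optimal
    d0b0t1d0   Disk Drive (DASD)  FUJITSU MAN3184MP    17522MB   Failed drive
    d0b0t2d0   RAID 1 (Mirrored)  DPT RAID-1           17522MB   Degraded
    d0b1t2d0   Disk Drive (DASD)  FUJITSU MAN3184MP    17522MB   Optimal
    d0b0t2d0   Disk Drive (DASD)  FUJITSU MAN3184MP    17522MB   Failed drive

On the executive side, run the following command, following the OPI "AP Fault":
    raidutil -a rebuild d#b#t#d#

c. An alternative is to use DPT Storage Manager.
To open it: Start -> Programs -> DPT Storage Manager -> DPT Storage Manager v2.17
Select Logical Configuration View.
Double-click the RAID-1 icon; the status field should show degraded.
Click the REBUILD icon to start rebuilding the RAID.
The disk shows a white flag while the rebuild is in progress. When the rebuild completes, the white flag disappears and the array status becomes OPTIMAL.

4.7 Powering off one side

In some situations (for example a RAID problem, OS NOT FOUND (blue screen), or node down) one side may need to be powered off. Powering off both sides by the customer is not considered here. The decision should normally be made under the guidance of an Ericsson engineer; this section only describes the basic procedure.
a. If the side to be powered off is the active side, switch over so that it becomes passive. If it is already passive, skip this step.
b. Check the RAID status and confirm that the RAID on the active side is OPTIMAL.
c. On the active side, run fcc_save_to_remove other. Check that the command has executed and that the other side has shut down.

After the command has succeeded, wait a few minutes and check whether the MIA lamp on the side to be powered off is lit. If it is lit, the side can be powered off.

After power has been restored, fcc_integrate other must also be run on the active node to restore the disk mirroring.

4.8 Handling high CPU load on the APG

When some operations become noticeably slow, a process (or resource) may be using a large share of the CPU. First locate the process:

1. PSTAT: look for a process that has accumulated an unusually long time, such as CMD.EXE. For example:
    User Time    Kernel Time  Ws    Faults     Commit  Pri  Hnd  Thd  Pid  Name
    0:00:03.635  0:00:03.404  3908  1153       2440    8    252  5    861  mml.exe
    0:00:00.020  0:00:00.060  3348  847        2216    8    199  1    525  aploc.exe
    2:50:04.573  5:33:00.460  1512  334950256  516     8    181  1    854  CMD.EXE
Clearly CMD.EXE has accumulated far more time than the others.

2. Through PCANYWHERE, enter NT and run TASKMGR directly. The CPU column shows the percentage used by each process. Normally the CPU idle process takes about 50; if there is no idle process, the highest entry in this column is the hung process. This method is direct and is the recommended one.

Handle the process as follows:

1. For AP-related processes such as MCS_MTS_ADM (listed by cluster res), restart the process with
    cluster res mcs_mts_adm /off /wait
followed by
    cluster res mcs_mts_adm /on /wait

2. For processes not related to the AP, such as cmd.exe, ghost32.exe, or explore.exe, Ericsson does not recommend killing the process with cohen (similar to kill on UNIX). If the passive node has no faults, switching nodes with prcboot is the more appropriate action.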
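
The PSTAT check in step 1 of 4.8 (finding the process with an unusually long accumulated time) can be sketched as a small parser. This is only an illustration written in Python, not an APG tool. The function names parse_time, parse_pstat and busiest are hypothetical, and the code assumes the ten-column layout shown in the PSTAT example above, with the sample rows copied from it. Accumulated time is not the same as the current CPU percentage, so the result is only a hint about which process to inspect in TASKMGR.

```python
import re

# H:MM:SS.mmm, as in the "User Time" and "Kernel Time" columns of the sample.
TIME_RE = re.compile(r"^(\d+):(\d{2}):(\d{2})\.(\d{3})$")

# Sample rows copied from the PSTAT example in section 4.8.
SAMPLE = """\
User Time Kernel Time Ws Faults Commit Pri Hnd Thd Pid Name
0:00:03.635 0:00:03.404 3908 1153 2440 8 252 5 861 mml.exe
0:00:00.020 0:00:00.060 3348 847 2216 8 199 1 525 aploc.exe
2:50:04.573 5:33:00.460 1512 334950256 516 8 181 1 854 CMD.EXE
"""


def parse_time(text):
    """Convert an H:MM:SS.mmm field to seconds (hypothetical helper)."""
    match = TIME_RE.match(text)
    if match is None:
        raise ValueError("not a PSTAT time field: %r" % text)
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000.0


def parse_pstat(output):
    """Return one dict per process row; header and malformed lines are skipped.

    Assumed column order: user time, kernel time, Ws, Faults, Commit,
    Pri, Hnd, Thd, Pid, Name (the name may contain spaces).
    """
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue
        try:
            user = parse_time(fields[0])
            kernel = parse_time(fields[1])
            pid = int(fields[8])
        except ValueError:
            continue  # header line or a row in an unexpected format
        rows.append({
            "name": " ".join(fields[9:]),
            "pid": pid,
            "cpu_seconds": user + kernel,
        })
    return rows


def busiest(rows):
    """Return the row with the most accumulated CPU time, or None if empty."""
    return max(rows, key=lambda row: row["cpu_seconds"], default=None)


if __name__ == "__main__":
    top = busiest(parse_pstat(SAMPLE))
    if top is not None:
        print("%s (pid %d): %.3f s accumulated" %
              (top["name"], top["pid"], top["cpu_seconds"]))
```

Summing user and kernel time is a design choice for this sketch; a process that spends most of its time in kernel mode, like CMD.EXE in the sample, would be easy to miss if only the first column were compared.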