自动化运维、大数据、Docker

MegaCli操作指引

MegaCli 是LSI公司官方提供的SCSI卡管理工具,由于LSI被收购变成了现在的Broadcom,所以现在想下载MegaCli,需要去Broadcom官网查找Legacy产品支持,搜索MegaRAID即可。

现在官方有storcli,整合了LSI和3ware所有产品。但是个人认为Megacli用起来更顺手,而且线上用了几家国产厂商服务器,用Megacli都能管理好RAID,所以换不换无所谓。

查看Adapter 信息:

./MegaCli64 -AdpAllInfo -aALL
返回结果太长很多都看不懂但没关系,新手先记住第一行,表示我的机器上有个0号适配器。MegaCli64很多命令都要在最后用-a指定Adapter,我只有Adapter #0 所以今后都写-a0就行,还可以-a0,1,2或-aALL

Adapter #0
==============================================================================
                    Versions
                ================
Product Name    : PERC H710 Adapter
Serial No       : 31P003R
FW Package Build: 21.1.0-0007
                    Mfg. Data
                ================
Mfg. Date       : 01/26/13
Rework Date     : 01/26/13
Revision No     : A00
Battery FRU     : N/A
...

查看Adapter的具体配置,这台机器插了12块盘,一块做RAID0装系统,剩下的盘做了RAID5:

./MegaCli64 -CfgDsply -aALL
==============================================================================
Adapter: 0
Product Name: PERC H710 Adapter
Memory: 512MB
BBU: Present
Serial No: 31P003R
==============================================================================
Number of DISK GROUPS: 2 #有俩磁盘组
DISK GROUPS: 0 #0号磁盘组
Number of Spans: 1
SPAN: 0
Span Reference: 0x00
Number of PDs: 1
Number of VDs: 1
Number of dedicated Hotspares: 0
Virtual Disk Information:
Virtual Disk: 0 (Target Id: 0)
Name:
RAID Level: Primary-0, Secondary-0, RAID Level Qualifier-0 #做了RAID0
Size:2.728 TB
State: Optimal
Stripe Size: 64 KB
Number Of Drives:1
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
Encryption Type: None
Physical Disk Information:
Physical Disk: 0
Enclosure Device ID: 32
Slot Number: 0
Device Id: 0
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors]
Coerced Size: 2.728 TB [0x15d400000 Sectors]
Firmware state: Online
SAS Address(0): 0x500056b37789abee
Connected Port Number: 0(path0) 
Inquiry Data:             手动马赛克 #这里是序列号
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None 
Device Speed: 3.0Gb/s 
Link Speed: 3.0Gb/s 
Media Type: Hard Disk Device

DISK GROUPS: 1 #1号磁盘组
Number of Spans: 1
SPAN: 0
Span Reference: 0x01
Number of PDs: 11 #11块物理盘
Number of VDs: 1 #做成了1块虚拟盘
Number of dedicated Hotspares: 0
Virtual Disk Information:
Virtual Disk: 0 (Target Id: 1)
Name:
RAID Level: Primary-5, Secondary-0, RAID Level Qualifier-3 #做了RAID5
Size:27.285 TB
State: Optimal
Stripe Size: 64 KB
Number Of Drives:11
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
Encryption Type: None
Physical Disk Information:
Physical Disk: 0 #第一块物理盘
Enclosure Device ID: 32
Slot Number: 1
Device Id: 1
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors]
Coerced Size: 2.728 TB [0x15d400000 Sectors]
Firmware state: Online
SAS Address(0): 0x500056b37789abec
Connected Port Number: 0(path0) 
Inquiry Data:             手动马赛克 #这里是磁盘的序列号,跟磁盘标签一致
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None 
Device Speed: 3.0Gb/s 
Link Speed: 3.0Gb/s 
Media Type: Hard Disk Device
...

查看每块物理盘的信息和状态,跟前面一样,只是少了Adapter信息。

./MegaCli64 -PDList -a0

Adapter #0
Enclosure Device ID: 32
Slot Number: 0
Device Id: 0
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors]
Coerced Size: 2.728 TB [0x15d400000 Sectors]
Firmware state: Online
SAS Address(0): 0x500056b37789abee
Connected Port Number: 0(path0) 
Inquiry Data:             手动马赛克 #这里是磁盘的序列号,跟磁盘标签一致
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None 
Device Speed: 3.0Gb/s 
Link Speed: 3.0Gb/s 
Media Type: Hard Disk Device
Enclosure Device ID: 32
Slot Number: 1
Device Id: 1
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors]
Coerced Size: 2.728 TB [0x15d400000 Sectors]
Firmware state: Online
SAS Address(0): 0x500056b37789abec
Connected Port Number: 0(path0) 
Inquiry Data:             手动马赛克 #这里是磁盘的序列号,跟磁盘标签一致
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None 
Device Speed: 3.0Gb/s 
Link Speed: 3.0Gb/s 
Media Type: Hard Disk Device
...

这里会拿到很多有用的信息:

1、Slot Number:slot号,应该跟机器外观上的标识一致。如果机器上有多块盘,直接告诉现场工程师slot X的硬盘有问题,工程师就会直接换盘。

2、Inquiry Data: 这里是磁盘的序列号,跟磁盘标签上一致。磁盘标签需要拔盘才能看到,按slot拔盘看到磁盘的序列号应该跟Inquiry Data一致。

3、Firmware state: 这里能看到磁盘的状态,Online是我们期望看到的最好状态,除此之外还有 Unconfigured Offline Failed等等,大多表达一个悲伤的事实:你要加班报修/修复他们了。。。

4、需要特别关注这几个指标:Media Error / Other Error / Predictive Failure Count / Last Predictive Failure Event Seq Number 都有可能不是0。这意味着磁盘虽然能用但已经不再可靠,很有可能存在坏簇、坏道之类的问题,必须尽快换掉这块盘。如果坚持使用,那磁盘就离彻底坏掉不远了。网上流传的说法是前3个Count越大代表磁盘状态越

这个问题专门与服务器RAID卡磁盘厂家沟通,得到的反馈是: 查到之前的资料,Medium error、other error数值的绝对值,不能直接反应硬盘的状态。 根据与RAID卡、硬盘厂家的沟通,建议做法是监控Predictive Failure 的数值,不为零说明硬盘有问题。另外,如果硬盘failed,也可以直接报修。 Predictive Failure Count 指令:storcli /c0/eall/sall show all 监控关键字Predictive Failure Count,标准为不能大于0,若有计数,将对应的硬盘换掉; Predictive Failure中已经涵盖media error,而且比media error的范围更广、更全面。 硬盘的 SMART 子系统已经具备一套完整的算法来评估硬盘的健康状况 SMART 子系统算法会参考硬盘运行时各个方面的参数,media error 是其中一项 SMART 对于 media error 的评估是基于单位时间增长数来计算的 当 SMART 子系统中任何一个评估项达到对应的阈值时,硬盘会报告 Sense Code: 01 5D 00 (FAILURE PREDICTION THRESHOLD EXCEEDED) 遵循 SCSI 协议标准的 host (OS SCSI 子系统,SAS 控制器, RAID 卡等) 可以正确解析出该 Sense Code 综上,由于 media error 已经被硬盘 SMART 子系统所涵盖,并且会依据 SCSI 协议标准上报 predictive failure,所有硬盘部分只需要在Raid卡下监控Predictive Failure就好,标准为不能大于0。

赞(0) 打赏
蜷缩的蜗牛 , 版权所有丨如未注明 , 均为原创丨 转载请注明蜷缩的蜗牛 » MegaCli操作指引
分享到: 更多 (0)

评论 抢沙发

评论前必须登录!

 

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

微信扫一扫打赏