MegaCli Operation Guide

Linux · 蜷缩的蜗牛 · 3 months ago (07-20)

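MegaCli is LSI's official management tool for its MegaRAID RAID controllers. LSI was acquired by Avago, which later took the Broadcom name, so to download MegaCli today you need to go to the Broadcom website, look under Legacy product support, and search for MegaRAID.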

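The vendor now offers storcli, which consolidates support for all LSI and 3ware products. Personally I still find MegaCli more comfortable to use, and on the servers from several domestic (Chinese) vendors that we run in production it has managed RAID without trouble, so whether to switch makes little difference.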

View adapter information:

./MegaCli64 -AdpAllInfo -aALL
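The output is very long and much of it is hard to read, but that is fine. As a beginner, just note the first line, which says this machine has one adapter, number 0. Many MegaCli64 commands need the adapter specified at the end with -a. I only have Adapter #0, so from here on I simply write -a0; -a0,1,2 and -aALL are also accepted.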

Adapter #0
==============================================================================
                    Versions
                ================
Product Name    : PERC H710 Adapter
Serial No       : 31P003R
FW Package Build: 21.1.0-0007
                    Mfg. Data
                ================
Mfg. Date       : 01/26/13
Rework Date     : 01/26/13
Revision No     : A00
Battery FRU     : N/A
...
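
As a small illustration of the -a convention, here is a minimal Python sketch that pulls the adapter numbers out of `-AdpAllInfo` output and builds the trailing argument. The function names `adapter_ids` and `adapter_arg` are made up for this example, and the parsing assumes each adapter section starts with a line of the form `Adapter #N`, as in the output above.

```python
import re

def adapter_ids(adp_all_info):
    """Return the adapter numbers found in `-AdpAllInfo` output.

    Assumes each adapter section begins with a line like "Adapter #0".
    """
    pattern = r"^Adapter #(\d+)[ \t]*$"
    return [int(n) for n in re.findall(pattern, adp_all_info, flags=re.MULTILINE)]

def adapter_arg(ids=None):
    """Build the trailing argument: -aALL when no ids are given, else -a0 or -a0,1,2."""
    if not ids:
        return "-aALL"
    return "-a" + ",".join(str(i) for i in ids)

sample = """Adapter #0
==============================================================================
Product Name    : PERC H710 Adapter
"""

ids = adapter_ids(sample)
print(ids, adapter_arg(ids))
```

Calling `adapter_arg()` with no list falls back to `-aALL`, which is the safe choice when you do not yet know how many adapters a machine has.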

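View the adapter's detailed configuration. This machine has 12 disks: one is a single-disk RAID0 that holds the operating system, and the remaining 11 form a RAID5: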

./MegaCli64 -CfgDsply -aALL
==============================================================================
Adapter: 0
Product Name: PERC H710 Adapter
Memory: 512MB
BBU: Present
Serial No: 31P003R
==============================================================================
Number of DISK GROUPS: 2 # two disk groups
DISK GROUPS: 0 # disk group 0
Number of Spans: 1
SPAN: 0
Span Reference: 0x00
Number of PDs: 1
Number of VDs: 1
Number of dedicated Hotspares: 0
Virtual Disk Information:
Virtual Disk: 0 (Target Id: 0)
Name:
RAID Level: Primary-0, Secondary-0, RAID Level Qualifier-0 # configured as RAID0
Size:2.728 TB
State: Optimal
Stripe Size: 64 KB
Number Of Drives:1
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
Encryption Type: None
Physical Disk Information:
Physical Disk: 0
Enclosure Device ID: 32
Slot Number: 0
Device Id: 0
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors]
Coerced Size: 2.728 TB [0x15d400000 Sectors]
Firmware state: Online
SAS Address(0): 0x500056b37789abee
Connected Port Number: 0(path0) 
Inquiry Data:             (manually redacted) # the serial number appears here
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None 
Device Speed: 3.0Gb/s 
Link Speed: 3.0Gb/s 
Media Type: Hard Disk Device

DISK GROUPS: 1 # disk group 1
Number of Spans: 1
SPAN: 0
Span Reference: 0x01
Number of PDs: 11 # 11 physical disks
Number of VDs: 1 # combined into 1 virtual disk
Number of dedicated Hotspares: 0
Virtual Disk Information:
Virtual Disk: 0 (Target Id: 1)
Name:
RAID Level: Primary-5, Secondary-0, RAID Level Qualifier-3 # configured as RAID5
Size:27.285 TB
State: Optimal
Stripe Size: 64 KB
Number Of Drives:11
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
Encryption Type: None
Physical Disk Information:
Physical Disk: 0 # the first physical disk
Enclosure Device ID: 32
Slot Number: 1
Device Id: 1
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors]
Coerced Size: 2.728 TB [0x15d400000 Sectors]
Firmware state: Online
SAS Address(0): 0x500056b37789abec
Connected Port Number: 0(path0) 
Inquiry Data:             (manually redacted) # the disk's serial number, matching the disk label
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None 
Device Speed: 3.0Gb/s 
Link Speed: 3.0Gb/s 
Media Type: Hard Disk Device
...
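
The sizes in this output can be cross-checked with a little arithmetic: a RAID5 of n disks keeps n - 1 disks' worth of data. The sketch below redoes that calculation from the Coerced Size sector count. It rests on two assumptions that are not stated in the output itself: sectors are 512 bytes, and MegaCli's "TB" means 2**40 bytes. The function names are illustrative only.

```python
SECTOR_BYTES = 512   # assumption: 512-byte logical sectors
TIB = 2 ** 40        # assumption: the tool's "TB" is binary (2**40 bytes)

def coerced_tib(sectors):
    """Convert a sector count to TiB under the assumptions above."""
    return sectors * SECTOR_BYTES / TIB

def raid5_usable_tib(sectors_per_disk, n_disks):
    """Usable RAID5 capacity: one disk's worth of space goes to parity."""
    if n_disks < 3:
        raise ValueError("RAID5 needs at least 3 disks")
    return (n_disks - 1) * coerced_tib(sectors_per_disk)

coerced = 0x15d400000  # "Coerced Size" sector count from the output above

print(f"single disk: {coerced_tib(coerced):.4f} TiB")
print(f"11-disk RAID5: {raid5_usable_tib(coerced, 11):.4f} TiB")
```

Under those assumptions the results land within a rounding error of the 2.728 TB per disk and 27.285 TB virtual disk size that the controller reports, which suggests the virtual disk is built from the coerced size rather than the raw size.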

View the information and state of every physical disk. The fields are the same as above, minus the adapter information.

./MegaCli64 -PDList -a0

Adapter #0
Enclosure Device ID: 32
Slot Number: 0
Device Id: 0
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors]
Coerced Size: 2.728 TB [0x15d400000 Sectors]
Firmware state: Online
SAS Address(0): 0x500056b37789abee
Connected Port Number: 0(path0) 
Inquiry Data:             (manually redacted) # the disk's serial number, matching the disk label
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None 
Device Speed: 3.0Gb/s 
Link Speed: 3.0Gb/s 
Media Type: Hard Disk Device
Enclosure Device ID: 32
Slot Number: 1
Device Id: 1
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors]
Coerced Size: 2.728 TB [0x15d400000 Sectors]
Firmware state: Online
SAS Address(0): 0x500056b37789abec
Connected Port Number: 0(path0) 
Inquiry Data:             (manually redacted) # the disk's serial number, matching the disk label
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None 
Device Speed: 3.0Gb/s 
Link Speed: 3.0Gb/s 
Media Type: Hard Disk Device
...
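
Because `-PDList` prints every disk as a flat run of `Key: Value` lines, it is easy to turn into records. The following is a minimal sketch, not an official parser: `parse_pdlist` is a made-up name, and it assumes each disk's record begins at an `Enclosure Device ID` line, which is how the output above is laid out.

```python
def parse_pdlist(text):
    """Split `-PDList` output into one dict per physical disk.

    Assumes a new disk record starts at each "Enclosure Device ID" line.
    Lines without a colon (for example "Adapter #0") are skipped.
    """
    disks = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Enclosure Device ID":
            current = {}
            disks.append(current)
        if current is not None:
            current[key] = value
    return disks

sample = """Adapter #0
Enclosure Device ID: 32
Slot Number: 0
Firmware state: Online
Enclosure Device ID: 32
Slot Number: 1
Firmware state: Online
"""

for disk in parse_pdlist(sample):
    print(disk["Slot Number"], disk["Firmware state"])
```

All values are kept as strings; convert the counters with `int()` only where you need to compare them.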

This output contains a lot of useful information:

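1. Slot Number: the slot number, which should match the label on the chassis. If a machine has many disks, just tell the on-site engineer that the disk in slot X is faulty and they can swap it directly.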

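2. Inquiry Data: contains the disk's serial number, which matches the label on the disk. The label is only visible after pulling the disk, so the serial number on the disk pulled from a given slot should match the Inquiry Data reported for that slot.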

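3. Firmware state: the disk's state. Online is the best state and the one we hope to see. Others include Unconfigured, Offline, Failed and so on, and most of them convey one sad fact: you will be working overtime to report or repair them.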

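4. Pay special attention to these counters: Media Error Count, Other Error Count, Predictive Failure Count, and Last Predictive Failure Event Seq Number. Any of them may be non-zero. That means the disk still works but is no longer reliable and very likely has bad clusters, bad sectors, or similar problems, so it must be replaced as soon as possible. Keep using it and the disk is not far from failing completely. A claim that circulates online is that the larger the first three counts are, the worse the disk's condition.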

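I raised this question specifically with the server, RAID card, and disk vendors. Their feedback was as follows.

According to earlier material, the absolute values of Media Error and Other Error do not directly reflect the state of a disk. Based on discussions with the RAID card and disk vendors, the recommended practice is to monitor Predictive Failure Count: a non-zero value means the disk has a problem. A disk in the Failed state can also be reported for repair directly.

Command for Predictive Failure Count: storcli /c0/eall/sall show all
Monitor the keyword Predictive Failure Count. The criterion is that it must not be greater than 0; if it has counted anything, replace the corresponding disk.

Predictive Failure already covers media errors, and its scope is broader and more comprehensive than Media Error alone:

1. The disk's SMART subsystem already has a complete set of algorithms for assessing disk health.
2. Those algorithms take parameters from every aspect of the disk's operation into account, and media errors are one of them.
3. SMART evaluates media errors by how fast they grow per unit of time.
4. When any item evaluated by SMART reaches its threshold, the disk reports Sense Code 01 5D 00 (FAILURE PREDICTION THRESHOLD EXCEEDED).
5. A host that follows the SCSI standard (the OS SCSI subsystem, a SAS controller, a RAID card, and so on) can decode this sense code correctly.

In summary, because media errors are already covered by the disk's SMART subsystem and are reported as a predictive failure according to the SCSI standard, for disks it is enough to monitor Predictive Failure Count behind the RAID card, with the criterion that it must not be greater than 0.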
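
The vendor's rule is simple enough to sketch as a check. The Python below is one possible reading of it, not a vendor-supplied script: it flags a disk when Predictive Failure Count is greater than 0 or when Firmware state starts with Failed, and it deliberately ignores Media Error Count and Other Error Count. The name `disks_to_replace` is hypothetical, and the parsing assumes the `Key: Value` layout of the MegaCli `-PDList` output shown earlier; storcli prints a different layout that this sketch does not handle.

```python
def disks_to_replace(pdlist_text):
    """Return (slot, reason) pairs for disks that break the monitoring rule.

    Rule as read from the vendor feedback: Predictive Failure Count must not
    be greater than 0, and a Failed disk can be reported directly.
    """
    findings = []
    slot = None
    for line in pdlist_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Slot Number":
            slot = value
        elif key == "Predictive Failure Count" and value.isdigit() and int(value) > 0:
            findings.append((slot, f"Predictive Failure Count = {value}"))
        elif key == "Firmware state" and value.lower().startswith("failed"):
            findings.append((slot, f"Firmware state = {value}"))
    return findings

# Made-up sample: slot 1 has predictive failures, slot 2 has failed,
# slot 3 has media errors only and is not flagged under this rule.
sample = """Slot Number: 0
Media Error Count: 0
Predictive Failure Count: 0
Firmware state: Online
Slot Number: 1
Media Error Count: 0
Predictive Failure Count: 3
Firmware state: Online
Slot Number: 2
Media Error Count: 0
Predictive Failure Count: 0
Firmware state: Failed
Slot Number: 3
Media Error Count: 12
Predictive Failure Count: 0
Firmware state: Online
"""

for slot, reason in disks_to_replace(sample):
    print(f"slot {slot}: {reason}")
```

Reporting by slot number fits the workflow described earlier: the slot is what the on-site engineer needs in order to pull the right disk.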

