Monday, May 11, 2015

That RAID10 Mystery..

Not all RAIDs are created equal, and I recently stumbled upon an issue where I was at a loss to explain why two different configs seemed to have the same characteristics (RAID-10).
On a server with many disks (xd configs), I created two LDs (one with ten disks, the other with twelve).
I asked for a RAID10 config in both cases, but I didn't use the same tool to create each RAID.


  • Creating the two LDs
    • First LD (racadm on the iDRAC):


                send "racadm storage createvd:RAID.Integrated.1-1 -rl r10 -wp wb -rp ra -name Virtual_Disk_1 -ss 512k -pdkey:"
                send "Disk.Bay.2:Enclosure.Internal.0-1:RAID.Integrated.1-1"
                send ",Disk.Bay.3:Enclosure.Internal.0-1:RAID.Integrated.1-1"
                send ",Disk.Bay.4:Enclosure.Internal.0-1:RAID.Integrated.1-1"
                send ",Disk.Bay.5:Enclosure.Internal.0-1:RAID.Integrated.1-1"
                send ",Disk.Bay.6:Enclosure.Internal.0-1:RAID.Integrated.1-1"
                send ",Disk.Bay.7:Enclosure.Internal.0-1:RAID.Integrated.1-1"
                send ",Disk.Bay.8:Enclosure.Internal.0-1:RAID.Integrated.1-1"
                send ",Disk.Bay.9:Enclosure.Internal.0-1:RAID.Integrated.1-1"
                send ",Disk.Bay.10:Enclosure.Internal.0-1:RAID.Integrated.1-1"
                send ",Disk.Bay.11:Enclosure.Internal.0-1:RAID.Integrated.1-1"
                send "\r"
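Incidentally, the ten hand-typed `send` lines above follow a fixed pattern and can be generated instead. A minimal plain-shell sketch (the bay range and the enclosure/controller FQDDs are copied from the command above; `build_pdkey` is just an illustrative helper):

```shell
#!/bin/sh
# Build the comma-separated pdkey list for "racadm storage createvd"
# from a range of bay numbers, instead of typing one line per disk.
build_pdkey() {
    first=$1; last=$2
    keys=""
    for bay in $(seq "$first" "$last"); do
        keys="${keys:+$keys,}Disk.Bay.$bay:Enclosure.Internal.0-1:RAID.Integrated.1-1"
    done
    printf '%s\n' "$keys"
}

# Bays 2..11, as used for the first LD above:
build_pdkey 2 11
```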


    • Second LD (MegaCli):

$ sudo MegaCli64 -CfgSpanAdd -r10 -Array0[32:12,32:13,32:14,32:15,32:16,32:17] -Array1[32:18,32:19,32:20,32:21,32:22,32:24] -a0
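The `-ArrayN[encl:slot,...]` arguments follow a simple pattern too; here's a sketch that builds them from slot lists (enclosure ID 32 as in the command above; `make_array_arg` is only an illustrative helper, not a MegaCli feature):

```shell
#!/bin/sh
# Build one "-ArrayN[encl:slot,...]" argument for MegaCli -CfgSpanAdd.
# Enclosure ID 32 matches the command above.
make_array_arg() {
    n=$1; shift
    slots=""
    for s in "$@"; do
        slots="${slots:+$slots,}32:$s"
    done
    printf -- '-Array%s[%s]\n' "$n" "$slots"
}

make_array_arg 0 12 13 14 15 16 17
make_array_arg 1 18 19 20 21 22 24
# On real hardware, the two strings would then be passed to:
#   sudo MegaCli64 -CfgSpanAdd -r10 <array0> <array1> -a0
```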


  • Configuration Results

$ sudo MegaCli64 -LDInfo -l1 -a0 -NoLog
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 1 (Target Id: 1)
Name                :Virtual_Disk_1
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 2.725 TB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 2.725 TB
State               : Optimal
Strip Size          : 512 KB
Number Of Drives    : 10
Span Depth          : 1

$ sudo MegaCli64 -LDInfo -l2 -a0 -NoLog
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 2 (Target Id: 2)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 3.271 TB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 3.271 TB
State               : Optimal
Strip Size          : 64 KB
Number Of Drives per span:6
Span Depth          : 2

Given those outputs (one span of ten drives versus two spans of six), you would expect the two LDs to be laid out differently (one being RAID0+1 while the other is RAID1+0).
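To compare the layouts at a glance, the relevant fields can be filtered out of `-LDInfo`. A small sketch (the patterns simply match the field names shown in the output above):

```shell
#!/bin/sh
# Reduce MegaCli -LDInfo output to the fields that describe the
# RAID-10 layout: level, spans, and drives per span.
span_summary() {
    egrep 'Virtual Drive:|RAID Level|Number Of Drives|Span Depth'
}

# On real hardware:
#   sudo MegaCli64 -LDInfo -Lall -a0 -NoLog | span_summary
```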

Let’s look at the LDs:

    • First LD

$ sudo ~/bin/megaclisas-status|egrep '(Status|c0u1)'
-- ID | Type    |    Size |  Strpsz |   Flags | DskCache |  Status |  OS Path | InProgress   
c0u1  | RAID-10 |   2725G |  512 KB |   RA,WB | Disabled | Offline |        1 | None         
-- ID   | Type | Drive Model                      | Size     | Status          | Speed    | Temp | Slot ID  | LSI Device ID
c0u1p0  | HDD  | SEAGATE ST600MM0006 LS0AS0MH6WD1 | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 29C  | [32:2]   | 2       
c0u1p1  | HDD  | SEAGATE ST600MM0006 LS0AS0MM9VMK | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 29C  | [32:3]   | 3       
c0u1p2  | HDD  | SEAGATE ST600MM0006 LS0AS0MPU9UD | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 29C  | [32:4]   | 4       
c0u1p3  | HDD  | SEAGATE ST600MM0006 LS0AS0M0DYA3 | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 29C  | [32:5]   | 5       
c0u1p4  | HDD  | SEAGATE ST600MM0006 LS0AS0MPE83S | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 29C  | [32:6]   | 6       
c0u1p5  | HDD  | SEAGATE ST600MM0006 LS0AS0MCS44B | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 31C  | [32:7]   | 7       
c0u1p6  | HDD  | SEAGATE ST600MM0006 LS0AS0MV4CDQ | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 32C  | [32:8]   | 8       
c0u1p7  | HDD  | SEAGATE ST600MM0006 LS0AS0MKZITD | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 29C  | [32:9]   | 9       
c0u1p8  | HDD  | SEAGATE ST600MM0006 LS0AS0MG5JK5 | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 30C  | [32:10]  | 10      
c0u1p9  | HDD  | SEAGATE ST600MM0006 LS0AS0MNQKGJ | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 29C  | [32:11]  | 11      


    • Second LD

$ sudo ~/bin/megaclisas-status|egrep '(Status|c0u2)'
-- ID | Type    |    Size |  Strpsz |   Flags | DskCache |  Status |  OS Path | InProgress   
c0u2  | RAID-10 |   3271G |   64 KB | ADRA,WB |  Default | Optimal | /dev/sdc | Background Initialization: Completed 2%, Taken 2 min. 
-- ID   | Type | Drive Model                      | Size     | Status          | Speed    | Temp | Slot ID  | LSI Device ID
c0u2p0  | HDD  | SEAGATE ST600MM0006 LS0AS0M8U2J0 | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 30C  | [32:12]  | 12      
c0u2p1  | HDD  | SEAGATE ST600MM0006 LS0AS0MR90JD | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 30C  | [32:13]  | 13      
c0u2p2  | HDD  | SEAGATE ST600MM0006 LS0AS0MEHKQG | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 30C  | [32:14]  | 14      
c0u2p3  | HDD  | SEAGATE ST600MM0006 LS0AS0MD8GJM | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 32C  | [32:15]  | 15      
c0u2p4  | HDD  | SEAGATE ST600MM0006 LS0AS0MIQWWY | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 32C  | [32:16]  | 16      
c0u2p5  | HDD  | SEAGATE ST600MM0006 LS0AS0MPL5QR | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 30C  | [32:17]  | 17      
c0u2p0  | HDD  | SEAGATE ST600MM0006 LS0AS0MJ59BJ | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 30C  | [32:18]  | 18      
c0u2p1  | HDD  | SEAGATE ST600MM0006 LS0AS0MB5F8B | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 29C  | [32:19]  | 19      
c0u2p2  | HDD  | SEAGATE ST600MM0006 LS0AS0MLCFI8 | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 29C  | [32:20]  | 20      
c0u2p3  | HDD  | SEAGATE ST600MM0006 LS0AS0M7N594 | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 30C  | [32:21]  | 21      
c0u2p4  | HDD  | SEAGATE ST600MM0006 LS0AS0MXA9G2 | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 29C  | [32:22]  | 22      
c0u2p5  | HDD  | SEAGATE ST600MM0006 LS0AS0MM3BLR | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 52C  | [32:24]  | 24      


  • Punching holes through the RAID drives

Let’s take some drives offline. If we have a RAID0+1 config, as soon as we start hitting the second stripe, the LD will most likely go down:
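The offline commands below follow a regular pattern (every second slot, i.e. one disk out of each presumed mirror pair), so they can be generated with a loop. A dry-run sketch that only prints the commands, since `-PDOffline` is destructive:

```shell
#!/bin/sh
# Dry run: print one -PDOffline command for every second slot in a range
# (one disk out of each presumed mirror pair). Drop the "echo" to really
# take the drives offline; destructive, so only on a scratch LD!
offline_every_other() {
    first=$1; last=$2
    for slot in $(seq "$first" 2 "$last"); do
        echo sudo MegaCli64 -PDOffline -PhysDrv "[32:$slot]" -a0
    done
}

# Slots 2,4,6,8,10 on the first LD:
offline_every_other 2 10
```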


    • First LD (10 disks):

$ sudo MegaCli64 -PDOffline -PhysDrv '[32:2]' -a0
$ sudo MegaCli64 -PDOffline -PhysDrv '[32:4]' -a0
$ sudo MegaCli64 -PDOffline -PhysDrv '[32:6]' -a0
$ sudo MegaCli64 -PDOffline -PhysDrv '[32:8]' -a0
$ sudo MegaCli64 -PDOffline -PhysDrv '[32:10]' -a0

The LD is still online, albeit degraded (this points to RAID10, not RAID0+1):

$ sudo ~/bin/megaclisas-status|egrep '(Status|c0u1)'
-- ID | Type    |    Size |  Strpsz |   Flags | DskCache |  Status |  OS Path | InProgress   
c0u1  | RAID-10 |   2725G |  512 KB |   RA,WB | Disabled | Degraded | /dev/sdb | None         
-- ID   | Type | Drive Model                      | Size     | Status          | Speed    | Temp | Slot ID  | LSI Device ID
c0u1p0  | HDD  | SEAGATE ST600MM0006 LS0AS0MH6WD1 | 558.3 Gb | Offline         | 6.0Gb/s  | 30C  | [32:2]   | 2       
c0u1p1  | HDD  | SEAGATE ST600MM0006 LS0AS0MM9VMK | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 30C  | [32:3]   | 3       
c0u1p2  | HDD  | SEAGATE ST600MM0006 LS0AS0MPU9UD | 558.3 Gb | Offline         | 6.0Gb/s  | 30C  | [32:4]   | 4       
c0u1p3  | HDD  | SEAGATE ST600MM0006 LS0AS0M0DYA3 | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 29C  | [32:5]   | 5       
c0u1p4  | HDD  | SEAGATE ST600MM0006 LS0AS0MPE83S | 558.3 Gb | Offline         | 6.0Gb/s  | 30C  | [32:6]   | 6       
c0u1p5  | HDD  | SEAGATE ST600MM0006 LS0AS0MCS44B | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 32C  | [32:7]   | 7       
c0u1p6  | HDD  | SEAGATE ST600MM0006 LS0AS0MV4CDQ | 558.3 Gb | Offline         | 6.0Gb/s  | 33C  | [32:8]   | 8       
c0u1p7  | HDD  | SEAGATE ST600MM0006 LS0AS0MKZITD | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 30C  | [32:9]   | 9       
c0u1p8  | HDD  | SEAGATE ST600MM0006 LS0AS0MG5JK5 | 558.3 Gb | Offline         | 6.0Gb/s  | 30C  | [32:10]  | 10      
c0u1p9  | HDD  | SEAGATE ST600MM0006 LS0AS0MNQKGJ | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 30C  | [32:11]  | 11      


    • Second LD (12 disks)

Let’s take some drives offline:
$ sudo MegaCli64 -PDOffline -PhysDrv '[32:13]' -a0
$ sudo MegaCli64 -PDOffline -PhysDrv '[32:15]' -a0
$ sudo MegaCli64 -PDOffline -PhysDrv '[32:17]' -a0
$ sudo MegaCli64 -PDOffline -PhysDrv '[32:19]' -a0
$ sudo MegaCli64 -PDOffline -PhysDrv '[32:21]' -a0
$ sudo MegaCli64 -PDOffline -PhysDrv '[32:24]' -a0

That second LD is also still online:
$ sudo ~/bin/megaclisas-status|egrep '(Status|c0u2)'
-- ID | Type    |    Size |  Strpsz |   Flags | DskCache |  Status |  OS Path | InProgress   
c0u2  | RAID-10 |   3271G |   64 KB | ADRA,WB |  Default | Degraded | /dev/sdc | None         
-- ID   | Type | Drive Model                      | Size     | Status          | Speed    | Temp | Slot ID  | LSI Device ID
c0u2p0  | HDD  | SEAGATE ST600MM0006 LS0AS0M8U2J0 | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 30C  | [32:12]  | 12      
c0u2p1  | HDD  | SEAGATE ST600MM0006 LS0AS0MR90JD | 558.3 Gb | Offline         | 6.0Gb/s  | 30C  | [32:13]  | 13      
c0u2p2  | HDD  | SEAGATE ST600MM0006 LS0AS0MEHKQG | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 30C  | [32:14]  | 14      
c0u2p3  | HDD  | SEAGATE ST600MM0006 LS0AS0MD8GJM | 558.3 Gb | Offline         | 6.0Gb/s  | 32C  | [32:15]  | 15      
c0u2p4  | HDD  | SEAGATE ST600MM0006 LS0AS0MIQWWY | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 32C  | [32:16]  | 16      
c0u2p5  | HDD  | SEAGATE ST600MM0006 LS0AS0MPL5QR | 558.3 Gb | Offline         | 6.0Gb/s  | 29C  | [32:17]  | 17      
c0u2p0  | HDD  | SEAGATE ST600MM0006 LS0AS0MJ59BJ | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 29C  | [32:18]  | 18      
c0u2p1  | HDD  | SEAGATE ST600MM0006 LS0AS0MB5F8B | 558.3 Gb | Offline         | 6.0Gb/s  | 29C  | [32:19]  | 19      
c0u2p2  | HDD  | SEAGATE ST600MM0006 LS0AS0MLCFI8 | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 29C  | [32:20]  | 20      
c0u2p3  | HDD  | SEAGATE ST600MM0006 LS0AS0M7N594 | 558.3 Gb | Offline         | 6.0Gb/s  | 30C  | [32:21]  | 21      
c0u2p4  | HDD  | SEAGATE ST600MM0006 LS0AS0MXA9G2 | 558.3 Gb | Online, Spun Up | 6.0Gb/s  | 29C  | [32:22]  | 22      
c0u2p5  | HDD  | SEAGATE ST600MM0006 LS0AS0MM3BLR | 558.3 Gb | Offline         | 6.0Gb/s  | 51C  | [32:24]  | 24      

In both cases, taking one more drive offline brought the associated LD down. But, ....
In any case, it doesn't look like RAID0+1, and it does seem both LDs are RAID10.
The MegaCli output is very different between LD1 and LD2, and I am at a loss to explain what this means (if you know, please do tell :) ).

The modified version of megaclisas-status can be found here:
https://github.com/ElCoyote27/hwraid/blob/master/wrapper-scripts/megaclisas-status

2 comments:

  1. Your post was helpful to me, and I think I have it mostly figured out now.

    Quoting a Dell rep from this post (https://community.spiceworks.com/topic/261243-raid-10-how-many-spans-should-i-use): "The spans are the RAID 0 that connects the RAID 1 arrays together to make the RAID 10 array"

    I believe in setups with multiple spans, we're seeing multiple RAID10 sets being created and concat'ed together. The third response to this post (http://www.webhostingtalk.com/showthread.php?t=1030170) suggests that this is desirable, because the performance benefits of additional spindles in each RAID10 set plateau off pretty quickly, so it makes sense to segregate the data into different RAID10 sets in the hope that they are accessed concurrently. I'm sure a proper storage engineer could tell us more about the ideal number of disks per RAID10 set.

    It seems impossible to create a RAID0+1; passing two large "-Array" arguments into MegaCli only produces two "spans" aka RAID10 sets. Your experiment demonstrates this-- you can still offline every second disk without the entire virtual disk going offline.

    Sadly, I don't see a way to reproduce your setup with 10 disks in a single "span" using MegaCli, which is what started me on this journey. The "-r10" argument to -CfgSpanAdd seems to require at least two arrays. I've tried giving it an empty second array, i.e. -Array1[], and it just hangs. In my use-case, we're running these commands during a Kickstart when we have very few tools (we're able to make MegaCli available, but racadm is much harder). The machines have 10 disks, which can't be neatly subdivided into any other even number... we could split off a RAID1 pair, but we'd rather not. If you have ideas, I'd love to hear them!

    1. Hello, just so you know, the racadm CLI above isn't run inside the OS but on the iDRAC itself. It doesn't require anything inside the booted image, only that the server is pre-configured with the proper RAIDs. Here's how I did it to build machines at my previous job:
      - Launch an expect script that connects to the iDRAC through ssh and:
        - configures iDRAC settings (port speed, fan speed, etc.)
        - configures the RAID levels using commands like the above (racadm ...)

      Afterwards, the server rebooted and we'd start the PXE + kickstart procedure
      (timing is critical here, as some racadm operations on the *30xd systems can take as much as 5 minutes).

      So you don't need to have racadm working inside your boot image, but you do need to be able to connect to the iDRAC.

      Regards,

      Vincent

