Softpanorama


HP Smart Array P410i controller


Introduction

This controller is used in the HP DL360 G7, DL580 G7 and several other HP servers. Firmware versions vary and, in typical HP manner, are updated far too frequently. You need to disable the graphical boot screen to get into the controller BIOS on startup.

With a CD/DVD drive installed in an HP DL360 G7, the P410i controller does not see drives in slots 5 and 6 (bays 5 and 6 are not available when a DVD drive is installed), and sometimes not even drives in slots 1 and 2 (in version 5.11). In that case only drives in slots 3 and 4 are visible.

DL360 G7 not showing new drives in slots 5 & 6

‎07-06-2012 03:20 PM

I just added 2 new drives to my 2 DL360G7 servers which are the same build and config. I am not seeing the system recognize them at all with the SmartStart ACU or even physically, no lights or anything are apparent when the drives are inserted into the system. They are the same configuration, including same part number and all are 15k. I feel that I might have missed something, but don't think I need another controller to use these slots, would someone please give me an idea if there is something else I am missing?

The system is currently configured with 2x146GB sets of RAID 1+0. I just expanded the original array from 2 to 4 drives, but the 2 new drives which would make this 6 in the system aren't visible at all.

Thanks in advance for your help.

Re: DL360 G7 not showing new drives in slots 5 & 6

‎07-08-2012 07:20 PM

Take a look at the below links:

DL360 G7 QuickSpecs.

DL360 G7 Storage Configuration.

 I am not sure if you have 8 HDD backplane or 6 HDD backplane with DVD Drive.

Bay 5, 6 not available when DVD installed.

4 SFF Bays standard with option for up to 8 SFF drive bays total.
NOTE: 4 bays standard in Entry, Base and CTO models, with optional upgrade to 8 bays with the SFF HD Backplane Kit (516966-B21). No CD or DVD options available when server is configured with 8 drive bays.

Thanks

Re: DL360 G7 not showing new drives in slots 5 & 6

‎07-09-2012 08:16 AM

I had in production originally 4 146GB SAS drives and DVD drive which is in slot 7 and 8, but 5 and 6 just had an insert.

So, you are saying because of having the DVD drive, slots 5 and 6 are only capable of an insert? Or, do I need another component to use this space? Thanks.

Suman_1978 HP Pro

Re: DL360 G7 not showing new drives in slots 5 & 6

‎07-09-2012 07:18 PM

The documentation says that if an optical drive is installed, then only drive bays 1-4 work; bays 5-6 do not work.

You may also refer to this link:

HP ProLiant DL360 G7 Server User Guide
http://bizsupport1.austin.hp.com/bc/docs/support/S​upportManual/c02065265/c02065265.pdf

Page# 48 = This server supports the installation of a DVD-ROM drive or a DVD-RW drive. When an optical drive is installed, the server does not support the additional hard drive backplane.

Page# 51 = When the hard drive backplane option is installed, the server does not support the DVD-ROM or DVD-RW drive options.

Page# 68 and 69 = Hard drive backplane cabling

Thank You!
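The bay rules quoted in this thread (4 SFF bays standard, up to 8 with the additional backplane kit, and no optical drive together with that backplane) can be written down as a tiny check. This is only a sketch of the rules as quoted above; the function name and its parameters are invented for illustration and do not correspond to any HP tool.

```python
from typing import List


def usable_drive_bays(optical_drive_installed: bool,
                      extra_backplane_installed: bool) -> List[int]:
    """Return the SFF bay numbers expected to accept drives on a DL360 G7.

    Encodes only what the quoted QuickSpecs and User Guide say:
    - 4 bays are standard;
    - the additional backplane kit raises this to 8 bays;
    - the optical drive and the additional backplane exclude each other.
    """
    if optical_drive_installed and extra_backplane_installed:
        raise ValueError(
            "optical drive and additional HD backplane cannot be combined")
    if extra_backplane_installed:
        return list(range(1, 9))   # bays 1..8
    return [1, 2, 3, 4]            # standard bays only


if __name__ == "__main__":
    # The poster's servers have a DVD drive, so bays 5 and 6 stay unusable.
    print(usable_drive_bays(optical_drive_installed=True,
                            extra_backplane_installed=False))
```

For the servers in the thread, which have a DVD drive, the function returns bays 1 to 4 only, which matches the answer the poster received.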

RAID10 on two disks is fool's gold

The small performance gain achievable with "pseudo RAID 10" on two disks is negated by the loss of reliability: with RAID 1 the disks remain usable even if the controller fails or loses its configuration, while disks configured as RAID 10 do not. Moreover, very bad things can sometimes happen with a RAID 10 configuration on two disks, especially with an ancient firmware version on the P410i (5.xx or earlier): the controller can lose both disks at once by issuing some internal command that wipes their content.

If you do not touch the controller and disks, they work. But if you do touch them, your mileage may vary.

First of all, HP in its infinite wisdom proposes the most aggressive RAID setting. It advocates striping and proposes it by default even on two disks. But with a 256MB cache the effect is negligible, and with a 1GB cache it might even be negative. And while RAID 1 consists of disks that can each be read individually on another server, RAID 10 is a more complex beast, and loss of the configuration in the controller leads to real disasters.

All RAID levels that use striping can end up in a situation where all data on the array is lost: when more hard drives fail than the redundancy can handle, or when the controller loses its RAID configuration. The latter is probably the most frequent case with the P410i.

Setting up RAID 10 on two disks involves additional magic with "sections" on the drive to speed up sequential reads. The gain is unclear and can be negative (see Non-standard RAID levels - Wikipedia). Generally, RAID 10 requires four disks, with a non-standard implementation existing for three (see Nested RAID levels - Wikipedia):

RAID 10, as recognized by the storage industry association and as generally implemented by RAID controllers, is a RAID 0 array of mirrors, which may be two- or three-way mirrors,[6] and requires a minimum of four drives. However, a nonstandard definition of "RAID 10" was created for the Linux MD driver;[7] Linux "RAID 10" can be implemented with as few as two disks. Implementations supporting two disks such as Linux RAID 10 offer a choice of layouts.[7] Arrays of more than four disks are also possible.

Or RAID 50:

RAID 50, also called RAID 5+0, combines the straight block-level striping of RAID 0 with the distributed parity of RAID 5.[3] As a RAID 0 array striped across RAID 5 elements, minimal RAID 50 configuration requires six drives. On the right is an example where three collections of 120 GB RAID 5s are striped together to make 720 GB of total storage space.

One drive from each of the RAID 5 sets could fail without loss of data; for example, a RAID 50 configuration including three RAID 5 sets can only tolerate three maximum potential drive failures. Because the reliability of the system depends on quick replacement of the bad drive so the array can rebuild, it is common to include hot spares that can immediately start rebuilding the array upon failure. However, this does not address the issue that the array is put under maximum strain reading every bit to rebuild the array at the time when it is most vulnerable.[12][13]

RAID 50 improves upon the performance of RAID 5 particularly during writes, and provides better fault tolerance than a single RAID level does. This level is recommended for applications that require high fault tolerance, capacity and random access performance. As the number of drives in a RAID set increases, and the capacity of the drives increase, this impacts the fault-recovery time correspondingly as the interval for rebuilding the RAID set increases.
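The capacity and fault-tolerance figures in the RAID 50 quote can be checked with a few lines of arithmetic. The sketch below assumes that each of the three RAID 5 sets in the example consists of three 120 GB drives (the quote does not spell this out); the function names are made up for this illustration.

```python
def raid5_usable_gb(drives: int, drive_gb: int) -> int:
    """Usable capacity of one RAID 5 set: one drive's worth goes to parity."""
    if drives < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drives - 1) * drive_gb


def raid50_usable_gb(sets: int, drives_per_set: int, drive_gb: int) -> int:
    """Usable capacity of RAID 50: a RAID 0 stripe across several RAID 5 sets."""
    if sets < 2:
        raise ValueError("RAID 50 needs at least two RAID 5 sets")
    return sets * raid5_usable_gb(drives_per_set, drive_gb)


def raid50_best_case_failures(sets: int) -> int:
    """At most one drive per RAID 5 set can fail without data loss."""
    return sets


if __name__ == "__main__":
    # Assumed layout: 3 sets x 3 drives x 120 GB.
    print(raid50_usable_gb(sets=3, drives_per_set=3, drive_gb=120))  # 720
    print(raid50_best_case_failures(sets=3))                         # 3
    # Minimal configuration: 2 sets x 3 drives = 6 drives.
    print(2 * 3)                                                     # 6
```

Note that the "three failures" figure is a best case: two failed drives in the same RAID 5 set already destroy the whole stripe.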

How can you recover information if it is distributed across two disks? Novice sysadmins do not ask such questions and accept the proposed default.

Generally, using an advanced RAID level without testing the controller's capabilities (and reliability) is very dangerous. Only RAID 1 leaves room for error in this respect. Recovery of a 300GB disk can easily cost $15K.

Overcomplexity and horrible reliability in non-standard situations

There is no documentation of where and how the controller stores this configuration. If the configuration is lost, you lose all (and I mean all) information on your disks. With RAID 10 you need to pay a specialized recovery company to retrieve the data even if your disks are not damaged at all. The price for recovery of 300GB in such a situation is between $5K and $15K, essentially the cost of the server (see, for example, RAID 10 Data Recovery - Secure Data Recovery Services). A controller bug that makes it lose the configuration can do as much damage as a failed disk.

Moreover, HP does not provide qualified advice if you face such a situation. One remedy is to buy their recovery contract, which increases the cost.

Among the non-standard situations for this RAID controller:

Under some, as yet unclear, circumstances the controller can issue a formatting command to the drives. The disks continue to be shown as RAID 10 and healthy, but there is no readable information on them.

 

Case study: total loss of data on the server

Here is one interesting case study:

We have a problem with an HP DL360 server that has four 300GB 15K RPM SAS drives (standard HP-supplied drives that came with the server; I think they are Seagate. All are of the same model and batch; no replacements). Two were used in a RAID 10 configuration (in slots 3 and 4 of the server); the DL360 in our configuration has 6 slots, with the DVD drive in slots 7 and 8. Two were unused.

The drives are controlled by a P410i controller. Firmware version: 5.14.

Again, the server has four 300GB 15K RPM drives (slots 1-4), but only two were used (slots 3 and 4), as the second pair was not visible in the controller BIOS interface.

I wanted to add two additional drives as we ran out of space on this server, but since the drives in slots 1 and 2 were not visible in the controller BIOS interface (as I later discovered on the Internet, this is a known problem with the P410i), I decided to move the drives from slots 1 and 2 to slots 5 and 6 without shutting the server down. This proved to be a very bad idea, and I believe it caused this problem.

I noticed that something was wrong when I removed the drive, as the controller complained about some lost data. So it looks like, despite the fact that the drives were not visible via the controller interface, the controller saw them and may have cached some portion of the data on those "unused" drives.

As a result the OS can't boot, and my inspection of the hard drive yesterday with the dd utility, after booting from a rescue DVD, shows zeros in the initial sectors. It looks like the controller forcefully reinitialized the RAID 10 array, with predictable consequences for the data. That can't be the right action.

The RAID array is operational and the drives are perfectly readable from the rescue DVD using dd. No errors are reported; dd copied as many sectors as I asked without complaint. But no information exists on the drives anymore.
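The dd inspection described above ("zeros in the initial sectors") can be repeated from any rescue system that has Python. The following is a minimal sketch, assuming 512-byte sectors; the device path in the usage example is hypothetical, and reading a raw device normally requires root.

```python
from typing import BinaryIO

SECTOR_SIZE = 512  # assumption: classic 512-byte logical sectors


def count_leading_zero_sectors(stream: BinaryIO, max_sectors: int = 2048) -> int:
    """Count how many sectors at the start of the stream contain only zeros.

    Stops at the first sector with any non-zero byte, at end of data,
    or after max_sectors sectors, whichever comes first.
    """
    zero_sector = bytes(SECTOR_SIZE)
    count = 0
    for _ in range(max_sectors):
        sector = stream.read(SECTOR_SIZE)
        if len(sector) < SECTOR_SIZE or sector != zero_sector:
            break
        count += 1
    return count


def check_device(path: str, max_sectors: int = 2048) -> int:
    """Open a block device or image file read-only and run the check."""
    with open(path, "rb") as device:
        return count_leading_zero_sectors(device, max_sectors)


if __name__ == "__main__":
    # Hypothetical device name; adjust to the logical drive seen by the rescue OS.
    print(check_device("/dev/sda"))
```

On a healthy boot disk the result is normally 0, because sector 0 holds the MBR. A result equal to `max_sectors` means the whole inspected region is blank, which is what the case study reports.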

 

HP Smart Array - When is Magic Just an Illusion?

Reproduced verbatim from HP Smart Array - When is Magic Just an Illusion by Eric Wu

I would definitely recommend a backup prior to expanding an HP array. Interrupting the process is almost guaranteed to cause data loss it seems.

Introduction

While testing new HP G7 servers meant for database applications at work, the category of storage is an important one. Certain function such as online RAID expansion and migration allows one to reduce the initial cost and purchase storage as required.

I've been hearing and for the most part, experiencing what a few has described as HP Magic. Simply put, complicated things just work, we don't know how they get it to work so well, but it does. There are times where this is not true, and in testing of the Smart Array P410i on the DL360 G7 servers, I realized how unsafe it is to expand arrays on these and possibly other controllers.

HP ensures that a battery backup module is installed before allowing array expansion. This creates the idea that the expansion process may recover in the event of loss of power, hard shutdown, reboot, etc. This is not documented however, resulting in my curiosity to test how well it works.

Test 1 - Reboot

Test 1 Result


 

Looking at the last few lines of the output, seems like this controller has failed to resume. I was able to boot into the OS after pressing F2. However, all it means is that the 10% completion contains at least enough of the data required to boot Linux.

Test 2 - Power Unplugged

Test 2 Results

Same error as previous test. The difference is that in this case, only 1% of the transformation has completed, resulting in an MBR corruption upon pressing F2. The following screen is displayed on most HP servers when this happens.

Further Analysis and Research

While searching for information on this issue, I came across an HP troubleshooting document. It outlined the error displayed in Test 2 Results above (1769 - page 150). The suggested course of action is:

Link: HP Proliant Servers Troubleshooting Guide

HP's product manual for Smart Array P400 or P410 did not mention any ability to resume from any of the above scenarios. It also stated that removal of a drive from an array during expansion or migration would result in data loss! This is something I will need to test tomorrow, as it is a more likely case, next to software related failures.

What about offline expansion? The answer is: Not Possible. HP did not provide a means to manage the array within the Option ROM (RAID ROM) for functions such as expansion, migration, or any of the other settings. I personally have an Areca ARC-1230, which provides me with nearly all functions available in CLI and the dedicated Ethernet connection for HTTP based management.

Conclusion

While well known companies make it seem like things work well, like magic. What we know about magic is that, often times, it's just an illusion. Make sure to do testing prior to going into production environment in order to determine risky use cases and corresponding course of action for each failure scenario.
 

This entry was posted on June 1st, 2011 at 07:56:22 pm by Eric Wu and is filed under Storage, Unix.

8 comments

 

Comment from: wb [Visitor]

 I'm currently doing an offline transformation using the array configuration utility on the smartstart cdrom. I was wondering if I could reboot the server into linux. But after reading your blog I've decided to wait till its finished :)

06/07/11 @ 05:56

 

Comment from: TooMeeK [Visitor]

 For me, it look's like controller firmware bug.

This shouldn't happend. Try upgrading firmware, for ML350 I remember that all diodes of drives flash to blue when upgrading firmware of the RAID controller.
"It also stated that removal of a drive from an array during expansion or migration would result in data loss!" ee.. what? so this means that if You put brand new drive into array and it will fail this can cause data loss?

Comparing this to software Linux raid using mdadm: when drive fails during rebuild it become failed and rebuild stops. System will stay alive. But I recently noticed that SATA onboard set as IDE not AHCI can block controller and system will hang if drive will fail.

Also on older servers with SCSI drives if drive fail, but only one partition will be degraded and other are still used this can hand system until reboot. So setting manually all partitions from broken drive as failed prevent hungs.

08/03/11 @ 14:58

Comment from: Eric Wu [Member] Email

It's already at the latest firmware. I've never seen something like this before. This might be related to expansion only. I'd like to get my hands on some other controllers to see how well they do in this area. Rebuilding is usually not an issue as the content on existing drives are not affected. Expansion may modify current drive data which results in this issue. However, logically speaking, the redundancy is supposed to take care of a drive failure if the expansion algorithm is properly designed and the cache backup is meant to protect from power outages or reboots when expanding the volume.

08/03/11 @ 15:32

Comment from: Clarence [Visitor]

I definitely have the same opinion with you concerning this topic. Nice entry. Already bookmarked for future reference. 00-)

11/04/11 @ 08:54

Comment from: kaitoollayli1982 [Visitor]

I am impressed, I must say. Really rarely do I discovered a blog thats both educative and entertaining, and let me tell you, you have hit the nail on the head. Your thoughts is important; the issue is something that not enough people are speaking intelligently about. I am very happy that I stumbled across this in my search for something relating to it. =-=

12/19/11 @ 19:02

Comment from: Troy_J [Visitor]

 I just ran across a similar scenario while running the off-line ACU utility to expand an array. Added two more 600GB SAS drives to a 4-drive array. The Extension phase went through the night for several hours. I didn't need to migrate (settings stayed the same), but the progress bar for the Expansion phase stayed at 0% for about an hour. That's when an exec called me and insisted that he NEEDED data on that server IMMEDIATELY. Scheduled downtime or not. I couldn't find any reference to interrupting or cancelling the process, especially for the off-line tool. Calculations of 15mins/GB had me wondering if this thing would take the entire weekend to complete. I even called HP out of desperation and was told that their tests weren't conclusive. "Sometimes the data got lost. Sometimes it didn't. So we can't guarantee anything." While I was thinking of a way around this, I hit the 'ReScan' button for something to do... and the progress meter showed 72%! 20 minutes later, everything was complete. (Odd that the progress meter worked in real-time while extending.)
So I wasn't able to conclusively test this, I'm kind of glad I'm not stuck doing hours of data restoration over a weekend.

05/18/13 @ 15:12

Comment from: Eric Wu [Member] Email

 Glad it worked out for you Troy. I would definitely recommend a backup prior to expanding an HP array. Interrupting the process is almost guaranteed to cause data loss it seems.

05/28/13 @ 02:58

Comment from: Tiago de Aviz [Visitor] Email

 Hello Eric,

Another case happened on a customer of mine where the main drive failed on a RAID 1 group, and we had data loss. It seems as if the server never actually "switched" to the secondary disk...

The more odd thing is that the mirror on the second drive appeared to be outdated, missing files created 14+ days ago.

I'm staying away from these controllers.

Regards,

Tiago

06/29/13 @ 16:50

 



Old News ;-)

1783 - Slot 0 Drive Array Controller Failure - p41... - HP Enterprise Business Community


‎01-04-2012 06:52 AM

We have had these servers deployed since mid-March with no problems until we started updating the firmware and BIOS in late October. Since then, we have averaged about 1 server failure every 6 weeks with this error. HP tries to insist that it is the controller's fault. We have found through troubleshooting that if we install known good drives into the box, the problem goes away. If we install the failed drives into another known good box, this error will follow the drives, not the controller. The drives (both of them) are unusable at this point.

We can remove the drives from the box, run SmartStart utility and the controller is visible. If we reinsert the drives and run SmartStart, the controller is not visible.

The following message is displayed during POST, it appears to be very specific with the dlu code, but I haven't been able to find any kind of information:

1783 – Slot 0 Drive Array Controller Failure! Command failure (cmd=0h, err=00h, dlu=013:5h)

System configuration:

ProLiant ML350 G6 (2011.05.05 – D22)
Smart Array p410i (5.06) (RAID 0+1)
2 x 300GB SAS drives(various, EF0300FATFD – HPD2, EF0300FARMU – HPD3)
Windows Server 2008 (W32 – Standard) – SP2


Jacob W Anderson


Re: 1783 - Slot 0 Drive Array Controller Failure - p410i


‎01-04-2012 09:38 AM

Have you run the firmware upgrade (version 9.3) to update the FW on the drives? make sure they have updated firmware too. There has been quite a bit of chatter on drive FW incompatibility with the new P series of controllers (which are not HP, but just branded HP).

You have to put the drives into the box and then boot with the FW DVD to update the FW on the drive. The FW update manager will automatically update the FW on the drives.

I think you have FW HPD2 and HPD3 on your drives. Not sure where the current revision is for your type of drive.

Have you tried a new controller?

vroc31426


Re: 1783 - Slot 0 Drive Array Controller Failure - p410i

‎01-04-2012 12:06 PM

Thanks for the info on the controller, I will look into it. Back when all of this started, we were pushing the drive f/w updates through HPSIM. That was when we had our first occurrence. The controller is integrated on the system board, no cache module, but when we move drives to another system, the problem follows the drives. Known good drives work fine.

raid - data lost with RAID5 on proliant DL360 when drives fail

Server Fault


For some reason the Array Controller on my Proliant DL360 G6 couldn't recognize 2 of the 6 750GB drives I am running in a RAID 5 configuration with VMware ESXi 5.1.

When I rebooted the server I chose the BIOS option (F2) to recognize the 2 drives it said it stopped acknowledging.

Here is the BIOS option I chose: "Select "F2" to accept data loss and re-enable logical drive(s)."

All 6 drives now show up in the volume again.

Unfortunately, there seems to be data corruption. Many of the virtual machines no longer work and don't even register properly in ESXi anymore. ESXi boots ok, but none of the virtual machines hosted in it work.

I booted with the Array Configuration Utility and it says the Parity Initialization has finished. ACU doesn't show any other errors or information notices.

Is there a way for me to rebuild or recover my data so my virtual machines start working again? It is still a mystery to me why Array stopped seeing the two drives in the first place, but all I want to do now is recover all my data so my virtual machines start working again.

asked Jun 26 '13 at 23:52 by user127875; edited Jun 27 '13 at 11:53 by ewwhite

""Select "F2" to accept data loss and re-enable logical drive(s)."" ...uh... – Nathan C Jun 27 '13 at 1:32

Restore backups. – fukawi2 Jun 27 '13 at 1:40

2 Answers

F2 is typically the right option to choose... Otherwise the ESXi server would not have booted. My concern is what happened leading up to this incident...

Did you receive any errors? Any indicators on the hard drive LEDs? Typically, an HP server's disks won't just crap-out on you. Considering you're using 750GB disks in RAID5, the chances are that the drives are SATA and you may have more than one failed or failing disk.

Let's go to the HP ProLiant DL360 G6 quickspecs...

Okay, so the only disk options for that server from HP are:

  • SAS 2.5" in 72GB, 146GB, 300GB, 450GB, 600GB...
  • SATA 2.5" in 120GB, 160GB, 250GB, 500GB, 1TB...

So, where did these disks come from?
They're definitely not HP disks. I don't recall any server-class 2.5" 750GB disks ever hitting the market.

Are these laptop hard drives?

If so, there are a number of reasons this could have happened. I think a big SATA RAID5 could have resulted in the dreaded unrecoverable-read error (URE), where you may have had a failed disk and another one on its way out.

Since this is ESXi, let's hope you have the HP health agents and utility bundle installed.

If you do, post the screenshot of the Hardware Status -> Storage menu in VMware and possibly the output of the /opt/hp/hpacucli/bin/hpacucli ctrl all show config detail command.

Worst case, your data is hosed.

Best case, you can build new virtual machines and import the VMDK files. Maybe it's just .vmx file corruption.

Either way, you should not move forward until you determine what happened with your disks in the first place. Otherwise, you're building on a pile of s**t and could encounter the same thing in the future.

(also, update your server's firmware, if you haven't already)

answered Jun 27 '13 at 11:36 by ewwhite; edited Jun 27 '13 at 13:42

Thanks for the great response! Yes, I am using laptop hard drives. I hadn't installed the HP health agents/utility bundle, but I will do so this time. I am convinced I just have to re-install ESXi and restore my backups. My investigations into RAID 5 show that if you lose more than 1 drive at a time (no matter how many drives in the RAID) you won't be able to automatically recover. I do have backups, so all is not lost. It will just take me a day of working in the noisy datacenter to recover. :( The firmware is completely up to date. – user127875 Jun 27 '13 at 14:59

NO LAPTOP HARD DRIVES IN THE SERVER!! Don't reinstall onto the same disks!! – ewwhite Jun 27 '13 at 15:42

Your comment "RAID 5 show that if you lose more than 1 drive at a time (no matter how many drives in the RAID) you won't be able to automatically recover." isn't necessarily true, depending on what failed. Sometimes the system will mark multiple drives as bad but you might be able to mark them as good/online and see if the system boots. At that point it's definitely time to take a backup and then investigate (via software/TAC) why the system thought the drive(s) were bad. I've had this happen on multiple servers simply due to a bug in the RAID firmware. – TheCleaner Jun 27 '13 at 15:50

If it has rebuilt the array and you still can't access the data (''I booted with the Array Configuration Utility and it says the Parity Initialization has finished. '')

It is time to seek professional help. If it has rebuilt only using 5 drives then it may take some serious detangling to get anything back.

Depending where you are look for a reputable data recovery company.

Mike

answered Jun 27 '13 at 10:46 by Mike

And a lot of money. Possibly / preferably a job outside IT - one does not require reading (i.e. ignoring "accept data loss" and then wondering why the data is gone). – TomTom Jun 27 '13 at 11:05

[Aug 02, 2015] HP DL360 G7 P410i controller troubleshooting


Server is HP DL360 G7 with P410i disk controller. 2xE5620 CPU's. 16GB RAM. Linux mysql 2.6.32-5-amd64 #1 SMP Mon Feb 25 00:26:11 UTC 2013 x86_64 GNU/Linux (Debian 6.0.7)
hpacucli "ctrl all show status"

Smart Array P410i in Slot 0 (Embedded)
   Controller Status: OK
   Cache Status: OK
   Battery/Capacitor Status: OK

hpacucli "ctrl all show config"

Smart Array P410i in Slot 0 (Embedded)    (sn: 5001438014555B80)

   array A (SAS, Unused Space: 0 MB)


      logicaldrive 1 (136.7 GB, RAID 1+0, OK)

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 72 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 72 GB, OK)
      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 72 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 72 GB, OK)

   SEP (Vendor ID PMCSIERA, Model  SRC 8x6G) 250 (WWID: 5001438014555B8F)

hpacucli "ctrl slot=0 ld all show"

Smart Array P410i in Slot 0 (Embedded)

   array A

      logicaldrive 1 (136.7 GB, RAID 1+0, OK)
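Output like the above is easy to check by script, for example from a cron job. The sketch below parses the text of `hpacucli ctrl all show config` and reports every logical or physical drive whose last field is not "OK". It assumes the line layout shown in the listings on this page; other hpacucli versions may format things differently, and the class and function names are invented here.

```python
import re
from typing import List, NamedTuple


class Drive(NamedTuple):
    kind: str          # "logicaldrive" or "physicaldrive"
    ident: str         # e.g. "1" or "1I:1:1"
    fields: List[str]  # comma-separated items found inside the parentheses
    status: str        # last item, e.g. "OK" or "Failed"


# Matches lines such as:
#   logicaldrive 1 (136.7 GB, RAID 1+0, OK)
#   physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 72 GB, OK)
_DRIVE_LINE = re.compile(
    r"^\s*(logicaldrive|physicaldrive)\s+(\S+)\s+\((.*)\)\s*$")


def parse_show_config(text: str) -> List[Drive]:
    """Extract logical and physical drive entries from 'show config' output."""
    drives = []
    for line in text.splitlines():
        match = _DRIVE_LINE.match(line)
        if match:
            kind, ident, inner = match.groups()
            fields = [item.strip() for item in inner.split(",")]
            drives.append(Drive(kind, ident, fields, fields[-1]))
    return drives


def not_ok(drives: List[Drive]) -> List[Drive]:
    """Return the drives whose status is anything other than 'OK'."""
    return [drive for drive in drives if drive.status != "OK"]
```

A typical use would be to capture the command output with `subprocess.run(["hpacucli", "ctrl", "all", "show", "config"], capture_output=True, text=True)` and pass its `stdout` to `parse_show_config`. For the listing above, `not_ok` returns an empty list.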

I ran the following script overnight:

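#!/bin/bash
# Disk stress test: repeatedly fill /isotest with copies of one ISO image.
mkdir -p /isotest
for i in {1..200}; do
    # Write 55 copies of the image, then delete them all and start over.
    for j in {1..55}; do cp -v /root/ubuntu.iso "/isotest/ubuntu.iso${j}"; done
    rm /isotest/ubuntu.iso*
done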

/root/ubuntu.iso size is about 2 GB.
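To get a feeling for how heavy this test is, the numbers in the script can be multiplied out. This is rough arithmetic only: it takes the ISO size as exactly 2 GB (the post says "about") and compares the peak against the 136.7 GB logical drive from the listing, although the filesystem holding /isotest may be smaller than the whole logical drive.

```python
ISO_GB = 2                # approximate size of /root/ubuntu.iso
COPIES_PER_PASS = 55      # inner loop of the script
PASSES = 200              # outer loop of the script
LOGICAL_DRIVE_GB = 136.7  # logicaldrive 1 as reported by hpacucli

# Space occupied just before the rm at the end of each pass.
peak_gb = ISO_GB * COPIES_PER_PASS

# Total volume written if all 200 passes complete.
total_written_gb = peak_gb * PASSES

print(f"peak usage per pass: {peak_gb} GB "
      f"({peak_gb / LOGICAL_DRIVE_GB:.0%} of the logical drive)")
print(f"total written: {total_written_gb} GB")
```

So each pass fills roughly four fifths of the logical drive, and a full run writes about 22 TB, which is a sustained sequential write load rather than a quick smoke test.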

Syslog has some errors. I think they are related to the disk controller:

Mar 28 06:59:17 mysql kernel: [850337.524306] INFO: task mandb:25565 blocked for more than 120 seconds.
Mar 28 06:59:17 mysql kernel: [850337.524337] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 28 06:59:17 mysql kernel: [850337.524381] mandb         D ffff88022740fa20     0 25565  25197 0x00000000
Mar 28 06:59:17 mysql kernel: [850337.524385]  ffff88041ec4b880 0000000000000082 0000000000000000 000000009d778d11
Mar 28 06:59:17 mysql kernel: [850337.524388]  ffffea000defe260 ffffea000defe260 000000000000f9e0 ffff88014d913fd8
Mar 28 06:59:17 mysql kernel: [850337.524390]  00000000000157c0 00000000000157c0 ffff88013228a350 ffff88013228a648
Mar 28 06:59:17 mysql kernel: [850337.524393] Call Trace:
Mar 28 06:59:17 mysql kernel: [850337.524404]  [<ffffffff810168ec>] ? read_tsc+0xa/0x20
Mar 28 06:59:17 mysql kernel: [850337.524408]  [<ffffffff8106bdca>] ? timekeeping_get_ns+0xe/0x2e
Mar 28 06:59:17 mysql kernel: [850337.524412]  [<ffffffff810b4761>] ? sync_page+0x0/0x46
Mar 28 06:59:17 mysql kernel: [850337.524416]  [<ffffffff812fc8f2>] ? io_schedule+0x73/0xb7
Mar 28 06:59:17 mysql kernel: [850337.524418]  [<ffffffff810b47a2>] ? sync_page+0x41/0x46
Mar 28 06:59:17 mysql kernel: [850337.524421]  [<ffffffff812fcd02>] ? __wait_on_bit_lock+0x3f/0x84
Mar 28 06:59:17 mysql kernel: [850337.524423]  [<ffffffff810b472e>] ? __lock_page+0x5d/0x63
Mar 28 06:59:17 mysql kernel: [850337.524426]  [<ffffffff810652e0>] ? wake_bit_function+0x0/0x23
Mar 28 06:59:17 mysql kernel: [850337.524428]  [<ffffffff810b473d>] ? lock_page+0x9/0x1f
Mar 28 06:59:17 mysql kernel: [850337.524431]  [<ffffffff810b4853>] ? find_lock_page+0x25/0x45
Mar 28 06:59:17 mysql kernel: [850337.524433]  [<ffffffff810b4e63>] ? filemap_fault+0x1a5/0x2f6
Mar 28 06:59:17 mysql kernel: [850337.524438]  [<ffffffff810cadf2>] ? __do_fault+0x54/0x3c3
Mar 28 06:59:17 mysql kernel: [850337.524455]  [<ffffffffa01702d2>] ? __ext3_journal_stop+0x1f/0x3d [ext3]
Mar 28 06:59:17 mysql kernel: [850337.524458]  [<ffffffff810cd146>] ? handle_mm_fault+0x3b8/0x80f
Mar 28 06:59:17 mysql kernel: [850337.524461]  [<ffffffff81101d8e>] ? notify_change+0x2b3/0x2c5
Mar 28 06:59:17 mysql kernel: [850337.524464]  [<ffffffff81103eb5>] ? mntput_no_expire+0x23/0xee
Mar 28 06:59:17 mysql kernel: [850337.524467]  [<ffffffff81300096>] ? do_page_fault+0x2e0/0x2fc
Mar 28 06:59:17 mysql kernel: [850337.524469]  [<ffffffff812fdf35>] ? page_fault+0x25/0x30

There are no other error messages.
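When hunting for this kind of message across a large syslog, a small filter helps. The sketch below extracts the "blocked for more than N seconds" lines in the format shown in the trace above and returns the task name, PID and timeout. The regular expression is written against this one sample, so treat it as an assumption that other kernel versions use the same wording; the names are made up for illustration.

```python
import re
from typing import List, NamedTuple


class HungTask(NamedTuple):
    uptime_seconds: float  # kernel timestamp in square brackets
    task: str              # process name, e.g. "mandb"
    pid: int
    blocked_seconds: int   # threshold from hung_task_timeout_secs


# Example line:
# Mar 28 06:59:17 mysql kernel: [850337.524306] INFO: task mandb:25565
#   blocked for more than 120 seconds.
_HUNG_TASK = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+INFO: task (.+):(\d+) "
    r"blocked for more than (\d+) seconds")


def find_hung_tasks(log_text: str) -> List[HungTask]:
    """Return one entry per hung-task warning found in the log text."""
    return [
        HungTask(float(m.group(1)), m.group(2), int(m.group(3)), int(m.group(4)))
        for m in _HUNG_TASK.finditer(log_text)
    ]
```

Feeding it the contents of the syslog file (for example `find_hung_tasks(open("/var/log/syslog").read())`, the path being the Debian default) shows at a glance which processes were stuck in I/O wait and when.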

Or could this error be related to memory? I have already run memtest86+ on that server for several days and there were no errors.

When the server was in the data center, I couldn't boot it. It showed this error all the time:

Fatal PCI Express Device Error PCI ? B00/D00/F00

After transporting it to my workplace, it booted normally. The iLO event log has the following errors:

Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 0, Function 0, Error status 0x00000000)
Uncorrectable Memory Error ((Processor 1, Memory Module 2))
Uncorrectable Memory Error ((Processor 1, Memory Module 3))
An Unrecoverable System Error (NMI) has occurred (System error code 0x00000000, 0x00000000)

I have already updated the BIOS, disk controller and drive firmware to the latest versions.

HP SmartArray P400: How to repair failed logical drive?

I have a HP Server with SmartArray P400 controller (incl. 256 MB Cache/Battery Backup) with a logicaldrive with replaced failed physicaldrive that does not rebuild.

This is how it looked when I detected the error:

~# /usr/sbin/hpacucli ctrl slot=0 show config
Smart Array P400 in Slot 0 (Embedded) (sn: XXXX)

  array A (SATA, Unused Space: 0 MB)
    logicaldrive 1 (698.6 GB, RAID 1, OK)
      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA, 750 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SATA, 750 GB, OK)

  array B (SATA, Unused Space: 0 MB)
    logicaldrive 2 (2.7 TB, RAID 5, Failed)
      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SATA, 750 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SATA, 750 GB, OK)
      physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SATA, 750 GB, OK)
      physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SATA, 750 GB, Failed)
      physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SATA, 750 GB, OK)

  unassigned
      physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SATA, 750 GB, OK)
~# 

I thought that I had drive 2I:1:8 configured as a spare for Array A and Array B, but it seems this was not the case :-(. I noticed the problem because of I/O errors on the host, even though only one physical drive of the RAID 5 has failed.

Does anyone know why this could happen? The logical drive should go into "Degraded" mode but still be fully accessible from the host OS!?
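The mismatch described above (a RAID 5 logical drive reported as Failed with only one failed member) can be detected mechanically. The following is a minimal sketch, not an HP tool: it assumes the `show config` output has exactly the layout shown in the listing above, it attaches physical drives to the most recent `logicaldrive` line, and the `FAULT_TOLERANCE` table is a simplified assumption about how many member failures each RAID level survives. The names `parse_config` and `suspicious` are hypothetical.

```python
import re

# Assumed, simplified number of member-drive failures each RAID level survives.
FAULT_TOLERANCE = {"RAID 0": 0, "RAID 1": 1, "RAID 5": 1, "RAID 6": 2}

# e.g. "logicaldrive 2 (2.7 TB, RAID 5, Failed)"
LD_RE = re.compile(r"^\s*logicaldrive (\d+) \((.+), (RAID [^,]+), ([^)]+)\)")
# e.g. "physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SATA, 750 GB, Failed)"
PD_RE = re.compile(r"^\s*physicaldrive (\S+) \(.*, ([^,)]+)\)\s*$")

SAMPLE = """\
  array A (SATA, Unused Space: 0 MB)
    logicaldrive 1 (698.6 GB, RAID 1, OK)
      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA, 750 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SATA, 750 GB, OK)

  array B (SATA, Unused Space: 0 MB)
    logicaldrive 2 (2.7 TB, RAID 5, Failed)
      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SATA, 750 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SATA, 750 GB, OK)
      physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SATA, 750 GB, OK)
      physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SATA, 750 GB, Failed)
      physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SATA, 750 GB, OK)

  unassigned
      physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SATA, 750 GB, OK)
"""


def parse_config(text):
    """Return a list of logical drives, each with its member drives."""
    drives = []
    current = None
    for line in text.splitlines():
        if line.strip().startswith("unassigned"):
            current = None  # drives listed after this belong to no array
            continue
        m = LD_RE.match(line)
        if m:
            current = {
                "id": int(m.group(1)),
                "size": m.group(2),
                "raid": m.group(3),
                "status": m.group(4),
                "members": [],
            }
            drives.append(current)
            continue
        m = PD_RE.match(line)
        if m and current is not None:
            current["members"].append((m.group(1), m.group(2)))
    return drives


def suspicious(drives):
    """Logical drives marked Failed although member failures are within tolerance."""
    found = []
    for d in drives:
        failed = [name for name, status in d["members"] if status != "OK"]
        tolerance = FAULT_TOLERANCE.get(d["raid"])
        if d["status"] == "Failed" and tolerance is not None and len(failed) <= tolerance:
            found.append((d["id"], d["raid"], failed))
    return found


print(suspicious(parse_config(SAMPLE)))
```

The sketch only flags the inconsistency; it cannot say why the controller marked the logical drive as Failed.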

I first tried to add the unassigned drive 2I:1:8 as a spare to logicaldrive 2, but this was not possible:

~# /usr/sbin/hpacucli ctrl slot=0 array B add spares=2I:1:8
    Error: This operation is not supported with the current configuration.
    Use the "show" command on devices to show additional details 
    about the configuration.
~#  

Interestingly, it is possible to add the unassigned drive to the first array without problems. I thought the controller might have put the array into the "Failed" state because of the missing spare and that it protects failed arrays from modification. So I tried to re-enable the logical drive (in order to add the spare afterwards):

~# /usr/sbin/hpacucli ctrl slot=0 ld 2 modify reenable
    Warning: Any previously existing data on the logical drive may not 
    be valid or recoverable. Continue? (y/n) y

    Error: This operation is not supported with the current configuration.
    Use the "show" command on devices to show additional details
    about the configuration.
~# 

But as you can see, re-enabling the logical drive was not possible either.

I have now replaced the failed drive by hot-swapping it with the unassigned drive. The status now looks like this:

~# /usr/sbin/hpacucli ctrl slot=0 show config
Smart Array P400 in Slot 0 (Embedded) (sn: XXXX)

  array A (SATA, Unused Space: 0 MB)
    logicaldrive 1 (698.6 GB, RAID 1, OK)
      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA, 750 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SATA, 750 GB, OK)

  array B (SATA, Unused Space: 0 MB)
    logicaldrive 2 (2.7 TB, RAID 5, Failed)
      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SATA, 750 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SATA, 750 GB, OK)
      physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SATA, 750 GB, OK)
      physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SATA, 750 GB, OK)
      physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SATA, 750 GB, OK)
~# 

The logical drive is still not accessible. Why is it not rebuilding?

What can I do?
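As a side check on the two listings above, the reported sizes are consistent with the textbook RAID capacity formulas, under the assumption that the tool prints binary units (GiB, TiB) under the labels "GB" and "TB" and that a "750 GB" drive holds about 750 * 10^9 bytes. Both assumptions are inferred from the numbers, not taken from HP documentation, and `usable_bytes` is a hypothetical helper.

```python
GIB = 2**30
TIB = 2**40
DRIVE_BYTES = 750 * 10**9  # assumed nominal size of a "750 GB" drive


def usable_bytes(raid_level, n_drives, drive_bytes):
    """Usable capacity under the textbook formulas (illustrative helper)."""
    if raid_level == "RAID 1":
        if n_drives != 2:
            raise ValueError("this sketch only handles two-drive mirrors")
        return drive_bytes  # one drive's worth; the other is the mirror
    if raid_level == "RAID 5":
        if n_drives < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return (n_drives - 1) * drive_bytes  # one drive's worth of parity
    raise ValueError(f"unsupported level: {raid_level}")


raid1_gib = usable_bytes("RAID 1", 2, DRIVE_BYTES) / GIB
raid5_tib = usable_bytes("RAID 5", 5, DRIVE_BYTES) / TIB

print(f"array A: {raid1_gib:.1f} GiB")  # listing reports 698.6 GB
print(f"array B: {raid5_tib:.1f} TiB")  # listing reports 2.7 TB
```

The RAID 1 figure comes out at about 698.5 rather than the reported 698.6; the small difference presumably reflects the drive's exact byte count, which the listing does not show.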

FYI, this is the configuration of my controller:

~# /usr/sbin/hpacucli ctrl slot=0 show
 Smart Array P400 in Slot 0 (Embedded)
  Bus Interface: PCI
  Slot: 0
  Serial Number: XXXX
  Cache Serial Number: XXXX
  RAID 6 (ADG) Status: Enabled
  Controller Status: OK
  Chassis Slot:
  Hardware Revision: Rev E
  Firmware Version: 5.22
  Rebuild Priority: Medium
  Expand Priority: Medium
  Surface Scan Delay: 15 secs
  Surface Analysis Inconsistency Notification: Disabled
  Raid1 Write Buffering: Disabled
  Post Prompt Timeout: 0 secs
  Cache Board Present: True
  Cache Status: OK
  Accelerator Ratio: 25% Read / 75% Write
  Drive Write Cache: Disabled
  Total Cache Size: 256 MB
  No-Battery Write Cache: Disabled
  Cache Backup Power Source: Batteries
  Battery/Capacitor Count: 1
  Battery/Capacitor Status: OK
  SATA NCQ Supported: True
~# 
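When comparing several controllers, it can help to turn this `Key: Value` output into a dictionary and look only at the settings relevant to rebuilds and caching. The sketch below assumes the output keeps the one-setting-per-line layout shown above; `parse_controller` is a hypothetical name, and the sample is an excerpt of the listing.

```python
SAMPLE = """\
 Smart Array P400 in Slot 0 (Embedded)
  Bus Interface: PCI
  Slot: 0
  Controller Status: OK
  Chassis Slot:
  Firmware Version: 5.22
  Rebuild Priority: Medium
  Cache Status: OK
  Accelerator Ratio: 25% Read / 75% Write
  Battery/Capacitor Status: OK
"""


def parse_controller(text):
    """Map each 'Key: Value' line to a dict entry; lines without ':' are skipped."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


info = parse_controller(SAMPLE)
for key in ("Firmware Version", "Rebuild Priority", "Cache Status",
            "Battery/Capacitor Status"):
    print(f"{key}: {info.get(key, 'n/a')}")
```

Settings with an empty value, such as `Chassis Slot:`, are kept with an empty string so that a missing key and an empty value can be told apart.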

Thanks for your help in advance.

Recommended Links

HP Smart Array - When is Magic Just an Illusion by Eric Wu

HP Smart Array P410 controller

HP.com - ProLiant Storage - HP Smart Array P410 controller - Key benefits

    [PDF] RAID 1(+0): breaking mirrors and rebuilding drives

    How to Configure RAID 5 on HP Proliant DL380 G7 - YouTube

    Most viewed solutions for HP Smart Array P410i Controller - HP Support Center

    Drivers & Software for HP Smart Array P410i Controller ...

    ProLiant DL380 G6 Smart Array P410i configuration ...

    HP Smart Array Controller technology - Hewlett-Packard

    Troubleshoot a problem for HP Smart Array P410 Controller ...

    Advisory: HP ProLiant Servers - FIRMWARE UPGRADE ...




    Copyright © 1996-2018 by Dr. Nikolai Bezroukov. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) in the author's free time and without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Copyrights of original materials belong to their respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

    FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

    This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...

    You can use PayPal to make a contribution to support development of this site and speed up access. If softpanorama.org is down, you can use softpanorama.info instead.

    Disclaimer:

    The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

    The site uses AdSense, so you need to be aware of Google's privacy policy. If you do not want to be tracked by Google, please disable JavaScript for this site. This site is perfectly usable without JavaScript.

    Last modified: June, 14, 2018