My Lenovo Workstation’s Failed RAID 1

I have a fantastic workstation at home: a Lenovo ThinkServer TS 440, but much more souped-up than the stock model. It has 32 GB RAM, two SSDs in a RAID 1 array for the Windows 10 OS, and two SSDs for virtual machines and web content. I back up all of these drives weekly with EaseUS ToDo Backup onto external USB drives and network-attached storage (NAS); even the NAS location has RAID 1!

I had been using this setup for almost a year without any problems. The system has been fast at running virtual machines and delivering web content via IIS. And, because of my backup routines, I was confident that the overall system had plenty of redundancy and restore options in case of a system 'disaster'. I was reasonably sure I'd be able to restore the system within a few minutes after any hardware or software failure.

Well, that assumption was tested yesterday and my confidence in the setup, while not entirely broken, is at least shaken a bit. Here’s what happened:

Yesterday the computer rebooted for no apparent reason. I could not blame a power outage, because the system has backup power. In fact, yesterday morning there had been a brief power outage (a rarity in my house; yes, the system resides in my home office), but the computer defiantly kept on going. Except now I could not access the files on what had been the E drive for web content. The E drive was not present in 'My Computer' either.

I was seriously alarmed!! I knew I had a backup in the NAS location, but what good would that be with no E drive to restore to?

So I started investigating the crash with the free version of WhoCrashed, but it didn't find any entry for the day's crash. During the system boot, I noticed that the workstation's BIOS had shown the 'status' of the RAID 1 configuration as 'Degraded'. But after Windows or the BIOS itself had 'fixed' the 'Degraded' status, the status was now 'Normal'; see the screen cap below:

BIOS of my workstation after the 'fix'

But notice that 'Member Disk(0)' now appears next to both the smaller 220 GB drive and the larger 1 TB drive. It should not be like that: in RAID 1, both drives should have identical storage capacities (the mirror can only use as much space as its smaller member), and perhaps even the same RPMs if they were spinning hard drives. Puzzled by this anomaly, I launched Disk Management inside Windows; see the screen cap below:

Disk Management Utility

In Disk Management (see above), what used to be my E drive, one of the two 1 TB drives, was now absent. So either the BIOS or Windows had decided to use the 1 TB drive as part of the RAID configuration. Sure, the RAID status was now 'Normal', but if I accepted the new setup I would lose an expensive 1 TB SSD AND all my data on the E drive!
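To put rough numbers on why this pairing is wasteful: in a mirror, every member holds a full copy of the data, so the array can only be as large as its smallest member. The short Python sketch below is purely illustrative; the function names are hypothetical (not part of any RAID tool), and it assumes nominal sizes of 220 GB and 1 TB, treating 1 TB as 1000 GB.

```python
def raid1_usable_gb(member_sizes_gb):
    """Usable capacity of a RAID 1 mirror: each member stores a full copy,
    so the array is limited to the size of its smallest member."""
    if len(member_sizes_gb) < 2:
        raise ValueError("RAID 1 needs at least two member drives")
    return min(member_sizes_gb)


def raid1_unused_gb(member_sizes_gb):
    """Capacity on the larger members that the mirror cannot use at all."""
    usable = raid1_usable_gb(member_sizes_gb)
    return sum(size - usable for size in member_sizes_gb)


# Assumed nominal sizes: the 220 GB OS drive mirrored with a 1 TB (1000 GB) drive.
members = [220, 1000]
print(raid1_usable_gb(members))  # 220
print(raid1_unused_gb(members))  # 780
```

In other words, roughly 780 GB of the 1 TB SSD would sit idle inside such a mirror, on top of the E drive's data being gone.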

I knew I had to revert the setup to what it was before the problem: the smaller 220 GB drive as drive C and the two 1 TB drives as drives E and F, with or without the RAID. But first I had to investigate the 'Degraded' message in the BIOS: I had to make sure that none of the physical drives were failing or had already failed.

So I took out all four SSDs and connected them one by one to my Lenovo IdeaPad Flex laptop as external USB drives, using this Cable to Go USB adapter. The laptop recognized all the drives except one of the drives from the RAID array. It is safe to say that this drive is almost certainly 'Degraded' or 'dead' now. It was a SanDisk Plus drive with very high reviews on Amazon, yet it seemingly failed in just one year! I thought SSDs were supposed to be very resilient?

Having determined that one of my drives had apparently failed, I had to get rid of the RAID array. But in the Lenovo BIOS, according to the warnings shown right there, either removing a drive from a RAID or deleting a RAID drive would wipe all the data on the drive(s).

Pretty scary, huh? But I was confident that my weekly backups via EaseUS ToDo Backup would come to my rescue. So I did indeed delete the RAID configuration, first by removing the smaller drive from the RAID array in the BIOS. I was pleasantly surprised to see that the data on that drive wasn't erased. Hey, don't attempt this yourself without knowing what you are doing; I am not to be blamed if you lose any data! Frankly, I am not 100% sure how I got this all to work! After removing the drive from the RAID array, I deleted the RAID array and booted normally into Windows.

And this time too, against my expectations, the E drive still didn't show up in Windows. The physical disk was present as 'Disk 2' (for the E drive), but inside Disk Management it was in 'offline' mode because of a signature conflict with another drive. I searched online, found this very helpful post, and assigned a new disk ID using DiskPart. Then I used EaseUS Partition Manager (free version) to completely format that partition as a new E drive. Having finally restored the E drive, I used EaseUS ToDo Backup to restore the content of the E drive from the NAS location. Yay!

Now the system is back to where it was until yesterday morning. I am satisfied with the recovery results, but I am not happy about the SSD's failure; I have filed a Return Merchandise Authorization (RMA) with SanDisk to get a replacement drive. I am also not happy that, somehow, either Windows or the Lenovo BIOS decided to make the larger SSD part of the RAID array. And I am not happy that I could not use Windows' Disk Management utility to do the partition work. That may be because there was a 'Recovery Partition' between two 'Unallocated' partitions on Disk 2, but EaseUS Partition Manager got the job done fine.

Now I am eagerly waiting for my RMA to be approved, to get a replacement (hopefully brand-new) SSD, and to install it into a newly built RAID. Or should I just hope that the drive for the operating system won't ever fail? Please NEVER fail! 🙂

What are your thoughts about this post? Any idea what may have caused the RAID to include the next available drive? Please leave your comments/feedback.

Thank you,

Irfan
