Showing posts with label RAID. Show all posts

Wednesday, June 11, 2008

Today's boneheaded Solaris admin move

I converted the /var filesystem of a host I was installing to be a DiskSuite mirror, but forgot that I shouldn't attach the other side of the mirror until the filesystem was mounted through the metadevice.

Compounding the problem, I didn't restart the system until after the mirror had resynced and I'd done a bunch of other work. After the reboot came a flood of errors from fsck. Unsurprising, since everything I'd done to the system since I added the mirror had only been written to one half, but DiskSuite (Solaris Volume Manager, I guess, to give it its modern name) thought both sides of the mirror were good, and was randomly reading from the good side or the bad side …

What makes it even more stupid is that I know better.

Given that I'd just installed the system, it was quicker to just re-jumpstart …
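
For reference, the usual order for mirroring a filesystem that can't be unmounted looks roughly like the sketch below. The metadevice names (d10, d11, d12) and slices (c0t0d0s5, c0t1d0s5) are made up for illustration, and the script only prints the commands unless DRY_RUN=0 is set, so treat it as a reminder of the ordering rather than a tested procedure.

```shell
#!/bin/sh
# Sketch only: mirror /var with DiskSuite/SVM in the safe order.
# All device and metadevice names are hypothetical; substitute your own.
# With DRY_RUN=1 (the default) commands are printed instead of executed.

DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

mirror_var_stage1() {
    # Submirror on the slice that currently holds /var
    # (-f is needed because the filesystem is mounted).
    run metainit -f d11 1 1 c0t0d0s5
    # Submirror on the matching slice of the second disk.
    run metainit d12 1 1 c0t1d0s5
    # One-way mirror containing only the existing data.
    run metainit d10 -m d11
    # Next, by hand: edit /etc/vfstab so /var mounts from
    # /dev/md/dsk/d10 (raw device /dev/md/rdsk/d10), then reboot.
}

mirror_var_stage2() {
    # Only after /var is mounted through d10: attach the other side,
    # which starts the resync from the live data.
    run metattach d10 d12
}
```

Run mirror_var_stage1, fix up /etc/vfstab, reboot, confirm that /var is mounted from the metadevice, and only then run mirror_var_stage2. Attaching the second submirror before that point is exactly the mistake described above.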

Tuesday, January 29, 2008

Odd Solaris DiskSuite problem, and solution

One of the systems I admin had a failed disk that was in use by two DiskSuite RAID 5 volumes (IMO insane, given the performance hit, but not my decision). After the disk was replaced, any attempt to run DiskSuite programs such as 'metastat' gave the following error:

Assertion failed: mdrcp->colnamep->start_blk <= rcp->un_orig_devstart, file ../common/meta_raid.c, line 151
metastat: Abort
Abort (core dumped)

No documentation about this error was available anywhere, and a Google search found only 3 or 4 hits, none of them helpful (one of them involved using LD_PRELOAD to replace the abort() function to allow 'metaclear' to delete the RAID, then recreating it and reloading from backups).

I worked out why the error occurred, though. When the disk was replaced, it of course came with a label/TOC with partitions defined. If these partitions don't match the pre-existing RAID setup, the metadisk tools die a death.

All that was required was to build up the proper partitioning, and then everything worked fine.
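
One common way to rebuild the partitioning is to copy the label from a disk that already has the right layout, using prtvtoc and fmthard. The sketch below assumes that another disk in the system is partitioned identically to the failed one and that slice 2 covers the whole disk; neither is guaranteed, so check the layout first. The disk names in the usage note are hypothetical, and the function only prints the pipeline unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: copy the VTOC from a healthy disk to its replacement.
# Assumes both disks should have identical slices and that s2 is the
# whole-disk slice. With DRY_RUN=1 (the default) nothing is written.

DRY_RUN=${DRY_RUN:-1}

copy_vtoc() {
    # $1 = disk with the correct layout, $2 = replacement disk
    src="/dev/rdsk/${1}s2"
    dst="/dev/rdsk/${2}s2"
    if [ "$DRY_RUN" = "1" ]; then
        echo "prtvtoc $src | fmthard -s - $dst"
    else
        # prtvtoc prints the partition table; fmthard -s - reads it
        # from stdin and writes it to the target disk's label.
        prtvtoc "$src" | fmthard -s - "$dst"
    fi
}
```

For example, `copy_vtoc c1t0d0 c1t2d0` prints the pipeline it would run. If no identically partitioned disk exists, the slices have to be recreated by hand in format instead.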