I converted the /var filesystem of a host I was installing to be a DiskSuite mirror, but forgot that I shouldn't attach the other side of the mirror until the filesystem was mounted through the metadevice.
Compounding the problem, I didn't restart the system until after the mirror had resynced and I'd done a bunch of other work. After the reboot came a flood of errors from fsck. Unsurprising, since everything I'd done to the system since adding the mirror had been written to only one half, but DiskSuite (Solaris Volume Manager, I guess, to give it its modern name) thought both sides of the mirror were good, and was randomly reading from the good side or the bad side …
What makes it even more stupid is that I know better.
Given that I'd just installed the system, it was quicker to just re-jumpstart …
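For reference, here is a rough sketch of the order the steps should have gone in. The metadevice names (d10, d11, d12) and slices (c0t0d0s3, c0t1d0s3) are made-up placeholders, not the ones from this host, and the script only prints the commands unless DRY_RUN is set to something other than 1.

```shell
#!/bin/sh
# Sketch only: d10/d11/d12 and the cXtXdXsX slices are hypothetical placeholders.
# Default is a dry run that prints each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}

run() {
    echo "+ $*"
    if [ "$DRY_RUN" != 1 ]; then "$@"; fi
}

mirror_var_step1() {
    # Submirror on the slice that currently holds /var (-f because it is mounted).
    run metainit -f d11 1 1 c0t0d0s3
    # Submirror on the matching slice of the second disk.
    run metainit d12 1 1 c0t1d0s3
    # One-way mirror containing only the side with the live data.
    run metainit d10 -m d11
    echo "# next: point /var at /dev/md/dsk/d10 in /etc/vfstab and reboot"
}

mirror_var_step2() {
    # Run this only once /var is mounted through d10; the resync starts here.
    run metattach d10 d12
}
```

Call mirror_var_step1, edit /etc/vfstab and reboot, and only then call mirror_var_step2. Attaching the second side while /var is still mounted from the raw slice is exactly the mistake described above, because writes bypass the metadevice and reach only one half.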
Showing posts with label RAID. Show all posts
Wednesday, June 11, 2008
Tuesday, January 29, 2008
Odd Solaris DiskSuite problem, and solution
One of the systems I admin had a failed disk that was in use by two DiskSuite RAID 5 volumes (IMO insane, given the performance hit, but not my decision). After the disk was replaced, any attempt to run DiskSuite programs such as 'metastat' gave the following error:
Assertion failed: mdrcp->colnamep->start_blk <= rcp->un_orig_devstart, file ../common/meta_raid.c, line 151
metastat: Abort
Abort (core dumped)
There was no documentation about this error anywhere, and a Google search found only 3 or 4 hits, none of them helpful (one of them involved using LD_PRELOAD to replace the abort() function so that 'metaclear' could delete the RAID, then recreating it and reloading from backups).
I worked out why the error occurred, though. When the disk was replaced, it of course came with a label/TOC with partitions defined. If these partitions don't match the pre-existing RAID setup, the metadisk tools die a death.
All that was required was to repartition the replacement disk to match the layout the RAID expected, and then everything worked fine.
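One common way to do that, sketched here under the assumption that another disk of the same geometry still carries the slice layout the RAID expects (which may not hold on every system), is to copy its label across with prtvtoc and fmthard. The disk names below are hypothetical, and the function only prints the pipeline unless DRY_RUN is set to something other than 1.

```shell
#!/bin/sh
# Sketch only: the disk names passed in are hypothetical placeholders.
# Assumes the "good" disk has the same geometry and the slice layout the
# RAID expects; verify that before writing a label to the replacement.
DRY_RUN=${DRY_RUN:-1}

relabel_replacement() {
    good=$1   # surviving disk with the expected layout, e.g. c1t2d0
    new=$2    # freshly replaced disk, e.g. c1t3d0
    echo "+ prtvtoc /dev/rdsk/${good}s2 | fmthard -s - /dev/rdsk/${new}s2"
    if [ "$DRY_RUN" != 1 ]; then
        # Dump the VTOC of the good disk and write it to the new one.
        prtvtoc "/dev/rdsk/${good}s2" | fmthard -s - "/dev/rdsk/${new}s2"
    fi
}
```

For example, relabel_replacement c1t2d0 c1t3d0 prints the pipeline it would run. Once the label matches, 'metastat' should stop tripping the assertion and report the volume state again.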