Tuesday, January 29, 2008

Odd Solaris DiskSuite problem, and solution

One of the systems I admin had a failed disk that was in use by two DiskSuite RAID 5 volumes (IMO insane, given the performance hit, but not my decision). After the disk was replaced, any attempt to run DiskSuite programs such as 'metastat' gave the following error:

Assertion failed: mdrcp->colnamep->start_blk <= rcp->un_orig_devstart, file ../common/meta_raid.c, line 151
metastat: Abort
Abort (core dumped)

There was no documentation about this error anywhere, and a Google search found only 3 or 4 hits, none of them helpful. (One of them involved using LD_PRELOAD to replace the abort() function so that 'metaclear' could delete the RAID, then recreating it and reloading from backups.)

I worked out why the error occurred, though. When the disk was replaced, it of course came with a label/TOC with partitions defined. If these partitions don't match the pre-existing RAID setup, the metadisk tools die a death.

All that was required was to repartition the new disk so that its slices matched the layout the RAID volumes expected, and then everything worked fine.
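For anyone who wants to check for this kind of mismatch before touching anything, here is a rough sketch in Python. It assumes you have saved 'prtvtoc' output for a disk with the known-good layout and for the replacement disk, and it only compares the start sector and size of each slice. The function names and the sample tables are made up for illustration; they are not taken from the system described above.

```python
def parse_vtoc(text):
    """Return {slice_number: (first_sector, sector_count)} from prtvtoc-style text."""
    slices = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("*"):
            continue  # prtvtoc header and comment lines start with '*'
        fields = line.split()
        # Columns: partition, tag, flags, first sector, sector count, last sector
        slices[int(fields[0])] = (int(fields[3]), int(fields[4]))
    return slices


def vtoc_mismatches(expected, actual):
    """List (slice, expected, actual) for every slice that differs or is missing."""
    problems = []
    for num in sorted(set(expected) | set(actual)):
        if expected.get(num) != actual.get(num):
            problems.append((num, expected.get(num), actual.get(num)))
    return problems


# Hypothetical sample data: the layout the RAID expects ...
GOOD = """\
* Partition  Tag  Flags  First Sector  Sector Count  Last Sector
       0      4    00          0       2097152     2097151
       2      5    01          0      71127180    71127179
       7      0    00    2097152      69030028    71127179
"""

# ... and a made-up factory label on the replacement disk.
NEW = """\
* Partition  Tag  Flags  First Sector  Sector Count  Last Sector
       0      2    00          0        262144      262143
       1      3    01     262144        262144      524287
       2      5    01          0      71127180    71127179
       6      4    00     524288      70602892    71127179
"""

if __name__ == "__main__":
    for num, want, got in vtoc_mismatches(parse_vtoc(GOOD), parse_vtoc(NEW)):
        print(f"slice {num}: expected {want}, found {got}")
```

If the other disks in the RAID set share the same slice layout (an assumption, and not something to take for granted), the usual Solaris idiom for copying a label is to pipe 'prtvtoc' for a good disk's s2 device into 'fmthard -s -' for the new disk's s2 device. Double-check the device names first, since writing a label to the wrong disk is destructive.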

1 comment:

Anonymous said...

Thanks for posting this. You are one of 4 results in Google when you search for this error. I have just used your snippet of info regarding the TOC table to fix our server.
Sun and another engineer spent a day looking at this, and I simply googled and fixed the problem in under 5 minutes, making me look like a hero.

Thanks again.