Showing posts with label solaris. Show all posts

Friday, June 20, 2008

Finding the umask of a running process in Solaris…

…is inordinately difficult. This seemingly basic piece of information is not available through /proc, through dtrace, or in any other supported way. It can only be retrieved by crawling through the kernel's data structures, either with mdb(1) or, uglier still, with libkvm.

Chad Mynhier shows how to do it on his blog, and also points to a thread on comp.unix.solaris about the same topic that shows how to do it in C using libkvm.
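For reference, the mdb(1) route usually boils down to a one-line pipeline. The sketch below is my own assumption based on the stock Solaris 10 kernel layout, where the umask sits in the u_cmask field of the user structure embedded in each proc_t; it is not copied from Chad's post. The helper name umask_mdb_cmd is made up for illustration, and the helper only prints the mdb expression rather than running anything.

```shell
# Hypothetical helper: print the mdb expression that looks up a process's umask.
# Assumes the Solaris 10 kernel layout (proc_t -> p_user -> u_cmask).
umask_mdb_cmd() {
    # 0t marks the PID as decimal; ::pid2proc maps the PID to its proc_t address
    printf '0t%d::pid2proc | ::print proc_t p_user.u_cmask\n' "$1"
}

# Usage on a live Solaris system, as root (not executed here):
#   umask_mdb_cmd 1234 | mdb -k
```

mdb prints the field in its own default format, so check the radix before comparing it against the octal value that umask(1) would show.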

Wednesday, June 11, 2008

Today's boneheaded Solaris admin move

I converted the /var filesystem of a host I was installing to be a DiskSuite mirror, but forgot that I shouldn't attach the other side of the mirror until the filesystem was mounted through the metadevice.

Compounding the problem, I didn't restart the system until after the mirror had resynced and I'd done a bunch of other work. After the reboot came a flood of errors from fsck. Unsurprising, since everything I'd done to the system since adding the mirror had been written to only one half, but DiskSuite (Solaris Volume Manager, I guess, to give it its modern name) thought both halves were good, and was randomly reading from the good side or the bad side …

What makes it even more stupid is that I know better.

Given that I'd just installed the system, it was quicker to just re-jumpstart …
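For the record, the order that avoids this mess looks roughly like the sketch below. It is a dry run that only prints the commands, and the metadevice names (d10, d11, d12) and slice names are hypothetical; substitute your own. The point is simply that metattach comes last, after the filesystem is mounted through the mirror.

```shell
# Dry-run sketch: prints the commands in the safe order, executes nothing.
# Metadevice names d10/d11/d12 and the slices passed in are hypothetical.
mirror_plan() {
    primary=$1      # slice currently holding the filesystem
    secondary=$2    # slice that will become the second submirror
    echo "metainit -f d11 1 1 $primary"     # -f because the slice is mounted
    echo "metainit d12 1 1 $secondary"
    echo "metainit d10 -m d11"              # one-way mirror on the live submirror
    echo "# edit /etc/vfstab so /var mounts from /dev/md/dsk/d10"
    echo "# reboot, so /var is mounted through the metadevice"
    echo "metattach d10 d12"                # only now attach the other side
}

mirror_plan c0t0d0s3 c0t1d0s3
```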

Tuesday, January 29, 2008

Odd Solaris DiskSuite problem, and solution

One of the systems I admin had a failed disk that was in use by two DiskSuite RAID 5 volumes (IMO insane, given the performance hit, but not my decision). After the disk was replaced, any attempt to run DiskSuite programs such as 'metastat' gave the following error:

Assertion failed: mdrcp->colnamep->start_blk <= rcp->un_orig_devstart, file ../common/meta_raid.c, line 151
metastat: Abort
Abort (core dumped)

No documentation about this error was available anywhere, and a Google search found only 3 or 4 hits, none of them helpful (one of them involved using LD_PRELOAD to replace the abort() function so that 'metaclear' could delete the RAID, then recreating it and reloading from backups).

I worked out why the error occurred, though. When the disk was replaced, it of course came with a label/TOC with partitions defined. If these partitions don't match the pre-existing RAID setup, the metadisk tools die a death.

All that was required was to repartition the new disk to match the layout the RAID expected, and then everything worked fine.
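If another disk in the set happens to have the same geometry and the slice layout the replacement needs (an assumption; it will not hold for every RAID 5 setup), the usual shortcut is to copy the label across with prtvtoc and fmthard instead of rebuilding it by hand in format. The sketch below only prints the pipeline; the helper name and disk names are hypothetical.

```shell
# Dry-run sketch: prints the label-copy pipeline instead of running it.
# Disk names are hypothetical; s2 is the conventional whole-disk slice.
copy_vtoc_cmd() {
    good=$1   # surviving disk whose slice layout matches what the RAID expects
    new=$2    # freshly replaced disk
    echo "prtvtoc /dev/rdsk/${good}s2 | fmthard -s - /dev/rdsk/${new}s2"
}

copy_vtoc_cmd c1t0d0 c1t3d0
```

Compare the prtvtoc output of both disks afterwards before letting the metadisk tools touch the new one.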