Posted on 2008-9-17 20:49:50
16 Sep 2008 22:25:07 UTC
Another week, another database maintenance outage. This one was short but busy. We actually had major upgrade plans for one server, but we feared they would take all day and lock out the servers, so we postponed them until next week, which may be less stressful.
Eric cleared a bunch of space on the workunit storage, so that bottleneck has been alleviated for now, i.e., we have elbow room to create enough workunits to keep up with demand. However, this leads us to the first of two mysteries today. You see, he's moving all the beta workunits to our new homemade NAS box (ptolemy). While this move has already been helpful, it's taking forever to complete. Why are the disks pegged at 100% utilization? Lack of spindles? PCI bus traffic? Old/slow controller cards? RAID5 biting us again? We'll either sort that out or eventually give up on this machine as anything more than archival storage.
The other mystery has been a known issue for some time, but with the down time we revisited the problem: our secondary science database server, bambi, works great except for the fact that upon reboot there's a random chance one or two (or three) drives simply don't show up on the 3ware controller, causing all kinds of RAID panics/rebuilds. It's never clear why this happens, or when it will happen, and when it does it's not always the same drives that disappear.
However, a full power cycle always works. The only real difference is that the drives have to spin up on a power cycle, but not on a reboot. So we've been assuming there are some spin-up settings that need to be tweaked. There's been talk of making bambi the primary database server, so today we looked for those settings. We couldn't find them: nothing in the regular motherboard BIOS, and nothing useful in the 3ware BIOS. The latter was moot anyway, because according to the 3ware BIOS the drives had already disappeared, so whatever goes wrong during spin-up happens before the 3ware controller is even aware of the drives. I can't find anything about this in any documentation or on the web. It's not a showstopper, since we can still use bambi as the backup that it is, but this pretty much means we'll never be able to fully trust bambi as a "main" server.
Oh yeah... other stuff. The mysql replica croaked this morning just before we arrived because a partition on the server filled up. Apparently when upgrading the OS we missed a symlink somewhere. So the replica is resyncing yet again. We've also been messing around getting the CUDA development/testing server up and running.
- Matt