Bcfg2 Now in the Gentoo Tree

Bcfg2 has finally hit the official Gentoo tree. For those who don’t know, it’s a configuration management system akin to Puppet or CFEngine. However, I think Bcfg2 has many advantages over these, the biggest being an active development and user community and great support in #bcfg2 on Freenode.net. Personally, I also find its configuration definitions far more understandable than those of either of the other two programs. It’s written in Python and has a nicely architected plugin infrastructure.

The package management plugin for Gentoo is still a little weak compared to the Yum or Deb plugins, but it works great otherwise. I encourage Gentoo users who manage any number of systems to install it and give it a try. Let’s find and squash some bugs and improve this great program. Emerge app-admin/bcfg2 to get started.

If you’re on an RPM- or Debian-based distro, I also encourage you to give this program a try; it’s quite powerful and very fully featured.

Posted in Linux, Sysadmin

Authenticating Windows against Open Directory

First of all, apologies to everyone for the long time between posts; I’ve been suffering from a slight shortage of inspiration lately.

However, today I figured out something quite cool. It is possible to authenticate Windows (2000, XP, and possibly Vista) machines against Apple’s Open Directory. This is great if you use an Open Directory server as your central user account store.

The software that enables this is called pGina.

To get started, simply download and install pGina. Then download the additional plugins; the one we’re interested in is the ldapauth plugin. Install the plugin somewhere inside your pGina installation, e.g. c:\pGina\plugins

Now launch the pGina configuration utility, and in the “Plugin” tab browse to the ldapauth_plus.dll plugin in c:\pGina\plugins\ldapauth\. Click the “Configure” button. Ensure the “LDAP Method” is set to “Search Mode”. In the “LDAP Server” field, enter the DNS name or IP address of your Open Directory server, and leave the port at the default of 389. You can leave the rest of the fields blank. Then, under contexts, add cn=Users,dc=company,dc=com, where the last two components are your base DN. This will depend on your site’s configuration; if you’re unsure, I recommend using a tool like Apache Directory Studio to examine your LDAP server.

Finally, go to the “Password Configuration” tab and check the “Disable Change Password” box. A password changed through pGina would only be updated on the LDAP server, which may leave other things on the system out of sync, such as the user’s Kerberos and keychain passwords.
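To make the “Search Mode” settings a little more concrete, here is a minimal Python sketch of how the context string relates to your domain, and what a search-mode lookup filter might look like. It assumes the plugin looks users up by Open Directory’s `uid` attribute (the short name); I haven’t confirmed pGina’s exact filter, and the function names are hypothetical.

```python
def base_dn_from_domain(domain: str) -> str:
    """Turn a DNS-style domain such as 'company.com' into 'dc=company,dc=com'."""
    return ",".join(f"dc={label}" for label in domain.split(".") if label)


def users_context(domain: str) -> str:
    """The context to enter in the plugin: the Users container under the base DN."""
    return f"cn=Users,{base_dn_from_domain(domain)}"


def escape_filter_value(value: str) -> str:
    """Escape the characters RFC 4515 reserves inside LDAP filter values."""
    reserved = {"*", "(", ")", "\\", "\0"}
    return "".join(f"\\{ord(ch):02x}" if ch in reserved else ch for ch in value)


def user_search_filter(username: str) -> str:
    """Filter a search-mode lookup might use (assumption: matching on uid)."""
    return f"(uid={escape_filter_value(username)})"


if __name__ == "__main__":
    print(users_context("company.com"))
    print(user_search_filter("jsmith"))
```

In a search-then-bind setup, the plugin would then presumably bind as whatever DN that search returns, using the password typed at the login window.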

Unfortunately, I don’t think it’s possible to use the groups features in the “User Configuration” tab of this plugin, as I can’t find a way to make it look up group membership in the `cn=Groups` container. Perhaps I’ll try hacking this in someday if we ever need it here.
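For anyone who wants to attempt that hack, the lookup it would need is roughly the following. This is a hypothetical Python sketch, not part of pGina: it assumes Open Directory groups are `posixGroup` entries under `cn=Groups` that list their members’ short names in `memberUid`, and `groups_for_user` simply performs the same match offline on a dictionary so the logic can be checked without a server.

```python
def escape_filter_value(value: str) -> str:
    """Escape the characters RFC 4515 reserves inside LDAP filter values."""
    reserved = {"*", "(", ")", "\\", "\0"}
    return "".join(f"\\{ord(ch):02x}" if ch in reserved else ch for ch in value)


def groups_context(base_dn: str) -> str:
    """Search base for groups, e.g. 'cn=Groups,dc=company,dc=com'."""
    return f"cn=Groups,{base_dn}"


def group_membership_filter(username: str) -> str:
    """Filter matching every group that lists `username` as a memberUid."""
    return f"(&(objectClass=posixGroup)(memberUid={escape_filter_value(username)}))"


def groups_for_user(username: str, groups: dict) -> list:
    """Offline equivalent of that search over {group name: [memberUid, ...]}."""
    return sorted(name for name, members in groups.items() if username in members)
```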

Now that you have configured the plugin, you can configure the rest of pGina as you see fit.

The one caveat to using pGina is that if you’re using it for the purpose of sharing files out over CIFS/SMB or remote desktop, the users will need to log in to the machine locally first, unless the share is readable by “Everyone”. This is because pGina only handles the authentication portion of things at the login window or when connecting remotely. When you are setting permissions on a share, Windows will not be able to look up the users from LDAP, so they will only be available if their account exists on the machine from a previous login.

I hope this has been of use. If you have any further tips then please share them in the comments section.

Posted in Sysadmin

An exchange analogy

A funny analogy posted on Slashdot.

If the same method that exchange/outlook uses to store email were used in the real world as a paper filing system: Every document is translated into Greek, and the original is burned. Then they are all glued together into one solid block and stuffed into a magic box with a tiny slot, through which you can talk to a little gnome who somehow gets each message for you as needed. Sometimes the gnome gets confused and it takes hours (sometimes days) for him to sort things out; meanwhile he can’t find your documents until he is totally finished becoming unconfused again. As an added bonus the gnome costs several thousand dollars and when he dies every few years you need to buy a new gnome. Oh and if the first box gets (arbitrarily) full you have to buy another special gnomebox, which of course costs $$$

Disclaimer: I don’t think Exchange is all bad, but I think it could be much better and certainly more open.

Posted in Humour, Sysadmin

HOWTO: Make a Nagios Dashboard widget in 50 seconds.

Do you use Nagios to monitor your network? Do you use a Mac? Do you think it would be handy to have a Dashboard widget showing your Nagios overview panel? Well, it’s quite easy to do. Check out this screencast where I make a Nagios Dashboard widget using Safari’s new web clips feature.

I apologize for the lack of audio; my Mac mini at home doesn’t have a microphone. However, I think the images show you all you need to know.

You can also view the full-size video (approximately 8 MB).

Posted in Sysadmin

Why Gentoo?

Prompted by the editor of DistroWatch.com, Gentoo developer Ben de Groot recently appealed to the Gentoo community to post their reasons for using Gentoo. As a big Gentoo fan and long-time user, I feel inclined to respond.

I am currently responsible for maintaining an HPC cluster of 76 nodes, with roughly 360 cores. The nodes all boot off an NFS-mounted Gentoo image that we have heavily customized. In addition, we have roughly 10 developer machines which use a different Gentoo image. To support this infrastructure we also have a number of servers, which perform all sorts of tasks: serving files via NFS, PBS scheduling, netbooting, hosting Xen virtual machines, etc. All of these are also Gentoo. In total we have over 100 machines running Gentoo.

The whole setup is surprisingly easy to manage. I have set up a local Portage mirror and a local overlay for our own packages. Classes of machines use the same image, so bringing up a new developer box is just an rsync away. A new cluster node can be added just by plugging a machine into the network and netbooting it. Xen domUs have a template from which they are cloned. For servers, there’s a bootstrap script which builds the machine up with a minimum of input at the start of the process. We don’t yet have a binary package host, but it’s something I’m looking at adding.

Some reasons why I love using Gentoo:

  • New versions of packages are quickly available. Some as quickly as the day of the release! For example, the recent OpenSSH 5.0 already has ebuilds.
  • System configuration is not hidden or managed by a restrictive GUI. I’m a configuration file kind of guy, and the way system configuration is handled in Gentoo is very nice for that.
  • You install only exactly what you need. There isn’t really a “default” set of packages, other than the minimum to get the system up and running. This is great for things like our cluster node image.
  • It’s easy to create your own ebuilds. An ebuild for simpler things can take as little as 5 minutes to write, and lets you manage deployment across the network. We also maintain a number of software packages in our SVN repository. With the Subversion eclass, updating an ebuild to pull from a newer release in our repository can be as easy as renaming the file. This is nice.
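As an illustration of the rename trick, here is a small Python sketch. It assumes the ebuild builds its ESVN_REPO_URI from ${PV} (for example, pointing at a tag of the same name), so the version in the filename is the only thing that needs to change. The helper names are hypothetical, and the regex is a simplification of Gentoo’s real version syntax.

```python
import re
from pathlib import Path

# Simplified: package name, "-", a version starting with a digit, then an
# optional "-rN" revision. Real Gentoo version rules are considerably richer.
EBUILD_RE = re.compile(r"^(?P<pn>.+?)-(?P<pv>\d[^-]*)(?:-r(?P<rev>\d+))?\.ebuild$")


def split_ebuild_name(filename: str) -> tuple:
    """Return (package name, version) parsed from an ebuild filename."""
    match = EBUILD_RE.match(filename)
    if match is None:
        raise ValueError(f"not an ebuild filename: {filename!r}")
    return match.group("pn"), match.group("pv")


def bumped_name(filename: str, new_version: str) -> str:
    """Filename for the same package at new_version (any -rN revision is dropped)."""
    package, _ = split_ebuild_name(filename)
    return f"{package}-{new_version}.ebuild"


def bump_ebuild(path, new_version: str) -> Path:
    """Rename the ebuild in place and return its new path."""
    path = Path(path)
    return path.rename(path.with_name(bumped_name(path.name, new_version)))
```

After renaming, the package’s Manifest still needs regenerating, e.g. with `ebuild mytool-1.3.ebuild manifest`.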

That’s all for now. If you have any questions about our setup, please feel free to contact me and I’ll be glad to help you out.

Posted in Sysadmin

Leopard AFP: Not production ready

The other day we downgraded our Leopard file server back to Tiger. Luckily we had a spare XServe available for this purpose. For a detailed description of the problem, see my previous post. I can offer this additional information:

  • The problem is triggered by authentication attempts. We set up a script to check the AFP server every 20 minutes to see whether it was down. Often the DirectoryService crash log timestamp corresponded almost exactly to the time our script attempted its test.
  • Sending a HUP signal, or toggling an AFP option like EnableGuestAccess (which does the same thing, I think), allows people to authenticate AFP connections again. At least until the next time it crashes.
  • Eventually the server comes down hard. We managed to keep it up with our monitoring tool and periodic HUPs, but at some point it seems to give up and die completely. This requires a complete restart of AFP and the loss of all client connections. It really sucks for home directories, and it can corrupt files as well (several other users and I seem to have lost some preference files that were in use at the time).
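Matching crash timestamps against probe times by eye gets tedious, so here is a hypothetical Python sketch of that comparison. It assumes you’ve already collected the probe times from the monitoring script and the crash times from the DirectoryService crash logs; the 30-second window is an arbitrary choice.

```python
from datetime import datetime, timedelta


def probe_schedule(start, end, every=timedelta(minutes=20)):
    """Times at which a monitor running every `every` would have probed AFP."""
    times = []
    current = start
    while current <= end:
        times.append(current)
        current += every
    return times


def crashes_near_probes(probe_times, crash_times, window=timedelta(seconds=30)):
    """Pair each crash with its nearest probe, keeping pairs within `window`."""
    pairs = []
    for crash in sorted(crash_times):
        nearest = min(probe_times, key=lambda probe: abs(crash - probe), default=None)
        if nearest is not None and abs(crash - nearest) <= window:
            pairs.append((crash, nearest))
    return pairs


if __name__ == "__main__":
    start = datetime(2008, 4, 14, 9, 0)
    probes = probe_schedule(start, start + timedelta(hours=1))
    crashes = [start + timedelta(minutes=20, seconds=4)]
    print(crashes_near_probes(probes, crashes))
```

If most crashes land in the result, that supports the idea that the authentication attempt itself is the trigger.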

Since we’re testing a new laptop image and causing lots of AFP connections, we were getting more than one AFP/DirectoryService crash per hour. Unacceptable. There is definitely something wrong with the link between the two.

In the end, we were forced to call a network downtime for the end of the day and rebuilt our XServe with a Tiger image. Today was our first day running that setup, and it was solid as a rock. No slowness, no crashing.

In short, Leopard Server is not ready, at least not for serving AFP. Keep waiting.

Posted in Sysadmin

AFP + Directory Services on Leopard = Disaster

All it takes is a cursory search on some Apple discussion boards or mailing lists for “AFP crash” or “DirectoryService crash” to turn up a load of discussions on the topic.

The summary of the problem is basically this: The DirectoryService process crashes for some reason, then gets restarted by launchd. However, AFP (or more specifically, the AppleFileServer process) appears to not regain its connection to it. This prevents any new AFP connections from being able to authenticate, and existing ones are unable to re-authenticate. Couple this with AFP mounted home directories, and now your users can’t log in to their workstations, or their existing session hangs.

In said discussions there are dozens of proposed workarounds. These include: Periodically HUP’ing the AppleFileServer process, setting up some crazy firewall rules, periodically toggling guest access, and numerous other things. I personally have tried many of them and can confidently say that none of them are a good solution. The toggling seems to mitigate the problem to some degree, but eventually things still come down hard.
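For reference, the periodic-HUP workaround boils down to something like this Python sketch, meant to run as root from cron or launchd. It finds the AppleFileServer process with plain `ps` (which should exist everywhere, unlike `pgrep`) and sends it SIGHUP. It is written for a modern Python 3, and like the other workarounds it only delays the problem.

```python
import os
import signal
import subprocess


def pids_from_ps(ps_output, process_name):
    """Pick PIDs out of `ps -A -o pid= -o comm=` output whose command matches."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.strip().split(None, 1)
        # On OS X, comm is usually the full path, so compare the basename
        if len(fields) == 2 and os.path.basename(fields[1]) == process_name:
            pids.append(int(fields[0]))
    return pids


def pids_of(process_name):
    """List running PIDs for process_name using ps."""
    result = subprocess.run(
        ["ps", "-A", "-o", "pid=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return pids_from_ps(result.stdout, process_name)


def send_hup(pids):
    """Send SIGHUP to each PID, skipping any that have already exited."""
    sent = []
    for pid in pids:
        try:
            os.kill(pid, signal.SIGHUP)
            sent.append(pid)
        except ProcessLookupError:
            pass
    return sent


if __name__ == "__main__":
    print("HUPed:", send_hup(pids_of("AppleFileServer")))
```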

One fix that appeared promising, which we tried recently, is not running Open Directory (the network variety) on the same host as AFP. Fortunately, we had a second XServe acting as an OD replica and not much else, so we demoted it to a server that is merely connected to OD, and moved our AFP home share there. This seemed to work fine for at least a day, but then this weekend the DirectoryService process crashed yet again, causing the same problem as before.

The thing that really blows my mind about this whole issue is that people have been reporting it since November of last year. That’s 5 whole months, and still no sign of a fix from Apple! Say what you will about other companies being slow to respond to problems, I’ve never seen a major issue like this take so long to be fixed by anyone else.

With OS X 10.5.3 being seeded to developers in the last few days, I hope Apple finally gets on the ball and fixes this glaring problem! This is definitely one of the most frustrating problems I’ve encountered during my time in the computing industry.

Posted in Sysadmin

Mounting an HFS+ volume from Linux

Yesterday I spent some time figuring out how to mount a snapshot of one of our Mac server’s HFS+ volumes from our iSCSI SAN. There don’t appear to be any clear instructions for using an HFS+ formatted device in Linux, but I finally figured it out.

You need the following kernel options:

CONFIG_HFS_FS=m
CONFIG_HFSPLUS_FS=m
CONFIG_MAC_PARTITION=y
CONFIG_EFI_PARTITION=y

(of course you can always put y where I put m, but I prefer to use modules).

Most how-tos online omit the CONFIG_EFI_PARTITION option, but it’s required for the kernel to recognize partitions on devices using the GUID partition scheme. You can tell whether your device uses it by looking at it in Disk Utility on a Mac.
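To check whether an existing kernel already has these options, a small Python sketch like this can help. It reads `/proc/config.gz`, which only exists if the kernel was built with `CONFIG_IKCONFIG_PROC`; otherwise, feed it the `.config` from your kernel source tree. The function names are my own.

```python
import gzip
import re

REQUIRED = (
    "CONFIG_HFS_FS",
    "CONFIG_HFSPLUS_FS",
    "CONFIG_MAC_PARTITION",
    "CONFIG_EFI_PARTITION",
)
_OPTION_RE = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")


def parse_kconfig(text):
    """Map each set option to its value; '# ... is not set' lines are ignored."""
    options = {}
    for line in text.splitlines():
        match = _OPTION_RE.match(line.strip())
        if match:
            options[match.group(1)] = match.group(2)
    return options


def missing_options(text, required=REQUIRED):
    """Required options that are neither built in (y) nor modular (m)."""
    options = parse_kconfig(text)
    return [name for name in required if options.get(name) not in ("y", "m")]


def read_running_config(path="/proc/config.gz"):
    """Read the running kernel's config (needs CONFIG_IKCONFIG_PROC)."""
    with gzip.open(path, "rt") as handle:
        return handle.read()


if __name__ == "__main__":
    print(missing_options(read_running_config()) or "all HFS+ options present")
```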

Now when you connect to a block device which contains an HFS+ volume, it will show up with several partitions. You can inspect them with parted. Typically the first one will be some kind of special EFI partition, and the second one will be the partition that contains the actual data.

You should now be able to mount it with something like:
# mount -t hfsplus /dev/sdc2 /mnt/tmp

However, it appears that if the filesystem is journaled, you can only read from the volume, not write to it, even if it is mounted rw. Also keep in mind that ACLs and resource forks don’t appear to be handled in any way at the moment…

Update:
The partition numbering above is not always correct. After further experimentation, it appears to vary: when two or three partitions appear, the data is on the second one; when four appear, it’s on the third.
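The rule of thumb in this update can be written down as a short Python sketch that reads `/proc/partitions` and suggests which device to mount. It only encodes the cases I observed (two, three, or four partitions), assumes `sdX`-style device names, and is no substitute for checking with parted first; the function names are hypothetical.

```python
import re


def partitions_of(proc_partitions_text, disk):
    """Sorted partition numbers of `disk` (e.g. 'sdc') from /proc/partitions text."""
    pattern = re.compile(rf"^{re.escape(disk)}(\d+)$")
    numbers = []
    for line in proc_partitions_text.splitlines():
        fields = line.split()
        # Data lines are: major minor #blocks name
        if len(fields) == 4:
            match = pattern.match(fields[3])
            if match:
                numbers.append(int(match.group(1)))
    return sorted(numbers)


def guess_data_partition(disk, numbers):
    """Observed heuristic: 2 or 3 partitions -> the 2nd, 4 partitions -> the 3rd."""
    if len(numbers) in (2, 3):
        index = 1
    elif len(numbers) == 4:
        index = 2
    else:
        raise ValueError(f"no rule for {len(numbers)} partitions on {disk}")
    return f"/dev/{disk}{numbers[index]}"


if __name__ == "__main__":
    with open("/proc/partitions") as handle:
        numbers = partitions_of(handle.read(), "sdc")
    print(guess_data_partition("sdc", numbers))
```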

Posted in Sysadmin

A new focus

Since returning from Japan I’ve been quite busy and not posting many personal posts on this blog. Frankly I don’t have the time or inclination, and since I can see most of the people who would care about it on a day to day basis, there isn’t really much point.

I figure it’s time to take this blog to a different level and change the focus a little. I will no longer be posting random personal things here, but will instead focus on writing concise and informative articles. The topics will include most of the things I deal with every day: modern Unix (Linux and OS X) system administration, virtualization, clustering, and high performance computing.

There are lots of exciting new technologies out there, and I am fortunate enough to have the opportunity to work and experiment with many of them. I hope that I can use this blog as an outlet to share many of them with the internet and computing community.

Posted in Meta

Forcing a zone retransfer in Bind on OS X Leopard Server

Recently we started migrating to OS X 10.5. Of course the process is fraught with many challenges, and lots of things are done differently.

For example, now views are used by default in DNS. This makes some things a bit more convoluted, but it may make other things easier in the future.

If you’ve ever administered a BIND install, you may know about the rndc tool. It can do all sorts of things without having to restart BIND (downtime is bad, mmkay?). However, its syntax is not always clear. For example, when using views, how do you retransfer a zone?

It goes something like this:

~# rndc retransfer myzone.mydomain.com IN myView

If you’re using Leopard server, you probably have the default view name, so this becomes:

~# rndc retransfer myzone.mydomain.com IN com.apple.ServerAdmin.DNS.public
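rndc’s usage for this command is `retransfer zone [class [view]]`, where zone is the zone’s name, so the class has to be given whenever you name a view. If you script retransfers, a tiny Python wrapper keeps the argument order straight. This is a sketch; the helper names are my own, and the default view name is the Leopard Server one mentioned above.

```python
import subprocess

# Default view name used by Leopard Server's DNS configuration
LEOPARD_DEFAULT_VIEW = "com.apple.ServerAdmin.DNS.public"


def retransfer_command(zone, view=LEOPARD_DEFAULT_VIEW, zone_class="IN"):
    """Build the rndc argument list: retransfer <zone> <class> <view>."""
    return ["rndc", "retransfer", zone, zone_class, view]


def retransfer(zone, view=LEOPARD_DEFAULT_VIEW):
    """Run rndc; raises CalledProcessError if rndc reports a failure."""
    subprocess.run(retransfer_command(zone, view), check=True)
```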


Speaking of Server Admin, it was clearly a very rushed application. The DNS portion is particularly horrific… you can’t even enable transfers for a reverse zone!

Posted in Sysadmin