Fast, Easy, Cheap: Pick One

Just some other blog about computers and programming

Bcfg2 0.9.6pre3 Released

The 3rd prerelease of Bcfg2 0.9.6 is now available.

For those not in the know, Bcfg2 is a system that:

helps system administrators produce a consistent, reproducible, and verifiable description of their environment, and offers visualization and reporting tools to aid in day-to-day administrative tasks.

Basically it comes down to managing your system configurations from a central location and then pulling (or optionally, pushing) the configuration data down to each machine. This ensures your machines are in a known state, and eliminates the need to go around to each one and manually verify or copy configuration.

Other tools in this category include Puppet and CFEngine, but IMO Bcfg2 trumps both of them.

Check it out. If you have any questions, feel free to come to #bcfg2 on irc.freenode.net and someone can surely help you out.

Connecting to a Cisco IPSEC VPN From Linux – Without the Cisco Client

So, let’s say your workplace uses a Cisco IPSEC VPN solution. Many places do. Let’s also say you at home have a Linux machine. Being the good Linux user that you are, you keep your system well patched and run a recent kernel release.

You download the Cisco VPN client from your corporate website since, of course, Cisco would never make such a thing publicly downloadable. Who does that anyway?

You extract the tarball, run the vpn_install script as instructed and BAM. The whole thing bombs! Why? Because your system is too cutting edge for the guys at Cisco to keep up (clearly!). So, your possible solutions are:

  1. Dig through a bunch of random internet forums, searching for the right combination of patches and command incantations that will make the damn thing work on your particular OS and kernel version.
  2. Ditch the piece of junk altogether and install something nicer.

So which should we do? Alright.. let’s go with option 1… just kidding, I mean 2.

Enter a wonderful piece of software called vpnc. Now, I’ll be the first to admit I don’t know much about how this particular piece of software works. And that’s the great thing. Getting the VPN connection up and going was just that simple. So here’s how:

  1. I presume your company uses a PCF file alongside their Cisco VPN client. If not, you have to figure out how to enter the settings yourself. Download this .pcf file and put it somewhere. Say ~/mycompany.pcf
  2. Download http://svn.unix-ag.uni-kl.de/vpnc/trunk/pcf2vpnc and make it executable with chmod +x pcf2vpnc
  3. Install vpnc. If you use Ubuntu, this means aptitude install vpnc. Yes, that is all.
  4. Run ./pcf2vpnc mycompany.pcf mycompany.conf
  5. sudo cp mycompany.conf /etc/vpnc/
  6. sudo vpnc mycompany
  7. There is no step 7!
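
If you don't have a PCF file to convert, the config can also be written by hand. Here is a minimal sketch, assuming your VPN uses plain group authentication. The function name write_vpnc_conf and every value in the example are made up for illustration; the four keywords are the usual vpnc config settings. Since no user password is written to the file, vpnc should prompt for it when you connect.

```shell
# write_vpnc_conf is a hypothetical helper; it prints a minimal vpnc config
# to stdout. Arguments: gateway host, group name, group password, username.
write_vpnc_conf() {
    printf 'IPSec gateway %s\n' "$1"
    printf 'IPSec ID %s\n' "$2"
    printf 'IPSec secret %s\n' "$3"
    printf 'Xauth username %s\n' "$4"
}

# Example with placeholder values; the result goes in /etc/vpnc/ as before:
# write_vpnc_conf vpn.example.com mygroup 'group-password' jdoe > mycompany.conf
```

The group password ends up in plaintext here, so keep the file readable by root only.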

Oh yeah, at some point you want to disconnect and go do something other than work. For that use sudo vpnc-disconnect.

I tested this on Hardy Heron; results may vary between distributions.

When running pcf2vpnc you may receive the following message:

Can't exec "cisco-decrypt": No such file or directory at ./pcf2vpnc line 30.
cisco-decrypt not in search path, adding passwords in obfuscated form

This just means that your vpn configuration will contain your password in obfuscated form instead of plaintext. It does not mean the conversion failed.

Update 2009/02/20: Someone has posted a howto which can work for OS X as well: http://www.gdanko.net/vpnc.html

Update 2009/06/15:

If you receive an error message such as

vpnc: no response from target

you need to add the line

NAT Traversal Mode cisco-udp

to your mycompany.conf file.
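
If you'd rather not edit the file by hand, something like the following should do it. This is just a sketch: ensure_nat_traversal is a name I made up, and it assumes the config has no NAT traversal setting you want to keep.

```shell
# ensure_nat_traversal is a hypothetical helper: it appends the
# "NAT Traversal Mode cisco-udp" line to a vpnc config file, but only if the
# file has no "NAT Traversal Mode" setting yet.
ensure_nat_traversal() {
    conf="$1"
    if ! grep -q '^NAT Traversal Mode' "$conf"; then
        # Add a newline first if the last line of the file is unterminated.
        if [ -n "$(tail -c 1 "$conf")" ]; then
            printf '\n' >> "$conf"
        fi
        printf 'NAT Traversal Mode cisco-udp\n' >> "$conf"
    fi
}

# Example (needs root for files under /etc/vpnc):
# ensure_nat_traversal /etc/vpnc/mycompany.conf
```

Running it twice is harmless, since the second call finds the line and does nothing.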

Bcfg2 Now in the Gentoo Tree

Bcfg2 has finally hit the official Gentoo tree. For those who don’t know, it’s a configuration management system akin to Puppet or CFEngine. However, I think Bcfg2 has many advantages over these, the biggest of which is an active development and user community and great support on #bcfg2 on Freenode.net. Personally, I also find the configuration definitions to be far more understandable than those of either of the other two programs. It’s written in Python and has a nicely architected plugin infrastructure.

The package management plugin for Gentoo is still a little weak compared to the Yum or Deb plugins, but it works great otherwise. I encourage Gentoo users who manage any number of systems to install it and give it a try. Let’s find and squash some bugs and improve this great program. Emerge app-admin/bcfg2 to get started.

If you’re on an RPM- or Debian-based distro, I also encourage you to give this program a try. It’s quite powerful and very fully featured.

Authenticating Windows Against Open Directory

First of all, apologies to everyone for the long time between posts, I’ve been suffering from a slight shortage of inspiration lately.

However, today I figured out something quite cool. It is possible to authenticate Windows (2000, XP, and possibly Vista) machines against Apple’s Open Directory. This is great if you have an Open Directory server as your user account central store.

The software that enables this is called pGina.

To get started, simply download and install pGina. Then download the additional plugins. The one we’re interested in installing is the ldapauth plugin. Install the plugin somewhere inside your pGina installation, e.g. c:\pGina\plugins

Now launch the configuration utility for pGina and set up the plugin:

  1. In the “Plugin” tab, browse to the ldapauth_plus.dll plugin in c:\pGina\plugins\ldapauth\ and click the “Configure” button.
  2. Ensure the “LDAP Method” is set to “Search Mode”.
  3. In the “LDAP Server” field, enter the DNS name or IP of your Open Directory server. Leave the port at the default 389. You can leave the rest of the fields blank.
  4. In contexts, add cn=Users,dc=company,dc=com where the last two segments are your base DN. This will depend on your site’s configuration. If you’re unsure, I recommend using a tool like Apache Directory Studio to examine your LDAP server.
  5. Finally, go to the “Password Configuration” tab and check the “Disable Change Password” box. If a user changes their password only on their LDAP server, it may mess up other things on the system such as their Kerberos and keychain passwords.
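
If you have a Linux or Mac box handy, you can also work out and sanity-check the context string from a shell before typing it into pGina. A small sketch follows, assuming your base DN simply mirrors a DNS domain (on some Open Directory setups it is derived from the server's full hostname instead, so check yours). The helper names, od.company.com and jdoe are placeholders of mine.

```shell
# domain_to_base_dn is a hypothetical helper: it turns a DNS domain such as
# company.com into the matching LDAP base DN, dc=company,dc=com.
domain_to_base_dn() {
    printf '%s\n' "$1" | sed -e 's/\./,dc=/g' -e 's/^/dc=/'
}

# users_context prints the context string to enter in the plugin settings.
users_context() {
    printf 'cn=Users,%s\n' "$(domain_to_base_dn "$1")"
}

# Optional check with the OpenLDAP client tools (anonymous simple bind):
# ldapsearch -x -H ldap://od.company.com -b "$(users_context company.com)" "(uid=jdoe)"
```

If the ldapsearch call returns the user's record, the same server name and context should work in the plugin.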

Unfortunately, I don’t think it’s possible to use the groups features in the “User Configuration” tab of this plugin as I can’t find a way to make it look up group membership in the cn=Groups container. Perhaps I’ll try hacking on this some day if we ever need to use it here.

Now that you have configured the plugin, you can configure the rest of pGina as you see fit.

The one caveat to using pGina is that if you’re using it for the purpose of sharing files out over CIFS/SMB or remote desktop, the users will need to log in to the machine locally first, unless the share is readable by “Everyone”. This is because pGina only handles the authentication portion of things at the login window or when connecting remotely. When you are setting permissions on a share, Windows will not be able to look up the users from LDAP, so they will only be available if their account exists on the machine from a previous login.

I hope this has been of use. If you have any further tips then please share them in the comments section.

An Exchange Analogy

A funny analogy posted on Slashdot.

If the same method that exchange/outlook uses to store email were used in the real world as a paper filing system: Every document is translated into Greek, and the original is burned. Then they are all glued together into one solid block and stuffed into a magic box with a tiny slot, through which you can talk to a little gnome who somehow gets each message for you as needed. Sometimes the gnome gets confused and it takes hours (sometimes days) for him to sort things out; meanwhile he can’t find your documents until he is totally finished becoming unconfused again. As an added bonus the gnome costs several thousand dollars and when he dies every few years you need to buy a new gnome. Oh and if the first box gets (arbitrarily) full you have to buy another special gnomebox, which of course costs $$$

Disclaimer: I don’t think Exchange is all bad, but I think it could be much better and certainly more open.

HOWTO: Make a Nagios Dashboard Widget in 50 Seconds.

Do you use Nagios to monitor your network? Do you use a Mac? Do you think it would be handy to have a Dashboard widget showing your Nagios overview panel? Well, it’s quite easy to do. Check out this screencast where I make a Nagios Dashboard widget using Safari’s new web clips feature.

I apologize for the lack of audio; my mini at home doesn’t have a microphone. However, I think the images show you all you need to know.

You can also view the full size video (approximately 8 MB)

Why Gentoo?

Prompted by the editor of DistroWatch.com, Gentoo developer Ben de Groot recently appealed to the Gentoo community to post their reasons for using Gentoo. As a big Gentoo fan and long-time user, I feel inclined to respond.

I am currently responsible for maintaining an HPC cluster of 76 nodes, at roughly 360 cores. The nodes all boot off of an NFS-mounted Gentoo image that we have heavily customized. In addition, we have roughly 10 developer machines which use a different Gentoo image. To support this infrastructure we also have a number of servers, which perform all sorts of tasks: Serving files via NFS, PBS scheduler, netboot server, Xen virtual machine host, etc. All of these are also Gentoo. In total we have over 100 machines running Gentoo.

The whole setup is surprisingly easy to manage. I have set up a local portage mirror, and a local overlay for our own packages. Classes of machines use the same image, so bringing up a new developer box is just an rsync away. A new cluster node can be added just by plugging a machine in to the network and netbooting. Xen domU’s have a template from which they are cloned. For servers, there’s a bootstrap script which builds the machine up with the minimum amount of input at the start of the process. We don’t yet have a binary package host, but it’s something I’m looking at adding.

Some reasons why I love using Gentoo:

  • New versions of packages are quickly available. Some as quickly as the day of the release! For example, the recent OpenSSH 5.0 already has ebuilds.
  • System configuration is not hidden or managed by a restrictive GUI. I’m a configuration file kind of guy, and the way system configuration is handled in Gentoo is very nice for that.
  • You install only exactly what you need. There isn’t really a “default” set of packages, other than the minimum to get the system up and running. This is great for things like our cluster node image.

  • It’s easy to create your own ebuilds. An ebuild for simpler things can take as little as 5 minutes to write, and lets you manage deployment across the network. We also maintain a number of software packages in our SVN repository. With the Subversion eclass, updating an ebuild to pull from a newer release in our repository can be as easy as renaming the file. This is nice.

That’s all for now. If you have any questions about our setup, please feel free to contact me and I’ll be glad to help you out.

Leopard AFP: Not Production Ready

The other day we downgraded our Leopard file server back to Tiger. Luckily we had a spare XServe available for this purpose. For a detailed description of the problem, see my previous post. I can offer this additional information:

  • The problem is triggered by attempted authentications. We set up a script to monitor the AFP server every 20 minutes to check if it was down. We could often see the DirectoryService crash log timestamp correspond almost exactly to the time that our script attempted its test.

  • Sending a HUP signal, or toggling an AFP option like EnableGuestAccess (which does the same thing, I think), allows people to authenticate AFP connections again. At least until the next time it crashes.

  • Eventually the server comes down hard. We managed to keep it up with our monitoring tool and the HUP periodically, but at some point it seems to give up and die completely. This requires a complete restart of AFP, and a loss of all client connections. It really sucks for home directories, and can corrupt files as well (several other users and I seem to have lost some preference files which were in use at the time).
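
For anyone stuck on Leopard in the meantime, here is a rough sketch of the kind of check-and-HUP watchdog described above. It is not our actual script: hup_if_down, HUP_CMD and the paths in the comments are names I made up, and the check command is left to you, since the test has to attempt a real authenticated connection to be of any use.

```shell
# hup_if_down is a hypothetical helper. It runs the check command given after
# the process name; if the check fails, it sends a HUP signal to that process.
# HUP_CMD can be overridden for dry runs; by default killall -HUP is used.
hup_if_down() {
    process="$1"
    shift
    if "$@"; then
        echo "ok"
    else
        ${HUP_CMD:-killall -HUP} "$process"
        echo "sent HUP to $process"
    fi
}

# Example crontab entry for a 20-minute interval (placeholder path):
# */20 * * * * /usr/local/bin/afp_watchdog.sh
# where afp_watchdog.sh ends with a call such as:
# hup_if_down AppleFileServer /usr/local/bin/afp_check.sh
```

Keep in mind this only papers over the problem; as noted, the server eventually falls over anyway.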

Since we’re doing testing on a new laptop image, and causing lots of AFP connections, we were getting more than 1 AFP/DirectoryService crash per hour. Unacceptable. There is definitely something wrong with the link between the two.

In the end, we were forced to call a network downtime for the end of the day and rebuilt our XServe with a Tiger image. Today was our first day running that setup, and it was solid as a rock. No slowness, no crashing.

In short, Leopard Server is not ready, at least not for serving AFP. Keep waiting.

AFP + Directory Services on Leopard = Disaster

All it takes is a cursory search on some Apple discussion boards or mailing lists for “AFP crash” or “DirectoryService crash” to turn up a load of discussions on the topic.

The summary of the problem is basically this: The DirectoryService process crashes for some reason, then gets restarted by launchd. However, AFP (or more specifically, the AppleFileServer process) appears to not regain its connection to it. This prevents any new AFP connections from being able to authenticate, and existing ones are unable to re-authenticate. Couple this with AFP mounted home directories, and now your users can’t log in to their workstations, or their existing session hangs.

In said discussions there are dozens of proposed workarounds. These include: Periodically HUP’ing the AppleFileServer process, setting up some crazy firewall rules, periodically toggling guest access, and numerous other things. I personally have tried many of them and can confidently say that none of them are a good solution. The toggling seems to mitigate the problem to some degree, but eventually things still come down hard.

One fix that appeared promising which we tried recently is not running Open Directory (of the network variety) on the same host as AFP. Fortunately we had a second XServe which was acting as an OD replica and not much else, so we demoted it to a server which is just connected to OD, and moved our AFP home share there. This seemed to work fine for at least a day, but then this weekend the DirectoryService process crashed yet again, causing the same problem as before.

The thing that really blows my mind about this whole issue is that people have been reporting it since November of last year. That’s 5 whole months, and still no sign of a fix from Apple! Say what you will about other companies being slow to respond to problems, I’ve never seen a major issue like this take so long to be fixed by anyone else.

With OS X 10.5.3 being seeded to developers in the last few days, I hope that Apple finally gets on the ball and fixes this glaring problem! This is definitely one of the most frustrating problems I’ve encountered during my time in the computing industry.

Mounting an HFS+ Volume From Linux

Yesterday I spent some time figuring out how to mount a snapshot of one of our Mac server’s HFS+ volumes from our iSCSI SAN.
There doesn’t appear to be any clear instructions for how to use an HFS+ formatted device in Linux, but I finally figured it out.

You need the following kernel options:
CONFIG_HFS_FS=m
CONFIG_HFSPLUS_FS=m
CONFIG_MAC_PARTITION=y
CONFIG_EFI_PARTITION=y

(of course you can always put y where I put m, but I prefer to use modules).

Most howtos online omit the CONFIG_EFI_PARTITION option, but it’s required for the kernel to recognize partitions on devices using the GUID partition scheme. You can tell if your device is one of these by looking at it in Disk Utility on a Mac.
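
To see whether an existing kernel config already has everything, a quick check along these lines may help. It is a sketch only: check_hfs_kernel_config is a name I made up, and the /usr/src/linux/.config path in the example assumes the usual location of your kernel sources.

```shell
# check_hfs_kernel_config is a hypothetical helper: it reads a kernel config
# file and prints every required option that is not set to y or m.
# It returns 0 when nothing is missing and 1 otherwise.
check_hfs_kernel_config() {
    config="$1"
    missing=0
    for opt in CONFIG_HFS_FS CONFIG_HFSPLUS_FS CONFIG_MAC_PARTITION CONFIG_EFI_PARTITION; do
        if ! grep -q "^${opt}=[ym]\$" "$config"; then
            echo "missing: $opt"
            missing=1
        fi
    done
    return "$missing"
}

# Example, assuming your kernel sources live in /usr/src/linux:
# check_hfs_kernel_config /usr/src/linux/.config
```

Anything it prints needs to be enabled before you rebuild the kernel.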

Now when you connect to a block device which contains an HFS+ volume, it will show up with several partitions. You can inspect them if you use parted. Typically the first one will be some kind of special EFI partition, and the second one will be the partition that contains the actual data.

You should now be able to mount it with something like
# mount -t hfsplus /dev/sdc2 /mnt/tmp

However, it appears that if you have a journaled filesystem you can only read and not write to the volume, even if mounted rw. Another thing to keep in mind is that it doesn’t appear that ACLs or the resource fork are handled in any way at the moment…

Update:
The partition numbering is not entirely correct. After further experimentation, it appears to vary in different cases. In the case of two or three partitions, it’s the second one. In the case that four partitions appear, it’s the third.
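
That rule of thumb is easy to put in a small function if you script your mounts. This is only a sketch of what I observed, not a guarantee: hfs_data_partition is a made-up name, and any other partition count is treated as unknown rather than guessed at.

```shell
# hfs_data_partition is a hypothetical helper encoding the rule of thumb
# above: given how many partitions the device shows, print the number of the
# one that probably holds the HFS+ data.
hfs_data_partition() {
    case "$1" in
        2|3) echo 2 ;;
        4)   echo 3 ;;
        *)   echo "unknown partition layout: $1" >&2; return 1 ;;
    esac
}

# Example: /dev/sdc shows 4 partitions, so try /dev/sdc3, read-only to be safe:
# mount -t hfsplus -o ro "/dev/sdc$(hfs_data_partition 4)" /mnt/tmp
```

If the mount fails, look at the partition table with parted before trying another number.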