As you may or may not be aware, I started a new job several weeks back. I’m working for a local startup company (more details some other time) administering a Linux-based cluster, a bunch of Linux workstations, and a full-on OS X desktop environment. Not only is it an exciting and interesting job, but it’s sure to provide me with a ton of material to make my blog more technical in nature. This is the first of what is likely to be a long series of technical articles.
Over the past few days we’ve been experiencing some I/O problems on the cluster at work. Part of the bottleneck seems to have been NFS. NFS server performance tuning is a difficult process, and it’s often hard to measure the effects of your modifications.
In Linux, many of the NFS statistics are reported in a somewhat convoluted manner, which I will make an effort to explain here. The Linux kernel makes available a variety of NFS server statistics in
/proc/net/rpc/nfsd. If you read this file, you should get some output which resembles the following:
rc 0 151017 38677
fh 0 0 0 0 0
io 21700608 268354141
th 64 8106 41.336 39.040 35.708 30.048 25.328 24.172 23.896 26.720 27.148 29.356
ra 32 4814 223 96 60 26 20 14 7 5 0 231
net 189694 189694 0 0
rpc 189694 0 0 0 0
proc2 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
proc3 22 0 9459 734 3194 7234 0 5308 151017 201 12 0 0 198 9 6 0 195 514 22 19 0 11572
proc4 2 0 0
proc4ops 40 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
So what does all of this stuff mean? Unfortunately, documentation appears to be hard to come by, but I managed to figure it out by searching the net and reading the comments in the
fs/nfsd/stats.c source found in the Linux kernel. The following is my understanding:
The first line, beginning with
rc, contains the stats for the NFS read cache. The three numbers are cache hits, cache misses, and “nocache”, which is presumably the count of requests that bypassed the cache (this is just a guess; no description of this field is provided). As you can see, my read cache gets very few hits. This is because the server holds the cluster users’ home directories, from which their simulations are run. Most of the jobs we run are not very I/O intensive, and those that are perform primarily writes and rarely access the same files more than once.
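As a quick illustration, the rc line can be parsed and turned into a hit ratio with a few lines of Python. This is just a sketch: the field names follow my reading above, and the sample line is made up.

```python
def parse_rc(line):
    """Parse an 'rc' line like 'rc <hits> <misses> <nocache>' into named fields."""
    _, hits, misses, nocache = line.split()
    return {"hits": int(hits), "misses": int(misses), "nocache": int(nocache)}

def hit_ratio(rc):
    """Fraction of cacheable requests that were served from the cache."""
    total = rc["hits"] + rc["misses"]
    return rc["hits"] / total if total else 0.0

# Illustrative line with no hits at all, as on my server:
rc = parse_rc("rc 0 151017 38677")
print(hit_ratio(rc))  # prints 0.0
```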
The fh fields relate to file handles: stale handles, total lookups, anonymous lookups, “dir-not-in-dcache”, and “non-dir-not-in-dcache”. I have yet to figure out the significance of all of these. In my case they are all 0 (why?).
The io line is the overall I/O counter: bytes read and bytes written. Simple.
The th line reports nfsd thread utilization. The first number here is the total number of nfsd threads currently running. In my case it’s 64 (yes, that’s quite a lot of nfsds). The second number is the number of times that all threads were simultaneously in use. It’s bad for this number to be high (as in my case), because it means there are I/O processes on clients out there that block while waiting for a reply from the NFS server. The remaining 10 numbers on this line are a time histogram, each number representing the cumulative number of seconds during which a given 10% range (1-10%, 11-20%, etc.) of the nfsd threads was in use. For example, my first number is 41.336, which means that for a total of 41.336 seconds, 1-10% of my nfsd threads were in use. My last number is 29.356, the time during which 90-100% of the threads were in use (bad!). You can use these numbers to decide whether the number of nfsd processes you have is sufficient. They are not reset when you restart nfsd, so in this example I racked up most of the 100%-utilization time when I was still running only 8 nfsd threads.
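To see how close your server runs to saturation, you can parse the th line and compute what fraction of the accumulated histogram time falls in the top bucket. A sketch, with an illustrative sample line:

```python
def parse_th(line):
    """Parse a 'th' line: thread count, all-busy count, then 10 histogram buckets."""
    fields = line.split()
    return {
        "threads": int(fields[1]),
        "all_busy": int(fields[2]),                    # times all threads were in use
        "buckets": [float(x) for x in fields[3:13]],   # seconds at 1-10%, 11-20%, ...
    }

th = parse_th("th 64 8106 41.336 2.0 1.5 1.0 0.8 0.6 0.5 0.4 0.3 29.356")
top = th["buckets"][-1] / sum(th["buckets"])
print(f"{top:.0%} of recorded busy time at 90-100% thread utilization")
```

If that last fraction is large, adding nfsd threads is probably worth a try.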
Finally we have the
ra line, which gives your read-ahead cache size as the first number. The next 10 numbers count the times an entry was found in the read-ahead cache, bucketed by how deep into the cache it was found: within the first 10%, the first 20%, …, up to 100%. The last number on this line is the number of times an entry was not found in the cache at all.
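The same parsing approach works for the ra line; here is a sketch that reduces the depth histogram to an overall found ratio (the sample line is illustrative):

```python
def parse_ra(line):
    """Parse an 'ra' line: cache size, 10 depth buckets, then the miss count."""
    fields = line.split()
    return {
        "cache_size": int(fields[1]),
        "depth": [int(x) for x in fields[2:12]],  # found within first 10%, 20%, ...
        "misses": int(fields[12]),                # not found in the cache at all
    }

ra = parse_ra("ra 32 4814 223 96 60 26 20 14 7 5 0 231")
found = sum(ra["depth"])
print(found / (found + ra["misses"]))  # fraction of lookups found in the cache
```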
The lines beginning with
rpc and below are common generic RPC statistics. I haven’t yet delved into the details of these, but presumably they do things like count the number of RPC requests. These same fields can also be found in
/proc/net/rpc/nfs and elsewhere.
The important thing to keep in mind is that these figures are only a snapshot of the statistics at a single moment. They give no indication of trends over time, and a sudden spike of activity can drastically change your numbers. In order to gather useful data, it’s best to have a cron job or some kind of daemon periodically record the statistics to a log file or something like rrdtool. This is left as an exercise for the reader.
Since we haven’t yet reached the point where we are fully monitoring our systems I’m keeping the logging simple, just enough to see if my changes have been effective in reducing the NFS bottleneck. For now I just have a cron job every hour that runs
/bin/grep "th" /proc/net/rpc/nfsd >> /var/log/nfsd_stats, which is sufficiently granular for my purposes at the moment.
I hope this brief article has given you a better understanding of some of the available NFS server statistics. I’d appreciate any feedback and especially corrections.