Fast, Easy, Cheap: Pick One

Just some other blog about computers and programming

Understanding Linux Nfsd Statistics

As you may or may not be aware, I started a new job several weeks back. I’m working for a local startup company (more details some other time) administering a Linux-based cluster, a bunch of Linux workstations, and a full-on OS X desktop environment. Not only is it an exciting and interesting job, but it’s sure to provide me with a ton of material to make my blog more technical in nature. This is going to be the first of what is likely to be a long trend of technical articles.

Over the past few days we’ve been experiencing some I/O problems on the cluster at work. Part of the bottleneck seems to have been NFS. NFS server performance tuning is a difficult process, and it’s often hard to measure the effects of your modifications.

In Linux, many of the NFS statistics are reported in a somewhat convoluted manner, which I will make an effort to explain here. The Linux kernel makes a variety of NFS server statistics available in /proc/net/rpc/nfsd. If you read this file, you should get output which resembles the following:

rc 25 573145 152441  
fh 0 0 0 0 0  
io 1835013960 3866376633  
th 64 27835 41.336 6.116 2.588 0.552 1.392 1.052 1.288 1.664 0.068 29.356  
ra 128 84140 0 0 0 0 0 0 0 0 0 477  
net 725755 595452 130147 11  
rpc 725600 0 0 0 0  
proc2 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  
proc3 22 128 18051 2480 8077 9679 0 84617 568055 1180 29 55 0 1001 3 315 38 4 1237 16 131 0 30501  
proc4 2 0 0  
proc4ops 40 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  

So what does all of this stuff mean? Unfortunately, documentation appears to be hard to come by, but I managed to figure it out by searching the net and reading the comments in fs/nfsd/stats.c in the Linux kernel source. The following is my understanding:
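Before walking through the lines one by one, it's worth noticing that the file format itself is trivial: each line is a label followed by whitespace-separated numbers. Here's a minimal Python sketch of a parser; it operates on the sample output above rather than the live file, so it runs anywhere (the function name is my own, not anything standard):

```python
# Sample of /proc/net/rpc/nfsd output (abridged from the listing above).
SAMPLE = """\
rc 25 573145 152441
fh 0 0 0 0 0
io 1835013960 3866376633
th 64 27835 41.336 6.116 2.588 0.552 1.392 1.052 1.288 1.664 0.068 29.356
ra 128 84140 0 0 0 0 0 0 0 0 0 477
net 725755 595452 130147 11
rpc 725600 0 0 0 0
"""

def parse_nfsd_stats(text):
    """Split each line into its label and a list of numeric fields.

    The th line mixes integers and floats, so every field is parsed
    as a float for uniformity.
    """
    stats = {}
    for line in text.splitlines():
        label, *fields = line.split()
        stats[label] = [float(f) for f in fields]
    return stats

stats = parse_nfsd_stats(SAMPLE)
print(stats["io"])  # bytes read, bytes written
```

On a real server you would read /proc/net/rpc/nfsd instead of the SAMPLE string.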

The first line, beginning with rc, gives the stats for the NFS reply cache (sometimes called the duplicate request cache). The three numbers are cache hits, cache misses, and “nocache”, which counts requests that bypassed the cache entirely. As I understand it, the reply cache exists so that a retransmitted non-idempotent request isn’t executed twice: a hit means a retransmitted request was answered from the cache, a miss means a cacheable request had to be processed fresh, and idempotent operations such as reads land in the nocache column. Seen that way, my very low hit count is actually fine — it just means clients are rarely having to retransmit requests.
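However you interpret the three rc counters, turning them into fractions of the total makes them easier to compare across servers. A small Python sketch using the rc numbers from the sample output above (the helper name is my own):

```python
def cache_ratios(hits, misses, nocache):
    """Return each rc counter as a fraction of all requests seen."""
    total = hits + misses + nocache
    return {"hits": hits / total,
            "misses": misses / total,
            "nocache": nocache / total}

# Values from the rc line in the sample output above.
ratios = cache_ratios(25, 573145, 152441)
print(f"hit rate: {ratios['hits']:.4%}")
```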

The fh fields relate to file handles: stale handles, total lookups, anonymous lookups, “dir-not-in-dcache”, and “non-dir-not-in-dcache”. From what I can tell, only the first (stale) counter is still maintained by recent kernels; the other four are legacy fields that are always reported as zero. That explains why all of mine are 0 — the last four always will be, and the first is zero simply because no stale file handle errors have occurred.

io is the overall I/O counter. Bytes read, bytes written. Simple.

The th line reports nfsd thread utilization. The first number is the total number of nfsd threads currently running; in my case it’s 64 (yes, that’s quite a lot of nfsds). The second number is the number of times that all threads were simultaneously in use. You don’t want this number to be high (as mine is), because it means there are I/O processes on clients out there blocking while they wait for a reply from the NFS server. The remaining 10 numbers on this line are a time histogram, each number covering a 10% range (1-10%, 11-20%, etc.) of thread utilization and giving the number of seconds that that fraction of the nfsd threads was busy. For example, my first number is 41.336, which means that for 41.336 seconds 1-10% of my nfsd threads were in use. My last number is 29.356: the time during which 90-100% of the threads were in use (bad!). You can use these numbers to decide whether the number of nfsd processes you have is sufficient. They are not reset when you restart nfsd, so in this example I racked up most of the 100%-utilization time when I was still running only 8 nfsd threads.
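One quick way to judge saturation from the histogram is to ask what fraction of the recorded busy time was spent in the top (90-100%) bucket. A Python sketch using the th line from the sample output above (the function is my own, assuming the layout described: two counters followed by ten histogram buckets in seconds):

```python
def thread_saturation(th_fields):
    """Fraction of recorded busy time spent with 90-100% of nfsd
    threads in use, given the whitespace-split fields of the th line
    (thread count, all-busy count, then ten histogram buckets)."""
    buckets = [float(f) for f in th_fields[2:12]]
    return buckets[-1] / sum(buckets)

# th line from the sample output above.
th = "64 27835 41.336 6.116 2.588 0.552 1.392 1.052 1.288 1.664 0.068 29.356".split()
print(f"{thread_saturation(th):.1%} of busy time at 90-100% utilization")
```

Anything much above a few percent in that top bucket suggests adding threads, with the caveat (noted above) that the counters accumulate across nfsd restarts.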

Next we have the ra line, whose first number is the size of the read-ahead cache. The following 10 numbers count how many times an entry was found in the read-ahead cache less than 10%, less than 20%, …, less than 100% of the way into the cache. The last number on this line is the number of times an entry was not found in the cache at all.
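Collapsing the ten depth buckets into a single hit rate gives a quick health check for the read-ahead cache. A Python sketch using the ra line from the sample output above (helper name and layout assumptions are mine, per the description just given):

```python
def readahead_hit_rate(ra_fields):
    """ra line layout: cache size, ten depth buckets (entry found
    <10%, <20%, ... of the way into the cache), then the not-found
    count. Returns found / (found + not found)."""
    depths = [int(f) for f in ra_fields[1:11]]
    not_found = int(ra_fields[11])
    found = sum(depths)
    return found / (found + not_found)

# ra line from the sample output above.
ra = "128 84140 0 0 0 0 0 0 0 0 0 477".split()
print(f"read-ahead hit rate: {readahead_hit_rate(ra):.2%}")
```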

The net and rpc lines, and everything below them, are generic RPC-layer statistics. I haven’t yet delved into the details, but my understanding is that net counts packets (total, UDP, TCP, and TCP connections accepted), rpc counts RPC calls including malformed ones, and the procN lines count calls to each procedure of the corresponding NFS protocol version, with the first number giving how many counters follow. Some of these same fields can also be found in /proc/net/rpc/nfs and elsewhere.

The important thing to keep in mind is that these figures are only a current snapshot of the statistics. They give no indication of trends over time, and a sudden spike of activity can drastically change your numbers. To gather genuinely useful data, it’s best to have a cron job or some kind of daemon periodically record the statistics to a log file, or feed them into something like rrdtool. This is left as an exercise for the reader.
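As a building block for that kind of monitoring, you can turn the cumulative counters into rates by differencing two snapshots. Here is a hedged Python sketch (the helper is hypothetical; it assumes the io line holds ever-increasing byte counters, as described above):

```python
import time

def io_throughput(path="/proc/net/rpc/nfsd", interval=1.0):
    """Sample the io line twice, `interval` seconds apart, and
    return (read bytes/sec, write bytes/sec).

    Assumes the io line holds cumulative counters, so the difference
    between two snapshots divided by the interval gives a rate.
    """
    def read_io():
        with open(path) as f:
            for line in f:
                if line.startswith("io "):
                    _, r, w = line.split()
                    return int(r), int(w)
        raise RuntimeError("no io line found")

    r1, w1 = read_io()
    time.sleep(interval)
    r2, w2 = read_io()
    return (r2 - r1) / interval, (w2 - w1) / interval

# Example (on a real NFS server):
#   read_bps, write_bps = io_throughput(interval=5.0)
```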

Since we haven’t yet reached the point where we are fully monitoring our systems, I’m keeping the logging simple — just enough to see whether my changes have been effective in reducing the NFS bottleneck. For now I just have an hourly cron job that runs (date; /bin/grep "th" /proc/net/rpc/nfsd) >> /var/log/nfsd_stats, which is sufficiently granular for my purposes at the moment.

I hope this brief article has given you a better understanding of some of the available NFS server statistics. I’d appreciate any feedback and especially corrections.