Wednesday, 19 September 2012

The Linux Sysadmin's Toolkit

If you're an admin for Linux servers that are going to be doing any real kind of work, you'll need to know how to make sure they're running right. You need to understand how the CPU, memory and disk get utilised by the OS, and to do that you need to know how to use a few essential tools and how to interpret the results.

I'll try and write this so admins coming from a Windows background can understand how Linux works compared to Windows.


CPU


There are 3 things you need to be concerned about with regards to how the system is performing from a processor point of view.

1. CPU utilisation percentage
2. CPU run queue (load)
3. CPU I/O wait

In Windows, you mostly are just concerned with CPU utilisation from just a single percentage figure with the maximum being 100%. This isn't really the whole story though.

In Linux, if you have 4 cores in total, the CPU utilisation will be shown as a percentage with the maximum 400%. That may seem strange to someone used to seeing 100% as the maximum but it actually makes more sense to add up the totals of each core and show you all the cores together.

The thing to understand about this is that CPU utilisation isn't actually how busy your system is. It's a part of it, but not the whole story. It's simply a representation of how long the CPU was seen as being busy over a time period. If the system looks at a CPU core for 10ms, and that core was busy for 2ms, it will be 20% busy. It will then sample the other 3 cores, and add those to the total. If they were also all busy for 2ms out of that 10, the total CPU utilisation of the system will be 80%, with the maximum being 400%.

We have a percentage of how busy the CPU is, why isn't that the whole story?


Well, if a CPU core is used by 1 process for 2ms out of 10ms, but for those 2ms there are also 5 other processes waiting to jump on that core and do stuff, a utilisation of 20% isn't really accurate is it? Because for those 2ms, the system is actually trying to do 5 times more than it actually can.


When you understand that both the CPU utilisation _and_ the CPU load are factors to be taken in conjunction with each other, you can interpret what the tools tell you.

top


top - 16:49:48 up 14 days,  6:18,  5 users,  load average: 2.75, 3.64, 3.87
Tasks: 315 total,   1 running, 314 sleeping,   0 stopped,   0 zombie
Cpu(s):  8.3%us,  1.6%sy,  0.0%ni, 86.4%id,  3.1%wa,  0.1%hi,  0.5%si,  0.0%st
Mem:  98871212k total, 81501412k used, 17369800k free,    50108k buffers
Swap:  9446212k total,    32700k used,  9413512k free,  7281528k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                                                                                           
32031 mysql    -10   0 71.4g  69g 6132 S   97 74.2  13958:45 mysqld                                                                                                                                                                                                         
28358 root      20   0 44176  15m 1280 D   63  0.0 210:01.71 mysqlbackup                                                                                                                                                                                                       
19749 root      20   0 69624  18m 3160 S    7  0.0   3:02.40 iotop                                                                                                                                                                                                             
 6183 root      RT   0  161m  37m  17m S    5  0.0   1188:46 aisexec                                                                                                                                                                                                           
 5397 root      39  19     0    0    0 S    1  0.0 241:39.12 kipmi0                                                                                                                                                                                                         
 2971 root      15  -5     0    0    0 S    0  0.0  65:52.74 kjournald                                                                                                                                                                                                         
    1 root      20   0  1064  392  324 S    0  0.0   0:16.52 init 


top is the standard age-old tool for quickly looking at what's going on. The system above has 8 cores, which are hyperthreaded, so I know that it has 16 logical processors available (generally found out from cat /proc/cpuinfo). When I look at the processes, the mysqld process is taking 97%, but that's from a maximum of _around_ 1600%.

Then, as I said above, we can also look at the system load, which is represented as load average. In the output above, I can see that the first figure of 2.75 is the average over the last 1 minute, 3.64 over the last 5 minutes and 3.87 over the last 15 minutes.

What do these figures mean?

While the system was sampling how much was running on each usable CPU core, it also looked at how many processes were waiting to run. Out of 16 queues, around 3 were waiting at any time, 1 process has taken 97% of 1600%, and another 63%. Therefore, actually, what looked like the system was fairly busy, really has a lot of room to get busier. Until we're consistently filling almost all of the queues (16 on this system), and the CPU utilisation is getting nearer a total of 1600%, we don't need to worry.

The following is a Munin graph of the same system. We can see that the Max idle is 1600, and we're nowhere near it.


And this graph shows the load average


Again, it backs up what we saw from top. We don't have to worry about the load on this system, and we know this by combining the utilisation and load average to see what's really going on.

But what about IO?

A 3rd variable comes into the mix which complicates it a little further, which is IO wait. If a process is running on a CPU core, but you have a slow IO subsystem (e.g. a slow disk, or a saturated fibre channel host bus adapter), the process can be waiting for an IO request to complete. This in turn increases the CPU utilisation and the load average.

If you're seeing high CPU usage and need to find out why, you can see if it's IO wait by using vmstat.

These figures are from a web server. You can see that the io column has no blocks in and a few blocks out now and again. The blocks out are likely to be log files being written, and as it's a web server, everything is already in memory and doesn't need to be read in. No IO issues here.


procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  0    100 266688 302404 5135804    0    0     0     0 17822 24008 15  1 84  0  0
15  0    100 266532 302404 5135820    0    0     0   124 16510 24104 12  1 87  0  0
 0  0    100 265504 302404 5135848    0    0     0     0 18332 24488 17  2 82  0  0
 4  0    100 264312 302404 5135852    0    0     0     0 16986 23787 14  2 84  0  0
 6  0    100 265476 302404 5135864    0    0     0   344 16711 23948 15  1 83  1  0

This one is from a database server. You can see that the blocks in and blocks out (1 block is 1KB) is a lot larger, and as I ran this as vmstat 1 it's cycling every 1 second, so it was reading 30-50MB/s and writing 10-20MB/s.

procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  0     33    879     57  24340    0    0 36108 15304 27169 42413 10  3 83  4  0
 2  1     33    852     57  24384    0    0 36576 15762 26833 40486  9  3 85  4  0
 2  0     33    780     57  24439    0    0 47296  9735 21587 33633  8  2 85  4  0
 0  1     33    721     57  24499    0    0 49496 19881 22993 36320  8  3 86  4  0
 4  0     33    683     57  24547    0    0 42176 13993 23573 36176  8  2 87  2  0
 5  2     33    632     57  24595    0    0 38748 10611 26785 41753 11  3 76 10  0
 4  0     33    584     57  24636    0    0 37636 12618 23149 36298 14  2 80  4  0
 6  0     33    551     57  24685    0    0 34060 13504 25268 39642 14  2 79  5  0
 3  0     33    481     57  24739    0    0 50360 10973 24150 37552 13  2 82  3  0

That's a lot of throughput. Is it affecting the CPU by waiting on IO? Well, the 'wa' column in 'cpu' are figures in a percentage of 100%, so the single digit figures compared to the 'id' (idle) column, it's not waiting on IO for very long at all. Therefore, this server is heavily utilised for IO, but it's not affecting CPU utilisation or system load due to having decent IO.

IO is a bit easier to see by using iostat, which gives you % utilised of your IO subsystem.

# iostat -x -d 1
Linux 2.6.27.29-0.1-default (xxxxxxx)      09/19/12        _x86_64_

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00  349.00   71.00 92320.00 21136.00   270.13     0.99    2.35   1.04  43.60
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-0            145.00  1795.00  349.00   71.00 92320.00 21136.00   270.13     1.15    2.77   1.07  44.80
dm-1              0.00     0.00  493.00 1857.00 91976.00 14856.00    45.46    30.01   13.64   0.19  44.40
sdd               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00   14.00     0.00  9490.00   677.86     0.20   14.57   2.00   2.80
dm-2              0.00     0.00    0.00   14.00     0.00  9490.00   677.86     0.21   15.14   2.57   3.60
dm-3              0.00     0.00    0.00   14.00     0.00  9490.00   677.86     0.21   15.14   2.57   3.60

Even easier to use is iotop, a layer on top of iostat to make it more like a top style interface.



Finally, on to memory

Memory is really misunderstood in Linux. Unused memory is inefficient. Some people see the below and panic.

# free -m
             total       used       free     shared    buffers     cached
Mem:         96553      93561       2992          0         73      21044
-/+ buffers/cache:      72443      24110
Swap:         9224         31       9192

That's 93GB used of 96GB installed RAM in the server going by the Mem: row.

Wrong. The Linux kernel grabs as much memory as it can, leaving only a small amount unused and then dishes it out to applications which request it. Anything which isn't requested by an application is then utilised for buffers and caches, including the IO buffer. Read the values in the -/+ buffers/cache line. 24GB is free, and 72GB is used by applications. That's obviously still a lot, but this is a database server, and we want to give the database engine as much memory to cache stuff as possible.





Here's another one from a slightly more modest server:


# free -m
             total       used       free     shared    buffers     cached
Mem:           463        397         65          0        134        136
-/+ buffers/cache:        126        336
Swap:          475         10        465

463MB of RAM and only 65MB free?! Nope, 336GB free as the kernel hasn't needed to dish it out and has allocated it to buffers and caches.