System Resources

We'll use top to observe and explore some of the system statistics that measure system performance.

Let's fire it up: top

Note

It's going to launch into an app that continuously updates. Press Ctrl+S (XOFF) to pause the updating so you can read the information, and Ctrl+Q (XON) to resume the updates (every three seconds by default).

top - 00:20:54 up 40 min,  2 users,  load average: 0.00, 0.12, 0.16
Tasks:  92 total,   1 running,  91 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :    978.6 total,    164.5 free,    152.0 used,    662.1 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.    668.9 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
      1 root      20   0  102968  12340   8148 S   0.0   1.2   0:04.70 systemd
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd
      3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp
      4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp
      6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0H-kblockd
      7 root      20   0       0      0      0 I   0.0   0.0   0:00.30 kworker/0:1-events
      9 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 mm_percpu_wq
     10 root      20   0       0      0      0 S   0.0   0.0   0:00.17 ksoftirqd/0
...

I've trimmed my output to the first eight processes.

From top to bottom, we can see the uptime of the system (how long it's been turned on and booted into Linux): 40 min. We have 2 users logged into the system (me, twice), and we have a load average (three numbers: the averages over the last 1, 5 and 15 minutes), which we'll cover later.
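If you'd rather pull these numbers out programmatically, here's a minimal Python sketch that parses the first header line of top. It assumes the exact English format shown in the snapshot above; other top versions or locales may word it differently. Python's os.getloadavg() returns the same three load averages for the machine you run it on (Unix only).

```python
import os
import re

# Matches top's first header line, e.g.
# "top - 00:20:54 up 40 min,  2 users,  load average: 0.00, 0.12, 0.16"
SUMMARY_RE = re.compile(
    r"up\s+(?P<uptime>.+?),\s+(?P<users>\d+)\s+users?,\s+"
    r"load average:\s+(?P<l1>[\d.]+),\s+(?P<l5>[\d.]+),\s+(?P<l15>[\d.]+)"
)


def parse_summary(line):
    """Return the uptime text, logged-in user count and load averages."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError(f"not a top summary line: {line!r}")
    return {
        "uptime": m["uptime"],
        "users": int(m["users"]),
        # 1-, 5- and 15-minute load averages, in that order
        "load": (float(m["l1"]), float(m["l5"]), float(m["l15"])),
    }


def current_load():
    """The same three load averages for the machine running this code (Unix only)."""
    return os.getloadavg()
```

Fed the line above, parse_summary reports an uptime of "40 min", 2 users and a load of (0.0, 0.12, 0.16).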

Then we have a count of the tasks that existed when the snapshot (of the system) was taken: 92 total, with 1 running, 91 sleeping, 0 stopped and 0 zombie (super-micro-project: Google for "Linux zombie process").
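A quick way to check your reading of the Tasks line is to parse it and confirm the states add up to the total, as they do in both snapshots in this article. This is a small Python sketch that assumes top's English header format; parse_tasks and states_add_up are hypothetical helper names.

```python
import re

STATES = ("running", "sleeping", "stopped", "zombie")


def parse_tasks(line):
    """Parse top's Tasks line into a {name: count} dict."""
    return {
        name: int(count)
        for count, name in re.findall(r"(\d+)\s+(total|running|sleeping|stopped|zombie)", line)
    }


def states_add_up(counts):
    """True when running + sleeping + stopped + zombie equals total."""
    return sum(counts[s] for s in STATES) == counts["total"]
```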

Next, we have these rows:

%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :    978.6 total,    164.5 free,    152.0 used,    662.1 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.    668.9 avail Mem

These represent CPU and memory usage on the system. The %Cpu(s) line combines all the CPUs (cores) in the system into a single line. Check out the manual page (man top), or press h inside top to bring up the help screen, and work out how to expand that line to show each individual CPU or core (provided you have more than one). We'll go over the meanings of us, sy, ni, etc., later on.
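Here's a small Python sketch that parses the %Cpu(s) line, assuming the exact format from the snapshots in this article. The label meanings in the comment are a short summary of top's manual page; busy_percent is a hypothetical helper that treats everything that isn't idle as busy.

```python
import re

# What each two-letter label on the %Cpu(s) line stands for
CPU_LABELS = {
    "us": "user", "sy": "system", "ni": "nice (user time of niced processes)",
    "id": "idle", "wa": "waiting for I/O", "hi": "hardware interrupts",
    "si": "software interrupts", "st": "stolen by the hypervisor",
}


def parse_cpu(line):
    """Parse top's %Cpu(s) line into {label: percent}."""
    return {
        label: float(value)
        for value, label in re.findall(r"([\d.]+)\s*(us|sy|ni|id|wa|hi|si|st)\b", line)
    }


def busy_percent(cpu):
    """Percentage of CPU time that was not idle."""
    return round(100.0 - cpu["id"], 1)
```

On the idle snapshot above, busy_percent comes out at 0.0.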

Memory is the final set of rows in the header. We have MiB Mem and MiB Swap. The former is the physical RAM in the system, and we have 978.6 MiB of it in total. We're using 152.0 MiB of it on software/processes, and 164.5 MiB is completely unused by anything on the system. Finally, 662.1 MiB is in use as buff/cache: the kernel keeps recently used file data and I/O buffers in RAM so it doesn't have to go back to the disk for them. Most of that memory can be reclaimed as soon as a process needs it, which is why avail Mem (668.9 MiB, shown at the end of the Swap line) is much larger than free. It's therefore, in practice, mostly "free" RAM, but the kernel is complicated and there are caveats to this. The Swap line shows 0.0 total, so this machine has no swap space configured.
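The numbers above are easy to sanity-check. This Python sketch parses the two memory lines (assuming top's format as shown) and confirms that free + used + buff/cache adds up to total in this snapshot. It also includes a parser for /proc/meminfo, the Linux file these figures are derived from; in current procps-ng versions avail Mem is taken from its MemAvailable field. The helper names are hypothetical.

```python
import re


def parse_mem_line(line):
    """Parse top's MiB Mem or MiB Swap line into {label: MiB}."""
    return {
        label: float(value)
        for value, label in re.findall(r"([\d.]+)\s+(total|free|used|buff/cache|avail Mem)", line)
    }


def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: value}; most values are in kB."""
    info = {}
    for row in text.splitlines():
        key, _, rest = row.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def read_meminfo():
    """Linux only: read the kernel's own memory figures."""
    with open("/proc/meminfo") as f:
        return parse_meminfo(f.read())
```

For the snapshot above, 164.5 + 152.0 + 662.1 comes to 978.6 MiB, the same as total.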

Finally we've got a list of processes:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
      1 root      20   0  102968  12340   8148 S   0.0   1.2   0:04.70 systemd

These columns break down the information about a process into the important stuff:

The PID is the process ID, which identifies the process amongst all the processes on the system. USER is the user the process is running as. PID 1 (systemd here) is always run by the root user and is the ancestor of every other user-space process, like the root of a tree. Kernel threads, such as the ones in the listing above, descend from kthreadd (PID 2) instead.
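On Linux you can follow that tree yourself: each process's parent PID is the fourth field of /proc/PID/stat. Below is a minimal Python sketch under that assumption (the Linux /proc layout). The process name sits in parentheses and can itself contain spaces or ")", so the parser splits on the last ")". parse_stat and ancestry are hypothetical helper names.

```python
def parse_stat(text):
    """Return (pid, name, state, ppid) from the contents of /proc/PID/stat."""
    # Layout: "pid (name) state ppid ...". Split on the *last* ')' because
    # the name may contain spaces or parentheses of its own.
    left, _, right = text.rpartition(")")
    pid_str, _, name = left.partition(" (")
    state, ppid = right.split()[:2]
    return int(pid_str), name, state, int(ppid)


def ancestry(pid):
    """Walk parent links from pid up to the top of the tree (Linux only)."""
    chain = []
    while pid > 0:  # PID 1 (and kthreadd, PID 2) report a parent of 0
        with open(f"/proc/{pid}/stat") as f:
            this_pid, name, _, ppid = parse_stat(f.read())
        chain.append((this_pid, name))
        pid = ppid
    return chain
```

Calling ancestry(os.getpid()) from a normal terminal session should end with (1, "systemd") on a systemd machine like this one; inside a container the chain may look different.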

The PR column represents the scheduling priority of the process. When the kernel is scheduling processes/threads to run on the CPU, it uses a complex algorithm to decide which process goes next and which ones have to wait (for example, because they've already had a lot of CPU time). This can essentially be ignored for now, but it does become important when tuning the system for performance. For ordinary processes, top shows PR as 20 plus the NI value, which is why systemd shows 20 and the kernel threads with an NI of -20 show 0.

NI is the "nice" value of the process. It ranges from -20 (the highest priority) to 19 (the lowest priority). The naming makes more sense if you think of a high value as the process being "nicer" to everyone else by giving up CPU time. This particular process, systemd, has a nice value of 0, which is the default every process gets on Linux systems.
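The PR/NI relationship is easy to express in code. This Python sketch assumes an ordinary (non-real-time) Linux process, for which top's PR equals 20 + NI; real-time tasks are displayed differently. top_pr is a hypothetical helper, while os.getpriority is Python's standard wrapper around the Unix getpriority call.

```python
import os

NICE_MIN, NICE_MAX = -20, 19  # Linux nice range: -20 = highest priority, 19 = lowest


def top_pr(nice):
    """PR value top shows for a normal (non-real-time) task: 20 + NI."""
    if not NICE_MIN <= nice <= NICE_MAX:
        raise ValueError(f"nice must be in [{NICE_MIN}, {NICE_MAX}], got {nice}")
    return 20 + nice


def current_nice():
    """Nice value of the process running this code (Unix only)."""
    return os.getpriority(os.PRIO_PROCESS, 0)
```

top_pr(0) gives 20, matching systemd's row, and top_pr(-20) gives 0, matching the rcu_gp and kworker rows.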

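VIRT is the total amount of virtual memory the process has mapped, which is a combination of a lot of different things: its code, data and shared libraries, plus memory it has reserved but not yet touched. It's more a holistic view of the task's memory usage than a measure of RAM actually in use.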

RES is never larger than VIRT, as it's a subset of virtual memory: the part that is resident in physical RAM right now. In other words, it represents the physical RAM being used by this task.

SHR is a subset of RES and represents the memory that may be shared with other processes, such as shared libraries.
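On Linux, the VIRT and RES figures for a process also appear in /proc/PID/status as VmSize and VmRSS (both in kB). This Python sketch reads them, assuming that file layout; kernel threads have no Vm lines at all, which is why top shows 0 for them. parse_status and own_memory are hypothetical helper names.

```python
def parse_status(text):
    """Return VmSize (top's VIRT) and VmRSS (top's RES), in KiB, from /proc/PID/status text."""
    sizes = {}
    for row in text.splitlines():
        key, _, rest = row.partition(":")
        if key in ("VmSize", "VmRSS"):
            sizes[key] = int(rest.split()[0])  # values are reported in kB
    return sizes


def own_memory():
    """Linux only: VIRT and RES of the Python process running this code."""
    with open("/proc/self/status") as f:
        return parse_status(f.read())
```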

The little S column is the status of the process/thread and can be one of the following:

D = uninterruptible sleep
I = idle
R = running
S = sleeping
T = stopped by job control signal
t = stopped by debugger during trace
Z = zombie

Research time: go online and find out what each of these statuses means. I managed to find a really good resource with a search for "linux process states".
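To see how the states are spread across your own processes, you can tally the S column. This Python sketch assumes rows in top's default column layout (PID USER PR NI VIRT RES SHR S ...), so S is the eighth field; count_states is a hypothetical helper.

```python
from collections import Counter

# Meanings of the S column letters, as listed above
STATE_NAMES = {
    "D": "uninterruptible sleep",
    "I": "idle",
    "R": "running",
    "S": "sleeping",
    "T": "stopped by job control signal",
    "t": "stopped by debugger during trace",
    "Z": "zombie",
}


def count_states(rows):
    """Tally the S column of process rows in top's default layout.

    Default columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND,
    so S is the eighth whitespace-separated field (index 7).
    """
    return Counter(STATE_NAMES[row.split()[7]] for row in rows)
```

For the eight processes in the first snapshot, this gives 5 idle and 3 sleeping.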

Finally we have %CPU, %MEM, TIME+ and COMMAND. The %CPU column is the percentage of CPU time the task has used since the last screen update. %MEM is the task's share of physical memory, that is, its RES as a percentage of total RAM.
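Both percentages are simple ratios. This Python sketch assumes top's definitions as described above: %MEM is RES over total RAM, and %CPU is the CPU time used during one refresh interval over the length of that interval. The CPU time comes from the kernel in clock ticks; 100 ticks per second is typical on Linux, but os.sysconf("SC_CLK_TCK") reports the real value. pct_mem and pct_cpu are hypothetical helper names.

```python
def pct_mem(res_kib, total_mib):
    """%MEM: resident memory (KiB) as a share of total physical RAM (MiB)."""
    return 100.0 * res_kib / (total_mib * 1024)


def pct_cpu(delta_ticks, interval_s, hz=100):
    """%CPU from the growth in a task's CPU time (clock ticks) over interval_s seconds.

    hz is the kernel's tick rate as seen by programs; 100 is a common default.
    """
    return 100.0 * (delta_ticks / hz) / interval_s
```

For systemd, pct_mem(12340, 978.6) rounds to 1.2, the same as top's %MEM column.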

The TIME+ column needs a little explaining. It's the total CPU time the task has used since it started (not how long it has been running on the clock), shown as minutes:seconds.hundredths. If you press Shift+S whilst top is running, you switch on cumulative mode, which also adds in the CPU time used by the process's children that have already exited. This isn't important today.
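Here's a minimal Python sketch of how a CPU-time counter turns into a TIME+ value, assuming the time arrives in clock ticks at 100 per second (the Linux kernel reports a process's CPU time this way in /proc/PID/stat). It only covers the minutes:seconds.hundredths form seen in these snapshots; time_plus is a hypothetical helper.

```python
def time_plus(ticks, hz=100):
    """Format CPU time (clock ticks) as minutes:seconds.hundredths, like TIME+."""
    centis = round(ticks * 100 / hz)           # work in hundredths of a second
    minutes, rest = divmod(centis, 6000)       # 6000 hundredths per minute
    seconds, hundredths = divmod(rest, 100)
    return f"{minutes}:{seconds:02d}.{hundredths:02d}"
```

systemd's 0:04.70 corresponds to 470 ticks, so time_plus(470) returns "0:04.70".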

And the COMMAND column tells you the command that was used to create the process/task. If you press c whilst using top, it expands COMMAND to show the full command line, including its arguments, which leads us nicely onto a snapshot of top I've taken with some interesting details.
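On Linux, the full command line lives in /proc/PID/cmdline, with the arguments separated by NUL bytes, while the short name lives in /proc/PID/comm. This Python sketch assumes that layout; kernel threads have an empty cmdline, so it falls back to the short name in square brackets, the way ps displays them. read_cmdline and command_for are hypothetical helper names.

```python
def read_cmdline(raw):
    """Turn the NUL-separated bytes of /proc/PID/cmdline into a readable string."""
    args = raw.rstrip(b"\x00").split(b"\x00")
    return " ".join(a.decode(errors="replace") for a in args)


def command_for(pid):
    """Full command line for pid (Linux only)."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        raw = f.read()
    if raw:
        return read_cmdline(raw)
    # Kernel threads have no command line: show the short name instead
    with open(f"/proc/{pid}/comm") as f:
        return f"[{f.read().strip()}]"
```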

Artificial Load

I'm going to apply some fake, artificial load to my system using the stress tool. It's technically real load on the CPU, RAM and disk, but it's fake in that it's not doing anything productive.
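The exact command appears in the COMMAND column of the snapshot below: stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M. Here's a minimal Python sketch, assuming the classic stress utility is installed and on your PATH, that builds the same command line and adds stress's --timeout option so the load stops on its own. stress starts one process per worker, so this command should produce 8 + 4 + 2 = 14 worker processes. The helper names are hypothetical.

```python
import shutil
import subprocess


def stress_argv(cpu=8, io=4, vm=2, vm_bytes="128M", timeout_s=60):
    """The stress command line from the snapshot, plus a timeout in seconds."""
    return ["stress", "--cpu", str(cpu), "--io", str(io), "--vm", str(vm),
            "--vm-bytes", vm_bytes, "--timeout", str(timeout_s)]


def worker_count(cpu=8, io=4, vm=2):
    """Number of worker processes stress should start."""
    return cpu + io + vm


def run_stress(**kwargs):
    """Run stress in the foreground until it times out."""
    if shutil.which("stress") is None:
        raise RuntimeError("stress is not installed")
    return subprocess.run(stress_argv(**kwargs), check=False)
```

Running run_stress(timeout_s=120) in one terminal while top runs in another reproduces a snapshot like the one below.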

I'm also going to do something in top to make the output more readable:

Run top
Press F to bring up the field selector
I'm going to use the Up and Down keys to select each column and press Space to hide it
I'm going to hide the PID, USER, PR, NI, RES, SHR and TIME+ columns
Then I press Q to exit this screen and I get the following...

top - 01:05:09 up  1:24,  2 users,  load average: 14.00, 12.87, 7.76
Tasks: 106 total,  17 running,  89 sleeping,   0 stopped,   0 zombie
%Cpu(s): 63.3 us, 36.7 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :    978.6 total,    195.5 free,    277.3 used,    505.8 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.    544.0 avail Mem

   VIRT S  %CPU  %MEM COMMAND
   3856 R   7.6   0.0 stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M
   3856 R   7.6   0.0 stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M
   3856 R   7.6   0.0 stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M
   3856 R   7.6   0.0 stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M
   3856 R   7.6   0.0 stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M
   3856 R   7.6   0.0 stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M
   3856 R   7.6   0.0 stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M
   3856 R   6.3   0.0 stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M
   3856 R   6.3   0.0 stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M
 134932 R   6.3  12.5 stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M
   3856 R   6.3   0.0 stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M
   3856 R   6.3   0.0 stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M
   3856 R   6.3   0.0 stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M
   3856 R   6.3   0.0 stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M
  10988 R   1.3   0.4 top

That's much cleaner and it only shows me what I care about right now, but what are we looking at here?

Well, we can see in the COMMAND column that there are a bunch of processes called stress in the R (running) state: 14 of them, matching the 8 CPU, 4 I/O and 2 VM workers requested on the command line. Most are using 3,856 KiB of virtual memory, and one of them is using 134,932 KiB, a lot more (probably one of the --vm workers, which each allocate 128M). There's a slight variation in the %CPU of each process, but the total can be calculated if you do some math: (7 * 7.6) + (7 * 6.3) = 53.2 + 44.1 = 97.3%. By default top expresses %CPU relative to a single CPU, and the %Cpu(s) line confirms the picture: 0.0 id means there is no idle CPU time left at all.
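The sum above can be reproduced by adding up the %CPU column. This Python sketch assumes rows in the trimmed layout used here (VIRT S %CPU %MEM COMMAND), where the command is everything after the fourth field; total_cpu is a hypothetical helper.

```python
def total_cpu(rows, command="stress"):
    """Sum the %CPU column for rows whose COMMAND starts with `command`.

    Assumes the trimmed layout: VIRT S %CPU %MEM COMMAND.
    """
    total = 0.0
    for row in rows:
        virt, state, cpu, mem, cmd = row.split(None, 4)  # COMMAND may contain spaces
        if cmd.split()[0] == command:
            total += float(cpu)
    return round(total, 1)
```

Applied to the 15 rows in the snapshot, it returns 97.3 for stress and skips the top row.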

So it's safe to say that this stress process is using a lot of our system's resources, and if we were struggling to get other things done on the same system, we should end it. This is just an example, though, so I'll simply stop the stress process I have running in the background.
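Ending a process means sending it a signal. This Python sketch assumes Linux (it finds processes by scanning /proc/PID/comm, where names are cut to 15 characters) and sends SIGTERM, the polite "please exit" signal, which is also what the kill command sends by default. pids_named and terminate are hypothetical helper names.

```python
import os
import signal


def pids_named(name):
    """PIDs whose /proc/PID/comm equals name (Linux only)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited between listdir() and open()
    return pids


def terminate(pids):
    """Send SIGTERM to each pid, ignoring ones that have already gone."""
    for pid in pids:
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass
```

Usage: terminate(pids_named("stress")) stops the parent and all of its workers, provided you have permission to signal them.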

Next

And that is the basics of using top to look at system performance, but top isn't showing us networking or disk statistics. We'll need other tools to get that information. Let's explore networking tools next.