I decided to do some of roadmap.sh’s DevOps projects. The first one is `server-stats.sh`, in which we write a script to analyze and display server performance stats.

The requirements are as follows:

## Requirements

You are required to write a script `server-stats.sh` that can analyse basic server performance stats. You should be able to run the script on any Linux server and it should give you the following stats:

- Total CPU usage
- Total memory usage (Free vs Used including percentage)
- Total disk usage (Free vs Used including percentage)
- Top 5 processes by CPU usage
- Top 5 processes by memory usage

## totalCPU

The first task is to return total CPU usage. Since we are looking at a snapshot of the system when the command runs, I decided to use ps as the basis for this (top was also an option, but it is designed to be used interactively - although it can be run non-interactively with -b -n <count>, which is batch mode for a fixed number of iterations).

By itself, ps isn’t all that useful. It shows only the processes attached to the terminal that invoked the command, so the information is fairly sparse. According to the man page, we can use -e (or, equally, -A) to select all processes. This returns a table with PID TTY TIME CMD as headers.

Now the trick is to find CPU usage for every process and sum it up. Fortunately man ps can also help with this: the -o option allows us to use a specific format for the output, one of which is pcpu, or “percent CPU”. Thus we have a list of CPU usage per process at the moment we run the command.
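For instance (the exact figures will of course vary from system to system, but the column comes with a %CPU header row):

```sh
# one percent-CPU value per process, under a %CPU header
ps -eo pcpu | head -5
```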

So to get total CPU utilization we need to add the column. Simple enough: awk '{total += $1}END{print total}'. As a sanity check, I compared the output of that command with the idle time in top - they should roughly match up, right?

Wrong!

top was reporting idle time of about 98%, yet the output of my command was about 40. Turns out the issue is that ps reports each process’s CPU usage as a percentage of a single core. My computer has a 16-core CPU, and sure enough, dividing the ps total by my core count accounted for the difference: 40 / 16 = 2.5.
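A toy version of the mismatch, using made-up per-process figures (not real ps output) that sum to 40 on a hypothetical 16-core box:

```sh
# fake pcpu column summing to 40; dividing by the core count
# recovers the overall utilization that top would report
printf '%s\n' 10.0 12.5 9.5 8.0 |
    awk -v cores=16 '{total += $1} END {print total / cores}'
# prints 2.5
```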

Of course, hard-coding my core count into a script isn’t great. Fortunately, awk supports variable assignment with -v, and the value can come from a command substitution such as $(nproc) (which returns the number of processing units available). This, in the end, completed the totalCPU function:

```sh
totalCPU () {
    # dividing by core count to get total CPU utilization
    ps -eo pcpu | awk -v cores="$(nproc)" '{total += $1} END {print total / cores}'
}
```

## totalMem

To get the total memory utilization of the system, the free command comes in handy. Using awk again, I reformatted the output of free to make it work with the script.

Note that beyond simply printing total and used memory, I wanted to include the cache. This is because free counts memory holding the page cache as neither used nor free, so without the buff/cache column, used and free do not add up to the total.
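With a made-up free data line (the columns are total, used, free, shared, buff/cache, available, here in MiB), the identity is easy to check:

```sh
# sample Mem: line; used + free + buff/cache should equal total
echo "Mem: 15951 6345 1204 512 8402 9120" |
    awk '{ print $3 + $4 + $6, "of", $2 }'
# prints 15951 of 15951
```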

The most complicated part of this, frankly, was learning how to define a function inside awk so that I could calculate the percentages without repeating myself over and over:

```sh
totalMem () {
    # free -m keeps every value in MiB; with -h the units can differ
    # per column (e.g. Mi vs Gi), which would break the percentage math
    free -m | awk '
    function percentage(part, total) {
        return int(part / total * 100) "%"
    }
    NR==2 {
        print "Total:", $2 "Mi", \
              "\nUsed:", $3 "Mi", percentage($3, $2), \
              "\nFree:", $4 "Mi", percentage($4, $2), \
              "\nCache:", $6 "Mi", percentage($6, $2)
    }'
}
```
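The user-defined-function mechanism works just as well outside the script; a minimal standalone sketch of the same percentage() helper on fixed numbers:

```sh
# user-defined awk function, called from a BEGIN block
awk '
function percentage(part, total) {
    return int(part / total * 100) "%"
}
BEGIN { print percentage(4, 16) }'
# prints 25%
```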
}