
System Performance Monitoring Tools: Top, Htop, & Iostat

Keeping a system healthy is not just about fixing problems when they appear. It is about knowing what is happening under the hood at all times. That is where system performance monitoring comes in. By watching key metrics like CPU usage, memory consumption, disk I/O, and network activity, administrators can spot early warning signs before they turn into outages or degraded performance.

Most operating systems ship with Top, the classic command-line utility for viewing running processes. It is functional but limited. Over time, more powerful tools have emerged that give administrators a deeper view of system health. In this article, we will focus on two of the most widely used: Htop, an interactive and user-friendly alternative to Top, and Iostat, a specialized tool for monitoring I/O and CPU statistics.

Our goal is to give you a practical guide to these tools:

  • What they are and why they matter
  • How to install and set them up
  • Step-by-step examples of how to use them effectively
  • Pitfalls to avoid when interpreting their output
  • How they fit into modern monitoring practices alongside tools like Prometheus or Grafana

Whether you are running a busy web server, a mission-critical database, or just your personal workstation, having the right monitoring tools can mean the difference between smooth performance and hours of painful troubleshooting.

Understanding System Performance Monitoring

What is System Performance Monitoring?

System performance monitoring means continuously observing how your system behaves to ensure reliable and efficient operation under different workloads. Key metrics include:

  • CPU usage: How much processing power is consumed.
  • Memory usage: Available vs. consumed RAM.
  • Disk I/O: How fast data is read from or written to storage.
  • Network activity: Inbound and outbound traffic.

By combining these, administrators get a full picture of system health. For example, if the CPU seems busy but Iostat shows the CPU is mostly in %iowait, the real culprit may be slow storage, not lack of compute.
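
Each of these metrics can be spot-checked from the shell before reaching for a dedicated tool. The commands below are a quick sketch using common Linux utilities; the interface name eth0 is a placeholder for whatever your host actually uses, and iostat requires the sysstat package covered later in this article.

uptime                      # load average: a rough proxy for CPU and run-queue pressure
free -h                     # memory: used vs. free vs. buff/cache vs. available
iostat -x 1 3               # disk I/O: throughput, await, and %util (needs sysstat)
ip -s link show eth0        # network: RX/TX byte and error counters for one interface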

Importance of Monitoring Tools

Monitoring tools are critical for both proactive monitoring (spotting problems before they escalate) and reactive troubleshooting (figuring out what went wrong).

Example: An e-commerce site sees checkout slowdowns during traffic spikes. Without monitoring, engineers might blame the application code. With monitoring, they can see whether the slowdown comes from CPU saturation, memory leaks, or a storage bottleneck.

Monitoring tools also help establish baselines. A database that normally uses 40% CPU is fine — but if it suddenly jumps to 80% at idle times, you know something is off.
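
One low-tech way to capture such a baseline, assuming sysstat is already installed (see the installation section below), is to log timestamped extended stats on a fixed interval during a known-good period and keep the file for later comparison. The path and interval here are arbitrary examples:

# Append a timestamped sample every 60 seconds; stop with Ctrl+C when you have enough
iostat -t -x 60 >> /var/tmp/io-baseline.log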

Installation & Setup

Top ships with virtually every system, but Htop and Iostat usually need to be installed.

Installing Top

On Linux, Top is provided by the procps (Debian/Ubuntu) or procps-ng (RHEL/CentOS/Fedora) package and is almost always preinstalled. If it is missing:

Debian/Ubuntu

sudo apt update
sudo apt install procps

RHEL/CentOS/Fedora

sudo yum install procps-ng        # On older systems
sudo dnf install procps-ng        # On newer systems

macOS

macOS includes its own BSD version of top, so there is nothing to install.

Run with:

top

Installing Htop

Debian/Ubuntu

sudo apt update
sudo apt install htop

RHEL/CentOS/Fedora

sudo yum install htop        # On older systems
sudo dnf install htop        # On newer systems

macOS (using Homebrew)

brew install htop

Run with:

htop

Installing Iostat

Iostat is part of the sysstat package.

Debian/Ubuntu

sudo apt update
sudo apt install sysstat

RHEL/CentOS/Fedora

sudo yum install sysstat        # On older systems
sudo dnf install sysstat        # On newer systems

macOS

The sysstat package targets Linux, so there is no Homebrew install. macOS ships its own BSD iostat, whose flags differ from the Linux examples in this article.

Run with:

iostat
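
Before an incident hits, it is worth confirming all three tools are actually on the PATH. In bash, a one-liner like this prints the location of each tool it finds (exact paths vary by distro):

command -v top htop iostat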

Top: the baseline monitor

You SSH into a production box that feels sluggish. You do not have package install rights, and you need a quick read on the system. You type:

top

The screen paints itself in two parts. A compact summary at the top tells you uptime, load average, CPU breakdown, and how memory is split between used, free, buffers, and cache. The scrolling table below is the live parade of processes that are competing for those resources.

14:24:18 up 5 days,  4:21,  3 users,  load average: 3.12, 2.47, 2.03
Tasks: 285 total,   2 running, 283 sleeping,   0 stopped,   0 zombie
%Cpu(s): 35.2 us,  7.8 sy,  0.0 ni, 54.3 id,  2.1 wa,  0.2 hi,  0.4 si,  0.0 st
MiB Mem :  32023.5 total,   1212.4 free,  21456.8 used,   9354.3 buff/cache
MiB Swap:   4095.0 total,   4078.8 free,     16.2 used.   9187.0 avail Mem
 
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  12873 www-data  20   0  406516  38524  14920 S  78.6  0.1   2:31.45 php-fpm: pool www
   9871 postgres  20   0 1812452 512896  36024 S  52.3  1.6  18:44.90 postgres: writer process
  15422 ubuntu    20   0 1542036 312940  87244 S  31.1  1.0   9:02.33 python3 /srv/jobs/ingest.py
   6723 root      20   0 4235988 945632  41872 S  24.7  2.9  36:10.12 java -jar app.jar
   7310 root      20   0  167680  40284  12996 S  12.9  0.1   4:55.67 dockerd
   7441 root      20   0   29840  12924   7528 S   7.1  0.0   0:33.12 rsyslogd
  16341 ubuntu    20   0   21424   8400   5724 R   4.3  0.0   0:00.39 top
   2215 root      20   0  141504  25412  14488 S   2.1  0.1   1:22.07 systemd-journald
   3054 root      20   0  116544  18880  13208 S   1.2  0.1   0:18.51 sshd: ubuntu@pts/2
   9021 www-data  20   0  175312  22160  11208 S   0.9  0.1   0:05.48 nginx: worker process
   9019 www-data  20   0  175312  22080  11136 S   0.8  0.1   0:05.02 nginx: worker process
   1160 root      20   0   11632   7648   5620 S   0.3  0.0   0:03.11 cron

For the first 30 seconds you do not touch a thing. You watch the numbers settle. If the load average hovers well above the number of CPU cores, that hints at a queue forming. If free memory is tiny but buff/cache is large, the kernel may simply be doing its job. If swap usage keeps growing and the system feels sticky, you are likely paging.

Now you begin to steer. You press P to sort by CPU when a spike hits, then M to switch to memory when the spike does not explain client timeouts. You press 1 to expand the CPU line into per-core usage, useful on hosts where a single hot thread pins one core and leaves others idle. You press c to reveal full command lines, which turns a vague python into a clear python manage.py ingest. If you must end a process, you press k, type the PID that is highlighted, and confirm the default TERM signal.
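
The same view can be captured non-interactively, which is handy for pasting into a ticket or comparing before and after a change. This is a sketch that relies on the procps-ng version of top, which accepts -o to choose the sort field:

# One batch-mode snapshot, sorted by CPU, trimmed to the summary plus the busiest processes
top -b -n 1 -o %CPU | head -n 20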

Top is terse, which is part of its utility. You can land on a bare system and get a working mental model in under a minute. You learn whether the problem smells like runaway CPU, memory pressure, or something that Top cannot show directly, such as a disk that is too slow to answer.

There are limits. Top will not explain why %iowait is climbing or which device is saturated. It will not show an intuitive tree of parent and child processes. When the scene calls for richer interaction, you reach for Htop. When the mystery shifts to storage latency or device utilization, you run iostat to confirm or rule out an I/O bottleneck.

In the next section we use Htop to work the same kinds of incidents with a clearer view, faster navigation, and safer actions. After that, iostat gives the low-level truth about disks and I/O wait, which Top can only hint at.

Deep Dive into Htop

What is Htop?

Htop is an interactive process viewer for Unix-like systems. Think of it as “Top on steroids.” It provides:

  • A colorful, real-time display of CPU, memory, and swap usage.
  • Interactive navigation to scroll, filter, or search processes.
  • The ability to kill or renice processes directly without looking up PIDs.
  • Visibility into threads and multi-core usage.

Use cases include identifying CPU- or memory-heavy processes, monitoring load distribution across cores, and catching runaway processes quickly.

Usage Basics

Scenario: You're on call and a colleague says, “the web server is slow.”

You type:

htop

The top half shows colorful per-core CPU bars. One core is pinned far higher than the rest. In the process list, a php-fpm worker and a long-running Python ingest job are eating most of the CPU.

Press F6 and select CPU% to sort by processor usage; the culprits stay at the top. Use the arrow keys to highlight the runaway process, then press F9 to send it a signal (SIGTERM by default). The CPU bars drop, and the server steadies.

Key basics illustrated:

  • Launch with htop.
  • Sort with F6.
  • Kill with F9.

┌───────────────────────────────────────────────── System ─────────────────────────────────────────────────┐
 1  [|||||||||||||||||||||||||||           75%]   Tasks: 198, 1 running
 2  [|||||||||||||                          35%]   Load average: 1.24 0.92 0.67
 3  [|||||||||||||||||||||||||||||||||     88%]   Uptime: 12 days, 01:23:45
 4  [|||||||||                              22%]
 Mem[|||||||||||||||||||||||||||||   12.3G/31.3G]  Swp[|                     64.0M/4.0G]
└───────────────────────────────────────────────────────────────────────────────────────────────────────────┘
  PID  USER       PRI  NI   VIRT    RES    SHR S  CPU%  MEM%   TIME+   Command
12873 www-data     20   0 406.5M  37.6M  14.6M S  78.5   0.1   2:31.52 php-fpm: pool www
15422 ubuntu       20   0   1.5G 305.6M  85.2M S  31.0   1.0   9:02.54 python3 /srv/jobs/ingest.py
 9871 postgres     20   0   1.8G 500.9M  35.2M S  28.7   1.6  18:45.03 postgres: writer process
 6723 root         20   0   4.0G 923.4M  41.2M S  22.4   2.9  36:10.77 java -jar app.jar
 7310 root         20   0 163.8M  39.3M  12.7M S   7.0   0.1   4:55.91 dockerd
 9021 www-data     20   0 171.2M  21.4M  11.0M S   1.0   0.1   0:05.62 nginx: worker process
 9019 www-data     20   0 171.2M  21.3M  11.0M S   0.9   0.1   0:05.18 nginx: worker process
 2215 root         20   0 138.2M  24.8M  14.2M S   0.5   0.1   1:22.19 systemd-journald
 3054 root         20   0 113.8M  18.5M  13.0M S   0.3   0.1   0:18.58 sshd: ubuntu@pts/2
16341 ubuntu       20   0  21.1M   8.2M   5.5M R   0.2   0.0   0:00.41 htop
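
If you already suspect a particular program, htop can also be started with only the matching processes visible. A minimal sketch, assuming the suspect is the Python ingest job from the snapshot above:

# Pass a comma-separated PID list to htop's -p flag
htop -p "$(pgrep -d, -f ingest.py)"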

Customization & Advanced Features

Scenario: A server's memory is creeping upward, and you suspect a leak.

  • You sort by memory with F6 and select %MEM.
  • Press F4 to filter for the suspect service.
  • Toggle F5 (tree view) to see parent-child relationships.
  • Adjust refresh rate to capture changes more precisely.
  • Customize fields to include I/O activity and priority.

Over time, you notice one child process's memory usage growing without release. Htop lets you confirm the leak in real time.
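
The same workflow can be started from the command line instead of toggling options interactively. A sketch, with the sort column and refresh interval chosen as examples rather than required values:

# Sort by memory and refresh every 5 seconds (-d is in tenths of a second)
htop -s PERCENT_MEM -d 50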

Deep Dive into Iostat

What is Iostat?

Iostat (Input/Output Statistics) is part of the sysstat package. It answers two questions:

  1. Is the CPU spending more time waiting on I/O than doing work?
  2. Are disks or storage devices keeping up with demand?

Key Metrics in Iostat

  • %user: CPU time spent in user space.
  • %system: CPU time spent in kernel space.
  • %iowait: CPU time spent waiting on disk I/O.
  • %idle: CPU time with nothing to do.
  • r/s, w/s: Reads and writes per second.
  • kB_read/s, kB_wrtn/s: Throughput per device.
  • await: Average time to complete an I/O request (ms).
  • %util: Percentage of time the device is busy (saturation).

Usage Basics

Scenario: A database feels slow even though CPU looks fine.

Run:

iostat -xz 2 3

You see %util for /dev/sda at 100% and await over 180 ms. That means the disk is saturated, not the CPU.

Other useful forms:

  • iostat -c 1 5: CPU-only breakdown.
  • iostat -d 2 3: Device-only stats.
  • iostat -t -x 5: Extended stats with timestamps for logging.

Advanced Iostat in Action

Iostat becomes a true diagnostic tool when you apply advanced flags to real scenarios.

  • -c: CPU stats only. Use it to spot I/O wait vs. real CPU load.
  • -d: Device stats only. Use it to focus on disk throughput.
  • -x: Extended stats. Adds await and utilization visibility.
  • -p: Per-partition view. Reveals RAID imbalance or hot partitions.
  • -t: Timestamps. Correlate stats with events.
  • -k / -m: Report throughput in kB/s or MB/s. Interpret numbers at a glance.

# Extended per-device stats, refresh every 2 seconds
iostat -xz 2
 
# CPU-only breakdown, helpful to confirm or rule out iowait
iostat -c 1 5
 
# Per-partition view for a single device
iostat -p sdc 2 5
 
# Extended stats with timestamps for correlation in logs
iostat -t -x 5

Fun Real-World Scenarios

Scenario 1: CPU-bound runaway process

What to notice: One process dominates CPU. Its %CPU exceeds 100% because top and htop report per-process usage relative to a single core, so a multi-threaded process can consume several cores' worth. Low %iowait indicates this is compute, not disk.

top snapshot

11:42:03 up 12 days,  6:55,  2 users,  load average: 3.98, 3.62, 2.11
Tasks: 291 total,   2 running, 289 sleeping,   0 stopped,   0 zombie
%Cpu(s): 84.1 us,  6.2 sy,  0.0 ni,  8.7 id,  0.4 wa,  0.1 hi,  0.5 si,  0.0 st
MiB Mem :  32023.5 total,   1911.4 free,  20580.7 used,   9529.9 buff/cache
MiB Swap:   4095.0 total,   4095.0 free,      0.0 used.   9931.2 avail Mem
 
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 215732 ubuntu    20   0  512.3m  92.8m  16.7m R  285.3  0.3   3:18.55 python3 /srv/jobs/transform.py
  16730 www-data  20   0  404.9m  36.4m  14.3m S   24.7  0.1   0:51.14 php-fpm: pool www
  26311 root      20   0    1.9g 902.1m  41.1m S   17.2  2.8  42:10.12 java -jar app.jar
  17455 postgres  20   0    1.8g 502.7m  35.4m S   12.0  1.5  19:44.93 postgres: writer process

htop snapshot

┌───────────────────────────────────────────────── System ─────────────────────────────────────────────────┐
 1  [|||||||||||||||||||||||||||||||||||||||||  100%]  Tasks: 291, 2 running
 2  [|||||                                         16%]  Load average: 3.98 3.62 2.11
 3  [||||||                                        22%]  Uptime: 12 days, 06:55:09
 4  [||||||                                        24%]
 Mem[||||||||||||||||||||||||||||   20.6G/31.3G]  Swp[                                        0/4.0G]
└───────────────────────────────────────────────────────────────────────────────────────────────────────────┘
  PID  USER       PRI  NI   VIRT    RES    SHR S  CPU%  MEM%   TIME+   Command
215732 ubuntu      20   0 512.3M  92.8M  16.7M R  285.2   0.3   3:18.7 python3 /srv/jobs/transform.py
 26311 root        20   0   1.9G 902.1M  41.1M S   18.1   2.8  42:10.9 java -jar app.jar
 17455 postgres    20   0   1.8G 502.7M  35.4M S   12.7   1.5  19:45.3 postgres: writer process
 16730 www-data    20   0 404.9M  36.4M  14.3M S    6.1   0.1   0:51.2 php-fpm: pool www

iostat CPU-only check (confirms low I/O wait)

$ iostat -c 1 3
Linux 5.15.0-78-generic (host)  09/16/2025  _x86_64_ (8 CPU)
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          82.37    0.00    6.51    0.92    0.00   10.20
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          83.10    0.00    6.20    0.88    0.00    9.82
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          84.05    0.00    6.01    0.79    0.00    9.15

Scenario 2: Memory leak and growing RSS

What to notice: One process climbs in %MEM, free memory shrinks, swap begins to be used.

top snapshot

02:11:44 up 21 days,  3:02,  1 user,  load average: 0.91, 0.77, 0.61
Tasks: 203 total,   1 running, 202 sleeping,   0 stopped,   0 zombie
%Cpu(s):  7.2 us,  3.3 sy,  0.0 ni, 87.6 id,  1.6 wa,  0.0 hi,  0.3 si,  0.0 st
MiB Mem :  64220.0 total,   1222.1 free,  55841.3 used,   7156.6 buff/cache
MiB Swap:  16384.0 total,   8196.7 free,   8187.3 used.   2742.5 avail Mem
 
    PID USER      PR  NI    VIRT      RES    SHR S  %CPU %MEM     TIME+ COMMAND
  44210 app       20   0   12.3g     9.8g  126.2m S   5.7 15.6  211:44.19 node /srv/app/index.js
  11873 root      20   0  904.2m   322.7m   42.1m S   2.0  0.5   10:12.77 java -jar worker.jar
   9123 postgres  20   0    1.8g   512.1m   35.9m S   1.3  0.8   86:33.52 postgres: wal writer

htop snapshot

┌───────────────────────────────────────────────── System ─────────────────────────────────────────────────┐
 1  [|||                                            11%]  Tasks: 203, 1 running
 2  [||                                              7%]  Load average: 0.91 0.77 0.61
 3  [||                                              6%]  Uptime: 21 days, 03:02:11
 4  [|                                               4%]
 Mem[||||||||||||||||||||||||||||||||||||||||  54.6G/62.7G]  Swp[|||||||||        8.0G/16.0G]
└───────────────────────────────────────────────────────────────────────────────────────────────────────────┘
  PID  USER       PRI  NI   VIRT     RES     SHR S  CPU%  MEM%    TIME+   Command
44210  app         20   0  12.3G   9.8G  126.2M S   5.7   15.6  211:44.4 node /srv/app/index.js
11873  root        20   0 904.2M 322.7M  42.1M  S   2.0    0.5   10:12.8 java -jar worker.jar
 9123  postgres    20   0   1.8G 512.1M  35.9M  S   1.3    0.8   86:33.6 postgres: wal writer

(No iostat needed here. Disk is fine. The problem is memory growth in one process.)
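
To confirm the leak over a longer window than a live htop session, it helps to log the process's resident memory on an interval. A minimal sketch using the PID from the snapshot above; adjust the PID and interval for your own case:

# Print a timestamp and the RSS (in KiB) of PID 44210 once a minute
while true; do echo "$(date +%T) $(ps -o rss= -p 44210)"; sleep 60; done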

Scenario 3: Disk bottleneck with high I/O wait

What to notice: High %wa in top. iostat -xz shows long await and %util near 100 percent on a device.

top snapshot

16:05:27 up 8 days,  9:00,  4 users,  load average: 7.41, 6.88, 5.95
Tasks: 317 total,   1 running, 316 sleeping,   0 stopped,   0 zombie
%Cpu(s): 12.4 us,  5.1 sy,  0.0 ni, 49.8 id, 31.9 wa,  0.2 hi,  0.6 si,  0.0 st
MiB Mem :  32023.5 total,   1420.6 free,  22411.9 used,   9191.0 buff/cache
MiB Swap:   4095.0 total,   3902.7 free,    192.3 used.   8032.2 avail Mem
 
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  30121 postgres  20   0    1.9g  934.8m  52.1m S   8.7  2.8  10:12.44 postgres: checkpointer
   9871 postgres  20   0    1.8g  521.1m  36.1m S   7.9  1.6  23:44.90 postgres: writer process
   7441 root      20   0   298.4m 129.4m   7.5m S   3.2  0.4   3:33.12 rsyslogd

iostat -xz 2 3 snapshot

$ iostat -xz 2 3
Linux 5.15.0-78-generic (host)  09/16/2025  _x86_64_ (8 CPU)
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          13.11    0.00    4.92   30.88    0.00   51.09
 
Device            r/s     w/s   rkB/s   wkB/s  rrqm/s  wrqm/s  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  %util
sda              12.3   185.4   864.2 24356.1     0.1     1.2     25.0   178.4     2.1     70.2     131.3    99.4
sdb               0.2     0.3    10.7     4.5     0.0     0.0      1.9     2.1     0.00    49.0      15.0     0.3
dm-0              0.0     0.0     0.0     0.0     0.0     0.0      0.0     0.0     0.00     0.0       0.0     0.0

Interpretation hint: sda has w_await ~178 ms and %util ~99 percent. The disk cannot keep up with writes. Consider faster storage, batching, or moving write-heavy components.
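
iostat tells you which device is saturated, but not which process is generating the writes. pidstat, which ships in the same sysstat package, can attribute the I/O to individual processes; a quick sketch:

# Per-process disk I/O (kB read and written per second), sampled every 2 seconds, 3 times
pidstat -d 2 3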

Scenario 4: Per-partition or RAID imbalance

What to notice: One member of an array is saturated while others are fine.

iostat -p sdc 2 3 snapshot

$ iostat -p sdc 2 3
Linux 5.15.0-78-generic (host)  09/16/2025  _x86_64_ (8 CPU)
 
Device            r/s     w/s   rkB/s   wkB/s  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  %util
sdc              18.2   145.7  1250.1 18742.6    22.9    95.3      1.7      68.7     128.6   95.8
sdc1              0.3     1.1    11.9    96.3     1.7     2.4      0.00     39.7      87.0    0.4
sdc2             17.8   144.6  1238.1 18646.1    23.2    96.1      1.7      69.5     129.0   95.4

Comparison device (healthy peer)

$ iostat -xz 2 1 | grep sdd
sdd               2.1     3.7   156.0   492.1     0.0     0.1      2.1     2.6     0.01     74.3     132.9    5.3

Interpretation hint: sdc sits near 95 to 96 percent utilization while a peer like sdd idles near 5 percent. Expect a failing or throttled device, misbalanced RAID, or a hot partition.
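
Two quick follow-up checks can narrow this down. If the array is Linux software RAID, /proc/mdstat shows whether it is degraded or resyncing; and smartctl, from the smartmontools package if it is installed, reports the drive's own health status:

# Software RAID state: look for [UU] vs. [_U] and any resync progress
cat /proc/mdstat
# SMART health summary for the busy member (requires root)
sudo smartctl -H /dev/sdc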

Scenario 5: Healthy storage baseline for comparison

What to notice: Low await, low %util, balanced reads and writes. Useful as a “known good” reference.

iostat -xz 1 3 snapshot

$ iostat -xz 1 3
Linux 5.15.0-78-generic (host)  09/16/2025  _x86_64_ (8 CPU)
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          12.88    0.00    3.77    0.62    0.00   82.73
 
Device            r/s     w/s   rkB/s   wkB/s  rrqm/s  wrqm/s  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  %util
nvme0n1          22.4    18.6  2850.7  4120.9     0.0     0.0      1.1     1.4     0.05    127.2     221.6    3.2
nvme1n1          20.9    21.4  2749.1  4387.5     0.0     0.0      1.0     1.2     0.05    131.6     205.1    3.0

Bringing It All Together

Htop and Iostat complement each other. One shows interactive, process-level activity. The other diagnoses storage and CPU balance. Used together, they let you answer both “what is eating resources now?” and “is the system bottlenecked at the disks?”

Comparison at a Glance

  • Htop: best for process monitoring. Strengths: color-coded, interactive, easy to kill or renice processes. Limitations: no I/O visibility, not built for logging.
  • Iostat: best for CPU and I/O analysis. Strengths: extended stats that reveal bottlenecks. Limitations: static output that requires interpretation.
  • Together: end-to-end troubleshooting. Strengths: process plus device visibility. Limitations: still point-in-time snapshots; long-term monitoring is a separate need.

Top, htop, and iostat are like stethoscopes. They are fast, direct, and essential for real-time diagnosis. Modern stacks like Prometheus + Grafana are like medical charts — they show long-term history and trends. Both matter, but when something breaks now, Htop and Iostat are the tools you'll be glad you know.
