System Performance Monitoring Tools: Top, Htop, & Iostat
Keeping a system healthy is not just about fixing problems when they appear. It is about knowing what is happening under the hood at all times. That is where system performance monitoring comes in. By watching key metrics like CPU usage, memory consumption, disk I/O, and network activity, administrators can spot early warning signs before they turn into outages or degraded performance.
Most operating systems ship with Top, the classic command-line utility for viewing running processes. It is functional but limited. Over time, more powerful tools have emerged that give administrators a deeper view of system health. In this article, we will focus on two of the most widely used: Htop, an interactive and user-friendly alternative to Top, and Iostat, a specialized tool for monitoring I/O and CPU statistics.
Our goal is to give you a practical guide to these tools:
- What they are and why they matter
- How to install and set them up
- Step-by-step examples of how to use them effectively
- Pitfalls to avoid when interpreting their output
- How they fit into modern monitoring practices alongside tools like Prometheus or Grafana
Whether you are running a busy web server, a mission-critical database, or just your personal workstation, having the right monitoring tools can mean the difference between smooth performance and hours of painful troubleshooting.
Understanding System Performance Monitoring
What is System Performance Monitoring?
System performance monitoring means continuously observing how your system behaves to ensure reliable and efficient operation under different workloads. Key metrics include:
- CPU usage: How much processing power is consumed.
- Memory usage: Available vs. consumed RAM.
- Disk I/O: How fast data is read from or written to storage.
- Network activity: Inbound and outbound traffic.
By combining these, administrators get a full picture of system health. For example, if the CPU seems busy but Iostat shows the CPU is mostly in %iowait, the real culprit may be slow storage, not lack of compute.
Importance of Monitoring Tools
Monitoring tools are critical for both proactive monitoring (spotting problems before they escalate) and reactive troubleshooting (figuring out what went wrong).
Example: An e-commerce site sees checkout slowdowns during traffic spikes. Without monitoring, engineers might blame the application code. With monitoring, they can see whether the slowdown comes from CPU saturation, memory leaks, or a storage bottleneck.
Monitoring tools also help establish baselines. A database that normally uses 40% CPU is fine — but if it suddenly jumps to 80% at idle times, you know something is off.
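You do not need a heavyweight stack to start collecting baselines. Even Top can capture periodic, non-interactive snapshots you can compare against later; the path and schedule below are purely illustrative.
# one-shot, non-interactive snapshot of load, memory, and the process table
top -b -n 1 > /tmp/top-baseline-$(date +%F-%H%M).txt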
Installation & Setup
Before using Top, Htop, or Iostat, you need to install them.
Installing Top
Top is part of the procps toolset and ships preinstalled on virtually every Linux distribution, so there is usually nothing to install. If it is missing, install the package that provides it:
Debian/Ubuntu
sudo apt update
sudo apt install procps
RHEL/CentOS/Fedora
sudo yum install procps-ng # On older systems
sudo dnf install procps-ng # On newer systems
macOS
macOS ships its own BSD version of top out of the box; its flags and display differ from the Linux version, but no installation is required.
Run with:
top
Installing Htop
Debian/Ubuntu
sudo apt update
sudo apt install htop
RHEL/CentOS/Fedora
sudo yum install htop # On older systems
sudo dnf install htop # On newer systems
macOS (using Homebrew)
brew install htop
Run with:
htop
Installing Iostat
Iostat is part of the sysstat package.
Debian/Ubuntu
sudo apt update
sudo apt install sysstat
RHEL/CentOS/Fedora
sudo yum install sysstat
sudo dnf install sysstat
macOS
sysstat targets Linux, so there is no equivalent Homebrew package. macOS ships its own BSD iostat out of the box, though its options differ from the Linux version covered in this article.
Run with:
iostat
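A quick sanity check after installing is to print the sysstat version, which confirms the binary on your PATH is the one you just installed:
iostat -V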
Top: The Baseline Monitor
You SSH into a production box that feels sluggish. You do not have package install rights, and you need a quick read on the system. You type:
top
The screen paints itself in two parts. A compact summary at the top tells you uptime, load average, CPU breakdown, and how memory is split between used, free, buffers, and cache. The scrolling table below is the live parade of processes that are competing for those resources.
14:24:18 up 5 days, 4:21, 3 users, load average: 3.12, 2.47, 2.03
Tasks: 285 total, 2 running, 283 sleeping, 0 stopped, 0 zombie
%Cpu(s): 35.2 us, 7.8 sy, 0.0 ni, 54.3 id, 2.1 wa, 0.2 hi, 0.4 si, 0.0 st
MiB Mem : 32023.5 total, 1212.4 free, 21456.8 used, 9354.3 buff/cache
MiB Swap: 4095.0 total, 4078.8 free, 16.2 used. 9187.0 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12873 www-data 20 0 406516 38524 14920 S 78.6 0.1 2:31.45 php-fpm: pool www
9871 postgres 20 0 1812452 512896 36024 S 52.3 1.6 18:44.90 postgres: writer process
15422 ubuntu 20 0 1542036 312940 87244 S 31.1 1.0 9:02.33 python3 /srv/jobs/ingest.py
6723 root 20 0 4235988 945632 41872 S 24.7 2.9 36:10.12 java -jar app.jar
7310 root 20 0 167680 40284 12996 S 12.9 0.1 4:55.67 dockerd
7441 root 20 0 29840 12924 7528 S 7.1 0.0 0:33.12 rsyslogd
16341 ubuntu 20 0 21424 8400 5724 R 4.3 0.0 0:00.39 top
2215 root 20 0 141504 25412 14488 S 2.1 0.1 1:22.07 systemd-journald
3054 root 20 0 116544 18880 13208 S 1.2 0.1 0:18.51 sshd: ubuntu@pts/2
9021 www-data 20 0 175312 22160 11208 S 0.9 0.1 0:05.48 nginx: worker process
9019 www-data 20 0 175312 22080 11136 S 0.8 0.1 0:05.02 nginx: worker process
1160 root 20 0 11632 7648 5620 S 0.3 0.0 0:03.11 cron
For the first 30 seconds you do not touch a thing. You watch the numbers settle. If the load average hovers well above the number of CPU cores, that hints at a queue forming. If memory free is tiny but cache is large, the kernel may simply be doing its job. If swap usage creeps up and the system feels sticky, you are likely paging.
Now you begin to steer. You press P to sort by CPU when a spike hits, then M to switch to memory when the spike does not explain client timeouts. You press 1 to expand the CPU line into per-core usage, useful on hosts where a single hot thread pins one core and leaves others idle. You press c to reveal full command lines, which turns a vague python3 into a clear python3 /srv/jobs/ingest.py. If you must end a process, you press k, type the PID that is highlighted, and confirm the default TERM signal.
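Most of these views are also available non-interactively, which helps when you want to save output or attach it to a ticket. A small sketch, assuming a reasonably recent procps-ng top (very old versions lack the -o sort flag):
# batch mode: three refreshes, five seconds apart, written to a file
top -b -d 5 -n 3 > /tmp/top-capture.txt
# single refresh sorted by resident memory instead of CPU
top -b -n 1 -o %MEM | head -n 25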
Top is terse, which is part of its utility. You can land on a bare system and get a working mental model in under a minute. You learn whether the problem smells like runaway CPU, memory pressure, or something that Top cannot show directly, such as a disk that is too slow to answer.
There are limits. Top will not explain why %iowait is climbing or which device is saturated. It will not show an intuitive tree of parent and child processes. When the scene calls for richer interaction, you reach for Htop. When the mystery shifts to storage latency or device utilization, you run iostat to confirm or rule out an I/O bottleneck.
In the next section we use Htop to work the same kinds of incidents with a clearer view, faster navigation, and safer actions. After that, iostat gives the low-level truth about disks and I/O wait, which Top can only hint at.
Deep Dive into Htop
What is Htop?
Htop is an interactive process viewer for Unix-like systems. Think of it as “Top on steroids.” It provides:
- A colorful, real-time display of CPU, memory, and swap usage.
- Interactive navigation to scroll, filter, or search processes.
- The ability to kill or renice processes directly without looking up PIDs.
- Visibility into threads and multi-core usage.
Use cases include identifying CPU- or memory-heavy processes, monitoring load distribution across cores, and catching runaway processes quickly.
Usage Basics
Scenario: You're on call and a colleague says, “the web server is slow.”
You type:
htop
The top half shows colorful CPU bars. One core is running hot while the others have plenty of headroom. In the process list, a Python process is hogging CPU.
Press F6 to sort by CPU usage — the culprit stays on top. Use arrow keys to highlight it, then press F9 to kill. The CPU bar drops, and the server steadies.
Key basics illustrated:
- Launch with htop.
- Sort with F6.
- Kill with F9.
┌───────────────────────────────────────────────── System ─────────────────────────────────────────────────┐
│ 1 [||||||||||||||||||||||||||| 75%] Tasks: 198, 1 running │
│ 2 [||||||||||||| 35%] Load average: 1.24 0.92 0.67 │
│ 3 [||||||||||||||||||||||||||||||||| 88%] Uptime: 12 days, 01:23:45 │
│ 4 [||||||||| 22%] │
│ Mem[||||||||||||||||||||||||||||| 12.3G/31.3G] Swp[| 64.0M/4.0G] │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────┘
PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
12873 www-data 20 0 406.5M 37.6M 14.6M S 78.5 0.1 2:31.52 php-fpm: pool www
15422 ubuntu 20 0 1.5G 305.6M 85.2M S 31.0 1.0 9:02.54 python3 /srv/jobs/ingest.py
9871 postgres 20 0 1.8G 500.9M 35.2M S 28.7 1.6 18:45.03 postgres: writer process
6723 root 20 0 4.0G 923.4M 41.2M S 22.4 2.9 36:10.77 java -jar app.jar
7310 root 20 0 163.8M 39.3M 12.7M S 7.0 0.1 4:55.91 dockerd
9021 www-data 20 0 171.2M 21.4M 11.0M S 1.0 0.1 0:05.62 nginx: worker process
9019 www-data 20 0 171.2M 21.3M 11.0M S 0.9 0.1 0:05.18 nginx: worker process
2215 root 20 0 138.2M 24.8M 14.2M S 0.5 0.1 1:22.19 systemd-journald
3054 root 20 0 113.8M 18.5M 13.0M S 0.3 0.1 0:18.58 sshd: ubuntu@pts/2
16341 ubuntu 20 0 21.1M 8.2M 5.5M R 0.2 0.0 0:00.41 htop
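If htop is not available, or you simply prefer the shell, a rough equivalent of the sort-and-kill flow above looks like this (the script path and PID are taken from the snapshot, so adjust them to your own culprit):
# find the PID(s) of the runaway script
pgrep -af ingest.py
# ask it to exit cleanly first; escalate to KILL only if TERM is ignored
kill -TERM 15422
# kill -KILL 15422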
Customization & Advanced Features
Scenario: A server's memory is creeping upward, and you suspect a leak.
- Sort by memory: press F6 and select %MEM.
- Press F4 to filter for the suspect service.
- Toggle F5 (tree view) to see parent-child relationships.
- Adjust refresh rate to capture changes more precisely.
- Customize fields to include I/O activity and priority.
Over time, you notice one child process's memory usage growing without release. Htop lets you confirm the leak in real time.
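To turn that impression into evidence, it helps to sample the process's resident set size on a schedule and keep the numbers. A minimal sketch, assuming you already know the suspect PID (44210 here is just a placeholder):
# append a timestamped RSS sample (in KiB) once a minute
while true; do
  echo "$(date '+%F %T') $(ps -o rss= -p 44210)" >> /tmp/rss-44210.log
  sleep 60
done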
Deep Dive into Iostat
What is Iostat?
Iostat (Input/Output Statistics) is part of the sysstat package. It answers two questions:
- Is the CPU spending more time waiting on I/O than doing work?
- Are disks or storage devices keeping up with demand?
Key Metrics in Iostat
Metric | Meaning |
---|---|
%user | CPU time spent in user space. |
%system | CPU time in kernel space. |
%iowait | CPU time waiting on disk I/O. |
%idle | CPU time with nothing to do. |
r/s, w/s | Reads and writes per second. |
kB_read/s, kB_wrtn/s | Throughput per device. |
await | Average time for I/O requests (ms). |
%util | Device busy percentage (saturation). |
Usage Basics
Scenario: A database feels slow even though CPU looks fine.
Run:
iostat -xz 2 3
You see %util for /dev/sda at 100% and await over 180 ms. That means the disk is saturated, not the CPU.
Other useful forms:
- iostat -c 1 5: CPU-only breakdown.
- iostat -d 2 3: Device-only stats.
- iostat -t -x 5: Extended stats with timestamps for logging.
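One detail that trips people up: the first report iostat prints is an average since boot, not a live sample, so focus on the later intervals. Recent sysstat releases also accept -y, which suppresses that first report entirely:
# show only live intervals, skipping the since-boot summary
iostat -y -xz 2 3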
Advanced Iostat in Action
Iostat becomes a true diagnostic tool when you apply advanced flags to real scenarios.
Option | Purpose | Example Use |
---|---|---|
-c | CPU stats only | Spot I/O wait vs. real CPU load. |
-d | Device stats only | Focus on disk throughput. |
-x | Extended stats | Await and utilization visibility. |
-p | Per-partition view | RAID imbalance or hot partitions. |
-t | Timestamps | Correlate stats with events. |
-k/-m | Human-readable throughput | Interpret numbers at a glance. |
# Extended per-device stats, refresh every 2 seconds
iostat -xz 2
# CPU-only breakdown, helpful to confirm or rule out iowait
iostat -c 1 5
# Per-partition view for a single device
iostat -p sdc 2 5
# Extended stats with timestamps for correlation in logs
iostat -t -x 5
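For longer incidents it can be worth leaving a timestamped capture running in the background and correlating it with application logs afterwards; the log path and interval here are only examples, and writing under /var/log generally requires root:
# append extended, timestamped stats every 60 seconds until stopped
nohup iostat -t -x 60 >> /var/log/iostat-capture.log 2>&1 &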
Fun Real-World Scenarios
Scenario 1: CPU-bound single hot thread
What to notice: One process dominates CPU. Low %iowait indicates this is compute, not disk.
top snapshot
11:42:03 up 12 days, 6:55, 2 users, load average: 3.98, 3.62, 2.11
Tasks: 291 total, 2 running, 289 sleeping, 0 stopped, 0 zombie
%Cpu(s): 84.1 us, 6.2 sy, 0.0 ni, 8.7 id, 0.4 wa, 0.1 hi, 0.5 si, 0.0 st
MiB Mem : 32023.5 total, 1911.4 free, 20580.7 used, 9529.9 buff/cache
MiB Swap: 4095.0 total, 4095.0 free, 0.0 used. 9931.2 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
215732 ubuntu 20 0 512.3m 92.8m 16.7m R 285.3 0.3 3:18.55 python3 /srv/jobs/transform.py
16730 www-data 20 0 404.9m 36.4m 14.3m S 24.7 0.1 0:51.14 php-fpm: pool www
26311 root 20 0 1.9g 902.1m 41.1m S 17.2 2.8 42:10.12 java -jar app.jar
17455 postgres 20 0 1.8g 502.7m 35.4m S 12.0 1.5 19:44.93 postgres: writer process
htop snapshot
┌───────────────────────────────────────────────── System ─────────────────────────────────────────────────┐
│ 1 [||||||||||||||||||||||||||||||||||||||||| 100%] Tasks: 291, 2 running │
│ 2 [||||| 16%] Load average: 3.98 3.62 2.11 │
│ 3 [|||||| 22%] Uptime: 12 days, 06:55:09 │
│ 4 [|||||| 24%] │
│ Mem[|||||||||||||||||||||||||||| 20.6G/31.3G] Swp[ 0/4.0G] │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────┘
PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
215732 ubuntu 20 0 512.3M 92.8M 16.7M R 285.2 0.3 3:18.7 python3 /srv/jobs/transform.py
26311 root 20 0 1.9G 902.1M 41.1M S 18.1 2.8 42:10.9 java -jar app.jar
17455 postgres 20 0 1.8G 502.7M 35.4M S 12.7 1.5 19:45.3 postgres: writer process
16730 www-data 20 0 404.9M 36.4M 14.3M S 6.1 0.1 0:51.2 php-fpm: pool www
iostat CPU-only check (confirms low I/O wait)
$ iostat -c 1 3
Linux 5.15.0-78-generic (host) 09/16/2025 _x86_64_ (8 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
82.37 0.00 6.51 0.92 0.00 10.20
avg-cpu: %user %nice %system %iowait %steal %idle
83.10 0.00 6.20 0.88 0.00 9.82
avg-cpu: %user %nice %system %iowait %steal %idle
84.05 0.00 6.01 0.79 0.00 9.15
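To confirm it really is a single hot thread rather than an evenly busy multi-threaded process, top's thread view helps; the PID comes from the snapshot above. (In Htop, pressing H toggles the same per-thread display.)
# per-thread CPU usage for the transform job
top -H -p 215732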
Scenario 2: Memory leak and growing RSS
What to notice: One process climbs in %MEM, free memory shrinks, and swap begins to be used.
top snapshot
02:11:44 up 21 days, 3:02, 1 user, load average: 0.91, 0.77, 0.61
Tasks: 203 total, 1 running, 202 sleeping, 0 stopped, 0 zombie
%Cpu(s): 7.2 us, 3.3 sy, 0.0 ni, 87.6 id, 1.6 wa, 0.0 hi, 0.3 si, 0.0 st
MiB Mem : 64220.0 total, 1222.1 free, 55841.3 used, 7156.6 buff/cache
MiB Swap: 16384.0 total, 8196.7 free, 8187.3 used. 2742.5 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
44210 app 20 0 12.3g 9.8g 126.2m S 5.7 15.6 211:44.19 node /srv/app/index.js
11873 root 20 0 904.2m 322.7m 42.1m S 2.0 0.5 10:12.77 java -jar worker.jar
9123 postgres 20 0 1.8g 512.1m 35.9m S 1.3 0.8 86:33.52 postgres: wal writer
htop snapshot
┌───────────────────────────────────────────────── System ─────────────────────────────────────────────────┐
│ 1 [||| 11%] Tasks: 203, 1 running │
│ 2 [|| 7%] Load average: 0.91 0.77 0.61 │
│ 3 [|| 6%] Uptime: 21 days, 03:02:11 │
│ 4 [| 4%] │
│ Mem[|||||||||||||||||||||||||||||||||||||||| 54.6G/62.7G] Swp[||||||||| 8.0G/16.0G] │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────┘
PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
44210 app 20 0 12.3G 9.8G 126.2M S 5.7 15.6 211:44.4 node /srv/app/index.js
11873 root 20 0 904.2M 322.7M 42.1M S 2.0 0.5 10:12.8 java -jar worker.jar
9123 postgres 20 0 1.8G 512.1M 35.9M S 1.3 0.8 86:33.6 postgres: wal writer
(No iostat needed here. Disk is fine. The problem is memory growth in one process.)
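A quick vmstat run (vmstat ships with procps, so it is normally already installed) confirms whether the box is actively paging or merely holding old pages in swap; non-zero values in the si and so columns mean pages are moving in and out right now:
# sample every 5 seconds, 3 times; watch the si/so columns
vmstat 5 3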
Scenario 3: Disk bottleneck with high I/O wait
What to notice: High %wa in top. iostat -xz shows long await and %util near 100 percent on a device.
top snapshot
16:05:27 up 8 days, 9:00, 4 users, load average: 7.41, 6.88, 5.95
Tasks: 317 total, 1 running, 316 sleeping, 0 stopped, 0 zombie
%Cpu(s): 12.4 us, 5.1 sy, 0.0 ni, 49.8 id, 31.9 wa, 0.2 hi, 0.6 si, 0.0 st
MiB Mem : 32023.5 total, 1420.6 free, 22411.9 used, 9191.0 buff/cache
MiB Swap: 4095.0 total, 3902.7 free, 192.3 used. 8032.2 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
30121 postgres 20 0 1.9g 934.8m 52.1m S 8.7 2.8 10:12.44 postgres: checkpointer
9871 postgres 20 0 1.8g 521.1m 36.1m S 7.9 1.6 23:44.90 postgres: writer process
7441 root 20 0 298.4m 129.4m 7.5m S 3.2 0.4 3:33.12 rsyslogd
iostat -xz 2 3 snapshot
$ iostat -xz 2 3
Linux 5.15.0-78-generic (host) 09/16/2025 _x86_64_ (8 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
13.11 0.00 4.92 30.88 0.00 51.09
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s r_await w_await aqu-sz rareq-sz wareq-sz %util
sda 12.3 185.4 864.2 24356.1 0.1 1.2 25.0 178.4 2.1 70.2 131.3 99.4
sdb 0.2 0.3 10.7 4.5 0.0 0.0 1.9 2.1 0.00 49.0 15.0 0.3
dm-0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.0
Interpretation hint: sda has w_await ~178 ms and %util ~99 percent. The disk cannot keep up with writes. Consider faster storage, batching, or moving write-heavy components.
Scenario 4: Per-partition or RAID imbalance
What to notice: One member of an array is saturated while others are fine.
iostat -p sdc 2 3 snapshot
$ iostat -p sdc 2 3
Linux 5.15.0-78-generic (host) 09/16/2025 _x86_64_ (8 CPU)
Device r/s w/s rkB/s wkB/s r_await w_await aqu-sz rareq-sz wareq-sz %util
sdc 18.2 145.7 1250.1 18742.6 22.9 95.3 1.7 68.7 128.6 95.8
sdc1 0.3 1.1 11.9 96.3 1.7 2.4 0.00 39.7 87.0 0.4
sdc2 17.8 144.6 1238.1 18646.1 23.2 96.1 1.7 69.5 129.0 95.4
Comparison device (healthy peer)
$ iostat -xz 2 1 | grep sdd
sdd 2.1 3.7 156.0 492.1 0.0 0.1 2.1 2.6 0.01 74.3 132.9 5.3
Interpretation hint: sdc sits near 95 to 96 percent utilization while a peer like sdd idles near 5 percent. Expect a failing or throttled device, a misbalanced RAID, or a hot partition.
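Two follow-up checks are worth running, assuming Linux software RAID (md) and that smartmontools is installed; for a hardware controller you would use the vendor's own utility instead:
# software RAID status: look for a degraded array ([U_]) or a rebuild in progress
cat /proc/mdstat
# SMART health and error counters for the suspect device
sudo smartctl -a /dev/sdc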
Scenario 5: Healthy storage baseline for comparison
What to notice: Low await, low %util, balanced reads and writes. Useful as a “known good” reference.
iostat -xz 1 3 snapshot
$ iostat -xz 1 3
Linux 5.15.0-78-generic (host) 09/16/2025 _x86_64_ (8 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
12.88 0.00 3.77 0.62 0.00 82.73
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s r_await w_await aqu-sz rareq-sz wareq-sz %util
nvme0n1 22.4 18.6 2850.7 4120.9 0.0 0.0 1.1 1.4 0.05 127.2 221.6 3.2
nvme1n1 20.9 21.4 2749.1 4387.5 0.0 0.0 1.0 1.2 0.05 131.6 205.1 3.0
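Capture output like this to a file while the system is healthy; having a known-good reference to diff against makes the next incident much easier to read. The path below is only an example:
# ten one-minute samples of extended stats saved as today's baseline
iostat -xz 60 10 > /tmp/iostat-baseline-$(date +%F).txt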
Bringing It All Together
Htop and Iostat complement each other. One shows interactive, process-level activity. The other diagnoses storage and CPU balance. Used together, they let you answer both “what is eating resources now?” and “is the system bottlenecked at the disks?”
Comparison at a Glance
Tool | Best For | Key Strengths | Limitations |
---|---|---|---|
Htop | Process monitoring | Color-coded, interactive, easy to kill/renice processes | No I/O visibility, not built for logging |
Iostat | CPU + I/O analysis | Extended stats, reveals bottlenecks | Static output, requires interpretation |
Together | End-to-end troubleshooting | Process + device visibility | Still snapshots; long-term monitoring needed |
Top, htop, and iostat are like stethoscopes. They are fast, direct, and essential for real-time diagnosis. Modern stacks like Prometheus + Grafana are like medical charts — they show long-term history and trends. Both matter, but when something breaks now, Htop and Iostat are the tools you'll be glad you know.