Overview
When a Linux system exhausts its available memory (RAM and swap), the kernel invokes the OOM killer to terminate processes and prevent system instability. OOM logs provide critical insights into which processes were targeted and why.
1. Key Components of OOM Logs
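OOM events are written to the kernel ring buffer (read it with dmesg) and, through syslog, to /var/log/messages (RHEL-based systems) or /var/log/syslog (Debian/Ubuntu):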
Mar 24 18:41:04 PLEDXDBOR0G kernel: Out of memory: Killed process 2475067 (postgres) total-vm:2484556kB, anon-rss:143224kB, file-rss:0kB, shmem-rss:452kB, UID:1011 pgtables:588kB oom_score_adj:900
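Field | Description
---|---
total-vm | Total virtual memory mapped by the process, in kB (includes reserved but unused address space).
anon-rss | Resident anonymous memory in kB (heap, stack, private allocations not backed by a file). Unlike file-backed pages it cannot simply be dropped; it can only be swapped out.
file-rss | Resident file-backed pages in kB (binaries, libraries, mapped files).
shmem-rss | Resident shared memory in kB (e.g. PostgreSQL shared memory segments).
UID | User ID owning the process.
pgtables | Memory used by the process's page tables (kernel overhead for its memory mappings), in kB.
oom_score_adj | OOM priority modifier (-1000 to 1000). Higher = more likely to be killed; -1000 exempts the process.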
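The kB fields of a kill line can be extracted and converted with a short awk filter. This is a minimal sketch, assuming the field layout of the sample line above; `oom_kill_summary` is a hypothetical helper name, and other kernel versions may print a different set of fields.

```shell
# Print the victim and the memory fields of kernel "Killed process" lines in MiB.
# Assumes the "name:<n>kB" layout of the sample line above; fields without a
# kB suffix (UID, oom_score_adj) are skipped. oom_kill_summary is hypothetical.
oom_kill_summary() {
  awk '/Killed process/ {
    # "Killed process <pid> (<name>)": strip the 15-character prefix
    if (match($0, /Killed process [0-9]+ \([^)]*\)/))
      printf "victim: %s\n", substr($0, RSTART + 15, RLENGTH - 15)
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^[a-z-]+:[0-9]+kB,?$/) {   # e.g. "anon-rss:143224kB,"
        split($i, kv, ":")
        kb = kv[2]
        sub(/kB,?$/, "", kb)
        printf "%-10s %10.1f MiB\n", kv[1], kb / 1024
      }
    }
  }' "$@"
}
```

Usage: `oom_kill_summary /var/log/messages`, or pipe `dmesg` output into it. For the sample line it reports anon-rss as about 139.9 MiB.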
2. Identifying OOM Events
grep -i 'oom-kill\|out of memory' /var/log/messages
dmesg | grep -i 'oom'
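To list each kill as a timestamp, PID and process name instead of reading raw matches, the grep can be combined with sed. A minimal sketch, assuming the syslog line format of the sample above; `oom_victims` is a hypothetical helper name, and lines in other formats are passed through unchanged.

```shell
# List each OOM kill in a syslog-style file as "<timestamp> <pid> <name>".
# Assumes the "Mon DD HH:MM:SS host kernel: ... Killed process <pid> (<name>)"
# format of the sample line; oom_victims is a hypothetical helper name.
oom_victims() {
  grep -i 'killed process' "${1:-/var/log/messages}" |
    sed -E 's/^([A-Z][a-z]{2} +[0-9]+ [0-9:]+) .*[Kk]illed process ([0-9]+) \(([^)]*)\).*/\1 \2 \3/'
}
```

Usage: `oom_victims /var/log/syslog` on Debian/Ubuntu, or `oom_victims <(journalctl -k)` where kernel messages only reach the journal.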
3. Interpreting Process-Specific OOM Log Entries
Before choosing a victim, the kernel logs a table of per-process memory statistics for every eligible process (controlled by the vm.oom_dump_tasks sysctl, enabled by default). Below is a breakdown of a sample:
Sample logs
Mar 24 18:41:03 PLEDXDBOR0G kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Mar 24 18:41:03 PLEDXDBOR0G kernel: [ 64320] 1011 64320 47586463 5345805 128630784 547757 0 yb-tserver
Mar 24 18:41:03 PLEDXDBOR0G kernel: [ 69072] 1011 69072 602678 8365 360448 751 0 postgres
...
...
Mar 24 18:41:03 PLEDXDBOR0G kernel: [ 1660308] 1011 1660308 621139 24748 700416 527 900 postgres
Key Fields Explained:
Field | Description | Example (yb-tserver)
---|---|---
pid | Process ID. | 64320
uid | User ID owning the process. | 1011 (a user or service account)
tgid | Thread group ID, shared by all threads of a process. The dump lists one row per process, so it normally equals pid. | 64320 (main process, not a thread)
total_vm | Total virtual memory mapped (in pages; 1 page = 4 KiB on x86_64). | 47,586,463 pages ≈ 181.5 GiB
rss | Resident Set Size: physical RAM in use (in pages). | 5,345,805 pages ≈ 20.4 GiB
pgtables_bytes | Memory used for page tables (kernel overhead for memory mapping), in bytes. | 128,630,784 bytes ≈ 122.7 MiB
swapents | Number of swap entries (pages moved to swap space). | 547,757 pages ≈ 2.1 GiB of swap
oom_score_adj | OOM priority adjustment (-1000 to 1000). Higher = more likely to be killed. | 0 (default, no adjustment) vs. 900 (strongly favored for killing)
name | Process name. | yb-tserver, postgres
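The unit conversions in the table can be scripted for any row of the dump. A minimal sketch, assuming 4 KiB pages (the x86_64 default; `getconf PAGESIZE` reports the actual size) and the column order of the header above; `task_mem` and the `PAGE_SIZE` override are hypothetical, not kernel features.

```shell
# Convert task-dump rows (section 3 format) into human-readable units.
# Assumes 4 KiB pages unless PAGE_SIZE is set (check with `getconf PAGESIZE`);
# task_mem and the PAGE_SIZE override are hypothetical, not kernel features.
task_mem() {
  # Drop everything up to the last "]" so syslog/dmesg prefixes are ignored,
  # leaving: uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
  sed 's/.*]//' | awk -v ps="${PAGE_SIZE:-4096}" '
    $1 ~ /^[0-9]+$/ && NF >= 8 {
      gib = 1024 * 1024 * 1024
      printf "%s tgid=%s total_vm=%.1fGiB rss=%.1fGiB pgtables=%.1fMiB swap=%.1fGiB oom_score_adj=%s\n",
        $8, $2, $3 * ps / gib, $4 * ps / gib, $5 / (1024 * 1024), $6 * ps / gib, $7
    }'
}
```

Usage: `grep kernel /var/log/messages | task_mem`. The header row and other kernel lines are skipped because their first field after the last "]" is not numeric.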
4. Analyzing Memory Consumption
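- Use the command below to find the amount of memory used by each process name (Credits). It sums the rss column of the task-dump rows from section 3 per name and converts pages to GiB. Because it reads the whole log, rows from several OOM events are added together if the file contains more than one.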
grep kernel /var/log/messages |
rev | cut -d"]" -f1 | rev |
awk '{ print $3, $4, $5, $8 }' |
grep '^[0-9].*[a-zA-Z][a-zA-Z]' |
awk '{db[$4]+=$2;} END {for (name in db) printf("%.1fG %s\n", (db[name]*4096)/1024/1024/1024, name)}' |
sort -gr | head -n 10
Output Example:
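- Values are summed per process name. In the output below, for example, 11.6G is the combined RSS of the postmaster and all postgres backends. Because each backend's RSS also counts the shared-memory pages it has touched, such sums can overstate actual usage.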
20.4G yb-tserver
11.6G postgres
0.4G yb-master
0.4G ssh-servant-g3
0.4G ds_am
0.2G systemd-journal
0.2G java
0.1G ds_agent
0.1G cbdaemon
0.1G ECStateEngine
- yb-tserver used ~20.4 GiB of RAM (rss pages × 4096 bytes / 1024³ = 5,345,805 × 4096 / 1024³ ≈ 20.4).
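Since the command above sums every task dump in the file, a log with several OOM events mixes them together. The variant below keeps only the most recent dump. A minimal sketch, assuming the task-dump format from section 3 and 4 KiB pages; `oom_last_dump_totals` and the `PAGE_SIZE` override are hypothetical.

```shell
# Per-name RSS totals for the most recent OOM task dump in a log file.
# Assumes the task-dump format from section 3 and 4 KiB pages (PAGE_SIZE
# overrides); oom_last_dump_totals is a hypothetical helper name.
oom_last_dump_totals() {
  awk -v ps="${PAGE_SIZE:-4096}" '
    /] +uid +tgid +total_vm/ {      # header of a new dump: start over
      split("", db); in_dump = 1; next
    }
    in_dump {
      line = $0
      sub(/.*]/, "", line)           # keep the text after the last "]"
      n = split(line, f, " ")
      if (n >= 8 && f[1] ~ /^[0-9]+$/) { db[f[8]] += f[4]; next }
      in_dump = 0                    # first non-task line ends the dump
    }
    END {
      for (name in db)
        printf "%.1fG %s\n", db[name] * ps / 1024 / 1024 / 1024, name
    }' "${1:-/var/log/messages}" | sort -gr | head -n 10
}
```

Clearing the totals at each dump header, rather than filtering by timestamp, avoids having to know when the last event happened.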
5. Why Was a Process Killed?
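The OOM killer prioritizes processes based on:
- oom_score: the kernel's "badness" score. On current kernels it is the process's memory footprint (rss + swap entries + page tables, in pages) plus oom_score_adj scaled so that each unit is worth 0.1% of total memory (RAM + swap, or the cgroup limit for a cgroup OOM). Process age and parent/child hierarchy are not considered; only kernels before 2.6.36 used such heuristics.
- oom_score_adj: a value of 900 (as in the sample) adds the equivalent of 90% of total memory to the score, which makes a process highly vulnerable. A value of -1000 exempts the process entirely.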
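The scoring rule above can be applied to the task-dump rows to see which process the kernel would pick. A minimal sketch, assuming the formula in `oom_badness()` (mm/oom_kill.c) on recent kernels and 4 KiB pages; the total page count is not printed in the dump and must be supplied, and `oom_badness_rank` is a hypothetical helper name.

```shell
# Rank task-dump rows by an approximation of the kernel's badness score, as
# computed by oom_badness() in mm/oom_kill.c on recent kernels:
#   points = rss + swapents + pgtables_bytes / page_size
#            + oom_score_adj * (totalpages / 1000)    (integer division)
# totalpages (RAM + swap in pages, or the cgroup limit for a cgroup OOM) is
# not printed in the dump, so it is passed in. oom_badness_rank is hypothetical.
oom_badness_rank() {
  local totalpages="$1"
  awk -v total="$totalpages" -v ps="${PAGE_SIZE:-4096}" '
    {
      line = $0
      sub(/.*]/, "", line)           # fields after the last "]"
      n = split(line, f, " ")
      if (n < 8 || f[1] !~ /^[0-9]+$/) next
      adj = f[7] + 0
      if (adj == -1000) next         # -1000: never selected
      points = f[4] + f[6] + int(f[5] / ps) + adj * int(total / 1000)
      printf "%d %s %s\n", points, f[2], f[8]
    }' | sort -nr
}
```

Usage: `grep kernel /var/log/messages | oom_badness_rank 8388608`, where 8388608 pages (32 GiB of RAM plus swap) is only an illustrative value. Each output row is "points tgid name"; the first row is the process with the highest score.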
6. Frequently Asked Questions (FAQs)
Q1: Why wasn’t the largest process (by RSS) killed?
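- The OOM killer ranks processes by oom_score, not by RSS alone. An oom_score_adj of 0 (as for yb-tserver) adds nothing to a process's score but does not protect it. The postgres backends run with 900, which adds about 90% of total memory to their score and usually outweighs yb-tserver's larger footprint. Only -1000 exempts a process.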
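As a worked example, the sample rows can be plugged into the badness formula of recent kernels (rss + swapents + pgtables_bytes/4096 + oom_score_adj × ⌊T/1000⌋). Here T, the total pages of RAM plus swap, is not recorded in the log, so it stays symbolic; the sketch assumes 4 KiB pages and compares only these two processes.

```latex
\begin{aligned}
\text{points}(\texttt{yb-tserver}) &= 5{,}345{,}805 + 547{,}757 + \tfrac{128{,}630{,}784}{4096} + 0 = 5{,}924{,}966 \\
\text{points}(\texttt{postgres}\ 1660308) &= 24{,}748 + 527 + \tfrac{700{,}416}{4096} + 900\left\lfloor \tfrac{T}{1000} \right\rfloor = 25{,}446 + 900\left\lfloor \tfrac{T}{1000} \right\rfloor \\
25{,}446 + 900\left\lfloor \tfrac{T}{1000} \right\rfloor > 5{,}924{,}966 &\iff \left\lfloor \tfrac{T}{1000} \right\rfloor \ge 6{,}556 \iff T \ge 6{,}556{,}000\ \text{pages} \approx 25\ \text{GiB}
\end{aligned}
```

So on a host with more than about 25 GiB of RAM plus swap, this postgres backend outranks yb-tserver despite using under 0.5% of its memory.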
Q2: Why does postgres have a high oom_score_adj (900)?
- This is often intentional. Databases and other critical services may set a high oom_score_adj on their child or worker processes so that, under memory pressure, the kernel terminates a worker first and the parent survives.
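The current values can be checked on a running system through /proc. A minimal sketch, assuming a Linux host; `/proc/<pid>/oom_score_adj`, `/proc/<pid>/oom_score` and `/proc/<pid>/comm` are standard kernel files, while `oom_settings` is a hypothetical helper name.

```shell
# Show oom_score_adj, the current oom_score and the name of each given PID.
# Reads standard /proc files; oom_settings is a hypothetical helper name.
oom_settings() {
  local pid
  printf '%-8s %-14s %-10s %s\n' PID OOM_SCORE_ADJ OOM_SCORE NAME
  for pid in "$@"; do
    [ -r "/proc/$pid/oom_score_adj" ] || continue   # process may have exited
    printf '%-8s %-14s %-10s %s\n' "$pid" \
      "$(cat "/proc/$pid/oom_score_adj")" \
      "$(cat "/proc/$pid/oom_score")" \
      "$(cat "/proc/$pid/comm")"
  done
}
```

Usage: `oom_settings $(pgrep -x yb-tserver) $(pgrep -x postgres)`. If the parent/worker split works as described above, the postgres rows should show a higher oom_score_adj than yb-tserver.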