OSWatcher & System Health Monitor (SHM)

During a recent investigation, I wanted to dig into the operating system (OS) metrics. A creature of habit, I immediately started pulling the OSWatcher data. But I soon realized I already had another tool available: System Health Monitor (SHM), bundled with Oracle Autonomous Health Framework (AHF). Since AHF already integrates SHM, I did not need to configure anything additional and simply pulled the SHM report. What I was really curious about was what has changed, and how this telemetry data differs from the age-old OSWatcher.

OS Monitoring with OSWatcher

AHF now bundles OSWatcher, and in most installations it is installed and running by default. It lets us go back in time and review the captured ‘snapshots’ of OS metrics. OSWatcher takes each snapshot using utilities like iostat, vmstat, top, etc., and stores the output in files.

Now, when you start OSWatcher, you get a warning:

We have shipped a new generation of Operating system monitoring feature called System Health Monitor(SHM) in AHF 24.6. we would recommend you to start using System Health Monitor(SHM) data for Operating system resource monitoring and root cause analysis using AHF Insights.

tfactl toolstatus

## For example, take a snapshot every 30 seconds and retain the data for 48 hours
tfactl stop oswbb

tfactl start oswbb 30 48

To find the current interval and retention, you can also review the properties file:

cat $REPO_LOCATION/suptools/<hostname>/oswbb/$PROCESS_OWNER/.osw.prop

# cat /u01/oracle.ahf/data/repository/suptools/host2/oswbb/oracle/.osw.prop
interval=30
hours=48
zip=/bin/gzip
runuser=oracle
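
The interval and hours values above together determine roughly how many snapshots each collector keeps. As a minimal sketch (the helper names `parse_osw_prop` and `snapshots_retained` are my own, not part of AHF), the properties shown can be read and turned into a snapshot count like this:

```python
def parse_osw_prop(text):
    """Parse key=value lines from an .osw.prop-style file into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not key=value
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def snapshots_retained(props):
    """Approximate number of snapshots per collector in the retention window."""
    interval_s = int(props["interval"])  # seconds between snapshots
    hours = int(props["hours"])          # retention window in hours
    return hours * 3600 // interval_s


# Contents copied from the .osw.prop output shown above
sample = """interval=30
hours=48
zip=/bin/gzip
runuser=oracle
"""

props = parse_osw_prop(sample)
print(props["interval"], props["hours"], snapshots_retained(props))
```

With a 30-second interval and 48 hours of retention, that works out to 5760 snapshots per utility, which is worth keeping in mind before lowering the interval further.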

You can find the location of the OSWatcher data files with the following command, and then view the data files directly. Older data files are compressed.

tfactl print directories | egrep "Permission|oswbb"


Trace Directory                                                                              /u01/oracle.ahf/data/repository/suptools/host2/oswbb/oracle/archive 

cd /u01/oracle.ahf/data/repository/suptools/host2/oswbb/oracle/archive; ls

oswarp  oswbuddyinfo  oswcpuinfo  oswifconfig  oswiostat  oswmeminfo  oswmpstat  oswnetstat  oswnfsiostat  oswnumastat  oswpagetype  oswpidstat  oswpidstatd  oswprvtnet  oswps  oswslabinfo  oswtop  oswvmstat  oswxentop  oswzoneinfo

As you can see, the OSWatcher archive directory contains one subdirectory per utility, and the “*.dat” files inside each one hold the output of that command.

 tail oswiostat/host2_iostat_26.04.11.0000.dat 
nvme0n13         0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
nvme0n14        53.00  167.00    338.00    337.00     0.00     0.00   0.00   0.00    0.57    1.01   0.20     6.38     2.02   0.97  21.40
nvme0n15        44.00  167.00    187.00    337.00     0.00     1.00   0.00   0.60    0.55    1.05   0.20     4.25     2.02   1.01  21.40
nvme0n16         0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
nvme0n17         0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
nvme0n18         0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
nvme0n19         0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
nvme0n20         0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
nvme0n21         0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
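
Because these files are just command output, any analysis means parsing text yourself. Here is a rough sketch that picks out busy devices from rows like the ones above. It assumes the last column is the utilisation percentage, as in typical `iostat -x` output; the column layout varies between sysstat versions, so check the header in your own file. The function name `busy_devices` and the threshold are my own choices.

```python
def _is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def busy_devices(lines, threshold=10.0):
    """Return (device, util) pairs whose last column exceeds threshold.

    Assumption: a data row is a device name followed by numeric columns,
    and the final column is the utilisation percentage.
    """
    busy = []
    for line in lines:
        fields = line.split()
        # Skip empty lines and rows that start with a number (e.g. CPU summary values)
        if len(fields) < 2 or _is_number(fields[0]):
            continue
        # Skip headers, timestamps, and any other non-data lines
        if not all(_is_number(f) for f in fields[1:]):
            continue
        util = float(fields[-1])
        if util > threshold:
            busy.append((fields[0], util))
    return busy


# Three rows taken from the tail output above (whitespace collapsed)
rows = [
    "nvme0n13 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00",
    "nvme0n14 53.00 167.00 338.00 337.00 0.00 0.00 0.00 0.00 0.57 1.01 0.20 6.38 2.02 0.97 21.40",
    "nvme0n15 44.00 167.00 187.00 337.00 0.00 1.00 0.00 0.60 0.55 1.05 0.20 4.25 2.02 1.01 21.40",
]
print(busy_devices(rows))
```

To run it against a real file, pass the open file object instead of the list, since the function only needs an iterable of lines.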

System Health Monitor (SHM)

Oracle documentation describes SHM and CHM (Cluster Health Monitor) as “high-performance, lightweight daemons that collect, analyze, aggregate, and store a large set of operating system metrics”. Let’s look at how this data differs from OSWatcher.

You can use the ahf command to check the status and to enable or disable the gathering of OS metrics data.

ahf configuration get --property ahf.collectors.enhanced_os_metrics
ahf.collectors.enhanced_os_metrics: on

Oracle SHM telemetry is typically stored under: $AHF_BASE/data/$HOSTNAME/shm.

If we look at raw data, stored in a JSON structure, we can see that SHM aligns multiple metrics to the same TIMESTAMP. That is, SHM links system summary metrics, per-CPU metrics, per-disk metrics, and per-network metrics to the same timestamp.

tail /u01/oracle.ahf/data/host2/shm/shmosdata_<hostname>_<timestamp>.log


{"TIMESTAMP":"2026-04-01 01.00.00+0000","HOSTNAME":"host2","boottime[s]":1775592232,"JSON_SAMPLE_TYPE":"FILTERED","SUMMARY":2.10GHz","usage[%]":1.56,"system[%]":0.83,"user[%]":0.72,"nice[%]":0.00,"ioWait[%]":1.12,"steal[%]":0.00,"cpuQ[#]":0,"freeMem[KB]":115755808,"totalMe1,"procsBlocked[#]":0,
"procsInDState[#]":0,"rtProcs[#]":20,"rtProcsOnCpu[#]":null,"fds[#]":28640,"sysFdLimit[#]":6815744,"disks[#]":21,"nfs[#]":null, "nics[#]":2, "loadAvg1": 0.27, "loadAvg5":0.41,  "loadAvg15":0.90, "nicErrs[#/s]":0, "intr[#/s]":9552,
"ctxSwitch[#/s]":14695},"PROCESS_AGGREGATE":{"HEADER":["category", "cpuWeight[%]", "cpu[%]", "rss[KB]", "shMem[KB]", "thrds[#]", "fds[#]", "processes[#]", "sid"],"METRICS":[["DBBG",48.43,0.76,6382032,93664,119,null,74,"DB"],["ASMBG",4.68,0.07,1684964,103384,26,null,22,"+ASM"],["ASMFG""CPUS":{"HEADER":["cpuId", "system[%]",
"user[%]", "nice[%]", "usage[%]", "ioWait[%]", "steal[%]"],"METRICS":[[7,1.59,1.99,0.00,3.58,8.76,0.00],[3,1.60,1.00,0.00,2.60,8.00,0.00],
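
The HEADER/METRICS layout is what makes this data easy to work with: every section names its columns once, and each row lines up with those names under a single timestamp. Below is a minimal sketch of reading one record. The sample is a hand-trimmed, hypothetical record modeled on the excerpt above, not real SHM output, and it assumes each record in the file is a complete JSON object; `rows_as_dicts` is my own helper name.

```python
import json


def rows_as_dicts(section):
    """Pair each METRICS row with the HEADER names of the same section."""
    header = section["HEADER"]
    return [dict(zip(header, row)) for row in section["METRICS"]]


# Hypothetical, trimmed record: only a few columns from the excerpt are kept
sample = """{"TIMESTAMP":"2026-04-01 01.00.00+0000","HOSTNAME":"host2",
 "PROCESS_AGGREGATE":{"HEADER":["category","cpu[%]","rss[KB]","sid"],
   "METRICS":[["DBBG",0.76,6382032,"DB"],["ASMBG",0.07,1684964,"+ASM"]]},
 "CPUS":{"HEADER":["cpuId","usage[%]","ioWait[%]"],
   "METRICS":[[7,3.58,8.76],[3,2.60,8.00]]}}"""

record = json.loads(sample)

# Per-CPU and per-category rows share the record's single timestamp
cpus = rows_as_dicts(record["CPUS"])
procs = rows_as_dicts(record["PROCESS_AGGREGATE"])

busiest = max(cpus, key=lambda c: c["ioWait[%]"])
print(record["TIMESTAMP"], "cpu", busiest["cpuId"], "ioWait", busiest["ioWait[%]"])
print([p["category"] for p in procs])
```

Compare this with OSWatcher, where correlating iostat, vmstat, and top for the same moment means lining up timestamps across separate files by hand.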

AHF Insights analyzes SHM data and includes the results as part of the report. However, if you want to view only the OS metrics-related information, you can generate the Nodeview output through TFA.

tfactl set smartprobclassifier=off
Successfully set smartprobclassifier=OFF
.---------------------------------.
|  host-2  |
+-------------------------+-------+
| Configuration Parameter | Value |
+-------------------------+-------+
| smartprobclassifier     | OFF   |
'-------------------------+-------'

# tfactl diagcollect -last 1h -tag shm_last_1h;

Analysis Report

Oracle also groups the processes in this report into categories such as ASM, DB, and background, which adds intelligence to the report.

Also note that Oracle AHF does not just store raw data; it processes it and stores summaries. I could not find the location where these summaries are stored, so I am curious about that bit. Per the documentation:

  • SHM first captures detailed system snapshots into shmosdata files
  • Every hour, TFA triggers Inline Analysis, which processes the data and stores the results of the analysis.

The Metric Repository is self-managed, and the shmosdata files are purged once the retention limit (200 MB by default) is reached.
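
If you want a feel for how close the raw files are to that limit, a quick sketch like the one below adds up the shmosdata files in the SHM directory. The `shmosdata_*` pattern and the path come from the tail command earlier in this post; the function names are mine, and treating 200 MB as 200 × 1024 × 1024 bytes is an assumption, since I have not confirmed how AHF measures the limit.

```python
from pathlib import Path


def shm_usage_bytes(shm_dir, pattern="shmosdata_*"):
    """Total size in bytes of the files matching pattern directly under shm_dir."""
    return sum(p.stat().st_size for p in Path(shm_dir).glob(pattern) if p.is_file())


def usage_fraction(used_bytes, limit_mb=200):
    """Fraction of the retention limit used (assumes 1 MB = 1024 * 1024 bytes)."""
    return used_bytes / (limit_mb * 1024 * 1024)


# Path from the example earlier in this post; adjust for your own host
shm_dir = "/u01/oracle.ahf/data/host2/shm"
used = shm_usage_bytes(shm_dir)
print(f"{used} bytes used, {usage_fraction(used):.1%} of the assumed 200 MB limit")
```

On a host without that directory the script simply reports zero bytes.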

So the next time you need to review OS metrics, remember that if AHF is enabled, you already have SHM data available alongside the traditional OSWatcher output.

