How to understand and capture ExaCLI Metrics

When using Exadata Service on OCI or Cloud@Customer, reviewing the performance metrics of the storage servers is indispensable. Although we cannot SSH to the storage servers, Oracle exposes a limited set of metrics via the ExaCLI utility.

This blog discusses the metrics accessible via ExaCLI and expands on a few key ones. With roughly 700 metrics available, capturing everything can be overwhelming, so the blog explains how to understand the purpose of a metric. This in turn will help you identify the metrics relevant to your issue.

We will use cookies to allow batch capture of metrics. To see how to establish a passwordless connection to ExaCLI, refer to the previous blog.

Listing Cell Version

Once you have set up the cookies, you can use ExaCLI commands to fetch general attributes of the cell. In the example below, we use the LIST command to check the version of the cell software.

# get the version of the cell software
# list cell attributes attribute_name;

$ exacli -c cloud_user_clustername@10.10.10.10 --cookie-jar -e list cell attributes releaseVersion
         23.1.10.0.0.240208
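
To run the same check against several cells, the command can be wrapped in a small loop. The following is a minimal sketch, not a definitive script: the function name cell_version_cmds, the DRY_RUN switch, and the second IP address are illustrative assumptions that do not come from ExaCLI. By default it only prints the commands so they can be reviewed before anything touches a cell.

```shell
#!/usr/bin/env bash
# Sketch: print (or run) the releaseVersion query for several cells.
# cell_version_cmds, DRY_RUN and the second IP are illustrative, not ExaCLI features.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

cell_version_cmds() {
  local user="$1" cell
  shift
  for cell in "$@"; do
    if [ "$DRY_RUN" = "1" ]; then
      # Preview mode: show the command that would be executed.
      printf 'exacli -l %s -c %s --cookie-jar -e "list cell attributes releaseVersion"\n' "$user" "$cell"
    else
      # Assumes the cookie for this user and cell has already been set up.
      exacli -l "$user" -c "$cell" --cookie-jar -e "list cell attributes releaseVersion"
    fi
  done
}

cell_version_cmds cloud_user_clustername 10.10.10.10 10.10.10.11
```

Setting DRY_RUN=0 runs the commands for real, one cell at a time.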

Understanding ExaCLI commands to fetch Cell Metrics

There are two main commands that you can use to identify and fetch the metrics: LIST and DESCRIBE. Just as in SQL*Plus, the DESCRIBE command gives a description of the attributes associated with the metric, and the LIST command gets you the metric value. The format of the commands is listed below, where:

  • VERB & OBJECT_TYPE – can be found by running the “help” command in ExaCLI
  • OBJECT_NAME – I could not find a direct command to list all objects when the object_type is a metric, so I have provided a list below; it is valid as of cell server version 23.1.10
  • ATTRIBUTES – “describe <object_type>” generally lists the attributes associated with the object_type

<verb> <object_type> <object_name|attribute_filter> <DETAIL|ATTRIBUTES attribute_list>

Reference: https://docs.oracle.com/en/engineered-systems/exadata-database-machine/sagug/exadata-storage-server-cellcli.html#GUID-98D9215E-9BFB-4C0A-8827-9F7EE3828B26

## <verb>  <object_type>    <object_name>  ATTRIBUTES <attribute_list> ; 
$   list    metricdefinition   CL_FSUT      ATTRIBUTES name,description ;

CL_FSUT         "Percentage of total space on this file system that is currently used"

## <verb>  <object_type>    <attribute_filter>      ATTRIBUTES <attribute_list> ; 
$ list metricdefinition  where objectType ='CELL' ATTRIBUTES name,description ;

CL_CPUT         "Percentage of time over the previous minute that the system CPUs were not idle."
CL_CPUT_CS      "Percentage of CPU time used by CELLSRV"
.....

## list of attributes
$ describe metricdefinition
        name
        description
        fineGrained
        metricType
        objectType
        persistencePolicy
        streaming
        unit

Although I couldn’t find a direct command to list the object types that can be used to filter metrics, I was able to compile the following list. In the example above, filtering with the clause “where objectType ='CELL'” shows the definitions of all the CELL-related metrics at your disposal.

###LIST of OBJECT TYPES

Object Type                Number of metrics available for the object type
-----------------------    -----------------------------------------------
CELL                        23
CELL_FILESYSTEM              1
CELLDISK                    38
DEVICE                       4
DISK                        12
FLASHCACHE                 140
FLASHLOG                    30
GRIDDISK                    34
HOST_INTERCONNECT           11
IBPORT                       6
IORM_CATEGORY               61
IORM_CLUSTER                61
IORM_CONSUMER_GROUP         61
IORM_DATABASE               67
IORM_PLUGGABLE_DATABASE     66
NET_INTERFACE                6
NETDEV_QUEUE                 3
SERVER                      37
SMARTIO                     38
XRMEMCACHE                   1
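
One way a list like this might be compiled is to list every metric definition together with its objectType and count the rows per type. The sketch below assumes the output of "list metricdefinition attributes name,objectType" has two whitespace-separated columns (name, then objectType); the function name count_by_object_type is hypothetical, and the three input rows are illustrative sample data rather than a real capture.

```shell
#!/usr/bin/env bash
# Sketch: count metric definitions per objectType.
# Assumption: each input line holds two whitespace-separated columns,
# the metric name followed by its objectType.
count_by_object_type() {
  awk 'NF >= 2 { count[$2]++ }
       END { for (t in count) printf "%-24s %d\n", t, count[t] }' | LC_ALL=C sort
}

# Illustrative input only. On a real system the exacli output would be piped in:
#   exacli ... -e "list metricdefinition attributes name,objectType" | count_by_object_type
printf '%s\n' \
  '         CL_CPUT         CELL' \
  '         CL_CPUT_CS      CELL' \
  '         CL_FSUT         CELL_FILESYSTEM' | count_by_object_type
```

Counting on the second column avoids the description attribute, whose quoted text contains spaces and would not split cleanly.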

If you describe the METRICCURRENT and METRICHISTORY object types, you will notice another important attribute, collectionTime. This attribute lets you filter the metric values by time, which makes it a handy way to further restrict the data.

Remember that pulling metrics adds overhead on the cell servers, so be mindful and pull only the metrics for the required time frame.

With that bit of advice in mind, let’s now view the metrics for a cell server.

## Single object type; the lower bound of the time window must be earlier than the upper bound
exacli -l cloud_user_clustername -c 10.10.10.10 --cookie-jar -e "list METRICHISTORY WHERE objectType = 'FLASHCACHE' AND collectionTime > '2024-04-01T11:00:09+00:00' AND collectionTime < '2024-04-01T23:00:09+00:00' " > cell-01.txt
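
Because a window whose start is later than its end matches nothing, it can help to build the filter with a small check first. This is a minimal sketch under stated assumptions: the helper names earlier_than and metric_history_query are hypothetical, and both timestamps are assumed to use the same format and UTC offset, so that plain byte-order comparison puts them in time order.

```shell
#!/usr/bin/env bash
# Sketch: build a METRICHISTORY filter and refuse an inverted time window.
# Assumption: both timestamps share one format and UTC offset, so byte-order
# string comparison matches chronological order.

# True when $1 sorts strictly before $2.
earlier_than() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | LC_ALL=C sort | head -n 1)" = "$1" ]
}

metric_history_query() {
  local object_type="$1" start="$2" end="$3"
  if ! earlier_than "$start" "$end"; then
    echo "start ($start) must be earlier than end ($end)" >&2
    return 1
  fi
  printf "list METRICHISTORY WHERE objectType = '%s' AND collectionTime > '%s' AND collectionTime < '%s'\n" \
    "$object_type" "$start" "$end"
}

# Prints the filter text; it could then be passed to exacli with -e.
metric_history_query FLASHCACHE '2024-04-01T11:00:09+00:00' '2024-04-01T23:00:09+00:00'
```

Keeping the check separate from the exacli call means a mistyped window fails locally instead of costing a round trip to the cell.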

## Multiple objects

exacli -l cloud_user_clustername -c 10.10.10.10 --cookie-jar -e "list METRICHISTORY WHERE objectType like  'FLASHCACHE|CELLDISK|IORM_DATABASE|FLASHLOG|SMARTIO|IORM_CATEGORY|IORM_CONSUMER_GROUP' AND  collectionTime > '2024-03-18T23:00:09+00:00' and collectionTime < '2024-04-01T11:00:09+00:00' " > cell-01.txt

Again, a word of caution: pull only the data you need. The CELLSRV process collects the metrics and keeps them in memory, and every hour the Management Server (MS) summarizes them and flushes them to the internal disk. This great blog delves further into Exadata Storage cell metrics and how to utilize them effectively.
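
Once a capture file such as cell-01.txt exists, a quick tally of samples per metric can show which metrics dominate the capture and help narrow the next pull. The sketch below assumes the metric name is the first whitespace-separated column of each line; the function name samples_per_metric is hypothetical, and the sample rows (object name, values, and timestamps) are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: count how many samples each metric has in a capture.
# Assumption: the metric name is the first whitespace-separated column.
samples_per_metric() {
  # Reads the files given as arguments, or stdin when none are given.
  awk 'NF > 0 { n[$1]++ }
       END { for (m in n) printf "%s %d\n", m, n[m] }' "$@" | LC_ALL=C sort
}

# Illustrative rows only; with a real capture: samples_per_metric cell-01.txt
printf '%s\n' \
  '  CL_CPUT     cell01   1.4 %   2024-04-01T11:00:09+00:00' \
  '  CL_CPUT     cell01   1.6 %   2024-04-01T11:01:09+00:00' \
  '  CL_CPUT_CS  cell01   0.9 %   2024-04-01T11:00:09+00:00' | samples_per_metric
```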