Introduction

Oracle Real Application Clusters (RAC) Multi-node Cluster is a database architecture that runs a single Oracle database across multiple servers (nodes), enhancing availability, scalability, and performance by distributing the workload and ensuring continuous operation even if one node fails.

Key Features

  • High Availability (HA): Continuous operation by eliminating single points of failure; nodes continue to operate if one fails.
  • Scalability: Add more nodes to handle increased users and transactions.
  • Load Balancing: Even distribution of workloads across nodes, optimizing performance.
  • Fault Tolerance: Redundancy ensures database availability despite node failures.
  • Improved Performance: Multiple nodes handle more transactions and queries.

Components of Oracle RAC Multi-Node Cluster

  • Cluster Nodes: Servers running an instance of the Oracle Database.
  • Oracle Clusterware: Software for clustering services, including crsctl and srvctl.
  • Shared Storage: Nodes share access to storage via SAN, NAS, or Oracle ASM.
  • Interconnect: Private network for internode communication.
  • Oracle ASM: Simplifies storage management, providing striping and mirroring.
  • Global Resource Directory (GRD): Tracks data blocks and resources across instances.

Oracle Data Guard in RAC

Oracle Data Guard enhances RAC’s high availability, data protection, and disaster recovery by maintaining synchronized standby databases.

Key Features

  • Disaster Recovery: Standby databases in different locations for site-level recovery.
  • Data Protection: Continuous application of redo logs ensures data consistency.
  • High Availability: Handles node-level and site-level failures for robust availability.

Key Benefits

  • Unified Device Discovery: Provides a comprehensive view of all elements in an Oracle RAC Database with Data Guard Multi-Node Cluster, including their relationships.
  • Proactive Device Monitoring: Collects metric values over time and sends alerts to the appropriate team when thresholds are breached or unexpected behavior occurs, ensuring minimal or zero downtime.
  • Job Scheduling Metrics: Offers detailed metrics on job scheduling times and statuses.
  • Concern Alerts: Generates alerts for each metric to notify administrators of any resource issues promptly.

Supported Target Versions

The application is validated on Oracle Database 19c Enterprise Edition Release 19.0.0.0.0.

Prerequisites

  • OpsRamp Classic Gateway 15.0.0 and above.

  • OpsRamp NextGen Gateway 15.0.0 and above.
    Note: OpsRamp recommends using the latest Gateway version for full coverage of recent bug fixes and enhancements.

  • For Bash CLI cmdlets, the following are prerequisites:

    • The SSH user (preferably the oracle user) must be able to execute bash commands and listener-related commands (such as crsctl and srvctl).
  • The Gateway must be able to establish database connections to the Oracle SCAN name and to the local listeners.

Oracle authorization permissions:

Some metrics are collected over JDBC. JDBC connections use database authentication.

The integration uses CLI commands such as crsctl, srvctl, and olsnodes for monitoring and discovery. It does not use .oraenv to set the Oracle environment; instead, the Oracle environment variables are configured in .bashrc.

[Screenshot: Oracle environment configuration in the .bashrc file]
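
Because the environment comes from .bashrc rather than .oraenv, it can help to confirm that the expected variables are actually set for the SSH user before configuring the integration. The following is a minimal Python sketch of such a check. The variable names ORACLE_HOME and ORACLE_SID and the helper missing_oracle_env are assumptions made for illustration; this document does not list the exact set of variables the integration requires.

```python
import os

# Assumed set of variables; adjust to match your own .bashrc.
# The integration's exact requirements are not listed in this document.
ASSUMED_ORACLE_VARS = ("ORACLE_HOME", "ORACLE_SID")


def missing_oracle_env(env, required=ASSUMED_ORACLE_VARS):
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    # Run as the SSH user on a RAC node, in a shell that has loaded .bashrc.
    missing = missing_oracle_env(os.environ)
    print("missing:", missing if missing else "none")
```

An empty result only shows that the variables are present; it does not validate that their values point to a working Oracle installation.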

Verify the following from the Gateway:

  • ping <scan name> - use the SCAN hostname or SCAN IP address, whichever is provided in the configuration. (If you use the SCAN hostname, ensure that DNS is properly configured on the Gateway so that the hostname resolves.)
  • telnet <scan name> 1521
  • Connect to gcli using the "gcli" command.
  • Execute:
    db oracledb <scan_name> <username> <password> <db_port> <db_name>:servicename 15000 10000 insecure Yes "SELECT INST_ID, INSTANCE_NUMBER, INSTANCE_NAME, HOST_NAME FROM gv$instance"

Note: A connection established on the SCAN hostname or IP address is internally redirected to the local listeners. Ensure that the end devices (all RAC nodes) accept inbound connections on all of these IP addresses.
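
The telnet check above, and the requirement that every RAC node accept inbound connections, can also be scripted. The following is a rough Python sketch that attempts a plain TCP connection to each address on the database port. The function names are hypothetical, and a successful TCP connection only shows that the port is open; it does not prove that the listener will accept a database login.

```python
import socket


def can_connect(host, port=1521, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def unreachable_endpoints(hosts, port=1521, timeout=5.0):
    """Return the entries of `hosts` that do not accept a TCP connection."""
    return [h for h in hosts if not can_connect(h, port, timeout)]
```

For example, passing the SCAN name together with each node's listener address to unreachable_endpoints returns the subset that refused the connection or timed out.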

Privileges - The provided database user should have the SELECT ANY TABLE privilege.

Roles - The provided database user should have the CONNECT and SELECT_CATALOG_ROLE roles.

Hierarchy of Oracle RAC resource

      • Oracle RAC
             - Oracle Node
                    - Oracle DB Instance
             - Oracle Disk Group
                    - Oracle Disk

Supported Metrics

Oracle RAC

| Metric Name | Display Name | Metric Category | Unit | Application Version | Description |
| --- | --- | --- | --- | --- | --- |
| oracle_cluster_NodeState | Oracle Cluster Node State | Availability | | 1.0.0 | State of each node in the cluster. Possible values: Active(0), INACTIVE(1). |
| oracle_cluster_online_NodeCount | Oracle Cluster Online Nodes Count | Availability | count | 1.0.0 | Count of nodes that are in the Online state. |
| oracle_cluster_InstanceCount | Oracle Cluster Instances Count | Availability | count | 1.0.0 | Total count of database instances. |
| oracle_cluster_active_InstanceCount | Oracle Cluster Active Instances Count | Availability | count | 1.0.0 | Count of active Oracle database instances. |
| oracle_cluster_check_db_Alive | Oracle Cluster Check DB Alive | Availability | | 1.0.0 | Aliveness status of the Oracle database instance. |
| oracle_cluster_ServicesStatus | Oracle Cluster Services Status | Availability | | 1.0.0 | Status of the available Oracle RAC cluster services. Possible values: ONLINE(0), OFFLINE(1), INTERMEDIATE(2), UNKNOWN(3). |
| oracle_cluster_sessions_Utilization | Oracle Cluster Sessions Utilization | Usage | % | 1.0.0 | Monitors database session utilization. |
| oracle_cluster_executions_PerTxn | Oracle Cluster Executions Per Transaction | Performance | | 1.0.0 | Average number of executions per transaction. |
| oracle_cluster_executions_PerSec | Oracle Cluster Executions Per Sec | Performance | | 1.0.0 | Average number of executions per second. |
| oracle_cluster_cpu_UsagePerSec | Oracle Cluster CPU Usage Per Sec | Usage | | 1.0.0 | CPU usage per second by the database processes, measured in hundredths of a second. |
| oracle_cluster_cpu_UsagePerTxn | Oracle Cluster CPU Usage Per Transaction | Usage | | 1.0.0 | Amount of CPU used per transaction for the specific task or session. |
| oracle_cluster_database_cpu_time_Ratio | Oracle Cluster Database CPU Time Ratio | Usage | | 1.0.0 | CPU time used by the database divided by total database time, where total database time is the time spent by the database on user-level calls. This ratio is of limited value as a tuning tool. |
| oracle_cluster_blocking_SessionCount | Oracle Cluster Blocking Session Count | Usage | count | 1.0.0 | Monitors the count of blocking sessions. |
| oracle_cluster_session_limit_Usage | Oracle Cluster Session Limit Usage | Usage | % | 1.0.0 | Monitors the session limit usage. |
| oracle_cluster_inactive_Sessions | Oracle Cluster Inactive Sessions | Availability | count | 1.0.0 | Monitors the inactive sessions. |
| oracle_cluster_active_Sessions | Oracle Cluster Active Sessions | Availability | count | 1.0.0 | Monitors the active sessions. |
| oracle_cluster_system_waits_PerClass | Oracle Cluster System Waits PerClass | Performance | s | 1.0.0 | Monitors Oracle system class waits (system-level waits are a high-level summary of all session-level waits). Computed as: avg of waits = sum(time_waited) / sum(total_waits). |
| oracle_cluster_long_running_Queries | Oracle Cluster Long Running Queries | Performance | count | 1.0.0 | Number of long-running queries on the database. |
| oracle_cluster_bufferCacheHitRatio_Pct | Oracle Cluster BufferCacheHitRatio Percentage | Usage | % | 1.0.0 | Monitors the buffer cache hit ratio as a percentage. |
| oracle_cluster_sequence_Utilization | Oracle Cluster Sequence Utilization | Usage | % | 1.0.0 | Monitors database sequence usage as a percentage. |
| oracle_cluster_temp_tableSpace_Utilization | Oracle Cluster Temp Tablespace Utilization | Usage | % | 1.0.0 | Monitors temp tablespace usage as a percentage. |
| oracle_cluster_database_cdb_pdb_tableSpace_Utilization | Oracle Cluster Database CDB PDB Tablespace Utilization | Usage | % | 1.0.0 | Monitors the tablespace utilization of the Oracle CDB and PDB. |
| oracle_cluster_database_cdb_pdb_tableSpace_SizeUsed | Oracle Cluster Database CDB and PDB Tablespace Size Used | Usage | MB | 1.0.0 | Monitors the used tablespace size of the Oracle CDB and PDB. |
| oracle_cluster_database_cdb_pdb_tableSpace_SizeFree | Oracle Cluster Database CDB and PDB Tablespace Size Free | Capacity | MB | 1.0.0 | Monitors the free tablespace size of the Oracle CDB and PDB. |
| oracle_cluster_process_Utilization | Oracle Cluster Processes Used pct | Usage | % | 1.0.0 | Percentage of elapsed time that the processor spends executing non-idle threads (does not include CPU steal time). |
| oracle_cluster_dataguard_Status | Oracle Cluster Dataguard Status | Availability | | 1.0.0 | Indicates the status of Oracle Data Guard. Possible values: ALL(0): all users other than SYS are prevented from making changes to any data in the database; STANDBY(1): all users other than SYS are prevented from making changes to any database object being maintained by logical standby; NONE(2): normal security for all data in the database. |
| oracle_cluster_dataguard_BrokerStatus | Oracle Cluster Dataguard Broker Status | Availability | | 1.0.0 | Indicates the status of the Oracle Data Guard Broker. Possible values: ENABLED(0): the database is part of a broker configuration and broker management of the database is enabled; DISABLED(1): the database is part of a broker configuration and broker management of the database is disabled. DISABLED is displayed if the user disabled broker management of the database or configuration, or if broker management was disabled due to a role change (for example, the old primary was disabled after a failover operation). |
| oracle_cluster_dataguard_fs_FailoverMode | Oracle Cluster Dataguard Fast-Start Failover Mode | Availability | | 1.0.0 | Indicates the current fast-start failover mode. Possible values: DISABLED(0): fast-start failover is disabled; OBSERVE-ONLY(1): fast-start failover is enabled in test drive mode; ZERO DATA LOSS(2): fast-start failover is enabled and a fast-start failover cannot incur any data loss; POTENTIAL DATA LOSS(3): fast-start failover is enabled and a fast-start failover can incur data loss within FastStartFailoverLagLimit seconds. |

Oracle Node

| Metric Name | Display Name | Metric Category | Unit | Application Version | Description |
| --- | --- | --- | --- | --- | --- |
| oracle_node_Uptime | Oracle Node Uptime | Availability | m | 1.0.0 | Time elapsed since the last reboot, in minutes. |
| oracle_node_cpu_Load | Oracle Node CPU Load | Usage | | 1.0.0 | Monitors the system load over the last 1, 5, and 15 minutes. Reports the load average per CPU core. |
| oracle_node_cpu_Utilization | Oracle Node CPU Utilization | Usage | % | 1.0.0 | Percentage of elapsed time that the processor spends executing non-idle threads (does not include CPU steal time). |
| oracle_node_memory_UsedSpace | Oracle Node Memory Used Space | Usage | GB | 1.0.0 | Physical and virtual memory usage in GB. |
| oracle_node_memory_Utilization | Oracle Node Memory Utilization | Usage | % | 1.0.0 | Physical and virtual memory usage as a percentage. |
| oracle_node_disk_usage_UsedSpace | Oracle Node Disk Usage UsedSpace | Usage | GB | 1.0.0 | Monitors disk used space in GB. |
| oracle_node_disk_usage_Utilization | Oracle Node Disk Utilization | Usage | % | 1.0.0 | Monitors node disk utilization. |
| oracle_node_disk_inode_Utilization | Oracle Node Disk Inode Utilization | Usage | % | 1.0.0 | Collects disk inode metrics for all physical disks in a server. |

Oracle DB Instance

| Metric Name | Display Name | Metric Category | Unit | Application Version | Description |
| --- | --- | --- | --- | --- | --- |
| oracle_dbInstance_Status | Oracle DB Instance Status | Availability | | 1.0.0 | Status of the Oracle Cluster database instance. Possible values: STARTED(0), MOUNTED(1), OPEN(2), OPEN MIGRATE(3). |
| oracle_dbInstance_Uptime | Oracle DB Instance Uptime | Availability | Days | 1.0.0 | Uptime (in days) of the Oracle Cluster database instance. |

Oracle Disk Group

| Metric Name | Display Name | Metric Category | Unit | Application Version | Description |
| --- | --- | --- | --- | --- | --- |
| oracle_diskGroup_State | Oracle ASM DiskGroup State | Availability | | 1.0.0 | Monitors the state of each ASM disk group. Possible values: CONNECTED(0), BROKEN(1), DISMOUNTED(2), MOUNTED(3), QUIESCING(4), RESTRICTED(5), UNKNOWN(6). |
| oracle_diskGroup_UsableFileMB | Oracle ASM Disk Group Usable File Size In MB | Usage | MB | 1.0.0 | Monitors the amount of free space that can be safely utilized. |
| oracle_diskGroup_RequiredMirrorFreeMB | Oracle ASM Disk Group Required Mirror Free Size In MB | Usage | MB | 1.0.0 | Monitors the amount of space that must be available in a given disk group to restore redundancy after one or more disk failures. |
| oracle_diskGroup_Utilization | Oracle ASM DiskGroup Space Utilization | Usage | % | 1.0.0 | Monitors ASM disk group space utilization. |

Oracle Disk

| Metric Name | Display Name | Metric Category | Unit | Application Version | Description |
| --- | --- | --- | --- | --- | --- |
| oracle_disk_ModeStatus | Oracle ASM Disk Mode Status | Availability | | 1.0.0 | Monitors the mode status of each ASM disk. Possible values: ONLINE(0), OFFLINE(1), SYNCING(2). |
| oracle_disk_State | Oracle ASM Disk State | Availability | | 1.0.0 | Monitors the state of each ASM disk. Possible values: NORMAL(0), ADDING(1), DROPPING(2), HUNG(3), FORCING(4), UNKNOWN(5). |
| oracle_disk_Utilization | Oracle ASM Disk Utilization | Performance | % | 1.0.0 | Monitors the utilization of each ASM disk. |
| oracle_disk_Reads | Oracle ASM Disk Reads | Usage | count | 1.0.0 | Total number of I/O read requests for the disk. |
| oracle_disk_Writes | Oracle ASM Disk Writes | Usage | count | 1.0.0 | Total number of I/O write requests for the disk. |
| oracle_disk_ReadErrors | Oracle ASM Disk Read Errors | Usage | count | 1.0.0 | Total number of failed I/O read requests for the disk. |
| oracle_disk_WriteErrors | Oracle ASM Disk Write Errors | Usage | count | 1.0.0 | Total number of failed I/O write requests for the disk. |
| oracle_disk_ReadTime | Oracle ASM Disk Read Time | Usage | s | 1.0.0 | Total I/O time (in seconds) for read requests for the disk if the TIMED_STATISTICS initialization parameter is set to true (0 if set to false). |
| oracle_disk_WriteTime | Oracle ASM Disk Write Time | Usage | s | 1.0.0 | Total I/O time (in seconds) for write requests for the disk if the TIMED_STATISTICS initialization parameter is set to true (0 if set to false). |
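
Two conventions recur in the metric descriptions listed above: state names are reported as small integers, and oracle_cluster_system_waits_PerClass is an average computed as sum(time_waited) / sum(total_waits). The following Python sketch illustrates both. The function names and the row layout are hypothetical, and returning 0.0 when no waits were recorded is an assumption of this sketch, not documented behavior.

```python
# Numeric encodings copied from the metric descriptions above.
SERVICES_STATUS = {"ONLINE": 0, "OFFLINE": 1, "INTERMEDIATE": 2, "UNKNOWN": 3}
DISK_MODE_STATUS = {"ONLINE": 0, "OFFLINE": 1, "SYNCING": 2}


def encode_state(state, mapping):
    """Map a state name to its numeric metric value (case-insensitive)."""
    return mapping[state.strip().upper()]


def average_wait(rows):
    """Compute avg of waits = sum(time_waited) / sum(total_waits).

    `rows` is an iterable of dicts with 'time_waited' and 'total_waits'.
    Returning 0.0 when there are no waits is an assumption of this sketch.
    """
    rows = list(rows)
    total_time = sum(r["time_waited"] for r in rows)
    total_waits = sum(r["total_waits"] for r in rows)
    return total_time / total_waits if total_waits else 0.0
```

Summing before dividing weights each row by its number of waits, which differs from averaging the per-row averages.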

Default Monitoring Configurations

The Oracle RAC Multi-Node Cluster application has default Global Device Management Policies, Global Templates, Global Monitors, and Global Metrics in OpsRamp. You can customize these default monitoring configurations to suit your business requirements by cloning the respective Global Templates and Global Device Management Policies. It is recommended to clone them before installing the application to avoid noisy alerts and data.

  1. Default Global Device Management Policies

    You can find the Device Management Policy for each Native Type at Setup > Resources > Device Management Policies. Search with suggested names in global scope:

    {appName nativeType - version}

    Ex: oracle-cluster Oracle RAC - 1 (i.e, appName = oracle-cluster, nativeType = Oracle RAC, version = 1)

  2. Default Global Templates

    You can find the Global Templates for each Native Type at Setup > Monitoring > Templates. Search with suggested names in global scope. Each template adheres to the following naming convention:

    {appName nativeType 'Template' - version}

    Ex: oracle-cluster Oracle RAC Template - 1 (i.e, appName = oracle-cluster, nativeType = Oracle RAC, version = 1)

  3. Default Global Monitors

    You can find the Global Monitors for each Native Type at Setup > Monitoring > Monitors. Search with suggested names in global scope. Each monitor adheres to the following naming convention:

    {monitorKey appName nativeType - version}

    Ex: Oracle RAC Monitor oracle-cluster Oracle RAC 1 (i.e., monitorKey = Oracle RAC Monitor, appName = oracle-cluster, nativeType = Oracle RAC, version = 1)
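
When searching for these defaults across several native types, the suggested names for policies and templates can be generated from the conventions above. The following Python sketch covers only the Device Management Policy and Template patterns; the function names are hypothetical. Monitor names are left out because the documented monitor example does not include the hyphen shown in its pattern.

```python
def policy_name(app_name, native_type, version):
    """Build a name following {appName nativeType - version}."""
    return f"{app_name} {native_type} - {version}"


def template_name(app_name, native_type, version):
    """Build a name following {appName nativeType 'Template' - version}."""
    return f"{app_name} {native_type} Template - {version}"


if __name__ == "__main__":
    # Native types taken from the resource hierarchy in this document.
    for native_type in ("Oracle RAC", "Oracle Node", "Oracle DB Instance"):
        print(policy_name("oracle-cluster", native_type, 1))
        print(template_name("oracle-cluster", native_type, 1))
```

With appName = oracle-cluster, nativeType = Oracle RAC, and version = 1, the two functions reproduce the examples given above for items 1 and 2.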

Configure and Install the Oracle Cluster Integration

  1. To select your client, navigate to All Clients, and click the Client/Partner dropdown menu.
    Note: You may either type your client’s name in the search bar or select your client from the list.
  2. Navigate to Setup > Account. The Account Details screen is displayed.
  3. Click Integrations. The Installed Integrations screen is displayed with all the installed applications.
    Note: If you do not have any installed applications, you will be navigated to the Available Integrations and Apps page with all the available applications along with the newly created application with the version.
  4. Click + ADD on the Installed Integrations page.
    Note: Search for the integration either by entering the name of the integration in the search bar or by selecting the category of the integration from the All Categories dropdown list.
  5. Click ADD in the Oracle Cluster application.
  6. In the Configuration screen, click + ADD. The Add Configuration screen appears.
  7. Enter the following BASIC INFORMATION:

| Field Name | Description | Field Type |
| --- | --- | --- |
| Name | Enter the name for the configuration. | String |
| Oracle RAC Scan Hostname/ IP Address | Enter the Oracle RAC SCAN hostname or IP address of the Oracle Cluster. It must be accessible from the Gateway. | String |
| SSH Port | SSH port. Note: The default SSH port value is 22. | Integer |
| Oracle RAC SSH Credentials | Select the credential associated with your Oracle Cluster account. To use existing credentials, select them from the Select Credentials dropdown. Otherwise, click + Add to create credentials. The ADD CREDENTIAL window is displayed. Enter the following information: Name (credential name), Description (brief description of the credential), User Name, Password, and Confirm Password. | Dropdown |
| Database Port | Database port. Note: The default database port value is 1521. | Integer |
| Oracle RAC Database Credentials | Select the credential associated with your Oracle Cluster account. To use existing credentials, select them from the Select Credentials dropdown. Otherwise, click + Add to create credentials. The ADD CREDENTIAL window is displayed. Enter the following information: Name (credential name), Description (brief description of the credential), User Name, Password, and Confirm Password. | Dropdown |
| Database Name | Database name. | String |
| App Failure Notifications | When selected, you are notified in case of an application failure such as a Connectivity Exception or an Authentication Exception. | Checkbox |

  8. CUSTOM ATTRIBUTES: Custom attributes are the user-defined data fields or properties that can be added to the preexisting attributes to configure the integration.

| Field Name | Description | Field Type |
| --- | --- | --- |
| Custom Attribute | Select the custom attribute from the dropdown. You can add attributes by clicking the Add icon (+). | Dropdown |
| Value | Select the value from the dropdown. | Dropdown |

Note: The custom attribute that you add here will be assigned to all the resources that are created by the integration. You can add a maximum of five custom attributes (key and value pair).

  9. In the RESOURCE TYPE section, select:
    • ALL: All the existing and future resources will be discovered.
    • SELECT: You can select one or multiple resources to be discovered.
  10. In the DISCOVERY SCHEDULE section, select one of the following recurrence patterns:
    • Minutes
    • Hourly
    • Daily
    • Weekly
    • Monthly
  11. Click ADD.

    The configuration is saved and displayed on the configurations page.
    Note: From the same page, you may Edit and Remove the created configuration.
  12. Under ADVANCED SETTINGS, select the Bypass Resource Reconciliation option if you want to bypass resource reconciliation when the same resources are discovered by multiple applications.
    Note: If two different applications provide identical discovery attributes, two separate resources will be generated with those respective attributes from the individual discoveries.
  13. Click NEXT.
  14. (Optional) Click +ADD to create a new collector. You can either use the pre-populated name or give your collector a name.
  15. Select an existing registered profile.
  16. Click FINISH.
    The integration is installed and displayed on the INSTALLED INTEGRATION page. Use the search field to find the installed integration.

Modify Oracle Cluster Integration

Discover Resources in Oracle Cluster Integration

  1. Navigate to Infrastructure > Search > DATABASES > Oracle Cluster. The Oracle Cluster page is displayed.
  2. Select the application on the Oracle Cluster page.
  3. The RESOURCE page appears from the right.
  4. Click the ellipsis (...) on the top right and select View Details.
  5. Navigate to the Attributes tab to view the discovery details.

View resource metrics

To confirm Oracle Cluster monitoring, review the following:

  • Metric graphs: A graph is plotted for each metric that is enabled in the configuration.
  • Alerts: Alerts are generated for metrics as configured for the integration.

To view the metric details for Oracle Cluster, click the Metrics tab.

Supported Alert Custom Macros

Customize the alert subject and description with the following macros so that alerts are generated accordingly.

Supported macro keys:

  • ${resource.name}
  • ${resource.ip}
  • ${resource.mac}
  • ${resource.aliasname}
  • ${resource.os}
  • ${resource.type}
  • ${resource.dnsname}
  • ${resource.alternateip}
  • ${resource.make}
  • ${resource.model}
  • ${resource.serialnumber}
  • ${resource.systemId}
  • ${Custom attributes in the resource}
  • ${parent.resource.name}
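
To preview how a subject line written with these macros might read, the substitution can be imitated locally. The following Python sketch replaces ${key} tokens from a dictionary; it is an illustration only and not the platform's actual expansion logic, which this document does not describe. The function name, the sample values, and the choice to leave unknown macros unchanged are all assumptions.

```python
import re

# Matches ${...} tokens such as ${resource.name}.
MACRO_PATTERN = re.compile(r"\$\{([^}]+)\}")


def expand_macros(text, values):
    """Replace ${key} with values[key]; leave unknown macros untouched."""
    def substitute(match):
        key = match.group(1)
        return str(values.get(key, match.group(0)))

    return MACRO_PATTERN.sub(substitute, text)


if __name__ == "__main__":
    # Sample values are placeholders, not real resources.
    sample = {"resource.name": "rac-db1", "resource.ip": "10.0.0.11"}
    print(expand_macros("Alert on ${resource.name} (${resource.ip})", sample))
```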

Resource Filter Input keys

Oracle Cluster application resources are filtered and discovered based on the keys below.

Note: You can filter the resources with the discoverable keys only.

The following table lists the supported input keys for each Oracle Cluster resource type.

| Resource Type | Keys |
| --- | --- |
| All Types | resourceName, hostName, aliasName, dnsName, ipAddress, macAddress, os, make, model, serialNumber |
| Oracle RAC | Path, Total Disks, Total Nodes, Version |
| Oracle Node | Architecture, Icon name, Kernel, Machine ID |
| Oracle DB Instance | DB Type, Version |
| Oracle Disk Group | Compatibility, Database Compatibility, Disk Group Number, Offline Disks, Type |
| Oracle Disk | Path |

Risks, Limitations & Assumptions

  • The integration can manage critical/recovery failure alerts for the following two scenarios when the user activates App Failure Notifications in the settings:
    • Connectivity Exception
    • Authentication Exception
  • Oracle Cluster sends duplicate/repeat failure alert notifications every 6 hours.
  • Metrics can be used to monitor Oracle resources and can generate alerts based on the threshold values.
  • You can provide either the cluster IP address or the hostname in the configuration, but the hostname works only if hostname resolution works.
  • Oracle Cluster supports only Classic Gateway and NextGen Gateway. It is not supported with Cluster Gateway.
  • Showing activity logs is not supported.
  • The Template Applied Time will only be displayed if the collector profile (Classic and NextGen Gateway) is version 18.1.0 or higher.
  • Component level thresholds can be configured on each resource level.
  • Latest snapshot metrics are supported from Gateway 14.0.0.

Troubleshooting

Before troubleshooting, ensure all prerequisites are met.

If Oracle Cluster integrations fail to discover or monitor, troubleshoot using the following steps:

  • Check if any alerts have been generated on the cluster or in vprobe.
  • If there is an error or alert related to end device connectivity or authentication, check the reachability of the end device from the Gateway with the following commands:
    • Ping the SCAN hostname provided in the configuration: ping <scan name>
    • Try telnet: telnet <scan name> <Port>
    • Try SSH to the end device (each RAC node): ssh <username>@<node IP Address>
    • Connect to gcli with the gcli command, then run: db oracledb <scan_name> <username> <password> <db_port> <db_name>:servicename 15000 10000 insecure Yes "SELECT INST_ID, INSTANCE_NUMBER, INSTANCE_NAME, HOST_NAME FROM gv$instance"

Version History

| Application Version | Bug fixes / Enhancements |
| --- | --- |
| 1.0.2 | Enhancements related to the latest snapshot, Activity Log, and DebugHandler changes. |
| 1.0.1 | Changes related to resource discovery. |
| 1.0.0 | Initial support for Oracle Cluster application. |