1: What are Elastic Collector Profiles, and what conditions trigger elastic scaling for collector workloads?

Standard collector profiles do not scale when the number of mapped or attached devices increases significantly. A single collector may struggle to monitor hundreds or thousands of resources, which can lead to performance bottlenecks.

Elastic Collector Profiles

Elastic Collector Profiles were introduced to address this limitation:

  • These profiles scale manually based on the number of managed devices.
  • Scaling is controlled through scale up and scale down operations:
    • Scale Up: Adds new replica profiles when device load increases.
    • Scale Down: Removes replicas when load decreases to optimize resource usage.

Trigger Mechanism

Elastic scaling is triggered manually from the Collector Profiles section when the number of managed devices exceeds the capacity of existing replicas. This ensures that monitoring continues without delays or missed events as your device count grows.

Key Features

  • Only Elastic Collector Profiles support scaling.
  • Standard profiles remain static and cannot scale.
  • Elastic scaling improves high availability, performance, and resource utilization for large deployments.

2: What are the roles of master and replica profiles in an Elastic Collector Profile?

In an Elastic Collector Profile, both master and replica profiles are responsible for performing monitoring tasks on the devices assigned to them. However, there are some key distinctions:

Master Profile

  • Responsible for device and resource discovery for the profile. Only the master profile performs discovery.
  • Synthetics is enabled or disabled only on the master profile.
  • Handles scale-up and scale-down requests.
  • Does not coordinate any replica profiles; each replica operates independently and collects the monitoring data for attached resources.

Replica Profile

  • Collects monitoring data from the devices attached to it, just like the master profile.
  • Does not perform discovery; replicas rely on the master profile to detect new devices.

Additional Notes

Using both master and replica profiles ensures efficient monitoring of large or distributed networks. The master profile is critical for accurate network discovery; without it, new devices may not be detected.

Example Scenario

If you have a network with 100 devices, 1 master profile, and 1 replica profile:

  • The master profile discovers all 100 devices and collects monitoring data for approximately 50 of them.
  • The replica profile monitors the remaining 50 devices, reducing the load on the master.

3: How does the scale up or scale down operation work in an HA (High Availability) setup for Elastic Collector Profiles?

Scale-Up Operation

When you perform a scale-up on an Elastic Collector Profile:

  • A new replica is created in the same namespace with the same configuration as the existing replicas.
  • The new replica profile is automatically registered to the OpsRamp cloud.
  • A share of the resources attached to the existing master and replica profiles is automatically redistributed to the new replica.

Prerequisite: Ensure your Kubernetes cluster has sufficient CPU, memory, and other resources to prevent replica creation failures.

Key Points:

  • No manual configuration is needed for the new replica.
  • The process ensures horizontal scalability without disrupting existing monitoring.
  • Auto-distribution of resources maintains consistent monitoring coverage.

Scale-Down Operation

When you perform a scale-down on an Elastic Collector Profile:

  • The most recently created replica is removed from the cluster.
  • Resources attached to the removed replica are automatically redistributed to the remaining registered profiles (both master and replica profiles).
  • Scale-down ensures efficient resource usage while maintaining monitoring coverage.

Key Points:

  • Scale-down always removes the latest replica first to minimize impact.
  • Existing monitoring tasks continue to run on remaining replicas.
  • Resource redistribution is automatic, avoiding gaps in monitoring.

Additional Notes

Elastic profiles are recommended to run in HA mode to take full advantage of automatic scaling and fault tolerance. Scaling operations are managed through the Elastic Collector Profile, not manually on individual replicas. Kubernetes resource planning is critical—insufficient resources may cause replica creation failures during scale-up.

Example Scenario

An HA cluster has 1 master and 2 replica profiles monitoring 1500 devices. Scale up adds 1 new replica profile; resources are auto-distributed so the monitoring load is balanced across 4 profiles. Scale down removes the last added replica; its resources are redistributed to the remaining 3 profiles to maintain monitoring coverage.
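The load shift in this scenario can be sanity-checked with simple arithmetic (a sketch using the example's numbers; exact per-profile counts may vary slightly with rounding):

```shell
# Arithmetic sketch only: average devices per profile before and after
# adding one replica (1500 devices, 3 profiles -> 4 profiles).
DEVICES=1500
BEFORE=3   # 1 master + 2 replicas
AFTER=4    # after one scale-up
echo "before: $((DEVICES / BEFORE)) devices per profile"
echo "after:  $((DEVICES / AFTER)) devices per profile"
```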

4: Is there any downtime while performing scale-up or scale-down operations?

Monitoring Impact

  • Master and replica profile gateways do not go down during scale-up or scale-down operations.
  • A brief monitoring impact may occur during automatic redistribution of devices and templates:
    • Templates and monitoring configurations are unassigned from existing replicas and reassigned to newly created or remaining replicas.
    • This ensures monitoring continues on all devices with only a very short interruption.

Key Points

  • The impact is typically negligible and lasts only while resource redistribution occurs.
  • Scale operations are designed to maintain continuous monitoring with minimal disruption.

Example Scenario

Scale-up:

  • A new replica profile is created.
  • Devices are redistributed from older replicas to the new replica.
  • During this redistribution, a few monitoring events may be delayed for seconds to a few minutes, depending on the number of devices.

Scale-down:

  • The most recently added replica is removed.
  • Its resources are reassigned to remaining replicas.
  • Minimal monitoring interruption may occur while redistribution completes.

Summary

  • No downtime for master or replica profiles.
  • Minimal and temporary monitoring impact during resource redistribution.
  • Scale operations are safe for production environments.

5: How long does a scale-up or scale-down operation typically take for Elastic Collector Profiles?

Scale-Up Operation

During a scale-up, a new replica is created. All related Docker images are downloaded from the OpsRamp GCR repository (or your configured repository). The total image size is approximately 2–3 GB. The duration depends on network speed and system resources (CPU, RAM, SSD):

  • On high-speed networks with adequate resources, scale-up usually completes in approximately 2–3 minutes.
  • On slower networks or systems with limited resources, it may take longer, depending on download and initialization speed.

Scale-Down Operation

During a scale-down, the most recently created replica is removed. Any resources mapped to the replica are automatically redistributed or rebalanced across remaining replicas. The time required depends on the number of resources attached to the replica:

  • For a typical replica, scale-down usually takes a few minutes.
  • Larger numbers of resources may increase the redistribution time slightly.

Key Points

  • Scale-up takes longer than scale-down due to image download and container initialization.
  • Scale-down is generally faster since it only involves resource redistribution and replica removal.
  • Proper network speed and sufficient Kubernetes resources (CPU, RAM, SSD) significantly improve scaling times.

Example Scenario

Scale-Up: Adding a new replica profile for 500 devices:

  • Docker images (approximately 2–3 GB) are pulled.
  • Replica is initialized and resources are assigned.
  • Duration: approximately 2–3 minutes on a well-provisioned cluster.

Scale-Down: Removing the same replica:

  • Resources are redistributed to remaining replicas.
  • Replica is deleted.
  • Duration: approximately 1–2 minutes depending on number of devices.


6: Is there a maximum limit on the number of replicas for an Elastic Collector Profile?

There is no hard-coded maximum replica limit. However, the practical limit depends on the available infrastructure resources and overall deployment design.

For most environments, the following configuration is recommended:

  • 3 to 5 replicas per Elastic Collector Profile

Scaling beyond this should be validated based on workload and infrastructure capacity.

High Availability Recommendation

Instead of scaling a single elastic profile excessively, it is recommended to:

  • Deploy multiple Elastic Collector Profiles
  • Distribute them across multiple Kubernetes clusters

The reason is that if a single Kubernetes cluster goes down:

  • All gateways (replicas) running in that cluster will be impacted.
  • Monitoring for devices mapped to that profile will be affected.

Using multiple clusters improves:

  • Fault isolation
  • Resiliency
  • Business continuity
  • Large-scale deployment flexibility

Summary

  • No strict maximum replica limit.
  • Recommended: 3–5 replicas per profile.
  • For large environments, use multiple elastic profiles across multiple clusters.
  • Avoid placing all gateways in a single cluster to prevent full monitoring impact during a cluster outage.

7: Can Elastic Collector Profiles operate when the gateway is deployed behind a proxy?

Yes, Elastic Collector Profiles work behind a proxy in the same way as standard collector profiles.

When proxy details are provided during the master profile registration, the same proxy configuration is automatically applied to all replicas (master and replica profiles).

All replicas:

  • Use the configured proxy settings
  • Connect to the OpsRamp cloud through the proxy
  • Communicate with external services using the same proxy configuration

This ensures consistent outbound connectivity across the entire Elastic Collector Profile.

Key Points

  • No additional proxy configuration is required for replica profiles.
  • Proxy settings are inherited automatically from the master profile.
  • Elastic scaling does not affect proxy behavior.
  • Ensure the proxy allows required outbound connectivity to avoid communication issues.

8: What occurs when a Gateway upgrade is performed on an Elastic Collector Profile?

When a Gateway upgrade is performed, Elastic Collector Profiles are upgraded in the same way as standard collector profiles.

During the upgrade:

  • All replicas (master and replica profiles) are upgraded to the new version.
  • The entire Elastic Collector Profile remains on a consistent version.
  • Version mismatch between replicas does not occur.

Example

If the current Gateway version is 21.0.0 with 1 master profile and 2 replica profiles, after upgrading to 22.0.0:

  • The master profile will upgrade to 22.0.0
  • All replica profiles will also upgrade to 22.0.0

All replicas will run the same upgraded version.

Key Points

  • Upgrade behavior is consistent with standard collector profiles.
  • All replicas are maintained on the same version.
  • No manual upgrade is required for individual replicas.
  • Ensures version consistency across the entire Elastic Collector Profile.
  • In the current implementation, after a gateway appliance upgrade, scale operations must be performed again manually based on the previously configured settings.

9: Are scale up and scale down events recorded for Elastic Collector Profiles?

Yes. All scale up and scale down events for Elastic Collector Profiles are captured in the OpsRamp platform audit logs.

You can review these events in the Audit Reports section within the OpsRamp platform to track scaling activities and user actions.

Additional Information

Audit logs include scaling operations such as:

  • Scale-up requests
  • Scale-down requests
  • Related configuration changes

Customers can also enable and configure audit logging at the Kubernetes cluster level for additional visibility and compliance requirements. For OpsRamp-provided ISO or OVA-based deployments, refer to the official deployment guide to configure Kubernetes audit logging within your cluster.

Summary

  • Scaling events are recorded in OpsRamp audit logs.
  • Audit reports can be reviewed within the platform.
  • Additional Kubernetes level auditing can be enabled if required.

10: How can I verify app monitoring and discovery logs for Elastic Collector Profiles?

You can view app monitoring and discovery logs for Elastic Collector Profiles in the same way as standard collector profiles using Kubernetes pod logs.

Use the following command to check logs:

kubectl logs <pod-name> -c vprobe -n <namespace> -f

Understanding Replica Pod Names

In an Elastic Collector Profile:

  • nextgen-gw-0 → Acts as the master profile
  • nextgen-gw-1, nextgen-gw-2, …, nextgen-gw-n → Act as replica profiles

To verify logs for a specific replica, replace the pod name accordingly.

Examples

  • To check logs for the master profile:
    kubectl logs nextgen-gw-0 -c vprobe -n <namespace> -f
  • To check logs for replica profile 1:
    kubectl logs nextgen-gw-1 -c vprobe -n <namespace> -f
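To inspect every profile in turn, the same command can be generated for each pod index (an illustrative helper, not an official tool; the `<namespace>` value and replica count are placeholders to adapt):

```shell
# Generate the log command for the master (index 0) and each replica,
# following the nextgen-gw-<n> pod naming shown above.
NAMESPACE="<namespace>"   # placeholder: your Kubernetes namespace
REPLICAS=2                # placeholder: number of replica profiles
CMDS=$(for i in $(seq 0 "$REPLICAS"); do
  echo "kubectl logs nextgen-gw-$i -c vprobe -n $NAMESPACE -f"
done)
echo "$CMDS"
```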

Key Points

  • Discovery logs are primarily generated by the master profile.
  • Monitoring logs can be checked on both master and replica profiles.
  • Ensure you are running the command in the correct Kubernetes namespace.
  • Replace the pod name based on the replica you want to inspect.

11: When a scale-up operation is performed, are additional polling cycles triggered for monitored resources?

No, scale-up operations do not intentionally trigger additional polling cycles for monitored resources.

When a scale-up occurs:

  • Resources are automatically redistributed across the master and replica profiles within the Elastic Collector Profile.
  • Monitoring templates are:
    • Unassigned from the previous replica
    • Reassigned to the newly created replica

During this redistribution process, monitoring continues as per the configured polling intervals.

What to Verify After Scale-Up

Although additional polling is not triggered by design, it is recommended to:

  • Verify that templates are successfully applied on the new replica.
  • Confirm that monitoring resumes and operates as expected.
  • Check logs if any resource appears temporarily delayed.

Key Points

  • Scale-up does not duplicate or increase polling frequency.
  • Resources are simply reassigned to balance load.
  • A brief monitoring delay may occur during template reassignment.
  • Normal polling resumes once redistribution completes.

12: Which gateway types support Elastic Collector Profiles?

Elastic Collector Profiles are supported only on NextGen Gateways.

The following gateway types are not supported:

  • Classic Gateways
  • Windows Gateways

Elastic scaling capabilities are available exclusively with the NextGen Gateway architecture.

Summary

  • Supported: NextGen Gateways
  • Not Supported: Classic Gateways
  • Not Supported: Windows Gateways

To use Elastic Collector Profiles and scaling functionality, ensure your deployment is based on the NextGen Gateway.

13: Are additional sessions established to monitored devices when using Elastic Collector Profiles?

Yes, the session behavior differs between standard and Elastic Collector Profiles.

In a standard collector profile, all devices are attached to a single gateway. Discovery and monitoring sessions are initiated from that single gateway instance.

In an Elastic Collector Profile, multiple replicas (master and replica profiles) are deployed. Devices are mapped across these replicas. Each replica independently creates its own sessions to the devices assigned to it.

What This Means

  • Sessions are not shared between replicas.
  • Each replica establishes sessions only for the devices mapped to it.
  • There are no duplicate sessions for the same device unless the device is explicitly mapped to multiple replicas (which is not the default behavior).

Key Points

  • Standard Profile: One gateway → One set of sessions.
  • Elastic Profile: Multiple replicas → Each replica creates sessions for its assigned devices.
  • This approach supports scalability and higher device capacity.

14: What happens if a master or child replica goes offline during an auto-rebalance or manual rebalance operation?

Auto or manual rebalance operations are managed by the OpsRamp platform and are not immediately dependent on the gateway being online.

If a master or child replica goes offline:

  • The OpsRamp platform still evaluates the gateway’s registration status.
  • The rebalance process is initiated at the platform level.

However:

  • Template unassignment and reassignment to the affected replica will complete only after the gateway comes back online and reconnects to the OpsRamp SaaS platform.
  • Once connectivity is restored, the gateway synchronizes and applies the pending template changes.

Key Points

  • Rebalance logic is platform-driven.
  • A temporarily offline gateway does not permanently block rebalance operations.
  • Template updates are applied once the gateway reconnects.
  • Monitoring for devices mapped to the offline replica will resume after it comes back online and synchronization completes.

Summary

  • Rebalance operations proceed even if a gateway is temporarily offline.
  • Template reassignment is finalized when the gateway reconnects.
  • Monitoring resumes normally after synchronization.

15: What happens if a master or child replica goes offline during a scale-up or scale-down operation?

Scaling operations depend on the master replica profile. The master replica is responsible for handling scale-up and scale-down operations. Child replicas do not control or initiate scaling activities.

If a Child Replica Goes Offline

Scaling operations can still proceed. The master replica continues to handle the scale process. Once the offline child replica comes back online, it will synchronize with the platform.

If the Master Replica Goes Offline

Scaling operations will fail. The master replica must be:

  • Up and running
  • Connected to the OpsRamp SaaS platform

Without the master being active and connected, scale-up or scale-down operations cannot be executed.

Key Points

  • Scaling is controlled by the master replica only.
  • Child replica availability does not block scaling.
  • Ensure the master replica is healthy before initiating scaling operations.

16: How should infrastructure resources be sized for Elastic Collector deployments?

Each Elastic Collector instance (Master or Replica) requires the following resources:

  • 8 GB RAM
  • 4 CPU cores
  • 50 GB disk

If a deployment includes 1 Master and 2 Replicas, the total infrastructure requirement would be:

  • 24 GB RAM
  • 12 CPU cores
  • 150 GB disk
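The totals above are simply the per-instance requirements multiplied by the instance count, which can be checked for any replica count (arithmetic sketch only):

```shell
# Per-instance requirements from above: 8 GB RAM, 4 cores, 50 GB disk.
INSTANCES=3   # 1 master + 2 replicas
RAM_GB=$((8 * INSTANCES))
CPU_CORES=$((4 * INSTANCES))
DISK_GB=$((50 * INSTANCES))
echo "total: ${RAM_GB} GB RAM, ${CPU_CORES} cores, ${DISK_GB} GB disk"
```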

However, the recommended number of replicas depends on the number of monitored resources and the type of monitoring workload. In many environments, 1 Master and 1 Replica is sufficient for handling typical workloads (for example, approximately 800–1000 resources). In some cases, a single Master collector may also be adequate depending on the resource type.

Examples

  • For monitoring approximately 1000 VMware resources, typically 1 Master + 1 Replica is recommended.
  • For monitoring approximately 1000 SDK-based resources, a single Master collector may be sufficient.

High Availability Considerations

If the infrastructure is provisioned for 3 nodes (24 GB RAM, 12 CPU cores, 150 GB disk in total) but only 1 Master and 1 Replica are deployed, the remaining node acts as standby capacity. In case of a node failure, Kubernetes can reschedule the collector pod to the available node, maintaining service availability.

If 1 Master and 2 Replicas are actively deployed across 3 nodes, there will be no spare node available for failover. In the event of a node failure, Kubernetes may not have sufficient capacity to reschedule the collector pod unless additional resources are available.

Therefore, the final sizing and replica configuration should be determined based on:

  • The number of monitored resources
  • The type of monitoring workloads
  • The desired level of availability and redundancy

17: Can Elastic Collector Profiles be deployed on a single-node Kubernetes cluster?

Yes, Elastic Collector Profiles can be deployed on a single-node Kubernetes cluster. However, the node must be vertically scaled to provide sufficient resources for all collector instances running on that node.

Each Elastic Collector instance (Master or Replica) requires the following resources:

  • 8 GB RAM
  • 4 CPU cores
  • 50 GB disk

For example, if you plan to deploy 1 Master and 2 Replicas, the single node must be provisioned with at least:

  • 24 GB RAM
  • 12 CPU cores
  • 150 GB disk

Since all gateway instances would run on the same node in this setup, the node must have enough capacity to handle the combined workload.

Important Considerations

  • In a single-node cluster, there is no high availability (HA). If the node becomes unavailable, all gateway instances will go down.
  • Multiple replicas in a single-node setup help distribute workload but do not provide node-level fault tolerance.
  • Resource utilization should be carefully monitored to ensure the node can handle the monitoring workload.

Best Practice Recommendation

OpsRamp strongly recommends deploying Elastic Collector Profiles on a multi-node Kubernetes cluster to achieve high availability and better fault tolerance. In a multi-node cluster, Kubernetes can reschedule collector pods to another node in case of node failure, ensuring continued monitoring operations.

18: What is the Rebalance action in Elastic Collector Profiles, and when should it be used?

The Rebalance action in Elastic Collector Profiles allows users to manually redistribute resources across all collector replicas when the automatic distribution does not complete successfully after scaling operations. This helps maintain balanced resource allocation and ensures optimal load distribution among all replicas in the profile.

When Should You Use the Rebalance Action?

You should use the Rebalance option in the following scenarios:

  • A Scale Up or Scale Down operation successfully creates a new replica, but the automatic redistribution of resources fails.
  • Resources are unevenly distributed across replicas and need to be manually balanced.
  • You want to ensure optimal load distribution across all replicas in an Elastic Collector Profile.

How Rebalance Works

Manual Redistribution

If automatic resource redistribution fails after a scale-up operation, users can manually trigger the Rebalance action from the OpsRamp Collector Profiles UI.

Formula-Based Distribution

Resources are redistributed using the formula:

Total Resources in Elastic Profile ÷ Total Profiles (Master + Replicas)

Replica-Aware Allocation

The redistributed resources are evenly allocated across all available replicas, including both the Master Replica and any newly created replicas.

Key Benefit

Rebalancing ensures that workloads remain evenly distributed across collector replicas, preventing resource overload on individual collectors and maintaining efficient system performance.

19: Can I trigger multiple Scale Up, Scale Down, or Rebalance operations at the same time?

No, you cannot run multiple operations of the same type simultaneously on an Elastic Collector Profile.

How It Works

  • When a Scale Up or Scale Down operation is initiated, another Scale Up or Scale Down request cannot be triggered until the current request completes (either successfully or with failure).
  • Similarly, when a Rebalance operation is in progress, another Rebalance request cannot be started until the current one finishes.
  • However, the restriction applies per operation type: Scale operations (Scale Up and Scale Down) and Rebalance operations are locked independently of each other.

System Behavior

If you attempt to initiate the same operation while it is already in progress, the system will display an error message indicating that the request is already being processed.

Example error message:

An operation has already been initiated on {date/time} and is currently in progress for the {profile name} profile.

Key Note

This restriction helps maintain system stability and prevents conflicts that could occur if multiple requests of the same operation type are executed simultaneously on the same Elastic Collector Profile.
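The one-operation-at-a-time rule behaves like a mutual-exclusion lock. A minimal shell sketch of the idea (purely illustrative, modeled with a local lock directory; this is not how the platform actually implements it):

```shell
# Purely illustrative: model "one scale operation at a time" with a
# lock directory. A second request while the lock is held is rejected,
# mirroring the in-progress error message described above.
LOCK="/tmp/elastic-profile-demo-$$.lock"
start_scale() {
  if mkdir "$LOCK" 2>/dev/null; then
    echo "scale operation started"
  else
    echo "operation already in progress"
  fi
}
start_scale   # first request acquires the lock
start_scale   # second request is rejected while the first runs
rmdir "$LOCK" # lock released when the operation completes
```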

20: What permissions are needed to perform Collector actions like Rebalance, Scale Up, or Scale Down?

To perform any of the following collector actions in OpsRamp, you must have the Manage Management Profile permission:

  • Rebalance
  • Scale Up
  • Scale Down

Action     | Required Permission
Rebalance  | Manage Management Profile
Scale Up   | Manage Management Profile
Scale Down | Manage Management Profile

What Happens If You Don’t Have the Required Permission?

If you attempt to perform any of these actions without the required permission, the system will display the following error message: “Access denied to perform collector profile actions.”

21: Where are discovery and monitoring configurations stored in a Gateway setup – the master gateway or the replicas?

In a gateway setup with a master profile and replica profiles, discovery and monitoring configurations are handled in two stages.

Discovery Phase

All resource discovery operations are always initiated through the Master Gateway Profile. The master profile is responsible for discovering the environment and identifying devices or resources that need to be monitored.

Resource Distribution

After discovery is completed, the identified resources are distributed across available gateway profiles, including the master and its replicas, based on the load distribution mechanism.

Configuration Persistence

Once a resource is assigned to a specific gateway profile:

  • The monitoring configuration for that resource is stored only in the profile to which the device is attached.
  • If a device is assigned to the master profile, its configuration is stored in the master profile.
  • If a device is assigned to a replica profile, its configuration is stored in that replica profile.

This design ensures that each gateway profile manages and maintains the monitoring configurations for the resources assigned to it, improving scalability and workload distribution.

Summary:

Aspect                | Behavior
Discovery             | Always performed by the Master Gateway Profile
Resource Distribution | Devices are distributed across master and replica profiles
Configuration Storage | Stored in the profile where the device is assigned

This approach allows efficient load balancing and management of monitored resources across multiple gateway replicas.

22: If monitoring fails after a rebalance or scaling operation, which gateway logs should be checked?

After a manual or automatic rebalance or scaling operation, the monitored resources are redistributed and attached to their respective collector profiles (master or replicas). Monitoring for each device is then handled by the profile to which the device is assigned.

If monitoring fails after rebalancing or scaling, troubleshooting should focus on the profile that currently manages the device, not necessarily the master gateway.

Step-by-Step Troubleshooting

1. Identify the gateway profile handling the device

After rebalance or scaling is completed, confirm which gateway profile the device is attached to:

  • Master profile
  • Replica profile (for example: nextgen-gw-1, nextgen-gw-2, etc.)

Monitoring operations are executed from the profile where the device is assigned.

2. Verify that monitoring templates are pushed successfully

Check whether the monitoring templates were successfully pushed to the assigned profile.

  • If the templates are not pushed, monitoring will fail.
  • If templates appear out of sync, perform a template sync.

Template sync ensures that the monitoring configuration on the gateway matches the configuration defined in the platform.

3. Check logs on the assigned replica gateway

If the templates are synchronized and the issue persists, check the logs on the specific replica handling the device.

Example

If the resource or device is assigned to Replica 2 (nextgen-gw-2), check logs on that replica.

Example:

kubectl logs nextgen-gw-2 -c vprobe -n {NAME_SPACE} -f

These logs help identify runtime errors related to monitoring execution.

4. Verify RAM job logs

For deeper troubleshooting, verify the following components on the assigned gateway replica:

  • RAM job logs – responsible for scheduled monitoring jobs

Use the gcli command on the respective replica to inspect these logs.

This step helps determine whether the monitoring job is being triggered correctly or failing during execution.

Summary

Troubleshooting Step      | What to Check
Identify assigned profile | Confirm which master/replica manages the device
Template validation       | Ensure monitoring templates are pushed and synced
Replica logs              | Check logs on the specific replica handling the device
Probe execution           | Verify VProbe logs and RAM jobs using gcli

Monitoring issues after rebalance usually occur due to template synchronization issues or probe execution failures on the assigned gateway replica.

23: When scaling up an elastic gateway, is device distribution based on integration discovery counts or the total resource count?

Device distribution during a scale-up operation is not based on how many devices are discovered by a specific integration. Instead, the distribution is calculated using the total number of resources managed by the entire Elastic Profile.

An Elastic Profile consists of:

  • Master Profile
  • All Replica Profiles

When a scale-up operation occurs and new replicas are added, the system redistributes resources across all available profiles (master and replicas) to ensure balanced workload distribution.

Distribution Logic

The number of resources assigned to each profile is calculated using the following formula:

Total Resources in Elastic Profile ÷ Total Number of Profiles (Master + Replicas)

This ensures that the monitoring workload is evenly distributed across all available gateway nodes.

Important

Even if a single integration discovers a large number of devices, those devices will not remain tied to that integration’s profile after scaling. During redistribution, the system considers all managed resources collectively, regardless of which integration discovered them.

Example

Suppose the following scenario:

  • Integration A discovers 100 devices
  • Integration B discovers 50 devices

Total resources managed in the Elastic Profile:

100 + 50 = 150 devices

Now assume the elastic profile has:

  • 1 Master Profile
  • 2 Replica Profiles

Total gateway profiles = 3

Resource distribution calculation:

150 ÷ 3 = 50 devices per profile

After the scale-up and rebalance process:

Gateway Profile   | Devices Assigned
Master Profile    | ~50 devices
Replica Profile 1 | ~50 devices
Replica Profile 2 | ~50 devices

This means devices discovered by a specific integration may be distributed across multiple gateway profiles to maintain balanced resource management.
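The distribution above can be reproduced with the documented formula (arithmetic sketch using the example's numbers):

```shell
# Total resources across all integrations, divided by total profiles
# (master + replicas), regardless of which integration found them.
INTEGRATION_A=100
INTEGRATION_B=50
PROFILES=3   # 1 master + 2 replicas
TOTAL=$((INTEGRATION_A + INTEGRATION_B))
PER_PROFILE=$((TOTAL / PROFILES))
echo "$PER_PROFILE devices per profile"
```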

24: What happens if a master or replica gateway goes offline in an Elastic Gateway setup?

In an Elastic Gateway Profile, resources are distributed across the master gateway and its replicas. Each gateway profile is responsible for monitoring the devices assigned to it.

If either the master gateway or a replica gateway goes offline, the impact depends on which gateway becomes unavailable.

What Happens If a Replica Gateway Goes Offline?

If a replica gateway goes offline:

  • Monitoring will be impacted only for the devices assigned to that replica.
  • Devices assigned to other replicas or the master profile will continue to be monitored normally.
  • Discovery operations will not be affected, because discovery is handled by the master profile.

In this situation, the platform does not automatically rebalance the resources to other gateways.

Why Is Automatic Rebalance Not Triggered?

Sometimes a gateway may go offline due to temporary issues such as:

  • Network glitches
  • Temporary connectivity loss
  • Gateway restart
  • Infrastructure maintenance

If automatic rebalancing occurred immediately, it would cause unnecessary platform load due to:

  • Monitoring template unassignment
  • Template reassignment
  • Resource redistribution

Instead, administrators are expected to investigate and restore the replica gateway.

What Happens If the Master Gateway Goes Offline?

If the master profile goes offline:

  • Discovery operations will stop, because discovery is always performed through the master gateway.
  • Monitoring will be impacted only for the devices assigned to the master gateway.
  • Replica gateways will continue monitoring their assigned devices independently.

This means that existing monitoring handled by replicas will continue running normally, even if the master gateway is unavailable.

Example Scenario

Consider an Elastic Gateway Profile with:

  • 1 Master Gateway
  • 2 Replica Gateways

Device distribution:

Gateway        | Devices Managed
Master Gateway | 100 devices
Replica 1      | 100 devices
Replica 2      | 100 devices

Total devices: 300

Scenario 1: Replica 1 Goes Offline

Impact:

Component            | Impact
Devices on Replica 1 | Monitoring stops for 100 devices
Devices on Master    | Monitoring continues
Devices on Replica 2 | Monitoring continues
Discovery            | No impact

Only the 100 devices managed by Replica 1 are affected.

Scenario 2: Master Gateway Goes Offline

Impact:

Component            | Impact
Devices on Master    | Monitoring stops for 100 devices
Devices on Replica 1 | Monitoring continues
Devices on Replica 2 | Monitoring continues
Discovery            | Stops until master is restored

In this case, replicas continue monitoring their devices, but new discovery cannot be performed.

Summary

Scenario                     | Impact
Replica gateway goes offline | Only devices assigned to that replica lose monitoring
Master gateway goes offline  | Master devices lose monitoring and discovery stops
Other replicas               | Continue monitoring normally
Automatic rebalance          | Not triggered to avoid unnecessary platform load

Users should identify the reason for the gateway outage and restore the gateway before considering manual redistribution of resources.