How to set up a PRTG cluster




Transcript - How to set up a PRTG Cluster

Hello, and welcome to this video about how to set up a PRTG cluster.  My name is Kimberley Trommler and I’m in presales at Paessler.

Before we start, I’d like to show you an excellent article in our knowledge base that explains how to set up a cluster, step-by-step.  If you search for “cluster step by step” on our website, you’ll find this article, and I recommend having a copy of it handy when you set up your cluster.

www.paessler.com, search “cluster step by step”, which will lead you to this manual page.

We’re going to be following the steps in this article today, to set up a PRTG failover cluster with two servers, one master and one failover.  

Depending on what license you have, it’s possible to build a cluster with up to five servers.  If you’re setting up a cluster with more than two servers, then the instructions for the additional servers are exactly the same as what we’ll be doing for the one failover server today.   You just need to repeat the steps that we do on the failover for all of your additional failover servers.

At the top of this article, there’s also a link to a second article called “Failover cluster configuration”.  

This article has a section called “getting started”, which has important information about some aspects that you need to consider before you start your setup.

Before you start:

So, what do you need to consider before you even start?

First of all, you need to have two servers available, one to be the master and one to be the failover.  These two servers need to be comparable in system performance and speed (CPU, amount of RAM, and so on).  Please keep in mind that a PRTG cluster is an active-active cluster, so the failover server carries nearly the same load as the master server does.

Please also keep in mind that cluster servers might be rebooted automatically without notice, so please only run PRTG on servers where a reboot won’t cause serious issues with other applications on the same server.

I have two servers running, one will become the master and the other will be the failover.  To help tell them apart during this video, the server with the red background is going to be the master, and the server with the green background is going to be the failover.

Next, the two servers must be able to communicate with each other, in both directions.  If you have any firewalls or access lists between the two servers, please ensure that the necessary ports are open.  The cluster feature runs on TCP port 23570.  And, if you’re using remote probes, they run on TCP port 23560.  You need to ensure that the two servers can reach each other in both directions on port 23570, and you need to ensure that remote probes can reach all cluster servers on port 23560.
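If you'd like to test the firewall rules yourself before starting, the Python sketch below tries to open a TCP connection to a given host and port and reports what happened. It's only a generic connectivity check, not part of PRTG, and the host name in the usage line is a hypothetical placeholder.

```python
import socket

CLUSTER_PORT = 23570       # cluster communication between the cluster nodes
REMOTE_PROBE_PORT = 23560  # remote probes connecting to the cluster nodes


def check_tcp_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Try to open a TCP connection and classify the result.

    "open"    - something accepted the connection
    "refused" - the host answered, but nothing accepted a connection on that port
    "blocked" - no answer within the timeout, unreachable host, or unresolvable name
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        # Timeouts, "no route to host" and DNS lookup failures all land here.
        return "blocked"
```

For example, on the master you could run `check_tcp_port("prtg-failover.example.local", CLUSTER_PORT)`, then repeat the check in the other direction, and run the 23560 check from each remote probe machine against every cluster node. Keep in mind that "refused" only means nothing was listening at that moment (expected if PRTG isn't using that port yet), while "blocked" usually points to a firewall dropping the traffic or a routing problem. A firewall that rejects connections instead of dropping them can also show up as "refused".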

At the operating system level, please ensure that both servers are set to use the same time zone.  My two servers are both set to GMT.

Now that we’ve checked that the Windows-level configuration is okay, it’s time to look at the two PRTG installations.

On the servers themselves, you need to run the exact same software version of PRTG.  The version number is important when you first set up a new cluster.  Once the cluster is established you don’t need to worry about the version number anymore, because all cluster nodes will be automatically updated when the master is updated.

When you’re looking at the version numbers, please also check that both servers are running as 32-bit or both as 64-bit.  You can see this in the version number:  if there’s a plus sign at the end of the number, then PRTG is running as 64-bit.  If there’s no plus sign, then it’s running as 32-bit.
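The plus-sign rule is simple enough to write down as code. This Python sketch assumes you copy the version strings by hand from each server's web interface; the example version numbers are made up.

```python
def prtg_bitness(version: str) -> str:
    """Return "64-bit" if the version string ends with a plus sign, else "32-bit"."""
    return "64-bit" if version.strip().endswith("+") else "32-bit"


def versions_compatible(master_version: str, failover_version: str) -> bool:
    """Both nodes must run the same version number AND the same bitness."""
    same_number = master_version.strip().rstrip("+") == failover_version.strip().rstrip("+")
    same_bitness = prtg_bitness(master_version) == prtg_bitness(failover_version)
    return same_number and same_bitness


if __name__ == "__main__":
    print(prtg_bitness("1.2.3.4567+"))                        # 64-bit
    print(versions_compatible("1.2.3.4567+", "1.2.3.4567"))  # False: 64-bit vs 32-bit
```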

And on my failover server I have PRTG installed, with the same version number.

Another thing to check ahead of time is the license on each server.  Both PRTG installations need to use the same license key, so please double check that both have the same license key entered.  You can see the license key under Setup/ License/ Status, and if you need to change the key, just click on “Enter license key” here to add a new key.
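To keep track of these pre-flight checks (same time zone, same PRTG version string including the plus sign, same license key), you could note the values from both servers and compare them with a small Python helper like this one. The field names and example values are hypothetical, and the script doesn't talk to PRTG.

```python
from typing import Dict, List

# Settings that must be identical on both nodes before building the cluster.
MUST_MATCH = ("prtg_version", "time_zone", "license_key")


def preflight_mismatches(master: Dict[str, str], failover: Dict[str, str]) -> List[str]:
    """Return the names of the settings that differ between the two nodes."""
    problems = []
    for key in MUST_MATCH:
        m = master.get(key)
        f = failover.get(key)
        # A value that is missing on either side also counts as a mismatch.
        if m is None or f is None or m.strip() != f.strip():
            problems.append(key)
    return problems


if __name__ == "__main__":
    # Example values only; the license key is a placeholder.
    master = {"prtg_version": "1.2.3.4567+", "time_zone": "GMT", "license_key": "KEY-PLACEHOLDER"}
    failover = {"prtg_version": "1.2.3.4567", "time_zone": "GMT", "license_key": "KEY-PLACEHOLDER"}
    print(preflight_mismatches(master, failover))  # ['prtg_version']
```

Comparing the full version string catches a bitness mismatch too, because the plus sign is part of the string.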

Before you start building a cluster, please take some time to consider how you would like remote probes to behave in the cluster.  For each remote probe, you can configure whether that remote probe should send data to only the one master server, or whether it should send its data to both servers.  In addition, if you would like all the servers to receive data from remote probes, then you need to enable remote probes on all the servers.  

Our master server is already accepting remote probes.  To ensure that the failover server will be able to receive data from remote probes too, please enable this functionality on the failover server, under Setup/ System Administration/ Core and Probes. Here you need to allow probes to connect, add the access keys that the remote probes are using, and allow the IP addresses for all the remote probes. 

And, there’s one more thing to consider before we start the cluster, which is notification delivery.  

Since either of the servers could send out notifications, you now need to ensure that both servers are able to send notifications.  Since the master server pushes its configuration out to the failover server, you need to ensure that the master is configured in such a way that the notifications will also work from the failover server.  

So, on the master, go to Setup/ System Administration/ Notification Delivery to set up the email options, SMTP delivery and SMS delivery.

The email options and the SMS delivery will usually be the same for both PRTG servers.

However, for email notifications to work, it’s important to make sure that the SMTP servers configured here are reachable from both PRTG servers.  In many cases, you will need to select the option “Use two SMTP relay servers”.  As the first SMTP server, enter the SMTP server that your primary master PRTG server would use.  Then, as the second server, enter the SMTP server that your failover server would use.  When the two PRTG servers synchronize, they will both have both of these SMTP servers configured.

If the master PRTG server needs to send an email, it will try the first server in the list and send using that server.  If the failover PRTG server needs to send an email, it will also try the first server in the list first; if that server isn’t reachable from the failover, the attempt fails, and it then tries the second server in the list, which it can use to send the email.
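The fallback order described above can be modelled in a few lines. This is a simplified Python model, not PRTG's actual implementation: `try_send` is a hypothetical stand-in for a real delivery attempt, and the relay host names are placeholders.

```python
from typing import Callable, List, Optional


def send_with_fallback(relays: List[str], try_send: Callable[[str], bool]) -> Optional[str]:
    """Try each SMTP relay in list order and return the first one that accepted the mail.

    try_send(relay) stands in for a real delivery attempt and returns True on success.
    Returns None if every relay failed.
    """
    for relay in relays:
        if try_send(relay):
            return relay
    return None


if __name__ == "__main__":
    # Both nodes share the same synchronized relay list.
    relays = ["smtp-site-a.example.local", "smtp-site-b.example.local"]
    # Which relay each PRTG node can actually reach (hypothetical network layout).
    reachable_from = {
        "master": {"smtp-site-a.example.local"},
        "failover": {"smtp-site-b.example.local"},
    }
    for node, reachable in reachable_from.items():
        used = send_with_fallback(relays, lambda relay: relay in reachable)
        print(node, "->", used)
    # master -> smtp-site-a.example.local
    # failover -> smtp-site-b.example.local
```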

So, we’re done with the preparation now, but before I start the cluster configuration, I’d like to take a quick look at the two PRTG installations.

The first server, which will be the master, has a local probe with some devices on it, and one remote probe.

This second server, which is going to be my failover server, only has a fresh install of PRTG on it, without any additional configuration.  Please don’t configure anything else on this failover server, because it will lose its entire configuration when it joins the cluster.  As soon as this server joins the cluster, the master server will push its configuration onto this server, which will overwrite whatever configuration *was* here.  You’ll see that this server has a local probe, because the local probe is created automatically as part of the install.  Don’t get too attached to this local probe, because it’s going to disappear too when we join the cluster.

To begin the cluster setup, start on the master node.  You need to use the PRTG Administration Tool, which you can find in the Windows Start menu, under All Programs/ PRTG Network Monitor/ PRTG Administration Tool.

Go to the Cluster tab, and click on “Create a PRTG Cluster”, then confirm that you want to proceed.

The cluster access key in the next field is important:  please copy it somewhere safe, because we’re going to need it again in a minute.  You can change the cluster access key if you’d like, but we recommend just using the one it gives you.

You get a message that PRTG is configured to run as the primary master node after restarting the services.  So we’ll click OK and wait for the server to restart.

The PRTG server will now restart…

Okay, the server is back up.  Before we continue with the setup, let’s take a look at what impact this first step has had on our server:

If we start with our device tree, we see that it has created a new probe, called the cluster probe.  So we now have three probes:  the local probe, the cluster probe and one remote probe.

We also have a new menu item, under Setup/ System Administration/ Cluster, where we can see a list of all the servers in the cluster.  So far, we only have this one server in the cluster, but we’ll come back to this page again after we add the second server to the cluster.

And, under Setup/ PRTG Status there’s a new menu item “Cluster Status”, which also shows the one server we have so far.

So, we have the first half of a cluster.  And now we can turn our attention to the second half.  So, I’m switching now to my failover server.

Here, on the failover server, we also need to open the Administration Tool, which is, again, under the Start menu, in PRTG Network Monitor/ PRTG Administration Tool.  And, we go to the Cluster tab again.

This time, on the failover, instead of selecting “Create a PRTG cluster”, we’ll choose “Join a PRTG cluster” and confirm that we want to proceed.

Here, we enter the IP address of the master server and the cluster access key that we saved from before.  

And now click Join to proceed.

You get a long message now, that this failover server is configured as part of the cluster, and that you need to activate this node in the Web interface of the master node.  We’ll see in a second what this bit about activation means.

The failover node is now going to restart, so we need to wait a second for it to come back up.

When I try to open the web interface again, I can’t connect anymore:  I get an error that this node has to be activated on the master node first.  So, let’s go back to the master node and activate it.

Back on the master node now, go back to the new menu item Setup/ System Administration/ Cluster.  We saw this page before, with only one server listed.  Now we see that the second server is in the list, but that it is marked as inactive.  To activate the second server, select Active and click Save Changes.

While we’re here, also take a quick look at the IP addresses.  Since my two demo servers are on the same subnet, they can contact each other easily.  However, if your two servers are in different networks, or especially if you have NAT between the servers, you will need to edit the IP addresses that are shown here.  You need to enter the IP address that each server must use to successfully contact the other one.  So, if you have NAT running between the two servers, enter the IP address that would actually work for one server to reach the other.

Now that we’ve activated the second server, we should have a functioning cluster.  It can take 5 to 10 minutes for the two servers to complete their configuration and synchronization, but we can look at what’s happening in the meantime.

Let’s take another look at the remote probe connections to the cluster, because there are a couple of places where you may need to adjust the settings.

First, every server that should accept remote probes needs to be configured to do so.  We did this together earlier in this video for our failover server.  However, if you forgot to permit remote probes earlier, you’ll notice it now, because you’ll see remote probes in the tree, but they show up as disconnected.  To allow a failover server to accept remote probes, you can also change this setting using the Administration Tool on the failover machine.

And, let’s also take a look at the individual remote probes, working on the master server.

When you create a cluster, the remote probes get a new configuration option in the settings tab.  So, let’s look at the remote probe settings.  If you scroll down to “Administrative Probe Settings”, you now see an option for “Cluster Connectivity”.  You can choose either that the probe sends data only to the primary master node, or that the probe sends data to all cluster nodes.

It’s important to take a look at this for *each* of your remote probes, because the default behaviour is different depending on when the remote probe was first created.  If the remote probe was created in an older software version which did not include support for remote probes in a cluster, then the default is to send only to the primary master node.  However, if the remote probe was created using a newer software version that *does* include support for remote probes in a cluster, then the default is to send to all of the cluster nodes.  So, please go through all of your remote probes to ensure they’re all configured the way you want.
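The default rule for the Cluster Connectivity setting, and the review it calls for, can be sketched as follows. This Python model works on a hand-written inventory of your probes, copied from each probe's Settings tab; the probe names and the `RemoteProbe` type are hypothetical, and nothing is read from PRTG.

```python
from dataclasses import dataclass
from typing import List

PRIMARY_ONLY = "primary master node only"
ALL_NODES = "all cluster nodes"


def default_cluster_connectivity(created_with_cluster_probe_support: bool) -> str:
    """Default Cluster Connectivity value, following the rule described in the video."""
    return ALL_NODES if created_with_cluster_probe_support else PRIMARY_ONLY


@dataclass
class RemoteProbe:
    name: str
    cluster_connectivity: str


def probes_needing_review(probes: List[RemoteProbe], wanted: str = ALL_NODES) -> List[str]:
    """Names of remote probes whose current setting differs from the value you want."""
    return [p.name for p in probes if p.cluster_connectivity != wanted]


if __name__ == "__main__":
    probes = [
        RemoteProbe("Branch Office (old)", default_cluster_connectivity(False)),
        RemoteProbe("Data Center (new)", default_cluster_connectivity(True)),
    ]
    print(probes_needing_review(probes))  # ['Branch Office (old)']
```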

So, we now have a functioning PRTG cluster, with one master and one failover.  I’d like to try one more thing with you:  I’d like to kill off the master, to see what happens. 

Before I kill it, let’s look at the status of the cluster (Setup/ PRTG Status/ Cluster Status).  We see that both servers are running, and that the master is currently the server TRAINING001.  I will now shut down TRAINING001 by stopping the Core Server service.

If I try to connect to the core server, it doesn’t respond.

So, let’s see what’s happened on our failover.

We see the same device tree as before, but now the cluster status is red.   If we look at the status page (Setup/ PRTG Status/ Cluster Status), we see that the machine 10.0.9.108 is now the master, that TRAINING001 is dead, and that the connection between the two is gone.

You will also see a yellow warning message when you start working on the failover machine: it tells you that the failover node is the current master.  And, an important piece of information:  you can perform configuration changes on the failover, but these changes will be overwritten when the primary master comes back online.  So, don’t make any permanent changes on this failover server while it is acting as the master server.  If you make changes on this failover server that you want to keep, then you will need to promote this server to be the primary master server.

There’s an important point you need to be aware of when the master is gone, and the failover server is active.  

During the time that the master server is unavailable, you will need to somehow direct your PRTG users to the failover server.  You can either give them a second URL to use, or you can use a DNS service that automatically redirects users to the second server when the master server is unavailable.  Please pay particular attention to this point if you’re using maps or API calls:  the URLs for maps and API calls use the DNS name or IP address of the master server.  If you need access to maps or the API during a failover scenario, you will need to ensure that these URLs get redirected to the failover server.
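If you go the "second URL" route rather than using DNS, any script or tool that talks to PRTG has to pick the server that's currently answering. Here's a minimal Python sketch of that client-side fallback. It treats any HTTP response as "server is up", only chooses a base URL, and doesn't call any specific PRTG API endpoint; the host names in the usage line are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional


def http_is_up(base_url: str, timeout: float = 5.0) -> bool:
    """True if the web server at base_url answers at all, even with an error status."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, just not with a 2xx/3xx status
    except (urllib.error.URLError, OSError):
        return False  # refused, timed out, name not resolved, TLS failure, ...


def first_available(base_urls: Iterable[str], is_up: Callable[[str], bool]) -> Optional[str]:
    """Try the base URLs in order (master first) and return the first one that is up."""
    for url in base_urls:
        if is_up(url):
            return url
    return None
```

For example: `first_available(["https://prtg-master.example.local", "https://prtg-failover.example.local"], http_is_up)`. Note that with a self-signed certificate, `urlopen` fails certificate verification and the node is reported as down, so that certificate would need to be trusted first.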

That brings me to the end of this video.  If you have any questions about clustering or about PRTG in general, please contact our support.
