Category Archives: networking

Tech dive: Causal event correlation in NNMi 9.10

Another (deep) tech dive into NNMi event configuration, following up on this post. NNM consultants, read on.

In my previous post I described a way to create an incident (and affect node status) whenever a certain condition is detected on a node, like an interface coming up. That was done with a custom poller and works fine as long as you need to monitor only one status change on the managed node. But what happens if you need to detect two or more status changes occurring on different entities of the same node, like the case where both the main and the backup line of a branch router are up?

In the NNMi world this is a “causal rule”. A causal rule is fulfilled whenever a certain number of “child incidents” are detected by NNMi. In our case, one child incident is “main line up” and the other “backup line up”. Whenever these incidents are detected for the same managed node, a causal correlation is fulfilled and a custom event is shown in the operator incident browser.
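To make the mechanics concrete, here is a toy sketch of the idea in Python. This is an illustration only, not NNMi code; the incident names and the five-minute window are made up for the example:

#!/usr/bin/python
# Toy illustration of a causal rule -- not NNMi internals.
# Child incidents are keyed by hostname; when all required child
# types are seen for the same host within the correlation window,
# a parent incident is emitted.

import time

class CausalRule(object):
  def __init__(self, required_children, window_seconds):
    self.required = set(required_children)
    self.window = window_seconds
    self.seen = {}   # hostname -> {child incident type: timestamp}

  def child_incident(self, hostname, child_type):
    now = time.time()
    children = self.seen.setdefault(hostname, {})
    children[child_type] = now
    # Forget children that fell out of the correlation window
    for ctype, ts in list(children.items()):
      if now - ts > self.window:
        del children[ctype]
    if self.required.issubset(children):
      print("Parent incident: correlated children on %s" % hostname)
      children.clear()

rule = CausalRule(["MainLineUp", "BackupLineUp"], window_seconds=300)
rule.child_incident("branch-router-1", "MainLineUp")
rule.child_incident("branch-router-1", "BackupLineUp")   # fulfils the rule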

The procedure is (not so) simple. First, make sure you have created the custom pollers for the incidents you are interested in. Then, you need to create a causal correlation rule in NNMi:

  • Go to the Configuration tab -> Custom correlation configuration. Click on the “Causal rules” tab in the right pane.
  • Create a new rule. Type a rule name and in the “Parent Incident” drop-down select “New”. This will be the event generated in NNMi when your causal correlation kicks in. Type a description and set its criticality.
  • Now we need to define the conditions that must occur to produce the event described above. Back in the “Causal rule” form, select “Root Cause” as the correlation nature and in the “Common Child Incident Attribute” field type:

 ${hostname}

This is the primary key for correlating the child incidents: if they occur on the same node, they are correlated.

  • In the “correlation window” box below, set the threshold time to a meaningful time window. NNMi will create the correlation if the desired child incidents for the same node occur within this time window.
  • Now it’s time to define the child incident subrules that, when fulfilled, will generate our custom event. In the “Causal rule” form, create a new “child incident”. Set a proper name and in the “Child Incident” drop-down, select the right incident. In our example from my previous post, this would be a “CustomPollWarning” or “CustomPollMinor” event. It is important to designate this correctly and match your custom poller type.
  • Make sure you check the “Use Child Incident’s Source Object for Parent” and “Use Child Incident’s Source Node for Parent” check boxes. Leave the “Optional Child Incident” unchecked.
  • If you have defined policies in your custom pollers, move to the pane on the right in the “Child Incident Filter” tab. Create a filter like this:
${valueOfCia(cia.custompoller.policy)} = PolicyName

This will match your policy set in your custom poller.

  • Repeat the same steps for the second child incident. The difference is that you have to leave the “Use Child Incident’s Source Object for Parent” and “Use Child Incident’s Source Node for Parent” check boxes unchecked. Again, leave the “Optional Child Incident” unchecked.
  • Save everything and close the forms.
That’s it! If it doesn’t work the first time, experiment with the custom poller policies, the correlation window and the custom poller definitions. Drop me a line if you need help.

Use python to talk to NNMi

One more post for people who work with NNMi 9.x. NNMi, being a JBoss animal, has a pretty decent WS-I API to talk to the world. The interface is open and documented in the developer’s toolkit guide (available from the HP documentation site; an HP Passport account is needed for access).

Accessing the API is peanuts to any Java developer, but for the rest of us, it’s not straightforward. Well, with a little help from Python, everything is possible:

#!/usr/bin/python

from suds.client import Client
from suds.transport.http import HttpAuthenticated

# HTTP basic authentication with the NNMi web service credentials
t = HttpAuthenticated(username='system', password='NNMPASSWORD')

# WSDL of the NodeBean web service on the NNMi server
url = 'http://nnmi/NodeBeanService/NodeBean?wsdl'

client = Client(url, transport=t)

# Retrieve the full node list ('*' matches every node)
allNodes = client.service.getNodes('*')

print "Nodes in topology:", len(allNodes.item)

# Print the hostname and device type of every discovered node
for i in allNodes.item:
  print i.name, i.deviceModel

This small script connects to the NNMi NodeBean web service, retrieves the full list of managed nodes into the ‘allNodes’ object and from there prints out the hostname and device type of each node as discovered by NNMi. All you need is the suds library (available here, or installable with a few clicks in the Ubuntu Software Center).
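As a small follow-on, and assuming the same allNodes object from the script above (the output file name is just an example), the inventory can be dumped to a CSV file:

import csv

# Write the node inventory to a CSV file
with open('nnmi_nodes.csv', 'wb') as f:
  writer = csv.writer(f)
  writer.writerow(['hostname', 'deviceModel'])
  for i in allNodes.item:
    writer.writerow([i.name, i.deviceModel])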

Tech dive: Custom incidents in HP NNMi 9.10

This post is a tech dive into HP NNMi 9.10, intended to illustrate a way to create custom incidents. People who make a living tuning and playing with NNM should find it rather interesting; others are encouraged to seek amusement elsewhere…

NNMi 9.10 is a substantial step forward from the NNM editions before 8.xx. It is an entirely new implementation, based on JBoss and designed the way NNM should have been all along: multithreaded, multiuser, with a decent database and an open (web services) API. Being new, certain things are done differently, and one of them is generating custom incidents.

Recently I was asked to implement in NNMi 9.10 a way to create an event and change the status of a network node whenever a certain condition occurred. The nodes in question were branch office routers with ISDN interfaces as backup lines, and the condition was the activation of the ISDN interface. The customer wanted their NOC to be alerted whenever a branch office router activates its ISDN interface because the primary line has gone down. The catch is that a switch to the ISDN backup line is not regarded as an event from the NNMi perspective: when the router detects that the primary route is down, it turns the primary interface administratively down and brings up the ISDN interface, so there is no fault. The router is polled normally by NNMi over the ISDN side, no incident is created and the node remains green.

In previous versions of NNM, it was possible to change the internal OvIfUp event so that it triggered an external action: a perl script that manually changed the node status and created an NNM event. With NNMi 9.10, this is no longer the case. So, what do we do now?

The first step is to go to the Configuration menu -> Custom Poller Configuration and create a custom poller collection, as shown above. Enable the custom poller and create a new one, named “ISDN Poller”.

Above is the poller creation form. The important stuff is the MIB expression and the MIB filter variable. What we want is to check the operational status of the ISDN interface. The algorithm the poller should follow is: poll the interfaces of the router, single out the ISDN interface via the “ifDescr” MIB variable, then check the value of the operational status of that interface; if it is “Up”, the router should be set to the “Major” state. This is what the form above shows: the MIB filter is set to ifDescr, and for the “MIB Expression” field we create a new expression, select “ifOperStatus” from the mgmt-2 interfaces MIB tree and set the node status to “Major”.

The next step is to make this poller work for us, so we need to bind it to a policy, as shown below. Go to the “Policies” tab and create a new policy. Select the node group that you want the new custom poller applied to and type the MIB filter value. The MIB filter is matched against the “MIB Filter Variable” of the previous form; in our case this is “Dialer1”, which is the name configured on the router for the ISDN interface.

That’s it. After this configuration, the custom poller will be activated according to the defined policy: it will run only for the node group (collection) you have specified, poll only the interfaces whose description is “Dialer1” (in our case, the ISDN interfaces), and whenever the operational status (ifOperStatus) of such an interface is “1”, which is “Up”, a new incident will be created and the node will turn orange on the map (Major status). Straightforward? No. Does it work? Yes.
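For reference, the check that the custom poller performs can be reproduced outside NNMi with a few lines of Python. This is only an illustrative sketch using the pysnmp library; the router hostname and community string are placeholders:

#!/usr/bin/python
# Illustrative sketch: poll ifDescr/ifOperStatus and flag the ISDN
# interface when it is operationally up. Not part of NNMi.

from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity, nextCmd)

ROUTER = 'branch-router-1'   # placeholder hostname
COMMUNITY = 'public'         # placeholder community string

# Walk ifDescr and ifOperStatus (IF-MIB) in lockstep
for errInd, errStat, errIdx, varBinds in nextCmd(
    SnmpEngine(),
    CommunityData(COMMUNITY),
    UdpTransportTarget((ROUTER, 161)),
    ContextData(),
    ObjectType(ObjectIdentity('IF-MIB', 'ifDescr')),
    ObjectType(ObjectIdentity('IF-MIB', 'ifOperStatus')),
    lexicographicMode=False):
  if errInd or errStat:
    break
  descr, oper = varBinds
  # ifOperStatus value 1 means "up"
  if str(descr[1]) == 'Dialer1' and int(oper[1]) == 1:
    print("Backup interface %s is up on %s" % (descr[1], ROUTER))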

Your Virtual Cisco IOS

Want to play with IOS but don’t have a Catalyst around? Try this. GNS3 is a marvelous and clever frontend to dynamips, dynagen and qemu, which together allow emulation/execution of IOS and JunOS code under a third operating system. That is, Cisco and Juniper virtualized on your desktop.

GNS3 topology of our virtual lab

What you will need is a decent PC with lots of memory (4GB to start with, plus a fast CPU) and IOS/JunOS software images. The first is easy to come by; for the second you will need access to licensed software or the actual hardware itself. My recommendation for the OS is Ubuntu, with readily downloadable and installable packages of all the bits and pieces (# apt-get install gns3); however, Windows works just fine, albeit with the 4GB constraint of a 32-bit OS. Cool screenshots here.

How it works in a few words: MIPS- and PPC-based hardware (Cisco 26xx, 36xx, 37xx, 72xx) is emulated via dynamips, running the IOS image unchanged. JunOS, on the other hand, is emulated with qemu using Olive, a stripped-down version of JunOS, sort of an SDK. You design the topology via a snappy GUI (that is, GNS3), configure your virtual gear and then GNS3 fires up the emulators underneath. CPU and memory usage go sky-high, but then, you have your own virtual private lab. Communication with the real world (the wire) is done via tap and bridge interfaces. Using a sniffer you can actually see real packets (with Cisco MAC prefixes and all) from your virtual devices swimming in your LAN.

What will work: All popular Cisco IOS devices with most linecards, JunOS Olive.

What will not work: Virtualizing dynamips itself is tricky. The emulator engine will run in a virtual host, savagely consuming virtual CPU and memory resources, yet the forged MAC addresses may not exit your hypervisor’s virtual switch. In vSphere, *sometimes* dynamips could emit packets only to other virtual machines running on the same ESX host, but this was not always the case… Also, note that performance is sluggish, so use GNS3 only as a demonstration and lab tool.