
Tech dive: Causal event correlation in NNMi 9.10

Another (deep) tech dive into NNMi event configuration, following up on this post. NNM consultants, read on.

In my previous post I described a way to create an incident (and affect node status) whenever a certain condition is detected on the node, like an interface coming up. This was done by creating a custom poller and works fine, as long as you need to monitor only one status change on the managed node. But what happens if you need to detect two or more status changes on the same node occurring on different entities, like detecting the case when both the main and backup lines are up on a branch router?

In the NNMi world this is a “causal rule”. A causal rule is fulfilled whenever a certain number of “child incidents” are detected by NNMi. In our case, one child incident is “main line up” and the other “backup line up”. Whenever these incidents are detected for the same managed node, a causal correlation is fulfilled and a custom event is shown in the operator incident browser.

The procedure is (not so) simple. First, make sure you have created the custom pollers for the incidents you are interested in. Then, you need to create a causal correlation rule in NNMi:

  • Go to Configuration tab -> Custom correlation configuration. Click on the “Causal rules” tab on the right pane.
  • Create a new rule. Type a rule name and in the “Parent Incident” drop down entry select “New”. This will be the event that will be generated in NNMi when your causal correlation kicks in. Type a description and set its criticality.
  • Now we need to define the conditions that must occur to produce the event described above. Back in the “Causal rule” form, select as correlation nature “Root Cause” and in the “Common Child Incident Attribute” type:

 ${hostname}

This is the primary key for correlating the child incidents: if they occur on the same node, they are correlated.

  • In the “correlation window” box below, set a meaningful time window. NNMi will create the correlation if the required child incidents for the same node occur within this time window.
  • Now it’s time to define the child incident subrules that, when fulfilled, will generate our custom event. In the “Causal rule” form, create a new “child incident”. Set a proper name and in the “Child Incident” drop down, select the right incident. In the example from my previous post, this would be a “CustomPollWarning” or “CustomPollMinor” event. It is important to select this correctly so that it matches your custom poller type.
  • Make sure you check the “Use Child Incident’s Source Object for Parent” and “Use Child Incident’s Source Node for Parent” check boxes. Leave the “Optional Child Incident” unchecked.
  • If you have defined policies in your custom pollers, move to the pane on the right in the “Child Incident Filter” tab. Create a filter like this:

 ${valueOfCia(cia.custompoller.policy)} = PolicyName

This will match your policy set in your custom poller.

  • Repeat the same steps for the second child incident. The difference is that you have to leave the “Use Child Incident’s Source Object for Parent” and “Use Child Incident’s Source Node for Parent” check boxes unchecked. Again, leave the “Optional Child Incident” unchecked.
  • Save everything and close the forms.
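
The rule configured above can be sketched in plain Python to make the matching logic concrete. This is only an illustration under stated assumptions, not NNMi's actual implementation: the policy names MainLineUp and BackupLineUp, the tuple layout and the correlate function are hypothetical, and resetting a host's state after a match is a simplification.

```python
from collections import defaultdict


def correlate(incidents, required_policies, window_seconds):
    """Return (hostname, timestamp) pairs at which every required child
    incident was seen for the same host within the correlation window.

    incidents: iterable of (timestamp_seconds, hostname, policy_name) tuples
    (a hypothetical layout chosen for this sketch).
    """
    required = set(required_policies)
    last_seen = defaultdict(dict)  # hostname -> {policy: latest timestamp}
    parents = []
    for ts, host, policy in sorted(incidents):
        if policy not in required:
            continue  # plays the role of the child incident filter
        seen = last_seen[host]
        seen[policy] = ts
        # Keep only the child incidents that are still inside the window.
        recent = {p for p, t in seen.items() if ts - t <= window_seconds}
        if recent == required:
            parents.append((host, ts))  # the parent incident would be raised here
            seen.clear()
    return parents


# Hypothetical child incidents for two branch routers.
sample_incidents = [
    (0, "branch1", "MainLineUp"),
    (30, "branch1", "BackupLineUp"),
    (10, "branch2", "MainLineUp"),
    (500, "branch2", "BackupLineUp"),
]

print(correlate(sample_incidents, ["MainLineUp", "BackupLineUp"], 60))
```

With a 60 second window only branch1 produces a parent incident, because the two child incidents for branch2 arrive too far apart. Grouping by hostname mirrors the ${hostname} common child incident attribute.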

That’s it! If it doesn’t work the first time, experiment with the custom poller policies, correlation window and custom poller definitions. Drop me a line if you need help.

Use python to talk to NNMi

One more post for people who work with NNMi 9.x. NNMi, being a JBoss animal, has a pretty decent WS-I API to talk to the world. The interface is open and documented in the developer’s toolkit guide (available from the HP documentation site; you need an HP Passport account to access it).

Accessing the API is peanuts to any Java developer, but for the rest of us, it’s not straightforward. Well, with a little help from Python, everything is possible:

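#!/usr/bin/python
# Requires the suds SOAP client library.

from suds.client import Client
from suds.transport.http import HttpAuthenticated

# HTTP authentication credentials for the NNMi web services
t = HttpAuthenticated(username='system', password='NNMPASSWORD')

# WSDL of the NodeBean service; replace 'nnmi' with your NNMi server
url = 'http://nnmi/NodeBeanService/NodeBean?wsdl'

client = Client(url, transport=t)

# Retrieve full node list
allNodes = client.service.getNodes('*')

# %-formatting keeps the print calls valid on both Python 2 and Python 3
print("Nodes in topology: %d" % len(allNodes.item))

for node in allNodes.item:
    print("%s %s" % (node.name, node.deviceModel))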

This small script connects to the NNMi NodeBean web service, retrieves the full list of managed nodes into the ‘allNodes’ object and then prints each node’s hostname and device model as discovered by NNMi. What you need is the suds library (available here, or installable with a few clicks in the Ubuntu software center).
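
Once the node list is in hand, ordinary Python can post-process it. The sketch below counts nodes per device model. It assumes only that each item exposes the name and deviceModel attributes used in the script above; the count_by_model helper and the sample objects are hypothetical stand-ins so that the example runs without an NNMi server (it uses Python 3 for SimpleNamespace).

```python
from collections import Counter
from types import SimpleNamespace


def count_by_model(nodes):
    """Count nodes per deviceModel; a missing or empty model counts as 'unknown'."""
    return Counter(getattr(n, "deviceModel", None) or "unknown" for n in nodes)


# Stand-in objects with made-up values; with a live server you would
# pass allNodes.item from the script above instead.
sample_nodes = [
    SimpleNamespace(name="router1", deviceModel="modelA"),
    SimpleNamespace(name="router2", deviceModel="modelA"),
    SimpleNamespace(name="switch1", deviceModel="modelB"),
    SimpleNamespace(name="mystery", deviceModel=None),
]

for model, count in count_by_model(sample_nodes).most_common():
    print("%-10s %d" % (model, count))
```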

Tech dive: Custom incidents in HP NNMi 9.10

This post is a tech dive into HP NNMi 9.10, intended to illustrate a way to create custom incidents. People who make a living from tuning and playing with NNM should find it rather interesting; others are encouraged to seek amusement elsewhere…

NNMi 9.10 is substantial progress over NNM editions before 8.xx. It is an entirely new implementation, based on JBoss and designed as NNM should have been all along: multithreaded, multiuser, with a decent database and an open (web services) API. Being new, it does certain things in a different way, and one of them is generating custom incidents.

Recently I was asked to implement in NNMi 9.10 a way to create an event and change the status of a network node whenever a certain condition occurred. The nodes in question were branch office routers with ISDN interfaces as backup lines and the condition was the activation of the ISDN interface. The customer wanted their NOC to be alerted whenever a branch office router activated the ISDN interface because the primary line had gone down. The catch here is that a switch to the ISDN backup line is not regarded as an event from the NNMi perspective: whenever the router detects that the primary route has gone down, it sets it administratively down and brings up the ISDN interface, so there is no fault. The router is still polled normally by NNMi over the ISDN side, no incident is created and the node remains green.

In previous versions of NNM, it was possible to change the internal OvIfUp event of NNM so that it triggered an external action, a perl script that manually changed the node status and created an NNM event. With NNMi 9.10, this is no longer the case. So, what do we do now?

The first step is to go to the configuration menu -> Custom poller configuration and create a custom poller collection, as above. Enable the custom poller and create a new one, with the name “ISDN Poller”.

Above is the poller creation form. The important parts are the MIB expression and the MIB Filter Variable. What we want is to check the operational status of the ISDN interface: the poller should poll the interfaces of the router, pick out the ISDN interface via the “ifDescr” MIB variable and then check the operational status of that interface. If this is “Up”, the router should be set to the “Major” state. This is shown in the form above: the MIB Filter Variable is set to ifDescr and we select the ifOperStatus object in the “MIB Expression” field. We create a new expression, select “ifOperStatus” from the mgmt mib-2 interfaces tree and set the node status to “Major”.

The next step is to make this poller work for us, so we need to bind it to a policy, as shown below. Go to the “Policies” tab and create a new policy. Select the node group that you want the new custom poller to be applied to and type the MIB filter. The MIB filter is matched against the “MIB Filter Variable” of the previous form. In our case the filter is “Dialer1”, which is the name configured on the router for the ISDN interface.

That’s it. After this configuration, the custom poller will be activated according to the defined policy: it will run only for the node group (Collection) you have specified, will poll only the interfaces whose description is “Dialer1”, which in our case are ISDN interfaces, and whenever the Operational Status (ifOperStatus) of these interfaces is “1”, which is “Up”, a new incident will be created and the node will turn orange on the map (Major status). Straightforward? No. Does it work? Yes.
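
The decision the poller makes on each cycle can be sketched in a few lines of Python. This is a simulation over already collected values, not NNMi code and not a real SNMP query: the evaluate_poll function and the (ifDescr, ifOperStatus) tuple layout are hypothetical, while the value 1 for “up” and 2 for “down” comes from the standard interfaces MIB.

```python
IF_OPER_STATUS_UP = 1  # ifOperStatus: up(1), down(2)


def evaluate_poll(interfaces, mib_filter="Dialer1", status_on_match="Major"):
    """Return the node status the custom poller would set, or None.

    interfaces: iterable of (ifDescr, ifOperStatus) tuples for one node
    (a hypothetical layout chosen for this sketch).
    """
    for descr, oper_status in interfaces:
        if descr != mib_filter:
            continue  # the policy's MIB filter skips every other interface
        if oper_status == IF_OPER_STATUS_UP:
            return status_on_match  # backup line is active: raise the incident
    return None


# Primary line down, ISDN backup up: the node should go to Major.
print(evaluate_poll([("Serial0/0", 2), ("Dialer1", 1)]))

# Normal operation: primary up, ISDN idle, so nothing is raised.
print(evaluate_poll([("Serial0/0", 1), ("Dialer1", 2)]))
```

The first call prints Major and the second prints None, matching the behaviour described above: the status of the primary interface never enters the decision, only the filtered ISDN interface does.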