I recently had the opportunity to troubleshoot SCOM at a client. Here’s my experience with APM.
My client had two web servers, Web01 and Web02, neither joined to the domain. Both servers hosted an application that we wanted to instrument with APM. Web01 wasn’t working with APM but was working with other OpsMgr alerts and rules. Web02 was working for APM and for other alerts and rules.
- These servers are not domain joined (verified the runas account, cert thumbprint, communication port, etc…)
- The two servers are exact clones of one another (same serial number listed in SCOM; not ideal, but I was unsure whether this was causing the issue)
- The OpsMgr agent installs on both: Web02 gets the instrumentation, Web01 does not
- Both servers show the same discovery information from the IIS MP
Once I set up the .NET template, I went through these high-level troubleshooting steps:
- Created a custom group containing only web01 and scoped a new .NET monitor to the group
- Removed the group from the .NET template and flushed the health service
- Tried putting Web01 in maintenance mode for 10 min
- Verified discovered inventory on the health service
- Viewed the failed / applied rules & monitors (no failures)
- Set up a new .NET template from scratch
The entire time, I was closely watching the event logs on Web01 for any signs. It turns out there was a bug in 2012 SP1: if the discovered server name shows up in lowercase in some places and in uppercase in others throughout the OpsMgr Console, it may cause APM some heartburn. So I upgraded to 2012 SP1 UR4, which contains the fix (see the UR4 release notes).
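To spot the kind of case mismatch described above, a quick check over a list of server names can help. This is only a minimal sketch in Python: it assumes you have already copied or exported the names from the console views into a plain list, and both the function name `find_case_variants` and the sample data are hypothetical, not part of OpsMgr.

```python
from collections import defaultdict


def find_case_variants(names):
    """Return names that appear with more than one casing.

    Keys are the case-folded names; values are the distinct spellings seen.
    """
    groups = defaultdict(set)
    for name in names:
        # casefold() gives a case-insensitive key for comparison.
        groups[name.casefold()].add(name)
    # Keep only names that were seen with at least two distinct casings.
    return {key: sorted(variants)
            for key, variants in groups.items()
            if len(variants) > 1}


# Hypothetical sample: names as they might appear across console views.
discovered = ["WEB01", "web01", "WEB02", "WEB02"]
print(find_case_variants(discovered))
```

An empty result means every name was spelled consistently in the list you supplied; it says nothing about views you did not export.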
Unfortunately, the issue wasn’t resolved, so I went through the steps once again with no positive result. Next I enabled verbose tracing, reproduced the issue, stopped verbose tracing, and went through the logs. I then decided to reapply the UR4 server update, ran LODCTR /R on Web01, waited 5 minutes, and all was fixed.