This post has been republished via RSS; it originally appeared at: Microsoft MVP Award Program Blog.
Editor's note: The following post was written by now former Office Servers and Services MVP Hilton Giesenow as part of our Technical Tuesday series. Albert Duan of the MVP Award Blog Technical Committee served as the technical reviewer for this piece.
Given my complete lack of knowledge and understanding of physics, I ordinarily wouldn’t venture far into the realm of thermodynamics in an IT article. But considering that my 5-year-old recently assured me she’s certain her dad is much smarter than your average rocket scientist, I’m willing to go out on a limb.
Entropy - which is at the heart of one of the laws of thermodynamics - entails an ever-increasing decline into disorder. And as with the universe, so it is with our software systems. All systems will eventually exhibit failures if they’re left to themselves and given enough time. Our responsibility as administrators of these systems is to correct these failures, of course. But it’s also to prevent them from occurring altogether.
That’s where monitoring steps in.
In earlier times, it was easy enough to keep an eye on a single server or small group of servers manually. But these days even a reasonably small organization can easily have tens - if not hundreds or thousands - of servers that need to be monitored. This will likely be the case for some time to come, even in an IaaS cloud scenario.
This means that having a decent monitoring tool or group of tools is critical. However, there is a wide variety of such tools available, covering all manner of aspects of our environments - from servers to workstations, to various other devices and networking gear. Given this heterogeneous nature, it’s not easy to decide on the appropriate tool.
There are many good articles online that review, assess and compare these tools. However, each company and situation is slightly different. One way to deal with the challenge of choice is to identify the elements that are essential in your environment, and to match the available options against these.
The purpose of this article is to present and examine these important elements to assist you in choosing an appropriate tool. The elements that follow will hopefully help you to make an informed, educated decision - and possibly even surface some additional elements that your company had not yet considered.
You should consider the scope of what you need to monitor. Are you looking just at operating system and application monitoring, or do you also need to monitor the network and its health every step of the way? Are your colleagues mostly in a single location, on the same site as your servers, or is your company's workforce largely mobile, possibly utilizing mostly cloud services? Are they mostly desk-bound, but spread across multiple geographic and legislative regions? Understanding the breadth of monitoring required may help you choose a particular tool. But it may also make you lean towards multiple tools – some more broadly network focused and others more server, workstation, or application friendly.
Do you need a tool with smarts in a particular area, or just one with general capabilities? If you need particular depth - such as in Microsoft CRM or SharePoint - then ask yourself whether the tool offers sufficient insight into the health of the subcomponents (like the Search subcomponents or distributed cache cluster members in SharePoint) or if it only provides basic Windows event log, memory, disk and drive space monitoring. If your company has critical internal custom web applications, can the tool monitor end-to-end transactions and the health of individual subcomponents like web services and message queues?
The mix of platforms, devices, operating systems and vendors can be extremely diverse in certain organizations, whereas in others it remains locked down and uniform. Do your needs include both Unix and Windows servers and workstations? How about Apple desktops and devices? Are work or even personal phones allowed on the network at all? If so, some or all of these may need to be monitored in some way.
When selecting any tool, it’s important to consider the support offered by the vendor. This could be particularly relevant if you’re looking at an open source tool vs. a paid one based on the same engine, for instance. In addition, does the vendor offer on-site support? Do they offer this in all your company’s operating regions? Is there at least support in your time zone? What about your language? If you’re choosing a popular tool, this may not be as critical.
In many cases, monitoring multiple platforms requires a component - or ‘agent’ - to be installed directly onto each node. Agents often appear as Windows Services or similar, but they can also take platform-specific forms - such as stored procedures in SQL Server or WSP solutions in SharePoint. This can provide greater insight, as agents can have unfettered access to any aspect of the system in question and can monitor on a more extensive basis. For instance, a dedicated agent can assess CPU usage more consistently, and over a longer period, than a point-in-time scanning tool that investigates less frequently.
However, like any piece of software, these agents need to be installed and upgraded over time. This can mean a lot of work, particularly in a large environment. Furthermore, given the privileged access they require, the agents themselves could potentially prove to be a security attack vector. And not only do the agents require maintenance, troubleshooting and upgrades, but they can also hamper maintenance, patching and upgrades of the target platform itself due to potential conflicts and incompatibilities.
As an alternative, certain tools instead employ an “agentless” approach, whereby nothing needs to be installed onto the target system, and a remote execution approach is used instead. A tool that uses PowerShell Remoting would be an example of this in the Windows space, as would WMI-based tools. One added benefit in a PowerShell scenario is that the ‘Just Enough Administration’ (JEA) framework can be overlaid to reduce the attack surface.
A monitoring tool needs to be able to report on overall health, urgent warnings and errors, utilization and capacity trends. However, the monitoring tool may send too many unnecessary notifications, particularly when trouble strikes in the monitored platform. This could be particularly troublesome if these notifications incur cost, like SMS transmissions. As such, it is worth exploring and comparing the tools under investigation in terms of their ability to tune the volume of communication.
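To make the idea of tuning notification volume a little more concrete, here is a minimal Python sketch of one common approach: suppressing repeats of the same alert within a cooldown window. The class name, the alert keys and the 300-second window are all assumptions made up for illustration; real tools expose this kind of setting in their own ways.

```python
from dataclasses import dataclass, field


@dataclass
class AlertThrottle:
    """Suppress repeats of the same alert inside a cooldown window."""

    cooldown_seconds: float = 300.0
    # Maps an alert key (e.g. "web01:disk") to the time it was last sent.
    _last_sent: dict = field(default_factory=dict)

    def should_send(self, key: str, now: float) -> bool:
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # still inside the cooldown window: suppress
        self._last_sent[key] = now
        return True


throttle = AlertThrottle(cooldown_seconds=300)

# Hypothetical event stream: (seconds since start, alert key).
events = [(0, "web01:disk"), (60, "web01:disk"), (90, "db01:cpu"), (400, "web01:disk")]

sent = [(t, key) for t, key in events if throttle.should_send(key, t)]
print(sent)  # [(0, 'web01:disk'), (90, 'db01:cpu'), (400, 'web01:disk')]
```

The repeat at 60 seconds is dropped, while the unrelated alert and the later recurrence still get through. When comparing tools, it is worth checking whether this kind of window can be set per alert type or per channel.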
Most monitoring tools can send alerts via email, popups, SMS, pagers and so on. When choosing a tool, ensure that it can use your required channels - especially as useful new technologies and methodologies like ChatOps arise. You might want a tool that can send push notifications or post to a Slack or Office 365 Teams channel. If so, include these criteria in your evaluation. Also examine whether there’s extra cost for these, or whether they require 3rd party services, as SMS communication often does.
As many - perhaps even most - organizations explore where the cloud fits into their IT agenda, it is worth examining what prospective tools can offer your organization. Do you want to use the cloud just for email, or for other critical and real-time services and repositories?
It’s here that internet access monitoring becomes more important, as do tools that can scan and assess the availability of these cloud offerings themselves. The vendor may provide a status dashboard and make the service highly available, but this won’t necessarily tell you that a particular region - or a subcomponent your company relies upon - is experiencing problems. So a 3rd party monitoring tool can help identify and manage issues.
Installation and Maintenance Complexity
Each tool has its own installation and operational complexities and requires expertise. However, some are more complex than others. It is worthwhile to compare a tool’s capabilities with the skill level and experience your team possesses to handle those capabilities. The size and complexity of your environment may also dictate and justify whether administration of the monitoring tool or tools is a part-time or full-time job. Some tools - while extremely powerful - are far from simple to configure and maintain. Monitoring tools that are cloud based can alleviate some of this, but might raise other security and connectivity concerns.
Certain tools may function perfectly well over a LAN, but poorly over a WAN or with occasionally connected devices. It is worth comparing your organization’s regional and geographical footprint against what the tool can offer, what it requires, and how it functions. This in turn ties into the two previous points - a cloud-based monitoring tool could be an important part of the overall monitoring story, requiring less work to install and maintain while also offering better support for mobile devices and cloud services.
Ticketing System Integration
Just like with notifications, it is worthwhile to explore whether the monitoring tool integrates with your IT organization’s helpdesk or ticketing system. If so, incidents can be automatically tracked, appropriately followed up, discussed and even assessed for frequency of occurrence. This can help reveal faulty hardware, software and devices that should be replaced to improve operational efficiency.
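As a rough illustration of assessing frequency of occurrence, the Python sketch below counts how often the same device and issue appear in a ticket export and flags the repeat offenders. The device names, issue labels and threshold are hypothetical, and a real ticketing system would provide this data through its own export or API.

```python
from collections import Counter

# Hypothetical ticket export: one (device, issue) pair per incident.
tickets = [
    ("srv-file01", "disk failure"),
    ("srv-web02", "service crash"),
    ("srv-file01", "disk failure"),
    ("sw-core01", "port flapping"),
    ("srv-file01", "disk failure"),
    ("srv-web02", "service crash"),
]


def repeat_offenders(tickets, threshold):
    """Return (device, issue) pairs seen at least `threshold` times, most frequent first."""
    counts = Counter(tickets)
    return [(item, n) for item, n in counts.most_common() if n >= threshold]


print(repeat_offenders(tickets, threshold=3))
# [(('srv-file01', 'disk failure'), 3)]
```

A device that keeps raising the same incident is a reasonable candidate for replacement rather than yet another repair.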
Some tools can report on things in real-time. While this is useful, other tools also keep a record of certain key metrics over time. This lets the tool provide visibility into trends and even help ensure, with sufficient warning, that resources like drive space, CPU and memory have enough capacity.
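To show why recorded history matters, here is a small Python sketch that fits a straight line to daily drive-usage samples and estimates how many days remain before the volume fills up. It assumes one sample per day and roughly linear growth, and the figures are invented for illustration; real tools may use more sophisticated forecasting.

```python
def days_until_full(used_gb, capacity_gb):
    """Estimate days until capacity is reached from daily usage samples.

    Fits a least-squares straight line to the samples (one per day) and
    extrapolates. Returns None when usage is flat or shrinking.
    """
    n = len(used_gb)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    days = range(n)
    mean_x = sum(days) / n
    mean_y = sum(used_gb) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, used_gb)) / sum(
        (x - mean_x) ** 2 for x in days
    )
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_full = (capacity_gb - intercept) / slope
    return max(0.0, day_full - (n - 1))  # days remaining after the latest sample


# Hypothetical samples: a 500 GB volume growing by 5 GB per day.
samples = [400, 405, 410, 415, 420]
print(days_until_full(samples, capacity_gb=500))  # 16.0
```

A real-time-only tool would simply report 420 GB used today. It is the history that turns that number into a warning of roughly two weeks.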
Making the choice
This article has intentionally avoided promoting any particular tool or vendor. Rather, it has sought to mention some important points to consider in your environment, keeping in mind both current and future states. But ultimately, your organization will need to make a tool selection. It might turn out that no single tool quite fits the bill or matches every need. In this case, you may well end up with a few tools at play. Some might be on premises and some even in the cloud, some paid and perhaps even some free or open source. That’s certainly feasible, provided you have the capacity and ability to manage each tool appropriately.
That said, the selection process itself should ideally be quantitative rather than qualitative. To this end, it will be worthwhile to compile a list of your key requirements versus “nice to haves” and to weigh the relative importance of each. The next step is to score and rank each tool in question against the model. That way you will have a clearer ability to choose, to explain your choice, and to re-assess your decision over time as your environment changes and the tools themselves develop.
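One possible way to build such a model is a simple weighted scoring matrix, sketched below in Python. Everything in it is an assumption for illustration: the criteria, the weights, the 0-5 scoring scale, the minimum score for key requirements and the placeholder tool names. Your own list and weightings will differ.

```python
# Hypothetical weights: relative importance of each criterion (higher = more important).
WEIGHTS = {"scope": 5, "depth": 3, "agentless": 4, "notifications": 3, "ticketing": 2}

# Hypothetical key requirements: a tool must score at least MIN_SCORE on each of these.
MUST_HAVES = {"scope", "notifications"}
MIN_SCORE = 3

# Hypothetical evaluation scores on a 0-5 scale; the tool names are placeholders.
SCORES = {
    "Tool A": {"scope": 4, "depth": 3, "agentless": 5, "notifications": 4, "ticketing": 2},
    "Tool B": {"scope": 5, "depth": 4, "agentless": 1, "notifications": 3, "ticketing": 5},
    "Tool C": {"scope": 3, "depth": 5, "agentless": 3, "notifications": 2, "ticketing": 3},
}


def weighted_score(tool_scores):
    """Sum of score x weight across every criterion in WEIGHTS."""
    return sum(WEIGHTS[c] * tool_scores[c] for c in WEIGHTS)


def rank_tools(scores):
    """Drop tools that miss a key requirement, then rank the rest by weighted score."""
    qualified = {
        name: weighted_score(s)
        for name, s in scores.items()
        if all(s[c] >= MIN_SCORE for c in MUST_HAVES)
    }
    return sorted(qualified.items(), key=lambda item: item[1], reverse=True)


print(rank_tools(SCORES))  # [('Tool A', 65), ('Tool B', 60)]
```

Here Tool C is excluded because it falls short on a key requirement (notifications), regardless of its strong depth score. Keeping the weights and scores in one place also makes it easy to re-run the ranking when your environment or the tools change.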
Hopefully this article has provided some useful topics and considerations to help you choose an effective and appropriate monitoring tool that will in turn help you to provide a more stable and better managed environment to your users. Thankfully, if all else fails, at least there’s PoShMon, a fully functional PowerShell agentless monitoring tool available on GitHub.
*Note: I’d like to give a big thank you to the colleagues who have provided some important additional perspectives and insights for this article, in particular Lourens Pienaar and Mark Prevost.
Hilton Giesenow is an industry veteran of almost 20 years and a former 12-time Microsoft MVP with experience across varied IT, development and consulting roles, geographies and client industries and types. These days Hilton can mostly be found helping customers understand and craft strategies to successfully plan for, implement and adopt Office 365, SharePoint and Azure. You can find his SharePoint podcast at http://www.TheMossShow.com/ (now retired) and company details at http://www.expertsinside.com/.