With the three pillars of observability considered more or less canon for IT monitoring philosophy these days, and logging being one of those pillars, it’s important to understand the tools, systems, and solutions that help IT practitioners do a good job of aggregating, normalizing, and leveraging logs in all their forms to provide improved insight into application and system health.
The Difference Between On-premises and Cloud-based Log Management
Before diving into the tools, it’s important to clarify what’s meant by “log monitoring” for two reasons. First, logs are present in several different forms on a variety of systems around the enterprise. Second, those logs can be a rich source of insight for everything from security events through application health and up to customer experience. More important still is the fundamental difference in how and why on-premises logging is performed (and thus how the tools that collect and display that data work) compared with cloud-based logging.
The differences between on-prem and cloud-based log management tools were described particularly well by Charity Majors in a recent blog post (https://charity.wtf/2019/02/05/logs-vs-structured-events/) but can be summarized as follows.
On-premises systems work on the presumption that:
- We clearly understand the systems involved in our application infrastructure.
- We clearly understand the substance and nature of the log messages being sent.
- Logging is “expensive” both in terms of hardware (disk storage) and performance (every log message takes processing time away from application processing).
- Logs are an information source of last (or nearly last) resort. They tell us forensically (after the crash) what happened.
- The systems that logging informs us about are fundamentally permanent. They don’t disappear often (or ever).
However, cloud-based log systems work on an entirely different set of presumptions:
- We don’t always (and sometimes cannot) know all the services and systems involved in our application infrastructure.
- In fact, we may not have access to all the systems or sub-systems. Therefore, the only window into the systems will be via the performance of the code.
- Disk, processing, and log performance are cheap to the point of having effectively zero cost. The processing impact of logging is completely decoupled from application processing.
- The systems in the cloud can be (and usually are) extremely ephemeral. By the time a human looks at a log, that system is likely gone.
The entire point of cloud-based logging is to be ever-present, infinitely scalable (both in terms of data storage and data flow), and have sophisticated analytics capabilities baked into the log management tool so that the messages, telemetry, and insight can be surfaced quickly and reliably with minimal human intervention.
Whew! That’s a lot of log monitoring philosophy right there… but it’s important to know because that’s what fed into this list.
WARNING: “Unlimited” Licensing Still Doesn’t Break the Laws of Physics
One of the biggest mistakes I see in log monitoring implementations is the belief that a single system can scale infinitely as long as the license says it’s “unlimited.” Let’s be clear: your license level does not mean you can stuff a five-pound sack with 10 pounds of cra…ckers. So, something I advise everyone to consider is a “log file filtration layer”—especially important when it comes to trap and syslog.
Without going too deep into it, you want to make sure that whatever solution you choose will let you put multiple processing servers behind a single IP address, and balance the incoming load. Just acquired a new company and your logging doubled? No problem! Throw a few more servers behind that load balancing solution and you’re off to the races. Without some means of doing this, you are going to end up maxing out any system.
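To make the idea concrete, here is a minimal sketch of how a balancing layer might spread senders across a pool of collectors. It is not taken from any particular product: the `pick_worker` and `distribute` functions and the `collector-N` names are hypothetical, and I am assuming each sender should always land on the same collector so its messages stay in order.

```python
import zlib
from collections import Counter
from typing import Iterable, Sequence


def pick_worker(source_ip: str, workers: Sequence[str]) -> str:
    """Map a sender to one collector so its messages stay together."""
    # crc32 is stable across runs, unlike the built-in hash() for strings.
    index = zlib.crc32(source_ip.encode("utf-8")) % len(workers)
    return workers[index]


def distribute(sources: Iterable[str], workers: Sequence[str]) -> Counter:
    """Count how many senders each collector would receive."""
    return Counter(pick_worker(s, workers) for s in sources)


if __name__ == "__main__":
    # Hypothetical sender addresses, purely for illustration.
    senders = [f"10.0.{i // 250}.{i % 250 + 1}" for i in range(1000)]
    print(distribute(senders, ["collector-1", "collector-2"]))
    # Logging doubled? Add collectors and the same senders spread out further.
    print(distribute(senders, ["collector-1", "collector-2",
                               "collector-3", "collector-4"]))
```

Plain modulo hashing reshuffles most senders whenever the pool size changes. That is usually acceptable for a stateless receiver, but a real load balancer would typically handle that remapping more gracefully.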
SolarWinds Security Event Manager (updated 5/24/19: formerly known as Log & Event Manager, or LEM, the tool was recently rebranded after undergoing a significant user interface update and a number of other improvements) straddles the line between a “simple” log file aggregation and management platform and a full Security Information and Event Management (SIEM) solution. It combines the ability to receive, normalize, and aggregate messages from a variety of sources with a powerful analytics engine that helps identify potentially system-impacting events. In addition, you can use Security Event Manager to validate compliance, thanks to reporting purpose-built for HIPAA, PCI DSS, SOX, DISA STIG, and more.
Kiwi Syslog Server acts as a syslog and trap receiver, using rules to filter those messages based on source, keywords, and other patterns, then processing them in a variety of ways. You can receive messages from an unlimited number of sources and have a dozen processing options at your disposal, including transparent forwarding, storing in a database, running an external program or API, and more.
Remember the “filtration layer” I mentioned earlier, the one that can scale out to handle a greater number of messages? Kiwi Syslog Server is my top choice in log management tools for that job. With the ability to handle up to 2 million messages per hour, one installation will be more than enough for many environments. But if not, you can always add another.
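For readers who have not worked with a rule-based receiver, the filter-then-process idea can be sketched in a few lines. This is not Kiwi Syslog Server's actual rule format or API. The `Message`, `Rule`, and `route` names, along with the "drop", "forward", and "store" actions, are assumptions made purely for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    source: str  # sender address
    text: str    # raw message body


@dataclass
class Rule:
    name: str
    action: str                          # hypothetical: "drop", "forward", "store"
    source_prefix: Optional[str] = None  # match on where the message came from
    pattern: Optional[str] = None        # case-insensitive regular expression

    def matches(self, msg: Message) -> bool:
        if self.source_prefix and not msg.source.startswith(self.source_prefix):
            return False
        if self.pattern and not re.search(self.pattern, msg.text, re.IGNORECASE):
            return False
        return True


def route(msg: Message, rules: List[Rule], default: str = "store") -> str:
    """Return the action of the first matching rule, or the default."""
    for rule in rules:
        if rule.matches(msg):
            return rule.action
    return default


RULES = [
    Rule("port-flap-noise", "drop", pattern=r"link (up|down) on port \d+"),
    Rule("core-switches", "forward", source_prefix="10.0.0."),
]

if __name__ == "__main__":
    print(route(Message("10.0.0.5", "Link down on port 12"), RULES))
    print(route(Message("10.0.0.5", "Fan failure in slot 2"), RULES))
    print(route(Message("192.168.1.9", "User login succeeded"), RULES))
```

Rule order matters in this sketch: the noise filter sits first so chatty messages are discarded before anything gets forwarded downstream.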
A relative newcomer on the scene, Log Analyzer was released mid-2018 and acts either as an add-on to the existing suite of SolarWinds monitoring tools or as a standalone logging solution.
Focusing on syslog and trap, the strength of this tool lies in its visualization capabilities and powerful searching engine, which can filter and search past events or perform those same actions on the logs as they are received in real-time.
ManageEngine is another trusted name for monitoring professionals. It can collect, manage, analyze, correlate, and search through 700 sources of log data and handle up to 25,000 messages per second, so it’s worth a look. Because it supports forensic analysis of past events as well as real-time pattern matching, it has the potential to minimize security breaches. It comes preconfigured with over 30 rules to identify brute force attacks, account lockouts, data theft, web server attacks, and more. Finally, the log parser is highly customizable.
For many IT practitioners, Ipswitch’s WhatsUp Gold is their first experience with a log monitoring tool. WhatsUp Log Management Suite is an automated tool that collects, stores, and archives system logs, Windows events, and W3C/IIS logs. On top of that, it performs ongoing pattern analysis, so it can trigger alerts based on abnormal activity. The types of events tracked include access rights and file, folder, and object privileges. It can also use collected data for compliance reports for HIPAA, SOX, FISMA, PCI, MiFID, or Basel II. WhatsUp Log Management Suite is really a set of four integrated applications:
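As a rough illustration of what a brute-force rule does under the hood, here is a sliding-window sketch. It is not ManageEngine's implementation. The `BruteForceDetector` class, its threshold of failures, and its window length are all hypothetical values chosen for the example.

```python
from collections import defaultdict, deque


class BruteForceDetector:
    """Flag a source once it piles up too many failed logins in a short window."""

    def __init__(self, threshold: int = 5, window_seconds: float = 60.0):
        self.threshold = threshold
        self.window = window_seconds
        self._failures = defaultdict(deque)  # source -> timestamps of failures

    def observe(self, timestamp: float, source: str, success: bool) -> bool:
        """Record one login attempt; return True if the source looks suspicious."""
        if success:
            # A successful login clears the failure history for that source.
            self._failures.pop(source, None)
            return False
        recent = self._failures[source]
        recent.append(timestamp)
        # Discard failures that have aged out of the window.
        while recent and timestamp - recent[0] > self.window:
            recent.popleft()
        return len(recent) >= self.threshold


if __name__ == "__main__":
    detector = BruteForceDetector(threshold=3, window_seconds=10)
    for t in (0, 1, 2):
        print(t, detector.observe(t, "203.0.113.7", success=False))
```

Real products correlate far more than a single counter, but the windowed count is the core of this kind of real-time pattern matching.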
- Event Archiver
- Event Alarm
- Event Analyst
- Event Rover
LogDNA is available in either a cloud-based or a self-hosted version, depending on your preference. It scales to “hundreds of thousands of logs per second,” generating terabytes of data per day, all the while offering complete security of that data as well as real-time log analysis. Both the company and the LogDNA product itself are SOC2, PCI, and HIPAA compliant as well as Privacy Shield certified.
Best for Real-time Analysis: Papertrail
SolarWinds® Papertrail™ cloud-hosted log management can typically be installed in under a minute and connected to the servers you want to monitor even faster than that. The most compelling aspect of this tool is the ability to do lightning-fast searches of log events in real time, as well as the live tail feature.
But the clean interface doesn’t hurt either. Papertrail lets you interact with your data via the browser, command line, or an API. That said, Papertrail is for the IT pro who’s not interested in flashy extras and wants a straightforward log file analyzer and aggregator.
Best for Long-term Trending and Analysis: Loggly
SolarWinds Loggly® log management and analysis tool was created by, and for, DevOps practitioners. It’s got robust analytic capabilities, but the UI was built to be as easy to navigate and use as possible. It was also built to efficiently cull through enormous volumes of log data and surface trends over time that other tools might overlook.
These features make it a perfect tool for the kinds of troubleshooting performed as part of customer support, rather than pure development with mocked-up data. As you would expect from such a tool, there’s a full RESTful API that comes along with it, and therefore it can be integrated into other tools in a multitude of ways.
Best for Do-it-yourselfers: ELK Stack
“ELK” is the combination of three open-source tools: Elasticsearch, Logstash, and Kibana. Logstash processes the data on the server side and then funnels it to Elasticsearch, which provides the search and analytics capabilities. Finally, Kibana allows users to visualize that data in charts and graphs. If it sounds like the IT version of building a prefab furniture piece, you wouldn’t be far off. That said, many people enjoy the feeling of total control and ownership. And the cost (free, if you don’t count the actual cloud hosting charges) doesn’t hurt.
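If the division of labor is hard to picture, the toy pipeline below mimics the three roles in plain Python. It does not use the real Logstash, Elasticsearch, or Kibana APIs. The log line format, the `parse` and `count_by` functions, and the `TinyIndex` class are all stand-ins invented for this sketch.

```python
import re
from collections import Counter, defaultdict

# Assumed line format: "<timestamp> <LEVEL> <service>: <message>"
LINE = re.compile(r"(?P<ts>\S+) (?P<level>[A-Z]+) (?P<service>\S+): (?P<msg>.*)")


def parse(line):
    """Logstash's role: turn a raw line into a structured event (or None)."""
    match = LINE.match(line)
    return match.groupdict() if match else None


class TinyIndex:
    """Elasticsearch's role: store events and find them by word."""

    def __init__(self):
        self.docs = []
        self.terms = defaultdict(set)  # word -> ids of events containing it

    def add(self, doc):
        doc_id = len(self.docs)
        self.docs.append(doc)
        for word in re.findall(r"\w+", doc["msg"].lower()):
            self.terms[word].add(doc_id)

    def search(self, word):
        return [self.docs[i] for i in sorted(self.terms.get(word.lower(), ()))]


def count_by(docs, field):
    """Kibana's role: aggregate events into numbers you could chart."""
    return Counter(doc[field] for doc in docs)


if __name__ == "__main__":
    raw = [
        "2019-05-24T10:00:00Z ERROR checkout: payment timeout",
        "2019-05-24T10:00:01Z INFO checkout: payment accepted",
        "2019-05-24T10:00:02Z ERROR search: index timeout",
    ]
    index = TinyIndex()
    for line in raw:
        event = parse(line)
        if event:
            index.add(event)
    print(count_by(index.search("timeout"), "service"))
```

The real stack adds persistence, clustering, and a query language on top of these roles, which is where the prefab-furniture assembly effort goes.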
Best for Security and Compliance: Sumo Logic
Originally created to be a SaaS version of Splunk, Sumo Logic has since evolved into an enterprise-class log management tool in its own right. With the ability to analyze log data in real time and apply machine learning, Sumo excels at finding root causes for specific errors or events, which makes it a perfect fit for the security and compliance use case. Just remember that it’s agent-based, so that is going to add some overhead to the ongoing management and deployment.
The Mostly Un-Necessary Summary
Log monitoring is a vast and varied sub-specialty within the monitoring discipline, and there are solutions out there to fit almost any use case. If you are just beginning your search for the right tool for the job, I hope this has given you a head start. If you already have a log management tool and are considering either a change or an addition, I recommend trying a log tool like SolarWinds Security Event Manager, primarily because of its focus on helping IT departments manage security and compliance with an easy-to-use event log monitoring solution.