Software and infrastructure monitoring is a must in any modern environment. Monitoring not only helps you spot issues and bugs, but done correctly, it can help you optimize your software’s performance. The actual implementation, however, will differ depending on your infrastructure. For example, monitoring containerized applications requires a slightly different approach than your typical virtual machine-based monitoring. In this post, you’ll learn about container monitoring best practices.
1. Don’t Focus Too Much on Individual Containers
When it comes to monitoring, the biggest difference between containers and virtual machines is that your focus should move from individual containers to the whole cluster. Containerized applications are commonly built as microservices; therefore, individual containers will only tell you a small fraction of your infrastructure’s performance.
In some cases you’ll still need to look at specific containers’ metrics, but this is more to debug specific issues. To get the general overview of your application, you’ll have to focus on the whole cluster, or at least groups of containers.
In most cases you’ll probably run more than one container of the same microservice for availability and scalability reasons. In that case, looking at individual containers can give you misleading information, because any single replica shows only part of the service’s load. So, focus on monitoring a specific group of containers as one unit.
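To make the idea of treating a group of containers as one unit concrete, here is a minimal sketch in Python. It assumes you already have per-container CPU samples labeled with the microservice they belong to; the field names (service, container, cpu_percent) and the sample values are hypothetical, not taken from any particular monitoring tool.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_service(samples):
    """Group per-container CPU samples by service and summarize each group."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample["service"]].append(sample["cpu_percent"])
    return {
        service: {
            "containers": len(values),
            "avg_cpu_percent": mean(values),
            "max_cpu_percent": max(values),
        }
        for service, values in groups.items()
    }


# Hypothetical samples: one busy replica out of three.
samples = [
    {"service": "checkout", "container": "checkout-1", "cpu_percent": 95.0},
    {"service": "checkout", "container": "checkout-2", "cpu_percent": 20.0},
    {"service": "checkout", "container": "checkout-3", "cpu_percent": 35.0},
    {"service": "search", "container": "search-1", "cpu_percent": 40.0},
]
summary = aggregate_by_service(samples)
```

Looked at alone, checkout-1 seems close to saturation, while the group summary shows the service as a whole still has headroom. Keeping the per-group maximum next to the average helps you notice when a single replica really does need a closer look.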
2. Don’t Monitor Containers Like VMs
We already mentioned that monitoring containers differs from monitoring virtual machines. The biggest difference is how you should look at resource consumption. For virtual machines, it’s pretty straightforward. If the machine has, for example, two CPUs and 8GB of RAM, then you know exactly what 80% CPU and 90% RAM usage means.
For containers, it’s a little bit more complicated. By default, containers can use any available resources from the underlying host machine. So if you ran just one container on that same example machine, the percentage usage would still mean essentially the same thing. But normally you’ll have dozens or more containers on one machine. In this case, the raw percentage usage doesn’t tell you much. You need to put that data into perspective.
In addition, when you use a container orchestration system, your containers will probably have some resource usage limits set. Then, “percentage usage of resources” is no longer the percentage of resources available on the underlying machine but the percentage in relation to those limits. Therefore, when monitoring your containers’ resource consumption, make sure you understand what these values relate to.
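The difference between the two percentages is easy to see with a little arithmetic. The sketch below is only an illustration: the function name and the numbers are assumptions, with the 8GB host borrowed from the example machine above and a hypothetical 1 GiB memory limit on the container.

```python
def usage_percentages(used_bytes, limit_bytes, host_bytes):
    """Return memory usage as (percent of host, percent of container limit).

    limit_bytes may be None when the container has no limit configured;
    in that case the second value is None.
    """
    pct_of_host = 100.0 * used_bytes / host_bytes
    if limit_bytes is None:
        pct_of_limit = None
    else:
        pct_of_limit = 100.0 * used_bytes / limit_bytes
    return pct_of_host, pct_of_limit


MIB = 1024 ** 2
GIB = 1024 ** 3

# Hypothetical container: 512 MiB used, 1 GiB limit, on an 8 GiB host.
of_host, of_limit = usage_percentages(
    used_bytes=512 * MIB,
    limit_bytes=1 * GIB,
    host_bytes=8 * GIB,
)
```

With these numbers the same container is at 6.25% of the host’s memory but at 50% of its own limit, so a dashboard that doesn’t say which denominator it uses can be read in two very different ways.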
3. Pay Attention to Network Traffic
When it comes to containers, network traffic becomes much more complex than in a traditional monolithic application. It’s important to understand the network flow of your containers and monitor it accordingly. Traditionally, when you have one application running on one virtual machine and another application running on another virtual machine, you can easily grasp the idea of network saturation just by looking at network metrics of the traffic flowing between these two machines.
With containers, however, you’ll have a similar amount of network traffic flowing between containers on the same machine as between different machines. In some cases, you may even have significantly more traffic flowing from one container to another than from one machine to another. Therefore, you shouldn’t limit your network monitoring to the traffic flowing between machines. You need to monitor traffic between containers, whether they’re on the same or different machines.
Another aspect of container network monitoring is that microservices usually talk to each other via REST APIs. If you monitor HTTP response codes, you need to be aware that lots of 5xx errors won’t necessarily mean customer impact. These errors can come from inter-container traffic. Of course, they should still be investigated, but the important point is that HTTP traffic no longer means customer-facing traffic only.
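One way to keep that distinction visible is to count 5xx responses separately for customer-facing and inter-container requests. The sketch below assumes each request record already carries an origin label ("external" or "internal"); that label, the record layout, and the paths are hypothetical, and how you obtain the label depends on your ingress or service mesh.

```python
from collections import Counter


def count_5xx_by_origin(requests):
    """Count HTTP 5xx responses separately for each traffic origin."""
    counts = Counter()
    for request in requests:
        if 500 <= request["status"] <= 599:
            counts[request["origin"]] += 1
    return counts


# Hypothetical request log entries.
requests = [
    {"origin": "external", "path": "/cart", "status": 200},
    {"origin": "external", "path": "/pay", "status": 503},
    {"origin": "internal", "path": "/inventory", "status": 500},
    {"origin": "internal", "path": "/inventory", "status": 502},
    {"origin": "internal", "path": "/prices", "status": 404},
]
errors = count_5xx_by_origin(requests)
```

Both counters deserve attention, but only the external one is a direct signal of customer impact.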
Finally, since containers can create pretty complex network meshes, it’s important to have a good overview of which microservice talks to which. This isn’t too important in traditional infrastructure monitoring because you usually know where traffic is flowing based on virtual machine names or network segments. For containers, however, you should include service maps to visualize traffic flowing between microservices. This can give you a much better overview of the network traffic and help you spot traffic that shouldn’t happen or traffic that should be—but isn’t—flowing.
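At its core, a service map is a set of directed edges between microservices. As a rough sketch, assuming you can export observed flows as (source, destination) pairs and that you maintain a list of the edges you expect to see, you could spot both kinds of anomalies mentioned above like this. The service names and the expected set are made up for illustration.

```python
from collections import Counter


def build_service_map(flows):
    """Count observed requests per (source, destination) service pair."""
    edges = Counter()
    for source, destination in flows:
        edges[(source, destination)] += 1
    return edges


def compare_with_expected(edges, expected_edges):
    """Return (unexpected, missing) edges relative to the expected set."""
    observed = set(edges)
    unexpected = observed - expected_edges
    missing = expected_edges - observed
    return unexpected, missing


# Hypothetical observed flows and expected topology.
flows = [
    ("frontend", "checkout"),
    ("frontend", "checkout"),
    ("checkout", "payments"),
    ("frontend", "payments"),  # traffic that shouldn't happen
]
expected = {
    ("frontend", "checkout"),
    ("checkout", "payments"),
    ("checkout", "inventory"),  # traffic that should be flowing but isn't
}
service_map = build_service_map(flows)
unexpected, missing = compare_with_expected(service_map, expected)
```

A real service map would usually be drawn as a graph rather than printed as sets, but the comparison behind it is the same.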
4. Don’t Monitor Containers Alone
With all this effort on monitoring containers, you shouldn’t make the mistake of focusing too much on containers and forgetting about the rest of the infrastructure. At the end of the day, containers are usually managed by some orchestrator and are running on some underlying servers. If you monitor only the containers, you’ll see only part of the truth.
For example, if you see that container X is using much less memory than it should (much less than its limit), you could start debugging the code of the microservice running in it. But the real cause may simply be a lack of free memory on the underlying machine. It’s important to put all the container monitoring data into perspective.
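A simple way to put container data into perspective is to attach the state of the underlying host to every container reading. The sketch below is an assumption-heavy illustration: the record fields, the 10% free-memory threshold, and the function names are all hypothetical and would need to be adapted to whatever your monitoring tool exposes.

```python
def host_is_memory_constrained(total_bytes, free_bytes, min_free_fraction=0.10):
    """Treat a host as constrained when its free memory falls below a fraction."""
    return free_bytes / total_bytes < min_free_fraction


def annotate_container(container, host):
    """Combine a container's memory reading with the state of its host."""
    return {
        "container": container["name"],
        "headroom_to_limit_bytes": container["limit_bytes"] - container["used_bytes"],
        "host_free_bytes": host["free_bytes"],
        "host_constrained": host_is_memory_constrained(
            host["total_bytes"], host["free_bytes"]
        ),
    }


MIB = 1024 ** 2
GIB = 1024 ** 3

# Hypothetical readings: the container is far below its limit,
# but the host itself has almost no free memory left.
container = {"name": "container-x", "used_bytes": 256 * MIB, "limit_bytes": 2 * GIB}
host = {"total_bytes": 8 * GIB, "free_bytes": 200 * MIB}
report = annotate_container(container, host)
```

Seeing a host_constrained flag next to the container’s reading points you at the machine first, before you spend time debugging the microservice’s code.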
Ideally, monitor all the layers of your infrastructure with the same tool, which can help you correlate the data from different layers. And if you want to take your monitoring to the next level, you should also include logging in the same solution. Containers create a few challenges for logging, too, and log aggregation and correlation can improve your monitoring efforts. You can learn more about the benefits of combining the two in this post.
5. Don’t Overload Yourself With Useless Alerts
Containerized systems also require different approaches for alerting. When your virtual machine restarts, you definitely want to be notified. When your container restarts, however, not necessarily. Part of the container orchestrator’s job is to move containers to different nodes based on different factors. Container restarts are not uncommon or abnormal, and you don’t need to be notified about each one. You’d quickly get bombarded by alerts. However, this doesn’t mean you should stop monitoring restarts altogether. If the same container gets restarted many times in a short period of time, then you should be alerted.
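The rule “alert only on many restarts in a short period” can be sketched as a sliding-window check. The window length and restart threshold below are arbitrary example values, and the function name is hypothetical; most monitoring tools let you express the same condition in their own alerting syntax.

```python
def should_alert_on_restarts(restart_times, now, window_seconds=600, max_restarts=3):
    """Alert only when restarts within the recent window exceed the threshold.

    restart_times and now are timestamps in seconds.
    """
    recent = [t for t in restart_times if 0 <= now - t <= window_seconds]
    return len(recent) > max_restarts


# Hypothetical restart timestamps (seconds) for two containers.
occasional = [0, 3000, 6000]
crash_loop = [5700, 5800, 5900, 6000, 6050]

alert_occasional = should_alert_on_restarts(occasional, now=6100)
alert_crash_loop = should_alert_on_restarts(crash_loop, now=6100)
```

The container that restarts occasionally stays quiet, while the one restarting five times within ten minutes triggers an alert.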
The same applies to resource usage. You’ll probably have an autoscaler taking care of scaling your containers up and down. You shouldn’t be notified as soon as some container starts using a lot of resources. The autoscaler will take care of that. It usually takes a few seconds or minutes for the autoscaler to do its job, so it makes more sense to alert only when high usage persists after the autoscaler has had time to react.
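Since the autoscaler needs some time to react, one option is to alert only when usage has stayed high for longer than a grace period. The following sketch assumes time-ordered (timestamp, usage percent) samples; the 90% threshold and the five-minute grace period are example values, not recommendations.

```python
def sustained_high_usage(samples, threshold_percent=90.0, grace_seconds=300):
    """Return True if usage has stayed at or above the threshold for the grace period.

    samples is a list of (timestamp_seconds, usage_percent) tuples sorted by time.
    """
    high_since = None
    for timestamp, usage in samples:
        if usage >= threshold_percent:
            if high_since is None:
                high_since = timestamp  # start of the current high-usage streak
        else:
            high_since = None  # streak broken, start over
    if high_since is None:
        return False
    return samples[-1][0] - high_since >= grace_seconds


# Hypothetical samples taken once a minute.
short_spike = [(0, 50.0), (60, 95.0), (120, 60.0)]
still_high = [(0, 95.0), (60, 96.0), (120, 97.0), (180, 95.0), (240, 99.0), (300, 98.0)]

alert_spike = sustained_high_usage(short_spike)
alert_sustained = sustained_high_usage(still_high)
```

A short spike that the autoscaler absorbs produces no alert, while usage that is still high after five minutes suggests the autoscaler isn’t keeping up and is worth a notification.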
As you can see, while the general idea of monitoring stays the same, there are a few things to keep in mind when monitoring containers. In the same way you shouldn’t think of containers as “little VMs,” you shouldn’t monitor them like VMs. In this post, you learned about some general best practices for monitoring containers. If you want to learn more best practices for monitoring infrastructure in general, check out this blog post. And if you want to get your hands dirty and build your perfect monitoring solution straight away, sign up here for a free trial of SolarWinds Observability.
This post was written by Dawid Ziolkowski. Dawid has 10 years of experience, starting as a Network/System Engineer, then working in DevOps, and most recently as a Cloud Native Engineer. He’s worked for an IT outsourcing company, a research institute, a telco, a hosting company, and a consultancy company, so he’s gathered a lot of knowledge from different perspectives. Nowadays he’s helping companies move to the cloud and/or redesign their infrastructure for a more Cloud-Native approach.