Collecting Performance Data From Unix Systems Using Nagios Plugins

One Possible Approach for Data Collection from Unix / Linux Systems for Purposes of Visualization, Forecasting & Modelling – In this section we will consider an approach for collecting data from a Linux-based system for purposes of Visualization, Modelling & Forecasting. The intention is to identify what data is required and then put in place an approach that allows collection, aggregation and shipping of the relevant data to a system from which the information can be exported for Visualization, Modelling and Forecasting. The high-level steps are –

  1. Identify infrastructure workload metrics required for data collection
  2. Identify application workload metrics required for data collection
  3. Deploy the relevant monitoring tools required for purposes of collecting data
  4. Instrument the system (Application, infrastructure, etc.) and set collection frequency
  5. Configure aggregation of data for purposes of shipping to central logging or reporting engine
  6. Ship the data to the central logging and reporting engine
  7. Export data from the central logging and reporting engine

There are many different approaches you could follow, and there is no hard and fast rule that says you must follow the approach recommended above. It has, however, been proven to work and relies on tools that will not burn a hole in your pocket.

Diving Into The Detail – Now that we've taken a high-level look at the steps involved, it's time to dive in. Let's look at each of the above steps in a bit more detail.

  1. Identify infrastructure workload metrics required for data collection – The infrastructure metrics you might want to consider here are –
    • CPU Utilization/Unit time
    • Memory Utilization/Unit time
    • Disk IOPS/Unit time
    • CPU IOWait/Unit time, etc.
  2. Identify application workload metrics required for data collection – This will vary depending on the nature of your application; however, what you are really looking to collect data for are those aspects of your application that are responsible for consuming system resources, e.g.
    • Orders/Unit time
    • Messages/Unit time
    • Reports Run/Unit time
    • Transactions Submitted/Unit time, etc.
  3. Deploy the relevant monitoring tools required for purposes of collecting data – To collect this data you could either deploy one of the plethora of commercial monitoring tools out there or instead deploy open source monitoring tools. This article recommends the use of "Nagios Plugins" to capture the relevant monitoring metrics, since they impose very low overhead.
  4. Instrument the system (Application, infrastructure, etc.) and set collection frequency – Once you have installed your monitoring tools ("Nagios Plugins" in our case) you can configure them to collect data at the relevant frequency. In our case, since we have used Nagios's open source monitoring plugins, we are required to write our own data collection scripts and configure crontab on Linux to run these plugins at a regular interval. The interval in our case is set to one minute, the smallest granularity crontab supports (see the wrapper sketch after this list).
  5. Configure aggregation of data for purposes of shipping to central logging or reporting engine – This is the interesting part. There are many options you might want to consider, including Splunk, Sumologic and ELK (Elasticsearch, Logstash, Kibana). Configuration of these tools is out of the scope of this article, but please refer to the other articles at this website to understand how you might go about configuring them to consolidate data from a given host and ship it to the central logging / reporting console.
  6. Ship the data to the central logging and reporting engine – This is where you configure the agents on the various monitored hosts to collect the data generated by the various Nagios Plugins being executed by crontab and ship that data to the central Splunk, Sumologic or ELK instance.
  7. Export data from the central logging and reporting engine – Once you have data within your central console, what's required is for you to run the relevant queries and export the data in CSV format, which can then be consumed by the tool used for Visualization, Modelling & Forecasting.
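
To illustrate step 4, here is a minimal sketch of the kind of data collection wrapper you might call from crontab. The script name (collect_metric.sh) and its location under /opt/perfstats are assumptions for illustration; the plugin path and output directory match the crontab configuration shown later in this article. Prepending a timestamp to each sample (something the raw plugin output lacks) makes the data much easier to index and chart once it reaches Splunk, Sumologic or ELK.

#!/bin/bash
# collect_metric.sh - illustrative wrapper around a Nagios plugin (sketch only)
# Usage:   collect_metric.sh "<plugin command and arguments>" <output file>
# Example: collect_metric.sh "/usr/lib/nagios/plugins/check_load -w 3.6,2.8,2.0 -c 4.0,3.2,2.4" /opt/perfstats/cpu_load.txt

PLUGIN_CMD="$1"   # full plugin command line, quoted
OUTFILE="$2"      # file that Splunk / Sumologic / ELK will pick up

# Run the plugin and prepend a UTC timestamp to its single line of output
echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $($PLUGIN_CMD)" >> "$OUTFILE"

The corresponding crontab entry would then call the wrapper instead of the plugin directly, e.g. every minute:

* *   * * *   root   /opt/perfstats/collect_metric.sh "/usr/lib/nagios/plugins/check_load -w 3.6,2.8,2.0 -c 4.0,3.2,2.4" /opt/perfstats/cpu_load.txt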

Setup & Configuration of tools for Data Collection on Unix / Linux Systems – In the approach above we have referred to a few different tools for purposes of monitoring the system, obtaining the relevant system performance metrics and then shipping of the data to a central logging and reporting console. Let’s look at each of them in a bit more detail –

Nagios Plugins – If you are using Ubuntu Linux, here are the packages you would need to install on your system. If you are using some other Linux distribution, you will need to find the equivalent packages for your system.

nagios-plugins – Plugins for nagios compatible monitoring systems (metapackage)

nagios-plugins-basic – Plugins for nagios compatible monitoring systems

nagios-plugins-standard – Plugins for nagios compatible monitoring systems

To install these packages on an Ubuntu Linux system you would use the following commands –

bash# apt-get install nagios-plugins

bash# apt-get install nagios-plugins-basic

bash# apt-get install nagios-plugins-standard

The approach that we have recommended does not need installation and configuration of the Nagios monitoring engine as such. You are free to play around with the Nagios monitoring engine if you would like but for purposes of collecting the system performance metrics we need for Visualization, Modelling & Forecasting the Nagios Plugins should suffice.
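
Before wiring anything into crontab it is worth running one or two of the plugins by hand to confirm where they have been installed and what their output looks like. On Ubuntu the plugins typically land in /usr/lib/nagios/plugins; if your distribution places them elsewhere, adjust the paths used in the examples that follow accordingly.

bash# ls /usr/lib/nagios/plugins/

bash# /usr/lib/nagios/plugins/check_load -w 3.6,2.8,2.0 -c 4.0,3.2,2.4

bash# /usr/lib/nagios/plugins/check_disk -w 30% -c 10% -p /

Each plugin prints a single status line followed by pipe-delimited performance data, which is exactly the format you will see in the sample output further down this article.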

Crontab Configuration – Now that you have installed the Nagios Plugins onto your system, the next step is to configure the Linux cron scheduler (via crontab) to execute these plugins and generate the relevant data, which can be written to a file that gets picked up by Splunk, Sumologic or ELK.

Here are the entries you will need to add to your crontab configuration. The configuration provided below collects data for system load, disk utilization, swap utilization and memory utilization every minute.

* *   * * *   root   /usr/lib/nagios/plugins/check_load -w 3.6,2.8,2.0 -c 4.0,3.2,2.4 >> /opt/perfstats/cpu_load.txt

* *   * * *   root   /usr/lib/nagios/plugins/check_disk -w 30\% -c 10\% -p / >> /opt/perfstats/disk_free.txt

* *  * * *   root   /usr/lib/nagios/plugins/check_disk -w 30\% -c 10\% -p /data >> /opt/perfstats/disk_free.txt

* *   * * *   root   /usr/lib/nagios/plugins/check_swap -w 20\% -c 10\% >> /opt/perfstats/swap_free.txt

* *   * * *   root   /usr/lib/nagios/plugins/check_mem.sh -w 95 -c 98 >> /opt/perfstats/mem_free.txt

You will need to customize the commands above for your own installation; doing so is out of the scope of this tutorial.
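
A couple of practical notes on the entries above. Because each entry includes a user field (root), the lines are written for the system-wide crontab (/etc/crontab) or a file under /etc/cron.d, rather than for a per-user crontab edited via crontab -e (which has no user field). The output directory also needs to exist before the first run, and check_mem.sh is typically a community-contributed script (for example from the Nagios Exchange) rather than part of the packaged plugins, so you may need to download it separately. A minimal preparation step might look like this (the cp assumes you have downloaded check_mem.sh into the current directory):

bash# mkdir -p /opt/perfstats

bash# cp check_mem.sh /usr/lib/nagios/plugins/ && chmod +x /usr/lib/nagios/plugins/check_mem.sh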

Sample Data Collected – Here's a view of the data that we've collected for the various system parameters using the above configuration, i.e. Nagios Plugins and crontab.

CPU Load from /opt/perfstats/cpu_load.txt

OK - load average: 0.17, 0.21, 0.31|load1=0.170;3.600;4.000;0; load5=0.210;2.800;3.200;0; load15=0.310;2.000;2.400;0;

OK - load average: 0.14, 0.20, 0.31|load1=0.140;3.600;4.000;0; load5=0.200;2.800;3.200;0; load15=0.310;2.000;2.400;0;

OK - load average: 0.16, 0.19, 0.30|load1=0.160;3.600;4.000;0; load5=0.190;2.800;3.200;0; load15=0.300;2.000;2.400;0;

OK - load average: 0.16, 0.20, 0.30|load1=0.160;3.600;4.000;0; load5=0.200;2.800;3.200;0; load15=0.300;2.000;2.400;0;

OK - load average: 0.44, 0.27, 0.32|load1=0.440;3.600;4.000;0; load5=0.270;2.800;3.200;0; load15=0.320;2.000;2.400;0;

 

Disk Free from /opt/perfstats/disk_free.txt

DISK OK - free space: / 31883 MB (40% inode=85%);| /=46723MB;55590;71473;0;79415

DISK OK - free space: / 31883 MB (40% inode=85%);| /=46723MB;55590;71473;0;79415

DISK OK - free space: / 31883 MB (40% inode=85%);| /=46723MB;55590;71473;0;79415

DISK OK - free space: / 31883 MB (40% inode=85%);| /=46723MB;55590;71473;0;79415

DISK OK - free space: / 31827 MB (40% inode=85%);| /=46779MB;55590;71473;0;79415

DISK OK - free space: / 31778 MB (40% inode=85%);| /=46828MB;55590;71473;0;79415

DISK OK - free space: / 31776 MB (40% inode=85%);| /=46831MB;55590;71473;0;79415

 

Mem Free from /opt/perfstats/mem_free.txt

Memory: OK Total: 3950 MB - Used: 3661 MB - 92% used|TOTAL=3950;;;; USED=3661;;;; CACHE=2387;;;; BUFFER=357;;;;

Memory: OK Total: 3950 MB - Used: 3663 MB - 92% used|TOTAL=3950;;;; USED=3663;;;; CACHE=2387;;;; BUFFER=357;;;;

Memory: OK Total: 3950 MB - Used: 3664 MB - 92% used|TOTAL=3950;;;; USED=3664;;;; CACHE=2387;;;; BUFFER=357;;;;

Memory: OK Total: 3950 MB - Used: 3661 MB - 92% used|TOTAL=3950;;;; USED=3661;;;; CACHE=2388;;;; BUFFER=357;;;;

Memory: OK Total: 3950 MB - Used: 3661 MB - 92% used|TOTAL=3950;;;; USED=3661;;;; CACHE=2303;;;; BUFFER=357;;;;

Memory: WARNING Total: 3950 MB - Used: 3781 MB - 95% used!|TOTAL=3950;;;; USED=3781;;;; CACHE=2354;;;; BUFFER=358;;;;

Memory: OK Total: 3950 MB - Used: 3748 MB - 94% used|TOTAL=3950;;;; USED=3748;;;; CACHE=2300;;;; BUFFER=358;;;;

 

Swap Free from /opt/perfstats/swap_free.txt

SWAP OK - 65% free (646 MB out of 1003 MB) |swap=646MB;200;100;0;1003

SWAP OK - 65% free (646 MB out of 1003 MB) |swap=646MB;200;100;0;1003

SWAP OK - 65% free (646 MB out of 1003 MB) |swap=646MB;200;100;0;1003

SWAP OK - 65% free (646 MB out of 1003 MB) |swap=646MB;200;100;0;1003

SWAP OK - 65% free (646 MB out of 1003 MB) |swap=646MB;200;100;0;1003

SWAP OK - 65% free (646 MB out of 1003 MB) |swap=646MB;200;100;0;1003

SWAP OK - 65% free (646 MB out of 1003 MB) |swap=646MB;200;100;0;1003

SWAP OK - 65% free (646 MB out of 1003 MB) |swap=646MB;200;100;0;1003

SWAP OK - 65% free (646 MB out of 1003 MB) |swap=646MB;200;100;0;1003
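
As an aside, the pipe-delimited performance data in these samples is straightforward to turn into CSV locally if you want to sanity-check it before (or instead of) routing it through a central logging console. The one-liner below is a minimal sketch that pulls the 1, 5 and 15 minute load averages out of cpu_load.txt; it assumes the check_load output format shown above, and the output path /tmp/cpu_load.csv is just an example.

bash# awk -F'|' 'BEGIN{print "load1,load5,load15"} {split($2,a," "); split(a[1],l1,"[=;]"); split(a[2],l5,"[=;]"); split(a[3],l15,"[=;]"); print l1[2] "," l5[2] "," l15[2]}' /opt/perfstats/cpu_load.txt > /tmp/cpu_load.csv

Each input line produces one CSV row (e.g. 0.170,0.210,0.310). For the full workflow described in step 7, however, the export would normally be run as a query from Splunk, Sumologic or Kibana once the data has been shipped there.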

Log file monitoring – Monitoring the system performance metrics mentioned above (e.g. CPU utilization, memory utilization, IO wait, disk IOPS) is definitely required to build your performance models. This aspect of the workload you've been collecting is also referred to as the Infrastructure workload.

A second and equally important aspect of your workload is the Application workload. The Application workload is the part of your application that is responsible for generating work on the system and hence for consuming compute, storage and network resources. For a web server the application workload would be web server hits per unit time, while for an application server it could be shopping cart orders submitted per unit time, etc.

Collecting application workload data requires a similar approach but slightly different tooling. Let's look at the following log files, which contain application workload data, and see what options we have at our disposal for collecting the relevant data –

  • Apache –
    • /var/log/apache/access.log
    • /var/log/apache/error.log
  • Varnish –
    • /var/log/varnish/varnishncsa.log
    • /var/log/varnish/access.log

The Apache access log is a rich source of web hits, and the error log records Apache errors. Mining the Apache access log should give us a good understanding of the rate of web server hits. Varnish is a caching HTTP accelerator, mostly configured as a reverse proxy in front of websites to cache commonly requested static content. Varnish logs (the varnishncsa output) are similar in format to Apache's access logs and provide a rich source of information about users accessing the system.
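
As a simple illustration of mining the access log for workload data, the one-liner below counts requests per minute by truncating the timestamp field of the standard common/combined log format down to minute precision. The log path is the one listed above; adjust it (for example to /var/log/apache2/access.log on Ubuntu) and the field position if your log format differs.

bash# awk '{print substr($4, 2, 17)}' /var/log/apache/access.log | sort | uniq -c

Each output line is a minute bucket together with the number of hits that fell in that minute, which maps directly onto the "web server hits per unit time" metric discussed above. The same approach works against the varnishncsa log, since it uses the same NCSA format.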

To mine the data from these log files you could write your own log monitoring solution for your particular application, or you could consider using one of the following log file analysis and visualization tools –

  • Splunk
  • Sumologic
  • ELK (Elasticsearch, Logstash, Kibana)

Each of the above solutions offers the capability to capture data from log files by constantly monitoring the files for changes, aggregating the data into a central repository and then giving the user a query language with which to access visualizations, statistics, etc. for their data. These solutions also allow the user to export data in CSV format. The CSV files can then be imported into VisualizeIT for further visualization, modelling and forecasting.

Now that you have come this far you should have completed the first 5 of the 7 steps listed below –

  1. Identify infrastructure workload metrics required for data collection
  2. Identify application workload metrics required for data collection
  3. Deploy the relevant monitoring tools required for purposes of collecting data
  4. Instrument the system (Application, infrastructure, etc.) and set collection frequency
  5. Configure aggregation of data for purposes of shipping to central logging or reporting engine
  6. Ship the data to the central logging and reporting engine
  7. Export data from the central logging and reporting engine

In the next section let's look at what tools we might have at our disposal to pick up the collected data and ship it to a central instance where it can be viewed, manipulated and exported for import into VisualizeIT.

Data Aggregation and Visualization using ELK, Splunk or Sumologic – The last two of the 7 steps can be accomplished using tools like ELK (Elasticsearch, Logstash, Kibana), Splunk or Sumologic. These are very powerful tools that give you the ability to aggregate data from disparate sources into a central location, visualize it and export it in CSV format for further visualization, modelling and forecasting.

CPU Utilization Data – Aggregated & Visualized in Splunk

 

Varnish Logs (Website Visits/Hour) Data – Aggregated & Visualized in Sumologic

 

For further information on how to install ELK (Elasticsearch, Logstash, Kibana), Splunk, Sumologic please refer to the relevant articles at the VisualizeIT wiki.
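
To give a flavour of what the shipping side (step 6) looks like in practice if you choose the ELK route, below is a minimal, illustrative Filebeat configuration fragment. The output host is a placeholder and the exact syntax varies between Filebeat versions, so treat this purely as a sketch and refer to the installation articles mentioned above for a complete setup; Splunk forwarders and Sumologic collectors are configured along broadly similar lines.

filebeat.inputs:
  - type: log
    paths:
      - /opt/perfstats/*.txt          # files written by the Nagios plugin crontab entries
      - /var/log/apache/access.log    # application workload (web server hits)
output.logstash:
  hosts: ["logcentral.example.com:5044"]   # placeholder address of your central ELK instance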

Conclusion – The aim of this article was to help you obtain the relevant Application workload and Infrastructure workload metrics using easily available, affordable and accessible tools. As we've said before, there is no single right way to collect data for the Application and Infrastructure workload drivers that matter to you. The key is to understand what your main Application workload and Infrastructure workload drivers are, and to instrument the relevant systems to obtain performance metrics that can be extracted and imported into VisualizeIT for purposes of Visualization, Modelling & Forecasting.

Modelling Solution: VisualizeIT offers access to a range of Analytical Models, Statistical Models and Simulation Models for purposes of Visualization, Modelling & Forecasting. Access to all the Analytical (Mathematical) models is free. We recommend you try out the free Analytical models at VisualizeIT and drop us a note with your suggestions, input and comments. You can access the VisualizeIT website and the VisualizeIT modelling solution at VisualizeIT.
