Wednesday, 24 August 2016

Analyzing Apache Logs using ELK

In our previous post, we looked at a few simple steps to get an ELK stack up and running on an EC2 instance in AWS. In this tutorial, I'll be showing you how to set up a simple Logstash pipeline to send your Apache web server logs over to the ELK instance.

To get started, you will need an AWS instance with the Apache web server installed on it. In my case, I have launched an instance (hostname: client1) in the same region and AZ as my ELK server instance.

The first thing that you should always do before you start working with your EC2 instances is to update their packages. In my case, I'm using an Ubuntu 14.04 AMI, so the commands will be Ubuntu native.
# sudo apt-get update

Next, install the Apache web server packages if not already installed.
# sudo apt-get install apache2


With Apache installed, let's run a simple test to check whether Apache's error and access logs are working as expected. In the same terminal, type in the following command to simulate someone accessing your web server:
# curl localhost

NOTE: You can alternatively open up a web browser and type in your Apache Web Server instance's Public IP address or DNS as well.

You should see a response with a bunch of <html> tags. This is actually the welcome page of your Apache web server.
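On Ubuntu 14.04, the first few lines of that page look roughly like this (trimmed here for brevity):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Apache2 Ubuntu Default Page: It works</title>
    ...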


Next up, tail your Apache web server's access log and you should see one new entry created, as shown below:

# sudo tail /var/log/apache2/access.log 
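The entry follows Apache's combined log format, so it should look roughly like this (your client IP, timestamp, and response size will differ):

127.0.0.1 - - [24/Aug/2016:10:15:32 +0000] "GET / HTTP/1.1" 200 11764 "-" "curl/7.35.0"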

This means that our Apache web server and its logs are working.


Next up, we install Logstash on our client Apache instance. Logstash will basically act as a spout here: it consumes the Apache access logs from the client, processes them based on certain filters, and then forwards them to our previously created ELK instance for further analysis.

Since Logstash runs on the JVM, the first thing we need to install is Java:
# sudo apt-get install openjdk-7-jre-headless

NOTE: You can optionally install Oracle Java here as well instead of using the default OpenJDK packages.


Verify that Java was successfully installed:
# java -version
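The exact build numbers will vary with your AMI's patch level, but the output should report a Java 7 runtime, roughly along these lines:

java version "1.7.0_95"
OpenJDK Runtime Environment (IcedTea 2.6.4) (7u95-2.6.4-0ubuntu0.14.04.2)
OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)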


With Java installed, we can now move forward with Logstash's installation. First, we add the Logstash repository to apt's sources list:
# echo "deb http://packages.elastic.co/logstash/1.5/debian stable main" | sudo tee -a /etc/apt/sources.list

NOTE: You can substitute 1.5 with 2.1 to install the latest Logstash packages.
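If apt-get later complains about unauthenticated packages, import Elastic's package signing key as well (this is the standard key published in Elastic's own install docs):

# wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -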


Make sure you refresh the package lists on the system using the update command:
# sudo apt-get update


Install the Logstash package:
# sudo apt-get install logstash


For this tutorial, we will use Logstash to forward the Apache web server instance's access.log file to our ELK server.
Here's a snippet of the configuration pipeline:

# sudo vi /etc/logstash/conf.d/apache-access.conf


input {
  file {
    path => "/var/log/apache2/access.log"
  }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}

output {
  stdout { codec => rubydebug }
}


The snippet is pretty straightforward. The conf file watches the Apache access log file (path) for new events, parses each of them with a special grok filter pattern (COMBINEDAPACHELOG), and prints the parsed events to the standard console output (stdout).



Run Logstash using the following command:

# sudo /opt/logstash/bin/logstash -f \
/etc/logstash/conf.d/apache-access.conf

You should see the message "Logstash startup completed".


To test whether it's working, open up a new terminal window on your Apache web server instance and try the "curl localhost" command once again. You should see logs getting printed on the terminal, as shown below:
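The rubydebug codec pretty-prints each event along with the fields that grok extracted from it. The exact values will differ on your system, and I've trimmed a few fields here for brevity, but each event should look roughly like this:

{
       "message" => "127.0.0.1 - - [24/Aug/2016:10:15:32 +0000] \"GET / HTTP/1.1\" 200 11764 \"-\" \"curl/7.35.0\"",
    "@timestamp" => "2016-08-24T10:15:33.000Z",
          "path" => "/var/log/apache2/access.log",
          "host" => "client1",
      "clientip" => "127.0.0.1",
          "verb" => "GET",
       "request" => "/",
      "response" => "200",
         "agent" => "\"curl/7.35.0\""
}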


Now to send these logs over to the ELK server instance, all you need to do is edit the apache-access.conf file and update the output section as shown below:

output {
  elasticsearch { host => "<IP_ADDRESS>" } # Provide the IP address of your ELK instance
}
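NOTE: If you opted for the 2.1 packages earlier, this option was renamed from host to hosts, so use hosts => ["<IP_ADDRESS>"] instead.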


Stop the Logstash process you started earlier (Ctrl+C) and restart it so the changes take effect. If you'd rather run it as a service instead:
# sudo service logstash restart
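If you want to double-check on the ELK side that the events are actually arriving, you can ask Elasticsearch to list its indices (assuming it is listening on its default port, 9200). You should see a logstash-YYYY.MM.DD index with a growing document count:

# curl '<IP_ADDRESS>:9200/_cat/indices?v'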


Open up Kibana and hit the Discover tab. You should now see your Apache web server's access logs streaming in.



Well, that's about it from this post for now. In the next post, I'll be showing you how to set up a production-scale ELK server on AWS using EC2 instances, so stick around... more coming your way soon!