Monday, 13 May 2013

Monitoring Clouds with Ganglia - Part 1

Monitoring Cloud Instances with Ganglia

Monitoring Cloud environments

Today’s IT environment consists of a vast pool of heterogeneous infrastructure resources that can reside either on-premise or in the Cloud. Monitoring such environments can be a real challenge, owing to their ever-increasing complexity and the fast pace at which resources grow.

A typical Cloud-based Data Center can comprise thousands of physical servers spread across multiple geographic locations, each running a hypervisor with one or more virtual machines on it. Each of these virtual machines in turn connects to virtual or physical switches, storage arrays and so on. Real-time performance monitoring and visibility become essential in such diverse and complex environments.

For this reason, we need a scalable and robust tool that can effectively monitor dynamic workloads on the Clouds. Enter Ganglia.

Introducing Ganglia

Ganglia is an open-source monitoring tool that was initially designed to monitor high-performance computing systems such as grids and clusters. It is highly scalable by design and gives IT admins a holistic view of the IT environment’s performance.

Ganglia leverages multicast-based protocols to listen for and advertise the state of the machines within the cluster it is monitoring. It uses XML for data representation, the XDR (External Data Representation) format for transporting metric data, and the open-source RRDtool for storage and visualization. The combination of these tools and frameworks makes Ganglia a truly concurrent and robust monitoring tool.

The Ganglia monitoring system collects and processes metric data using two daemons, or services: gmond and gmetad.
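To make the XDR format mentioned above a little more concrete, here is a minimal Python sketch of how XDR lays out a few basic types: everything is big-endian and padded to a multiple of four bytes. The helper names and the example record are hypothetical; this illustrates the general encoding rules only and is not the actual layout of a gmond packet.

```python
import struct


def xdr_uint(value: int) -> bytes:
    """Encode an unsigned 32-bit integer as 4 big-endian bytes."""
    return struct.pack(">I", value)


def xdr_float(value: float) -> bytes:
    """Encode a single-precision float as 4 big-endian bytes."""
    return struct.pack(">f", value)


def xdr_string(text: str) -> bytes:
    """Encode a string: 4-byte length, the bytes, zero padding to a multiple of 4."""
    data = text.encode("utf-8")
    padding = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * padding


# Hypothetical record for illustration: a metric name followed by its value.
record = xdr_string("load_one") + xdr_float(0.5)
print(len(record), record.hex())
```

Because every field has a fixed, architecture-independent layout, hosts with different CPU types can exchange metric values without any negotiation.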

(Figure1-Ganglia Architecture)

gmond (Ganglia Monitoring Daemon)

gmond is a simple daemon that runs on every host that has to be monitored within a cluster. It is designed to have very little overhead on the host it is monitoring. It is very easy to install and configure and supports Linux as well as Windows operating systems (as per its latest release v3.5.7).

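gmond primarily monitors the host and announces its state changes to the other gmond daemons in the cluster over unicast or multicast channels. It also makes the cluster state available as XML to any client that connects to it, which is how the gmetad daemon collects its data.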

gmetad (Ganglia Metadata Daemon)   

This daemon is primarily responsible for polling the gmond daemons across a specific cluster, gathering the XML data, parsing it and saving it in a round-robin database (RRD). This data can then be served to a client over a TCP socket. gmetad is designed to collect metric data from multiple gmond daemons as well as from other gmetad daemons.

This type of setup is well suited to Cloud environments, where we can have multiple clusters of servers spread across geographical regions. Each cluster can have at least one gmetad daemon that collects the cluster state and is in turn polled by a central gmetad daemon, which is responsible for data aggregation and presentation.
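The polling step described above can be sketched in a few lines of Python. gmond writes an XML dump of the cluster state to any client that connects to its TCP port (8649 by default) and then closes the connection. The function names, the host name and the sample document below are assumptions made for illustration; a real dump carries many more attributes and metrics.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host: str, port: int = 8649, timeout: float = 5.0) -> bytes:
    """Read the XML dump gmond sends to a client that connects on its TCP port."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(4096)
            if not data:  # gmond closes the connection once the dump is complete
                break
            chunks.append(data)
    return b"".join(chunks)


def parse_metrics(xml_data) -> dict:
    """Return {host_name: {metric_name: value}} from a gmond XML dump (bytes or str)."""
    root = ET.fromstring(xml_data)
    result = {}
    for host in root.iter("HOST"):
        result[host.get("NAME")] = {
            metric.get("NAME"): metric.get("VAL") for metric in host.iter("METRIC")
        }
    return result


# Trimmed-down, hypothetical sample of what a gmond dump looks like.
SAMPLE_XML = """<GANGLIA_XML VERSION="3.5.0" SOURCE="gmond">
  <CLUSTER NAME="demo" OWNER="unspecified">
    <HOST NAME="node1.example.com" IP="10.0.0.1">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=" "/>
      <METRIC NAME="mem_free" VAL="102400" TYPE="float" UNITS="KB"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""

print(parse_metrics(SAMPLE_XML))
```

Against a live cluster, parse_metrics(fetch_gmond_xml("node1.example.com")) would return the same kind of dictionary, assuming gmond on that (hypothetical) host accepts TCP connections from your machine.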

Ganglia also comes with a PHP frontend that enables IT admins to get complete diagnostic and real-time information on the state of their clusters. And since Ganglia stores metric data over a period of time, IT admins can also view historic data for clusters and hosts covering the past hour, day, week, month and even year.
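The reason this history stays cheap to keep is the round-robin idea behind RRD: each archive has a fixed number of slots, and older data is stored at a coarser resolution. The following Python sketch illustrates that idea only; the class name and the slot and step values are made up for this example, and it is not RRDtool's API or file format.

```python
from collections import deque


class RoundRobinArchive:
    """Fixed-size archive where each stored row is the average of `step` raw samples."""

    def __init__(self, slots: int, step: int = 1):
        self.step = step
        self.rows = deque(maxlen=slots)  # the oldest row is dropped automatically
        self._pending = []

    def update(self, value: float) -> None:
        """Add one raw sample; store a consolidated row once `step` samples arrive."""
        self._pending.append(value)
        if len(self._pending) == self.step:
            self.rows.append(sum(self._pending) / self.step)
            self._pending.clear()


# Hypothetical archives: a fine-grained one and a coarser, longer-lived one.
recent = RoundRobinArchive(slots=4, step=1)
older = RoundRobinArchive(slots=4, step=2)

for sample in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]:
    recent.update(sample)
    older.update(sample)

print(list(recent.rows))  # only the last four raw samples survive
print(list(older.rows))   # averages over pairs of samples
```

Both archives never grow beyond four rows, yet the coarser one covers twice the time span, which is the trade-off that lets a single store answer hour, day, week, month and year views.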

In the next part of this series we are going to install and configure Ganglia for monitoring the compute instances launched in a Eucalyptus Private Cloud, so stay tuned!