(SPECS) MODAClouds monitoring
This document is part of the SPECS deliverable D3.1, and thus superseded by it.
Overall, the MODAClouds FP7 project has a broad scope, ranging from tools that support the application design phase and libraries that ease development, to deployment on various cloud providers, followed by monitoring and self-adaptation at runtime, and even migration from one provider to another. However, at run-time the emphasis falls on monitoring and self-adaptation, focusing mainly on performance and availability aspects. [MODAClouds]
The MODAClouds monitoring system is part of a MAPE loop (i.e., monitoring / analysis / planning / execution), which in essence works as follows [D6.2]:
- the developer (or even the operator) of the application describes a set of performance goals, called policies (in fact, the target SLAs of the application in question);
- based on these policies, a set of monitoring rules is generated, stipulating metrics and alert thresholds;
- at deployment time, based on these rules, the monitoring system enables (and configures) probes which collect the needed information;
- at runtime these probes gather and forward the readings towards a collection system;
- in parallel with the collection, a set of analyzer modules process the readings and generate alerts in case the performance goals are not met;
- in turn the alerts are consumed by the self-adaptation system, which decides the best action to be taken;
- which is then enacted by the execution system.
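The rule-to-alert step of the loop above can be sketched in a few lines. This is a minimal illustration only: the class and function names (`MonitoringRule`, `check`) and the rule structure are assumptions for the sketch, not the actual MODAClouds API.

```python
# Hedged sketch: a monitoring rule stipulates a metric and an alert
# threshold; checking a reading against it may produce an alert for the
# self-adaptation system. Names and structure are illustrative.
from dataclasses import dataclass

@dataclass
class MonitoringRule:
    metric: str       # e.g. "avg_response_time_ms"
    threshold: float  # alert when a reading exceeds this value

def check(rule, reading):
    """Return an alert dict if the reading violates the rule, else None."""
    if reading["metric"] == rule.metric and reading["value"] > rule.threshold:
        return {"alert": rule.metric, "value": reading["value"]}
    return None

rule = MonitoringRule(metric="avg_response_time_ms", threshold=200.0)
print(check(rule, {"metric": "avg_response_time_ms", "value": 350.0}))
```

In the real system the rules are generated from the policies and fed to the analyzers, rather than evaluated by a standalone function as above.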
As such, the two main components of the MODAClouds monitoring system are the data collectors and the data analyzers [D6.3.1]. To these we add another set of components which play a secondary role: an event streaming system and a historical data store. This architecture is also depicted in the figure below.
In what follows we give a short summary of the main components, the data collectors and the data analyzers, which could be of interest to the SPECS project.
Because MODAClouds targets both PaaS and IaaS deployments with equal importance, one of the main challenges is finding a common abstraction that can work for both deployment models. Moreover, taking into account that (application) modeling is one of the key elements of the design-time solution, the monitoring component should leverage the design-time information to provide better targeted and fine-grained information (i.e., a white-box approach).
As hinted in the previous paragraphs, the data collectors offer the required probes to gather monitoring data for various target resources, ranging from VMs and data services (such as databases or middleware) to application services (such as web services) and even application sub-components. However, the main role of these collectors is to provide uniform semantics and syntax for all the gathered data; through the use of RDF they even allow further semantic annotation. [D6.2]
The next figure depicts how a monitoring reading is modeled.
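In the spirit of the figure, a single reading can be thought of as a small set of RDF-style (subject, predicate, object) triples about one observation. The vocabulary URIs below are invented for illustration; the actual MODAClouds ontology differs.

```python
# Hedged sketch: one monitoring reading rendered as RDF-style triples.
# The "http://example.org/monitoring#" vocabulary is an assumption made
# for this example, not the real MODAClouds ontology.
READING_ID = "urn:reading:42"

def as_triples(resource, metric, value, timestamp):
    base = "http://example.org/monitoring#"
    return [
        (READING_ID, base + "resourceId", resource),
        (READING_ID, base + "metric", metric),
        (READING_ID, base + "value", str(value)),
        (READING_ID, base + "timestamp", timestamp),
    ]

triples = as_triples("vm-frontend-1", "cpu_utilization", 0.83,
                     "2014-05-01T12:00:00Z")
for s, p, o in triples:
    print(s, p, o)
```

The triple form is what makes further semantic annotation cheap: extra predicates can be attached to the same reading subject without changing the schema.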
The actual metrics are split into a few categories [D6.2]:
- infrastructure level metrics, such as those offered by Amazon CloudWatch, namely readings about VM resources like CPU, memory, disk and networking; this category also covers readings coming from the cloud fabric, such as the VM hypervisor or the overall networking infrastructure;
- container level metrics, especially for language-specific VMs (runtimes), such as those exposed via JMX;
- application level metrics, covering operation throughput, latency or error rates, or other specialized readings;
The employed technologies range from wrapping external monitoring solutions (such as Amazon CloudWatch) and querying available monitoring interfaces (such as the performance metrics of MySQL), to log parsing (such as latency in the case of Apache Tomcat) or even the insertion of probes into deployed code (in the case of Google AppEngine deployments) [D6.3.1]:
- JMX for Java-based applications;
- collectl or SIGAR for Linux-based deployments;
- specialized MySQL data collector;
- specialized Apache Tomcat data collector;
- Flexiant (IaaS) data collector;
- Amazon CloudWatch (IaaS) data collector;
- Amazon Spot Instances cost collector;
- various Amazon EC2 cost collectors;
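As a taste of the log-parsing approach, the sketch below extracts per-request latency from access-log lines. It assumes a log pattern whose last field is the request processing time in milliseconds (in the style of Tomcat's `%D` pattern code); the real Tomcat data collector is of course more involved.

```python
# Hedged sketch of a log-parsing collector: pull request latency out of an
# access-log line. The log format (last field = processing time in ms) is
# an assumption for this example.
import re

LINE_RE = re.compile(
    r'^(?P<host>\S+) .* "(?P<request>[^"]+)" (?P<status>\d{3}) \S+ (?P<millis>\d+)$'
)

def parse_latency(line):
    """Return a reading dict for a matching log line, else None."""
    m = LINE_RE.match(line)
    if not m:
        return None
    return {"request": m.group("request"),
            "status": int(m.group("status")),
            "latency_ms": int(m.group("millis"))}

sample = '10.0.0.5 - - [01/May/2014:12:00:00 +0000] "GET /shop/cart HTTP/1.1" 200 512 37'
print(parse_latency(sample))
```

A real collector would tail the log file continuously and forward each reading, rather than parse a single line.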
Based on the raw monitoring data, a second set of MODAClouds monitoring components comes into play, namely the data analyzers. Their scope ranges from simple data aggregation (such as window-based averages) to statistical analyzers and even forecasting analyzers. [D6.2] From an architectural point of view, the data collectors and analyzers, together with their upstream clients, form a data pipeline tied together by a continuous RDF querying engine, namely C-SPARQL. Two kinds of analyzers are provided:
- deterministic data analyzers (DDA), which heavily leverage the C-SPARQL continuous querying capabilities;
- statistical data analyzers (SDA), which provide estimation or forecasting of monitored metrics (their code is in large part based on Matlab);
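The simplest deterministic analysis mentioned above, a window-based average, can be illustrated in plain Python. In the real DDAs such aggregations are expressed as C-SPARQL continuous queries over RDF streams; this sketch only conveys the idea, and its names are illustrative.

```python
# Hedged sketch: a sliding-window average over a metric stream, the kind
# of aggregation the deterministic data analyzers perform via C-SPARQL.
from collections import deque

class WindowAverage:
    def __init__(self, size):
        # deque with maxlen automatically drops the oldest reading
        self.window = deque(maxlen=size)

    def push(self, value):
        """Add a reading and return the average over the current window."""
        self.window.append(value)
        return sum(self.window) / len(self.window)

avg = WindowAverage(size=3)
for v in [100, 200, 300, 400]:
    print(avg.push(v))
```

Note that C-SPARQL windows are typically time-based (e.g. "last 60 seconds") rather than count-based as in this toy version.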
Besides the challenges raised by the various analysis mechanisms, we could also list the following:
- correlation of readings coming from multiple sources but targeting the same metric (such as the response time of a particular service deployed on multiple VMs);
- efficiently consolidating the readings coming from multiple cloud deployments into a single coherent stream;
- enriching the raw monitoring data, and especially aggregated data, with correlated events;
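The consolidation challenge above amounts to merging several per-deployment streams into one time-ordered stream. Under the (simplifying) assumption that each stream is already sorted by timestamp, a k-way merge suffices; the names below are illustrative.

```python
# Hedged sketch: consolidate per-deployment reading streams, each already
# sorted by timestamp, into a single time-ordered stream via a k-way merge.
import heapq

def consolidate(*streams):
    """Merge pre-sorted (timestamp, source, value) streams into one."""
    return heapq.merge(*streams)

ec2 = [(1, "ec2", 0.4), (3, "ec2", 0.6)]
flexiant = [(2, "flexiant", 0.5), (4, "flexiant", 0.7)]
print(list(consolidate(ec2, flexiant)))
```

Real deployments also have to cope with clock skew and late arrivals between clouds, which this sketch deliberately ignores.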
Looking at the MODAClouds monitoring system from the perspective of reusing it in the context of the SPECS project, the following can be said:
- the monitoring data model should be closely investigated, as its rich semantics (being based on RDF) could be useful to convey SPECS SLA and security related information (for example, one could provide, besides the actual request execution time, the identifier of the user requesting the operation, or even a list of sensitive resources that were accessed while fulfilling the request);
- unfortunately the chosen data model, or rather its serialization format, namely XML, is heavy in terms of processing, transmission, and storage (especially since, in the context of SPECS, we intend to have monitoring granularity at the user / request level, and not small-window aggregates at the component level); however, switching to an alternate serialization format is achievable;
- the actual data collectors could be reused, especially the specialized ones such as VM-related data collectors for providers such as Amazon or Flexiant;
- of special interest are also the statistical data analyzers (SDA), especially the estimation and forecasting ones, as they could provide alerts towards the SPECS enforcement platform before an event actually happens;
- unfortunately the data analyzers, especially the deterministic ones (DDA), are tightly coupled to the chosen RDF continuous querying system, namely C-SPARQL; thus adopting them implies either adopting C-SPARQL, or updating (if possible) the analyzers to use alternative messaging systems; (however the statistical data analyzers are not tied to C-SPARQL;)
- obviously, having all the monitoring code available as open source is a big plus.
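The serialization overhead raised in the list above can be roughly illustrated by rendering the same reading as a hand-written RDF/XML-style fragment and as compact JSON. Both the XML vocabulary and the JSON layout are invented for this comparison; the real MODAClouds serialization differs.

```python
# Hedged sketch: compare the size of one reading serialized as a
# simplified, hand-written RDF/XML fragment vs compact JSON. The
# vocabulary and field names are assumptions for illustration.
import json

reading = {"resource": "vm-1", "metric": "cpu", "value": 0.83, "ts": 1398945600}

xml = (
    '<rdf:Description rdf:about="urn:reading:1" '
    'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:mo="http://example.org/monitoring#">'
    '<mo:resource>vm-1</mo:resource><mo:metric>cpu</mo:metric>'
    '<mo:value>0.83</mo:value><mo:ts>1398945600</mo:ts>'
    '</rdf:Description>'
)
compact = json.dumps(reading, separators=(",", ":"))

# The XML form is noticeably larger, mostly due to namespace declarations
# and repeated element names.
print(len(xml), len(compact))
```

At per-request granularity this overhead multiplies across every reading, which is why a switch of serialization format (while keeping the RDF data model) looks worthwhile.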