* gather metrics data from instances and store it in multiple sinks (e.g., save it to plain CSV files, store it in a Redis data store, or publish it to Apache Kafka)
* create and manage policies to prevent faults (e.g., "if the CPU utilization of an instance is higher than XX %, then clone that instance")
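
To make the two features above more concrete, here is a minimal Python sketch of how metric samples could be fanned out to sinks and checked against a threshold policy. It is not this project's actual API: the names `Sample`, `CsvSink`, `ThresholdPolicy`, and `process` are hypothetical, the 80 % threshold is an arbitrary illustrative value, and only a CSV sink is shown (a Redis or Kafka sink would expose the same `write` method).

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Sample:
    """One metrics reading for one instance (hypothetical shape)."""
    instance_id: str
    cpu_percent: float


class CsvSink:
    """Hypothetical sink that writes one CSV row per sample to a text stream."""

    def __init__(self, stream):
        self._writer = csv.writer(stream)
        self._writer.writerow(["instance_id", "cpu_percent"])

    def write(self, sample):
        self._writer.writerow([sample.instance_id, sample.cpu_percent])


@dataclass
class ThresholdPolicy:
    """Hypothetical policy: request an action when CPU exceeds a threshold."""
    cpu_threshold: float
    action: str = "clone"

    def evaluate(self, sample):
        # Return the action name if the rule fires, otherwise None.
        return self.action if sample.cpu_percent > self.cpu_threshold else None


def process(samples, sinks, policy):
    """Store every sample in every sink and collect the triggered actions."""
    actions = []
    for sample in samples:
        for sink in sinks:
            sink.write(sample)
        action = policy.evaluate(sample)
        if action is not None:
            actions.append((sample.instance_id, action))
    return actions


# Usage: an in-memory buffer stands in for a CSV file on disk.
buffer = io.StringIO()
samples = [Sample("vm-1", 35.0), Sample("vm-2", 92.5)]
actions = process(samples, [CsvSink(buffer)], ThresholdPolicy(cpu_threshold=80.0))
# actions == [("vm-2", "clone")]
```

Keeping sinks behind a single `write` method means the same samples can go to several destinations at once, and the policy only returns the name of the action, so the component that actually clones an instance stays separate.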