Sunday, April 5, 2015

Redis for Service Monitor Visibility

From #redis on FreeNode (made anonymous and more succinct):
Q: Is redis a good choice if I need a solution to order my result and filter it, or should I use mysql for this job?  I'm developing an interface to show Nagios status information and for this I need to order by state (up, down, unknown) or state since and so on.  I played with Memcached, then found Redis, but I'm not sure if it is a good idea to use Redis for a job like this.

An answer led to this being implemented in MySQL instead.  The original poster then added a desire to protect MySQL from 12K inserts (without specifying the period).

12K inserts per minute is barely doable for a single MySQL server when the table is indexed for the queries specified.  Since the period is uncertain, though, a MySQL solution will likely need to be modeled to match the write and read patterns as well, or performance will suffer over time.  Either way, data modeling should be performed.

A: For monitoring data, as with all data in Redis, model the data as it will be accessed.  There is very likely a tool that already provides this monitoring feature set, probably backed by a timeseries data store (see Circonus instead of Nagios).  However, this seems like a good case for understanding how to model data when using Redis.

I'd imagine the following queries: (1) Which host/services are in the state {up|down}?  (2) Which hosts are hosting a given service?  (3) What is the data for a host/service, such as {host, service, os_version, service_version, state, last_monitor_time, last_up_time, last_down_time}?  (4) What are the last n status monitor messages received for a host/service?  In short, (1) is a couple of sets for {all|down} with {up} being derived by set difference, (2) is a set, (3) is a hash, and (4) is a list of finite size, left pushing and right popping on insert.

In longer form (using #{key_part} to indicate templated values):
1) could be modeled as sets "mon:hostservices:all" and "mon:hostservices:down", with "mon:hostservices:up" being achieved by a set difference.  The host/service is either added to or removed from "mon:hostservices:down" when a monitor event is processed.  Items in the sets take the form of the identifier used in the item's data (see 3).
2) is modeled similarly to "...all" in 1: when a monitoring event is processed, the host is added as the item value to the set "mon:hostservices:#{service}".  The interesting bit is how host/services are removed.  A reaper process could be employed to iterate over all host/services, removing the "mon:hostservices:all", "mon:hostservices:down", and "mon:hostservices:#{service}" set entries whose last_monitor_time (see 3) has expired (is longer ago than is considered active).
3) could be modeled as a hash "mon:hostservices:#{host}:#{service}" with the fields set straightforwardly from the monitor event.  The exception is last_up_time and last_down_time: only one of the two is set, chosen by whether the event indicates an up or a down state.
4) could be modeled as a list "mon:hostservices:#{host}:#{service}:events" with finite size, pushing the message on the left-hand side while popping the right-hand side of the list.
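
The four structures above can be sketched as plain functions that translate a monitor event, or a query, into the Redis commands it implies.  This is only a sketch under assumptions: the event dict's shape, the "#{host}:#{service}" identifier, the MAX_EVENTS cap, and all function names are illustrative and not part of the original discussion.  It also uses LPUSH followed by LTRIM to keep the list finite, which has the same effect as the left push + right pop described in 4.

```python
PREFIX = "mon:hostservices"
MAX_EVENTS = 100  # assumed cap on each per-host/service event list


def commands_for_event(event):
    """Translate one monitor event into Redis commands (as tuples).

    `event` is a hypothetical dict with keys: host, service,
    state ("up" or "down"), time (epoch seconds), message.
    """
    host, service, state = event["host"], event["service"], event["state"]
    ident = f"{host}:{service}"        # identifier shared by the sets and the hash key
    item_key = f"{PREFIX}:{ident}"
    is_down = state == "down"
    # (3) only one of last_up_time / last_down_time is written per event
    state_field = "last_down_time" if is_down else "last_up_time"
    return [
        # (1) always in "all"; added to or removed from "down"
        ("SADD", f"{PREFIX}:all", ident),
        ("SADD" if is_down else "SREM", f"{PREFIX}:down", ident),
        # (2) hosts per service; a service literally named "all" or "down"
        # would collide with the sets in (1)
        ("SADD", f"{PREFIX}:{service}", host),
        # (3) multi-field HSET needs Redis 4.0+; older servers use HMSET
        ("HSET", item_key, "host", host, "service", service, "state", state,
         "last_monitor_time", event["time"], state_field, event["time"]),
        # (4) push on the left, then trim the right to keep at most MAX_EVENTS
        ("LPUSH", f"{item_key}:events", event["message"]),
        ("LTRIM", f"{item_key}:events", 0, MAX_EVENTS - 1),
    ]


def query_commands(host, service, n):
    """Read-side commands for queries (1) through (4)."""
    item_key = f"{PREFIX}:{host}:{service}"
    return {
        "up": ("SDIFF", f"{PREFIX}:all", f"{PREFIX}:down"),
        "down": ("SMEMBERS", f"{PREFIX}:down"),
        "hosts_for_service": ("SMEMBERS", f"{PREFIX}:{service}"),
        "data": ("HGETALL", item_key),
        "last_events": ("LRANGE", f"{item_key}:events", 0, n - 1),
    }


def reap_commands(host, service):
    """Set removals a reaper would issue for one expired host/service."""
    ident = f"{host}:{service}"
    return [
        ("SREM", f"{PREFIX}:all", ident),
        ("SREM", f"{PREFIX}:down", ident),
        ("SREM", f"{PREFIX}:{service}", host),
    ]
```

Assuming a redis-py client `r`, each tuple could be sent with `r.execute_command(*cmd)`; in practice the commands for one event would probably be batched in a pipeline or MULTI/EXEC so the sets and the hash change together.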

The above model is intentionally simplified.  Additional queries may also be desired; service availability, for example, could be modeled as timeseries counters for downtime.  Timeseries counters may be implemented using a zset with a partitioned timestamp as the member and the counter value as the score.  The key for the zset in this case would be "mon:hostservices:#{host}:#{service}:counters:downtime".  The downtime counter would be incremented whenever a monitoring event indicates downtime, and the increment would be the monitoring period.  Again, a reaper may be employed to remove data that is older than desired to be retained.
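
A minimal sketch of the downtime counter follows, assuming hourly partitions and epoch-second timestamps; BUCKET_SECONDS and the function names are hypothetical.  Because the score holds the counter value rather than the time, the reaper here cannot select old entries by score and instead filters on the member (the partition timestamp).

```python
BUCKET_SECONDS = 3600  # assumed partition width: one hour


def bucket_start(ts, width=BUCKET_SECONDS):
    """Round an epoch timestamp down to the start of its partition."""
    ts = int(ts)
    return ts - ts % width


def downtime_commands(host, service, ts, period):
    """Increment the downtime counter for the partition containing `ts`.

    `period` is the monitoring period in seconds (the increment amount).
    """
    key = f"mon:hostservices:{host}:{service}:counters:downtime"
    # ZINCRBY key increment member: creates the member if it is missing
    return [("ZINCRBY", key, period, str(bucket_start(ts)))]


def expired_members(members, now, retention):
    """Pick the partition members a reaper would pass to ZREM.

    `members` are partition timestamps as strings, e.g. from ZRANGE key 0 -1.
    """
    cutoff = now - retention
    return [m for m in members if int(m) < cutoff]
```

For example, a down event at ts=7265 with a 60 second monitoring period increments member "7200" by 60, and a reaper would call ZREM on whatever `expired_members` returns for that key.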

While this may be a good case for understanding data modeling in Redis, the cost of developing and operating an in-house solution is very likely higher than the cost of licensing and implementing a monitoring service that provides the desired functionality.  The modeling exercise should still help in selecting such a service, though, so it is a fruitful one.