Riemann is an up-and-coming event stream processor. It is one of those projects you take a look at and think: this is nice, but not ready yet. A year has passed since I made that observation. It is a project followed by high-profile monitoring folks. This very dynamic system has a philosophy similar to Graphite's. Riemann filters, combines and acts on flows of events.
I like elegant, and Riemann is very elegant: it acts as a stream processor like Esper, but is built with a monitoring angle that makes it more intuitive and easier to work with. Riemann listens and acts; it does not have a scheduler like Shinken, Zabbix or OpenNMS. It depends on application libraries, clients, scripts or agents to push data to it.
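To make "filters, combines and acts" concrete, here is a minimal Python analogy. It is not Riemann's API: real Riemann rules are Clojure functions in its config file, and the helpers `where`, `moving_average`, `act` and `pipe` below are hypothetical stand-ins that only mimic the idea. The `Event` fields follow Riemann's standard event fields (host, service, metric, state, time, description, tags, ttl). Nothing runs until events are pushed through, which mirrors Riemann listening rather than polling.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

@dataclass(frozen=True)
class Event:
    # Field names follow Riemann's standard event fields.
    host: str
    service: str
    metric: Optional[float] = None
    state: Optional[str] = None
    time: Optional[float] = None
    description: Optional[str] = None
    tags: Tuple[str, ...] = ()
    ttl: Optional[float] = None

# A "stream" here is just a function from events to events (hypothetical model).
Stream = Callable[[Iterable[Event]], Iterator[Event]]

def where(pred: Callable[[Event], bool]) -> Stream:
    """Filter: keep only events matching pred."""
    def stream(events: Iterable[Event]) -> Iterator[Event]:
        return (e for e in events if pred(e))
    return stream

def moving_average(n: int) -> Stream:
    """Combine: replace each metric with the mean of the last n metrics seen."""
    def stream(events: Iterable[Event]) -> Iterator[Event]:
        window: List[float] = []
        for e in events:
            if e.metric is None:
                yield e  # nothing to average; pass through unchanged
                continue
            window.append(e.metric)
            if len(window) > n:
                window.pop(0)
            yield replace(e, metric=sum(window) / len(window))
    return stream

def act(fn: Callable[[Event], None]) -> Stream:
    """Act: call fn (notify, forward to Graphite, ...) and pass the event along."""
    def stream(events: Iterable[Event]) -> Iterator[Event]:
        for e in events:
            fn(e)
            yield e
    return stream

def pipe(*stages: Stream) -> Stream:
    """Chain stages left to right."""
    def stream(events: Iterable[Event]) -> Iterator[Event]:
        for stage in stages:
            events = stage(events)
        return iter(events)
    return stream

# Usage: average prod CPU over 2 samples and collect anything above 0.5.
alerts: List[Event] = []
incoming = [
    Event("web1", "cpu", 0.2, tags=("prod",)),
    Event("web1", "cpu", 0.4, tags=("prod",)),
    Event("web2", "cpu", 0.9, tags=("dev",)),
    Event("web1", "cpu", 0.9, tags=("prod",)),
]
prod_cpu = pipe(
    where(lambda e: "prod" in e.tags and e.service == "cpu"),
    moving_average(2),
    where(lambda e: e.metric is not None and e.metric > 0.5),
    act(alerts.append),
)
for _ in prod_cpu(incoming):
    pass  # drain the lazy pipeline
```

Only the last web1 sample survives: the two-sample average of 0.4 and 0.9 is 0.65, while the dev host is filtered out before averaging.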
Many of the things I have privately ranted about in Shinken (don't get me wrong, I am a great fan of it) are gracefully handled in Riemann. Notably: how rules are defined and managed, tags as part of the event stream, full use of all CPUs, not having to predefine every possible event in order to monitor it, not interrupting the monitoring flow to load new rules, streamed updates to dashboards, a flexible dependency model and more.
The open-source Swiss Army knife of the future is really shaping up, with Riemann dealing with events, Graphite with time series and Logstash with logs (I am a fan of Splunk, but there is currently no integration between the two).
For monitoring internal Linux/UNIX systems, applications and cloud-type applications, Riemann is IMO a viable proposition as of version 0.2. For general-purpose monitoring, including network devices and Windows, not so much. [Update: Riemann 0.2.2 is a solid proposition, with many core and dashboard improvements.]
What Riemann needs to succeed in general-purpose monitoring
To succeed in general-purpose monitoring, I believe it needs:
- Integration with a solid Windows agent that streams data to it (like statsd or Diamond, but for Windows), with support for things like process monitoring and executing local commands, scripts and more. Update: Riemann-health for Windows seems like a good start, but changing it requires modifying C# code, recompiling and redeploying. :-/
- Integration with a solid SNMP polling system that would use Riemann as its back end. collectd is there, but alternatives would be welcome: something like a stripped-down version of Cricket that just sends data to Riemann and Graphite.
- Professional full-time backing. Kyle Kingsbury, the main author, just accepted a full-time job… Good for him, but so-so for the project. More developers are contributing to the project, which is good. I could envision starting a company to implement Riemann-based systems.
- A dashboard element for displaying streamed text events/logs (on the roadmap, per Kyle K.)
- A dashboard element or alternate dashboard for displaying graphical status data, based on vector-graphic backgrounds with status widgets sitting on top of the image to show changes. Think of NagVis and the plethora of industrial HMIs like Wonderware, OSIsoft PI ProcessBook, etc.
- The ability to drill down into dashboards to view dependency relations (think Big Brother/Xymon dashboard drill-downs), though this may be impossible simply due to the nature of how Riemann works?!
- A security model for the data acquisition and API
- Cookbook-type recipes that show how a typical monitoring workflow might handle calculations, thresholds, rate limiting, dependencies (parent/child), state changes and finally output to Graphite/Request Tracker/email/PagerDuty. Single examples are nice, but how does it *really* come together, and how do you maintain it?
- Somewhere to store and retrieve textual events
- Role based controls for the dashboard
- An HA strategy
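As an illustration of the kind of cookbook recipe meant above, here is a minimal Python sketch chaining thresholds, state-change detection, rate limiting and an output hook. It is an analogy, not Riemann code: a real recipe would use Riemann's own Clojure streams, and the names `THRESHOLDS`, `classify`, `StateChanges`, `Throttle` and `workflow`, the threshold values and the fixed-window throttle are all hypothetical choices. Dependencies and Graphite output are left out to keep it short.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Iterable, List, Optional, Tuple

@dataclass(frozen=True)
class Event:
    host: str
    service: str
    metric: float
    time: float
    state: Optional[str] = None

# Hypothetical per-service thresholds: (warning, critical).
THRESHOLDS: Dict[str, Tuple[float, float]] = {"disk_used": (0.80, 0.95)}

def classify(e: Event) -> Event:
    """Thresholds: derive ok/warning/critical from the metric."""
    warn, crit = THRESHOLDS.get(e.service, (float("inf"), float("inf")))
    if e.metric >= crit:
        state = "critical"
    elif e.metric >= warn:
        state = "warning"
    else:
        state = "ok"
    return replace(e, state=state)

class StateChanges:
    """State changes: true only when a (host, service) state differs from the
    last one seen. An unseen service is assumed to start out "ok"."""
    def __init__(self) -> None:
        self.last: Dict[Tuple[str, str], Optional[str]] = {}

    def __call__(self, e: Event) -> bool:
        key = (e.host, e.service)
        changed = self.last.get(key, "ok") != e.state
        self.last[key] = e.state
        return changed

class Throttle:
    """Rate limiting: allow at most n events per fixed window of `interval` seconds."""
    def __init__(self, n: int, interval: float) -> None:
        self.n, self.interval = n, interval
        self.window_start: Optional[float] = None
        self.count = 0

    def __call__(self, e: Event) -> bool:
        if self.window_start is None or e.time - self.window_start >= self.interval:
            self.window_start, self.count = e.time, 0  # start a new window
        if self.count < self.n:
            self.count += 1
            return True
        return False

def workflow(events: Iterable[Event], notify: Callable[[Event], None]) -> None:
    changes, throttle = StateChanges(), Throttle(n=2, interval=60.0)
    for e in events:
        e = classify(e)
        if changes(e) and throttle(e):
            notify(e)  # stand-in for email / PagerDuty / Request Tracker output

# Usage
sent: List[Event] = []
workflow([
    Event("db1", "disk_used", 0.50, time=0),   # ok, same as assumed initial state
    Event("db1", "disk_used", 0.85, time=10),  # -> warning, notified
    Event("db1", "disk_used", 0.86, time=20),  # still warning, suppressed
    Event("db1", "disk_used", 0.97, time=30),  # -> critical, notified
    Event("db1", "disk_used", 0.50, time=40),  # -> ok, but throttled
], sent.append)
```

Only the warning at t=10 and the critical at t=30 are sent. The recovery at t=40 is swallowed by the throttle, and because the state tracker has already recorded "ok", it is never re-sent. That is exactly the kind of interaction a real recipe has to decide on (for example by exempting recoveries from throttling), and why worked end-to-end examples would help.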
Side note: Riemann is a project with immense potential for lightweight industrial controls monitoring. A listening module for Modbus or even OPC UA would really light a fire under the archaic beasts that industrial monitoring/controls suppliers are using. OK, getting back to Windows.
Windows sources of events
Windows operating system status and events are typically monitored via these methods:
- Windows event log
  - Can be handled via agents like NSClient++, OSSEC, SNARE or custom agents for SIEM/logging software, which would then need to forward the data to Riemann in a second step
- Windows PDH API (performance monitor counters)
  - Handled by local agents (riemann-health-windows) or via WMI, though WMI turns it into polling-based monitoring
- Custom event logs
  - Can be handled via agents like NSClient++, OSSEC, another SNARE agent or custom agents for SIEM/logging software, which would then need to forward the data to Riemann in a second step
- External monitoring of services over the network
  - Centralized polling from collectd could do the job, but alternatives would be nice
  - Shinken/Munin/other as the active poller, streaming events to Riemann for dashboarding??
- Scripted monitoring via a local agent
  - Local agent, like ???
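Whatever the Windows source, Riemann only receives what is pushed to it, so each option above boils down to a local loop that samples and sends. The Python sketch below shows that shape under loud assumptions: `run_agent`, `make_event` and the sampler functions are hypothetical, the samplers stand in for PDH counters, WMI queries or scripts, and `send` is a placeholder where a real agent would hand the event to a Riemann client library (Riemann's wire protocol is Protocol Buffers over TCP or UDP). The doubled `ttl` and the tags are illustrative choices.

```python
import socket
import time
from typing import Any, Callable, Dict, List, Optional

Sampler = Callable[[], float]
EventDict = Dict[str, Any]

def make_event(service: str, metric: Optional[float], interval: float,
               host: Optional[str] = None) -> EventDict:
    """Wrap one sample in a dict that uses Riemann's standard event field names."""
    event: EventDict = {
        "host": host or socket.gethostname(),
        "service": service,
        "time": int(time.time()),
        # A ttl longer than the push interval: if the agent dies, its events
        # expire instead of silently showing stale values.
        "ttl": interval * 2,
        "tags": ["windows", "agent"],  # hypothetical tags
    }
    if metric is not None:
        event["metric"] = metric
    return event

def run_agent(samplers: Dict[str, Sampler], send: Callable[[EventDict], None],
              interval: float = 10.0, rounds: Optional[int] = None,
              sleep: Callable[[float], None] = time.sleep) -> None:
    """Push loop: sample each counter, send one event per sample, wait, repeat.
    rounds=None runs forever; a number is handy for testing."""
    done = 0
    while rounds is None or done < rounds:
        for service, sample in samplers.items():
            try:
                event = make_event(service, sample(), interval)
            except Exception as exc:  # a failing check becomes a critical event
                event = make_event(service, None, interval)
                event["state"] = "critical"
                event["description"] = f"sample failed: {exc}"
            send(event)
        done += 1
        if rounds is None or done < rounds:
            sleep(interval)

# Usage with stand-in samplers and a list instead of a real Riemann client.
sent: List[EventDict] = []
counters: Dict[str, Sampler] = {
    "cpu_percent": lambda: 42.0,
    "disk_queue": lambda: 1 / 0,  # simulates a broken counter
}
run_agent(counters, sent.append, interval=5.0, rounds=2, sleep=lambda s: None)
```

Turning a failed sample into a critical event, rather than letting the agent crash, keeps the push stream alive; the `ttl` covers the case where the agent itself stops.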
A great project needs a leader and contributors
This project has great potential, and I sincerely hope that the main developer keeps working on it and that companies continue to contribute (money, resources, offers of development help, etc.) to his efforts.