Removing spikes from RRD databases

RRDs are fixed-size databases for storing time series data. They collect the information given to them and normalize it to permit trending over long periods of time.

Spurious data may inadvertently make its way into a database. It can be dealt with in the following ways:

  • Set the rrd-min and/or rrd-max variable(s) for each datasource when creating new RRD databases
  • Use rrdtool dump to export the RRD database to XML format, edit out the spurious values and import the data back into the RRD database
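  • Use rrdtool tune to apply minimum and/or maximum bounds (the equivalent of rrd-min and rrd-max) to an existing RRD database. New values outside the defined bounds will be stored as NaN. Values already in the database are only checked against the bounds when it is dumped and restored with the range check option (rrdtool restore -r).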
rrdtool tune <file> --maximum <ds>:<value>
  • Use the Perl script removespikes.pl. The command below removes all spikes within 1% of the datapoints in the RRD file. If 1% does not fix them, raise the percentage until all the spikes are removed. This may remove some valid values in the process, so use it with caution!
perl removespikes.pl -l 1 fastrouter_ethernet0_1.rrd
  • Use rrd_editor, a cross-platform (Win32 or Perl/Tk) tool, to seek and remove spikes in an RRD. I have not used the tool, but according to comments it works as advertised. It also lets you easily add or remove RRAs and datasources from an RRD, which is a golden feature for many of us.
  • Use killspike2, an RRD spike removal script distributed as part of the Cricket network management system. I have not used the script, but it is known to work.
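
The dump, edit and import route can also be scripted. The following is a minimal sketch rather than a tested tool: clip_spikes and despike_rrd are hypothetical helper names, and it assumes a single upper limit suits every datasource and every RRA in the file, which is often not the case for RRDs with mixed datasources.

```shell
#!/bin/sh

# clip_spikes LIMIT
# Read an rrdtool XML dump on stdin and write it to stdout,
# replacing every <v> value greater than LIMIT with NaN.
clip_spikes() {
    awk -v limit="$1" '
    {
        out = ""
        rest = $0
        # Walk through each <v>...</v> cell on the line.
        while (match(rest, /<v>[^<]*<\/v>/)) {
            cell = substr(rest, RSTART + 3, RLENGTH - 7)
            val = cell
            gsub(/[ \t]/, "", val)
            # Only clip cells that look numeric; NaN cells are left alone.
            if (val ~ /^[-+]?[0-9]/ && val + 0 > limit + 0)
                cell = "NaN"
            out = out substr(rest, 1, RSTART - 1) "<v>" cell "</v>"
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }'
}

# despike_rrd FILE LIMIT
# Dump FILE to XML, clip it, and restore the result to FILE.clean.
# The original RRD is left untouched so the result can be checked first.
despike_rrd() {
    rrdtool dump "$1" > "$1.xml" &&
    clip_spikes "$2" < "$1.xml" > "$1.clean.xml" &&
    rrdtool restore "$1.clean.xml" "$1.clean"
}
```

With these functions loaded, a call such as despike_rrd fastrouter_ethernet0_1.rrd 12500000 would write the cleaned copy to fastrouter_ethernet0_1.rrd.clean. Graph the copy before replacing the original with it.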

As with any solution, automation and prevention are the keys to a fluid system.

genDevConfig will automatically set rrd-min and rrd-max values for all config-tree targets it creates for Cricket.
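
For RRDs created by hand rather than through Cricket, the equivalent prevention is to put the bounds in the DS definition given to rrdtool create. A minimal sketch, assuming a hypothetical datasource named traffic_in on a 100 Mbit/s link (at most 12500000 bytes per second); ds_with_bounds and create_bounded_rrd are illustrative helper names, and the step, heartbeat and RRA values are placeholders.

```shell
#!/bin/sh

# ds_with_bounds NAME TYPE HEARTBEAT MIN MAX
# Print a DS definition in the form rrdtool create expects:
# DS:name:type:heartbeat:min:max
ds_with_bounds() {
    printf 'DS:%s:%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$5"
}

# create_bounded_rrd FILE
# Create an RRD whose only datasource rejects values outside 0..12500000.
# Step, heartbeat and RRA settings are placeholders for illustration.
create_bounded_rrd() {
    rrdtool create "$1" --step 300 \
        "$(ds_with_bounds traffic_in COUNTER 600 0 12500000)" \
        RRA:AVERAGE:0.5:1:600
}
```

Any update that falls outside the bounds is then stored as unknown instead of showing up as a spike.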