DNS event trending as an NSM component – Acktomic.com

Making use of time series trending for network security monitoring. A short history with examples.

Time series trending tools such as Cricket, Cacti and Torrus focus on performance and availability.

Operational teams use these day in, day out; the tools are there to organize and display reams of data in a quick and painless way. Time series trending is also a source of indicators.

Strengths of time series trending

detailed baseline
displays seasonality
highlights anomalies
illustrates subtle changes
provisioning planning

Strengths of time series trending using RRD databases

fast visualization – anyone who has suffered through SQL-based trending tools can attest
low maintenance
graphing flexibility
capabilities can be extended
no vendor lock-in or annual maintenance fees

Making use of these strengths to identify threats is a secret recipe. Yeah, reaaally.

Here are some DNS-based trends that can help quantify, understand, and defend against or clean up after extrusion attempts or malware/botnet command and control communications.

Trending the number of hits against security related blacklisted entries
Trending the number of hits against .cn, .ru, etc. specific domains
Trending the number of hits against honeypot entries
Trending the number of hits by source IP
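As a rough illustration of the four trends above, here is a minimal Python sketch that turns DNS query events into per-interval counters, ready to be fed to an RRD or any other time series store. Everything named here is an assumption made for the example: the watch lists, the (timestamp, source IP, queried name) event format, and the five-minute bucket size. Parsing your resolver's query log into that format is left out.

```python
from collections import Counter

# Hypothetical watch lists; real ones would come from threat feeds or local policy.
BLACKLIST = {"bad.example.com"}
HONEYPOTS = {"trap.example.net"}
WATCHED_TLDS = {"cn", "ru"}
BUCKET_SECONDS = 300  # five-minute buckets (assumed; match your own polling step)


def classify(qname):
    """Return the trend categories a queried name falls into."""
    name = qname.rstrip(".").lower()
    categories = []
    if name in BLACKLIST:
        categories.append("blacklist")
    if name in HONEYPOTS:
        categories.append("honeypot")
    tld = name.rsplit(".", 1)[-1]
    if tld in WATCHED_TLDS:
        categories.append("tld:" + tld)
    return categories


def trend(events):
    """Count hits per (bucket start, series name).

    `events` is an iterable of (unix_timestamp, source_ip, queried_name).
    Queries that match no watch list are ignored.
    """
    counts = Counter()
    for timestamp, source_ip, qname in events:
        bucket = int(timestamp) - int(timestamp) % BUCKET_SECONDS
        categories = classify(qname)
        for category in categories:
            counts[(bucket, category)] += 1
        if categories:
            # One hit per suspicious query, whatever the number of lists matched.
            counts[(bucket, "src:" + source_ip)] += 1
    return counts
```

Each key in the result is one data point for one graph: the bucket start time and the series it belongs to. Matching on exact names keeps the sketch short; matching whole blacklisted domains and their subdomains would need a suffix comparison instead.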

The value of the data can be extended by doing a bit of munging on the interesting bits.

Google-fu, to automate searches of the IP addresses or DNS names and see if they are related to specific malware or botnets. This will help prioritize the cleanup efforts.
Anomaly detection, such as Holt-Winters smoothing with confidence intervals, to identify anomalies in seasonality. For the non-initiated, that means sudden traffic pattern changes, such as traffic dropping to zero during normal hours or high usage during off-hours: things that may not trigger a threshold, but that are out of seasonality.
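To make the Holt-Winters idea concrete, here is a minimal Python sketch of additive Holt-Winters smoothing with a confidence band built from a smoothed deviation per seasonal slot. It is an illustration under stated assumptions, not the implementation of any particular tool: the function name, the parameter defaults, the seeding from the first two periods and the `floor` minimum band width are all choices made for this example. RRDtool has its own aberrant behaviour detection (the HWPREDICT family of archives), which this sketch does not use.

```python
def holt_winters_anomalies(series, period, alpha=0.5, beta=0.125, gamma=0.25,
                           width=3.0, floor=1.0):
    """Return the indexes of points that fall outside the confidence band.

    series: evenly spaced samples, e.g. DNS hits per five-minute bucket
    period: number of samples in one season (288 for a day of 5-minute buckets)
    width:  band half-width, in multiples of the smoothed deviation
    floor:  minimum band half-width, so a perfectly regular history does not
            produce a zero-width band (an addition made for this sketch)
    """
    if len(series) < 2 * period:
        raise ValueError("need at least two full periods of data")

    # Seed level, trend and seasonal offsets from the first two periods.
    first, second = series[:period], series[period:2 * period]
    mean_first = sum(first) / period
    mean_second = sum(second) / period
    level = mean_first
    trend = (mean_second - mean_first) / period
    seasonal = [value - mean_first for value in first]
    deviation = [0.0] * period

    anomalies = []
    for i in range(period, len(series)):
        slot = i % period
        value = series[i]
        predicted = level + trend + seasonal[slot]
        error = abs(value - predicted)

        # The second period only trains the model; judging starts after it.
        if i >= 2 * period and error > max(width * deviation[slot], floor):
            anomalies.append(i)

        # Standard additive Holt-Winters updates, plus the deviation update.
        previous_level = level
        level = alpha * (value - seasonal[slot]) + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonal[slot] = gamma * (value - level) + (1 - gamma) * seasonal[slot]
        deviation[slot] = gamma * error + (1 - gamma) * deviation[slot]
    return anomalies
```

Feeding it a series that repeats cleanly returns an empty list; a burst or a drop to zero in the middle of an otherwise regular pattern is flagged at the index where it occurs. Because the model keeps learning from every point, the samples right after a large spike may be flagged as well while the smoothing absorbs it.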

The information could also be reported in dashboards on a Splunk server for operational teams or management. Trending data over the long term also unshackles the administrator from having to define hard time-of-day limits for what is normal and what is not. You can still define hard limits that generate security events, but analysis is made much easier by seeing the whole time series.

Once identified, the vector of the outbreak needs to be cleaned before nasty malware can move in (think Zeus variant).