Last Updated: 2013-08-30 18:22:24 UTC
by Kevin Liston (Version: 1)
Two weeks ago I rambled a bit about trying to dig a signal out of the noise of SSH scans reported to DShield (https://isc.sans.edu/diary/Filtering+Signal+From+Noise/16385). I tried to build a simple model to predict the next 14 days' worth of SSH scans and promised that we'd check back in to see how wrong I was.
Looks like I was pretty wrong.
I had built and trained the model to do a tolerable job of describing past performance, and I wondered whether, if we let it run, it would do any better at predicting future behavior than simply taking the recent average and projecting it out linearly. I fed the numbers into the black box and clicked "publish" on the article before I took a really close look at what it was spitting out. There was a spike in the 48 hours between tuning the model and publishing, and its impact on the trend was a bit... severe.
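To show why a late spike can hurt a trend-following model far more than a flat average, here is a minimal Python sketch. It assumes Holt's linear (double) exponential smoothing with hypothetical parameters `alpha=0.5` and `beta=0.5`. The diary does not say which exponential smoothing variant or settings were used. The daily counts (28 days at 450, then 900 and 1200) are invented for illustration and are not DShield data.

```python
# Hypothetical daily SSH scan source counts: flat, then a spike in the last
# two days. Invented for illustration only; NOT the DShield data.
daily = [450] * 28 + [900, 1200]


def average_projection(daily_counts, window, horizon=14):
    """Take the mean of the last `window` days and project it flat over `horizon` days."""
    recent = daily_counts[-window:]
    return sum(recent) * horizon / len(recent)


def holt_forecast(daily_counts, alpha=0.5, beta=0.5, horizon=14):
    """Holt's linear exponential smoothing (an assumed variant), summed over the horizon.

    `level` tracks the smoothed daily value and `trend` the smoothed
    day-over-day change. alpha and beta are hypothetical settings.
    """
    level = daily_counts[0]
    trend = daily_counts[1] - daily_counts[0]
    for value in daily_counts[1:]:
        prev_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # The forecast for day h ahead is level + h * trend; sum the next `horizon` days.
    return sum(level + h * trend for h in range(1, horizon + 1))


print("7-day average projection:", average_projection(daily, 7))
print("30-day average projection:", average_projection(daily, 30))
print("Holt exponential smoothing:", holt_forecast(daily))
```

On this made-up series the 7-day and 30-day projections come out to 8700 and 6860. The Holt forecast is about 36553, because the two-day spike is read as a steep upward trend and then extrapolated across all 14 days.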
None of the approaches did a great job of predicting the actual total of 6423, although it's amazing how badly the exponential smoothing model did. I have had really good results using that method with other data, so I still encourage you to give it a try on other problems.
|Method|SSH scan source total for 14 days|Absolute error vs. actual 6423 (%)|
|---|---|---|
|Exponential Smoothing|19963|13540 (211%)|
|7-day average projection|7197|774 (12%)|
|30-day average projection|7054|631 (10%)|
|MCMC estimate|5390|1033 (16%)|
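The error column can be reproduced from the table itself. Each entry is the absolute difference between a method's forecast and the actual total of 6423, followed by that difference as a percentage of 6423. Here is a short Python check; the variable and function names are just for illustration.

```python
# Actual 14-day total and each method's forecast, taken from the table.
ACTUAL = 6423
FORECASTS = {
    "Exponential Smoothing": 19963,
    "7-day average projection": 7197,
    "30-day average projection": 7054,
    "MCMC estimate": 5390,
}


def forecast_error(forecast, actual=ACTUAL):
    """Return (absolute error, error as a percentage of the actual total)."""
    abs_err = abs(forecast - actual)
    return abs_err, 100.0 * abs_err / actual


for method, forecast in FORECASTS.items():
    abs_err, pct = forecast_error(forecast)
    print(f"{method}: {abs_err} ({pct:.1f}%)")
```

This prints 13540 (210.8%), 774 (12.1%), 631 (9.8%) and 1033 (16.1%). Rounded to whole percentages, those are 211%, 12%, 10% and 16%.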