Statistical Relational Anomalies

Some of my best ideas start to form when I'm not actually thinking about anything in particular. Oddly enough, these times typically include when I'm in the bathroom or at the gym.

Today was one of those days. Much like the Internet, my brain is a series of tubes. In its default mode it doesn't do a whole hell of a lot, but given the right stimulation, those tubes start interconnecting and stuff gets done. Kinda like that Sesame Street game I used to play back in the early 90s on either HP-UX, Domain/OS or some PC clone. It started today when I was continuing our evaluation of Websense's Security Suite -- they have a report called "Outliers". This basically shows hosts whose activity of a particular type doesn't roughly match the same activity for other similar hosts. I thought this was interesting, but that the results must not be considered in a vacuum. Later on at the gym, I was taking off my shoes and realized that the two shoes I had been wearing all day, those that I had repeatedly looked at at various points throughout the day, were in fact two different shoes. It just so happens that I own two pairs of black skate shoes that look similar.

These two seemingly unrelated thoughts got me thinking. Traffic anomalies are very interesting. I remember years ago when SPADE was very popular within the snort community. Sadly, this product went completely down the tubes, and now only exists at archive.org in various forms. While it required a lot of tuning and had some inherent level of false positives, it always yielded interesting data over time. The thing about SPADE, and any products like it to date, is that they always report anomalies in a single given characteristic: bandwidth deviations, new source and destination host pairs, previously unseen server ports, etc. While these are all well and good, and oftentimes tell you something you otherwise wouldn't have known, they are still only ever looking at things in one dimension.

This got me thinking. What sorts of things could you discover by examining traffic patterns as they relate to other characteristics of traffic? Specifically, what would anomalies tell you?

The best example that I could quickly come up with is DNS requests in relation to requests for the results of those DNS requests. As an example, the normal traffic pattern for HTTP requests is to first do a lookup for the DNS A record of the host you are interested in, which will be followed shortly thereafter by a request to port 80/tcp on one of the IP addresses returned from the previous query. For basically any type of traffic, you can say that the ratio of DNS requests for a given A record to the number of requests to the results of that A record is a function of DNS TTL, client-side caching, and phase of the moon. This basically means that you really can't make any assumptions about hosts in the middle of the traffic spectrum, but those on the extreme ends are almost certainly of interest.

Use HTTP as an example. Hosts whose DNS request to HTTP request ratio for a given A record is roughly the same as that of other hosts are probably not of interest. Hosts that have a very high DNS request to HTTP request ratio could indicate:


  • Faulty DNS setup, or broken caching

  • Host is actually a monitoring system of sorts (especially if you consider PTR lookups as relative to traffic to that IP)

Hosts that have a very low DNS request to HTTP request ratio could indicate:


  • Hardcoding of A records in /etc/hosts or equivalent

  • Use of a rogue DNS server

  • Accessing of a host by IP address, which, IMO, is always suspicious
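As a rough sketch of the idea above, the following Python counts A record lookups and HTTP requests per client and target, then flags only the extreme ends of the ratio. Everything named here is an assumption for illustration: the input is taken to be pre-joined lists of (client, target) pairs (so HTTP requests have already been mapped back to the name that was looked up, or left as a bare IP if no lookup was seen), and the `low` and `high` thresholds are arbitrary values that would need tuning per environment.

```python
from collections import Counter


def dns_http_ratios(dns_queries, http_requests):
    """Return {(client, target): ratio} of A record lookups to HTTP requests.

    Both inputs are iterables of hypothetical (client_ip, target) pairs.
    """
    dns = Counter(dns_queries)
    http = Counter(http_requests)
    ratios = {}
    for key in dns.keys() | http.keys():
        lookups, requests = dns[key], http[key]
        # Lookups with no HTTP traffic at all count as an infinitely high ratio.
        ratios[key] = float("inf") if requests == 0 else lookups / requests
    return ratios


def hosts_of_interest(ratios, low=0.01, high=10.0):
    """Flag only the extreme ends; the middle of the spectrum is ignored."""
    flagged = {}
    for key, ratio in ratios.items():
        if ratio <= low:
            flagged[key] = "low"
        elif ratio >= high:
            flagged[key] = "high"
    return flagged


# Made-up sample data: one chatty resolver client, one normal client,
# and one client that talks to a bare IP without ever doing a lookup.
dns = ([("10.0.0.5", "www.example.com")] * 50
       + [("10.0.0.6", "www.example.com")] * 2)
http = ([("10.0.0.5", "www.example.com")] * 2
        + [("10.0.0.6", "www.example.com")] * 4
        + [("10.0.0.7", "192.0.2.10")] * 20)

flagged = hosts_of_interest(dns_http_ratios(dns, http))
for key, label in sorted(flagged.items()):
    print(key, label)
```

In this sample, 10.0.0.5 lands on the high end (broken caching or a monitoring box, per the list above), 10.0.0.7 lands on the low end (access by IP), and 10.0.0.6 sits in the middle and is left alone.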

There is a body of knowledge out there already that touches on this subject to some extent. For example, it is fairly well known that it is a bad idea to configure your monitoring system (IDS, syslog monitoring, event correlation, etc) to do automatic or even semi-automatic A or PTR record lookups. The primary reason for this is performance, but it could also be a good tip to an attacker that some sort of analysis is going on. Say, for example, that I attack example.com from spoofed.org, which currently has a forward DNS (A record) entry of 66.92.51.226. If example.com has some sort of monitoring system that in some way does a reverse DNS (PTR) lookup against 66.92.51.226, or a forward lookup against spoofed.org, that is a good indication to me that someone may be on to me.

Thinking about other statistical relational anomalies that could prove interesting, I came up with:


  • Inbound SSH connections relative to seemingly associated outbound traffic in a given timeframe. A high ratio could typically indicate a bastion/jump host or a shell server of sorts, but those that have a lower ratio could indicate a compromised SSH server.

  • Inbound HTTP requests relative to outbound SMTP connections. In a timeframe where the ratio is highly skewed, you could say that all is normal. After all, your average HTTP server should rarely initiate outbound SMTP. However, if the ratio approaches 1:1 during a given timeframe, especially when the number of inbound HTTP clients is small compared to the number of SMTP destinations, you should start to examine what services that HTTP server is exposing -- I'd suspect a rogue formail.pl or a busted 'email a friend / contact us' script.
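The HTTP-to-SMTP case could be sketched along these lines. This is only an illustration under assumptions: flows are taken to be hypothetical (timestamp, src_ip, dst_ip, dst_port) tuples, and the window size, `min_smtp` floor and `max_ratio` cutoff are invented knobs, not values from any real tool.

```python
from collections import defaultdict


def smtp_relay_suspects(flows, server, window=300, min_smtp=5, max_ratio=2.0):
    """Return [(window_start, ratio)] for windows where inbound HTTP to
    `server` and outbound SMTP from it are close to 1:1, and few HTTP
    clients fan out to many SMTP destinations."""
    buckets = defaultdict(
        lambda: {"http_in": 0, "smtp_out": 0, "clients": set(), "dests": set()}
    )
    for ts, src, dst, port in flows:
        bucket = buckets[int(ts // window)]
        if dst == server and port == 80:
            bucket["http_in"] += 1
            bucket["clients"].add(src)
        elif src == server and port == 25:
            bucket["smtp_out"] += 1
            bucket["dests"].add(dst)

    suspects = []
    for index in sorted(buckets):
        bucket = buckets[index]
        # A web server that sends little or no mail is the normal, skewed case.
        if bucket["smtp_out"] < min_smtp:
            continue
        ratio = bucket["http_in"] / bucket["smtp_out"]
        if ratio <= max_ratio and len(bucket["clients"]) < len(bucket["dests"]):
            suspects.append((index * window, ratio))
    return suspects


# Made-up sample: a quiet first window, then one client driving ten mails.
server = "192.0.2.80"
flows = [(i, f"198.51.100.{i % 50}", server, 80) for i in range(100)]
flows += [(300 + i, "203.0.113.9", server, 80) for i in range(10)]
flows += [(300 + i, server, f"203.0.113.{100 + i}", 25) for i in range(10)]

print(smtp_relay_suspects(flows, server))
```

The same bucketing approach would probably work for the SSH case by swapping the port tests, though what counts as "related" outbound traffic is a harder question than this sketch tries to answer.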

These sorts of questions could easily be answered given the right front-end or database to something like sancp. There are probably a number of canned reports that could be run and give useful results regardless of the environment, but otherwise, in the right hands, it could act as a tool for finding "hosts of interest".

Comments (1)

Pete:

You've stumbled onto sequence analysis, which to be honest I don't think enough people have looked into. Man, that would make a good talk.

I've got a few homegrown scripts that count the number of times an A record is requested and then not followed by a connection (UDP, TCP, or ICMP) from the host that made the A record request. It detects a lot of the Dan Kaminsky style DNS tunnels, even when you add sleeps in to slow the packet rate down.

Some of my thoughts on statistical detection
