Sunday, November 25, 2007

Event Correlation on a Budget

Log management and its wiser, older brother, event correlation, are processes that anyone in the security space is likely very familiar with. I've been dealing with them since day 0, but in the past year or more things have taken a more serious turn. Previously, logs were used as a last resort and the people capable of wrangling them were much revered. Now there are plenty of standards, books, products and companies that attempt to make sense of your logs, and for good reason -- they are important. Logs will alert you to situations that most traditional monitoring systems are blind to, and proper log management is a prerequisite if legal action ever becomes necessary. There is interesting shit in logs. Really. Look some time.

Let's be honest, though. Even wrangling the logs from your little desktop can be a complicated process -- it will generate hundreds of log entries per day. A relatively unused server will generate upwards of a megabyte of logs per day. An active web, mail or shell server? Millions of entries, several gigabytes of logs in a single day. Now combine the logs from across your entire organization. Information overload.

There are plenty of products you can drop a pretty penny on that will, without a doubt, take you leaps and bounds beyond where you very likely sit right now. Some organizations have no log management. Some have centralized logging, but very few have anything further. If you are lucky, some hotshot has a script that tails the logs and looks for very specific error messages, which will save your tail.

I am a firm believer in the school of thought that before you go out and drop any significant cash on a security product, you have to really get your hands dirty. For me, that often means seeing what free solutions currently exist or, worst case, rolling your own.

In terms of free (as in beer) solutions, swatch, logwatch, SEC and OSSEC are among the top choices, the latter two being the most powerful. Swatch suffers from lacking any real correlation abilities. Logwatch has some of these capabilities but suffers from essentially using a pile of large, horrifically ugly perl scripts to parse the logs. I've written many ugly perl scripts, and I fear for anyone who is not perl savvy and has to maintain a logwatch setup. SEC and OSSEC have very similar capabilities, though OSSEC is more targeted towards host-based intrusion detection (HIDS) by way of correlating security events within logs. It is a great approach; it is just not the solution I decided to write about.

What follows is an abridged example of how I used SEC to get some much needed event correlation up and running in an environment that generates anywhere between 500M and 50G of logs per day, depending on how you look at things and who you ask :). I say "abridged" because this ruleset is far from complete. In fact, if you take it as-is and set it loose on your logs, you will get a metric crapload of emails about things you probably already know about or otherwise don't care about. The reason is twofold. One, I don't want to give away all of my secrets. Two, I cannot tell you which log messages you should or should not care about. That is for you to learn and decide for yourself.

Save the snippet below as your SEC configuration file and then point SEC at some of the logs you are concerned with. It will give you a base from which you can:

  • Explicitly ignore certain messages
  • Alert on certain messages
  • Do minimal correlation on a per-host, per-service basis
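
To actually run SEC with this configuration against a live log, an invocation along these lines should do the trick (the exact option syntax may vary slightly between SEC versions, and the paths here are just examples):

sec --conf=/etc/sec/sec.conf --input=/var/log/auth.log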

Good luck and enjoy!

# ignore events that SEC generates internally
type=suppress
ptype=RegExp
pattern=^SEC_INTERNAL
# ignore syslog-ng "MARK"s
type=suppress
ptype=RegExp
pattern=^.{14,15}\s+(\S+)\s+-- MARK --
# ignore cron,ssh session open/close
# Nov 23 00:17:01 dirtbag CRON[26568]: pam_unix(cron:session): session opened for user root by (uid=0)
# Nov 23 00:17:01 dirtbag CRON[26568]: pam_unix(cron:session): session closed for user root
# Nov 25 16:19:30 dirtbag sshd[13072]: pam_unix(ssh:session): session opened for user warchild by (uid=0)
# Nov 25 16:19:30 dirtbag sshd[13072]: pam_unix(ssh:session): session closed for user warchild
type=suppress
ptype=RegExp
pattern=^.{14,15}\s+(\S+)\s+(cron|CRON|sshd|SSHD)\[\d+\]: .*session (opened|closed) .*
# alert on root ssh logins
type=single
ptype=RegExp
pattern=^.{14,15}\s+(\S+)\s+(sshd|SSHD)\[\d+\]: Accepted (password|publickey) for root from (\S+) .*
desc=$0
action=pipe '$0' /usr/bin/mail -s '[SEC] root $3 from $4 on $1' jhart
# ignore ssh passwd/pubkey success
#
# Nov 24 17:09:22 dirtbag sshd[8819]: Accepted password for warchild from 192.168.0.6 port 53686 ssh2
# Nov 25 16:19:30 dirtbag sshd[13070]: Accepted publickey for warchild from 192.168.0.100 port 57051 ssh2
type=suppress
ptype=RegExp
pattern=^.{14,15}\s+(\S+)\s+(sshd|SSHD)\[\d+\]: Accepted (password|publickey) .*
#############################################################################
# pile up all the su, sudo and ssh messages, alert when we see an error
# stock-pile all messages on a per-pid basis...
# create a session on the first one only, and pass it on
type=single
ptype=RegExp
continue=TakeNext
pattern=^.{14,15}\s+(\S+)\s+(sshd|sudo|su|unix_chkpwd)\S*\[([0-9]*)\]:.*
desc=$0
context=!$2_SESSION_$1_$3
action=create $2_SESSION_$1_$3 10;
# add it to the context
type=single
ptype=RegExp
continue=TakeNext
pattern=^.{14,15}\s+(\S+)\s+(sshd|sudo|su|unix_chkpwd)\S*\[([0-9]*)\]:.*
desc=$0
action=add $2_SESSION_$1_$3 $0;
# check for failures.  if we catch one, set the context timeout to 15 seconds from now,
# and set the timeout action to report everything from this PID
type=single
ptype=RegExp
pattern=^.{14,15}\s+(\S+)\s+(sshd|sudo|su|unix_chkpwd)\S*\[([0-9]*)\]:.*fail(ed|ure).*
desc=$0
action=set $2_SESSION_$1_$3 15 (report $2_SESSION_$1_$3 /usr/bin/mail -s '[SEC] $2 Failure on $1' jhart)
#
##########
##########
# These two rules lump together otherwise uncaught messages on a per-host,
# per-message type basis.  The first rule creates the context which is set
# to expire and email its contents after 30 seconds.  The second rule simply
# catches all of the messages that match a given pattern and appropriately
# adds them to the context.
#
type=Single
ptype=RegExp
pattern=^.{14,15}\s+(\S+)\s+(\S+):.*$
context=!perhost_$1_$2
continue=TakeNext
desc=perhost catchall starter for $1 $2
action=create perhost_$1_$2 30 (report perhost_$1_$2 /usr/bin/mail -s '[SEC] Uncaught $2 messages for $1' jhart)
type=Single
ptype=RegExp
pattern=^.{14,15}\s+(\S+)\s+(\S+):.*$
context=perhost_$1_$2
desc=perhost catchall lumper for $1 $2
action=add perhost_$1_$2 $0
#
###########
###########
# These two rules catch all otherwise uncaught messages on a per-host basis. 
# The first rule creates the context which is set to expire and email its
# contents after 30 seconds.  The second rule simply catches all of the messages
# that match a given pattern and appropriately adds them to the context.
#
type=Single
ptype=RegExp
pattern=^.{14,15}\s+(\S+)\s+\S+:.*$
context=!perhost_$1
continue=TakeNext
desc=perhost catchall starter for $1
action=create perhost_$1 30 (report perhost_$1 /usr/bin/mail -s '[SEC] Uncaught messages for $1' jhart)
type=Single
ptype=RegExp
pattern=^.{14,15}\s+(\S+)\s+\S+:.*$
context=perhost_$1
desc=perhost catchall lumper for $1
action=add perhost_$1 $0
#
###########
###########
# These last two rules act similarly to the sets above, the only exception being that
# they are designed to catch bogus syslog messages.
type=Single
ptype=RegExp
pattern=^.*$
context=!catchall
continue=TakeNext
desc=catchall starter
action=create catchall 30 (report catchall /usr/bin/mail -s '[SEC] Unknown syslog message(s)' jhart)
type=Single
ptype=RegExp
pattern=^.*$
context=catchall
desc=catchall lumper
action=add catchall $0
#
###########
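
A handy way to test rule changes is to replay an existing log through SEC on standard input. SEC can read from stdin when you give - as the input file (check your version's man page to be sure), so something along these lines -- the log path is just an example -- shows what a day's worth of traffic would have triggered without touching your production setup:

cat /var/log/auth.log.0 | sec --conf=/etc/sec/sec.conf --input=-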

Thursday, November 15, 2007

Comcast Information Stupidhighway

Over the past day or two I've been receiving an increasing amount of attention from the suit against Comcast that names their bittorrent throttling and forgery, among other things, as being against Federal computer fraud laws.

As much as I'd like to be getting this sort of publicity, I'll just come right out and say it. I, Jon Hart, am not the Jon Hart that is currently suing Comcast. In fact, I have never been a customer of Comcast's ISP business, nor do I have any intention of becoming one. Furthermore, I'd become homeless on the beach here in sunny Santa Monica and steal wireless from one of the hundreds of homes that leak 802.11 out to the sand before ever stooping so low as to become a Comcast customer. I've seen it done. Heck, I don't discriminate -- AT&T/Time Warner, RoadRunner and Verizon, you know you are not innocent either.

These ISPs make a healthy profit, and can we blame them for some of their practices? Yes and no.

ISPs make money because they oversubscribe, which is the practice of selling more bandwidth than they actually have available. Even if you ignore the theoretical limits of the physical medium over which Comcast's cable travels, there are some obvious problems. Comcast offers download speeds of 1-16Mb/s. If all of Comcast's customers in a given city or region were pulling 8-16Mb/s at once, Comcast would become a pool of molten silicon fairly quickly. They rely on the fact that most consumers will never push anything near their capacity, much less for an extended period of time. However, once you get enough customers doing enough ungodly things online at 2am, the chances of Comcast's backbone running into issues skyrocket. The "right" solution that still ensures a profit is to degrade the "offending" customers' service just enough that the probability of exceeding capacity is low enough for the risk to be acceptable.

Comcast and other ISPs take things to a new level. It's one thing to rate limit, as this simply has to happen for a business based on oversubscription to profit. Forging TCP RST packets or, worse yet, injecting fake messages to tear down or prematurely terminate traffic (primarily P2P) is just downright dirty. Their use of Sandvine allows them to continue to profit while continually bringing on new customers at the expense of a healthy, unadulterated Internet experience. Think that's the worst of it? It isn't. There have been plenty of rumors and various bits of proof that Comcast has been deploying its own nasty brand of DNS redirection, a la Site Finder, over the past year or so.

Are there other options out there? Sure. DSLExtreme and Speakeasy offer no-bullshit connectivity. They give you bandwidth and leave you the hell alone. In fact, they encourage you to share your connectivity with others, let you run servers off of your DSL connection and filter none of your inbound or outbound traffic. It's like the Wild West and it is great.

Why do Comcast and others continue to exist? Because far too many Americans are easily duped by all of this "triple-play powerplay", "powerboost", "the fastest Internet", candy-coated, re-branded bullshit. Aside from the molestation that happens on Comcast's network, you still connect to the same Internet that I do. None of the features are unique or better than what already exists everywhere else. Comcast's "The Fan"? It's called YouTube. Sharing photos? Flickr. And for the love of God, if I see one more "portal" that reminds me of AOL's purple-panel-of-doom, I'm going to hurl.

Do yourself a favor. Support lawsuits like this. Voice your concern and rage to your ISP and talk to others about it. If all else fails, ditch Comcast.

Sunday, November 11, 2007

SSL Certificates on Sourcefire DC/3D Systems

I'm in the process of getting an internal CA off the ground, and on Thursday I found myself totally stumped for a good 6 hours with the Sourcefire systems.

I called support and asked if there was a proper, supported way to get a more official SSL certificate on the devices. To my surprise, there isn't. If you are stuck using a self-signed certificate from a CA you have no business trusting, it's almost not even worth using SSL to protect the traffic on the web-based management interface. Since it's all just Apache under the hood anyway, I asked if I could do it myself. After several minutes I was told that this was fine, but that they could not support anything related to SSL once I went down that road.

Easy enough, I thought. Generate a large private key, generate a CSR, submit and sign the CSR, and drop the new key and certificate on the Sourcefire device in /etc/ssl/server.key and /etc/ssl/server.crt, respectively. Restart Apache.
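
For the curious, the key and CSR generation was nothing exotic -- something along these lines, with the 4096-bit size and file names simply being what I happened to use:

openssl genrsa -out server.key 4096
openssl req -new -key server.key -out server.csr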

To my total surprise this did not work. Re-copy the correct key and certificate, just to be sure. Same deal. Run the usual certificate and key verification:

openssl x509 -noout -modulus -in server.crt | openssl md5
openssl rsa -noout -modulus -in server.key | openssl md5

Sure enough, the MD5s are the same. What gives? Tweaking the log levels indicated that I definitely had the wrong key, but my commands above proved otherwise. I continued to get the following error message:

x509 certificate routines:X509_check_private_key:key values mismatch
unable to set private key

Lies.

What could it be? A private key that was too large? I tried 4096, 2048 and 1024 bits, and oddly enough 1024 seemed to work. I was furious. Did Sourcefire really configure and ship a system that, for whatever reason, would only function with 1024-bit private keys? I took a coffee break, as I could not fathom how this was possible.

Freshly caffeinated, I took another stab at it. I put my original 4096-bit key and corresponding certificate back in place and then started disabling the various SSL options that Sourcefire had enabled in Apache's config until I had more information. At one point I screwed up the gcache settings badly enough to kill Apache. When I fixed that and started Apache, things mysteriously started working. This was the key clue. Looking at the init script that ships with Sourcefire, a 'restart' simply sends a HUP, whereas on most other systems it does a synced 'stop' followed by a 'start'.

I have not been able to prove this, but I imagine that either Apache or gcache is caching either the private key or the certificate from its last successful start, but not the other. The result is that, despite what you see on disk, the key and certificate actually in use do not match. Believe what your eyes are showing you, but do not believe what Apache is telling you. A full stop and start are needed.
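
In practice that means skipping the init script's 'restart' entirely and doing an explicit stop followed by a start. The exact init script name will depend on your system (the path below is just an example), but the idea is:

/etc/init.d/httpd stop
/etc/init.d/httpd start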

This is not a Sourcefire-specific problem, so I hope someone else stumbles upon this and it fixes their problem too.

Thursday, November 1, 2007

Splunk 0+1 Day -- Good vendor relationships

A few days ago, as part of my many responsibilities, I stumbled upon what appeared to be a directory traversal vulnerability. At first I did not believe it, partially because this specific class of vulnerability is quite dated and partially because I couldn't believe that I had been working with this product for so long and hadn't stumbled upon it earlier.

Unfortunately, Splunk is about 170Mb of python, shell scripts and stripped binaries, so debugging this was no easy task. I actually thought I got lucky and traced the problem to an old version of TwistedWeb that Splunk uses to power the web server, but it turns out I was wrong. Furthermore, finding a python HTTP project that used an equally old version of TwistedWeb was basically impossible, so I was stranded.

At around the same time I opened an FYI ticket with Splunk giving them a heads up. Not only did they appreciate the information, they followed the TwistedWeb ticket and quickly determined that this was their bug, not Twisted's. A patch was quickly published and it is an easy update for anyone who is crazy enough to expose their Splunk server to the outside world.

This a great example of two things.

First, old bugs die hard. This was a classic example of a URI-encoded directory traversal (%2e%2e%2f%2e%2e%2f%2e%2e%2f, aka '../../../'), which Wikipedia describes fairly well. Exploiting it requires no authentication; all you need is an HTTP client and the ability to reach the Splunk server.
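
To illustrate the class of bug -- and to be clear, the URL below is entirely made up, not the actual vulnerable Splunk endpoint -- exploitation amounts to nothing more than a request like:

curl 'http://splunk.example.com:8000/%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2fetc%2fpasswd'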

Second, it is important to have a mechanism within your organization that allows security information to be channeled to the correct people in a timely manner. How many times have you called or emailed vendor XYZ only to have them basically refuse to speak to you unless you are a paying customer? Not addressing security issues in your products not only makes you look bad in the eyes of the security community, it also damages your brand and puts your paying customers at risk. Splunk is a great example of the process done right.

Security is your friend.