Friday, January 2, 2009

Hawler, the Ruby crawler, 0.3 released

I received an email yesterday from ET LoWNOISE, a Metasploit contributor, about adding proxy support to Hawler. Apparently the hope is to be able to use Hawler for the crawling duties within WMAP, the new web application scanning framework in Metasploit.

Since I hadn't had to touch Hawler in several months, I figured this was a good time to go in and do some much-needed cleanup and improvements. Chief among the changes are:

  • Proxy support ("-P [IP:PORT]")
  • Documentation cleanup
  • Support crawling frame and form tags
  • Add a useful default banner to calling scripts if none is provided
  • Print out defaults when help is called

Thanks to ET for his proxy contributions.

As usual, the following will get you up and running with Hawler:

gem install --source http://spoofed.org/files/hawler/ hawler

Using Hawler? Comments? Complaints? Suggestions? Drop me a line - I'd like to hear it.

2 comments:

postmodern said...

Hey, nice to see Hawler continue to grow into a handy website crawling toolkit.

I've also written a web spidering library, named Spidr. What could be useful to your Hawler project is that I've also written a Web-Spider Obstacle Course. This course provides various foul HTML pages which a spider must navigate properly. There's also a JSON file that describes which links have to be followed/ignored/not-followed. I use the JSON file along with RSpec to test Spidr's ability to navigate rough HTML.
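The JSON-driven testing idea boils down to comparing the set of URLs a spider actually visited against lists of expected and forbidden URLs. A minimal sketch in plain Ruby follows; the `{"follow": [...], "ignore": [...]}` layout is a hypothetical format for illustration, not Spidr's actual schema, and in an RSpec suite the two returned lists would simply be expected to be empty.

```ruby
require 'json'
require 'set'

# Compare visited URLs against a hypothetical expectations file:
#   {"follow": [URLs the spider must visit], "ignore": [URLs it must not visit]}
# Returns the URLs that were missed and the ones that should not have been hit.
def check_course(expectations_json, visited)
  spec = JSON.parse(expectations_json)
  seen = visited.to_set
  missing  = spec.fetch('follow', []).reject { |u| seen.include?(u) }
  unwanted = spec.fetch('ignore', []).select { |u| seen.include?(u) }
  { missing: missing, unwanted: unwanted }
end
```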

More of my code can be found on GitHub.

Jon Hart said...

@postmodern:

The obstacle course is a great idea! I took a quick pass through it, and it looks like the only part Hawler gets slightly confused on is the empty href tricks. I'm torn as to whether this is a bug in Hawler or the fault of URI::merge, which is responsible for making new URIs out of the page being processed and the newly encountered "link". I've worked around this in the latest commit.
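One way to sidestep the empty href problem is to filter the raw attribute before it ever reaches URI#merge. The sketch below is a hypothetical `resolve_link` helper, not the workaround actually committed to Hawler: it skips empty, whitespace-only and fragment-only hrefs (which only point back at the current page) and treats anything URI refuses to parse as "don't follow".

```ruby
require 'uri'

# Resolve an href found on page_url into an absolute URL string.
# Returns nil for links a crawler should skip: nil, empty or
# whitespace-only hrefs, bare fragments like "#" or "#top", and
# anything URI cannot parse.
def resolve_link(page_url, href)
  return nil if href.nil?
  href = href.strip
  return nil if href.empty? || href.start_with?('#')
  URI.parse(page_url).merge(href).to_s
rescue URI::InvalidURIError
  nil
end
```

With a base of http://example.com/a/b.html, "c.html" resolves to http://example.com/a/c.html, while "" and "#top" come back as nil instead of quietly re-queuing the page being processed.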

Also, it's good to see familiar names working on cool projects. Congrats on Spidr!