This blog has moved to http://teabass.com

There will be no more posts or comments on this site, so please update your bookmarks and feeds to http://teabass.com.

Reddit Scraping with Ruby, Hpricot and Builder

Thursday, February 26th, 2009

I’ve been looking into adding Reddit support to Feeddit and I thought I’d share a bit of code that I whipped up.

Whilst investigating how to give Feeddit something similar to its Digg categories for Reddit using the unofficial API, I ran into an issue.

Unlike Digg’s, Reddit’s category system is user-generated: the categories are subreddits, and there’s no nice way of getting an XML file of all of them. So, using Hpricot, I made this little script to scrape http://reddit.com/reddits and return a nice XML doc listing every subreddit.

Last time I ran this it found 13,100 subreddits, which makes for a very large XML file!

I’ve decided against using this code in production, in favour of having the user enter the name of the subreddit they wish to subscribe to, but it’s still pretty cool and could be useful for anyone looking to do some screen scraping in Ruby or to learn how to build XML.
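
To give a rough idea of the scrape-then-build-XML approach, here is a minimal sketch. It is not the actual script: to keep it runnable without any gems it swaps Hpricot for a plain regex scan and Builder for hand-built strings. The sample markup, the `<subreddits>`/`<subreddit>` element names, and the helper names (`xml_escape`, `extract_subreddits`, `subreddits_to_xml`) are all assumptions made up for illustration, not Reddit’s real HTML or the real output format.

```ruby
# Sketch only: SAMPLE_HTML is made-up markup standing in for the listing
# page, not what Reddit actually serves.
SAMPLE_HTML = <<~HTML
  <div class="entry"><a class="title" href="/r/ruby/">Ruby</a></div>
  <div class="entry"><a class="title" href="/r/programming/">Programming</a></div>
HTML

# Escape the characters that are special in XML text and attribute values.
def xml_escape(text)
  text.gsub(/[&<>"']/,
            "&" => "&amp;", "<" => "&lt;", ">" => "&gt;",
            '"' => "&quot;", "'" => "&apos;")
end

# Find links pointing at /r/<name>/ and return name/title hashes.
# Assumes the link text is plain text with no nested tags or HTML entities.
def extract_subreddits(html)
  pattern = %r{<a[^>]*href="/r/([A-Za-z0-9_]+)/?"[^>]*>([^<]*)</a>}
  html.scan(pattern).map do |name, title|
    { name: name, title: title.strip }
  end
end

# Build a small XML document with one <subreddit> element per entry.
# The element and attribute names are hypothetical.
def subreddits_to_xml(subreddits)
  lines = ['<?xml version="1.0" encoding="UTF-8"?>', "<subreddits>"]
  subreddits.each do |sub|
    lines << %(  <subreddit name="#{xml_escape(sub[:name])}">#{xml_escape(sub[:title])}</subreddit>)
  end
  lines << "</subreddits>"
  lines.join("\n")
end

if __FILE__ == $PROGRAM_NAME
  puts subreddits_to_xml(extract_subreddits(SAMPLE_HTML))
end
```

A regex is fine for a throwaway sketch like this, but a real HTML parser copes far better with messy or changing markup, and the real page would also need to be fetched and paged through.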

The script is hosted on GitHub, so feel free to fork it and improve it as you see fit.
