Wynn Netherland changelog.com/posts

rawler: Crawl your website and find broken links with Ruby

Need a quick-and-dirty way to find broken links on your website? Rawler, from Oscar Del Ben, is a Ruby gem that provides a command-line tool to crawl your site and report errors.

Install via Rubygems:

gem install rawler

For usage, run rawler --help:

~ » rawler --help
Rawler is a command line utility for parsing links on a website

Usage:
      rawler http://example.com [options]

where [options] are:
  --username, -u <s>:   HTT Basic Username
  --password, -p <s>:   HTT Basic Password
       --version, -v:   Print version and exit
          --help, -h:   Show this message

Point Rawler to your URL and you’ll get a list of followed links and their HTTP status codes:

~ » rawler https://changelog.com
301 - https://changelog.com/episodes
200 - https://changelog.com/archive
200 - https://changelog.com/
200 - https://chrome.google.com/extensions/detail/oiaejidbmkiecgbjeifoejpgmdaleoha
301 - http://github.com
200 - http://stylebot.me
200 - http://twitter.com/stylebot
200 - https://changelog.com/tagged/css
200 - https://github.com/handlino/CompassApp

...
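
To make the output above more concrete, here is a minimal sketch of the same idea using only Ruby's standard library: pull the links out of one page, then print each link's HTTP status. This is not Rawler's actual implementation. The helper names extract_links and status_line are hypothetical, the regex-based link extraction is a simplification (a real crawler would use an HTML parser), and some servers answer HEAD requests differently than GET.

```ruby
require "net/http"
require "uri"

# Collect href values from anchor tags and resolve them against the page URL.
# Hypothetical helper: a regex stands in for a proper HTML parser here.
def extract_links(html, base_url)
  html.scan(/<a\s[^>]*href=["']([^"']+)["']/i).flatten.filter_map do |href|
    begin
      uri = URI.join(base_url, href.strip)
      next unless uri.is_a?(URI::HTTP) # keeps http and https, skips mailto: etc.
      uri.fragment = nil               # "#section" anchors point at the same page
      uri.to_s
    rescue URI::Error
      nil                              # ignore hrefs that are not valid URIs
    end
  end.uniq
end

# Issue a HEAD request and format the result like "200 - http://example.com".
# Redirects are not followed, so a 301 is reported as a 301.
def status_line(url)
  uri = URI(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.head(uri.request_uri)
  end
  "#{response.code} - #{url}"
rescue StandardError => e
  "ERR - #{url} (#{e.class})"
end

# Only touch the network when run as a script with a start URL argument.
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  start = ARGV[0]
  html = Net::HTTP.get(URI(start))
  extract_links(html, start).each { |link| puts status_line(link) }
end
```

Saved as a script (the filename is up to you), it takes the start URL as its only argument and checks the links on that single page; it does not recurse through the site the way a full crawler would.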

The roadmap includes:

  • Follow redirects, but still inform about them
  • Respect robots.txt
  • Export to HTML
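
As an illustration of the first roadmap item, the sketch below shows one way a checker could follow redirects while still reporting each hop. It is an assumption about how such a feature might work, not Rawler's code or planned design. The names trace_redirects and HEAD_FETCH are hypothetical, and the fetcher is passed in as a callable so the loop can be exercised without network access.

```ruby
require "net/http"
require "uri"

# Follow Location headers for at most `limit` requests, recording every
# status seen along the way. `fetch` is any callable that returns
# [status_code, location_or_nil] for a URL.
def trace_redirects(url, fetch, limit: 5)
  hops = []
  limit.times do
    code, location = fetch.call(url)
    hops << [code, url]
    break unless (300..399).cover?(code) && location
    url = URI.join(url, location).to_s # Location may be relative
  end
  hops
end

# One possible fetcher built on Net::HTTP: a HEAD request that does not
# follow redirects on its own.
HEAD_FETCH = lambda do |url|
  uri = URI(url)
  res = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.head(uri.request_uri)
  end
  [res.code.to_i, res["location"]]
end
```

Calling trace_redirects(url, HEAD_FETCH) returns an array of [status, url] pairs, so a report can show the redirect itself as well as the status of the page it finally lands on.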

If you want to help out, fork the project and contribute.

[Source on GitHub]

