I recently stumbled on GotToZ.com, which really is a waste of time, but it looked like a challenge, so I tried it for a few minutes. The purpose of the site is to work your way through a tangled web of sites, each representing a letter of the alphabet, until you finally reach Z. I got up to Y manually, but then I decided Ruby could do that much better than me. I encourage you to try it manually first, though. (As if you weren’t wasting enough time already…)
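require 'open-uri'

@pages = {}           # page URL => sorted list of letter pages it links to
@count = Hash.new(0)  # page URL => number of times it has been fetched

def track(page)
  # Fetch every page at most three times.
  return if @count[page] > 2
  @count[page] += 1
  a = (@pages[page] ||= [])
  # URI.open is needed on Ruby 3.0+, where plain open no longer accepts URLs.
  URI.open(page).read.scan(
    /<A href="(http:\/\/.*?\.com\/)".*?>[A-Z]</m) { |e|
    a.push e.first
  }
  a.uniq!
  a.sort!
  STDERR.puts page  # progress goes to stderr so stdout stays clean
  a.each { |x| track x }
end

track 'http://www.amongothers.com/'

# Emit the collected links as a GraphViz digraph on standard output.
puts "digraph {"
@pages.each { |k, v|
  v.each { |l| puts %Q[ "#{k}" -> "#{l}";] }
}
puts "}"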
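If you want to poke at the link extraction and the DOT output without hitting the live site, here is a minimal offline sketch. The sample HTML, the example-*.com URLs, and the helper names `extract_links` and `to_dot` are made up for illustration. I am only assuming the real pages look roughly like this. The pattern and the output format are the ones the script above uses.

```ruby
# Hypothetical sample markup; the real pages are assumed to look roughly like this.
SAMPLE_HTML = <<~HTML
  <A href="http://www.example-b.com/" target="_top">B</A>
  <A href="http://www.example-c.com/">C</A>
  <a href="http://www.ignored.com/">lowercase tag, not matched</a>
HTML

# Same idea as the script's pattern: an uppercase <A> tag pointing at a
# .com site whose link text starts with a single capital letter.
LINK_RE = /<A href="(http:\/\/.*?\.com\/)".*?>[A-Z]</m

# Return the sorted, de-duplicated link targets found in one page's HTML.
def extract_links(html)
  html.scan(LINK_RE).map(&:first).uniq.sort
end

# Render a { page => [linked pages] } hash as GraphViz DOT text.
def to_dot(pages)
  lines = ["digraph {"]
  pages.each do |from, targets|
    targets.each { |to| lines << %Q[  "#{from}" -> "#{to}";] }
  end
  lines << "}"
  lines.join("\n")
end

# Pretend the sample markup was fetched from a hypothetical start page.
pages = { "http://www.example-a.com/" => extract_links(SAMPLE_HTML) }
puts to_dot(pages)
```

Unlike the script, this keeps extraction and rendering as separate functions so each can be checked on its own. The printed text is what you would save to a file and hand to GraphViz's `dot` tool.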
Run it, possibly a few times, because I think the Z site is added randomly (that’s also why every page is fetched up to three times), and save the standard output to a file. Now you have a nice graph that you can run GraphViz on to make nifty diagrams, like this (click for full 3884x3434 view, be careful):
NP: Dire Straits—Walk Of Life