Using Node.js to build crawler tools
静雅思听 podcast resource scraper: a crawler that downloads podcast MP3 files in batch from justing.com.cn
CB is a crawler utility that implements crawling for a couple of site categories. It also includes scenarios and configurations for handling AJAX, along with some other goodies. Work in progress.
A web crawler project in Python, using the Beautiful Soup library, that loads the daily liturgy from www.cnbb.org.br (Brazil).
Serritor is an open source web crawler framework built upon Selenium and written in Java. It can be used to crawl dynamic web pages that require JavaScript to render data.