    Tom Canac
    I'm looking for a crawler capable of interpreting front-end JavaScript (to crawl AJAX content). I'm not sure this can be achieved with node-crawler yet, but the Zombie integration may solve this. Do you have any news about this? Is the project still developed?
    Mike Chen
    I commented on the issue you mentioned.
    Abdul Diaz
    Hey I have a question
    Mike Chen
    Wow, I can't believe I haven't logged in to Gitter for half a year
    Orlin Bobchev
    Thank you all for your work :)
    Mike Chen
    You're welcome
    Fábio Ap. Oliveira Silva
    hi guys
    can you help me with a question about the crawler?
    Fábio Ap. Oliveira Silva
    const Crawler = require('crawler');
    const debug = require('debug')('amazonScraper:crawler');

    /**
     * Returns the URI for the scraper
     * @param {String} asin
     */
    const URI = (asin) => `https://www.amazon.com/gp/video/detail/${asin}`;

    /**
     * Scraper class
     * Exposes the scrapMovieById and scrapShowById methods, which look up
     * movies or shows by ASIN and return the data of the given title.
     */
    class Scraper {
        /**
         * Scrapes the page
         * @param {String} asin
         */
        scrapMovieById(asin) {
            const self = this;
            return new Promise(function(resolve, reject) {
                const crawler = new Crawler({
                    rateLimit: 500,
                    retries: 3,
                });
                let movie;
                crawler.direct({
                    uri: URI(asin),
                    callback: function(error, res) {
                        if (error) {
                            return reject(error);
                        }
                        const $ = res.$;
                        movie = self.parseMovieData($);
                        movie.program.asin = asin;
                        debug('Movie inside Crawler:scrapById: %O', movie);
                        resolve(movie);
                    }
                });
            });
        }

        parseMovieData($) {
            const description = $('div[data-automation-id="synopsis"]').text();
            const releaseYear = $('span[data-automation-id="release-year-badge"]').text();
            const genres = $('dt[data-automation-id="meta-info-genres"]').next().children('a')
                .map(function() {
                    return $(this).text().trim();
                }).get();
            const title = $('h1[data-automation-id="title"]').text();
            const images = $('div.av-fallback-packshot > img')
                .map(function() {
                    return $(this).attr("src").trim();
                }).get();
            const keywords = $('meta[name="keywords"]').attr("content");
            const cast = $('th:contains("Starring"), th:contains("Supporting actors")').next().text().split(',');
            const duration = parseInt($('div.av-badges > span').eq(2).text().split(' ')[0]);
            const movie = {
                program: {
                    title,
                    description,
                    releaseYear,
                    genres,
                    images,
                    keywords,
                    cast,
                    duration,
                },
            };
            debug('Movie in crawler:parseMovieData: %O', movie);
            return movie;
        }
    }

    module.exports = Scraper;
    The method scrapMovieById isn't returning the parsed info correctly. Does anyone have any idea? Am I using it correctly?
    Mike Chen
    Please do not use direct
    use queue instead
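For reference, a minimal sketch of the queue-based pattern suggested above, assuming the node-crawler `crawler.queue()` API; `makeTask` and its `onDone` callback are hypothetical helpers, not part of the library:

```javascript
// Mirrors the URI builder from the snippet above.
const URI = (asin) => `https://www.amazon.com/gp/video/detail/${asin}`;

// Build a task object suitable for crawler.queue(); onDone is a
// hypothetical node-style callback supplied by the caller.
function makeTask(asin, onDone) {
    return {
        uri: URI(asin),
        callback: function (error, res, done) {
            if (error) {
                onDone(error);
            } else {
                onDone(null, res.$); // res.$ is the cheerio handle node-crawler injects
            }
            done(); // queued callbacks must call done() to free the pool slot
        },
    };
}

// Usage (requires `npm install crawler`):
// const Crawler = require('crawler');
// const crawler = new Crawler({ rateLimit: 500, retries: 3 });
// crawler.queue(makeTask('B00ABC123', (err, $) => { /* parse here */ }));
```

Unlike `direct()`, queued tasks go through the rate limiter and retry logic configured on the Crawler instance.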
    is anyone here?
    Mike Chen
    what's the matter
    Douglas Ferguson
    Howdy, I'm doing a simple test and crawler seems to stop on the first page I give it, I must be missing something really simple. Is there a toggle for it to recurse? Or do I need to enable javascript execution or something?
    I'm using a simple instance of Crawler() with a callback that just prints the URL to the console: console.log(res.options.uri + " " + $("title").text())
    and I then have c.queue('http://www.amazon.com');
    And it stops after printing out http://www.amazon.com
    Mike Chen
    @thedug what else do you expect?
    Jacob Bogers
    nice tool
    Kingsley Richard
    Please, I need help guys, I'm new here.
    How can I create my own app store, or my own app like Facebook?
    var Crawler = require("crawler");
    var fs = require("fs");
    Greetings, does this project also include scheduling and a task list?
    Hi all, how can I disable SSL verification while requesting a URL?
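A possible answer, assuming node-crawler forwards unrecognized options to the underlying request library (it is built on request): pass the request TLS options in the constructor. This is a sketch, only sensible for trusted test targets:

```javascript
// request-style TLS options; node-crawler is assumed to pass these through
// to request, which hands them to Node's https module.
const insecureOptions = {
    rejectUnauthorized: false, // skip TLS certificate verification
    strictSSL: false,          // request alias for the same behaviour
};

// Usage (requires `npm install crawler`):
// const Crawler = require('crawler');
// const c = new Crawler(Object.assign({
//     callback: function (error, res, done) { /* ... */ done(); },
// }, insecureOptions));
// c.queue('https://self-signed.example.com/');
```

Disabling verification exposes the crawl to man-in-the-middle attacks, so keep it out of production configs.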
    years and years