    logii
    @logii
    any ideas on how to send the useragent?
    Nick
    @NicholasGlazer
    I have an error "write after end" https://gist.github.com/nickovruchsky/18947f53064e71fc105f
    give me advice pls
    Cosmin Onea
    @cosminonea
    .concurrency() does not seem to work
               .concurrency(3)
                 ^
    TypeError: undefined is not a function
    Cosmin Onea
    @cosminonea
    my mistake I was calling the concurrency on the wrong object
    Tamara
    @tammyta

    Hello,
    i get

    phantomjs-node: You don't have 'phantomjs' installed

    do i need to install anything besides x-ray and x-ray-phantom?

    i imagine phantomjs, but i'm not able to get it to work
    Wah Loon Keng
    @kengz

    PhantomJS

    PhantomJS is a dependency of x-ray-phantom. It is not a Node module, so its binaries must be downloaded from its site and their location added to PATH in the terminal. Open up your bash profile with nano

    nano ~/.bash_profile

    and add the path for PhantomJS to it.

    ### for phantomjs
    export PATH=/usr/local/phantomjs/bin:$PATH

    If you like to code and run from the Sublime console, modify its Node build system to use shell_cmd instead of cmd, so that it runs through the shell from the console. Here is the complete Node.sublime-build file:

    {
        "shell_cmd": "node \"${file}\"",
        "selector": "source.js",
        "env": {
            "PATH":"/usr/local/phantomjs/bin"
        }
    }
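A quick way to confirm the PATH change took effect is to look for the binary from Node itself. This is only a sketch using Node's standard library; `findOnPath` is a hypothetical helper name, and it checks only that a file with that name exists in one of the PATH directories, not that it is executable.

```javascript
const fs = require('fs');
const path = require('path');

// Return the full path of the first file called `name` found in the
// directories listed in `pathVar`, or null if none of them contains it.
function findOnPath(name, pathVar = process.env.PATH || '') {
  // On Windows, executables carry an extension such as .EXE or .CMD.
  const exts = process.platform === 'win32'
    ? (process.env.PATHEXT || '.EXE;.CMD;.BAT').split(';')
    : [''];
  for (const dir of pathVar.split(path.delimiter).filter(Boolean)) {
    for (const ext of exts) {
      const candidate = path.join(dir, name + ext);
      if (fs.existsSync(candidate) && fs.statSync(candidate).isFile()) {
        return candidate;
      }
    }
  }
  return null;
}

const found = findOnPath('phantomjs');
console.log(found ? 'phantomjs found at ' + found : 'phantomjs is not on PATH');
```

If this prints "not on PATH" when run from the same console that runs the scraper, the driver will most likely fail with the "You don't have 'phantomjs' installed" message as well.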
    Tamara
    @tammyta
    ok, i think i got it working, but it's not working, i mean, it's working but i can't scrape the site cause it's all javascript, do i have to do something besides using phantom as driver?
    package.json
    {
      "name": "scraper",
      "version": "1.0.0",
      "description": "scraper",
      "main": "index.js",
      "scripts": {
        "test": "echo \"Error: no test specified\" && exit 1"
      },
      "author": "tammyta",
      "dependencies": {
        "x-ray": "^2.0.2",
        "x-ray-phantom": "^1.0.1"
      }
    }
    index.js
    var Xray = require('x-ray');
    var x = Xray();
    
    x('https://empeopled.com/t/empeopled_news', '.topic-content', [{
      title: '.post-title',
      link: '.post-link@href',
    }]).write('results.json')
    Tamara
    @tammyta
    oops
    index.js for real this time
    var phantom = require('x-ray-phantom');
    var Xray = require('x-ray');
    
    var x = Xray()
      .driver(phantom());
    
    x('https://empeopled.com/t/empeopled_news', '.topic-content', [{
      title: '.post-title',
      link: '.post-link@href',
    }]).write('results.json')
    Wah Loon Keng
    @kengz
    have you tried out the simple examples on the main page?
    Tamara
    @tammyta
    yes, and others examples work for me, the problem is with this specific page
    also, when i try to grab the whole body i get http://pastebin.com/bTjiWvSB
    inigo
    @bogini
    @tammyta "Please enable javascript and reload"
    The page requires loading JavaScript
    use the Phantom module to load the page in a virtual browser
    Tamara
    @tammyta
    i know, that's why i was wondering if i had to do something else than .driver(phantom()); to do that
    Michael Jeffrey Wu
    @mistermjtek
    hi, just started working with x-ray and am wondering how i could only obtain the first <a> tag in a div when the div has multiple <a> tags in it
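One possible approach to the first-`<a>` question, assuming x-ray's behaviour that a plain string selector yields only the first match while an array selector such as `['a@href']` yields every match. `firstLinkInDiv` is a hypothetical helper name, not part of x-ray, and the x-ray instance is passed in by the caller.

```javascript
// `x` is an x-ray instance created by the caller, e.g. require('x-ray')().
// `source` is a URL (or raw HTML) and `divSelector` scopes the search.
function firstLinkInDiv(x, source, divSelector) {
  return new Promise((resolve, reject) => {
    // A string selector (not wrapped in an array) asks for a single value,
    // which should be the href of the first <a> inside the scoped div.
    x(source, divSelector, 'a@href')((err, href) => {
      if (err) reject(err);
      else resolve(href);
    });
  });
}
```

A CSS pseudo-class such as `a:first-of-type` in the selector may be an alternative if the string-selector behaviour does not fit the page.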
    Carlos Serrano
    @paceUlibDev

    Anyone else ever get this error when trying to use the phantom driver?

    phantom stderr: 'phantomjs' is not recognized as an internal or external command, operable program or batch file.

    Carlos Serrano
    @paceUlibDev

    Well, I addressed my original problem by installing phantomjs. I had been under the assumption that [ x-ray + x-ray-phantom ] was a port of phantom rather than a wrapper, silly me. But now I am getting a new error:
    NETWORK_ERR: XMLHttpRequest Exception 101

    Trying to work through that now, but would appreciate any ideas on resolving it.

    Ishan 'Fishy' Marikar
    @ishan-marikar
    Hi c:
    Where can I find the documentation for x-ray? :/
    Christabella Irwanto
    @christabella
    Hey, did anyone manage to follow multiple links (do nested xray() calls) for multiple selectors in a page? Although the example code in the documentation works, it's only for a single selector:
    x('http://google.com', {
      main: 'title',
      image: x('#gbar a@href', 'title'), // follow link to google images
    })
    Whereas when you do it on multiple selectors it doesn't work anymore:
    xray('http://m.bnizona.com/index.php/promo/index/16', 'ul.list2 li', [{
            title: 'span.promo-title',
            details_link: 'a@href',
            title_of_followed_link: xray('a@href', 'title')
    }])
    Razak Wasiu
    @madibalive
    same here
    soo mad
    app.get('/test', function(req, res) {
      x('http://o2tvseries.com/search/list_all_tv_series', '.data_list .data', [{
        title: 'a',
        url: 'a@href',
        info: x('a@href', '.tv_series_info', '.serial_desc')
      }])(function(err, obj) {
        res.json(obj)
      })
    })
    no info
    Christabella Irwanto
    @christabella
    yea?? I guess it's a bug then, it's been reported as an issue on GitHub (something along the lines of "[{}] not working"), just wondering if anyone managed to solve it
    Razak Wasiu
    @madibalive
    checking on 2.0.2
    Christabella Irwanto
    @christabella
    for now i'm just getting by with manual for loops
    Razak Wasiu
    @madibalive
    it used to work for me on v2 i think, but i lost my files, and with the same code i get nothing
    --edited
    how are you using for loops for it
    Christabella Irwanto
    @christabella
    ah i see, did your above code snippet work? i shouldn't think so right?
    put all the @href-selected urls into a JSON file or into a JavaScript object/array, then do x(individualURL, '.tv_series_info', '.serial_desc') for each URL in the object/array
    Razak Wasiu
    @madibalive
    but how do i map it back to same
    forEachLoop{
        var moreInfo = doforeach(url[1]);
        // how to put it back into the same item
        mainArray[1].moreInfo = moreInfo  // appending here
    }
    // but what happens when one returns an error? other issues arise
    doesn't seem fail-proof
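A sketch of one way to map the follow-up results back onto the original items without a single failure breaking the whole run. It relies on `Promise.allSettled`, which keeps results in input order, so index `i` of the results belongs to item `i`. Both `attachMoreInfo` and `fetchInfo` are hypothetical names; `fetchInfo` stands for whatever per-URL scrape is used (for example a promise-wrapped x-ray call) and is assumed to return a Promise.

```javascript
// items:     array of objects that each carry a `url` property
// fetchInfo: hypothetical function (url) => Promise of the extra detail
async function attachMoreInfo(items, fetchInfo) {
  // The async arrow turns synchronous throws into rejections as well.
  const results = await Promise.allSettled(
    items.map(async item => fetchInfo(item.url))
  );
  // allSettled preserves order, so results[i] corresponds to items[i].
  return items.map((item, i) => {
    const r = results[i];
    return r.status === 'fulfilled'
      ? { ...item, moreInfo: r.value }
      : { ...item, moreInfo: null, error: String(r.reason) };
  });
}
```

Items whose follow-up failed keep their original fields and gain an `error` string, so they can be retried or skipped later.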
    Razak Wasiu
    @madibalive
    for those with a broken crawler, go back to v2.0.2
    Razak Wasiu
    @madibalive
    hello
    Ghost
    @ghost~56cf3ddfe610378809c3791a
    guys i have this problem: x-ray is scraping same data two times from one url; you know how this could happen?
    Marshall Ford
    @marshallford
    Anyone have a working example for using something like pify to convert the x-ray callback into a promise?
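A minimal sketch of the callback-to-promise conversion, using Node's built-in `util.promisify` rather than pify. It assumes the shape seen elsewhere in this room: `x(url, selector)` returns a function that takes a Node-style `(err, result)` callback. `fakeX` is a stand-in with that shape so the snippet runs without the library, and `scrape` is a hypothetical helper name.

```javascript
const { promisify } = require('util');

// Stand-in with the same call shape as x-ray: x(url, selector) returns a
// function that accepts a Node-style callback. Swap in a real instance.
function fakeX(url, selector) {
  return callback => setImmediate(() => callback(null, { url, selector }));
}

// Wrap any scraper of that shape so the call returns a Promise.
function scrape(x, ...args) {
  // x(...args) yields fn(callback); promisify(fn)() supplies the callback.
  return promisify(x(...args))();
}

scrape(fakeX, 'http://example.com', 'title')
  .then(result => console.log(result))
  .catch(err => console.error(err));
```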
    Morgan O'Neal
    @moneal
    For testing can I use a local html file instead of making remote requests?
    Geoff Holden
    @brightloudnoise
    Is it possible to use x-ray to scrape the json response from an API?
    Daniel Lathrop
    @lathropd
    @brightloudnoise : Sorry for the long delay. No, it currently isn't. You probably figured that out already, but I didn't want to leave you hanging.
    @moneal Obviously this response is waaay overdue. Yes. Just read the file in using fs and go from there.
    @marshallford Long delayed response, but in the past few years we have indeed embraced promises pretty well.
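For the local-file suggestion above, a rough sketch of "read the file in using fs and go from there". It assumes x-ray accepts a raw HTML string in place of a URL as its first argument; `scrapeFile` is a hypothetical helper name, and the x-ray instance is passed in by the caller.

```javascript
const fs = require('fs');

// `x` is an x-ray instance supplied by the caller, e.g. require('x-ray')().
function scrapeFile(x, filePath, selector) {
  // Read the saved page from disk instead of making a remote request.
  const html = fs.readFileSync(filePath, 'utf8');
  return new Promise((resolve, reject) => {
    // Assumption: a raw HTML string is accepted where a URL would go.
    x(html, selector)((err, result) => (err ? reject(err) : resolve(result)));
  });
}
```

In a test, `filePath` would point at a saved fixture page, so the selectors can be checked without network access.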