    mahmoudnabil
    @mnabil
    I'll take a look, thanks Charles.
    Charles Green
    @charlesgreen
    cheers.
    Umair Ashraf
    @umrashrf
    Hi, is there a way to send logs from Lua to Scrapy using Scrapy-Splash plugin?
    Charles Green
    @charlesgreen
    @umrashrf I’m not sure how to get logs from Lua out of the box, but looking at the code I don’t think it would take much effort to add logging in lua_runner. I can't say whether that kind of support would be accepted upstream, but it should at least work locally in a forked version. What type of output are you looking for? Errors? https://github.com/scrapinghub/splash/blob/master/splash/lua_runner.py
    Abd ar-Rahman Hamidi
    @hbakhtiyor
    @nramirezuy I'm just interested, I don't fully understand it.
    Nicolás Ramírez
    @nramirezuy
    @hbakhtiyor In order to solve the captcha I need to send it back to the spider, but I can't just return the value to do this, because that closes the tab. So I had to build a service that I could send the image to over HTTP from Lua.
    After that I got to a part of the page where I needed to load a different frame to query it, but this isn't supported. You can return the value of the frame back to Python, but for some reason I wasn't able to do so. So I ended up switching to Selenium. BTW, Firefox Selenium wasn't able to get to this frame either, but luckily Chrome was. (:
    Charles Green
    @charlesgreen
    Hi All, what is the best way to remove or redefine a JavaScript function from the DOM before the page is rendered? splash:runjs ?
    I’m currently doing the following in my Lua script.
        -- plain=true so the "." in the file name is matched literally
        if string.find(splash.args.url, "thepage.html", 1, true) ~= nil then
            assert(splash:runjs("timeOutCheck = function(){return;}"))
        end
    Charles Green
    @charlesgreen
    The function I am trying to replace does a few checks and if they fail it does a location.href redirect.
    Umair Ashraf
    @umrashrf
    @charlesgreen if it's in a .js file then maybe you can stop this resource from being downloaded?
    Charles Green
    @charlesgreen
    Hi Umair, Thanks for your reply. It’s embedded in the page.
    Actually, the function checks for the existence of a form. If it's not there, it uses location.href to redirect the browser.
    might work
    Charles Green
    @charlesgreen
    I believe set_content would clear the page that gets rendered. However, perhaps I can stop the page from running JavaScript.
    Umair Ashraf
    @umrashrf
    You can replace the page contents with the same contents, with the JS replaced
    using regexes or selectors
    not sure if there is selector support in Splash
    Charles Green
    @charlesgreen
    thank you. will keep working on it.
    Umair Ashraf
    @umrashrf
    no problem, good luck
    Charles Green
    @charlesgreen
    thank you. it’s much appreciated. I’ll post an update with the solution.
    Charlie Smith
    @chuckus
    Hi all, I’ve been working on a drop-in replacement for Splash that utilises Google Chrome, in particular the DevTools API, to implement the Splash HTTP API. It can be easily deployed as a Docker container and the repo is hosted at https://github.com/chuckus/chromewhip. It’s still in early alpha but the container has working functionality. My motivation was simply to start getting some practice with asyncio. For any suggestions, comments or improvements, please file an issue or pull request :)
    Charles Green
    @charlesgreen
    @chuckus sounds very cool. Will take a look.
    cadabrum
    @cadabrum
    Hello! What are the system requirements for running the Splash container in production? I ran into a memory leak with the 3.0 Docker container: the OOM killer terminated it with 6581516kB in use after 120k+ processed requests.
    Any advice for reducing memory consumption?
    I've requested splash:html only, without processing any images.
    Moataz Hisham
    @mtzhisham
    Hi, I was wondering how to use a proxy with scrapy-splash while also using render.html as the endpoint. Splash is running in the Docker container.
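One possible approach, sketched under assumptions: the Splash HTTP API accepts a `proxy` argument on `render.html`, given either as a proxy profile name or as a proxy URL. The snippet below only builds the request URL with the standard library. The Splash address `http://localhost:8050`, the proxy address, and the helper name `render_html_url` are placeholders for illustration, not values from this conversation.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: Splash is published on the default port on the local machine.
SPLASH_URL = "http://localhost:8050"


def render_html_url(target_url, proxy_url, wait=0.5):
    """Build a render.html request URL that asks Splash to use a proxy.

    Hypothetical helper: it only assembles the query string.
    """
    args = {
        "url": target_url,   # page Splash should render
        "wait": wait,        # seconds to wait after the page loads
        "proxy": proxy_url,  # proxy URL: [protocol://][user:password@]host[:port]
    }
    return f"{SPLASH_URL}/render.html?{urlencode(args)}"


# Placeholder proxy credentials and host.
url = render_html_url("http://example.com", "http://user:pass@proxy.example:8080")
print(parse_qs(urlsplit(url).query)["proxy"])
```

With scrapy-splash, the same keys would presumably go into the `args` dict of a `SplashRequest` with `endpoint='render.html'`. Since Splash runs inside Docker, the proxy host has to be reachable from inside the container; `localhost` there refers to the container itself.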
    Wenxing Zheng
    @wenxzhen
    Dear all, how can I capture the traffic that Splash sends to the internet, especially when the requests go out through a proxy?
    habout632
    @habout632
    Hi, is anybody there?
    What is the difference between Splash and Chrome?
    Charles Green
    @charlesgreen
    @habout632 I'm a few days late to reply. Can you give a bit more context? What differences would you like to know? Have you read the docs?
    mahmoudnabil
    @mnabil
    Hi guys, is Scrapinghub hiring remote software engineers? :D
    Silvano Cerza
    @Alien1993
    Hey there, has anyone ever had an issue with Splash Docker container crashing with a 139 error code?
    I'm using it with the scrapy-splash plugin and it fails randomly while scraping
    Silvano Cerza
    @Alien1993
    I tried a higher verbosity with -v2 and this is always the last line: splash_1 | 2017-10-17 16:06:27.016674 [render] [140648928249504] [lua_runner] send (lua) (b'return', <Lua table at 0x32e1750>)
    Ankit Yadav
    @yadavankit
    Hi, can anyone help me with a problem? It's quite urgent actually.
    Ankit Yadav
    @yadavankit
    Hey there, how can I make sure that the image fully loads (not just the img tag in the DOM)?
    Silvano Cerza
    @Alien1993
    @yadavankit have you tried using a Lua script to wait?
    function main(splash)
        splash:go("http://example.com")
        splash:wait(0.5)
        return {html=splash:html()}
    end
    Something like this
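A fixed wait may not be enough for slow images. As a rough sketch of an alternative, the Lua script below polls the page with `splash:evaljs` until every `<img>` reports `complete` with a non-zero `naturalWidth`, or until a deadline passes. The Python wrapper only builds the JSON body for a POST to Splash's `execute` endpoint. The names `WAIT_FOR_IMAGES_LUA`, `build_execute_payload` and the `max_wait` argument are made up for illustration, and the polling interval is an arbitrary choice.

```python
import json

# Lua script for the Splash `execute` endpoint. It polls instead of using
# a single fixed wait. `max_wait` is a custom argument (an assumption of
# this sketch) that Splash exposes to the script through splash.args.
WAIT_FOR_IMAGES_LUA = """
function main(splash)
    assert(splash:go(splash.args.url))
    -- JS expression: true only when every <img> has finished loading
    -- and actually decoded to a non-empty picture.
    local all_loaded = [[
        Array.prototype.every.call(document.images, function (img) {
            return img.complete && img.naturalWidth > 0;
        })
    ]]
    local waited = 0
    while waited < splash.args.max_wait and not splash:evaljs(all_loaded) do
        splash:wait(0.2)
        waited = waited + 0.2
    end
    return {html = splash:html(), images_loaded = splash:evaljs(all_loaded)}
end
"""


def build_execute_payload(url, max_wait=5.0):
    """Return the JSON-serialisable arguments for a POST to /execute."""
    return {"lua_source": WAIT_FOR_IMAGES_LUA, "url": url, "max_wait": max_wait}


payload = build_execute_payload("http://example.com")
body = json.dumps(payload)  # send with Content-Type: application/json
```

A broken image never gets a non-zero `naturalWidth`, so in that case the loop simply runs until `max_wait` and `images_loaded` comes back false.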
    Nicolás Ramírez
    @nramirezuy
    Hi, does anyone know how to fully disable caching? I'm interested in getting full HARs every time I request a website.
    Trey Saddler
    @tosaddler
    Greetings all. I'm having an issue with the Splash Docker container. I'm trying to attach a custom module through Docker's volume mounting option. The documentation says that you should run it with the "--lua-sandbox-allowed-modules" argument, but Docker says that isn't a valid argument. Any ideas? I've disabled sandbox mode for now using the "--disable-lua-sandbox" argument, but I'd like to get the other way working.
    Tiago Rodrigues
    @TiagoMRodrigues
    Greetings. If you open this website https://istoe.com.br/moodys-faz-relatorio-positivo-sobre-corte-no-compulsorio-de-bancos-brasileiros/ you will notice that the spinner in your browser keeps spinning after the page appears to be fully rendered (there is a broken link for an image and the page tries to download it for 90 seconds). Splash times out on it. Is there a way to render the page as it is at the timeout cut-off point?
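One thing that might be worth trying, as a sketch rather than a confirmed fix for this page: `render.html` accepts a `resource_timeout` argument (a per-request timeout in seconds) alongside the overall `timeout`. If the slow image request is cut off early, the page load may finish before the overall timeout is hit. The Splash address, the helper name `render_with_resource_timeout`, and the timeout values below are assumptions.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: Splash is published on the default port on the local machine.
SPLASH_URL = "http://localhost:8050"


def render_with_resource_timeout(target_url, timeout=30, resource_timeout=10):
    """Build a render.html URL that caps each network request separately.

    Hypothetical helper: `timeout` bounds the whole render, while
    `resource_timeout` bounds every individual request (such as the
    stalled image), so it only makes sense if it is the shorter one.
    """
    if resource_timeout >= timeout:
        raise ValueError("resource_timeout should be shorter than timeout")
    args = {
        "url": target_url,
        "timeout": timeout,
        "resource_timeout": resource_timeout,
    }
    return f"{SPLASH_URL}/render.html?{urlencode(args)}"


page = ("https://istoe.com.br/moodys-faz-relatorio-positivo-sobre-corte-"
        "no-compulsorio-de-bancos-brasileiros/")
request_url = render_with_resource_timeout(page)
print(parse_qs(urlsplit(request_url).query)["resource_timeout"])
```

In a Lua script the equivalent knob would be the `splash.resource_timeout` attribute, set before calling `splash:go`.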
    Avery1012
    @Avery1012
    Hi, is it possible to upgrade the Python version in the scrapinghub/splash Docker image from 3.5.2 to 3.6.5?
    artndes9
    @artndes9
    Hey all