Lightweight, scriptable browser as a service with an HTTP API
mahmoudnabil
@mnabil
It opens on the VM but not on the remote machine.
Charles Green
@charlesgreen
Sorry, I’m not really following what you are trying to do. Are you saying the JavaScript is not rendering on Splash, or that you’re not getting back the response you expect?
mahmoudnabil
@mnabil
It's OK, I'm just not getting the response that I expect.
Charles Green
@charlesgreen
If you print(response.text), are you able to find the elements you are looking for? If you inspect the page, can you tell me the element ID so I can also check?
mahmoudnabil
@mnabil
@charlesgreen if you open the URL I sent you from the US or through a US proxy, you'll find a 'shop now' panel on the right side.
Charles Green
@charlesgreen
I’m in Japan… I don’t have a proxy set up.
mahmoudnabil
@mnabil
This panel is only shown when JavaScript is allowed.
It's OK, the problem is that I'm using Splash and it still doesn't show the panel.
Charles Green
@charlesgreen
Is it dependent on headers at all?
mahmoudnabil
@mnabil
How do I know that?
Charles Green
@charlesgreen
If you are able to view the website in Chrome, or through a tool like Charles Proxy or Postman, then you can see the headers. If both the browser and Splash are running on the same machine, then I would guess the headers are where the difference is, but it’s just a guess.
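One way to test the header guess is to copy the headers the browser sends and replay them through Splash. The sketch below is only an illustration: it assumes Splash is reachable at http://localhost:8050 (its default port) and relies on the `headers` argument of `render.html`, which Splash accepts only in a JSON POST body and applies to the first outgoing request. The helper names are made up for this example.

```python
import json
import urllib.request

# Assumption: Splash listens on its default port on this machine.
SPLASH_RENDER_HTML = "http://localhost:8050/render.html"


def build_render_request(url, browser_headers, wait=2.0):
    """Build a JSON POST for render.html that replays the given headers."""
    payload = {
        "url": url,
        "wait": wait,  # give the page's JavaScript some time to run
        # `headers` is only honoured when the arguments are sent as JSON.
        "headers": browser_headers,
    }
    return urllib.request.Request(
        SPLASH_RENDER_HTML,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def render(url, browser_headers):
    """Send the request to Splash and return the rendered HTML as text."""
    request = build_render_request(url, browser_headers)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `render(page_url, {"User-Agent": "...", "Accept-Language": "en-US"})` with values copied from the browser's network tab, and then again without them, should show whether the panel depends on headers.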
mahmoudnabil
@mnabil
I'll see, thanks Charles.
Charles Green
@charlesgreen
cheers.
Umair Ashraf
@umrashrf
Hi, is there a way to send logs from Lua to Scrapy using Scrapy-Splash plugin?
Charles Green
@charlesgreen
@umrashrf I’m not sure how to get logs from Lua out of the box, but looking at the code I don’t think it would take a lot of effort to add a log call in lua_runner (I don’t know about getting that support added upstream, but it could at least be done locally in a forked version). What type of output are you looking for? Errors, etc.? https://github.com/scrapinghub/splash/blob/master/splash/lua_runner.py
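A workaround that needs no fork is to collect messages in a Lua table and return them with the result of the `execute` endpoint, then log them on the Python side. This is a rough sketch, not a built-in logging feature: the `logs` key and the `forward_lua_logs` helper are invented for the example, and it assumes the `treat` module bundled with Splash so that the table is encoded as a JSON array.

```python
import logging

# Lua script for Splash's `execute` endpoint. Log lines are kept in a Lua
# table and returned next to the HTML, so they arrive in the JSON result.
LUA_WITH_LOGS = """
local treat = require("treat")

function main(splash)
    local logs = {}
    local function log(msg)
        logs[#logs + 1] = msg
    end

    log("loading " .. splash.args.url)
    local ok, reason = splash:go(splash.args.url)
    if not ok then
        log("go failed: " .. tostring(reason))
    end
    splash:wait(0.5)
    return {html = splash:html(), logs = treat.as_array(logs)}
end
"""


def forward_lua_logs(data, logger=None):
    """Log every line under the (hypothetical) `logs` key; return the count."""
    if logger is None:
        logger = logging.getLogger("splash.lua")
    lines = data.get("logs") or []
    # Without treat.as_array a Lua table arrives as {"1": ..., "2": ...}.
    if isinstance(lines, dict):
        lines = [lines[key] for key in sorted(lines, key=int)]
    for line in lines:
        logger.info("lua: %s", line)
    return len(lines)
```

With scrapy-splash, a request sent to the `execute` endpoint with `args={"lua_source": LUA_WITH_LOGS}` should give the callback a decoded result in `response.data`, so `forward_lua_logs(response.data, self.logger)` would write the Lua messages into the spider's log.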
Abd ar-Rahman Hamidi
@hbakhtiyor
@nramirezuy I'm just interested, I don't fully understand.
Nicolás Ramírez
@nramirezuy
@hbakhtiyor In order to solve the captcha I need to send it back to the spider, but to do this I can't just return the value, because this closes the tab. So I had to build a service to which I could send the image via HTTP from Lua. After that I got to a part of the page where I needed to load a different frame to query it, but this isn't supported. You can return the value of the frame back to Python, but for some reason I wasn't able to do so. So I ended up switching to Selenium. BTW, Firefox Selenium wasn't able to get to this frame either, but luckily Chrome was. (:
Charles Green
@charlesgreen
Hi All, what is the best way to remove or redefine a JavaScript function from the DOM before the page is rendered? splash:runjs ?
I’m currently doing the following in my Lua script.
if string.find(splash.args.url, "thepage.html") ~= nil then
    assert(splash:runjs("timeOutCheck = function(){return;}"))
end
Charles Green
@charlesgreen
The function I am trying to replace does a few checks and if they fail it does a location.href redirect.
Umair Ashraf
@umrashrf
@charlesgreen if it's in a .js file then maybe you can stop this resource from being downloaded?
You can replace the page contents with the same contents, replacing the JS
using regexes or selectors.
Not sure if there is selector support in Splash.
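The first suggestion, stopping the script from being downloaded, could look roughly like the sketch below. It assumes the function lives in a separate file whose URL contains a known fragment; `timeout.js` and the `blocked` argument are placeholders invented for the example. The Lua part uses `splash:on_request` and `request:abort()`, and the Python part only builds the JSON body for the `execute` endpoint.

```python
import json

# Lua script for the `execute` endpoint: abort every request whose URL
# contains the fragment passed in the custom `blocked` argument.
LUA_BLOCK_SCRIPT = """
function main(splash)
    splash:on_request(function(request)
        -- Plain-text search (4th argument true), so "." is not a pattern.
        if string.find(request.url, splash.args.blocked, 1, true) then
            request:abort()
        end
    end)
    assert(splash:go(splash.args.url))
    assert(splash:wait(1.0))
    return splash:html()
end
"""


def build_execute_payload(url, blocked_fragment):
    """Return the JSON body to POST to Splash's /execute endpoint."""
    return json.dumps({
        "lua_source": LUA_BLOCK_SCRIPT,
        "url": url,
        "blocked": blocked_fragment,  # hypothetical custom argument
    })
```

For example, `build_execute_payload(page_url, "timeout.js")` would render the page without that file. This only helps if the redirecting function is not defined inline in the HTML, and blocking the file also removes everything else it defines.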
Charles Green
@charlesgreen
Thank you. Will keep working on it.
Umair Ashraf
@umrashrf
no problem, good luck
Charles Green
@charlesgreen
Thank you, it’s much appreciated. I’ll post an update with the solution.
Charlie Smith
@chuckus
Hi all, I’ve been working on a drop-in replacement for Splash that uses Google Chrome, in particular the DevTools API, to implement the Splash HTTP API. It can be easily deployed as a Docker container and the repo is hosted at https://github.com/chuckus/chromewhip. It’s still very much in early alpha, but the container has working functionality. My motivation was simply to get some practice with asyncio. If you have any suggestions, comments or improvements, please file an issue or pull request :)
Charles Green
@charlesgreen
@chuckus sounds very cool. Will take a look.
cadabrum
@cadabrum
Hello! What are the system requirements for running the Splash container in production? I ran into a memory leak with the 3.0 Docker container: the OOM killer terminated it with 6581516kB in use after 120k+ processed requests.
Any advice for reducing memory consumption?
I requested splash:html only, without processing any images.
Moataz Hisham
@mtzhisham
Hi, I was wondering how to use a proxy with scrapy-splash while also using render.html as the endpoint. Splash is running in the Docker container.
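For reference, `render.html` accepts a `proxy` argument that can be a proxy URL of the form [protocol://][user:password@]proxyhost[:port]. A minimal sketch of building such a request with the standard library, assuming Splash is published on localhost:8050 and using a placeholder proxy address:

```python
from urllib.parse import urlencode


def render_html_url(target_url, proxy_url,
                    splash_base="http://localhost:8050", wait=0.5):
    """Build a render.html URL that asks Splash to fetch through a proxy."""
    query = urlencode({
        "url": target_url,
        "proxy": proxy_url,  # must be reachable from inside the container
        "wait": wait,
    })
    return f"{splash_base}/render.html?{query}"


if __name__ == "__main__":
    # Placeholder proxy credentials and host, for illustration only.
    print(render_html_url("http://example.com",
                          "http://user:pass@proxy.example:3128"))
```

With scrapy-splash the same arguments would presumably go into `SplashRequest(url, callback, endpoint="render.html", args={"wait": 0.5, "proxy": "..."})`. Because Splash runs inside Docker, a proxy on `localhost` of the host machine is not reachable under that name from the container.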
Wenxing Zheng
@wenxzhen
Dear all, how can I capture the requests that Splash sends out to the internet, especially when they go out through a proxy?
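One option that may help here is the HAR data Splash can return (the `render.har` endpoint, or `splash:har()` in a script), which lists the requests made while rendering. The sketch below only summarises an already decoded HAR document; `summarize_har` is an illustrative helper, and the HAR is not expected to say which proxy a request went through, so for that the proxy's own logs are probably still needed.

```python
def summarize_har(har):
    """Return (method, url, status) for every entry in a decoded HAR dict."""
    rows = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        response = entry.get("response", {})
        rows.append((
            request.get("method"),
            request.get("url"),
            response.get("status"),
        ))
    return rows


if __name__ == "__main__":
    # Tiny hand-written HAR fragment, for illustration only.
    sample = {"log": {"entries": [
        {"request": {"method": "GET", "url": "http://example.com/"},
         "response": {"status": 200}},
    ]}}
    for method, url, status in summarize_har(sample):
        print(method, url, status)
```

The input would typically be the JSON body of a GET to `/render.har?url=...&wait=...` on the Splash server.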
habout632
@habout632
Hi, is anybody there?
What is the difference between Splash and Chrome?
Charles Green
@charlesgreen
@habout632 I'm a few days late to reply. Can you give a bit more context? What differences would you like to know? Have you read the docs?
mahmoudnabil
@mnabil
Hi guys, is Scrapinghub hiring remote software engineers? :D
Silvano Cerza
@Alien1993
Hey there, has anyone ever had an issue with the Splash Docker container crashing with a 139 error code?
I'm using it with the scrapy-splash plugin and it fails randomly while scraping.