    Dominik Ottenbreit
    hi there @coder46 any chance you are around?
    @phobik sorry any chance you are around?
    Tharunkumar V
    I want to scrape data from an AJAX website
    what should I do?
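A common approach for AJAX-heavy sites is to skip the rendered page entirely: open the browser dev tools, find the XHR request that returns JSON, and request that endpoint directly. A minimal sketch, where the URL and the `items`/`title` keys are assumptions for illustration:

```python
import json
from urllib.request import urlopen

# Hypothetical JSON endpoint discovered in the browser's Network tab.
API_URL = "https://example.com/api/items?page=1"

def parse_items(raw_json):
    """Pull the interesting fields out of the endpoint's JSON payload."""
    data = json.loads(raw_json)
    return [item.get("title") for item in data.get("items", [])]

# Fetching works like any other URL once the endpoint is known:
# titles = parse_items(urlopen(API_URL).read())
```

The same idea carries over to a Scrapy spider: point `start_urls` at the JSON endpoint instead of the HTML page.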
    D Rajiv Lochan Patra
    regex for a number
    at end of string
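For a number at the end of a string, anchoring with `$` is enough. A small sketch:

```python
import re

def trailing_number(s):
    # r"(\d+)$" matches one or more digits anchored at the very end of
    # the string; re.search returns None when s does not end in a number.
    m = re.search(r"(\d+)$", s)
    return m.group(1) if m else None
```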
    Harikrishnan Shaji
    Hey, how do I get Scrapy's item JSON output to allow null values?
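Scrapy's exporters only write the fields that were actually populated, so unset fields are omitted rather than written as `null`. One workaround (a sketch, assuming dict items and an illustrative field list) is to fill missing keys with `None` before yielding:

```python
import json

# Assumed field list for illustration.
FIELDS = ["name", "price", "rating"]

def with_nulls(item):
    """Give the item every expected key, defaulting missing ones to
    None so they appear as JSON null instead of being omitted."""
    return {f: item.get(f) for f in FIELDS}
```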
    Mike Winkelmann
    Hi, does anyone have experience with deploying a Scrapy spider with dependencies? I think I have done everything correctly according to the documentation. I deploy a spider with my own dependency and external dependencies from PyPI. Everything from me is included in the EGG-INFO stuff, but nothing is downloaded automatically :/ Can anyone help?
    Shashank Sharma
    Hey, can anyone help me get Scrapy to return data in an organized way?
    Quentin Durantay
    Hey everybody, does somebody know how to get a URL after a redirect? I'm currently scraping a href that redirects to another URL, and I would like to store the latter.
    Charles Green
    @VonStruddle The response.url param in the callback should give you what you are looking for.
    Hi All
    Quentin Durantay
    @charlesgreen Actually it doesn't, but I've found a trick: just use requests to GET the response.url and load it back, since requests follows redirects by default.
    Charles Green
    @VonStruddle good to know. Thanks for sharing.
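For plain HTTP 3xx redirects, Scrapy's RedirectMiddleware (enabled by default) does follow them, so `response.url` in the callback is already the final URL and the middleware records the original(s) in `response.meta["redirect_urls"]`; when that "doesn't work", the redirect is usually done in JavaScript, which Scrapy does not execute. A sketch of a callback using both values:

```python
def parse(self, response):
    # After an HTTP 3xx redirect, response.url is the final URL and
    # response.meta["redirect_urls"] holds the URL(s) that redirected.
    original = response.meta.get("redirect_urls", [response.url])[0]
    return {"original_url": original, "final_url": response.url}
```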
    Charles Green
    @poulinjoel please check stackoverflow. I posted a reply.
    Jeremy Jordan
    Hi, I am running a spider with Scrapy, but after it finishes crawling it can't seem to terminate. Log stats just repeatedly report that it is scraping 0 pages/minute. When I try to quit with Ctrl-C, it fails to shut down gracefully and I have to force-quit with a second Ctrl-C. Any clue what is happening?
    Charles Green
    @jeremyjordan_twitter did you find the issue? Sorry, late to reply. Sounds like the final step did not return or yield an Item or a request signaling the end of the run.
    I'm trying to log in to Facebook with scrapy shell
    anyone know how to do it?
    Jay Kim (Data Scientist)
    Hi everyone. I joined this room for the first time today; nice to meet you all
    does someone use portia to scrape?
    Hey guys. I wrote a free proxy server based on Tornado and Scrapy. Try it if you need a proxy pool for your scrapy project.
    @Karmenzind thanks for sharing
    Hello! I am using scrapy-cluster to web-scrape a very diverse, unstandardized set of websites. The current setup works for 90-95% of cases, but has issues with some specific sites. When I inspected a few of these problematic websites individually, the Google Chrome console showed some errors (in the web design?) such as “Uncaught ReferenceError: require is not defined”, “Uncaught TypeError: Cannot read property”, “A parser-blocking, cross site (i.e. different eTLD+1) script… is invoked”. I want to improve the accuracy of this method. Is there any middleware or tool I can use to bypass these errors and scrape these websites?

    from bs4 import BeautifulSoup as soup  # HTML data structure
    from urllib.request import urlopen as uReq  # Web client

    # URL to web scrape from.
    # In this example we web scrape graphics cards.
    page_url = ""

    # Opens the connection and downloads the html page from the url.
    uClient = uReq(page_url)

    # Parses html into a soup data structure to traverse html
    # as if it were a json data type.
    page_soup = soup(uClient.read(), "html.parser")
    uClient.close()

    # Finds each product from the store page.
    containers = page_soup.findAll("div", {"class": "item-container"})

    # Name of the output file to write to local disk.
    out_filename = "graphics_cards.csv"
    # Header of csv file to be written.
    headers = "brand,product_name,shipping\n"

    # Opens file, and writes headers.
    f = open(out_filename, "w")
    f.write(headers)

    # Loops over each product and grabs attributes about each product.
    for container in containers:
        # Finds all link tags "a" from within the first div.
        make_rating_sp = container.div.findAll("a")
        # Grabs the title from the image title attribute,
        # then does proper casing using .title().
        brand = make_rating_sp[0].img["title"].title()
        # Grabs the text within the "a" tag at index 2 from within
        # the list of queries.
        product_name = container.findAll("a")[2].text
        # Grabs the product shipping information by searching
        # all lists with the class "price-ship",
        # then cleans the text of whitespace with strip()
        # and removes "Shipping" and "$" to keep just the number.
        shipping = container.findAll("li", {"class": "price-ship"})[0].text.strip().replace("$", "").replace(" Shipping", "")
        # Prints the dataset to console.
        print("brand: " + brand + "\n")
        print("product_name: " + product_name + "\n")
        print("shipping: " + shipping + "\n")
        # Writes the dataset to file.
        f.write(brand + ", " + product_name.replace(",", "|") + ", " + shipping + "\n")

    f.close()  # Close the file

    can anyone please run this code once and tell me why it raises "'NoneType' object is not subscriptable"?
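That error usually means one of the BeautifulSoup lookups returned `None` (e.g. a container with no `<img>` or no first `<div>`) and the code then subscripted it, as in `make_rating_sp[0].img["title"]`. The general fix is to check each lookup before indexing. A minimal illustration with plain dicts standing in for bs4 elements (bs4's `find()` likewise returns `None` on a miss):

```python
def safe_title(tag):
    """tag is a plain dict standing in for a BeautifulSoup element;
    a missing key plays the role of find() returning None."""
    img = tag.get("img")   # like container.find("img"): may be None
    if img is None:        # guard BEFORE subscripting img["title"]
        return None
    return img.get("title")
```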
    Edmondo Porcu
    Guys, assuming I want to unit test my scrapers, how do I do it? Can I download the html page and test locally?
    Anh Nguyen
    Yeah, I have the same question
    Normally I use Beautiful Soup. Just starting out with Scrapy
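Yes, downloading the page once and testing locally is the usual pattern; for Scrapy specifically, the common trick is to build a `scrapy.http.HtmlResponse` from the saved file and pass it to the callback. The idea itself needs nothing beyond the standard library, sketched here with a tiny stand-in parser (the fixture string and names are illustrative):

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Tiny stand-in for real extraction logic: grabs the <title> text."""
    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        self._in_title = tag == "title"

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title = data

def extract_title(html):
    p = TitleParser()
    p.feed(html)
    return p.title

# In a real test, read a page saved once with e.g. `curl URL > page.html`
# instead of this inline fixture:
SAVED_PAGE = "<html><head><title>Fixture</title></head></html>"
```

The extraction logic then gets exercised in unit tests without touching the network, so tests stay fast and deterministic.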
    Eaves Cat
    Hello, anyone here?
    I have a question about a Lua script I'm using with scrapy_splash; I can't resolve it. Could someone help me please?
    This is my Lua script, and the error reported is like this >>
    Hello sir
    I have a problem with web scraping
    Could you help me?
    Hello everyone!!!
    Please help me~~~
    I have a problem with web scraping


    Can someone help me with my Scrapy spider? I'm a beginner in Python. I have written a Scrapy spider which retrieves 100 urls from a REST API, scrapes each url and extracts the data, then posts the items to another REST endpoint through the pipelines.

    The problem is that it is very slow: for 100 urls the job sometimes takes 1 minute but sometimes 10 minutes to finish. All the urls are on different domains/websites, so there is no problem with getting banned. Each website receives only one request.

    What could be the possible issue?

    Thank you.
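With one request per domain, a likely culprit is a few slow or dead hosts: each one holds a download slot for the full `DOWNLOAD_TIMEOUT` (180 s by default) and is then retried, which would explain the 1-vs-10-minute variance. A hedged `settings.py` sketch (the values are illustrative, not recommendations):

```python
# settings.py sketch -- illustrative values, tune for your workload
CONCURRENT_REQUESTS = 64            # global cap; default is 16
CONCURRENT_REQUESTS_PER_DOMAIN = 8  # default; not the limit here (1 req/domain)
DOWNLOAD_TIMEOUT = 15               # default 180 s lets one dead host stall a slot
RETRY_TIMES = 1                     # default 2; fewer retries on flaky hosts
```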

    Thiago Marcello
    how to log in with Scrapy?
    Vishesh Mangla
    unable to log in through Scrapy