    Zhiming Wang
    @zmwangx
    And the only reason it doesn't show up currently is that its outermost div class is g mnr-c g-blk rather than just g, i.e. the URL post-processing code has no effect on it (youtube.com certainly can't be a relative link). When I parse classes appropriately, the result does show up regardless.
    Zhiming Wang
    @zmwangx
    I spent more time investigating those top "card results", and the more I investigate, the more I think they should be included. The reason is that sometimes the card contains the "official result" while all further results are non-official (the card result is not duplicated below). For instance, if you google "gangnam style", the card result gives you the official MV, and the second result (the first one if you exclude the card result) is some ridiculous fanmade crap titled "PSY- Gangnam Style (Official Music Video)"... Same story if you google "star wars rogue one trailer".
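
    For illustration, "parsing classes appropriately" means treating the class attribute as a whitespace-separated token list instead of comparing the whole string for equality. A minimal sketch; the helper name is hypothetical, not googler's actual code:

        def has_class(attrs, cls):
            # attrs is the (name, value) list that html.parser.HTMLParser
            # passes to handle_starttag(); 'class' holds space-separated tokens
            classes = (dict(attrs).get('class') or '').split()
            return cls in classes

        # has_class([('class', 'g mnr-c g-blk')], 'g')  -> True
        # whereas 'g mnr-c g-blk' == 'g'                -> False
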
    Terminator X
    @jarun
    Sure thing! I'm OK with exploring new stuff.
    I think we can easily test these for an hour each and figure out what we want to do.
    I'd be really happy if we have to parse less. Currently it takes longer than it should, and the fewer the condition checks, the faster it will work. So please continue.
    As we are already fetching gzip compressed results, parsing is the place we should check out for performance improvements.
    Zhiming Wang
    @zmwangx
    The bottleneck here is most likely IO (networking), so I don't think parsing performance matters at all. It's just a straightforward one-pass parser.
    IMO we should simplify the logic for robustness and maintainability.
    The current parser logic is a mess: if you actually follow it, you'll see that we basically have a blob of poorly connected GOTOs.
    What I've done so far is to rework the parser logic for extensibility and maintainability, and thoroughly document it. In that process I noticed that some tests can be dropped and some can be made more rigorous.
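
    For context, a "straightforward one-pass parser" here is a single html.parser.HTMLParser subclass that carries a little state between callbacks. A minimal sketch under assumed class names and result structure; this is not googler's actual parser:

        from html.parser import HTMLParser

        class ResultParser(HTMLParser):
            def __init__(self):
                super().__init__()
                self.titles = []
                self._in_title = False

            def handle_starttag(self, tag, attrs):
                classes = (dict(attrs).get('class') or '').split()
                if tag == 'h3' and 'r' in classes:  # 'r' is illustrative only
                    self._in_title = True

            def handle_data(self, data):
                if self._in_title:
                    self.titles.append(data)

            def handle_endtag(self, tag):
                if tag == 'h3':
                    self._in_title = False
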
    Terminator X
    @jarun
    The bottleneck here is most likely IO (networking) - with my network speed, I don't think so. I guess I should do some profiling. We can add this to the task list as well.
    The current parser logic is a mess - I'm afraid it's true. The problem is that none of the guys who worked on the parser (including me or Narrat) had the time to rework it, so we kept building on top of it.
    I noticed that some tests can be dropped and some can be made more rigorous - awesome. We'll figure out regressions, if any, during testing.
    Zhiming Wang
    @zmwangx

    No need to profile. Comment out resp_body = gzip.GzipFile and parser.feed(resp_body) for no IO and no parsing, and in Zsh:

    > time ( repeat 10 googler --np hello >/dev/null )
    ( repeat 10; do; googler --np hello > /dev/null; done; )  0.60s user 0.20s system 43% cpu 1.865 total

    Uncomment resp_body = gzip.GzipFile for IO but no parsing:

    ( repeat 10; do; googler --np hello > /dev/null; done; )  0.70s user 0.22s system 15% cpu 5.780 total

    Now uncomment parser.feed(resp_body) for IO and parsing (this is from my new parser logic branch, which even has slightly more overhead):

    ( repeat 10; do; googler --np hello > /dev/null; done; )  0.97s user 0.22s system 20% cpu 5.760 total

    As you can see, parsing time is negligible; IO is not.

    Note that I'm on Wi-Fi right now; might be better on Ethernet. But it's by no means slow Wi-Fi. According to speedtest-cli: latency 4.856 ms, Download: 189.38 Mbit/s, Upload: 165.39 Mbit/s.
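
    An alternative to commenting lines in and out is to time the two phases directly. A sketch, where fetch() is a hypothetical stand-in for the request/download phase and parser is the script's parser instance:

        import time

        t0 = time.monotonic()
        resp_body = fetch()        # hypothetical: conn.request() + resp.read()
        t1 = time.monotonic()
        parser.feed(resp_body)     # the parsing phase measured above
        t2 = time.monotonic()
        print('IO: %.3fs  parse: %.3fs' % (t1 - t0, t2 - t1))
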

    Terminator X
    @jarun
    We make gzip compression optional.
    Today.
    Ahh sorry
    Can we have the numbers for fetching results without gzip compression?
    Zhiming Wang
    @zmwangx
    I'm in a meeting right now. Will get back to you later.
    Zhiming Wang
    @zmwangx
    Not much different without gzip:
    ( repeat 10; do; googler --np hello > /dev/null; done; )  0.86s user 0.22s system 19% cpu 5.572 total
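
    For reference, the gzip toggle being measured boils down to the Accept-Encoding request header plus the gzip.GzipFile wrapper mentioned earlier. A sketch with http.client; illustrative, not googler's exact code:

        import gzip, http.client, io

        conn = http.client.HTTPSConnection('www.google.com')
        headers = {'Accept-Encoding': 'gzip'}   # drop this for the no-gzip run
        conn.request('GET', '/search?q=hello', headers=headers)
        resp = conn.getresponse()
        raw = resp.read()
        if resp.getheader('Content-Encoding') == 'gzip':
            resp_body = gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
        else:
            resp_body = raw
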
    Zhiming Wang
    @zmwangx

    Actually I made a mistake when measuring the "no IO and no parsing" time. I only commented out the reading-from-socket (downloading) part; opening the connection and the waiting time were still included. The correct time is something like

    ( repeat 10; do; googler --np hello > /dev/null; done; )  0.53s user 0.21s system 93% cpu 0.797 total

    so IO time is even longer, almost 500ms. In fact, I verified with Chromium's network tab, and indeed on my current Wi-Fi it takes about 500ms in total (waiting time included, ~100ms) to finish the GET request.

    Terminator X
    @jarun
    OK... in that case we are fine with gzip.
    Zhiming Wang
    @zmwangx
    By the way, in the end I still omitted the "card result" (https://github.com/jarun/googler/blob/master/googler#L317-L319) because sometimes it is duplicated (a Wikipedia result could be duplicated, for instance), plus the fact that it doesn't have an abstract.
    Did you test for any regressions? I already did randomized tests, but given that Google doesn't serve the same thing to everyone, we might still need more testing.
    Terminator X
    @jarun
    By the way, in the end I still omitted the "card result"
    okies
    Did you test for any regressions?
    I will in the coming weekend.
    too busy with office work due to an ongoing release :)
    Terminator X
    @jarun
    Can you check with the -d switch if your query is being redirected?
    OK leave it...
    Zhiming Wang
    @zmwangx
    Redirected to?
    Terminator X
    @jarun
    if I run a Google search, it redirects to the server for India
    in regular browser
    wanted to check something
    figured it out anyway
    stoneluo86
    @stoneluo86
    How can I use a proxy in googler?
    Zhiming Wang
    @zmwangx
    You either need to use a global/command-level proxy (that is, outside googler), or see jarun/googler#37
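
    A command-level proxy outside googler could be tsocks or similar; inside Python, tunneling through an HTTP proxy with http.client would look roughly like this sketch (proxy host and port are placeholders):

        import http.client

        # tunnel HTTPS through an HTTP proxy using the CONNECT method
        conn = http.client.HTTPSConnection('proxy.example.com', 8080)
        conn.set_tunnel('www.google.com', 443)
        conn.request('GET', '/search?q=hello')
        resp = conn.getresponse()
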
    Terminator X
    @jarun
    Use tsocks
    The last time I heard, googler works fine with it.
    stoneluo86
    @stoneluo86
    Does that mean my proxy was blocked by Google?
    Zhiming Wang
    @zmwangx
    Yes, Google blocks most free proxies you can find on the web. Not exactly surprising.
    stoneluo86
    @stoneluo86
    I understand, thanks.
    Zhiming Wang
    @zmwangx
    No problem.
    stoneluo86
    @stoneluo86
    I added a cookie header and the proxy works.
    Terminator X
    @jarun
    Awesome! You mean the Cookie header passed to conn.request, right?
    stoneluo86
    @stoneluo86
    Sometimes it works, occasionally it fails. Chrome works fine through the proxy, so I added all the request headers from Chrome to conn.request.
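
    What "adding all the request headers from Chrome to conn.request" plausibly looks like; the header values below are placeholders standing in for whatever a browser's network tab shows, not real values:

        headers = {
            'User-Agent': 'Mozilla/5.0 ...',       # as sent by Chrome
            'Accept-Language': 'en-US,en;q=0.9',
            'Cookie': 'NID=...',                   # the cookie that helped
        }
        # conn as in the earlier http.client sketch
        conn.request('GET', '/search?q=hello', headers=headers)
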
    Terminator X
    @jarun
    Can you please share the details here... For different countries and configs it may be different. It will help us investigate this and optionally add some headers ourselves.