    Erik Fonselius
    @Fonsan
    Hey everyone, I am so excited to see this project, I noticed that a few of the initializers were out of date, posted an issue here: https://issues.apache.org/jira/browse/ARROW-10587
    I am looking to perform binary searches on columns, since bsearch_index is tightly coupled to ruby Array it is not usable. Is there a faster/better alternative to dropping in https://github.com/boggle/bin_search or similar?
    Erik Fonselius
    @Fonsan
    [100] pry(main)> Benchmark.realtime {slow_array.each { } }
    => 1.2109970001038164
    [101] pry(main)> Benchmark.realtime {slow_array.values.each { } }
    => 0.008631999953649938
    slow_array = Arrow::Int64Array.new(100_000.times.map { rand(10_000) })
    Sutou Kouhei
    @kou
    Why do you perform binary search?
    Is "binary" search required?
    Erik Fonselius
    @Fonsan
    I am playing around with the OpenStreetMap data set. One part of it is “nodes”, which are id, lat, long, and another part is “ways”, which have lists of node references [123, 124 … ]. In order to construct the lat/long coordinates for a road, you must look them up in the nodes table
    Which has billions of entries
    Sutou Kouhei
    @kou
    slow_array.each needs a C data to Ruby conversion for each value.
    slow_array.values.each converts the C data to a Ruby Array up front in slow_array.values, and .each is then Array#each. Array#each doesn't need to convert C data to Ruby for each value.
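As a rough illustration of the point above, here is a minimal sketch of converting once and then searching with Ruby's own Array#bsearch_index. Plain Ruby arrays stand in for the result of calling values on Arrow columns; the helper names find_node_index and way_coordinates and the sample data are hypothetical, and the ids are assumed to be sorted in ascending order.

```ruby
# Plain Ruby arrays stand in for columns that were converted once,
# for example via Arrow::Int64Array#values (assumption for this sketch).
node_ids  = [101, 105, 109, 120, 123, 124, 200] # must be sorted ascending
node_lats = [59.30, 59.31, 59.32, 59.33, 59.34, 59.35, 59.36]
node_lons = [18.00, 18.01, 18.02, 18.03, 18.04, 18.05, 18.06]

# Hypothetical helper: locate an id with Array#bsearch_index
# (find-minimum mode), then check that it is an exact match.
def find_node_index(sorted_ids, id)
  index = sorted_ids.bsearch_index { |candidate| candidate >= id }
  index if index && sorted_ids[index] == id
end

# Hypothetical helper: resolve a way (a list of node references)
# into [lat, lon] pairs; unknown references become nil.
def way_coordinates(way_refs, ids, lats, lons)
  way_refs.map do |ref|
    index = find_node_index(ids, ref)
    index && [lats[index], lons[index]]
  end
end

p way_coordinates([123, 124, 999], node_ids, node_lats, node_lons)
```

This only shows the lookup pattern. For a column with billions of entries, materializing a Ruby Array may not fit in memory, so the sketch is not a substitute for a search that runs on the Arrow data itself.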
    Erik Fonselius
    @Fonsan
    Indeed, this is what I suspected. If it is intended, it should be added to the backlog of things to go into the documentation.
    Quite hefty, with a difference of over 2 orders of magnitude for iterating :)
    Sutou Kouhei
    @kou
    Generally, users should not iterate each value.
    Even on Ruby's Array.
    Users should use compute functions implemented in C++ for performance.
    But it's WIP.
    Erik Fonselius
    @Fonsan
    Where can I read more about it?
    Erik Fonselius
    @Fonsan
    thanks
    I take it the Group with the WIP comment is the first attempt at this?
    *Group class
    Sutou Kouhei
    @kou
    Yes. But it's difficult...
    kojix2
    @kojix2
    Sorry if this is a misguided comment. If you're looking for a fast array like NumPy, I recommend numo-narray. NArray contains the major features of NumPy and is capable of performing calculations at almost the same speed. Numo-narray is used in Andrew Kane's machine learning projects and Rumale. Of course, NArray is not a substitute for Apache Arrow, but it works fine for some purposes.
    Erik Fonselius
    @Fonsan
    thanks, I will have a look at it :)
    Tjad Clark
    @tjad
    @mrkn Regarding memory issues with GC. I think we can only try our best to avoid this problem arising. I think that a huge cause for slow execution of MXNet is Ruby's GVL. I am looking at getting MXNet Runtime to run outside the GVL, in the hope that mxnet can then process ops fast enough and fully independent of GC.
    I'm not seeing an easy route forward using callbacks.
    Kenta Murata
    @mrkn
    @tjad The problem is due to async calculation in MXNet, that is, the symbol APIs, isn't it? If so, we need to introduce a memory management system for keeping the parameters of async calls.
    Currently, I'm busy working on Ruby 3 related jobs, so I don't have much time to work on MXNet. Maybe I can help you by coding after Ruby 3 is released.
    Tjad Clark
    @tjad
    @mrkn Thank you. Yes, we would ideally require memory management. However, it seems like callbacks are NOT passed back through imperative_invoke. If this is true, we would not be able to perform memory management reliably. The effort would be large and unnecessary. If callbacks are not passed back through imperative_invoke, I find it strange; that would mean no language binding is currently doing that required memory management.
    As an alternative, I thought we could speed up MXNet processing by taking it out of the context of the GVL. I think this is definitely required either way.
    Perhaps I should open a separate issue for taking MXNet out of the GVL context? I would need this specifically as I have 2 CPU-intensive libraries. One is EventMachine, the other is MXNet, and under the GVL these libraries would contend for CPU time and ultimately slow each other down. I have observed this already.
    Tjad Clark
    @tjad
    For more information, they contend because they both have event loops or task loops which are designed with the expectation of having enough CPU time. The GVL is a huge limitation here when 2 such libraries are running in a single process and each library is dependent on the processing speed of the other.
    In my case, when MXNet slows down (takes long to process), my EventMachine event loop fills up, and when my EventMachine event loop is busy, my MXNet task queue slows down. This is because they are contending for CPU time under the GVL.
    Kenta Murata
    @mrkn
    @tjad You mean that other Ruby threads cannot run while MXImperativeInvoke is being called?
    If so, we can simply wrap the MXImperativeInvoke call in rb_thread_call_without_gvl.
    Tjad Clark
    @tjad
    @mrkn Correct. That's some great insight - I've been wondering where rb_thread_call_without_gvl would fit in.
    @mrkn as an aside: I also just caught up with some Ruby 3 features which may change the game for concurrency a bit in the future.
    https://rubykaigi.org/2019/presentations/yukihiro_matz.html
    Kenta Murata
    @mrkn
    @tjad It is better if we could support Ractor.
    Tjad Clark
    @tjad
    Sounds good. Probably a major version change for mxnet.rb ha ha
    Tjad Clark
    @tjad
    In Ruby 2.x, it is not at all possible to truly "free" MXNet's engine from the GVL as I imagined, not without creating a Ruby MXNet engine implementation. Not desirable.
    That said, giving the MXNet engine true parallel capability through Ruby 3.x Ractors, as per your suggestion, is the only route forward.
    Is it possible to permanently disable a Ractor's lock mechanism?
    Tjad Clark
    @tjad
    I've upgraded my mxnet.rb / EventMachine application to use Ruby 3.x. It's actually running! Thanks @kou for keeping numo-narray updated!
    This aligns my short term path with Ractor implementation. Moving ahead with mrkn/mxnet.rb#53
    Erik Fonselius
    @Fonsan
    Having trouble installing head, apache-arrow works fine but apache-arrow-glib fails
    20:20 fonsan@Eriks-MBP:~/code/arrow/ruby/red-arrow d1091bbe3(ruby-refactor-table-initialize✗)(ruby-2.7.2)
     $ brew install apache-arrow-glib --head
    Updating Homebrew...
    ==> Auto-updated Homebrew!
    Updated Homebrew from 12495bc80 to 6d850a97a.
    No changes to formulae.
    
    ==> Cloning https://github.com/apache/arrow.git
    Updating /Users/fonsan/Library/Caches/Homebrew/apache-arrow-glib--git
    ==> Checking out branch master
    Already on 'master'
    Your branch is up to date with 'origin/master'.
    HEAD is now at a7e02c4 ARROW-10639: [Rust] Added examples to is_null kernel and simplified signature.
    Entering 'cpp/submodules/parquet-testing'
    Entering 'testing'
    /Users/fonsan/Library/Caches/Homebrew/apache-arrow-glib--git/cpp/submodules/parquet-testing
    /Users/fonsan/Library/Caches/Homebrew/apache-arrow-glib--git/testing
    ==> ./configure --prefix=/usr/local/Cellar/apache-arrow-glib/HEAD-a7e02c4
    Last 15 lines from /Users/fonsan/Library/Logs/Homebrew/apache-arrow-glib/01.configure:
    2020-11-19 20:20:42 +0100
    
    ./configure
    --prefix=/usr/local/Cellar/apache-arrow-glib/HEAD-a7e02c4
    
    
    READ THIS: https://docs.brew.sh/Troubleshooting
    
    Please create pull requests instead of asking for help on Homebrew's GitHub,
    Twitter or any other official channels.
    brew install apache-arrow-glib --head  5.27s user 7.80s system 79% cpu 16.497 total
    20:20 fonsan@Eriks-MBP:~/code/arrow/ruby/red-arrow d1091bbe3(ruby-refactor-table-initialize✗)(ruby-2.7.2)
    This is on Big Sur
    Erik Fonselius
    @Fonsan
    I am not sure where I should be looking for the error logs
    Sutou Kouhei
    @kou
    Ah, we can't use configure ( https://github.com/Homebrew/homebrew-core/blob/master/Formula/apache-arrow-glib.rb#L27 ) for --head.
    configure isn't included in the repository. It's an auto-generated file.
    We should change the formula to use meson + ninja in Homebrew, like https://github.com/Homebrew/homebrew-core/blob/master/Formula/glib.rb#L55-L57 .
    Could you send a pull request for it to Homebrew?
    Erik Fonselius
    @Fonsan
    I will have a go at it
    Erik Fonselius
    @Fonsan

    @kou Homebrew/homebrew-core#65245

    When I run the tests in red-arrow I get Error: test_uint8(RawRecordsTableStructArrayTest): TypeError: uninitialize GLib::Object for several of them.

    Erik Fonselius
    @Fonsan
    I am not sure whether this is due to a faulty configuration in my Homebrew pull request or to another issue