    Kenta Murata
    @mrkn
    You can convert an Arrow::Table to Python’s pyarrow.Table with to_python, which is provided by red-arrow-pycall: https://github.com/red-data-tools/red-arrow-pycall
    Or you can iterate over the records in a table by using Arrow::Table#each_record_batch and then Arrow::RecordBatch#each.
    Sutou Kouhei
    @kou
    What do you want to do by iterating it?
    If Apache Arrow provides the operation you want, you should use that operation. For example, Apache Arrow provides sum: https://github.com/apache/arrow/blob/master/c_glib/test/test-int32-array.rb#L56
    If you need to iterate all records, table.raw_records.each will be the fastest way.
    https://diary.kitaitimakoto.net/2019/12/21.html
    This Japanese article may be helpful for the sum case.
    Phil
    @QuakePhil
    Thank you once again 8)
    Bhargav Parsi
    @bhargav265_twitter

    Hello, I am having an issue installing red-arrow on CentOS 7.
    I tried doing

    yum install -y https://apache.bintray.com/arrow/centos/$(cut -d: -f5 /etc/system-release-cpe)/apache-arrow-release-latest.rpm
    yum install -y --enablerepo=epel arrow-devel # For C++
    yum install -y --enablerepo=epel arrow-glib-devel # For GLib (C)
    yum install -y --enablerepo=epel parquet-devel # For Apache Parquet C++
    yum install -y --enablerepo=epel parquet-glib-devel # For Parquet GLib (C)

    But it fails with some libraries not installed.

    /opt/rubies/ruby-2.6.3/lib/ruby/gems/2.6.0/gems/extpp-0.0.8/ext/extpp/libruby-extpp.so: undefined reference to `rb_ivar_set'
    /opt/rubies/ruby-2.6.3/lib/ruby/gems/2.6.0/gems/extpp-0.0.8/ext/extpp/libruby-extpp.so: undefined reference to `rb_check_typeddata'
    /opt/rubies/ruby-2.6.3/lib/ruby/gems/2.6.0/gems/extpp-0.0.8/ext/extpp/libruby-extpp.so: undefined reference to `rb_iv_set'
    /opt/rubies/ruby-2.6.3/lib/ruby/gems/2.6.0/gems/extpp-0.0.8/ext/extpp/libruby-extpp.so: undefined reference to `rb_sym2id'
    /opt/rubies/ruby-2.6.3/lib/ruby/gems/2.6.0/gems/extpp-0.0.8/ext/extpp/libruby-extpp.so: undefined reference to `rb_gc_writebarrier_unprotect'
    /opt/rubies/ruby-2.6.3/lib/ruby/gems/2.6.0/gems/extpp-0.0.8/ext/extpp/libruby-extpp.so: undefined reference to `rb_block_call'
    /opt/rubies/ruby-2.6.3/lib/ruby/gems/2.6.0/gems/extpp-0.0.8/ext/extpp/libruby-extpp.so: undefined reference to `rb_protect'
    /opt/rubies/ruby-2.6.3/lib/ruby/gems/2.6.0/gems/extpp-0.0.8/ext/extpp/libruby-extpp.so: undefined reference to `rb_str_new_static'
    /opt/rubies/ruby-2.6.3/lib/ruby/gems/2.6.0/gems/extpp-0.0.8/ext/extpp/libruby-extpp.so: undefined reference to `rb_ary_detransient'
    /opt/rubies/ruby-2.6.3/lib/ruby/gems/2.6.0/gems/extpp-0.0.8/ext/extpp/libruby-extpp.so: undefined reference to `rb_define_method'
    /opt/rubies/ruby-2.6.3/lib/ruby/gems/2.6.0/gems/extpp-0.0.8/ext/extpp/libruby-extpp.so: undefined reference to `rb_define_class'
    /opt/rubies/ruby-2.6.3/lib/ruby/gems/2.6.0/gems/extpp-0.0.8/ext/extpp/libruby-extpp.so: undefined reference to `rb_cData'
    /opt/rubies/ruby-2.6.3/lib/ruby/gems/2.6.0/gems/extpp-0.0.8/ext/extpp/libruby-extpp.so: undefined reference to `rb_intern2'
    /opt/rubies/ruby-2.6.3/lib/ruby/gems/2.6.0/gems/extpp-0.0.8/ext/extpp/libruby-extpp.so: undefined reference to `rb_scan_args'
    /opt/rubies/ruby-2.6.3/lib/ruby/gems/2.6.0/gems/extpp-0.0.8/ext/extpp/libruby-extpp.so: undefined reference to `rb_funcallv'
    /opt/rubies/ruby-2.6.3/lib/ruby/gems/2.6.0/gems/extpp-0.0.8/ext/extpp/libruby-extpp.so: undefined reference to `rb_call_super'
    /opt/rubies/ruby-2.6.3/lib/ruby/gems/2.6.0/gems/extpp-0.0.8/ext/extpp/libruby-extpp.so: undefined reference to `rb_ivar_get'
    /opt/rubies/ruby-2.6.3/lib/ruby/gems/2.6.0/gems/extpp-0.0.8/ext/extpp/libruby-extpp.so: undefined reference to `rb_intern'
    /opt/rubies/ruby-2.6.3/lib/ruby/gems/2.6.0/gems/extpp-0.0.8/ext/extpp/libruby-extpp.so: undefined reference to `rb_gc_unregister_address'
    /opt/rubies/ruby-2.6.3/lib/ruby/gems/2.6.0/gems/extpp-0.0.8/ext/extpp/libruby-extpp.so: undefined reference to `rb_data_typed_object_wrap'
    /opt/rubies/ruby-2.6.3/lib/ruby/gems/2.6.0/gems/extpp-0.0.8/ext/extpp/libruby-extpp.so: undefined reference to `rb_cObject'
    /opt/rubies/ruby-2.6.3/lib/ruby/gems/2.6.0/gems/extpp-0.0.8/ext/extpp/libruby-extpp.so: undefined reference to `rb_obj_class'

    These are the errors I found in the log. Any leads would be super helpful.

    Sutou Kouhei
    @kou
    Could you open an issue at https://issues.apache.org/jira/browse/ARROW ?
    I'd like you to upload the full log, but Gitter isn't suitable for that.
    Could you show how you installed red-arrow? (You showed only how you installed Apache Arrow C++ and Apache Arrow GLib.)
    Could you also show how you installed Ruby?
    Bhargav Parsi
    @bhargav265_twitter
    Yes! Thanks for getting back to me. I will upload the full log in the issue.
    I am trying to install red-arrow with gem install red-arrow
    For ruby, we used rvm and the ruby version is 2.6.3
    Bhargav Parsi
    @bhargav265_twitter
    This is how we install ruby in our containers
    ARG ruby_version=2.6.3
    ARG use_jemalloc=0
    RUN if [ "$use_jemalloc" = "1" ]; then curl -L https://github.com/fullstaq-labs/fullstaq-ruby-server-edition/releases/download/epic-1/fullstaq-ruby-${ruby_version}-jemalloc-rev0-centos-7.x86_64.rpm -o fullstaq-ruby-${ruby_version}-jemalloc-rev0-centos-7.x86_64.rpm \
          && yum install -y fullstaq-ruby-${ruby_version}-jemalloc-rev0-centos-7.x86_64.rpm \
          && mkdir /opt/rubies \
          && ln -s /usr/lib/fullstaq-ruby/versions/${ruby_version}-jemalloc /opt/rubies/ruby-${ruby_version} \
          && ln -s /opt/rubies/ruby-${ruby_version} /opt/rubies/default \
          && yum clean all; \
        else yum install -y ruby`echo ${ruby_version} | tr -d \.` \
          && ln -s /opt/rubies/ruby-${ruby_version} /opt/rubies/default \
          && mkdir -p /usr/lib/fullstaq-ruby \
          && yum clean all; \
        fi
    Sutou Kouhei
    @kou
    yum install -y ruby`echo ${ruby_version} | tr -d \.`
    doesn't work on CentOS 7.
    How do you prepare Yum configuration?
    Or are you using use_jemalloc=1?
    Bhargav Parsi
    @bhargav265_twitter
    This is the line that causes the error: https://github.com/red-data-tools/extpp/blob/9e25f1de6bbc21f0f664997bbc81ebed58c8215f/lib/extpp/compiler.rb#L111 By the way, I have tried on macOS and Linux and the installation works there. The problem is only with CentOS 7.
    Sutou Kouhei
    @kou
    I want to know how you installed Ruby.
    Let's discuss on JIRA: https://issues.apache.org/jira/browse/ARROW-10309
    Erik Fonselius
    @Fonsan
    Hey everyone, I am so excited to see this project. I noticed that a few of the initializers were out of date and posted an issue here: https://issues.apache.org/jira/browse/ARROW-10587
    I am looking to perform binary searches on columns. Since bsearch_index is tightly coupled to Ruby's Array, it is not usable here. Is there a faster/better alternative to dropping in https://github.com/boggle/bin_search or similar?
    Erik Fonselius
    @Fonsan
    [100] pry(main)> Benchmark.realtime {slow_array.each { } }
    => 1.2109970001038164
    [101] pry(main)> Benchmark.realtime {slow_array.values.each { } }
    => 0.008631999953649938
    slow_array = Arrow::Int64Array.new(100_000.times.map { rand(10_000) })
    Sutou Kouhei
    @kou
    Why do you perform binary search?
    Is "binary" search required?
    Erik Fonselius
    @Fonsan
    I am playing around with the OpenStreetMap data set. One part of it is “nodes”, which are id, lat, long, and another part is “ways”, which have lists of node references [123, 124 … ]. To construct the lat/long coordinates for a road, you must look the references up in the nodes table,
    which has billions of entries.
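A minimal sketch of the lookup described above, using plain Ruby arrays as a stand-in for the nodes table. The column names, the sample data, and the helpers `find_node` and `way_coordinates` are hypothetical. The sketch assumes the id column is sorted and has already been converted to a Ruby Array once, which may not be practical for billions of entries.

```ruby
# Hypothetical stand-in for the "nodes" table: parallel columns sorted by id.
NODE_IDS  = [101, 123, 124, 250, 999].freeze
NODE_LATS = [59.33, 59.34, 59.35, 59.40, 60.00].freeze
NODE_LONS = [18.05, 18.06, 18.07, 18.10, 19.00].freeze

# Binary-search the sorted id column and return [lat, lon], or nil if absent.
def find_node(ids, lats, lons, node_id)
  # Find-minimum mode: index of the first id that is >= node_id.
  index = ids.bsearch_index { |id| id >= node_id }
  return nil if index.nil? || ids[index] != node_id

  [lats[index], lons[index]]
end

# Resolve every node reference of a way to its coordinates.
def way_coordinates(ids, lats, lons, refs)
  refs.map { |ref| find_node(ids, lats, lons, ref) }
end

p way_coordinates(NODE_IDS, NODE_LATS, NODE_LONS, [123, 124])
```

Each lookup costs O(log n) comparisons on the Ruby side. The sketch says nothing about how to run the same search directly on an Arrow column.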
    Sutou Kouhei
    @kou
    slow_array.each needs a C data to Ruby conversion for each value.
    slow_array.values.each converts the C data to Ruby once, in slow_array.values, and the .each that follows is Array#each. Array#each doesn't need to convert C data to Ruby for each value.
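The difference between per-value and bulk conversion can be sketched in plain Ruby, without Arrow. Here a packed binary String stands in for the C-side buffer, and `each_converted` and `bulk_values` are hypothetical helpers. This is an analogy for the cost pattern, not a description of how red-arrow is implemented, and the timings will vary by machine.

```ruby
require "benchmark"

# Stand-in for a C-side buffer: 64-bit signed integers packed into one String.
values = Array.new(100_000) { rand(10_000) }
packed = values.pack("q*")
count = packed.bytesize / 8

# Per-element conversion: one unpack call for every value visited.
def each_converted(packed, count)
  return enum_for(:each_converted, packed, count) unless block_given?

  count.times { |i| yield packed.byteslice(i * 8, 8).unpack1("q") }
end

# Bulk conversion: a single unpack call that returns a plain Ruby Array.
def bulk_values(packed)
  packed.unpack("q*")
end

per_element_sum = 0
per_element_time = Benchmark.realtime do
  each_converted(packed, count) { |v| per_element_sum += v }
end

bulk_sum = 0
bulk_time = Benchmark.realtime do
  bulk_values(packed).each { |v| bulk_sum += v }
end

raise "sums differ" unless per_element_sum == bulk_sum
puts format("per-element: %.4fs, bulk: %.4fs", per_element_time, bulk_time)
```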
    Erik Fonselius
    @Fonsan
    Indeed, this is what I suspected. If it is intended, it should be added to the backlog of things to go in the documentation.
    Quite hefty with 3 orders of magnitude difference for iterating :)
    Sutou Kouhei
    @kou
    Generally, users should not iterate each value.
    Even on Ruby's Array.
    Users should use compute functions implemented in C++ for performance.
    But it's WIP.
    Erik Fonselius
    @Fonsan
    Where can I read more about it?
    Erik Fonselius
    @Fonsan
    thanks
    I take it the Group with the WIP comment is the first attempt at this?
    *Group class
    Sutou Kouhei
    @kou
    Yes. But it's difficult...
    kojix2
    @kojix2
    Sorry if this is a misguided comment. If you're looking for a fast array like NumPy, I recommend numo-narray. NArray contains the major features of NumPy and is capable of performing calculations at almost the same speed. Numo-narray is used in Andrew Kane's machine learning projects and Rumale. Of course, NArray is not a substitute for Apache Arrow, but it works fine for some purpose.
    Erik Fonselius
    @Fonsan
    thanks, I will have a look at it :)
    Tjad Clark
    @tjad
    @mrkn Regarding memory issues with GC. I think we can only try our best to avoid this problem arising. I think that a huge cause for slow execution of MXNet is Ruby's GVL. I am looking at getting MXNet Runtime to run outside the GVL, in the hope that mxnet can then process ops fast enough and fully independent of GC.
    I'm not seeing an easy route forward using callbacks.
    Kenta Murata
    @mrkn
    @tjad The problem is due to async calculation in MXNet, that is symbol APIs, isn't it? If so, we need to introduce a memory management system for keeping parameters of async call.
    Currently, I'm busy working on Ruby 3 related jobs, so I don't have much time to work on MXNet. Maybe I can help you with coding after Ruby 3 is released.
    Tjad Clark
    @tjad
    @mrkn Thank you. Yes we ideally would require memory management. However, it seems like callbacks are NOT passed back through imperative_invoke. If this is true, we would not be able to perform memory management reliably. The effort would be large and unnecessary. If callbacks are not passed back through imperative_invoke, I find it strange, that would mean no language binding is currently doing that required memory management.
    As an alternative, I thought we could speed up MXNet processing by taking it out of the context of the GVL. I think this is definitely required either way.
    Perhaps I should open a separate issue for taking MXNet out of the GVL context? I would need this specifically because I have two CPU-intensive libraries. One is EventMachine, the other is MXNet, and under the GVL these libraries contend for CPU time and ultimately slow each other down. I have observed this already.
    Tjad Clark
    @tjad
    To elaborate: they contend because both have event loops or task loops that are designed with the expectation of having enough CPU time. The GVL is a huge limitation here when two such libraries are running in a single process and each library depends on the processing speed of the other.
    In my case, when MXNet slows down (takes a long time processing), my EventMachine event loop fills up, and when my EventMachine event loop is busy, my MXNet task queue slows down, because they are contending for CPU time under the GVL.
    Kenta Murata
    @mrkn
    @tjad You mean that other Ruby threads cannot run while MXImperativeInvoke is being called?
    If so, we can simply enclose the MXImperativeInvoke call by rb_thread_call_without_gvl.
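The effect of releasing the GVL around a blocking call can be illustrated from the Ruby side. In this minimal sketch, `sleep` stands in for a native call wrapped in rb_thread_call_without_gvl (a C API), since a sleeping thread gives up the GVL so other Ruby threads can run. `blocking_call` and `run_concurrently` are hypothetical names, and nothing here touches MXNet.

```ruby
require "benchmark"

# Stand-in for a native call that releases the GVL while it blocks.
def blocking_call(seconds)
  sleep(seconds)
  :done
end

# Run `count` blocking calls in separate threads and measure the wall time.
def run_concurrently(count, seconds)
  results = nil
  elapsed = Benchmark.realtime do
    threads = Array.new(count) { Thread.new { blocking_call(seconds) } }
    results = threads.map(&:value) # Thread#value joins and returns the result
  end
  [results, elapsed]
end

results, elapsed = run_concurrently(4, 0.2)
puts format("%d calls finished in %.2fs", results.size, elapsed)
```

If the calls held the GVL for their whole duration, four 0.2-second calls would take about 0.8 seconds in total. Because the GVL is released while each call blocks, the wall time should stay close to that of a single call.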
    Tjad Clark
    @tjad
    @mrkn Correct. That's some great insight - I've been wondering where rb_thread_call_without_gvl would fit in.
    @mrkn as an aside: I also just caught up with some Ruby 3 features which may change the game for concurrency a bit in the future.
    https://rubykaigi.org/2019/presentations/yukihiro_matz.html
    Kenta Murata
    @mrkn
    @tjad It is better if we could support Ractor.
    Tjad Clark
    @tjad
    Sounds good. Probably a major version change for mxnet.rb ha ha
    Tjad Clark
    @tjad
    In Ruby 2.x, it is not at all possible to truly "free" MXNet's engine from the GVL as I imagined, not without creating a Ruby implementation of the MXNet engine, which is not desirable.
    This said, giving mxnet engine true parallel capability through Ruby 3.x Ractors as per your suggestion is the only route forward.