Where communities thrive


  • Join over 1.5M+ people
  • Join over 100K+ communities
  • Free without limits
  • Create your own community
People
Activity
    Rahul Padmanabhan
    @rahul3
    or are there any other tests you would run?
    Aaron Radzinski
    @aradzinski
    You should also just run the tests from the project
    Mvn clean package verify
    Rahul Padmanabhan
    @rahul3
    sounds good
    I'm building old examples (previous versions) because the current ones, don't seem to be able to be built
    skhdl
    @skhdl
    About maven
    from folder "nlpcraft-examples/alarm" command "mvn clean package" works too
    Rahul Padmanabhan
    @rahul3
    Thanks Sergey. You were right, after a day of tearing my hair out, I realized it was a bad download of a dependency in the .m2 folder. Deleting ~/.m2/repository and re-building solved my problem.

    @skhdl and @aradzinski - Question for you: There are no jupyter dependencies in the python side.

    Do you also want this part of the project to be functional? I.e. enabling someone to run the jupyter notebook here: src/main/python/ctxword/jupyter ?

    I see the folder exists with the files, but nothing is there to make it run. The only dependency I remember seeing was matplotlib
    Aaron Radzinski
    @aradzinski
    I think it would be nice
    Rahul Padmanabhan
    @rahul3
    I'll open a ticket for that
    Rahul Padmanabhan
    @rahul3

    Created pull request for some inter-related JIRAs. Basically for Python. [apache/incubator-nlpcraft#18].

    It is pretty big and I added information in the pull request. Please review.

    Let me know if you want me to send an email on the record to the dev-group explaining how the python side would work from now. It is a pretty big difference from what was happening earlier.

    Aaron Radzinski
    @aradzinski
    I think in cases like this (big architectual change) - the email to dev list is an excellent idea
    we rarelly do it but I think it deserves the attention
    Rahul Padmanabhan
    @rahul3
    I just realized why this is guaranteed to fail in Windows. Working on it.
    The build, not the architectural change
    skhdl
    @skhdl
    I have added a few questions and comments for this PR
    Rahul Padmanabhan
    @rahul3
    Thanks. I’ll reply to them in the next 24/48 hours
    Rahul Padmanabhan
    @rahul3
    Replied to it. It is a long reply, don’t hate me please. I didn’t know how else to answer those questions without going into a little detail :)
    Rahul Padmanabhan
    @rahul3

    Prior to the pull request 18, there have been several issues on the Python side such as the ctxserver not starting. Bert failing on systems with CUDA (GPU) etc, system version of Python used etc. How do we proceed with the pull request 18?

    I ask because, it is difficult to fix everything at once, given that there are a lot of existing issues on the Python side. I would recommend that they be separate tickets etc. I've replied on the pull request in detail.

    skhdl
    @skhdl
    I agree with your position about bunch of small tasks instead of one big task, but if we merge some intermediate work result in master we can't release new product version
    until all related task are not ready. So I guess we should keep this part of work in separated branch now.
    Rahul Padmanabhan
    @rahul3

    Maybe I’m mistaken but I’m not sure if I understand the difference in working functionality of the python side before and after the pull request.

    Can you please raise tickets about what exactly is left? I’m not certain what the success criteria is.

    The build works in Maven from what I have checked.

    I had tried before and the ctxserver never worked ( even after manual start, it wouldn’t work on any system with a gpu and cuda) and spacy server had to be manually started up to function as well. Was the python side working for anyone prior to the pull request?

    what is different between the previous working state of the python side of the project before and after this change?

    skhdl
    @skhdl
    Master branch - ctxServer works for me for macos (without gpu), for @aradzinski for win (with gpu)
    Of course master branch version contains a lot of problems, if it doesn't work on your environment.
    skhdl
    @skhdl
    Also very important that python part is optional in master branch. Mentors can test it not having python env (python tests skipped for them)
    Apache checks work.
    Rahul Padmanabhan
    @rahul3
    Thanks for the confirmation. Let me know what exactly you need me to do on the pull request for it to be able to merge it and I’ll get to work.
    And try to finish it as fast as I can :)
    I still don’t think the python part should be optional given that the ctxserver needs to be used as a core part of the project though.
    skhdl
    @skhdl
    I'll review current PR state tomorrow and will prepare very short final remarks list (if it will be necessary)
    Note please that ctxserver is core part but optional
    Aaron Radzinski
    @aradzinski
    Gents - keep in mind that python part must be 100% optional from usability standpoint....
    NLPCraft is a Scala/Java project that has optional component that requires Python
    all default tool chain must work/build/compile without any Python-related installation or requirements
    Let's not lose the sight of that as we are improving our Python support
    Rahul Padmanabhan
    @rahul3

    Thanks for clarifying. It may be a good idea to have a build switch property that enables/disables python component builds in the future.

    On a différent note, if you could point me to any easy/medium complexity Scala tickets to work on, I would appreciate it.

    Aaron Radzinski
    @aradzinski
    how about NLPCRAFT-361
    skhdl
    @skhdl

    I still didn't do detailed code review.
    I just tried to review how it works and want to shortly describe my remarks which should be fixed in current PR.
    After we 'll do detailed code review code, etc

    If somebody has another opinion - please let's discuss.
    I can move this discussion to devList, @aradzinski - what do you think?

    1) python part should be optional.

    • required libs for project are just: java 11 and maven.
    • mvn clean package/verify/install should work as in master branch without python stuff.
    • Apache actions should work https://github.com/apache/incubator-nlpcraft/actions
    • ctx server and spacy are disabled by default.
      So, by default everything should be same as in master branch and requires installed java and maven only.

    2) python stuff usage.
    there are few python stuff installations ways:

    • via scripts,
    • via maven to simplify OS depended issues.
      if maven way is prefered - it should be processed with some parameter - like mvn clean package/verify/install -some_parameter_use_python (if it is possible with maven)
      • required libs for project: java 11, maven, python, conda (? -is it possible to install it on backstage?)

    (@rahul3, -Dauto-rm-existing-python-env= - is good thinks I guess, but it is about general approach)

    3) main current remarks about implementation.
    3.1) We can't keep such progress bar in out installation output
    (Look please at attached file)
    How to avoid it - another question, but our final installation product can't print it I guess

    3.2) Also we have to try to find some way to avoid warning like

    python/fasttext_module/fasttext/pybind/fasttext_pybind.cc:346:35: warning: comparison of integers of different signs: 'int32_t' (aka 'int') and 'std::__1::vector<long long>::size_type' (aka 'unsigned long') [-Wsign-compare]
        for (int32_t i = 0; i < vocab_freq.size(); i++) {

    I see that some of these things are out of our control - but we have to do some workarounds, change our approaches etc - we can't provide it to our users (Especially progressbar)

    3.3) start_server.sh
    Duplicated output - should be fixed (it is still here)

    CUDA is not available for Torch.
    CUDA is not available for Torch.
    [2021-08-23 12:52:42]: 80055 DEBUG Registering ctxserver blueprint
    [2021-08-23 12:52:42]: 80057 DEBUG Registering ctxserver blueprint

    3.3) server starts too slowly comparing to previous version

    I think 3.3 and 3.4 - significant issues and should be resolved in this ticket (we can't merge implementations with such degradation)

    Conclusions:
    I guess after these 6 points (1, 2, 3.1-3.4) resolving, we can start deep code review and detailed tests on various OS.
    Other improvements etc can be moved on next tickets.

    skhdl
    @skhdl
    current progress bar output.png
    skhdl
    @skhdl
    above - current progress bar output
    Aaron Radzinski
    @aradzinski
    we need to move this discussion to dev list - otherwise too many places to look
    I'm already getting lost
    skhdl
    @skhdl
    done, @rahul3 - do you receive devList email messages ?
    Rahul Padmanabhan
    @rahul3
    Yes I do
    Will check it in detail after work today . Say in 24/48 hours
    Aaron Radzinski
    @aradzinski
    👌
    Rahul Padmanabhan
    @rahul3
    Sorry. Need more time. Slammed at work. Will check by this weekend
    Aaron Radzinski
    @aradzinski
    no worry! Free time always ebbs and flows…
    Rahul Padmanabhan
    @rahul3
    FYI : I’m slammed with schoolwork as semester started (apart from my job) . I’ll resume this task and also contribute more when I have free time. Likely December but may be before.
    Aaron Radzinski
    @aradzinski
    N/p
    We are embarking on rework on core APIs- so it is a good timing