    Ville Tuulos
    @tuulos
    @andrew-candela cool! Please let us know if you hit any issues or if you have ideas for improvement
    jangjs1991
    @jangjs1991
    @tuulos This is really awesome. Do you possibly have a timeline for it?
    Kemal Tugrul
    @kemalty
    Hi all - I have a question about running in AWS. I have a complex project structure. The main file that has the FlowSpec uses some files from parent folders. Locally, it all works fine thanks to the PYTHONPATH env variable. However, when I move the flow to AWS, it cannot find the files in the parent folders. I did some googling and it looks like Metaflow only moves the files that are placed in child folders. Is it possible to move the whole project to AWS with Metaflow, so I can use the files/scripts that are placed in parent directories?
    ayorgo
    @ayorgo

    Hi Metaflow,
    I'm having issues with the @conda and @conda_base functionality. It doesn't seem to work at all on my machine. I've tried it with different versions of numpy and pandas and different versions of the Python interpreter. I've also reinstalled my anaconda3 from scratch but it didn't help. The library version I get seems to depend on the version of the Python interpreter, but not on what I specify in the decorator.
    My code is

    from metaflow import FlowSpec, step, conda
    
    
    class CondaFlow(FlowSpec):
    
        @conda(libraries={'numpy':'1.15.4'})
        @step
        def start(self):
            import numpy as np
            print('numpy', np.__version__)
    
            self.next(self.end)
    
        @step
        def end(self):
            pass
    
    if __name__ == '__main__':
        CondaFlow()

    I run it as follows

    python conda_flow.py --environment=conda run

    which outputs

    Metaflow 2.0.3 executing CondaFlow for user:ayorgo
    Validating your flow...
        The graph looks good!
    Running pylint...
        Pylint not found, so extra checks are disabled.
    2020-04-09 11:09:17.711 Bootstrapping conda environment...(this could take a few minutes)
    2020-04-09 11:09:42.926 Workflow starting (run-id 39):
    2020-04-09 11:09:43.940 [39/start/127 (pid 18442)] Task is starting.
    2020-04-09 11:09:44.334 [39/start/127 (pid 18442)] numpy 1.16.2
    2020-04-09 11:09:44.416 [39/start/127 (pid 18442)] Task finished successfully.
    2020-04-09 11:09:45.438 [39/end/128 (pid 18462)] Task is starting.
    2020-04-09 11:09:45.786 [39/end/128 (pid 18462)] Task finished successfully.
    2020-04-09 11:09:45.787 Done!

    What am I doing wrong?
    Thank you

    Valay Dave
    @valayDave

    @ayorgo : I tried the same thing and it is working:

    Metaflow 2.0.3 executing CondaFlow for user:valay
    Validating your flow...
        The graph looks good!
    Running pylint...
        Pylint is happy!
    2020-04-09 03:44:40.925 Bootstrapping conda environment...(this could take a few minutes)
    2020-04-09 03:46:03.408 Workflow starting (run-id 1586429163403837):
    2020-04-09 03:46:04.088 [1586429163403837/start/1 (pid 24441)] Task is starting.
    2020-04-09 03:46:04.409 [1586429163403837/start/1 (pid 24441)] numpy 1.15.4
    2020-04-09 03:46:04.466 [1586429163403837/start/1 (pid 24441)] Task finished successfully.
    2020-04-09 03:46:05.136 [1586429163403837/end/2 (pid 24461)] Task is starting.
    2020-04-09 03:46:05.428 [1586429163403837/end/2 (pid 24461)] Task finished successfully.
    2020-04-09 03:46:05.429 Done!

    Can you run which python to check the path of your Python?

    ayorgo
    @ayorgo
    @valayDave I have an alias for python which points at /usr/bin/python3.7, trying it with python3 and python3.7 made no difference. Must be something wrong with my setup.
    Savin
    @savingoyal
    @kemalty Any way you can move your FlowSpec to the parent directory?
    @ayorgo I wasn't able to reproduce the issue either
    Valay Dave
    @valayDave
    @ayorgo : You need your conda Python in your PATH
    Savin
    @savingoyal
    Can you print your $PATH?
    Valay Dave
    @valayDave
    @ayorgo : Can you put Conda in your $PATH?
    @ayorgo : As @savingoyal said
    Kemal Tugrul
    @kemalty
    @savingoyal Thank you for your reply :). Actually I can’t without breaking the repo apart. I also discovered that most of the files are not transferred to AWS instances via @batch. If there is a way to specify which directories or files to move to AWS instances using metaflow, this would solve a lot of our issues :)
    Savin
    @savingoyal
    @ayorgo Also, within your flow, in step start, can you print sys.executable?
    Savin
    @savingoyal
    @kemalty While undesirable, can you work around by symlinking the directories in a child folder of the flow?
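A minimal sketch of the symlink workaround suggested above, using only the Python standard library. The function name `link_into_flow_dir` and the directory names are hypothetical, chosen for illustration. The sketch only creates the link; whether Metaflow's code packaging follows such links depends on the Metaflow version in use and is not something this example verifies.

```python
import os
import tempfile


def link_into_flow_dir(shared_dir, flow_dir, link_name):
    """Create flow_dir/link_name as a symlink pointing at shared_dir."""
    link_path = os.path.join(flow_dir, link_name)
    # lexists is True even for a dangling link, so an existing link is never overwritten.
    if not os.path.lexists(link_path):
        os.symlink(os.path.abspath(shared_dir), link_path, target_is_directory=True)
    return link_path


if __name__ == "__main__":
    # Throwaway layout standing in for a real repository (hypothetical names).
    with tempfile.TemporaryDirectory() as repo:
        shared = os.path.join(repo, "shared")
        flow_dir = os.path.join(repo, "flows", "my_flow")
        os.makedirs(shared)
        os.makedirs(flow_dir)
        link = link_into_flow_dir(shared, flow_dir, "shared")
        print(os.path.islink(link))
```

With the link in place, code in the flow's directory can import from `shared` as if it were a child folder, at least when running locally.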
    Kemal Tugrul
    @kemalty
    @savingoyal I see. But will Metaflow follow the link and move the file from wherever it is to the folder where symlink is?
    Savin
    @savingoyal
    @kemalty Let me quickly verify.
    @kemalty Looks like we don't at the moment, but we should add that capability at the minimum
    Kemal Tugrul
    @kemalty
    @savingoyal Thank you very much! It would be nice to have that capability. I need to move some custom files to EC2 instances to make my code work in AWS. So if there were a way to move any file via Metaflow, that would be great!
    Savin
    @savingoyal
    @kemalty You can give it a try with this PR - Netflix/metaflow#177
    Kemal Tugrul
    @kemalty
    @savingoyal wow you are fast :) . I will try that out and will let you know 👍
    Savin
    @savingoyal
    Sounds good. Another approach is to use git submodules.
    Kemal Tugrul
    @kemalty
    That looks like a way too :)
    Yuki Katoh
    @yukiego
    Hi all,
    I know Airflow support is an open issue, but I'd just like to schedule a whole Metaflow flow from Airflow with something like BashOperator(bash_command='python blablah.py --environment=conda run'), without breaking it down into separate Airflow tasks.
    This ends up with the error "task log handler task does not support read logs" in the Airflow UI. It looks to me like Airflow can't find Metaflow's log signals by default.
    Are there any workarounds atm??
    Thank you!
    Savin
    @savingoyal
    @yukiego I am not super familiar with how Airflow stores logs on its end. Also, Metaflow has a bunch of dependencies (the metaflow pip package and miniconda). Are you able to specify those before executing the flow? Maybe it would be helpful to open a GitHub issue on Netflix/metaflow so that the wider community can pitch in.
    ayorgo
    @ayorgo

    @savingoyal, @valayDave thank you for your replies. It doesn't seem to matter whether I run it with ~/anaconda3/bin/python3.7 or /usr/bin/python3.7.
    The output of print(sys.executable) (right next to print('numpy', np.__version__)) is as follows

    /home/ayorgo/anaconda3/envs/metaflow_CondaFlow_linux-64_59404471a86321f42d4940162e998845a68b9e8b/bin/python

    If I activate the respective environment using conda activate it indeed has the specified version of numpy:

    (metaflow_CondaFlow_linux-64_59404471a86321f42d4940162e998845a68b9e8b) ayorgo:~$ conda list numpy
    # packages in environment at /home/ayorgo/anaconda3/envs/metaflow_CondaFlow_linux-64_59404471a86321f42d4940162e998845a68b9e8b:
    #
    # Name                    Version                   Build  Channel
    numpy                     1.15.4          py37h8b7e671_1002    conda-forge

    but somehow the code inside the step fails to see it even if I run my flow from within this activated environment.

    Savin
    @savingoyal
    @ayorgo What version of numpy do you see if you don't activate the environment? Basically, just enter the REPL for the interpreter /home/ayorgo/anaconda3/envs/metaflow_CondaFlow_linux-64_59404471a86321f42d4940162e998845a68b9e8b/bin/python and check the version of numpy.
    Also, what's your version of conda?
    And can you print the PYTHONPATH from within your step and outside your step?
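A minimal sketch of how the requested details could be gathered in one place, using only the standard library. The helper name `interpreter_report` is hypothetical; the same function can be called at module level (outside the step) and from inside a step to compare the two.

```python
import os
import site
import sys


def interpreter_report():
    """Collect the interpreter details asked for in the thread."""
    return {
        "executable": sys.executable,                     # which Python is running
        "PYTHONPATH": os.environ.get("PYTHONPATH", ""),   # empty if the variable is unset
        "sys.path": list(sys.path),                       # the actual import search order
        "user_site": site.getusersitepackages(),          # e.g. ~/.local/lib/pythonX.Y/site-packages
        "user_site_enabled": site.ENABLE_USER_SITE,       # False when -s or PYTHONNOUSERSITE is in effect
    }


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(key, "=", value)
```

Note that the PYTHONPATH environment variable and `sys.path` are different things: the former can be empty while the latter still contains user-level directories.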
    ayorgo
    @ayorgo
    $ ~/anaconda3/envs/metaflow_CondaFlow_linux-64_59404471a86321f42d4940162e998845a68b9e8b/bin/python
    Python 3.7.5 (default, Oct 25 2019, 15:51:11) 
    [GCC 7.3.0] :: Anaconda, Inc. on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import numpy as np
    >>> np.__version__
    '1.16.2'
    Weird.
    Savin
    @savingoyal
    And what's your PYTHONPATH?
    ayorgo
    @ayorgo

    From outside the step

    ['/tmp/tmpfgjx75_a', '/home/ayorgo/source/metaflow-tutorials/11-conda', '/tmp/tmpfgjx75_a', '/home/ayorgo/anaconda3/envs/metaflow_CondaFlow_linux-64_59404471a86321f42d4940162e998845a68b9e8b/lib/python37.zip', '/home/ayorgo/anaconda3/envs/metaflow_CondaFlow_linux-64_59404471a86321f42d4940162e998845a68b9e8b/lib/python3.7', '/home/ayorgo/anaconda3/envs/metaflow_CondaFlow_linux-64_59404471a86321f42d4940162e998845a68b9e8b/lib/python3.7/lib-dynload', '/home/ayorgo/.local/lib/python3.7/site-packages', '/home/ayorgo/anaconda3/envs/metaflow_CondaFlow_linux-64_59404471a86321f42d4940162e998845a68b9e8b/lib/python3.7/site-packages']

    From inside the step

    ['/tmp/tmpxfv5dcyh', '/home/ayorgo/source/metaflow-tutorials/11-conda', '/tmp/tmpxfv5dcyh', '/home/ayorgo/anaconda3/envs/metaflow_CondaFlow_linux-64_3d3fb5f54327676ef9bf2dfb5b4d0a3b8244dc31/lib/python37.zip', '/home/ayorgo/anaconda3/envs/metaflow_CondaFlow_linux-64_3d3fb5f54327676ef9bf2dfb5b4d0a3b8244dc31/lib/python3.7', '/home/ayorgo/anaconda3/envs/metaflow_CondaFlow_linux-64_3d3fb5f54327676ef9bf2dfb5b4d0a3b8244dc31/lib/python3.7/lib-dynload', '/home/ayorgo/.local/lib/python3.7/site-packages', '/home/ayorgo/anaconda3/envs/metaflow_CondaFlow_linux-64_3d3fb5f54327676ef9bf2dfb5b4d0a3b8244dc31/lib/python3.7/site-packages']

    Which are nearly identical (down to the /tmp/... bits)

    Yuki Katoh
    @yukiego
    @savingoyal sure, will open an issue in github thank you!!!
    Savin
    @savingoyal
    @ayorgo what’s the PYTHONPATH on the terminal?
    ayorgo
    @ayorgo
    It's empty. I used sys.path in python instead.
    Savin
    @savingoyal
    Okay I see the issue - /home/ayorgo/.local/lib/python3.7/site-packages is present in your sys.path before /home/ayorgo/anaconda3/envs/metaflow_CondaFlow_linux-64_59404471a86321f42d4940162e998845a68b9e8b/lib/python3.7/site-packages
    Give me some time to theorize why this might be happening
    Just curious, are you executing your flow from /home/ayorgo/? If so, can you move it to a different directory and give it another try?
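Python resolves an import from the first matching entry on sys.path, so a package in the user site directory wins over the same package in the conda environment when the user site comes first. A minimal sketch of that check follows; the function name `user_site_shadows_env` is hypothetical, and the long environment name from the pasted output is shortened to `ENV` for readability.

```python
def user_site_shadows_env(paths, user_site, env_site):
    """Return True if user_site appears in paths before env_site."""
    if user_site not in paths or env_site not in paths:
        return False
    return paths.index(user_site) < paths.index(env_site)


# Tail of the sys.path pasted in the thread; "ENV" stands in for the
# full metaflow_CondaFlow_... environment name.
ENV = "/home/ayorgo/anaconda3/envs/ENV"
USER_SITE = "/home/ayorgo/.local/lib/python3.7/site-packages"
ENV_SITE = ENV + "/lib/python3.7/site-packages"

pasted_tail = [
    ENV + "/lib/python3.7",
    ENV + "/lib/python3.7/lib-dynload",
    USER_SITE,
    ENV_SITE,
]

print(user_site_shadows_env(pasted_tail, USER_SITE, ENV_SITE))  # prints True
```

This would be consistent with numpy 1.16.2 being picked up instead of the 1.15.4 installed in the environment, assuming the newer numpy lives under ~/.local.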
    ayorgo
    @ayorgo
    Yes I run it from within home. Running it from /usr/11-conda/ doesn't change anything.
    Savin
    @savingoyal
    Can you do export PYTHONNOUSERSITE=True before running your flow?
    I am actually surprised by this behavior because we execute the user code with ../python -s to pre-empt this exact scenario
    ayorgo
    @ayorgo
    It gives
    Traceback (most recent call last):
      File "conda_flow.py", line 3, in <module>
        from metaflow import FlowSpec, step, conda
    ModuleNotFoundError: No module named 'metaflow'
    Error in sys.excepthook:
    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
        from apport.fileutils import likely_packaged, get_recent_crashes
      File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
        from apport.report import Report
      File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
        import apport.fileutils
      File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
        from apport.packaging_impl import impl as packaging
      File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 24, in <module>
        import apt
      File "/usr/lib/python3/dist-packages/apt/__init__.py", line 23, in <module>
        import apt_pkg
    ModuleNotFoundError: No module named 'apt_pkg'
    
    Original exception was:
    Traceback (most recent call last):
      File "conda_flow.py", line 3, in <module>
        from metaflow import FlowSpec, step, conda
    ModuleNotFoundError: No module named 'metaflow'
    Savin
    @savingoyal
    thanks! this is as expected. Let me do some more digging
    ayorgo
    @ayorgo
    Same goes with python -s
    Savin
    @savingoyal
    Yeah internally we copy the metaflow package into a tmp folder and execute the user code by enabling PYTHONNOUSERSITE to provide isolation against site-packages installed within the user scope
    Savin
    @savingoyal
    To complete the thread, explicitly overriding PYTHONNOUSERSITE in lieu of python -s fixes the issue reported by @ayorgo - Netflix/metaflow#178
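A minimal sketch of the two isolation mechanisms mentioned here, using only the standard library. It starts a child interpreter and reads `sys.flags.no_user_site`, which is set both by the `-s` flag and by a non-empty PYTHONNOUSERSITE. The helper name `user_site_disabled` is hypothetical, and this is not Metaflow's own code.

```python
import os
import subprocess
import sys

# One-liner run in the child: prints 1 if the user site directory is disabled.
PROBE = "import sys; print(int(sys.flags.no_user_site))"


def user_site_disabled(interpreter_args=(), extra_env=None):
    """Run a child interpreter and report whether its user site is disabled."""
    env = dict(os.environ)
    env.pop("PYTHONNOUSERSITE", None)  # start from a clean baseline
    env.update(extra_env or {})
    result = subprocess.run(
        [sys.executable, *interpreter_args, "-c", PROBE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "1"


if __name__ == "__main__":
    print("with -s:", user_site_disabled(["-s"]))
    print("with PYTHONNOUSERSITE:", user_site_disabled(extra_env={"PYTHONNOUSERSITE": "True"}))
```

One general difference between the two: `-s` applies only to the interpreter it is passed to, while an environment variable is inherited by any child processes. Whether that is the reason the environment variable worked better in this case is not stated in the thread.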
    Ville Tuulos
    @tuulos
    @jangjs1991 hard to give timelines because the work/life circumstances are quite unpredictable right now, but there's active development happening both for K8S and Sagemaker. Hopefully you will see at least PRs in the next few months
    Kemal Tugrul
    @kemalty

    Hey all - I have started to use Metaflow recently. My projects use pip as the package handler instead of anaconda, so I wrote a pip handler for @batch steps in Metaflow. Here are my functions, in case you also wanna use pip instead of conda (use at your own risk :) ):

    https://gist.github.com/kemalty/ab57aa6ea8e75ed9c70ddb0b4c5034ba

    Ville Tuulos
    @tuulos
    @kemalty thanks for sharing! We have been considering how to support pip within @conda better / safely