    I'm having issues with Metaflow not finding previous runs. I am trying to do this via a Jenkins pipeline. I have a training flow that I'm trying to reference in my inference flow, and both read in the same config file. The strange part is that the same flows/runs are available before I read in the config file as after. Before I read in the config file I get the following (setting the namespace to None to check all available flows):

    get_namespace(): None
    list(Metaflow()): [Flow('training_flow_1')]
    metaflow_config.DATASTORE_SYSROOT_S3: None
    get_metadata(): local@/home/ec2-user/workspace/models-training_staging

    After I read in the config file:

    get_namespace(): None
    list(Metaflow()): [Flow('training_flow_1')]
    metaflow_config.DATASTORE_SYSROOT_S3: s3://metaflow-staging-uat-metaflows3bucket-j8dasuvadiq/metaflow
    get_metadata(): local@/home/ec2-user/workspace/models-training_staging

    So the metadata still shows local, which I think may be related to the issue, but DATASTORE_SYSROOT_S3 is updated after the config is read in, so it is definitely reading the file. But trying to find something run in a production namespace (i.e. something I ran via Step Functions) returns an empty list.

    When I try to run my inference flow I get the following (again after reading in the config and setting namespace to none):

    get_namespace(): None
    metaflow_config.DATASTORE_SYSROOT_S3: s3://metaflow-staging-uat-metaflows3bucket-v49t2hau629c/metaflow
    get_metadata(): local@/home/ec2-user/workspace/models-inference_staging
    list(Metaflow()): []

    So it seems the issue here is that even though the config file is read in and DATASTORE_SYSROOT_S3 is set correctly, the metadata points to a local folder, which is different between training and inference. So they are isolated. I tried setting the metadata manually by using the ServiceUrl from my cloudformation stack:


    but I get the error

    Metaflow service [https://tid44ehxm0.execute-api.us-west-2.amazonaws.com/api] unreachable.

    Any idea what's going on here? Again, from the metadata and the fact that the same flows are listed before and after I read in the config file it seems like it is somehow ignoring the config settings when reading/writing flows, so I am unable to find my training run when I'm running my inference flow. Thanks
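
One way to narrow this down is to check which metadata provider the config file actually selects. The sketch below is only an illustration: it assumes the config is a JSON document using the `METAFLOW_DEFAULT_METADATA`, `METAFLOW_SERVICE_URL` and `METAFLOW_DATASTORE_SYSROOT_S3` keys, and that an absent provider key means local metadata. `diagnose_metadata_config` is a hypothetical helper, not part of Metaflow, and the bucket name is a placeholder.

```python
import json


def diagnose_metadata_config(config):
    """Return human-readable warnings about a Metaflow-style config dict.

    Hypothetical helper: it only inspects keys and never contacts a service.
    """
    warnings = []
    # Assumption: without this key, metadata is kept in a local directory.
    provider = config.get("METAFLOW_DEFAULT_METADATA", "local")
    if provider != "service":
        warnings.append(
            "metadata provider is %r, so runs are recorded in a local "
            "directory and are not shared between workspaces" % provider
        )
    elif not config.get("METAFLOW_SERVICE_URL"):
        warnings.append(
            "metadata provider is 'service' but METAFLOW_SERVICE_URL is missing"
        )
    if not config.get("METAFLOW_DATASTORE_SYSROOT_S3"):
        warnings.append("METAFLOW_DATASTORE_SYSROOT_S3 is not set")
    return warnings


# Example: the S3 datastore is configured, but no metadata provider is selected.
example = json.loads(
    '{"METAFLOW_DATASTORE_SYSROOT_S3": "s3://example-bucket/metaflow"}'
)
for line in diagnose_metadata_config(example):
    print(line)
```

A config that sets only the S3 datastore would match the symptoms above: artifacts go to S3 while each Jenkins workspace keeps its own local metadata.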

    10 replies
    Snehal Shirgure
    Hello folks, has anyone looked into inspecting data and metrics from runs using a visualization tool such as ipywidgets in Jupyter notebooks? Any suggestions/ideas along the same lines are welcome :)
    3 replies
    Malay Shah
    Hey guys, I am looking into the implementation of metaflow and how metaflow interacts with batch and other services of aws. I wanted to look at the code for that but could not find the code related to the same. Can anyone point me to the script or class that handles all the interaction with aws? Thank you very much.
    2 replies
    Are there any plans to allow batch steps to reuse the same container? At the moment it feels like we get a massive slowdown when moving from local runs to batch runs, because each step suddenly requires the whole batch scheduling, drawdown of containers, etc. I am keen to have the flexibility of demanding extra resources for a particular step, but oftentimes consecutive steps don't need additional resources. [Obviously one can collapse these steps together, but then one loses the whole retry functionality.]
    19 replies
    Ayotomiwa Salau
    Hello, I was trying out metaflow in a notebook, I got this error "Flow('PlayListFlow') does not exist". I can't find a way to instantiate/create a flow in a notebook.
    2 replies
    Kyle Smith
    Hello, I'm doing some initial work for a manual deployment. It's important that we only use a private subnet. Is this feasible? Why does the default installation include a public subnet?
    3 replies
    Elham Zamansani
    Hey guys, I have a problem importing the environment decorator. I guess there is a bug there, because when I import it as follows: from metaflow import FlowSpec, step, environment, it gives an error that environment is not callable. That makes sense, because with this import Metaflow resolves the name to the environment.py script. I did a small test: if I change the name of environment in line 27 of environment_decorator.py to anything else and then import that, it works. Could you please check it, or correct me if I am missing something regarding the import?
    9 replies
    Analysing job run times. Hi, we would be interested in monitoring AWS Batch run times, ideally within CloudWatch. https://docs.aws.amazon.com/batch/latest/userguide/batch_cwet.html provides a very useful stream of information. Metaflow provides e.g. run_id (https://github.com/Netflix/metaflow/blob/04881c58c22e4e7e66a4faa7f676fcfca454c027/metaflow/plugins/aws/batch/batch.py#L127), which appears in this stream. My question is how we can cross-reference this to e.g. run parameters, so that we can get aggregate statistics for a given parameter.
    10 replies
    Robert Sandmann
    Dear metaflow team, first of all I want to thank you for this great piece of technology! It's quite amazing how insanely easy you make it for our data scientists to define their workflows, especially when you look at the internals on how you realize that, great work!
    A use case that we are currently trying to implement is using Metaflow to set up a workflow for federated learning. That means that we do the client training in a foreach for every client, which works great. The problem now is that we need to repeat these federated training rounds, which introduces cycles into the DAG.
    Our current approach is to monkeypatch metaflow internals to allow cycles in the DAG and dynamically add new steps for every round using a custom FlowDecorator.
    This approach seems rather hacky (and is not yet quite working).
    Branch specific concurrency (Netflix/metaflow#172) or graph composition (https://github.com/Netflix/metaflow/issues/144) might make our lives easier.
    But I was wondering if you had any ideas on how to make this possible in the current state of metaflow. I'm grateful for any hints!
    2 replies
    Patrick John Chia

    Hello! @christineyu-coveo and I have been using Metaflow recently and really enjoy it. We also face another issue related to using @batch and @environment.

    Consider the following

    def step_A(self):
    def step_B(self):

    Metaflow initializes decorators for all steps before running any step. For @environment this includes running step_init, where it updates the environment variables
    based on the vars passed in the decorator. Following the above flow, when we are running step_A, the environment decorator for step_B will also be initialized, and an exception will occur because var_2 is None in the batch environment for step_A, since it was not included in the @environment decorator for step_A. Our current fix involves disabling step_init entirely for @environment. While this works for our use case (i.e. >1 @batch steps, with use of @environment in either or both @batch steps), I suspect this might disable some of the other use cases of @environment. Do you have any alternate solutions to this problem? Perhaps the batch decorator could be modified to also allow for inclusion of environment variables that we want to ship with the job.
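
The failure mode described above can be reproduced without Metaflow: assigning `None` into `os.environ` raises a `TypeError`. The sketch below is a hypothetical stand-in for the decorator's initialization (`apply_env_vars` is not Metaflow code, and the variable names are illustrative) that skips unset values instead of failing.

```python
import os


def apply_env_vars(env_vars, environ=None):
    """Copy env_vars into environ, skipping entries whose value is None.

    Hypothetical stand-in for an @environment-style initializer.
    Returns the names that were skipped.
    """
    if environ is None:
        environ = os.environ
    skipped = []
    for name, value in env_vars.items():
        if value is None:
            # A variable meant for another step is simply absent here.
            skipped.append(name)
            continue
        environ[name] = str(value)
    return skipped


# While step_A runs, step_B's variable is not set, so its lookup yields None.
target = {}
vars_for_step_b = {"var_2": os.environ.get("var_2_that_is_not_set")}
print(apply_env_vars(vars_for_step_b, target))
```

Skipping `None` values keeps the initialization of unrelated steps from aborting the current one, at the cost of silently ignoring a variable that was expected to be set.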

    7 replies

    Metaflow could not install or find CUDA in a GPU environment, and PyTorch could not use the GPU at all. The issue was marked as resolved in Netflix/metaflow#250, but I could not replicate the fix.

    sample code test_gpu.py I used

    from metaflow import FlowSpec, step, batch, IncludeFile, Parameter, conda, conda_base


    class TestGPUFlow(FlowSpec):
        @batch(cpu=2, gpu=1, memory=2400)
        @conda(libraries={'pytorch': '1.5.1', 'cudatoolkit': '10.1.243'})
        @step
        def start(self):
            import os
            import sys
            import torch
            from subprocess import call
            # Report toolchain and library versions visible inside the container
            print(os.popen("nvcc --version").read())
            print('__Python VERSION:', sys.version)
            print('__pyTorch VERSION:', torch.__version__)
            print('__CUDA VERSION')
            print('__CUDNN VERSION:', torch.backends.cudnn.version())
            print('__Number CUDA Devices:', torch.cuda.device_count())
            # Argument list was truncated in the original post
            call(["nvidia-smi", "--format=csv"])
            print('Active CUDA Device: GPU', torch.cuda.current_device())
            print('Available devices ', torch.cuda.device_count())
            print('Current cuda device ', torch.cuda.current_device())
            print(f"GPU count: {torch.cuda.device_count()}")
            self.next(self.end)

        @step
        def end(self):
            pass


    if __name__ == "__main__":
        TestGPUFlow()

    cmd line I used

    USERNAME=your_name CONDA_CHANNELS=default,conda-forge,pytorch METAFLOW_PROFILE=your_profile AWS_PROFILE=your_profile python test_gpu.py --datastore=s3 --environment=conda run --with batch:image=your_base_image_with_cuda_support

    metaflow output

    2021-03-10 18:38:13.783 [82/start/876 (pid 8796)] [2bb1b538-fe24-4174-9066-94fe53629e38] | NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: N/A      |
    2021-03-10 18:38:13.783 [82/start/876 (pid 8796)] [2bb1b538-fe24-4174-9066-94fe53629e38] |-------------------------------+----------------------+----------------------+
    2021-03-10 18:38:13.784 [82/start/876 (pid 8796)] [2bb1b538-fe24-4174-9066-94fe53629e38] | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    2021-03-10 18:38:13.784 [82/start/876 (pid 8796)] [2bb1b538-fe24-4174-9066-94fe53629e38] | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    2021-03-10 18:38:13.784 [82/start/876 (pid 8796)] [2bb1b538-fe24-4174-9066-94fe53629e38] |                               |                      |               MIG M. |
    2021-03-10 18:38:13.785 [82/start/876 (pid 8796)] [2bb1b538-fe24-4174-9066-94fe53629e38] |===============================+======================+======================|
    2021-03-10 18:38:13.785 [82/start/876 (pid 8796)] [2bb1b538-fe24-4174-9066-94fe53629e38] |   0  Tesla V100-SXM2...  Off  | 00000000:00:1E.0 Off |                    0 |
    2021-03-10 18:38:13.785 [82/start/876 (pid 8796)] [2bb1b538-fe24-4174-9066-94fe53629e38] | N/A   43C    P0    41W / 300W |      0MiB / 16160MiB |      0%      Default |
    2021-03-10 18:38:13.786 [82/start/876 (pid 8796)] [2bb1b538-fe24-4174-9066-94fe53629e38] |                               |                      |                  N/A |
    2021-03-10 18:38:13.786 [82/start/876 (pid 8796)] [2bb1b538-fe24-4174-9066-94fe53629e38] +-------------------------------+----------------------+----------------------+
    2021-03-10 18:38:13.787 [82/start/876 (pid 8796)] [2bb1b538-fe24-4174-9066-94fe53629e38]
    2021-03-10 18:38:13.787 [82/start/876 (pid 8796)] [2bb1b538-fe24-4174-9066-94fe53629e38] +-----------------------------------------------------------------------------+
    2021-03-10 18:38:13.787 [82/start/876 (pid 8796)] [2bb1b538-fe24-4174-9066-94fe53629e38] | Processes:                                                                  |
    2021-03-10 18:38:13.788 [82/start/876 (pid 8796)] [2bb1b538-fe24-4174-9066-94fe53629e38] |  GPU   GI   CI        PID   Type   Process name                  GPU M

    Any idea what's wrong?

    19 replies
    Jacopo Tagliabue
    Hi MF community, small request for feedback! We just posted a brief article with code on re-imagining model cards in a DAG-first world. Looking for honest feedback: if you like "DAG cards", we may invest some time in building a configurable package and release it (ping me anytime).
    4 replies
    Taleb Zeghmi
    Ayotomiwa Salau
    Hello guys,
    I'm battling this connection error. I restarted ports 443 and 8080, to no avail.
    requests.exceptions.ConnectionError: HTTPSConnectionPool(host='42xg9kw0rk.execute-api.us-east-1.amazonaws.com', port=443): Max retries exceeded with url: /api/flows/MovieStatsFlow (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fec6ace2358>: Failed to establish a new connection: [Errno -2] Name or service not known',))
    4 replies
    Anirudh Kaushik
    Metaflow seems to ignore @retry decorators on a @catch'd step if it's followed by a @catch'd step. If I've got start -> @retry @catch A -> @catch B -> end, and step A raises an exception, A won't retry at all. The flow goes to step B. Is this normal?
    5 replies
    Vishal Siramshetty

    Hi all,

    I'm having an issue with IncludeFile. When I'm trying to pass my training data file as an input, it throws an error:

    AttributeError: 'bytes' object has no attribute 'path'

    I am trying to read the file in one of the steps using Pandas. I'd really appreciate any suggestions to deal with this issue.

    Thank you,
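
A minimal sketch of one way around this, assuming the included artifact holds the file's contents (as `str` or `bytes`) rather than a filesystem path. `rows_from_included` is a hypothetical helper that uses only the standard library.

```python
import csv
import io


def rows_from_included(content, encoding="utf-8"):
    """Parse CSV contents held in memory into a list of row dicts.

    Accepts either bytes or str, since the artifact is assumed to carry
    the file's contents rather than a path.
    """
    if isinstance(content, bytes):
        content = content.decode(encoding)
    return list(csv.DictReader(io.StringIO(content)))


print(rows_from_included(b"x,y\n1,2\n3,4\n"))
```

With Pandas the same idea applies: wrap the decoded text in `io.StringIO` and pass that to `pd.read_csv` instead of a path.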

    8 replies
    Richard Puckett
    Is there a best-practice way to deploy Metaflow into an environment that has no inbound access from the Internet? Everything would be run from within the VPC. Thanks!
    3 replies
    Ryan Chui

    In the step function role in the metaflow cloudformation template:

            - PolicyName: AllowCloudwatch
              PolicyDocument:
                Version: '2012-10-17'
                Statement:
                  - Sid: CloudwatchLogDelivery
                    Effect: Allow
                    Action:
                      - "logs:CreateLogDelivery"
                      - "logs:GetLogDelivery"
                      - "logs:UpdateLogDelivery"
                      - "logs:DeleteLogDelivery"
                      - "logs:ListLogDeliveries"
                      - "logs:PutResourcePolicy"
                      - "logs:DescribeResourcePolicies"
                      - "logs:DescribeLogGroups"
                    Resource: '*'

    What is the action logs:PutResourcePolicy used for?

    8 replies
    Corrie Bartelheimer

    Hey, I want to dynamically change the required resources for a step and found this example Netflix/metaflow#431 for a workaround:

    @resources(cpu=8, memory=os.environ['MEMORY'])

    and then starting the flow with MEMORY=16000 python myflow.py run. This works fine locally but fails when running with batch. Am I missing something?
    Or is there any other way to change the resources using parameters or similar without creating different sized steps?
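
One possible cause, offered as an assumption rather than a confirmed diagnosis: decorator arguments are evaluated whenever the flow file is imported, including inside the remote container, where `MEMORY` may not be set, so `os.environ['MEMORY']` raises a `KeyError`. A sketch of a lookup with a fallback (`resource_from_env` is a hypothetical helper, and the default of 4000 is illustrative):

```python
import os


def resource_from_env(name, default, environ=None):
    """Read an integer resource setting from the environment.

    Falls back to `default` when the variable is missing or empty,
    instead of raising KeyError at import time.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None or raw == "":
        return default
    return int(raw)


print(resource_from_env("MEMORY", 4000, {"MEMORY": "16000"}))
print(resource_from_env("MEMORY", 4000, {}))
```

The decorator line would then read `@resources(cpu=8, memory=resource_from_env('MEMORY', 4000))`. Whether the value chosen at launch is the one the batch job actually receives is worth verifying on a test run.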

    9 replies
    Greg Hilston

    Hey @savingoyal and other Metaflow developers, some colleagues and I are getting to the point where we'll have a PR ready for the metaflow-tools repo. This PR will add a deployable Terraform stack.

    We've read through the CONTRIBUTING.md file and found this older issue that documents asking for a Terraform stack:


    Our goal is to have this PR submitted by the end of this week and just wanted to start the dialogue with you guys. Super excited to see what happens :)

    7 replies

    Hi Guys,

    Could you provide some sort of diagram of the AWS resources required to run Metaflow in the cloud? The CloudFormation template is not very helpful; the YAML file is huge.

    4 replies
    Antoine Tremblay
    Hi, I just realized that Metaflow doesn't show output that comes from the standard logging module: calls like logging.info("something") are not printed out. Is there a way to make those print?
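
Independently of Metaflow, Python's root logger defaults to the WARNING level, so `logging.info(...)` is dropped unless logging is configured. The sketch below routes INFO messages to stdout; that this is the whole explanation under Metaflow is an assumption, and `configure_stdout_logging` is a hypothetical helper.

```python
import logging
import sys


def configure_stdout_logging(level=logging.INFO, stream=None):
    """Attach a stream handler to the root logger and lower its level."""
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    root = logging.getLogger()
    root.setLevel(level)
    root.addHandler(handler)
    return handler


configure_stdout_logging()
logging.info("something")
```

Calling the helper at the top of a step (or once at module import) would be one place to try it.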
    2 replies
    David Patschke
    Is there a way to launch a Flow run via command-line with --namespace that sets the run to the global namespace?
    I tried the suggestion in the CLI help recommending the empty string (--namespace=) but when I run get_namespace() within the Flow (or current.namespace), I'm still getting the user namespace. I also tried setting it to --namespace=None but that uses the string 'None' vs. NoneType. As per the Metaflow docs, I'm hesitant to hardcode namespace(None) into my code as a workaround.
    5 replies
    Ayotomiwa Salau
    I'm battling this server connection error. I restarted ports 443 and 8080, to no avail.
    requests.exceptions.ConnectionError: HTTPSConnectionPool(host='42xg9kw0rk.execute-api.us-east-1.amazonaws.com', port=443): Max retries exceeded with url: /api/flows/MovieStatsFlow (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fec6ace2358>: Failed to establish a new connection: [Errno -2] Name or service not known',))
    I pip-installed Metaflow locally. Please assist.
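
`[Errno -2] Name or service not known` is a DNS resolution failure: the hostname could not be resolved, so no connection to any port was attempted. A quick check using only the standard library (`can_resolve` is a hypothetical helper name):

```python
import socket


def can_resolve(hostname, port=443):
    """Return True if the hostname resolves to at least one address."""
    try:
        socket.getaddrinfo(hostname, port)
    except socket.gaierror:
        return False
    return True


# A numeric address always resolves; substitute the host from the service URL.
print(can_resolve("127.0.0.1"))
```

If the service host from the config does not resolve, the URL itself (for example a deleted or mistyped API Gateway endpoint) is the first thing to check.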
    17 replies
    David Patschke
    Would it be possible to upgrade the version of pylint that Metaflow requires for the next release?
    It is currently at < 2.5.0.
    I've started to get a nasty pylinting error that is associated with pandas. This issue appears to be resolved in pylint 2.7. Here is a link to the issue I'm experiencing: PyCQA/pylint#3836
    4 replies
    Richard Puckett
    Sorry if I missed any previous questions on this, but I'm curious if I'm doing something wrong here. Seems execution stops after instantiating TestFlow (see gist). Is this expected behavior? Thanks. https://gist.github.com/rapuckett/b5355828695d1f7711400ddd837c5ede
    13 replies
    Sam Petulla
    Does anyone know the expected date for supporting Sagemaker Models?
    5 replies
    Anyone have any good dashboard notebooks for inspecting runs in progress?
    I've just been using the one from the metaflow tutorial, but it's pretty barebones and I'm wondering if anyone came up with something nicer
    6 replies
    Hi guys.
    I created a step function for my flow and specified CPU and RAM for the steps using the resources decorator, but after the run I noticed that it ignores values that are lower than the Metaflow defaults (cpu: 1, memory: 4000); more precisely, it runs on the default container. The situation is different when using the batch decorator, which does allow smaller containers. Is anyone else facing this issue, or does anyone know why we can't specify smaller resources when running a flow as a step function?
    7 replies

    Hey All, Quick question,

    Does anyone know if there are any PyCharm settings or PyCharm plugins for having your IDE check whether all the right dependencies have been loaded at the @step level?

    1 reply

    I am trying to run metaflow inside a docker container, but I have to run it as non-root user. When I try to import with

    "metaflow configure import metaflow_config/config.txt"

    I get "PermissionError: [Errno 13] Permission denied: '/.metaflowconfig'"

    I have tried changing the permissions with chown and chmod, currently set to

    drwxrwxrwx 2 1000 1000 6 Apr 5 19:01 .metaflowconfig

    But no luck. Can I run metaflow inside a docker container without being a root user?
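
The path `/.metaflowconfig` suggests the home directory resolved to `/`, which can happen when the container user has no usable `HOME`. The sketch below assumes Metaflow takes its config directory from the `METAFLOW_HOME` environment variable and otherwise falls back to `~/.metaflowconfig`; `resolve_config_dir` is a hypothetical illustration of that lookup, not Metaflow's code.

```python
import os


def resolve_config_dir(environ):
    """Illustrate where a ~/.metaflowconfig-style directory would land."""
    explicit = environ.get("METAFLOW_HOME")
    if explicit:
        return explicit
    # Without HOME, assume the home directory falls back to the root.
    home = environ.get("HOME") or "/"
    return os.path.join(home, ".metaflowconfig")


print(resolve_config_dir({"HOME": "/"}))
print(resolve_config_dir({"HOME": "/", "METAFLOW_HOME": "/tmp/metaflowconfig"}))
```

Under those assumptions, pointing `HOME` or `METAFLOW_HOME` at a directory the non-root user can write to before running `metaflow configure import` may avoid the error.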

    6 replies
    Can you inherit and extend a flow?
    I'd like to reuse the entire flow, and just override one of the steps
    4 replies
    When trying to do this, Metaflow complains about steps being missing whenever I try to do "self.next" to a step that exists in the parent flow class
    Has anyone been able to debug in Pycharm using the conda decorators, and not gotten a "No conda installation found. Install conda first" message? I followed the instructions in the documentation (using env var PATH with path to conda executable) but it still doesn't recognize it
    20 replies
    Corrie Bartelheimer
    Hey everyone, one question regarding metaflow on step functions. Is it possible to somewhere see when a flow was last updated, i.e. redeployed?
    2 replies
    Sam Petulla
    Is it possible to remove any services from the cloudformation template? The services list is pretty extensive, wondering if any can be removed for cost-savings
    5 replies
    Sam Petulla

    Separately, anyone had this issue?

    I can run aws s3 cp (my file) (remote bucket) and it picks up the $AWS_PROFILE variable correctly (with an alias set on the aws command to do so). However, running METAFLOW_PROFILE=personal python 05-helloaws/helloaws.py --datastore=s3 run gives me a token expired error, which I think is because it is using the wrong profile. Any tips on how to debug this without just switching profile names? I will need to use a named profile.

    7 replies
    Oleg Avdeev

    I've created Netflix/metaflow#473 to stop supporting Python 2.x from the next Metaflow release, mostly because it will allow type annotations from Python 3 and make the codebase more contributor-friendly.

    It seems like a pretty conservative move given that Python 2.7 was EOL'ed more than a year ago. But I'm curious if anyone here is still using Metaflow with Python 2.7 and would be affected by this change?


    I am getting internal server errors when I add 'METAFLOW_DEFAULT_METADATA': 'service' to my config file. My config file contains METAFLOW_SERVICE_URL, METAFLOW_SERVICE_INTERNAL_URL and METAFLOW_SERVICE_AUTH_KEY and I have verified they match what is in the cloudformation stack output.

    When I try to run a script locally with

    python inference-flow.py --environment=conda --datastore=s3 run

    I get the following

    Bootstrapping conda environment...(this could take a few minutes)
        Metaflow service error:
        Metadata request (/flows/inference-flow) failed (code 500): {"message": "Internal server error"}

    If I try to run step functions create I get the following:

    Running pylint...
        Pylint is happy!
    Deploying inference_flow to AWS Step Functions...
        Internal error
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/site-packages/metaflow/cli.py", line 930, in main
        start(auto_envvar_prefix='METAFLOW', obj=state)
      File "/usr/local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
        return self.main(args, kwargs)
      File "/usr/local/lib/python3.7/site-packages/click/core.py", line 782, in main
        rv = self.invoke(ctx)
      File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
        return ctx.invoke(self.callback, ctx.params)
      File "/usr/local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
        return callback(args, kwargs)
      File "/usr/local/lib/python3.7/site-packages/click/decorators.py", line 33, in new_func
        return f(get_current_context().obj, args, kwargs)
      File "/usr/local/lib/python3.7/site-packages/metaflow/plugins/aws/step_functions/step_functions_cli.py", line 88, in create
      File "/usr/local/lib/python3.7/site-packages/metaflow/plugins/aws/step_functions/step_functions_cli.py", line 120, in check_metadata_service_version
        version = metadata.version()
      File "/usr/local/lib/python3.7/site-packages/metaflow/plugins/metadata/service.py", line 41, in version
        return self._version(self._monitor)
      File "/usr/local/lib/python3.7/site-packages/metaflow/plugins/metadata/service.py", line 288, in _version
        (path, resp.status_code, resp.text),
    NameError: name 'path' is not defined

    list(Metaflow()) gives the following

    Traceback (most recent call last):
      File "cleanup.py", line 14, in <module>
      File "/usr/local/lib/python3.7/site-packages/metaflow/client/core.py", line 245, in __iter__
        all_flows = self.metadata.get_object('root', 'flow')
      File "/usr/local/lib/python3.7/site-packages/metaflow/metadata/metadata.py", line 357, in get_object
        return cls._get_object_internal(obj_type, type_order, sub_type, sub_order, filters, *args)
      File "/usr/local/lib/python3.7/site-packages/metaflow/plugins/metadata/service.py", line 116, in _get_object_internal
        return MetadataProvider._apply_filter(cls._request(None, url), filters)
      File "/usr/local/lib/python3.7/site-packages/metaflow/plugins/metadata/service.py", line 247, in _request
    metaflow.plugins.metadata.service.ServiceException: Metadata request (/flows) failed (code 500): {"message": "Internal server error"}
    script returned exit code 1

    Any ideas what might be happening?

    16 replies
    Mike Bentley Mills
    Hi, I'm wondering if there is a way of seeing the code packages that were saved? Specifically, can you look at them using the Metaflow Run and Task modules?
    8 replies
    Edvin Močibob
    Hi, is there a way to inspect a past run and see the parameters (metaflow.Parameter) it was started with?
    2 replies
    This has probably been asked before but, can you do conditional self.next ?
    3 replies
    if x:
    Sorry, that's a rhetorical question. I guess I meant: what's the recommended way to do something like that? cc @tuulos
    10 replies