    Marco Occhialini
    from sklearn.metrics import log_loss, accuracy_score
    import numpy as np
    from metaflow import FlowSpec, step, Parameter
    from utils.model_utils import *

    class TreeModel(FlowSpec):
        """This flow evaluates a number of machine learning algorithms."""

        evals = Parameter('evals',
                          help='Hyperopt Evaluations')

        @step
        def start(self):
            print("Loading models and data...")
            self.X_train, self.X_test, self.y_train, self.y_test = load_train_test()
            self.models = load_pipe_parameters()
            self.next(self.train, foreach='models')

        @step
        def train(self):
            self.tup = self.input
            name, model, space, path = self.tup
            self.model = model
            # best = optimize(objective=self.to_be_optimized,
            #                 space=space)
            self.model.fit(self.X_train, self.y_train)
            print('finished process for %s' % name)
            self.next(self.extract_metrics)

        @step
        def extract_metrics(self):
            name, model, *_ = self.tup
            pred = model.predict(self.X_test)
            acc = accuracy_score(self.y_test, pred)
            log = log_loss(self.y_test, pred)
            print('{} log loss: {}'.format(name, log))
            print('{} accuracy: {}'.format(name, acc))
            self.next(self.join)

        @step
        def join(self):
            self.next(self.end)

        @step
        def end(self):
            pass

    if __name__ == '__main__':
        TreeModel()
    This is a flow that evaluates the performance of some machine learning algorithms. The module utils.model_utils contains two functions: load_train_test() returns a tuple containing X_train, X_test, y_train, y_test.
    Marco Occhialini
    The load_pipe_parameters() function returns a list of tuples, each containing an sklearn model. A tuple consists of name, model, space (for hyperparameter optimization; you can pretend this variable doesn't exist for now, it's unused) and a path for serializing the model to a .pkl file.
    Below is an example of the return of load_pipe_parameters:
    rf = RandomForestClassifier(n_estimators=2000)  # other arguments elided
    load_pipe_return_example = [('sklearn_rf', rf, space_rf, '../models/sklearn/rf/sklearn_rf_v1.pkl')]
    # ... and it goes on: (name, model, hyperopt space, path)
    My intention is that, for each tuple in the load_pipe_parameters() return value, the flow goes through train and extract_metrics, and then join to unify everything.
    Marco Occhialini
    But I always face this issue:
    3 replies
    Metaflow 2.0.5 executing TreeModel for user:occhima
    Validating your flow...
        Validity checker found an issue:
        Step end reached before a split started at step(s) join were joined. Add a join step before end.
    hey @Occhima the issue could be that the join step isn't resolving/merging the prior step artifacts from the foreach branches. Check out the foreach docs and also merge_artifacts. Also as a heads up, it looks like you'll want to update predict calls to self.model in extract_metrics :)
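    For reference, the fan-in being suggested might look like the sketch below. It is only a sketch: the try/except import fallback exists solely so the snippet stands alone without Metaflow installed, the class name is made up, and the other steps are elided. `merge_artifacts` and the `inputs` argument on a join step are real Metaflow APIs.

```python
try:
    from metaflow import FlowSpec, step
except ImportError:  # stand-ins so this sketch can be read without Metaflow installed
    FlowSpec = object

    def step(f):
        return f


class TreeModelSketch(FlowSpec):
    # ... start / train / extract_metrics as in the flow above, with
    # extract_metrics ending in self.next(self.join) ...

    @step
    def join(self, inputs):
        # A foreach split must be closed by a join step that accepts `inputs`.
        # merge_artifacts pulls branch-invariant artifacts back onto self;
        # per-branch artifacts (tup, model) are collected explicitly instead.
        self.merge_artifacts(inputs, exclude=['tup', 'model'])
        self.results = {inp.tup[0]: inp.model for inp in inputs}
        self.next(self.end)

    @step
    def end(self):
        pass
```

    The key differences from the posted flow: join takes `inputs`, and the branch artifacts are merged or collected there before transitioning to end.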
    Hey guys just a heads up for something that was a bit of a pain to track down because I kept looking in all the wrong places like a dummy :smile: Netflix/metaflow#292
    1 reply
    Marco Occhialini
    Thank you for the advice; any problems you can spot and tell me about, I'll be more than grateful.
    Greg Hilston

    Is there a preferred way to launch Metaflow tasks on AWS Batch with predictable IP addresses? Let's say one wants to whitelist an IP address, or range, on an application or database.

    I obviously can attempt to handle this the pure AWS way, perhaps having each Batch job run in a VPC, but was curious if there was a Metaflow answer for this situation

    13 replies
    Hi all, I've been giving metaflow a try and have liked a lot of what I see. One problem I've been having is with the logger, where tasks that fail often provide no log information at all, no logging output up to the failure, and crucially no traceback (and no exit code, if the process was killed or segfaulted). I've seen mention of an updated logging system on the way -- is this imminent?
    The only workaround I've found is to resume the job, capture the full command line invocation, kill the flow and run the failing step directly so the logs aren't buffered. Is there a better intermediate solution? Ideally the logs would be somewhere so I wouldn't have to re-run the failed step at all.
    (I'm still trying things out, only the data store is on aws, everything else is local.)
    5 replies
    Hi all, I want to get a better understanding of the build/packaging mechanics of the metaflow library. I was surprised that there are only a few dependencies listed. Is that the one and only place where you define these dependencies? Why not in a requirements.txt, or something like Pipenv? What I also don't understand is the execution of Python tests when building the package via tox. Can't that be done within …? Is this the Python state of the art? Again: I am not very familiar with the Python mechanics, but it somehow seems to be "spread" all over the place.
    3 replies
    Naman Joshi
    Hi, I was playing around with Metaflow and the use cases tutorial and got my system screwed up in "Episode 04"; now I am not able to run the test code from the tutorial guide. When I try to run "Episode 04" I get the error below. Can somebody please help?
    [developer@ui-lin metaflow-tutorials]$ conda info --envs
    # conda environments:
    base                  *  /home/developer/miniconda3
    [developer@ui-lin metaflow-tutorials]$ python3 04-playlist-plus/ --environment=conda show
    Metaflow 2.2.1 executing PlayListFlow for user:developer
        Internal error
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/site-packages/metaflow/", line 883, in main
        start(auto_envvar_prefix='METAFLOW', obj=state)
      File "/usr/local/lib/python3.6/site-packages/click/", line 829, in __call__
        return self.main(*args, **kwargs)
      File "/usr/local/lib/python3.6/site-packages/click/", line 782, in main
        rv = self.invoke(ctx)
      File "/usr/local/lib/python3.6/site-packages/click/", line 1256, in invoke
        Command.invoke(self, ctx)
      File "/usr/local/lib/python3.6/site-packages/click/", line 1066, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/usr/local/lib/python3.6/site-packages/click/", line 610, in invoke
        return callback(*args, **kwargs)
      File "/usr/local/lib/python3.6/site-packages/click/", line 21, in new_func
        return f(get_current_context(), *args, **kwargs)
      File "/usr/local/lib/python3.6/site-packages/metaflow/", line 777, in start
        ctx.obj.monitor = Monitor(monitor, ctx.obj.environment,
      File "/usr/local/lib/python3.6/site-packages/metaflow/", line 20, in __init__
        self.env_info = env.get_environment_info()
      File "/usr/local/lib/python3.6/site-packages/metaflow/plugins/conda/", line 107, in get_environment_info
        from metaflow.metaflow_config import DEFAULT_ENVIRONMENT
    ImportError: cannot import name 'DEFAULT_ENVIRONMENT'
    19 replies
    Greg Hilston
    Any reason why my EC2 instances are not being terminated after the flow that caused them to be spun up has ended?
    1 reply
    (EJ) Vivek Pandey

    Experiencing this issue at a certain step:

    2020-08-17 21:43:10.709 [321/load_and_prep_provider_claimlines/2261 (pid 13257)] Batch error:
    2020-08-17 21:43:10.709 [321/load_and_prep_provider_claimlines/2261 (pid 13257)] Task crashed due to OutOfMemoryError: Container killed due to memory usage. This could be a transient error. Use @retry to retry.
    2020-08-17 21:43:10.812 [321/load_and_prep_provider_claimlines/2261 (pid 13257)]
    2020-08-17 21:43:11.706 [321/load_and_prep_provider_claimlines/2261 (pid 13257)] Task failed.

    I checked the compute resource provisioning for the Batch Compute Environment, and the instance type is set to optimal, with max vcpus of 64, desired vcpus of 4 and min vcpus of 0.

    2 replies
    Owen Ball
    Hi all. Is it possible to access the @resources values from within a step? I would like to use it to specify the max_memory for h2o based on how much memory has been allocated to the step.
    4 replies
    (EJ) Vivek Pandey
    So I am getting S3 HeadObject forbidden errors after the message "workflow starting". Since it is happening before the job is submitted, it probably isn't the ECS IAM role limitation. The EC2 instance where I am kicking off the flow has S3 GetObject and PutObject access.
    workflow starting (run-id 221)
    S3 datastore operation _head_s3_object failed (An error occurred (403) when calling the HeadObject operation: Forbidden)
    workflow failed
    2 replies
    Wooyoung Moon
    Is there an easy way to use a custom launch template for Metaflow batch jobs? More specifically, I'd like to be able to mount an EFS/FSx filesystem to all my batch jobs.
    5 replies
    Submitted a PR for a problematic example in the doc (Netflix/metaflow-docs#4) would be grateful if someone could reproduce the bug I found in Netflix/metaflow-docs#3 and check the change's correctness.
    2 replies
    Hello, I have a parameter in my FlowSpec that I would also like to read in main(). Is there a way to do this (eg with the click package)? For example:
    if __name__ == '__main__':
        my_file = get_file_param()
        my_env_var = read(my_file)
        os.environ['MY_ENV_VAR'] = my_env_var
        MyFlowSpec() # has Parameter('file-param')
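    One stdlib way to do that pre-read, as a sketch: `get_file_param` above is the asker's hypothetical helper, implemented here as a naive scan of sys.argv before Metaflow's own CLI parsing runs (click could do this too, but it would need to tolerate Metaflow's other flags, e.g. with ignore_unknown_options).

```python
import sys


def get_file_param(argv=None):
    # Naively scan the command line for '--file-param VALUE' (or
    # '--file-param=VALUE') before handing control to Metaflow's CLI,
    # which parses the same flag again for the Parameter.
    argv = sys.argv[1:] if argv is None else argv
    for i, arg in enumerate(argv):
        if arg == '--file-param' and i + 1 < len(argv):
            return argv[i + 1]
        if arg.startswith('--file-param='):
            return arg.split('=', 1)[1]
    return None
```

    This is deliberately dumb: it doesn't validate the flag, it just peeks at it early so main() can act on it before instantiating the flow.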
    8 replies
    I am using conda create --name metaflow python=3.7 to run a simple example flow but it stops at "Running pylint...". The same example works fine with python=3.8. Is this a known issue? I am running metaflow 2.2.2.
    1 reply
    Question on conda dependency. I am creating a new conda environment: conda create --name metaflowenv metaflow awswrangler python=3.8. I have a simple step with @conda(libraries={'awswrangler': '1.8.1'}) and run the flow: python --environment=conda run, I am getting conda "Error: UnsatisfiableError" with 7 conflicts. Why is the bootstrapping conda environment giving all these conflicts? How do I go about resolving them? Keep up the good work, this product is awesome.
    3 replies
    Malay Shah
    Hello guys, I have been using metaflow for 4 or 5 months now. We had set up a Postgres metadata service on an AWS instance and used S3 to store the files. Now somehow the instance got restarted and I had to set up the service again; in doing that, I saw that all my previous flows are not in the metadata DB. I cannot access any flows using the Metaflow CLI, but the files for the flows are still in S3. How can I get the metadata DB back so that I can access the files or instance objects from the Metaflow CLI? Thanks for the help.
    17 replies
    Hi guys, when using @conda can we install our own packages from a local repository?
    4 replies
    Bahattin Çiniç

    Hi Guys,

    I'm trying to add integration/unit tests for Metaflow flows. (I read these docs.)

    When I tried to use a flow with use_cli=False, I got a could not get source code error. Example code and error are below:

    import unittest

    from flows import OptimizationFlow

    class TestFlow(unittest.TestCase):
        def test_flow(self):
            flow = OptimizationFlow(use_cli=False)


    Traceback (most recent call last):
      File "/Users/bahattincinic/Projects/xx/xx/xx/tests/cases/", line 16, in test_flow
        flow = OptimizationFlow(use_cli=False)
      File "/usr/local/anaconda3/envs/xx/lib/python3.7/site-packages/metaflow/", line 70, in __init__
        self._graph = FlowGraph(self.__class__)
      File "/usr/local/anaconda3/envs/xx/lib/python3.7/site-packages/metaflow/", line 132, in __init__
        self.nodes = self._create_nodes(flow)
      File "/usr/local/anaconda3/envs/xx/lib/python3.7/site-packages/metaflow/", line 139, in _create_nodes
        tree = ast.parse(inspect.getsource(module)).body
      File "/usr/local/anaconda3/envs/xx/lib/python3.7/", line 973, in getsource
        lines, lnum = getsourcelines(object)
      File "/usr/local/anaconda3/envs/xx/lib/python3.7/", line 955, in getsourcelines
        lines, lnum = findsource(object)
      File "/usr/local/anaconda3/envs/xx/lib/python3.7/", line 786, in findsource
        raise OSError('could not get source code')
    OSError: could not get source code

    I had a chance to read some of the internal code after the error. I realized that a flow cannot run outside of its own file, because Metaflow tries to detect the flow graph via the AST.

    I saw that Metaflow uses (…) and dynamic code generation (…).

    Is this best practice?

    Should I use something like "python run ...params..." in a subprocess?
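    If you do go the subprocess route, here is a small sketch of that idea. The helper name `run_flow` is made up, and the flow file path and parameters are whatever your project uses; it just launches the flow the same way the CLI does.

```python
import subprocess
import sys


def run_flow(flow_file, *params):
    # Launch the flow exactly as the CLI would: `python flow_file run ...`.
    # Returns the CompletedProcess so a test can assert on returncode/stdout.
    return subprocess.run(
        [sys.executable, flow_file, 'run', *params],
        capture_output=True, text=True,
    )
```

    For example, a test could call run_flow('optimization_flow.py', '--evals', '10') and assert result.returncode == 0.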


    7 replies
    Question on publishing scheduled step functions. When creating a step function with "step-functions create", how can I provide my defined parameters? I understand you can manually provide the parameters in AWS console like {"Parameters" : "{\"alpha\": 0.5}"} but not clear how to do it for the CloudWatch scheduled rule.
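    For the scheduled rule, the rule's static Input appears to need the same doubly-encoded shape as the console example above: an outer JSON document whose "Parameters" value is itself a JSON-encoded string of the flow parameters. A small sketch for building that string (the helper name is made up, and the shape is inferred from the console example, not from Metaflow docs):

```python
import json


def schedule_input(params):
    # Outer document: {"Parameters": "<json-encoded flow parameters>"}
    # i.e. the flow parameters are JSON-encoded twice.
    return json.dumps({"Parameters": json.dumps(params)})
```

    The resulting string could then be set as the Input of the CloudWatch Events rule target (e.g. via aws events put-targets).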
    14 replies
    Question on the default AWS region. When a batch job runs from the step function, how does it know which AWS region to download the code package from? How can I provide my preferred region? Below is the error I am getting. I can fix this by updating the deployed step function and providing the AWS_DEFAULT_REGION environment variable manually but that's not good.
    Setting up task environment.
    Downloading code package.
    fatal error: An error occurred (400) when calling the HeadObject operation: Bad Request
    12 replies
    Is there any way to access the tags from within a step? I see the user and namespace but can't find additional tags.
    4 replies
    Greg Hilston

    Hey guys, I have two questions regarding Metaflow's best practices:

    1. Do you guys recommend using Metaflow for large batch transform jobs? I'm wondering if there's some threshold of data size where the recommended approach may be to use AWS SageMaker for batch transform instead of Metaflow.
    2. Is there a recommended approach for logging from steps that are run remotely on AWS Batch?

    Thanks :)

    10 replies
    Christopher Wong
    I’m running into an issue where after a Batch task starts, I’m getting the error
    Batch job error:
    TypeError(“‘NoneType’ object is not iterable”)
    Task failed.
    23 replies
    Christopher Wong
    3 replies
    Sonu Patidar
    @savingoyal can you please tell me something about this error: File "/home/ec2-user/.local/lib/python3.6/site-packages/metaflow/datastore/", line 429, in <lambda>
    2020-09-11 05:22:34.014 [8/start/16 (pid 2995)] transformable_obj.transform(lambda x: pickle.dumps(x, protocol=2))
    2020-09-11 05:22:34.014 [8/start/16 (pid 2995)] TypeError: can't pickle _thread.RLock objects
    9 replies
    Sonu Patidar
    I am using a Docker container on AWS Batch and it is not able to find the job.tar file. My Dockerfile ends with WORKDIR /srv/. Can you tell me where Metaflow puts the job.tar file?
    29 replies
    Hello team,
    I am running a simple flow with the conda decorator @conda(libraries={"beautifulsoup4": "4.9.1", "s3fs": "0.5.1", "pandas": "1.1.2"}).
    I am running into the error Error: UnsatisfiableError: The following specifications were found to be incompatible with each other:
    I checked a previous post in this channel with the same error; there was later a PR to solve this issue and the fix was included in the latest release.
    Even so, I am still getting this error. Could someone help me? Thanks.
    15 replies
    What's the recommended approach to use sensitive information such as passwords? For example, if I want to connect to a database to get some data out but I don't want the password to be in the code. I understand the custom environment variable is not supported.
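    One common pattern, sketched here with the stdlib only (the helper name and the `NAME_FILE` convention are made up for illustration): resolve the secret at runtime from the execution environment rather than from code, e.g. an environment variable injected by the Batch job definition, or a file mounted by the platform.

```python
import os


def get_secret(name, default=None):
    # 1) An environment variable injected by the runtime, never committed to code.
    if name in os.environ:
        return os.environ[name]
    # 2) A file-based convention: NAME_FILE points at a mounted secret file.
    path = os.environ.get(name + '_FILE')
    if path and os.path.exists(path):
        with open(path) as f:
            return f.read().strip()
    return default
```

    On AWS the variable or file would typically be populated from Secrets Manager or SSM Parameter Store by the job definition, so the value never appears in the flow code itself.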
    5 replies
    Philippe Ombredanne
    Howdy... as a FYI we just made a first release of this new tool that makes extensive use of metaflow for code analysis pipelines for origin and licenses such as these
    I want to thank the metaflow team for making this possible :bow:
    @savingoyal and @romain-intel in particular ... :bow:
    Note that it's a tad off Broadway from your standard use case, but it still works nicely for us.
    6 replies
    Valay Dave


    I was recently trying to store data in the flow's MetaflowData object, which gets stored at the end of a flow. I was using setattr(self, some_id, dict) to store a large number of objects as part of the MetaflowData object at the end of the flow. When loading the values back via getattr, the first 1000 elements finish within a second, but after that it took 4 minutes to load the next 3000. I am simply iterating through the values via getattr. I am assuming that it's fetching the data via the metadata service and unpickling the objects. But what could be the reason for such latency gaps?
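    One mitigation worth trying, if the per-item latency comes from each attribute being a separate datastore object: bundle the items into a single artifact, so the read side does one load instead of thousands. A sketch of the two patterns (Holder is just a stand-in for `self` inside a step):

```python
class Holder:
    """Stand-in for `self` inside a Metaflow step."""


def store_individually(obj, items):
    # One artifact per id -> one datastore fetch per getattr on the read side.
    for some_id, value in items:
        setattr(obj, some_id, value)


def store_packed(obj, items):
    # A single dict artifact -> one datastore fetch for everything.
    obj.results = dict(items)
```

    Whether this helps depends on where the time actually goes, but it turns N metadata/datastore round trips into one.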

    4 replies
    Hey guys, I am a newbie using metaflow; I just finished the first three tutorials. I have the following confusions. (1) The documentation says to use IncludeFile to read a local .csv file. I noticed that IncludeFile generates a compressed file in the .metaflow folder, but using pd.read_csv on this compressed file is slower than using pd.read_csv on the original csv file directly, especially when the .csv file is large. So why does metaflow use this IncludeFile function to pre-read the file? (2) If I want to read a ~7GB csv file, directly using pd.read_csv is fine. However, if I use pd.read_csv inside metaflow, after a couple of minutes it gives me an out-of-memory error. What is the reason for this error, how can I avoid it, and is there any way to read these kinds of files faster? Thanks!

    Plus, I used the resources decorator (memory=16000, cpu=8) to ask for more resources, but it still didn't work.

    7 replies
    Philippe Ombredanne
    My opinion is that your large CSV file may end up being pickled.
    Try to post snippets of your pipeline and the error messages/stack trace in a Gist
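    For the 7GB case specifically, streaming the file in chunks avoids holding it all in memory. A stdlib sketch of the idea (pandas' read_csv(chunksize=...) does the same thing with DataFrames; the helper name here is made up):

```python
import csv
from itertools import islice


def iter_csv_chunks(path, chunk_size=100_000):
    # Yield (header, rows) blocks of at most chunk_size rows, so peak
    # memory is one chunk rather than the whole file.
    with open(path, newline='') as f:
        reader = csv.reader(f)
        header = next(reader)
        while True:
            rows = list(islice(reader, chunk_size))
            if not rows:
                return
            yield header, rows
```

    Each chunk can then be processed and discarded (or aggregated) before the next one is read.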
    Greg Hilston

    Hey everyone, what's Metaflow's recommended approach for installing a dependency in a remote Conda environment when the package does not exist in Conda or the specific version does not exist in Conda, but does exist in Pip?

    From the documentation I've found that I have two options:

    1. Perform an os.system('pip install my_package') in my step's code, which looks like it should work but does not seem like a great solution.
    2. Download the code into my source code directory and import the package from the file system. Also seems like it would work, but not a great solution.

    Are there any options I'm not considering? Perhaps a cleaner approach?
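    A sketch of option 1 done slightly more safely (the helper names are made up): pin the version and invoke pip through the current interpreter rather than whatever `pip` is on PATH, so the package lands in the environment the step is actually running in.

```python
import subprocess
import sys


def pip_install_cmd(package, version=None):
    # Build 'python -m pip install pkg[==ver]' for the interpreter that is
    # running the step (i.e. the @conda-managed environment on Batch).
    spec = f"{package}=={version}" if version else package
    return [sys.executable, '-m', 'pip', 'install', spec]


def pip_install(package, version=None):
    subprocess.check_call(pip_install_cmd(package, version))
```

    Called at the top of the step, before the import, this is still the os.system idea, just less fragile; it does not make the install reproducible the way @conda's dependency resolution does.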

    Philippe Ombredanne
    @GregHilston see Netflix/metaflow#24
    2 replies
    Gustav Svensk

    Hi, I'm not able to view the logs from the metadata_service when running it in a Docker container. A minimal example can be achieved by creating a virtualenv, entering it and running pip install; I also did

    docker pull netflixoss/metaflow_metadata_service
    ds up -d

    ds logs
    gives me
    metadata_service | 2020/09/21 15:33:31 OK 1_create_tables.sql
    metadata_service | 2020/09/21 15:33:31 OK 20200603104139_add_str_id_cols.sql
    metadata_service | 2020/09/21 15:33:31 goose: no migrations to run. current version: 20200603104139
    my_postgres | The files belonging to this database system will be owned by user "postgres".
    my_postgres | This user must also own the server process. ...
    But not serving on ('', 8080), which I would expect.

    On MacOs catalina 10.15.6

    2 replies
    Hey guys, I noticed metaflow did a great job on communicating between the local machine and AWS. If I want to support another cloud system, like Azure Blob Storage, how can I implement the corresponding decorators (like batch)? I know the s3 module is used in the batch decorator; is there a way to make a change to that? Thanks!
    2 replies
    Greg Hilston

    Hey guys, is there a recommended way to pass secrets to a remote Batch job? I found this being asked in the Gitter back in 2019-12-07 here:

    Basically the same question but with a very specific focus on the values of the environment being treated as sensitive. Therefore the environment_decorator or any other solution that stores the secret in code does not work.

    I read that Savin said the AWS folks are able to seamlessly switch between running locally and remote so there must be a solution to this. Just wanted to ask you guys before trying to come up with any custom way.


    2 replies