    Ville Tuulos
    @tuulos

    📣 Metaflow was just included in Netflix's security bug bounty program! Find vulnerabilities in the code and get paid for it 💰 (Or just enjoy Metaflow getting more secure over time.)

    https://bugcrowd.com/netflix/updates/59a4e5dc-5e79-4965-9289-ae5a0d9de044

    Greg Hilston
    @GregHilston

    Hey Metaflow! I have a pretty specific question:

    I find myself having trouble running a flow on AWS Batch that uses a container with pre-installed Python libraries. I happen to be using conda to install a few extra libraries in one step, but by doing so it seems I now have a fragmented environment.

    Any advice on how one can use a Docker container as a base environment and then seemingly add a few more packages in a specific step using conda?

    The success criterion here would be to successfully import a package installed by the Docker image as well as a different package installed by the conda decorator.
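
    For context, a minimal sketch of the pattern being attempted (image and package names hypothetical):

    from metaflow import FlowSpec, step, batch, conda

    class MixedEnvFlow(FlowSpec):

        # Base environment comes from the Docker image; extra packages come
        # from @conda. The reported problem: the @conda environment does not
        # see the image's pre-installed libraries.
        @batch(image="123456789012.dkr.ecr.us-east-1.amazonaws.com/base:latest")
        @conda(libraries={"pandas": "1.1.5"})
        @step
        def start(self):
            import preinstalled_pkg  # hypothetical, expected from the image
            import pandas            # installed by @conda
            self.next(self.end)

        @step
        def end(self):
            pass

    if __name__ == "__main__":
        MixedEnvFlow()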

    9 replies
    russellbrooks
    @russellbrooks
    Sharing a difference in the behavior of --max-workers between the local runtime and SFN deployments, specifically with nested foreach fanouts. Locally, the runtime enforces the parallelization limit at the task level, so it is never exceeded; on SFN, the concurrency limit is enforced per split, so a nested fanout can reach an effective parallelism of max-workers^2. Similarly, normal fanouts in an SFN deployment are not rate limited. Not sure it's worth explicitly stating this in the docs, but thought I'd mention it just in case.
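
    To make the shape concrete, a minimal nested-fanout sketch (sizes arbitrary):

    from metaflow import FlowSpec, step

    class NestedFanoutFlow(FlowSpec):

        @step
        def start(self):
            self.outer = list(range(10))
            self.next(self.map_outer, foreach="outer")

        @step
        def map_outer(self):
            self.inner = list(range(10))
            self.next(self.map_inner, foreach="inner")

        @step
        def map_inner(self):
            # locally, at most --max-workers of these tasks run at once; on
            # SFN the limit applies per split, so up to max-workers^2 tasks
            # may run in parallel
            self.next(self.join_inner)

        @step
        def join_inner(self, inputs):
            self.next(self.join_outer)

        @step
        def join_outer(self, inputs):
            self.next(self.end)

        @step
        def end(self):
            pass

    if __name__ == "__main__":
        NestedFanoutFlow()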
    2 replies
    Christopher Wong
    @christopher-wong

    I just noticed Batch has started hitting the Docker free tier rate limit. What’s the best way to mitigate this?

    CannotPullContainerError: Error response from daemon: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit

    Any chance we can get a copy of the Metaflow docker image hosted on the new Public ECR repos?
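
    One mitigation sketch in the meantime: point Batch at a mirrored image in your own registry. METAFLOW_BATCH_CONTAINER_REGISTRY and METAFLOW_BATCH_CONTAINER_IMAGE are existing Metaflow settings, but the values below are hypothetical:

    import os

    # pull the default image from a private ECR mirror instead of Docker Hub,
    # sidestepping the anonymous pull rate limit
    os.environ["METAFLOW_BATCH_CONTAINER_REGISTRY"] = "123456789012.dkr.ecr.us-east-1.amazonaws.com"
    os.environ["METAFLOW_BATCH_CONTAINER_IMAGE"] = "python:3.8"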

    4 replies
    Luis Arias
    @kaaloo
    Hello Metaflow community
    Just wanted to say I started working with Metaflow recently on processing some wikidump-size datasets, and after some battling with AWS Batch's Compute Environments and Launch Templates I managed to set up a working pipeline for us. The main challenge was understanding that the compute environment needed to be recreated each time I changed the launch template, in spite of using the $Default version. Now the launch template takes care of using a much larger volume for the instances and allowing a lot more space for each docker container. Thanks so much for this wonderful piece of software! Now I'll be working on the next Flow in our pipeline...
    6 replies
    Ji Xu
    @xujiboy
    Hello, I know that Metaflow is meant to be driven from a shell environment, but I experimented and saw that I can also run flows in a notebook. May I know if there are any unforeseen, bad consequences to using it in a notebook?
    29 replies
    Matt Corley
    @corleyma
    I am having some issues with how the Metaflow Step Functions integration handles flow parameters. It seems that, only when executing a flow via SFN, flow parameters are converted (rather naively) into environment variables, by upcasing them and prepending METAFLOW_INIT_. A parameter with a name like "my-param", which is otherwise perfectly valid for Metaflow when using the local runtime, will result in an error when running via SFN, because many shells won't allow env vars with dashes in the name.
    I was hoping to get some clarity on why this behavior exists in the first place for the Step Functions integration, and then perhaps to strategize about the best approach to reconcile the allowable parameter names for Step Functions and local runtime flows.
    3 replies
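
    A rough illustration of the transformation described above (not the actual Metaflow source):

    def param_to_env_var(name):
        # upcase and prefix, as described; dashes survive, and most shells
        # reject environment variable names containing them
        return "METAFLOW_INIT_" + name.upper()

    print(param_to_env_var("my-param"))  # METAFLOW_INIT_MY-PARAM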
    Ayotomiwa Salau
    @AyonzOnTop
    Hello.
    Happy to be part of the Metaflow community.
    Cheers!
    2 replies
    Antoine Tremblay
    @hexa00
    Hi, is there a way to make a step-function flow in the user namespace? It seems like --namespace something has no effect?
    The use case is that we're many users running different versions of the same flow with step functions....
    9 replies
    Antoine Tremblay
    @hexa00
    Is there a way to have kind of a fallback ECR repo ?
    Use case is that we have some custom images... but we'd still want to allow access to more general images like the default python one ....?
    3 replies
    waz-mataz
    @waz-mataz

    Hi, what is the way to run a nodejs process in the background in metaflow? I am running on batch using a custom docker image that has nodejs and python dependencies. The node app, once started, waits for a json post which is done by a task later in the metaflow python process.

    The way to start the node app is "npm run dev". However, when I use os.system('npm run dev'), the metaflow process gets paused at "App listening on http://localhost:8888" (as below), since os.system starts the node app right away and then waits while the app listens for the JSON on port 8888. The JSON is calculated in a later metaflow step and posted via requests.post("http://localhost:8888/savings-report", json=self.json_structure).

    2021-01-26 00:29:49.835 [4816/start/31201 (pid 78223)] [94fd75e2-9b6a-4c13-87a9-57f6e6d4b811] Starting report generator ...
    2021-01-26 00:29:49.835 [4816/start/31201 (pid 78223)] [94fd75e2-9b6a-4c13-87a9-57f6e6d4b811] > report-generator@1.0.0 dev /usr/src/app
    2021-01-26 00:29:49.836 [4816/start/31201 (pid 78223)] [94fd75e2-9b6a-4c13-87a9-57f6e6d4b811] > ts-node src/server.ts
    2021-01-26 00:29:49.836 [4816/start/31201 (pid 78223)] [94fd75e2-9b6a-4c13-87a9-57f6e6d4b811] App listening on http://localhost:8888

    I would like to start the nodejs app using npm run dev via metaflow and leave it running in the background and continue to the next steps in metaflow
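
    A minimal sketch of the non-blocking pattern, using subprocess.Popen instead of os.system (paths and endpoint taken from the message above). One caveat, stated as an assumption: on Batch each step runs in its own container, so the server and the later POST likely need to live in the same step:

    import subprocess
    import time

    import requests

    # Popen returns immediately and leaves the node server running in the
    # background; os.system blocks until the command exits, which is why the
    # flow stalls at "App listening on ..."
    server = subprocess.Popen(["npm", "run", "dev"], cwd="/usr/src/app")

    time.sleep(5)  # crude wait for startup; polling the port would be safer

    requests.post("http://localhost:8888/savings-report", json={"example": 1})

    server.terminate()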

    3 replies
    russellbrooks
    @russellbrooks

    Wondering if there's a more efficient way to implement the following design pattern directly in metaflow such that it would utilize multiprocessing to load and combine multiple dataframes after a foreach fanout:

    df = [input.partition_df for input in inputs]
    df = pd.concat(df, ignore_index=True, sort=False, copy=False)

    A hacky way that's coming to mind is to just use joblib.Parallel or metaflow's parallel_map to access the artifacts in parallel, but it feels a bit odd. This pattern may also be related to the roadmap effort to open source your all's in-house goodies for dataframes. I use partitioned parquet files in a couple places to split out data, pass references around, and load in parallel – but there's a couple use cases where I'd prefer to stay within the metaflow ecosystem if possible :smiley:

    Curious what your all's thoughts are, and just want to make sure I'm not missing something like a clever usage of s3.get_many.
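
    For what it's worth, a self-contained sketch of the parallel_map variant mentioned above (artifact name taken from the snippet; each split loads in a forked process, so the dataframes are pickled back to the parent):

    from metaflow import FlowSpec, parallel_map, step

    class ConcatFlow(FlowSpec):

        @step
        def start(self):
            self.parts = [1, 2, 3]
            self.next(self.make_df, foreach="parts")

        @step
        def make_df(self):
            import pandas as pd
            self.partition_df = pd.DataFrame({"part": [self.input]})
            self.next(self.join)

        @step
        def join(self, inputs):
            import pandas as pd
            # load each split's artifact in a forked process, then concat once
            dfs = parallel_map(lambda inp: inp.partition_df, list(inputs))
            self.df = pd.concat(dfs, ignore_index=True, sort=False, copy=False)
            self.next(self.end)

        @step
        def end(self):
            pass

    if __name__ == "__main__":
        ConcatFlow()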

    5 replies
    Savin
    @savingoyal
    :tada: Metaflow 2.2.6 (the newest release) is now available on pip and conda-forge. Changes include support for AWS Fargate as a compute backend for Metaflow on AWS, support for very wide workflows on AWS Step Functions and more.
    seanv507
    @seanv507
    Hi, is there any way to specify the memory dynamically for batch jobs? E.g. if the size of the data is M in step X, allocate memory 5M in step X+1?
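
    Not something the thread confirms as a built-in feature; one workaround sketch, assuming the size can be estimated before the run is launched (env var name hypothetical):

    import os

    from metaflow import FlowSpec, resources, step

    # decorator arguments are evaluated at flow start-up, so they can be
    # driven by the launch environment, but not by values computed in an
    # earlier step of the same run
    MEMORY_MB = int(os.environ.get("STEP_X1_MEMORY_MB", "4096"))

    class SizedFlow(FlowSpec):

        @resources(memory=MEMORY_MB)
        @step
        def start(self):
            self.next(self.end)

        @step
        def end(self):
            pass

    if __name__ == "__main__":
        SizedFlow()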
    4 replies
    Ahmad Houri
    @ahmad_hori_twitter
    Hi, is there a way to define the step function name on AWS to be different from the flow name when creating it?
    I want to do this because I am thinking of creating 2 different step functions from the same flow, MY_FLOW_STG and MY_FLOW_PRD, and then updating these step functions through a pipeline when a user pushes to a specific branch.
    5 replies
    jrs2
    @jrs2
    Is there a way to specify a Docker image for a flow when running locally? I can see how to do it for Batch and have used that, but I only see @conda for local dependency support.
    1 reply
    Mehmet Catalbas
    @baratrion
    What is the best way to profile memory usage step by step within a flow (better yet, line by line within a step)? memory_profiler's @profile decorator does not work well with foreach steps, so is the best approach to copy step bodies out into isolated functions in separate scripts and run memory profiling on them?
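
    A sketch of the extraction approach described in the question, keeping the profiled code in a plain function so memory_profiler never has to wrap the step itself:

    from memory_profiler import profile

    from metaflow import FlowSpec, step

    @profile
    def heavy_work(n):
        # memory_profiler reports line-by-line usage for this function
        data = [list(range(1000)) for _ in range(n)]
        return len(data)

    class ProfiledFlow(FlowSpec):

        @step
        def start(self):
            self.result = heavy_work(1000)
            self.next(self.end)

        @step
        def end(self):
            pass

    if __name__ == "__main__":
        ProfiledFlow()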
    12 replies
    seanv507
    @seanv507
    @baratrion you might want to look at https://pythonspeed.com/products/filmemoryprofiler/, the author @itamarst has reached out here... it's focused on peak memory
    14 replies
    Ahmad Houri
    @ahmad_hori_twitter
    Hi, I have a question regarding the performance of running batch jobs on AWS. I ran a simple flow (HelloWorld) with 3 steps: execution on my machine took around 23 seconds, while on AWS (running the same flow --with batch) it takes around 8 minutes, with most of the time spent bootstrapping the conda environment for each step before running it.
    Is there something I missed here, or is there a caching technique I should use to improve the flow's performance on AWS?
    6 replies
    Greg Hilston
    @GregHilston

    I was in our AWS Batch console and I noticed two jobs that were seemingly stuck in RUNNING. The individual who kicked off those jobs says all his terminal sessions have been ended; he even went as far as restarting his PC and severing his internet connection.

    I figure this is more of an AWS situation I'm debugging, but has anyone witnessed flows being stuck in RUNNING?

    I know the jobs will die when the timeout is reached; I just want to understand what may have caused this.

    4 replies
    joe153
    @joe153
    Hi, I am trying to include a number of json/sql files in the conda package. I have a MANIFEST.in file specifying the files, and setup.py has include_package_data=True, but I am not seeing them. What am I missing? How do I include them in the conda package?
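
    A sketch of the setuptools side, assuming the json/sql files live inside the package directory (package name hypothetical); MANIFEST.in governs what lands in the sdist, while package_data controls what gets installed:

    from setuptools import find_packages, setup

    setup(
        name="myflows",  # hypothetical
        packages=find_packages(),
        # include files listed in MANIFEST.in that live inside packages
        include_package_data=True,
        # or spell the globs out explicitly per package
        package_data={"myflows": ["*.json", "*.sql"]},
    )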
    6 replies
    Nimar Arora
    @nimar
    Hi, I am just trying to follow along with the tutorial, and in 08-autopilot my AWS Batch job fails with "ModuleNotFoundError: No module named 'pandas'" on stats.py, line 41. Looking at the code, I'm not sure how this tutorial is expected to work, since there is no @conda decorator to install the pandas library in 02-statistics/stats.py.
    3 replies
    Nimar Arora
    @nimar

    Tutorial 4 seems to be failing when attempting to create a conda environment. The funny thing is that if I run the command directly, it succeeds. Not sure how to get the conda errors:

    python 04-playlist-plus/playlist.py --environment=conda run
    Metaflow 2.2.6 executing PlayListFlow for user:...
    Validating your flow...
        The graph looks good!
    Running pylint...
        Pylint is happy!
    Bootstrapping conda environment...(this could take a few minutes)
        Conda ran into an error while setting up environment.:
        Step: start, Error: command '['/opt/miniconda/condabin/conda', 'create', '--yes', '--no-default-packages', '--name', 'metaflow_PlayListFlow_linux-64_c08336d0946efed6e92f165475dfc0d181f64361', '--quiet', b'python==3.8.5', b'click==7.1.2', b'requests==2.24.0', b'boto3==1.17.0', b'coverage==5.4', b'pandas==0.24.2']' returned error (-9): b''

    Note that the following command succeeds:

    /opt/miniconda/condabin/conda create --yes --no-default-packages --name metaflow_PlayListFlow_linux-64_c08336d0946efed6e92f165475dfc0d181f64361 --quiet python==3.8.5 click==7.1.2 requests==2.24.0 boto3==1.17.0 coverage==5.4 pandas==0.24.2

    Note that I had to make a few minor changes to the demo to refer to Python 3.8.5 and to add dependencies on more recent versions of boto3 and coverage than what metaflow was requesting; otherwise the generated conda create command would fail even on the command line.

    1 reply
    joe153
    @joe153
    Hi, I am having a problem using Metaflow 2.2.6 and Fargate as the compute environment when foreach is used. What works fine with EC2 doesn't work with Fargate. Here is the error message: ... File "/metaflow/metaflow/plugins/aws/step_functions/step_functions_decorator.py", line 54, in task_finished self._save_foreach_cardinality(os.environ['AWS_BATCH_JOB_ID'], ... requests.exceptions.ConnectionError: HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /latest/meta-data/placement/availability-zone/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f7c0f12e3d0>: Failed to establish a new connection: [Errno 22] Invalid argument'))
    7 replies
    Ritesh Agrawal
    @ragrawal
    Hi, I am trying to leverage metaflow to train and deploy models on SageMaker. I am able to train the model, but not able to find relevant documentation on how to deploy models. Ideally I would like to create a docker container with the proper environment and all the supporting files, and then deploy the container either on SageMaker or on our Kubernetes cluster. What I am missing is: once the pipeline has executed successfully, how can I get the environment, supporting files, and model files?
    13 replies
    Anirudh Kaushik
    @anirudh-k
    Hi! What's the best way to handle a potentially empty list for a foreach step?
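
    One defensive sketch, assuming the goal is simply to avoid fanning out over nothing. Metaflow expects one static self.next per step (my understanding, not stated above), so the guard lives in the data rather than the graph:

    from metaflow import FlowSpec, step

    class MaybeEmptyFlow(FlowSpec):

        @step
        def start(self):
            items = []  # possibly empty in real use
            # fan out over a sentinel when the list is empty, and make the
            # foreach step a no-op for the sentinel
            self.items = items or [None]
            self.next(self.process, foreach="items")

        @step
        def process(self):
            if self.input is not None:
                pass  # real per-item work goes here
            self.next(self.join)

        @step
        def join(self, inputs):
            self.next(self.end)

        @step
        def end(self):
            pass

    if __name__ == "__main__":
        MaybeEmptyFlow()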
    7 replies
    Ritesh Agrawal
    @ragrawal
    Where should I specify import statements? Assuming I have a train step that requires sklearn and I am using @conda to install the package, should I put the import sklearn statement inside the step, or can it be outside the class definition?
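
    A minimal sketch of the step-local import pattern (versions illustrative):

    from metaflow import FlowSpec, conda, step

    class TrainFlow(FlowSpec):

        # sklearn only exists inside the conda environment Metaflow builds
        # for this step, so import it inside the step body rather than at
        # module top level, which also runs where the library is absent
        @conda(libraries={"scikit-learn": "0.24.1"})
        @step
        def start(self):
            import sklearn
            print(sklearn.__version__)
            self.next(self.end)

        @step
        def end(self):
            pass

    if __name__ == "__main__":
        TrainFlow()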
    6 replies
    Ritesh Agrawal
    @ragrawal
    I am getting access denied to the following S3 folder: "s3://.../metaflow/conda", as it doesn't exist. Is there anything I need to do in order to create this S3 key?
    4 replies
    Ritesh Agrawal
    @ragrawal
    Why does the code have all the metaflow examples in it?
    4 replies
    Kyle Smith
    @smith-kyle
    If a step both creates a bunch of tasks with a foreach and branches to another step, will all the tasks created by this step execute in parallel?
    9 replies
    dewiris
    @kavyashankar93

    Hi, I am working on getting the metaflow artifacts from S3. The code is deployed on AWS Lambda. I set the environment variable "METAFLOW_DATASTORE_SYSROOT_S3" to the S3 location. Our use case requires us to change the datastore environment variable in every iteration, so that different flows' and runs' artifacts can be accessed, as follows:

    def _queryMetaflow(self, appName, starflowResp):
        metaflow_run_id = starflowResp["details"]["frdm"]["metaflowRunNumber"]
        metaflow_name = starflowResp["details"]["frdm"]["metaflowId"]

        # repoint the datastore at this app's artifact location before importing
        os.environ['METAFLOW_DATASTORE_SYSROOT_S3'] = "{}/artifacts/{}/higher".format(getMetadataLocation(), appName)

        from metaflow import Metaflow, get_metadata, metadata, namespace, Run, get_namespace, Flow

        metadata(getMetadataURL())
        namespace(None)
        Metaflow()

        # call metaflow and get results and send success or error
        try:
            metaflowResp = Run(metaflow_name + '/' + metaflow_run_id).data
            print(metaflowResp)
            # drop the local names so the next call re-imports them (note the
            # module itself stays cached in sys.modules)
            del Metaflow, get_metadata, metadata, namespace, Run, get_namespace, Flow
            return metaflowResp
        except Exception as e:
            print("Exception occurred in query metaflow: {}".format(e))
            raise CapAppFailure("Exception occurred in metaflow response, S3 datastore operation _get_s3_object failed likely")

    When this method is called, it doesn't fail in the first iteration but fails in the second. I inspected the environment variable and the location is correct in every iteration, but this error is encountered in the second iteration:
    S3 datastore operation _get_s3_object failed (An error occurred (404) when calling the HeadObject operation: Not Found). Retrying 7 more times..

    I am unable to fix this issue. Can you please help?

    4 replies
    Kyle Smith
    @smith-kyle

    Hello Netflix employees, can someone please share about Metaflow's adoption at Netflix? In late 2018 it was used in 134 projects; how has it grown since then? What percentage of Netflix data scientists use Metaflow?

    We're considering Metaflow at my organization, so I'd just like to get a sense of the adoption rate we can hope for at my employer.

    7 replies
    Matt McClean
    @mattmcclean
    Hi there. I'm new to Metaflow and trying to run the tutorial Episode 8 Autopilot, but I'm getting the following error message in the AWS Batch job when the step function is triggered: ModuleNotFoundError: No module named 'pandas'. I tried running with the commands python 02-statistics/stats.py --environment=conda step-functions create --max-workers 4 --with conda:python=3.7,libraries="{pandas:0.24.2}" as well as python 02-statistics/stats.py step-functions create --max-workers 4 and both give the same error message.
    However, if I run the command python 02-statistics/stats.py --environment conda run --with batch --max-workers 4 --with conda:python=3.7,libraries="{pandas:0.24.2}" it works fine.
    3 replies
    Matt McClean
    @mattmcclean
    How can I switch my local machine to run Metaflow on AWS? I have already run the CloudFormation template to set up the stack and can run metaflow commands from the SageMaker notebook instance. However, when I run metaflow configure aws --profile dev and then metaflow configure show on my local machine, it still says Configuration is set to run locally.
    4 replies
    Kelly Davis
    @kldavis4

    I am attempting to run step-functions create and getting the following error: AWS Step Functions error: ClientError("An error occurred (AccessDeniedException) when calling the CreateStateMachine operation: 'arn:aws:iam::REDACTED:role/metaflow-step_functions_role' is not authorized to create managed-rule.")

    I am specifying METAFLOW_SFN_IAM_ROLE=arn:aws:iam::REDACTED:role/metaflow-step_functions_role in my metaflow config.

    The role is being created via terraform, but is based on https://github.com/Netflix/metaflow-tools/blob/master/aws/cloudformation/metaflow-cfn-template.yml#L839. That role does not have a grant for states:CreateStateMachine but even if I add that, I still get the same error.

    Any tips for troubleshooting this?

    2 replies
    Corrie Bartelheimer
    @corriebar
    Hey,
    I created a step function flow using python flow.py --with retry step-functions create --max-workers 1000, but when triggering the flow it only runs a maximum of 40 tasks in parallel. When running the flow on Batch without step functions, it worked fine. Any ideas what could be the reason for this throttling?
    5 replies
    Taleb Zeghmi
    @talebzeghmi
    Has anybody thought about how the Metaflow datastore relates to CCPA data compliance? For example, the ability to remove customer data at the customer's behest, unless the data expires or no longer exists after 28 days?
    13 replies
    mkjacks5
    @mkjacks5
    What is the correct way to set a specific namespace before doing a local run? We have several people running metaflow locally on SageMaker instances, which defaults to the username 'ec2-user'. Using namespace('user:[correct username]') does not change the namespace used for the actual local run; it seems to just affect the namespace used for inspecting results. Thanks
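
    If it helps, a sketch of the distinction as I understand it: namespace() only scopes client-side reads, while the identity attached to a new run comes from the launch environment. METAFLOW_USER is an existing Metaflow setting, though treating it as the right override here is an assumption:

    import os

    # affects the user recorded for runs launched from this environment
    os.environ["METAFLOW_USER"] = "jane"  # hypothetical username

    # affects only which results the client API can see
    from metaflow import Flow, namespace
    namespace("user:jane")
    print(Flow("HelloFlow").latest_run)  # hypothetical flow name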
    2 replies
    Ahmad Houri
    @ahmad_hori_twitter
    ur env there then batch just downloads & run the ima
    behjatHume
    @behjatHume
    Hey all! I was recently introduced to Metaflow and I have a few questions, if anyone can help me through them: does Metaflow provide data labelling? An explainability feature? Team collaboration? And is it open source?
    3 replies
    derek
    @yukiegosapporo_twitter

    Hey,

    I am passing my image to python yo.py step-functions create --with batch:image=bla. Are there any ways to pass runtime variables to that image? thanks in advance!

    6 replies
    Greg Hilston
    @GregHilston

    I'm experiencing some problems when trying to install pytorch with CUDA enabled.

    I'm running my flow on AWS Batch, powered by a p3.2xlarge machine and using the image

    763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.6.0-gpu-py36-cu110-ubuntu18.04

    to get the NVIDIA driver installed.

    The relevant flow code looks like:

    @conda_base(python="3.8")
    class FooFlow(FlowSpec):
        ...
        @batch(image="<URL above>")
        # this line below is of most interest
        @conda(libraries={"pytorch": "1.6.0", "cudatoolkit": "11.0.221"})
        @resources(memory=4*1024, cpu=2, gpu=1)
        @step
        def test_gpu(self):
            import os

            print(os.popen("nvidia-smi").read())
            print(os.popen("nvcc --version").read())

            import torch

    I'm not convinced this is precisely a Metaflow issue, but the common solutions one finds when Googling involve installing PyTorch using the conda CLI, which the @conda decorator abstracts away from us.

    I've been running many flows with different versions of pytorch and cudatoolkit, and they all fail with:

    Torch not compiled with CUDA enabled

    I'm familiar with the Github Issue: Netflix/metaflow#250

    Any advice at all?

    19 replies
    Taleb Zeghmi
    @talebzeghmi

    We’re working on creating a @notify() decorator that could send a notification upon success or failure, per Flow or per Step. It could send email or slack messages.

    It would be up to the scheduler (local, AWS Step Functions, KFP) to honor the @notify decorator.

    @notify(email_address="oncall@foo.com", on="failure")
    @notify(email_address="ai@foo.com", on="success")
    class MyFlow(Flow):
        @notify(slack_channel="#foo", on="success")
        @step
        def my_step(self):
            ...

    To implement this I’d like to introduce a new Metaflow concept, a @finally step.

    class MyFlow(Flow):
        @finally
        def finally_step(self, status):
            status  # we need a way to message Success or Failure
    7 replies
    Hamid
    @Hamid75224834_twitter
    Hi, I have recently been working with Metaflow, and am not able to access the previous flows of other members of my group using namespaces. Just want to make sure I am not missing anything regarding namespaces; any help is appreciated. Thanks
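
    For reference, a minimal sketch of inspecting a teammate's runs with the client API (names hypothetical):

    from metaflow import Flow, namespace

    # switch to a colleague's namespace for read-only inspection...
    namespace("user:alice")
    print(Flow("MyFlow").latest_run)

    # ...or disable namespace filtering entirely
    namespace(None)
    print(Flow("MyFlow").latest_run)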
    1 reply
    derek
    @yukiegosapporo_twitter

    Hi!

    How can I pass @Parameters different from the defaults to step-functions create?
    I know step-functions trigger can take any @Parameters in a pipeline python file, but this is valid only for that run.
    What I want to do is pass @Parameters to the cron schedule in AWS EventBridge dynamically.

    9 replies
    grizzledmysticism
    @grizzledmysticism
    Just wanted to say - fantastic work on this. Can't wait for the addition of some of the new features, particularly the graph composition and inclusion of external modules (symlink).
    15 replies
    Ayotomiwa Salau
    @AyonzOnTop
    Hello guys, I am pretty new to the Metaflow community. How do I start contributing?
    8 replies
    Daniel Perez
    @sandman21dan

    Hey guys, been using metaflow for a bit over a year now, and I've recently started to integrate our deployment with AWS Batch for the scale-out pattern. I'm now able to execute flows with some steps that run in Batch; however, I don't see the ECS cluster ever scaling back down.

    To elaborate, my compute environment has the following settings: min vcpus = 0, desired vcpus = 0, max vcpus = 32.

    When I run a flow, a job definition gets added to the job queue, an instance gets started in the cluster, and the task runs and finishes fine, but the job definition stays "Active" and the instance seems to stay up indefinitely inside the cluster until I go and manually deregister the job definition.

    Is this the way it's designed? or am I missing something in the way I configured my Compute environment?

    Is metaflow supposed to update the job definition after a flow finishes?

    5 replies