    Kamil Bobrowski
    @kbobrowski

    Hi, a question about the isolation of steps: I noticed that these two steps will be executed in the same conda environment:

    from metaflow import FlowSpec, step, conda
    
    class IsolationTest(FlowSpec):
    
        @conda(python="3.8.5")
        @step
        def start(self):
            import sys
            print(f"start executable: {sys.executable}")
            self.next(self.end)
    
        @conda(python="3.8.5")
        @step
        def end(self):
            import sys
            print(f"end executable: {sys.executable}")
    
    
    if __name__ == "__main__":
        IsolationTest()

    They will run in separate environments only if the Python version is different. Is there a way to ensure that separate environments are created? (Context: I need to install packages from pip, which results in heavy installing / rolling back of packages every time the flow is executed.)

    5 replies
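    A possible workaround, sketched below (untested; the library pins are purely illustrative): local conda environments appear to be keyed on the full decorator spec, so giving each step a distinct spec, e.g. different library pins, should force distinct environments.

    from metaflow import FlowSpec, step, conda

    class IsolationWorkaround(FlowSpec):

        # Differing specs (here: different library pins) should hash to
        # different environments, unlike two identical @conda(python=...) specs.
        @conda(python="3.8.5", libraries={"click": "7.1.2"})
        @step
        def start(self):
            import sys
            print(f"start executable: {sys.executable}")
            self.next(self.end)

        @conda(python="3.8.5", libraries={"click": "8.0.1"})
        @step
        def end(self):
            import sys
            print(f"end executable: {sys.executable}")

    if __name__ == "__main__":
        IsolationWorkaround()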
    Kelly Davis
    @kldavis4
    Question about Step Functions IAM permissions. We have a step that is doing a next() with a foreach param, and when it runs as a step function, we get an error that the AWS Batch execution role doesn't have permission to call PutItem on the Step Functions DynamoDB table. This happens in step_functions_decorator.py in task_finished(), so it makes sense to me that the Batch execution role would need the DynamoDB permissions, but when I look at the CloudFormation template in metaflow-tools, it doesn't seem to be granting those permissions. Can someone confirm that the Batch execution role does need permissions on the Step Functions DynamoDB table?
    2 replies
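    For reference, a hedged sketch of the kind of IAM statement the Batch execution role would presumably need (the table name is a placeholder, and the exact set of DynamoDB actions Metaflow uses may differ):

    {
        "Effect": "Allow",
        "Action": [
            "dynamodb:PutItem",
            "dynamodb:GetItem",
            "dynamodb:UpdateItem"
        ],
        "Resource": "arn:aws:dynamodb:*:*:table/<metaflow-step-functions-table>"
    }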
    daavidstein
    @daavidstein

    Re (2) of issue #149:

    We are considering porting our data processing/ML pipelines to Metaflow, but one thing that is holding us back is the lack of support for integration testing. For instance, we currently use Kedro for our data pipelines. Kedro provides the ability to call and run a Kedro pipeline from another Python script (directly, without using subprocess.run) and additionally provides the ability to override the default datasets used in that pipeline at runtime. This is important because some of our datasets are very large, so we naturally want to use subsetted versions of them. In fact, some of the datasets we inject into the pipeline for the integration test are generated with Hypothesis, which ensures that our pipelines are robust to unanticipated variations in the data.

    Furthermore, although it's not ideal for an integration test, there are some expensive functions, or functions that rely on a network connection, that we want to patch using unittest.mock.patch. This doesn't seem to be possible when running a Metaflow pipeline with subprocess.run.

    One solution could be to just have a test flow inherit from the flow to be tested, and override the artifacts that way, i.e.:

    class TestPlaylistFlow(PlayListFlow):
        movie_data = test_data

    But as far as I understand it, the child flow will not inherit the step functions, which would force us to import them and manually specify them in the test flow.

    The only other solution I can think of at present is to define a boolean parameter test in the original flow and based on the value of that parameter assign different artifacts to the instance variables as necessary. Is there another option that can be implemented with the current version of metaflow==2.2.9?

    15 replies
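    A minimal sketch of the boolean-parameter approach described above; make_test_movie_data and load_movie_data are hypothetical helpers:

    from metaflow import FlowSpec, Parameter, step

    class PlayListFlow(FlowSpec):

        test = Parameter("test",
                         type=bool,
                         default=False,
                         help="Use small test datasets instead of production data")

        @step
        def start(self):
            # Swap artifacts based on the test parameter
            if self.test:
                self.movie_data = make_test_movie_data()  # hypothetical helper
            else:
                self.movie_data = load_movie_data()       # hypothetical helper
            self.next(self.end)

        @step
        def end(self):
            pass

    if __name__ == "__main__":
        PlayListFlow()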
    russellbrooks
    @russellbrooks

    hey guys, had a teammate run into what I believe is a bug in how Parameters are used in Step Functions deployments. If there is a Parameter that defaults to None like

    Parameter(name="test_param", type=int, default=None)

    it'll result in the following error when deploying with step-functions create.

    Flow failed:
        The value of parameter test_param is ambiguous. It does not have a default and it is not required.

    The same flow/parameter will run successfully locally or when submitted to batch without SFNs.

    1 reply
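    A possible workaround sketch, assuming the ambiguity check only objects to optional parameters with no concrete default: either mark the parameter as required or use a sentinel default instead of None.

    from metaflow import FlowSpec, Parameter, step

    class ParamFlow(FlowSpec):

        # Option 1: make the parameter explicitly required...
        test_param = Parameter("test_param", type=int, required=True)

        # ...or Option 2: use a concrete sentinel instead of None:
        # test_param = Parameter("test_param", type=int, default=-1)

        @step
        def start(self):
            print(f"test_param = {self.test_param}")
            self.next(self.end)

        @step
        def end(self):
            pass

    if __name__ == "__main__":
        ParamFlow()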
    pranaygp
    @pranaygp:beeperhq.com
    [m]
    is there a good way to "tag" runs?
    the auto-incremented run number is fine, but ideally I'd like to attach a name when kicking off jobs from the command line, so it's easier to keep track of results from multiple runs and compare them
    2 replies
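    One option (sketch; flow and tag names are illustrative): the run command accepts a repeatable --tag option, and tags can then be used to filter runs via the client API.

    # Tag the run when kicking it off from the command line:
    #   python my_flow.py run --tag experiment-lr-0.01 --tag dataset-v2
    from metaflow import Flow

    # Later, filter runs of the flow by tag:
    runs = [run for run in Flow("MyFlow") if "experiment-lr-0.01" in run.tags]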
    vinod-rachala
    @vinod-rachala
    I am using Metaflow on ECR, and while executing the code through Step Functions I am getting this error; please let me know if there are any solutions. Error: Metaflow 2.2.8 executing preprocessflow unknown user: Metaflow could not determine your username based on environment variables ($USERNAME etc.)
    1 reply
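    A hedged guess at a fix: since the error mentions $USERNAME, exporting a username through an environment variable Metaflow checks, e.g. in the Batch job definition or container entrypoint, should presumably satisfy the lookup:

    export USERNAME=preprocess-service   # or METAFLOW_USER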
    Ayotomiwa Salau
    @AyonzOnTop
    Hello, I noticed that any time I join two DAG steps in a class, the flow loses its self.<attributes>. I can no longer access the attributes in the steps after the join.
    3 replies
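    This sounds like the standard join-step behavior: artifacts are not carried over automatically into a join because the branches may disagree on their values. A minimal sketch of the usual self.merge_artifacts pattern:

    from metaflow import FlowSpec, step

    class JoinFlow(FlowSpec):

        @step
        def start(self):
            self.common = "visible before the split"
            self.next(self.a, self.b)

        @step
        def a(self):
            self.next(self.join)

        @step
        def b(self):
            self.next(self.join)

        @step
        def join(self, inputs):
            # Pull in all artifacts whose values are unambiguous across branches
            self.merge_artifacts(inputs)
            self.next(self.end)

        @step
        def end(self):
            print(self.common)

    if __name__ == "__main__":
        JoinFlow()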
    Kamil Bobrowski
    @kbobrowski
    Hi, the way Metaflow executes locally (creating unique conda environments only for the set of unique @conda(...) decorators) makes things quite difficult when relying on pip-installed packages: if a step that requires a pip package is executed in parallel within a foreach, pip will fail because multiple processes try to install packages into the same environment. I'm thinking about possible solutions: maybe a switch to force the creation of a unique conda environment for each step, an option to run each step in a separate Docker container, or proper support for pip through a @pip decorator. What do you think? I'd be happy to contribute.
    7 replies
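    For concreteness, a minimal sketch of the situation described (the library pin is illustrative): all parallel foreach tasks resolve to the same locally cached conda environment, so per-task pip installs into it can race.

    from metaflow import FlowSpec, conda, step

    class PipRaceFlow(FlowSpec):

        @step
        def start(self):
            self.shards = list(range(4))
            self.next(self.work, foreach="shards")

        # The four parallel 'work' tasks resolve to the same cached conda
        # environment; pip-installing into it from each task can collide.
        @conda(python="3.8.5", libraries={"pandas": "1.2.4"})
        @step
        def work(self):
            self.shard = self.input
            self.next(self.join)

        @step
        def join(self, inputs):
            self.next(self.end)

        @step
        def end(self):
            pass

    if __name__ == "__main__":
        PipRaceFlow()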
    Ayotomiwa Salau
    @AyonzOnTop
    I tried running a logistic regression on one branch of my DAG and a random forest regression on another branch. The random forest completed its task quickly, while the logistic regression kept running for almost 3 hours. The data is about 500k rows and 1024 columns, which is not that much. Why is it taking so long?
    1 reply
    Richard Decal
    @crypdick
    @savingoyal @tuulos Hopefully this clarifies (sorry, couldn't attach to our thread) metaflow_testing.txt
    Kelly Davis
    @kldavis4
    Is there any way to "checkpoint" mid-step and allow resuming a run from that checkpoint?
    The use case is that we have two activities where the second depends on the first, but where it doesn't necessarily make sense to split them into separate steps, and the first activity takes a while to complete. During development, say we are modifying the second activity: every time we run another test of our changes, we have to execute that first long activity again. To speed things up during dev, we might split the work into separate steps to allow resuming at the right place, but then we might combine them into a single step for higher efficiency in production.
    3 replies
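    A sketch of the split-step pattern described above, with hypothetical run_long_activity and postprocess helpers; resume can then restart from the cheap step while reusing the persisted artifact:

    from metaflow import FlowSpec, step

    class CheckpointFlow(FlowSpec):

        @step
        def start(self):
            self.next(self.expensive_part)

        @step
        def expensive_part(self):
            # The long-running activity; its result persists as an artifact
            self.intermediate = run_long_activity()  # hypothetical helper
            self.next(self.fast_part)

        @step
        def fast_part(self):
            # Iterate on this step during development, then restart from here:
            #   python checkpoint_flow.py resume fast_part
            self.result = postprocess(self.intermediate)  # hypothetical helper
            self.next(self.end)

        @step
        def end(self):
            pass

    if __name__ == "__main__":
        CheckpointFlow()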
    David Vega Fontelos
    @repocho
    Hello!! I'm using a Metaflow metadata service + an on-premises object storage (S3) for the datastore. But I saw that the object storage is using more than 2TB, storing very old flows.
    Is there a procedure for cleaning old flows from the datastore?
    Thanks !!
    3 replies
    David Patschke
    @dpatschke

    @tuulos @savingoyal I'm seeing some strange behavior when running Flows via AWS Batch that is not consistently reproducible. The error I'm getting says:

    Task is starting.
    list index out of range
    Task failed.

    There is no hash present within the square brackets in the Metaflow logging, so something tells me this might be an undetected AWS Batch issue (but I have no idea). It's as if the Task never even got a chance to start and hard-failed from there.
    What really stinks about this error is that the Flow is run with --with retry, but a retry is not even attempted. This is the 2nd time this error has presented itself to me within a week. Both times I have been able to re-run the Flow immediately afterwards, and it completes successfully. FWIW, this last time it happened in the middle of a foreach fan-out, and all the other fanned-out processes kept running. The Flow failed at the end step because, I'm guessing, not all known tasks completed successfully and a check is made (good work on your end for having that).
    Are either of you aware of what may be causing this error and/or have any potential suggestions?
    Thanks!

    9 replies
    daavidstein
    @daavidstein
    We want to use a separate namespace for executing flows in a test environment. In particular, we want runs executed in the test namespace to not be visible to user namespaces, or at least to be clearly differentiated and easily filtered out. It seems that, the way namespaces are currently implemented, if we run a flow with python my_flow.py run --namespace test --tag test, the last_run of the flow in the user namespace is updated with the run from the test namespace. Furthermore, it doesn't seem to be possible to run flows as a different user using the --namespace flag:
    python my_flow.py run --namespace user:test
    ...
    2021-04-29 18:18:04.147 [1619709482612481/end/5 (pid 67190)] Task finished successfully.
    
    namespace("user:daavid")
    Metaflow().flows[0].latest_run
    
    >> Run('PlayListFlow/1619709482612481')
    
    Metaflow().flows[0].latest_run.tags
    
    >> frozenset({'date:2021-04-29',
               'metaflow_version:2.2.10',
               'python_version:3.8.5',
               'runtime:dev',
               'user:daavid'})
    5 replies
    ailishbyrne
    @ailishbyrne
    hi all. we've implemented a custom step decorator to instrument our tasks and are super happy with the results. thank you! the one thing i have been unsuccessful in doing is accessing the input of a foreach step in task_pre_step. if i try to do anything with the input on the flow parameter of that method, whether calling flow.input or using getattr(flow, 'input'), the input is not only unavailable there but also no longer available when the step itself is executing. we are using a contextual json logger, and i am hoping to add that context to the logger centrally, rather than requiring the context be added in the foreach steps themselves. here is an example log statement: {"text": "task succeeded in 0.2 seconds", "log_level": "INFO", "filename": "decorators.py", "lineno": "46", "method_name": "task_post_step", "flow_name": "TestFlow", "run_id": "1451", "step_name": "forme", "task_id": "11051", "task_input": "1"}
    20 replies
    Kyle Smith
    @smith-kyle

    Hello, I've recently installed Metaflow on AWS using a manual deployment. I'm running the helloaws.py flow and it's stuck in the RUNNING state. Looking at the log stream, I see the code is downloaded, but then it's just stuck at Task is starting.

    Can someone please help me diagnose the problem?

    4 replies
    Ayotomiwa Salau
    @AyonzOnTop
    Hello guys, I just published a blog post on building data science projects with Metaflow, using the MNIST dataset as a use case. Check it out; feedback and acknowledgements are welcome.
    https://ayotomiwasalau.medium.com/starting-your-data-science-project-with-metaflow-the-mnist-use-case-44e3b3ad6ec3
    5 replies
    Mike Bentley Mills
    @mikejmills
    Hi, I'm running Metaflow but NOT using AWS Batch (yet). I've found that run.code & task.code are all None. Is there a way to tell Metaflow to always save the code?
    2 replies
    Christopher Wong
    @christopher-wong
    I’m probably missing something obvious in the docs, but how do you delete a scheduled Metaflow step function without manually deleting all the components from the console?
    3 replies
    ailishbyrne
    @ailishbyrne
    Screen Shot 2021-05-05 at 7.21.31 PM.png
    Ahmad Houri
    @ahmad_hori_twitter
    I have a flow that is scheduled to run many times every day, but each run is for a different client. How can I use tags to tag each flow run triggered via Step Functions with a client-specific tag?
    1 reply
    Kelly Davis
    @kldavis4
    Hi all, we just published an article on our usage of Metaflow on CNN's Digital Intelligence team (w/ some highlights on the terraform PR we've been working on): https://medium.com/cnn-digital/accelerating-ml-within-cnn-983f6b7bd2eb
    7 replies
    baothienpp
    @baothienpp
    Hi everyone, is there any way to assign a namespace inside the flow code and not from the CLI?
    17 replies
    Samuel Than
    @samuelthan
    ML Platform Architecture.png
    17 replies

    Hi all, taking some ideas from the community here. Would like you all to “roast” my potential ML platform architecture.

    The purpose is to provide a centralized place to handle the ML training pipelines. This is specific to using Metaflow.

    1. Users execute their Metaflow “jobs”
    2. Metaflow runs the training pipeline jobs
    3. The output is a model packaged as a Docker image, stored in AWS ECR
    4. The image gets duplicated/pulled by users from different teams/business accounts

    Has anyone tried this approach before? Or am I doing it wrong?

    Kha
    @nlhkh
    Hi guys. I have some ML flows that have steps in both Python and R (data fetching in Python, model training in R). This is due to previous development. I read that Metaflow supports both Python and R. I would like to know if I can mix a Python step and an R step within a flow.
    2 replies
    Brian Lambert
    @brlambert7818
    Hey everyone, just started trying out Metaflow and AWS, and I have a question about AWS charges. I ran the Metaflow AWS tutorial successfully and assumed I would stop being charged for AWS use after the run terminated; however, when I checked AWS billing a day later, I had been charged for full-day use because many of the resources created by CloudFormation were still active, especially the SageMaker and EC2 instances. Is this normal behavior, or am I missing how to stop these instances when a job isn't running?
    13 replies
    pieceskieran
    @pieceskieran
    Screen Shot 2021-05-19 at 9.34.33 AM.png
    5 replies

    Hi guys, having trouble with conda dependencies when trying to run a flow on AWS Batch. I'm working on an M1 MacBook with Anaconda. My workflow was to develop as usual in a local conda env, then port my main.py script to a Metaflow graph with the @conda_base decorator, listing the package versions as installed in the local env:

    @conda_base(libraries={"numpy": "1.19.5",
                           "pyyaml": "5.4.1",
                           "torchvision": "0.9.0",
                           "requests": "2.25.1",
                           "pytorch": "1.8.0"},
                python="3.7")

    This Metaflow script ran perfectly locally, successfully bootstrapping the conda environment and running the flow. When running with --with batch, the conda build failed and printed a ton of package clashes (see above). I haven't been able to resolve these, but I can successfully run on Batch with a simple Python 3.7 + numpy conda env. Is there something up with our AWS setup that is causing these packages, which are evidently compatible, to clash on Batch?

    Matthew Beckers
    @mattlbeck

    Hi all, we are investigating Metaflow coming from a DVC background. What we have seen so far with Metaflow is great, in particular the seamless deployment to AWS is a huge benefit over DVC. However, I am missing some features that DVC had in relation to transparency of runs.

    For instance, is there a way to retrieve all parameter artifacts from a particular run for easy comparison of parameters between runs? I see that parameters are stored as artifacts, but programmatically they are mixed in with all other artifacts.

    9 replies
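    One possible (unofficial) trick, hedged: runs record their parameters on an internal _parameters task that is visible through the client API. Since this is an implementation detail rather than documented behavior, it may change between versions.

    from metaflow import Flow

    run = Flow("MyFlow").latest_successful_run

    # Each DataArtifact on the internal '_parameters' task should correspond
    # to one Parameter of the run, separate from ordinary artifacts.
    params = {artifact.id: artifact.data for artifact in run["_parameters"].task}
    print(params)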
    Dr. Hanan Shteingart
    @chanansh
    Hi, does MF support hierarchy (e.g. a step of steps)? E.g. one step could be Feature Extraction, which internally has several steps.
    5 replies
    Dr. Hanan Shteingart
    @chanansh
    PyCharm doesn't like Metaflow's style of dynamic attribute assignment (self.param = 5). It shouts:
    "Instance attribute param defined outside __init__"
    3 replies
    Dr. Hanan Shteingart
    @chanansh
    This message was deleted
    1 reply
    Matthew Beckers
    @mattlbeck
    I am trying to run a flow on Batch that has an input, using IncludeFile with a path to another S3 bucket in the same account. The flow is failing due to an "Access Denied" error for that S3 resource. It seems like the issue is a lack of permissions somewhere to read data from other buckets, but I am unsure where.
    17 replies
    Dr. Hanan Shteingart
    @chanansh
    I found the cool resume action for a flow. Is it sensitive to code edits or just to Parameter changes?
    5 replies
    Samuel Than
    @samuelthan

    Hi all, I am going through tutorial 4 of Metaflow (https://github.com/Netflix/metaflow/blob/master/metaflow/tutorials/04-playlist-plus/playlist.py). When running it with --environment conda --with batch,
    I'm getting the following in the AWS Batch logs:

    bash: metaflow_PlayListFlow_linux-64_a55a81bc46d543afe1d885c81703deeaae95efef/bin/python: No such file or directory

    this is running the default python:3.8 image.

    any thoughts on what I should be doing to fix this?

    24 replies
    Dr. Hanan Shteingart
    @chanansh
    Metaflow status is path-dependent. How can I get the current “db” location and move it from one place to another? How can I make Metaflow path-independent?
    2 replies
    How can I write a flow where each step is a flow? According to Netflix/metaflow#116 one can use subprocess to call other flows from the command line, but it is not clear to me how to grab the output of the flow that was run. Should I use metaflow.Flow().latest_successful_run?
    3 replies
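    A sketch of the subprocess approach, assuming a hypothetical FeatureExtractionFlow defined in feature_extraction_flow.py that produces a features artifact (note the client property is latest_successful_run, not get_latest_successful_run):

    import subprocess
    from metaflow import Flow

    # Run the child flow in its own process, per Netflix/metaflow#116
    subprocess.run(["python", "feature_extraction_flow.py", "run"], check=True)

    # Grab its output afterwards through the client API
    child_run = Flow("FeatureExtractionFlow").latest_successful_run
    features = child_run.data.features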
    OwenZhu
    @OwenZhu
    Hi, a newbie here. When I run tutorial 02, the latest run seems successful: 2021-05-26 19:26:19.521 [115/end/534 (pid 44499)] Task finished successfully. But when I run the Jupyter code, I get this error message:
    10 replies
    image.png
    pranaygp
    @pranaygp:beeper.com
    [m]

    Do you guys trigger Metaflow runs in production from API Gateway? I'd like to effectively use my Metaflow flow to back an inference endpoint, but that might be stupid. I'm not sure. What's the right way to go about this?

    I have a metaflow flow for inference (it's a multi-model setup and a complicated inference pipeline which is why this works best). I normally trigger it from the CLI, but now I want to ship it to production

    1 reply
    Savin
    @savingoyal
    :sparkles: Metaflow 2.3.0 released :tada: https://docs.metaflow.org/introduction/release-notes#2-3-0-may-27th-2021 :sparkles:
    This release introduces a new decorator, @project, for managing/coordinating larger Metaflow projects, among other exciting features. Metaflow users can now iterate on & deploy experimental versions of their workflows alongside production deployments in a safe & isolated manner.
    russellbrooks
    @russellbrooks
    :tada: been excited for the @project decorator – great stuff!
    Kyle Smith
    @smith-kyle
    Hi there, a question about metaflow.S3: do s3.get and s3.put ever write the data to disk? That is, do they ever need to write the data to disk during upload/download?
    5 replies
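    For reference, a small usage sketch (bucket and key are hypothetical): the S3Object returned by get exposes both a local file path and the contents as bytes, which suggests that downloads do land on disk.

    from metaflow import S3

    with S3(s3root="s3://my-bucket/prefix/") as s3:
        obj = s3.get("some-key")
        print(obj.path)   # path to a local (temporary) file
        blob = obj.blob   # the same contents as bytes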
    Richard Decal
    @crypdick
    Is there a way to customize the CloudFormation template for a different region? Everything is set up for us-east-1, and all the rest of our infrastructure is in ap-southeast-2. The cross-region S3 transfer costs would kill us.
    2 replies
    Matthew Beckers
    @mattlbeck

    Is there any way to specify common commands to be run before every step in a flow? Perhaps a sort of pre-step function?

    My use case is trying to integrate mlflow: I need to make calls such as mlflow.start_run(run_id=...) at the start of every step to ensure all steps log to the same mlflow run.

    Additionally, any other tips surrounding use of metaflow and mlflow together would be very helpful.

    4 replies
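    One pattern sketch (hedged, not an official integration): create the mlflow run once in start, persist its id as an artifact, and reattach to it at the top of each subsequent step.

    import mlflow  # assumed available in each step's environment
    from metaflow import FlowSpec, step

    class MlflowDemoFlow(FlowSpec):

        @step
        def start(self):
            # Create the shared mlflow run once; its id becomes a Metaflow artifact
            with mlflow.start_run() as run:
                self.mlflow_run_id = run.info.run_id
            self.next(self.train)

        @step
        def train(self):
            # Reattach to the same mlflow run in every subsequent step
            with mlflow.start_run(run_id=self.mlflow_run_id):
                mlflow.log_metric("loss", 0.1)
            self.next(self.end)

        @step
        def end(self):
            pass

    if __name__ == "__main__":
        MlflowDemoFlow()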
    Felipe Adachi
    @FelipeAdachi

    Hi!
    Is there some reference to using minio as a datastore?
    I tried setting the variables like:

    METAFLOW_S3_ENDPOINT_URL=http://adm:pswd@xx.x.x.x:PORT METAFLOW_DATATOOLS_SYSROOT_S3=/minio/metaflow-dir/data METAFLOW_DATASTORE_SYSROOT_S3=/minio/metaflow-dir python mf.py run

    But I get an "invalid bucket name" error.

    6 replies
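    A hedged guess at the fix (untested): the sysroot values must be s3:// URLs with the bucket name first, and credentials belong in the usual AWS variables rather than in the endpoint URL, e.g.:

    export AWS_ACCESS_KEY_ID=adm
    export AWS_SECRET_ACCESS_KEY=pswd
    export METAFLOW_S3_ENDPOINT_URL=http://xx.x.x.x:PORT
    export METAFLOW_DEFAULT_DATASTORE=s3
    export METAFLOW_DATASTORE_SYSROOT_S3=s3://metaflow-dir/data
    export METAFLOW_DATATOOLS_SYSROOT_S3=s3://metaflow-dir/data/data.tools
    python mf.py run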
    seanv507
    @seanv507
    Hi, we are looking to cost Batch runs distinctly, and the simplest way to do this seems to be to create/destroy an environment for each run (and use EC2 tags). Is it possible to specify the job queue when triggering the flow? E.g.
    METAFLOW_JOB_QUEUE_NAME python parameter_flow.py step-functions trigger --alpha 0.5
    5 replies
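    Possibly relevant (hedged): the Batch job queue is read from the METAFLOW_BATCH_JOB_QUEUE configuration variable, and can also be set per step via @batch(queue=...); for Step Functions it is presumably baked in when the state machine is created rather than at trigger time, e.g.:

    METAFLOW_BATCH_JOB_QUEUE=costed-run-queue python parameter_flow.py step-functions create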
    Ville Tuulos
    @tuulos

    📣 hey all - @savingoyal, @oavdeev and I founded a startup, Outerbounds, to help the wider Metaflow community outside Netflix. We have a new Slack instance at http://slack.outerbounds.co that has a good number of companies and individuals using Metaflow. You are welcome to join it! 👋

    We will keep monitoring this Gitter channel too but many people seem to prefer the ergonomics of Slack over Gitter.

    Savin
    @savingoyal
    This message was deleted