    Peter Wilton
    @ammodramus

    Hi all, I'm working on converting a legacy pipeline to Metaflow and was wondering whether there is any way to do something like the following.

    @step
    def map_step(self):
        self.vars = ['a', 'b']
        self.next(self.do_compute, foreach='vars')
    
    @step
    def do_compute(self):
        self.var = self.input
        self.artifact1 = do_something(self.var)
        self.artifact2 = do_something_else(self.var)
        self.artifact3 = do_something_else_yet(self.var)
        self.next(self.join_step)
    
    @step
    def join_step(self, inputs):
        self.artifact_dict = dict()
        for inp in inputs:
            self.artifact_dict[inp.var] = inp

    I was hoping that this would give me programmatic, lazily-loading access to the artifacts computed in do_compute for each value of var (a la self.artifact_dict['a'].artifact1), but of course I am getting this error message:

    Flows can't be serialized. Maybe you tried to assign self or one of the inputs to an attribute? Instead of serializing the whole flow, you should choose specific attributes, e.g. input.some_var, to be stored.

    Is there a recommended way to achieve this programmatic, lazy access? I see a workaround of programmatically defining names and calling setattr and getattr, and, searching through this Gitter's history, that approach seems to have been recommended before. Is it still the recommended approach? Thanks!

    21 replies
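
    A minimal sketch of the setattr/getattr workaround mentioned above (FanOutFlow and the stand-in computations are made-up names): instead of assigning the whole input, store the chosen artifacts of each branch under generated names. This avoids the serialization error, and the generated artifacts still load lazily when read back through the client.

    from metaflow import FlowSpec, step
    
    class FanOutFlow(FlowSpec):
    
        @step
        def start(self):
            self.vars = ['a', 'b']
            self.next(self.do_compute, foreach='vars')
    
        @step
        def do_compute(self):
            self.var = self.input
            self.artifact1 = len(self.var)       # stand-ins for do_something(...)
            self.artifact2 = self.var.upper()
            self.artifact3 = self.var * 3
            self.next(self.join_step)
    
        @step
        def join_step(self, inputs):
            # Store chosen artifacts per branch instead of the inputs themselves.
            self.branch_vars = [inp.var for inp in inputs]
            for inp in inputs:
                for name in ('artifact1', 'artifact2', 'artifact3'):
                    setattr(self, '%s_%s' % (inp.var, name), getattr(inp, name))
            self.next(self.end)
    
        @step
        def end(self):
            # e.g. getattr(self, 'a_artifact1') recovers the value computed for 'a'
            pass
    
    if __name__ == '__main__':
        FanOutFlow()
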
    Andrew Achkar
    @hajapy
    Hello, I’m trying to determine the best option for having CI testing of my Metaflow workflows. Are there examples I could follow on how to use pytest or the unittest framework to execute a flow (locally) and ensure outputs/artifacts are as expected?
    9 replies
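
    For what it's worth, a rough pytest sketch of one way to do this, assuming the flow lives in hello_flow.py and is named HelloFlow (both made-up, as is some_artifact): run the flow locally in a subprocess, then check its artifacts through the client API.

    # test_hello_flow.py
    import subprocess
    import sys
    
    from metaflow import Flow
    
    def test_hello_flow_artifacts():
        # Execute the flow end-to-end with the local scheduler.
        subprocess.run([sys.executable, 'hello_flow.py', 'run'], check=True)
    
        # Inspect the most recent successful run via the Metaflow client.
        run = Flow('HelloFlow').latest_successful_run
        assert run.successful
        assert run.data.some_artifact == 'expected value'  # some_artifact is hypothetical
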
    ayorgo
    @ayorgo

    Hey Metaflow,
    I'm having an issue with running one of my flows on AWS Batch. The issue is as follows

    mkdir: cannot create directory ‘metaflow’: Permission denied
    /bin/sh: 1: [: -le: unexpected operator
    tar: job.tar: Cannot open: No such file or directory
    tar: Error is not recoverable: exiting now

    I run it on a pre-built image hosted on ECR. The Dockerfile contains a WORKDIR command which points at /home/my_proj/code. I can successfully build my image locally, bash into it and mkdir metaflow (under the default /home/my_proj/code/ directory) without an issue.
    What I suspect might be happening is that the WORKDIR statement is somehow ignored and the Metaflow command ["/bin/sh","-c","set -e ... is run from within /.
    It's worth noting that I have several flows running on AWS Batch already with no problem at all. Their Dockerfiles are almost identical to the one that is having the problem.

    Not really sure if it's a Metaflow issue but hoping for somebody to have seen this already.
    Thank you.

    7 replies
    ayorgo
    @ayorgo
    Hello again, Metaflow,
    Is there a way to run flows on AWS Batch in a detached mode, so I can start my run, close the lid of my laptop and go home without interrupting the execution, and check its status later from the AWS Management Console? Or does it need access to my local disk to be able to package up the next step for execution?
    3 replies
    Youliang Yu
    @elv-youliangyu

    Hey Metaflow,
    I am new to Metaflow, and trying to update a parameter in a step, but get "AttributeError: can't set attribute".
    Here is the snippet:

    class TestFlow(FlowSpec):
        param = Parameter(
            'param',
            type=str,
            help='test parameter',
            default='OD'
        )
        @step
        def start(self):
            self.param = self.param.lower()
            ...
        ....

    This seems like a common use case. Probably there's some mistake in my usage. What am I missing?

    4 replies
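
    Parameters are read-only at run time, so one simple workaround (a sketch, keeping the parameter definition above) is to store the transformed value under a different artifact name:

    from metaflow import FlowSpec, Parameter, step
    
    class TestFlow(FlowSpec):
        param = Parameter('param', type=str, help='test parameter', default='OD')
    
        @step
        def start(self):
            # self.param stays read-only; keep the lower-cased copy as a new artifact
            self.param_lower = self.param.lower()
            self.next(self.end)
    
        @step
        def end(self):
            print(self.param_lower)
    
    if __name__ == '__main__':
        TestFlow()
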
    David Patschke
    @dpatschke
    @savingoyal @tuulos I want to throw something a little wild out here. I see in a couple of messages that you mention that a lot of development of Flows at Netflix happens within notebooks. Have you ever explored creating a JupyterLab extension for Metaflow, in the same vein as something like Kubeflow Kale?
    7 replies
    Denis Maciel
    @denismaciel

    Hi there,

    I have a Flow that is supposed to run completely on batch. I'd like to run one of the steps in a different Docker container than the default one. Is it possible to run something like this:

    # pipeline.py
    from metaflow import FlowSpec, step, batch
    
    class TestFlow(FlowSpec):
    
        @batch(image='python:3.6.12-buster')
        @step
        def start(self):
            import sys
            print(sys.version)
            self.next(self.end)
    
        @step
        def end(self):
            import sys
            print(sys.version)
    
    if __name__ == "__main__":
        TestFlow()

    with the following command: python pipeline.py run --with batch

    Here the end step should run with the default image and the start step with python:3.6.12-buster.

    3 replies
    Andrew Achkar
    @hajapy
    Hi, I have a use case I’m not sure how to handle best with Metaflow. I have a flow that is largely identical across uses, but the team iterates on one of its steps quite frequently. This can either be in the form of a new conda package or a new Docker container. What would be the recommended way to allow this step to take different versions of the package/image, without having to modify the flow code (so perhaps by providing a CLI arg or env var when triggering the flow)?
    6 replies
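
    One possible pattern (a sketch; STEP_IMAGE, TrainFlow and train_flow.py are made-up names): read the image from an environment variable when the flow file is parsed and pass it to @batch, so the flow code itself never changes:

    import os
    
    from metaflow import FlowSpec, batch, step
    
    # Resolved when the flow file is parsed, e.g.
    #   STEP_IMAGE=my-registry/model:v2 python train_flow.py run --with batch
    STEP_IMAGE = os.environ.get('STEP_IMAGE', 'python:3.6.12-buster')
    
    class TrainFlow(FlowSpec):
    
        @batch(image=STEP_IMAGE)
        @step
        def start(self):
            import sys
            print(sys.version)
            self.next(self.end)
    
        @step
        def end(self):
            pass
    
    if __name__ == '__main__':
        TrainFlow()
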
    David Patschke
    @dpatschke
    @tuulos Would you be able to share what is happening from a memory perspective when artifacts are being preserved at the end of a task? I have a pretty large local machine (128GB) and I'm running multiple workers in parallel using 'foreach'. I'm seeing what looks to be an effective rapid doubling in memory right before the parallel tasks complete ... nearly depleting the available memory on my machine. Actually, I've had to significantly throttle down 'max-workers' due to this phenomenon (down to 4). Then, when the task(s) complete, the memory quickly goes back down. In contrast, if I create the same 4 artifacts within a jupyter notebook I see somewhere between 40-50 GB of memory being consumed. I looked at the Github issues but couldn't find any that exactly discussed what I'm seeing. Any suggestions?
    5 replies
    Greg Hilston
    @GregHilston

    Hey Metaflow, I stumbled upon a situation that I was hoping you guys could comment on. I'm trying out Metaflow in a code base that already exists, which provides numerous utility files and functionality.

    I was reading through this page on managing external libraries (https://docs.metaflow.org/metaflow/dependencies) and was wondering if there is any other way to let a Metaflow flow import these utility files without having to copy them from all over the repository into the same folder the flow is defined in.

    I'm aware of symbolic links and file system approaches but was wondering if there was any other Metaflow approach for a scenario like this

    3 replies
    Ji Xu
    @xujiboy

    Hi, I recently upgraded metaflow from 2.0.1 to 2.2.3, and when I execute a parameterized flow I got the following error which I haven't seen before (truncated to the last few lines):

    ...
    File "/home/ji.xu/.conda/envs/logan_env/lib/python3.7/site-packages/metaflow/includefile.py", line 229, in convert
        param_ctx = context_proto._replace(parameter_name=self.parameter_name)
    AttributeError: 'NoneType' object has no attribute '_replace'

    Any suggestions?

    8 replies
    Andrew Achkar
    @hajapy
    Hi, I've got another question from some metaflow trials. Say I have a conda step and want to run in batch, but my local machine is a mac and the batch environment is linux. It is conceivable that the batch execution conda environment is not buildable on mac (eg. specifying nvidia::cudatoolkit as a conda library). Why does this step attempt to create the conda environment locally before sending the step to batch and hence fail the flow? Is there any way to bypass this?
    6 replies
    Timothy Do
    @TDo13
    Hi again! I've been running into a particularly interesting issue when trying to use a docker image from our private repository. It looks like metaflow mistakenly identifies a registry for image paths like: foo/bar/baz:latest
    5 replies
    russellbrooks
    @russellbrooks
    Question for you all – is there a way to specify "any production namespace" within the client API? For example, when accessing the results of one flow in a subsequent flow, the namespaces work very well for individual users, but in a production setting the namespaces don't align quite as intuitively, since each production deployment gets something like production:FlowName-<deployment #>-<hash>. This can be worked around by specifying the whole production token for the namespace, but I'm curious what your thoughts are on this usage.
    5 replies
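
    One workaround that may help (a sketch; UpstreamFlow is a made-up name): switch the client to the global namespace before looking up the upstream flow, so results from any production deployment are visible, at the cost of losing namespace isolation:

    from metaflow import Flow, namespace
    
    namespace(None)  # global namespace: no user or production-token filtering
    run = Flow('UpstreamFlow').latest_successful_run
    print(run.pathspec)
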
    joe153
    @joe153
    I am starting to get this error: [Errno 62] Too many levels of symbolic links when running a job with AWS batch on my mac. It worked fine before but just started today. Any ideas why? Only for environment=conda. Not obvious at the moment-
    I still have the same python 3.8, same metaflow 2.2.3, same conda 4.8.4, etc.
    14 replies
    David Patschke
    @dpatschke
    Another question for the Netflix Metaflow team as I've searched and haven't found anything related ...
    Is it possible to 'sync' the results/artifacts from a Run that exists within a local metadata store to the same Flow that exists on the remote metadata store on AWS?
    I had a long-running Metaflow training Run which I executed locally and which had a significant amount of data artifacts that were being saved/tracked. I wasn't sure that the results of the Run were going to be exactly what I was desiring, but now that the job has finished (several days later) and I've vetted the results, I would like to be able to 'merge' this Run with the Flow metadata that exists remotely.
    Think of this almost as git-like functionality within the Metaflow metastore. I'm pretty sure the git-like behavior doesn't exist but perhaps there is a manual way of accomplishing this which I haven't run across?
    10 replies
    Revaapriyan
    @Revaapriyan
    Hi people. I would like to know if there is a way to limit the number of processor cores used by a Metaflow run. In Python's multiprocessing library I can specify the number of cores to be used by the entire set of parallel tasks. But in Metaflow I can only specify a step's or task's minimum resources, and I could not find a way to restrict the amount of parallelism, i.e. how many instances of a step can execute simultaneously.
    2 replies
    David Patschke
    @dpatschke
    Back again for another question for the Netflix Metaflow team. Metaflow does an awesome job of storing artifacts, allowing for tagging, and organizing of ML/DS experiments. On the flip side, I haven't really seen anything mentioned about actually managing the Flows themselves (other than inspection through the Python SDK). I'm visualizing/thinking about something along the lines of MLFlow's Model Registry. Just curious whether anything like this is leveraged internally and, perhaps, being considered for open source release or whether it is up to us mere mortals to come up with a solution ourselves. Would love any information you are able and willing to provide.
    4 replies
    russellbrooks
    @russellbrooks
    Hey guys, just wanted to point out a difference in retry behavior between batch jobs when using the local metaflow scheduler versus step functions. When jobs are retried normally with the metaflow scheduler, there's a 2min delay between attempts. When jobs are retried with step functions, the retries occur immediately. This appears to be from using the RetryStrategy at the batch job level, which doesn't support customization, whereas the step functions API allows for step-level retry logic that also supports delay intervals, exponential backoffs, etc. It may not be feasible to switch this over, but wanted to run it by you all and see if it's something you've considered.
    4 replies
    Greg Hilston
    @GregHilston

    Hey guys, I'm aware of the resources decorator for memory, CPU and GPU requests when running on AWS Batch but was wondering how Metaflow recommends handling the need of more disk space?

    I've read that one can modify the AMI deployed on batch to get a larger than 8GB default volume size.

    Is there a more friendly way to achieve this? I find myself working with datasets that are bigger than 8GB for some experiments but others use much less than 8GB.

    Thanks!

    7 replies
    Andrew Achkar
    @hajapy
    Quick question/comment. We’re trying to package a number of flows into a standard python project structure. That is a top level package, modules and sub-packages. We cannot use absolute imports because when we launch the flow the import fails. So we have to go with imports that work at runtime, but our IDE doesn’t like. Is there a way to make this project structure work? How do you recommend structuring a project with multiple flows that share common utility modules/packages? My other comment is just that our IDE (pycharm) doesn’t recognize the decorators, which I think is due to them being dynamically created. Are there any workarounds for this?
    13 replies
    Greg Hilston
    @GregHilston

    Is there any documentation, procedure or scripts for transferring one Metadata service to another?

    Imagine a user stood up all the infra in region A on AWS and wanted to move to region B without data loss.

    I can write the S3 and postgres transfer scripts myself but was hoping to not re-invent the wheel.

    Thanks!

    Alireza Keshavarzi
    @isohrab

    Hi guys, I created a preprocessing Flow with some (default) parameters. It works well on my local machine and I am able to run steps on AWS. Now I want to integrate AWS Step Functions to schedule my preprocessing, and I created the step function in AWS with the step-functions create command, but when I execute it manually from the AWS console, I get a '$.Parameters' error in AWS. Here is the whole message:

    "An error occurred while executing the state 'start' (entered at the event id #2). The JSONPath '$.Parameters' specified for the field 'Value.$' could not be found in the input '{\n    \"Comment\": \"Insert your JSON here\"\n}'"

    When I checked the state machine generated in AWS, I saw that there is an item in the Environment section as follows:

    {
          "Name": "METAFLOW_PARAMETERS",
          "Value.$": "$.Parameters"
    }

    I checked the following AWS resource: https://docs.aws.amazon.com/step-functions/latest/dg/input-output-inputpath-params.html
    but I couldn't solve my problem. I believe I shouldn't need to provide any parameters because I provided a default value for all of them.
    Do you have any idea? I appreciate your help.

    3 replies
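
    If it helps while debugging: the generated state machine reads its parameters from a top-level "Parameters" key in the execution input, so the default console input ({"Comment": ...}) makes the $.Parameters JSONPath fail. One thing to try when starting the execution manually is an input like the following, even when all parameters have defaults (an educated guess from the error, not an official fix):

    {
        "Parameters": "{}"
    }
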
    Greg Hilston
    @GregHilston

    Hey Metaflow, has anyone been able to create a single Batch Job Queue and Compute Environment that handles both CPU and GPU jobs, say with a p3.2xlarge?

    I ask as I've seen others suggest online using two separate Job Queues, one for CPU and one for GPU jobs, but Metaflow only supports a single Job Queue.

    While my Compute Environment has successfully spun up p3.2xlarge instances, I have been unable to get a single GPU Step to leave the RUNNABLE state. I've been exploring whether this is related to the AWS Launch Template I created to increase the disk size of my instances.

    If anyone has any advice, documentation or examples of running GPU jobs alongside CPU jobs in the same Batch Job Queue and Compute Environment with Metaflow, I'd very much appreciate it.

    3 replies
    russellbrooks
    @russellbrooks

    Hey guys, I nearly forgot to follow up and share some advice on creating Batch compute environments that was especially relevant to my previous issues with SFN-executed, highly parallel, short-running jobs being co-located on large instances:

    The short version: don't use the default Batch ECS-optimized AMIs, which are still based on the soon-to-be-deprecated Amazon Linux 1, and use the latest ECS-optimized Amazon Linux 2 AMIs instead.

    The Linux 1 AMI uses the Docker devicemapper storage driver, and preallocates 10GB of per-container storage. The Linux 2 AMIs use the Docker overlay2 storage driver, which exposes all unused space on the disk to running containers.


    Manually setting my Batch compute environments to use the latest ECS-optimized Linux 2 AMIs seems to be the cleanest approach, rather than playing with custom ECS Agent docker cleanup parameters. I also reached out to AWS support to see if there’s a reason why Batch hasn’t updated their default AMI, even though the Linux 1 AMI reaches end-of-life in 2 months. No information was given, but they mentioned that they have an internal feature request for it, without any guarantees or an ETA on when it’d be changed.

    Sharing in case this is useful for anyone else!

    4 replies
    Antoine Tremblay
    @hexa00
    Hi, I was wondering: looking at the docs, they talk about metaflow.S3 as being a new performant client for S3, but looking at the code I see it basically uses boto plus some custom retry logic. Am I missing something?
    13 replies
    Nrithya M
    @MNrithya_twitter
    This message was deleted
    1 reply
    Matt Corley
    @corleyma
    Hi there. I see in the codebase that there seems to be support for monitoring and event logging plugins, both of which have existing debug implementations, but I can't find any mention in the documentation. What would be the best entry point in Metaflow today to handle common cross-cutting concerns for flows like monitoring/performance profiling? My goal is to create a re-usable abstraction that can be enabled to profile on a per step basis things like execution timings (not just of the step overall, but e.g. time spent serializing/deserializing state), peak memory utilization, etc.
    23 replies
    acsecond
    @acsecond
    Hi guys, I hope this is the correct place to post. I am new to Metaflow; I configured it with S3 and Batch, but now I want to start using Step Functions. In the documentation I read python parameter_flow.py --with retry step-functions create, but I get no such command step function. Can someone maybe refer me to good documentation?
    1 reply
    Revaapriyan
    @Revaapriyan

    Hey people. I would like to know a way to restrict the amount of parallelization on my local instance at any point in time, meaning the number of CPU cores used by the program. Say I have a task that has to be executed in parallel as 50 threads, each requiring 2 cores; if my machine is a 32-core machine, Metaflow runs ~15-16 threads at a time, utilizing all the processing cores in the machine. I would like to restrict this parallelization to, say, 12 threads at any given point in time.

    In Python's multiprocessing library, there is an option to set the number of pool workers. Is there a way to achieve the same with Metaflow?

    5 replies
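
    If it helps, the local scheduler's run command has a --max-workers option that caps how many tasks execute at once, which is roughly the Metaflow counterpart of a multiprocessing pool size (my_flow.py is a placeholder). For example, to keep at most 12 foreach tasks running at any time:

    python my_flow.py run --max-workers 12
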
    Carter Kwon
    @CarterKwon

    Hello, I see that Metaflow snapshots the code used in a run

    From the docs: "Code package is an immutable snapshot of the relevant code in the working directory, stored in the datastore, at the time when the run was started. A convenient side-effect of the snapshot is that it also works as a code distribution mechanism for runs that happen in the cloud."

    How would I access the code from previous runs?

    Thanks!

    8 replies
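
    A rough sketch using the client API, assuming the run shipped a code package (e.g. it ran remotely); I believe the run's code property exposes it ('MyFlow/123' is a placeholder pathspec):

    from metaflow import Run
    
    code = Run('MyFlow/123').code
    if code is not None:
        print(code.path)      # location of the code package in the datastore
        print(code.flowspec)  # source of the flow file itself
        code.tarball.extractall('snapshot')  # unpack the whole package locally
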
    Richard Decal
    @crypdick

    Hey all, I assessed MetaFlow as an alternative to our Kedro + Airflow infra. Thought I'd share my assessment. One blocker for adopting MetaFlow is the inability to separate parameters from pipeline definitions.

    For context, we currently use Kedro to generate many "flavors" of the same pipeline for different scenarios. For instance, we use the same template inference pipeline for model validation, active learning, detecting label noise, etc. We do this by defining our parameters separately from our DAGs. It would be nice if MetaFlow had integrations with (say) Facebook's Hydra so that we could easily compose config files and separate parameter definitions from DAG definitions.


    7 replies
    Bahattin Çiniç
    @bahattincinic

    Hey all, I have a question about logging. In our project, we are using Python's standard logging (https://docs.python.org/3/howto/logging.html). When we emit warning, debug, etc. logs with it, Metaflow overrides these logs and sends them as info.

    Here is a code example;

    import logging.config
    
    from metaflow import FlowSpec, step
    
    LOGGING_CONFIG = {
        'version': 1,
        'disable_existing_loggers': False,
        'formatters': {
            'standard': {
                'format': '[%(levelname)s] %(name)s: %(message)s'
            },
        },
        'handlers': {
            'default': {
                'level': 'INFO',
                'formatter': 'standard',
                'class': 'logging.StreamHandler',
                'stream': 'ext://sys.stdout',
            },
        },
        'loggers': {
            '': {  # root logger
                'handlers': ['default'],
                'level': 'INFO',
                'propagate': False
            },
        }
    }
    
    class DebugFlow(FlowSpec):
    
        @step
        def start(self):
            self.next(self.a, self.b)
    
        @step
        def a(self):
            logger.debug("Hello Debug log")
            self.x = 1
            self.next(self.join)
    
        @step
        def b(self):
            self.x = int('2')
            logger.warning("Hello warning log")
            self.next(self.join)
    
        @step
        def join(self, inputs):
            logger.info('a is %s', inputs.a.x)
            logger.info('b is %s', inputs.b.x)
            logger.info('total is %d', sum(input.x for input in inputs))
            logger.error("Hello error log")
            self.next(self.end)
    
        @step
        def end(self):
            pass
    
    if __name__ == '__main__':
        logger = logging.getLogger('DebugFlow')
        DebugFlow()

    When I took a look at how Metaflow handles logging, I realized that Metaflow uses a different logging system. I also tested the logging configuration with --event-logger; it looks like it doesn't work.

    import logging.config
    
    from metaflow.plugins import LOGGING_SIDECAR, SIDECAR
    
    from metaflow import FlowSpec, step
    
    
    LOGGING_CONFIG = {
        'version': 1,
        'disable_existing_loggers': False,
        'formatters': {
            'standard': {
                'format': '[%(levelname)s] %(name)s: %(message)s'
            },
        },
        'handlers': {
            'default': {
                'level': 'INFO',
                'formatter': 'standard',
                'class': 'logging.StreamHandler',
                'stream': 'ext://sys.stdout',
            },
        },
        'loggers': {
            '': {  # root logger
                'handlers': ['default'],
                'level': 'INFO',
                'propagate': False
            },
        }
    }
    
    
    class DebugFlow(FlowSpec):
    
        @step
        def start(self):
            self.next(self.a, self.b)
    
        @step
        def a(self):
            logger.debug("Hello Debug log")
            self.x = 1
            self.next(self.join)
    
        @step
        def b(self):
            self.x = int('2')
            logger.warning("Hello warning log")
            self.next(self.join)
    
        @step
        def join(self, inputs):
            logger.info('a is %s', inputs.a.x)
            logger.info('b is %s', inputs.b.x)
            logger.info('total is %d', sum(input.x for input in inputs))
            logger.error("Hello error log")
            self.next(self.end)
    
        @step
        def end(self):
            pass
    
    
    class CustomEventLogger(object):
        TYPE = 'customEventLogger'
    
        def __init__(self):
            self.logger = logging.getLogger('DebugFlow')
    
        def log(self, msg):
            self.logger.info('event_logger: %s', str(msg))
    
        def process_message(self, msg):
            # type: (Message) -> None
            self.log(msg.payload)
    
        def shutdown(self):
            pass
    
    
    def setup_logger():
        logger_config = {
            'customEventLogger': CustomEventLogger
        }
    
        LOGGING_SIDECAR.update(logger_config)
        SIDECAR.update(logger_config)
        logging.config.dictConfig(LOGGING_CONFIG)
    
    
    if __name__ == '__main__':
        setup_logger()
        logger = logging.getLogger('DebugFlow')
        DebugFlow()
    python debug_flow.py --event-logger=customEventLogger run

    How can I configure the Metaflow logger? If that is not possible, how can I send debug and warning logs with the Metaflow logger? Thanks.

    10 replies
    Apoorv Sharma
    @sharma_apoorv_twitter

    Hello everyone! I am exploring options for my next project implementation. Based on the initial documentation, Metaflow seems to hit all the points my team is looking for in a framework. The only question I have is:

    Our team uses Azure and not AWS. Are there going to be issues in deploying and scaling Metaflow-based solutions on Azure?

    6 replies
    Calum Macdonald
    @calmacx_gitlab

    hi all,
    I'd like to know the best way of passing a variable that is defined in a step which then gets split, and using it after joining.

    I could do something like self.merge_artifacts(inputs, include=[<vars>])? I'm sure inputs[0].<var> also works. These are fine, but I'm not sure how efficient they are, or how they will cope with many more splits.

    A fuller simple example to show what I mean:

    from metaflow import FlowSpec, step
    
    class Foo(FlowSpec):
        @step
        def start(self):
            self.msg = 'hi %s'
            self.steps = list(range(0,10))
            self.next(self.bar, foreach='steps')
        @step
        def bar(self):
            print (self.input)
            print (self.msg%(' from bar'))
            self.next(self.join)
        @step
        def join(self,inputs):
            #to be able to use self.mg in the next step, use merge_artifacts
            self.merge_artifacts(inputs,include=['msg'])
            self.next(self.end)
        @step
        def end(self):
            print (self.msg%(' from end'))
            print ('end')
    
    
    if __name__ == "__main__":
        Foo()

    I want to make sure I'm doing this in the best way

    Cheers. Loving Metaflow btw, top work on all the docs!

    2 replies
    jonathan-atom
    @jonathan-atom

    Hello Metaflow community! After setting up Airflow for a proof of concept and evaluating the other obvious/recent options, I am trying to decide between Prefect (self-hosted) and Metaflow for next steps.

    There seems to be a gap when it comes to monitoring Metaflow jobs (no ui/dashboard). How do you handle this? Am I missing something or do you fall back on AWS monitoring features?

    1 reply
    Richard Decal
    @crypdick
    ^ Looks like my message got chopped. If I factor out a step as a separate imported module, do I just have to make sure to return all the artifacts I want to persist and do something like self.x, self.y, self.z, ... = imported_node()?
    2 replies
    joe153
    @joe153
    I am starting to see this Docker error: You have reached your pull rate limit. I believe this is due to the recent (November 2, 2020) change: https://www.docker.com/increase-rate-limits. What is the recommended approach to resolve this? Do you guys have step-by-step instructions on how we can set up a private account?
    6 replies
    Malay Shah
    @malay95
    Hello everyone,
    I wanted your advice on setting up a DevOps infrastructure for our team at our company. We want to run the tests on AWS Batch and get the artifacts back to the caller as files (either the command line or a script). I know that Metaflow shows the stdout from the Batch instance on the command line, and we wanted to do something similar. Can you guys shed some light on this? And what are your thoughts?
    5 replies
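
    A rough sketch of one way to do the second part (TestFlow, test_flow.py and report_bytes are made-up names): have the caller run the flow with --with batch, then pull the artifacts back down to local files through the client API once the run finishes:

    import subprocess
    import sys
    
    from metaflow import Flow
    
    # Stream the Batch stdout to the caller's terminal, as Metaflow already does.
    subprocess.run(
        [sys.executable, 'test_flow.py', 'run', '--with', 'batch'],
        check=True,
    )
    
    # Fetch artifacts produced on Batch and write them out as local files.
    run = Flow('TestFlow').latest_successful_run
    with open('report.bin', 'wb') as f:
        f.write(run.data.report_bytes)   # an artifact the flow stored as bytes
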
    acsecond
    @acsecond

    I have the following folder structure:
    - metaflow project/
        - flow_a.py
        - flow_b.py
        - helpers.py

    Flow A and flow B are separate, independent flows, but some functions occur in both A and B.
    To avoid duplicating code I put those helper functions in helpers.py, which I import in both flow A and flow B.
    My problem is that when I deploy to AWS Step Functions with python flow_a.py step-functions create,
    the flow is uploaded but helpers.py is not, so when I try to import from helpers.py in my step functions the code fails.

    What is the correct approach to address this problem?
    Thx

    3 replies
    Apoorv Sharma
    @sharma_apoorv_twitter
    Can I start a new flow as part of a task ? Or is that considered bad design ?
    2 replies
    Wooyoung Moon
    @wmoon5
    Hi, I was wondering if I could see a basic example of parameterizing the @batch decorator of a step from a JSON file that gets read in as a Parameter. Savin mentioned something about being able to do this by defining a separate function, but now that I'm actually trying it, it's not super obvious to me how to do it.
    3 replies
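
    My understanding of the "separate function" idea, as a sketch (batch_config.json, its keys and ConfiguredFlow are made up): decorator arguments are evaluated when the flow file is parsed, so a plain function that reads a config file at that point can supply them. Note this happens at module load time, so it isn't a true runtime Parameter:

    import json
    
    from metaflow import FlowSpec, batch, step
    
    def batch_settings():
        # Read at flow-definition time; fall back to defaults if the file is absent.
        try:
            with open('batch_config.json') as f:
                cfg = json.load(f)
        except FileNotFoundError:
            cfg = {}
        return {
            'cpu': cfg.get('cpu', 2),
            'memory': cfg.get('memory', 4000),
            'image': cfg.get('image', 'python:3.8-slim'),
        }
    
    class ConfiguredFlow(FlowSpec):
    
        @batch(**batch_settings())
        @step
        def start(self):
            self.next(self.end)
    
        @step
        def end(self):
            pass
    
    if __name__ == '__main__':
        ConfiguredFlow()
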
    Matt Corley
    @corleyma
    Is there currently a mechanism to change the pickle protocol level for metaflow? Would be great to be able to use protocol 5 where supported.
    15 replies
    Antoine Tremblay
    @hexa00
    Question: is the best way to run a long-running task to use Step Functions? Or is there a way to run things in "daemon" mode? Say a user works on his laptop and starts something that takes 3 days... then his laptop is closed, etc. What's the proper workflow there?
    2 replies
    Note: we just installed Metaflow on AWS using Terraform. Works great so far! :) Faster than I thought to spin up jobs, etc.
    2 replies
    acsecond
    @acsecond
    Hi guys, short question. In the example of sending Parameters to a step function there is the following: {"Parameters": "{\"key1\": \"value1\", \"key2\": \"value2\"}"}, always with those \" escapes. But how do I send a dictionary as a parameter? I tried several methods and nothing works; {"Parameters": "{\"key1\": \"value1\", \"key2\": \"value2\", \"key3\": \"json.dumps(my dict)\"}"} is not working. What is the correct way?
    1 reply
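
    One way to avoid escaping quotes by hand (a sketch, assuming key3 is declared as a JSONType Parameter in the flow): build the execution input with json.dumps, so the dictionary becomes a JSON-encoded string inside the already JSON-encoded Parameters value:

    import json
    
    my_dict = {'a': 1, 'b': 2}
    
    parameters = {
        'key1': 'value1',
        'key2': 'value2',
        'key3': json.dumps(my_dict),   # dict parameter travels as a JSON string
    }
    
    # The value of "Parameters" is itself a JSON-encoded string of the dict above.
    execution_input = json.dumps({'Parameters': json.dumps(parameters)})
    print(execution_input)
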
    karimmohraz
    @karimmohraz
    Hi, we are working on a plugin for creating argo workflows out of metaflow scripts.
    When using "foreach" the method decompress_list(input_paths) in cli.py is called. Unfortunately the argo output parameters do not match the expected "input mode".
    I was wondering if there is a way similar to "task_pre_step" where our ArgoInternalStepDecorator could convert the input_paths into the desired format. Or maybe you have another hint.
    (We want to avoid persisting the children's output the way step_functions does.)
    1 reply
    Greg Hilston
    @GregHilston

    Is there any reason why one would be unable to store a function in a flow's step and call said function in a subsequent step?

    from metaflow import FlowSpec, step, conda_base
    
    
    @conda_base(python="3.8.3")
    class FunctionStateFlow(FlowSpec):
        """Explores how one can pass functions through Metaflow's state from one
        step to another.
        """
    
        def simple_function(self):
            """Defines a simple function that we can use to pass throughout
            Metaflow
            """
            return 42
    
        @step
        def start(self):
            """Initial step in DAG."""
            self.fun = self.simple_function
    
            print(f"is the variable 'fun' available in 'self'? {hasattr(self, 'fun')}")  # prints true
    
            self.next(self.should_print_forty_two)
    
        @step
        def should_print_forty_two(self):
            """Prints forty two as it leverages the pickled function from the start step"""
            print(f"is the variable 'fun' available in 'self'? {hasattr(self, 'fun')}")  # prints false
    
            print(self.fun())  # AttributeError: Flow FunctionStateFlow has no attribute 'fun'
    
            self.next(self.end)
    
        @step
        def end(self):
            """Does nothing, exists as a formality"""
            pass
    
    
    if __name__ == "__main__":
        FunctionStateFlow()

    I know Metaflow does not support storing generators, but I cannot see why storing this function would not work.

    4 replies
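
    A guess at what is happening, with a sketch: self.simple_function is a bound method, so pickling it would drag the whole flow instance along with it (hence nothing usable gets stored), whereas a plain module-level function is pickled by reference and, as far as I can tell, should round-trip fine:

    from metaflow import FlowSpec, step
    
    def simple_function():
        return 42
    
    class FunctionStateFlow(FlowSpec):
    
        @step
        def start(self):
            self.fun = simple_function   # no reference back to the flow instance
            self.next(self.should_print_forty_two)
    
        @step
        def should_print_forty_two(self):
            print(self.fun())            # expected to print 42
            self.next(self.end)
    
        @step
        def end(self):
            pass
    
    if __name__ == '__main__':
        FunctionStateFlow()
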
    Alireza Keshavarzi
    @isohrab

    Using SAP HANA in one step

    Hi, I need to connect to SAP HANA in one of my steps. I followed the official documentation and tested it in a sample flow, and it works well with the batch decorator.

    The problem arises when I use the @conda (or @conda_base) decorator. The error is ModuleNotFoundError: No module named 'hana_ml'.

    I think I need something like os.system(path/to/conda install hana-ml)
    I posted my code inside this thread.
    I appreciate your help.

    5 replies
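
    A sketch of the kind of workaround you describe (not an official pattern, HanaFlow is a made-up name, and the pip install is not tracked or cached by @conda): install the pip-only package into the step's active conda environment at the top of the step, before importing it:

    from metaflow import FlowSpec, conda_base, step
    
    @conda_base(python='3.7.4')
    class HanaFlow(FlowSpec):
    
        @step
        def start(self):
            import subprocess
            import sys
            # hana-ml is pip-only, so install it into the step's active
            # environment at runtime (a workaround, not an official pattern).
            subprocess.run(
                [sys.executable, '-m', 'pip', 'install', 'hana-ml'],
                check=True,
            )
            import hana_ml  # noqa: F401
            self.next(self.end)
    
        @step
        def end(self):
            pass
    
    if __name__ == '__main__':
        HanaFlow()
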
    hey @dpatschke, yup that's right that you need to use the respective GPU and non-GPU Linux 2 AMIs! I just get around this by creating a GPU-specific and CPU-specific compute environment for each, and then assigning both compute environments to the same Batch job queue