    Savin
    @savingoyal
    @sergiocalde94 Yes, we are in the process of revamping logging for OSS. Expect some progress on that issue relatively soon.
    (EJ) Vivek Pandey
    @Viveckh

    So I am getting this error while trying to inspect the data of my last successful run this morning: S3 datastore operation _get_s3_object failed (An error occurred (404) when calling the HeadObject operation: Not Found)

    I tried looking up the path with flow.latest_successful_run.end_task.artifacts.raw_data_df._object and got the S3 location of the raw_data_df data artifact. I used a separate script to verify that I can access and download it, and that seems to work fine. Any ideas what I might be missing here?

    1 reply
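    A minimal sketch of reading the same artifact through the public client API instead of the private _object attribute (flow name is hypothetical); if this succeeds while the datastore lookup 404s, the datastore configuration is the likelier culprit:

    from metaflow import Flow

    # fetch the latest successful run and load the artifact by name
    run = Flow('MyFlow').latest_successful_run
    df = run.data.raw_data_df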
    oonisim
    @oonisim
    Tutorial Episode 6 does not work, as reported in Netflix/metaflow#241. Please advise how to fix it.
    1 reply
    A Ivan
    @aaivan239_gitlab
    Hi! I am trying to run the 05-helloaws example. I have manually set up a Batch job queue and compute environment, as well as an S3 bucket. To debug, I have given the AWSBatchServiceRole and the ecsInstanceRole the S3FullAccess policy. I have also created an IAM role for ecs-tasks with the S3FullAccess policy; this is the role I have specified for METAFLOW_ECS_S3_ACCESS_IAM_ROLE in ~/.metaflowconfig/config.json. When I run the script with python helloaws.py --datastore=s3 run, everything executes fine. However, if I use the conda environment flag, python helloaws.py --environment=conda --datastore=s3 run, the job hangs on the "Bootstrapping environment" step. After 45-60 minutes the job crashes with OutOfMemoryError: Container killed due to memory usage, with an underlying error in the logs: metaflow.datatools.s3.MetaflowS3Exception: Getting S3 files failed. First prefix requested: s3://mf-test-bucket-28393/mf/conda/conda.anaconda.org/conda-forge/noarch/click-7.0-py_0.tar.bz2/e8fb50fd9833010f105f989fa806deb3/click-7.0-py_0.tar.bz2 -- Has anybody seen this before, or is able to provide any insights?
    2 replies
    orasphong
    @panangam
    Is there a way to
    • run a step after a foreach step that doesn't join?
    • run another foreach after a foreach step?
    3 replies
    something like
             foreach (a) -- continue (a)
            /                           \
    start --                             -- end
            \                           /
             foreach (b) -- continue (b)
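    A hedged sketch of the first point, assuming standard Metaflow semantics: steps can be chained inside a foreach branch before the join, so each branch gets its own "continue" step and only the fan-in needs a join (one arm shown; nested foreaches are also possible, but each foreach eventually needs its own join):

    from metaflow import FlowSpec, step

    class ChainedForeach(FlowSpec):

        @step
        def start(self):
            self.items = ['a', 'b', 'c']
            self.next(self.process, foreach='items')

        @step
        def process(self):
            self.value = self.input.upper()
            # a further linear step inside the same foreach branch
            self.next(self.refine)

        @step
        def refine(self):
            self.value += '!'
            self.next(self.join)

        @step
        def join(self, inputs):
            self.results = [i.value for i in inputs]
            self.next(self.end)

        @step
        def end(self):
            pass

    if __name__ == '__main__':
        ChainedForeach()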
    Ji Xu
    @xujiboy
    May I know if this issue has been looked into? #179
    I really would like to take advantage of @conda but don't have root access.
    2 replies
    derrickk23
    @derrickk23
    Is there a way to dynamically specify memory requirements for the @batch decorator at runtime? My workflow has a fan-out with many parallel jobs where each Batch job uses between 10 GB and 1 TB of memory, and the amount of memory per job can be determined at runtime based on its input, created dynamically in a previous step. Obviously, I would like to avoid having to allocate an EC2 instance with 1 TB of memory for each (small) job.
    2 replies
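    A hedged note: Batch resource requests are fixed at submit time, so truly per-task dynamic sizing isn't expressible this way; a common partial workaround is overriding the request per run from the command line, or declaring a ceiling with @resources (values below are illustrative):

    # override the decorator for the whole run at submit time
    python myflow.py run --with batch:memory=1000000

    # or declare the requirement on the step itself
    @resources(memory=16000)
    @step
    def big_job(self):
        ...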
    Wooyoung Moon
    @wmoon5

    Hi, is there any way to run generic flows that read from a flow parameters JSON/YAML file? So that I can do something like this:

    python my_flow.py run --parameters my_flow_parameters.yml

    5 replies
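    There's no built-in --parameters flag that I know of, but a hedged sketch using JSONType gets close (a YAML file would need converting to JSON first):

    from metaflow import FlowSpec, Parameter, JSONType, step

    class ParamFlow(FlowSpec):
        # accepts an arbitrary JSON object on the command line
        config = Parameter('config', type=JSONType, default='{}')

        @step
        def start(self):
            print(self.config)
            self.next(self.end)

        @step
        def end(self):
            pass

    if __name__ == '__main__':
        ParamFlow()

    invoked as: python my_flow.py run --config "$(cat my_flow_parameters.json)"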
    tkr789
    @tkr789
    Hi, what are the requirements for running a Docker image in @batch? I need to set up a container that has a Java process running in the background in addition to Python, so I need both Python and Java installed and running. In addition, I need some packages that are only installable via pip and git (so the conda decorator won't work). As far as I've been able to tell, the Java process can only run on one EC2 instance (it can't be threaded), so I need to parallelize across multiple instances rather than multiple CPUs, hence the container.
    4 replies
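    A hedged sketch: the image can be set per step, and the usual requirement is simply that it has a Python on the path compatible with the version launching the flow, plus whatever else (Java, pip/git-installed packages) baked in (image name hypothetical):

    # custom image with both Python and Java preinstalled
    @batch(image='myorg/python-java:latest', memory=8000)
    @step
    def run_java_sidecar(self):
        ...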
    derrickk23
    @derrickk23

    Hello, is there a way to get the full S3 URL of a Metaflow artifact that was stored in a step?

    I looked at Metaflow's DataArtifact class but didn't see an obvious s3 path property.

    5 replies
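    A hedged alternative, since DataArtifact doesn't expose a public S3-path property: fetch the value through the client instead of by URL (flow, run id and step name are hypothetical):

    from metaflow import Run

    artifact = Run('MyFlow/123')['train'].task.artifacts.my_df
    value = artifact.data  # loads the object from the datastore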
    Joseph Bentivegna
    @jbentivegna15
    Hi all, has anyone experienced a situation where the flow runs to completion with no errors, but when trying to view outputs from the flow, certain pandas dataframes are accessible while others are not and throw the error: S3 datastore operation _get_s3_object failed (An error occurred (400) when calling the HeadObject operation: Bad Request). Retrying 7 more times..
    40 replies
    Bahattin Çiniç
    @bahattincinic

    Hi all, I have a question regarding Flow inheritance.
    We have an algorithm that has multiple versions (v1.0, v1.1, v1.2, etc.). These versions are mostly similar to the primary version, so we only want to override some steps. When I checked whether Metaflow supports this, I found the ticket Netflix/metaflow#245.

    It looks like we have two options to implement this;

    Which one do you think is the best choice in the Metaflow ecosystem? Or do you have different ideas about this?

    Thanks.

    3 replies
    Denis Maciel
    @denispmaciel_gitlab
    Hi, is there a way to add a custom step decorator without touching the source code?
    10 replies
    Malay Shah
    @malay95
    Hi, is there a way to tag the flow based on another parameter object in the start step?
    11 replies
    adKatta
    @adKatta
    hi, I have deployed Metaflow on AWS and am trying to run a job, and I always get this error:
    Batch error: Task crashed due to OutOfMemoryError: Container killed due to memory usage. This could be a transient error. Use @retry to retry.
    How can I increase the memory availability in the CloudFormation template?
    adKatta
    @adKatta
    I have tried various combinations of @batch(memory=3072). I think it is an ECS capacity issue.
    12 replies
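    A hedged note on the capacity angle: memory= only sets the Batch job's request; the compute environment's instance types cap what can actually be granted, so a request larger than the biggest configured instance can never be satisfied. A sketch of raising the request on the step:

    # the request must fit on an instance type allowed by the compute environment
    @batch(memory=16000, cpu=2)
    @step
    def train(self):
        ...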
    Malay Shah
    @malay95
    Hello all, I am provisioning an ECS cluster for the Batch instances, and I can monitor its metrics using memory utilization and memory reservation. When I run the Metaflow step on Batch I get the OutOfMemory error, but when I look at the utilization it's around 7% and the reservation is 86%. Where can I monitor the exact memory usage of the step, or find out the exact issue? When I run the same step locally, the memory usage is around 2 GB, and I have set 4 GB in the batch decorator. Thanks in advance.
    9 replies
    adKatta
    @adKatta

    Internally we export our flows to Meson (Netflix's workflow orchestrator) and shortly we are going to release a similar integration with AWS Step Functions - Netflix/metaflow#2

    Hi @savingoyal, when can we hope for this feature? I have seen the Google docs for this and we are excited and keen to try it out. Also, is there a nightly build we can access?

    2 replies
    Peter Wilton
    @ammodramus

    Hi all, I am having some trouble with FlowSpec.merge_artifacts. I have a scatter-join sequence of steps that looks like this:

    @step
    def set_up(self):
        self.foreach_tuple = TUPLE_OF_VALUES_TO_PROCESS
        self.next(self.process, foreach='foreach_tuple')
    
    @step
    def process(self):
        self.value = self.input
        # (process self.value, not touching self.foreach_tuple)
        self.next(self.join_step)
    
    @step
    def join_step(self, inputs):
        self.merge_artifacts(inputs)

    Running my code results in a MergeArtifactsException due to input.foreach_tuple having a different value in each input in inputs. This is despite the fact that I haven't touched foreach_tuple since assigning it in set_up.

    The odd thing is that when I look at inputs in FlowSpec.merge_artifacts, the values of input._datastore['foreach_tuple'] are the same for each input in inputs, as expected. However, the SHAs (as accessed via input._datastore.items()) are all different.

    Any idea about what may be causing the SHAs to all differ (triggering the MergeArtifactsException) while the values are all the same?

    Thanks in advance.

    13 replies
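    Whatever is causing the hashes to differ, a hedged workaround sketch: merge_artifacts accepts an exclude list, and the join step doesn't need the foreach artifact:

    @step
    def join_step(self, inputs):
        # skip the artifact whose per-branch hashes disagree
        self.merge_artifacts(inputs, exclude=['foreach_tuple'])
        self.next(self.end)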
    Martin Cheong
    @martincheong-myob

    Hi, we're running into an issue where the AWS Batch job will succeed but the flow won't proceed to subsequent steps. It looks like it's hanging waiting for the Batch job to complete even though it already has. Any ideas as to what might be going on here? We're just running the 00-helloworld flow with a @batch decorator for the hello step. The logs are as below:

    2020-07-21 11:36:24.118 Workflow starting (run-id 23):
    2020-07-21 11:36:24.182 [23/start/65 (pid 19754)] Task is starting.
    2020-07-21 11:36:25.281 [23/start/65 (pid 19754)] HelloFlow is starting.
    2020-07-21 11:36:25.609 [23/start/65 (pid 19754)] Task finished successfully.
    2020-07-21 11:36:25.847 [23/hello/66 (pid 19811)] Task is starting.
    2020-07-21 11:36:26.597 [23/hello/66 (pid 19811)] [5c92ef5f-6525-4b76-8e4e-96d75049d087] Task is starting (status SUBMITTED)...
    2020-07-21 11:36:29.747 [23/hello/66 (pid 19811)] [5c92ef5f-6525-4b76-8e4e-96d75049d087] Task is starting (status RUNNABLE)...
    2020-07-21 11:36:59.777 [23/hello/66 (pid 19811)] [5c92ef5f-6525-4b76-8e4e-96d75049d087] Task is starting (status RUNNABLE)...
    2020-07-21 11:37:29.831 [23/hello/66 (pid 19811)] [5c92ef5f-6525-4b76-8e4e-96d75049d087] Task is starting (status RUNNABLE)...
    2020-07-21 11:38:00.027 [23/hello/66 (pid 19811)] [5c92ef5f-6525-4b76-8e4e-96d75049d087] Task is starting (status RUNNABLE)...
    2020-07-21 11:38:04.489 [23/hello/66 (pid 19811)] [5c92ef5f-6525-4b76-8e4e-96d75049d087] Task is starting (status STARTING)...
    2020-07-21 11:38:30.738 [23/hello/66 (pid 19811)] [5c92ef5f-6525-4b76-8e4e-96d75049d087] Task is starting (status RUNNING)...
    2020-07-21 11:53:04.709 1 tasks are running: e.g. ....
    2020-07-21 11:53:04.709 0 tasks are waiting in the queue.
    2020-07-21 11:53:04.709 0 steps are pending: e.g. ....

    Any help would be appreciated. Thanks.

    23 replies
    Zhaozhufeng1
    @Zhaozhufeng1
    I have Metaflow configured with AWS and can successfully run the job without Batch on AWS. But when I try to run a job with Batch, it returns the error below:
    30 replies
    [attached screenshot: Screen Shot 2020-07-21 at 5.13.53 PM.png]
    Savin
    @savingoyal
    [attached screenshot: Screenshot 2020-07-21 at 2.32.56 PM.png]
    2 replies
    simon-lyons
    @simon-lyons
    [attached screenshot: Screenshot 2020-07-22 at 22.07.37.png]

    Hi, I just had a quick question about debugging a metaflow run. It seems like there's some sort of buffering to stdout taking place. If I open a console in debug mode and execute print('foo\nbar'), the system will only print foo to the console. I'll have to print something else to see bar.

    Any idea how I might find a workaround for this issue? It gets triggered if you try to print the contents of a pandas DataFrame, which can be a real pain when you're debugging.

    4 replies
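    A hedged workaround sketch, assuming ordinary Python output buffering is the culprit:

    # flush explicitly
    print('foo\nbar', flush=True)

    # or run the interpreter unbuffered:
    #   python -u myflow.py run
    # or set PYTHONUNBUFFERED=1 in the environment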
    Juan Daza
    @dazajuandaniel
    Hi,
    Sorry for the early enter press! I'm trying to use KMS for authentication in AWS for writing to S3. Is this supported by Metaflow? I couldn't find any documentation about this.
    5 replies
    Taleb Zeghmi
    @talebzeghmi
    3 replies
    Jack Wells
    @jackwellsxyz
    [attached screenshot: Screen Shot 2020-07-24 at 16.10.58.png]
    Hi, I think the answer is no, but does Metaflow support a feature store like Airbnb's Zipline (closed source)? I'm really racking my brain for a solution that helps turn our database of information on our customers into something usable for modeling.
    5 replies
    Martin Cheong
    @martincheong-myob
    Hi. I'm trying to dockerise the Metaflow CLI to provide a consistent experience and abstract away the configuration. I've noticed the caching of the conda dependencies is no longer happening and was wondering if I can specify the cache path. I saw METAFLOW_CLIENT_CACHE_PATH in the codebase but providing that as an env var doesn't seem to be doing anything. Any suggestions? Thanks.
    9 replies
    Ji Xu
    @xujiboy
    Hi, may I know how I can access a data artifact without using self.?
    13 replies
    Ville Tuulos
    @tuulos
    wohoo! The Step Functions (aka Production Scheduler) integration is finally out! https://netflixtechblog.com/unbundling-data-science-workflows-with-metaflow-and-aws-step-functions-d454780c6280

    also, we published a new Administrator's Guide to Metaflow, which should be of interest to many people here: https://admin-docs.metaflow.org/

    The admin guide was inspired by your questions and comments here, so a huge thanks to everyone! I hope you will find it useful. Please give feedback, especially if you notice something missing or misrepresented.

    russellbrooks
    @russellbrooks
    :heart: :clap: thank you all for the outstanding work and can't wait to try it out – really been looking forward to this integration!
    13 replies
    Kolja
    @koljamaier
    Hi @tuulos, thanks for the update, great to hear. Will internal Netflix teams also adopt Step Functions now instead of Meson?
    2 replies
    Kolja
    @koljamaier
    Reading through https://docs.metaflow.org/going-to-production-with-metaflow/scheduling-metaflow-flows, it is not clear to me whether it is possible to set up external triggers (e.g. another SFN task finished loading our ETL data, so the Metaflow job/state machine can be triggered). Is this possible, or are only time-based schedules supported?
    3 replies
    Hao Yuan
    @hyuan-integrate
    I have read this doc, https://docs.metaflow.org/metaflow/dependencies, but it does not mention private Python package repos. We have an internal PyPI repo hosting our internal Python packages. Is there a way to pull packages from this private repo?
    5 replies
    Ritesh Agrawal
    @ragrawal
    hi, I am starting to explore Metaflow for my use case. It's a standard machine learning problem: I would like to train a model on some data and use the trained model for making inferences. However, I am not able to find a simple example that shows the two processes (training and inference). I would appreciate any pointer to a simple example.
    16 replies
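    In case it helps, a minimal hedged sketch of the usual shape of such a flow (toy data; the fitted model is pickled between steps automatically as an artifact):

    from metaflow import FlowSpec, step

    class TrainPredictFlow(FlowSpec):

        @step
        def start(self):
            # toy data standing in for a real training set
            import numpy as np
            self.X = np.random.rand(100, 3)
            self.y = (self.X.sum(axis=1) > 1.5).astype(int)
            self.next(self.train)

        @step
        def train(self):
            from sklearn.linear_model import LogisticRegression
            self.model = LogisticRegression().fit(self.X, self.y)
            self.next(self.predict)

        @step
        def predict(self):
            # inference reuses the model artifact from the previous step
            self.preds = self.model.predict(self.X)
            self.next(self.end)

        @step
        def end(self):
            pass

    if __name__ == '__main__':
        TrainPredictFlow()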
    Ritesh Agrawal
    @ragrawal
    hi, I just got set up with AWS and am running into challenges with some libraries, as they are not available as part of the Conda installation. Is there a way I can specify an environment.yml file instead, and include both pip and non-pip packages?
    3 replies
    w76
    @w76

    Hi, I have a question about running a project using Metaflow on AWS. How does Metaflow on Batch handle module imports of other files that are required by the Python script being run with --with batch?
    If, for instance, in the tutorial example
    python 02-statistics/stats.py --with batch run --max-workers 4,
    stats.py had a module import such as import moduleX, and the functions in moduleX were used by stats.py, how does Batch handle the dependency on moduleX? It would typically be on my local filesystem, and --with batch only runs stats.py on Batch.

    In my project I have a library of modules and helper functions that are called within the script that is run as a Metaflow DAG. I'm unclear how I can make the other modules available when running on Batch when they are not in the file that defines the Metaflow DAG tasks.

    2 replies
    bishax
    @bishax

    Hi, I'm struggling to work out how to use images not on dockerhub, e.g. I'd like to use images from AWS ECR.

    metaflow configure aws hints that if METAFLOW_BATCH_CONTAINER_REGISTRY is not set then "https://hub.docker.com/" is used; however, if I explicitly set it to that value then I get AWS Batch error: CannotPullContainerError: invalid reference format. This could be a transient error. Use @retry to retry.

    edit: everything works normally if I don't configure METAFLOW_BATCH_CONTAINER_REGISTRY

    2 replies
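    A hedged workaround that has worked in similar setups: leave METAFLOW_BATCH_CONTAINER_REGISTRY unset and pass a fully qualified image URI straight to the decorator (account, region and repo below are placeholders):

    @batch(image='123456789012.dkr.ecr.us-east-1.amazonaws.com/my-repo:latest')
    @step
    def train(self):
        ...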
    Ritesh Agrawal
    @ragrawal
    when using InputFile, how do you convert it to a DataFrame?
    16 replies
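    Assuming this refers to IncludeFile, which hands the step the raw file contents rather than a path, a hedged sketch for a CSV:

    import io

    import pandas as pd
    from metaflow import FlowSpec, IncludeFile, step

    class CsvFlow(FlowSpec):
        data_file = IncludeFile('data_file', help='CSV input')

        @step
        def start(self):
            # wrap the contents in a buffer so pandas can read them
            self.df = pd.read_csv(io.StringIO(self.data_file))
            self.next(self.end)

        @step
        def end(self):
            pass

    if __name__ == '__main__':
        CsvFlow()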
    Ritesh Agrawal
    @ragrawal
    once I have configured AWS, is there a way to run things locally again? I am trying to debug my pipeline and don't want to connect to AWS. Is there some parameter I can set that forces Metaflow to use the local datastore?
    2 replies
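    A hedged sketch: the datastore and metadata providers can be forced back to local per invocation, without touching the saved config:

    python myflow.py --datastore=local --metadata=local run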
    VahidTehrani
    @VahidTehrani
    @benjaminbluhm and @tuulos Hey! did you guys figure out how to load parquet files from s3? Any example? Thanks
    Max Nikolaus Pagel
    @moodgorning

    Hi, has anyone gotten the conda dependency management to work with pycharm and be able to debug?
    I am running it with a Miniconda interpreter and it seems to start fine, but it never actually starts executing anything in the flow. However, I don't really get any error either. If I run it from my command line it works just fine. Running this on OSX.

    runfile('/Users/maxpagel/Developer/infrastructure/TimescaleDB/Scripts/metaFlowPlayground.py', args=['--environment=conda', 'run'], wdir='/Users/maxpagel/Developer/infrastructure/TimescaleDB/Scripts')
    Metaflow 2.1.1 executing BranchFlow for user:maxpagel
    Validating your flow...
    The graph looks good!
    Running pylint...
    Process finished with exit code 1

    Max Nikolaus Pagel
    @moodgorning
    Hmm, turns out it works if I use the system interpreter, but not if I use a conda environment as the interpreter. That should be fine then.