    Salvador Gimeno
    @salvagimeno-ai
    Hi, is there any alternative to the @conda decorator? How can I get a list and documentation of all available decorators?
    11 replies
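    One hedged way to enumerate what ships with Metaflow is to introspect its plugin registry; a minimal sketch, assuming the internal STEP_DECORATORS/FLOW_DECORATORS lists (these are internals, not a documented API):

    from metaflow.plugins import FLOW_DECORATORS, STEP_DECORATORS

    # Each registered decorator class carries its name and a docstring.
    for deco in STEP_DECORATORS + FLOW_DECORATORS:
        doc = (deco.__doc__ or "").strip().splitlines()
        print(deco.name, "-", doc[0] if doc else "(no docstring)")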
    zachary-mcpher
    @zachary-mcpher
    Hello, I am interested in overriding the default log driver for the Batch job definition so that we can ingest logs from the Metaflow Batch jobs running in ECS into Datadog, as opposed to CloudWatch Logs. How would I go about overriding the default job definition which Metaflow creates on my behalf when sending tasks to Batch job queues?
    4 replies
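    Metaflow registers the Batch job definition itself, so the underlying knob here is AWS Batch's logConfiguration rather than a documented Metaflow switch. A hedged boto3 sketch of that Batch-level setting; the name, image, and driver choice are illustrative, and routing fluentd output onward to Datadog is left to the log pipeline:

    import boto3

    batch = boto3.client("batch")
    # Register a job definition whose containers log via fluentd instead of
    # the default awslogs (CloudWatch) driver.
    batch.register_job_definition(
        jobDefinitionName="metaflow-datadog-logging",  # hypothetical name
        type="container",
        containerProperties={
            "image": "python:3.9",  # placeholder image
            "vcpus": 1,
            "memory": 2048,
            "command": ["true"],
            "logConfiguration": {"logDriver": "fluentd"},
        },
    )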
    Guy Berdugo
    @guy_b:matrix.org
    [m]
    Hi All, I've been trying to use the tutorials and I'm hitting an error on Episode 04, trace output: 2021-08-15 10:58:48.074 [1629014326579786/start/1 (pid 1667187)] File "/tmp/tmpmdb41gsv/metaflow/cli.py", line 1005, in main
    2021-08-15 10:58:48.074 [1629014326579786/start/1 (pid 1667187)] start(auto_envvar_prefix='METAFLOW', obj=state)
    2021-08-15 10:58:48.074 [1629014326579786/start/1 (pid 1667187)] File "/home/guy/miniconda3/envs/metaflow_PlayListFlow_linux-64_179c56284704ca8e53622f848a3df27cdd1f4327/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    2021-08-15 10:58:48.074 [1629014326579786/start/1 (pid 1667187)] return self.main(*args, **kwargs)
    2021-08-15 10:58:48.074 [1629014326579786/start/1 (pid 1667187)] File "/home/guy/miniconda3/envs/metaflow_PlayListFlow_linux-64_179c56284704ca8e53622f848a3df27cdd1f4327/lib/python3.7/site-packages/click/core.py", line 782, in main
    2021-08-15 10:58:48.075 [1629014326579786/start/1 (pid 1667187)] rv = self.invoke(ctx)
    2021-08-15 10:58:48.075 [1629014326579786/start/1 (pid 1667187)] File "/home/guy/miniconda3/envs/metaflow_PlayListFlow_linux-64_179c56284704ca8e53622f848a3df27cdd1f4327/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    2021-08-15 10:58:48.075 [1629014326579786/start/1 (pid 1667187)] return _process_result(sub_ctx.command.invoke(sub_ctx))
    2021-08-15 10:58:48.075 [1629014326579786/start/1 (pid 1667187)] File "/home/guy/miniconda3/envs/metaflow_PlayListFlow_linux-64_179c56284704ca8e53622f848a3df27cdd1f4327/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    2021-08-15 10:58:48.075 [1629014326579786/start/1 (pid 1667187)] return ctx.invoke(self.callback, **ctx.params)
    2021-08-15 10:58:48.075 [1629014326579786/start/1 (pid 1667187)] File "/home/guy/miniconda3/envs/metaflow_PlayListFlow_linux-64_179c56284704ca8e53622f848a3df27cdd1f4327/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    2021-08-15 10:58:48.075 [1629014326579786/start/1 (pid 1667187)] return callback(*args, **kwargs)
    2021-08-15 10:58:48.156 [1629014326579786/start/1 (pid 1667187)] File "/home/guy/miniconda3/envs/metaflow_PlayListFlow_linux-64_179c56284704ca8e53622f848a3df27cdd1f4327/lib/python3.7/site-packages/click/decorators.py", line 21, in new_func
    2021-08-15 10:58:48.156 [1629014326579786/start/1 (pid 1667187)] return f(get_current_context(), *args, **kwargs)
    2021-08-15 10:58:48.156 [1629014326579786/start/1 (pid 1667187)] File "/tmp/tmpmdb41gsv/metaflow/cli.py", line 521, in step
    2021-08-15 10:58:48.156 [1629014326579786/start/1 (pid 1667187)] max_user_code_retries)
    2021-08-15 10:58:48.156 [1629014326579786/start/1 (pid 1667187)] File "/tmp/tmpmdb41gsv/metaflow/task.py", line 444, in run_step
    2021-08-15 10:58:48.156 [1629014326579786/start/1 (pid 1667187)] self._exec_step_function(step_func)
    2021-08-15 10:58:48.156 [1629014326579786/start/1 (pid 1667187)] File "/tmp/tmpmdb41gsv/metaflow/task.py", line 51, in _exec_step_function
    2021-08-15 10:58:48.156 [1629014326579786/start/1 (pid 1667187)] step_function()
    2021-08-15 10:58:48.156 [1629014326579786/start/1 (pid 1667187)] File "04-playlist-plus/playlist.py", line 71, in start
    2021-08-15 10:58:48.157 [1629014326579786/start/1 (pid 1667187)] self.dataframe = run['start'].task.data.dataframe
    2021-08-15 10:58:48.157 [1629014326579786/start/1 (pid 1667187)] File "/tmp/tmpmdb41gsv/metaflow/client/core.py", line 585, in __getattr__
    2021-08-15 10:58:48.157 [1629014326579786/start/1 (pid 1667187)] return self._artifacts[name].data
    2021-08-15 10:58:48.157 [1629014326579786/start/1 (pid 1667187)] File "/tmp/tmpmdb41gsv/metaflow/client/core.py", line 713, in data
    2021-08-15 10:58:48.157 [1629014326579786/start/1 (pid 1667187)] obj = pickle.load(f)
    2021-08-15 10:58:48.157 [1629014326579786/start/1 (pid 1667187)] AttributeError: Can't get attribute 'new_block' on <module 'pandas.core.internals.blocks' from '/home/guy/miniconda3/envs/metaflow_PlayListFlow_linux-64_179c56284704ca8e53622f848a3df27cdd1f4327/lib/python3.7/site-packages/pandas/core/internals/blocks.py'>
    3 replies
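    The 'new_block' attribute was added in pandas 1.3, so this AttributeError is the classic symptom of unpickling a DataFrame artifact under an older pandas than the one that wrote it. A hedged fix sketch: pin one pandas version for every step with @conda_base (the exact version below is illustrative):

    from metaflow import FlowSpec, conda_base, step

    @conda_base(libraries={"pandas": "1.3.1"})  # match the version that wrote the artifact
    class PlayListFlow(FlowSpec):

        @step
        def start(self):
            self.next(self.end)

        @step
        def end(self):
            pass

    if __name__ == "__main__":
        PlayListFlow()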
    D. Sturgeon
    @dasturge
    Is there still no place I can view more technical API documentation for the different classes? I was trying to figure out whether Parameter supports a boolean flag mode.
    4 replies
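    For what it's worth, Parameter does accept type=bool; a minimal sketch (whether the CLI takes a bare --debug flag or --debug True may depend on the Metaflow version, so treat the exact CLI shape as an assumption):

    from metaflow import FlowSpec, Parameter, step

    class FlagFlow(FlowSpec):

        debug = Parameter("debug", type=bool, default=False, help="Enable debug output")

        @step
        def start(self):
            if self.debug:
                print("debug mode on")
            self.next(self.end)

        @step
        def end(self):
            pass

    if __name__ == "__main__":
        FlagFlow()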
    Ben K Muller
    @bennnym

    Hey community,

    I am having issues with importing local modules.

    I have no idea why this behaviour appears to be unique to metaflow.

    I have tried the package list command to make sure everything I expect is being packaged up.

    My project structure is as so:

    .
    ├── README.md
    ├── python
    │   ├── db
    │   │   ├── __init__.py
    │   │   ├── database_adapter.py
    │   │   └── database_connector.py
    │   ├── sectional_model_flow.py
    │   └── __init__.py
    ├── requirements.txt
    └── tests
        └── unit
            ├── aws
            │   ├── test_ssm.py
            │   └── test_sts.py
            └── db
                ├── test_database_adapter.py
                └── test_database_connector.py

    I structured it this way because Metaflow doesn't like me importing, say from sectional_model_flow, like so: from python.db.database_connector import DatabaseConnector; it gives me a ModuleNotFoundError.

    So I tried following the instructions here (Netflix/metaflow#175), and this allowed it to run, but now my tests can't run because the import above would change to from db.database_connector import DatabaseConnector.

    What am I missing? Why does this behave so differently from normal Python imports?

    1 reply
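    A hedged workaround sketch for keeping one import path for both pytest and Metaflow's code package: have the flow file put its own directory on sys.path before importing (this assumes the flow file sits next to the db package, as in the tree above):

    # sectional_model_flow.py (sketch)
    import os
    import sys

    # Make "db" importable the same way locally, under pytest, and inside the
    # package Metaflow ships to remote workers.
    sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

    from db.database_connector import DatabaseConnector  # noqa: E402
    from metaflow import FlowSpec, step

    class SectionalModelFlow(FlowSpec):

        @step
        def start(self):
            self.connector_cls = DatabaseConnector.__name__  # exercise the import
            self.next(self.end)

        @step
        def end(self):
            pass

    if __name__ == "__main__":
        SectionalModelFlow()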
    ahmadhori
    @ahmadhori
    I have a question connected to exception handling:
    using the @catch decorator, I can currently handle exceptions at the step level.
    My question is: how can I skip all steps and go directly to the last step if an exception has been raised in any step?
    To ask the question a different way:
    how can I handle exceptions at the flow level? I want to catch any exception in any step and act on it (send a request to a backend informing it that the flow failed).
    4 replies
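    There is no flow-level @catch, but a hedged sketch of the usual pattern: put @catch on every step so failures are stored as artifacts instead of killing the run, then let the final step inspect them and notify the backend (do_work and the backend URL are hypothetical):

    from metaflow import FlowSpec, catch, step

    def do_work():
        raise ValueError("boom")  # stand-in for work that may fail

    class CatchAllFlow(FlowSpec):

        @catch(var="start_error")
        @step
        def start(self):
            do_work()
            self.next(self.end)

        @step
        def end(self):
            # @catch stored the exception, if any, in self.start_error.
            error = getattr(self, "start_error", None)
            if error:
                print("notifying backend of failure:", error)
                # e.g. POST to https://backend.example/alerts here

    if __name__ == "__main__":
        CatchAllFlow()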
    latchapipeline
    @latchapipeline
    Sorry, this may be SUPER obvious. I have installed Metaflow on AWS with CloudFormation. I have created the config file with the data from the outputs of the CloudFormation template. When I try to run the 05 tutorial, I get ... Metadata request (/flows/HelloFlow) failed (code 403): {"message":"Forbidden"} ... What's the easiest way to debug this?
    latchapipeline
    @latchapipeline
    Just going to monologue this a bit.... So I am logged in as an SSO user. Presumably this SSO role is missing the permissions to do whatever it needs to do with the Metadata Service. So how do I figure that out? Maybe look at the logs? (Presumably in CloudWatch?)
    latchapipeline
    @latchapipeline
    So I found this CloudWatch Log Group:
    /ecs/metaflow-infrastructure-metadata-service-v2
    That seems promising
    Not knowing what I am looking at ... it seems "right". Things like: INFO:AsyncPostgresDB:global:Connection established.
    latchapipeline
    @latchapipeline
    I can go to API Gateway (in the AWS Console) and run a test invocation of /db_schema_status, which returns some stuff and seems normal. So that is good. (And back in CloudWatch, I can see that call returning 200.)
    latchapipeline
    @latchapipeline
    I think I may be missing S3 bucket permissions ... maybe?
    latchapipeline
    @latchapipeline
    Nope that's not it.
    Hmmm...
    latchapipeline
    @latchapipeline
    So, maybe I have to assume a role to make this work and my SSO user does not have THAT permission.
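    For the record, a 403 from the metadata service behind API Gateway usually means the request is missing the API key rather than an IAM permission. A hedged check sketch, assuming the standard METAFLOW_SERVICE_URL and METAFLOW_SERVICE_AUTH_KEY values from the CloudFormation outputs are exported:

    import os
    import urllib.request

    # Replay the failing metadata call with the API key attached as x-api-key.
    url = os.environ["METAFLOW_SERVICE_URL"] + "/flows/HelloFlow"
    req = urllib.request.Request(
        url, headers={"x-api-key": os.environ["METAFLOW_SERVICE_AUTH_KEY"]}
    )
    print(urllib.request.urlopen(req).status)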
    Savin
    @savingoyal
    @latchapipeline We have migrated the community to slack.outerbounds.co. We can help you out there!
    latchapipeline
    @latchapipeline
    ok
    Tom Ewing
    @tomewing1979_twitter
    Hi! I'm really new to Metaflow but really enjoying it so far! I have a question about logging: I was using a custom logger, but it's producing strange results, and from the reading I've done, Metaflow has its own built-in one. Does anyone know how I can set this up to either spit out a log file to a folder or potentially to a SQLite DB?
    Thanks in advance!
    1 reply
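    A hedged sketch of the simplest version of this: leave Metaflow's internal logger alone (it is wired to its own console output) and give the steps a standard library logger writing to a file; the path and format are illustrative:

    import logging

    # One file handler shared by whatever the steps log; call inside a @step.
    logging.basicConfig(
        filename="flow.log",
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
    )
    logging.getLogger("myflow").info("hello from a step")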
    Edvin Močibob
    @emocibob

    Hi, I have a question about Metaflow RDS DB migration.

    I set up the Metaflow CloudFormation stack in a number of AWS accounts some time ago.
    Now I want to update the CF template and enable encryption for the RDS DB.
    Since this change will replace the DB, I want to safely migrate the data to new DB instances.

    Here's my migration plan: 1) deploy new Metaflow stack, 2) migrate data from old DB to new DB with pg_dump and pg_restore, 3) recreate .metaflowconfig files, and 4) delete old stack.

    How can I be sure that my older deployments of Metaflow will have the same tables/schema as newer deployments?

    Looking at the docs here, I'm not sure how to proceed.

    4 replies
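    A hedged pre-flight sketch for step 2 of that plan: diff the two schemas via information_schema before pointing anything at the new DB (connection strings are placeholders):

    import psycopg2

    def schema(dsn):
        with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
            cur.execute(
                "SELECT table_name, column_name, data_type "
                "FROM information_schema.columns "
                "WHERE table_schema = 'public' ORDER BY 1, 2"
            )
            return set(cur.fetchall())

    old = schema("postgresql://user:pass@old-host:5432/metaflow")
    new = schema("postgresql://user:pass@new-host:5432/metaflow")
    print("only in old:", old - new)
    print("only in new:", new - old)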
    pjchungmd
    @pjchungmd
    Hi, I purchased Effective Data Science Infrastructure, and have since been super excited by what Metaflow brings. I have been testing it in my local environment; however, my workplace is currently "married" to Azure. I realize there seems to be work on integrating Metaflow with other clouds, but until that happens, can I continue to use Metaflow locally and then later "export" my data to the cloud? Alternatively, are there any known workarounds to at least persist data on something other than S3 (hopefully this makes sense)? Thank you!
    3 replies
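    A hedged sketch of the local-first angle: with the default local datastore, artifacts live under .metaflow/ next to the flow and stay readable through the Client API, so a later "export" can be a plain read-and-rewrite (the flow and artifact names below are placeholders):

    from metaflow import Flow

    run = Flow("MyFlow").latest_successful_run
    print(run.id, "finished at", run.finished_at)
    model = run.data.model  # any artifact the flow assigned to self.model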
    maxrausch
    @maxrausch
    How does metaflow persist artifacts? Is it simply using python pickle? Are there any limitations on what can be saved?
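    Short version, as the tracebacks elsewhere in this log suggest: yes, artifacts are persisted with pickle, so the main limitation is picklability (plus, hedged, per-artifact size limits that depend on the datastore). A tiny demonstration of the boundary:

    import pickle

    pickle.dumps({"weights": [0.1, 0.2]})  # plain data pickles fine

    try:
        pickle.dumps(open("/tmp/example.txt", "w"))  # handles and connections do not
    except TypeError as exc:
        print("cannot persist as an artifact:", exc)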
    I hate people
    @anthesles_twitter

    Hey everyone. Metaflow is awesome. I have a question, sorry if it's recurring. I'd like to open a connection to a database during the initialization phase of my class that inherits FlowSpec. I tried something like

    class MyFlow(FlowSpec):
        def __init__(self):
            self.qdb = querydb.QueryDB()
            FlowSpec.__init__(self)
    ...

    But when I run the flow I get an internal error. Any hints?

    I hate people
    @anthesles_twitter

    Hm, when I wrote it as:

    def __init__(self):
        self.qdb = querydb.QueryDB()
        super(MyFlow, self).__init__()

    I get the more informative error:
    2021-10-31 21:19:58.767 [161/start/2407 (pid 15355)] blob = pickle.dumps(obj, protocol=2)
    2021-10-31 21:19:58.767 [161/start/2407 (pid 15355)] TypeError: cannot pickle 'psycopg2.extensions.connection' object
    2021-10-31 21:19:58.767 [161/start/2407 (pid 15355)]

    The idea is to open the connection at init and share it among children (the QueryDB class is thread-safe).
    Because otherwise, I might end up opening hundreds of connections. Is there any workaround or any other approach?
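    A hedged workaround sketch: everything assigned to self is pickled between steps, and each task runs in its own process anyway (so one connection cannot truly be shared across parallel children); keeping the connection off self and opening it lazily once per process avoids both problems (querydb stands in for the thread-safe wrapper from the question):

    import querydb  # hypothetical module from the question

    _qdb = None

    def get_qdb():
        # One connection per task process, created on first use and never
        # stored on self, so nothing ever tries to pickle it.
        global _qdb
        if _qdb is None:
            _qdb = querydb.QueryDB()
        return _qdb

    # inside any @step:  rows = get_qdb().query("SELECT 1")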
    Jochen Gast
    @jochengast_twitter

    Hi everybody, I'm new to Metaflow and running into a few efficiency issues: I'm currently experimenting with Metaflow + Batch to compute (PyTorch model) embeddings for datasets in the 10-100MM range, where each such dataset is an S3 bucket full of JPGs (this input data format is a given, as it's driven by customers).

    Q: What is best practice here to efficiently implement the data loading logic? The datasets are too big to store locally, and I thought one of the tricks here is to stream directly from S3? But any of Metaflow's S3.get functions will make a local copy first, right?

    Q: More precisely: currently I have written a PyTorch Dataset that makes a single S3.get call for each s3uri (in the __getitem__ function), but this is slow (which makes sense). Any leads on where to go next?

    (I would be happy to see a snippet about doing large scale inference (not training) using Metaflow + PyTorch)
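    A hedged sketch of one direction: switch from one get per __getitem__ to an IterableDataset that fetches keys in chunks with get_many, which parallelizes the downloads (get does stage objects in local temp files, so this amortizes the copy rather than avoiding it; the chunk size and decoding are illustrative):

    from metaflow import S3
    from torch.utils.data import IterableDataset

    class S3ImageDataset(IterableDataset):

        def __init__(self, keys, s3root, chunk=256):
            self.keys, self.s3root, self.chunk = keys, s3root, chunk

        def __iter__(self):
            # One S3 client per worker process; get_many downloads each chunk
            # in parallel to temp files and returns S3Object handles.
            with S3(s3root=self.s3root) as s3:
                for i in range(0, len(self.keys), self.chunk):
                    for obj in s3.get_many(self.keys[i : i + self.chunk]):
                        yield obj.blob  # decode JPG bytes -> tensor here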

    Savin
    @savingoyal
    @jochen we have migrated the community to slack.outerbounds.co. Someone should be able to help you there.
    Skyler Erickson
    @skyler1253
    Hey all - I'm using Metaflow's get_many to fetch objects from S3:

    from metaflow import S3

    with S3(s3root='s3://my-bucket/savin/tmp/s3demo/') as s3:
        s3.get_many(['fruit', 'animal'])

    I'm wondering if there are any examples or best practices for doing this effectively in a PyTorch DataLoader with multiple workers? There's a lot of multiprocessing baked into both libraries. Thanks!
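    A hedged companion sketch to the question above: with num_workers > 0, give each DataLoader worker a disjoint shard of keys and let it build its own S3 client inside the worker process, so Metaflow's and PyTorch's multiprocessing don't collide:

    from torch.utils.data import get_worker_info

    def shard_for_worker(keys):
        # Each DataLoader worker runs this in its own process; slice the key
        # list so workers never fetch the same object twice.
        info = get_worker_info()
        if info is None:  # single-process loading
            return keys
        return keys[info.id :: info.num_workers]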
    D. Sturgeon
    @dasturge
    Is there any way to set up a callback for spawning Metaflow workers?
    2 replies
    Ghost
    @ghost~61e2223f6da03739848e5fc9
    Hello everyone,
    I have created metaflow tutorials for beginners. Please have a look: https://www.youtube.com/watch?v=CVuQ3I3sxi4&list=PL1JTerYNOdfGJXOIIGiOQZ_otDaA7Nx9Y&ab_channel=AIScientist
    Please like and subscribe to the channel in case you like the content.
    1 reply
    Savin
    @savingoyal
    We have a new support channel on slack now for Metaflow - http://slack.outerbounds.co
    Nissan Pow
    @npow
    Hi, I used the CloudFormation template to set up Metaflow on AWS, and I want to restrict access to the Metaflow UI to only my VPC (by default it's accessible from anywhere). I tried adding the CIDR block for my VPC to the ALB's security group, but that prevents AWS Batch from being able to connect to the UI. Any suggestions on how to restrict access properly?
    urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='metaf-albui-16iw6xoay172l-1453067463.us-east-1.elb.amazonaws.com', port=80): Max retries exceeded with url: /flows/ContentAgnosticModelBuilder (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fa74192e8d0>: Failed to establish a new connection: [Errno 110] Connection timed out'))
    2 replies
    Savin
    @savingoyal
    test
    Blanchon
    @blanchon:matrix.org
    [m]
    Hi, I think I'm missing something about MLflow. I don't understand why I get a PermissionError: [Errno 13] Permission denied: '/app' error when trying to log an artifact using my Docker MLflow server.
    Here is my Dockerfile:
    FROM python:3-slim

    WORKDIR /mlflow/
    RUN pip install --no-cache-dir mlflow==1.23.1
    EXPOSE 5000

    # Note: the paths below live under /app while WORKDIR is /mlflow/; if the
    # client or container cannot create /app, that mismatch is a likely source
    # of the Permission denied error above.
    ENV BACKEND_URI sqlite:////app/mlflow/mlflow.db
    ENV ARTIFACT_ROOT /app/mlflow/artifacts

    CMD mlflow server \
        --backend-store-uri ${BACKEND_URI} \
        --default-artifact-root ${ARTIFACT_ROOT} \
        --host 0.0.0.0 \
        --port 5000
    1 reply
    maxupp
    @maxupp
    Hey,
    I'm debating adopting Metaflow for my data science team.
    I have some conceptual questions, if you would be so kind: what's the best practice if a flow is modified at a later point but will work on the same data? Do you create an entirely new flow, or is the original flow versioned?
    2 replies
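    For what it's worth, a hedged sketch of how this looks from the Client API: every execution of a flow is an immutable, versioned run (code package included), so editing the flow and re-running does not require a new flow name ("MyFlow" below is a placeholder):

    from metaflow import Flow

    for run in Flow("MyFlow"):
        # Each historical run keeps its own code version and artifacts.
        print(run.id, run.created_at, run.successful)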
    Josh Zastrow
    @JoshZastrow

    Hi, does metaflow's S3 API work with session tokens?

    MetaflowS3Exception                       Traceback (most recent call last)
    <ipython-input-318-a0128f9c1437> in <module>
          5     # url = s3.put('example_object', message)
          6     # print("Message saved at", url)
    ----> 7     s3.get('test.csv')
    
    ~/Github/venvs/lib/python3.9/site-packages/metaflow/datatools/s3.py in get(self, key, return_missing, return_info)
        612         addl_info = None
        613         try:
    --> 614             path, addl_info = self._one_boto_op(_download, url)
        615         except MetaflowS3NotFound:
        616             if return_missing:
    
    ~/Github/venvs/lib/python3.9/site-packages/metaflow/datatools/s3.py in _one_boto_op(self, op, url, create_tmp_file)
        930             # add some jitter to make sure retries are not synchronized
        931             time.sleep(2 ** i + random.randint(0, 10))
    --> 932         raise MetaflowS3Exception(
        933             "S3 operation failed.\n" "Key requested: %s\n" "Error: %s" % (url, error)
        934         )
    
    MetaflowS3Exception: S3 operation failed.
    Key requested: s3://demo-nonprod-sagemaker/joshzastrow/sandbox/test.csv
    Error: An error occurred (InvalidToken) when calling the GetObject operation: The provided token is malformed or otherwise invalid.

    I have aws_access_key_id, aws_secret_access_key and aws_session_token set up as my default and user profile in ~/.aws/credentials. I ran metaflow configure aws and only configured AWS S3 as the storage backend.

    I've tried setting the profile both for aws and in metaflow (export METAFLOW_PROFILE=my_profile).

    I have been able to write to S3 using an s3fs object:

    s3_file_system = s3fs.S3FileSystem(
            anon=False,
            s3_additional_kwargs={'ServerSideEncryption': 'AES256'},
            profile=PROFILE
        )

    I'm wondering if there is an additional encryption argument that needs to be set with Metaflow in order to work with its S3 API?

    7 replies
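    A hedged first check for InvalidToken errors like this: resolve the same profile with plain boto3 and call STS; if that also fails, the session token in ~/.aws/credentials, not Metaflow's S3 layer, is the problem (the profile name is a placeholder):

    import boto3

    session = boto3.Session(profile_name="my_profile")
    print(session.client("sts").get_caller_identity()["Arn"])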
    Alireza Keshavarzi
    @isohrab

    Hey,
    I deployed Metaflow with the provided CloudFormation stack with default values and everything works fine. But I received a security concern from the AWS Support security advisor saying that I should disable the public IP of the task definition, because it is accessible from the Internet. Why do we need a public IP? Can we disable it?

    I appreciate your explanation. I will forward it to my manager :)
    Best regards and thanks

    Savin
    @savingoyal
    You should be able to disable the public IP without any issues if memory serves me right. We are happy to dive into details at slack.outerbounds.co
    Alireza Keshavarzi
    @isohrab
    @savingoyal Thank you so much for your quick response. I will post it in the Slack channel and continue the thread there. See you there :)
    Pramod
    @pramodatre_twitter
    I'm trying to push a flow to Step Functions so that we can trigger the flow based on some events (following the documentation here: https://docs.metaflow.org/going-to-production-with-metaflow/scheduling-metaflow-flows). The Metaflow documentation explains setting up a cron-like trigger or manually triggering the flow from the command line. I need to trigger the state machine based on S3 events (e.g., object creation). I checked the Step Functions documentation from AWS, and they say we can define triggers on S3 bucket events to start the state machine; I was able to get this configured. However, I have an issue now: I cannot find documentation anywhere on accessing the event metadata that triggers the state machine. I need this since I want the flow to operate on the file that was placed on S3 and triggered the execution. Is there a way I can access the event metadata that triggered the Step Functions execution within a Metaflow step? I'm blocked by this and any help will be greatly appreciated!
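    A hedged sketch of one pattern: Metaflow's Step Functions deployments read flow Parameters from the execution input, so if the trigger can map the S3 event's bucket/key into that input, the flow sees it as an ordinary Parameter (whether your event wiring can shape the input that way is the part to verify):

    from metaflow import FlowSpec, Parameter, step

    class S3TriggeredFlow(FlowSpec):

        s3_key = Parameter("s3_key", default="", help="Key of the object that fired the event")

        @step
        def start(self):
            print("processing s3 object:", self.s3_key)
            self.next(self.end)

        @step
        def end(self):
            pass

    if __name__ == "__main__":
        S3TriggeredFlow()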
    Ammar Asmro
    @ammarasmro
    Hi all
    The dependencies documentation (https://docs.metaflow.org/metaflow/dependencies) talks about private deps and mentions how Conda is the better solution, but there are no examples of how to use a private dependency with the conda decorator.
    The other part of the question is that I usually build the package using pip, register it on Gemfury, and use it in the conda environment file, similar to this comment: https://github.com/Netflix/metaflow/issues/24#issuecomment-698018767
    Any way to do this in Metaflow?
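    A hedged sketch, assuming the private package can be published to a conda channel the resolver can see; METAFLOW_CONDA_CHANNELS is the config knob I would check for extra channels, and the channel URL and package name are placeholders:

    # export METAFLOW_CONDA_CHANNELS=https://conda.mycompany.example/channel
    from metaflow import FlowSpec, conda, step

    class PrivateDepFlow(FlowSpec):

        @conda(libraries={"my-private-pkg": "1.0.0"})  # hypothetical package
        @step
        def start(self):
            import my_private_pkg  # noqa: F401
            self.next(self.end)

        @step
        def end(self):
            pass

    if __name__ == "__main__":
        PrivateDepFlow()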
    Ghost
    @ghost~6258a0a86da0373984947c3c
    Hi all, very new to Metaflow but really enjoying it so far. I want to use a small test dataset locally for development testing, and a much bigger dataset that lives in S3 for when I want to actually train things remotely. Is there an easy way to do that? Or should I just run everything on everything all the time? I'm currently using this for personal projects, so I want to keep costs down.
    1 reply
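    A hedged sketch of the usual answer: make the dataset location a Parameter that defaults to the small local file and override it on the command line for remote runs (paths and the flow name are placeholders):

    from metaflow import FlowSpec, Parameter, step

    class TrainFlow(FlowSpec):

        data_path = Parameter("data_path", default="data/sample.csv")

        @step
        def start(self):
            print("training on", self.data_path)
            self.next(self.end)

        @step
        def end(self):
            pass

    if __name__ == "__main__":
        TrainFlow()

    # local:  python train_flow.py run
    # remote: python train_flow.py run --data_path s3://bucket/full.csv --with batch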
    Munkhbat Munkhjargal
    @munkhbat2_gitlab
    Metaflow service error:
    Metadata request (/flows/HelloFlow) failed (code 403): {"message":"Forbidden"}
    Does anyone know how to fix this?
    1 reply