    Zachary King
    @zcking
    thanks @seeravikiran! another question, albeit unrelated to conda: how would I go about setting up Metaflow on AWS without the use of a public IP (subnet) for the ECS service (metadata API)? I could spin it up in a private subnet and then create an SSH tunnel through a bastion host--that would allow me to connect to the ECS service from my local machine--but it seems that whatever you put in the METAFLOW_SERVICE_URL setting in ~/.metaflowconfig/config.json is what gets used in the AWS Batch job; if that's the case, a port forward on my local machine would screw the Batch job up by passing along "localhost:8080" to it. Thoughts?
    Or is the whole running-from-your-local-machine just a sandbox/learning approach? I suppose if you were executing all your metaflow code from a Jupyter notebook environment in AWS (e.g. Jupyterhub on EMR, or SageMaker) then you could just use Private subnets all around....
    Zachary King
    @zcking
    a better question, or perhaps a follow-up, would be: how does one customize the environment variables that go to the Batch job? i.e. if I wanted to change METAFLOW_SERVICE_URL on Batch, or add application configuration for something like a database
    seeravikiran
    @seeravikiran
    To change the configuration, we provide metaflow configure aws.
    This will allow you to set the URL that you want. I don't know the exact answer to your port-forwarding setup, though. You could also check out the metaflow-service project, which provides a way to stand up the service locally using Docker. @ferras is our expert on this and hopefully he can chime in as well.
    Savin
    @savingoyal
    @zcking We do offer the @environment decorator which you might find useful for setting up env variables for the batch job. It's not currently documented - but here is some information - https://github.com/Netflix/metaflow/blob/master/metaflow/plugins/environment_decorator.py#L16
    Savin
    @savingoyal
    You can do @environment(vars={'METAFLOW_SERVICE_URL': 'value'}) to override the METAFLOW_SERVICE_URL default that we set up.
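    For instance, a minimal sketch (the flow name and the URL below are just placeholders for your own values):

        from metaflow import FlowSpec, step, batch, environment

        class ServiceUrlFlow(FlowSpec):

            # Override METAFLOW_SERVICE_URL only for this step's Batch container;
            # the URL below is a placeholder for your private-subnet endpoint.
            @environment(vars={'METAFLOW_SERVICE_URL': 'http://10.0.1.25:8080'})
            @batch
            @step
            def start(self):
                self.next(self.end)

            @step
            def end(self):
                pass

        if __name__ == '__main__':
            ServiceUrlFlow()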

    Or is the whole running-from-your-local-machine just a sandbox/learning approach? I suppose if you were executing all your metaflow code from a Jupyter notebook environment in AWS (e.g. Jupyterhub on EMR, or SageMaker) then you could just use Private subnets all around....

    Internally our users seamlessly move between their local machines, notebook environments, and virtual machines on AWS with the same execution characteristics, so running from your local machine is not just the sandbox approach. A fast follow to our first release is to provide adequate customizability for running your own setup on AWS, and your feedback is helpful.

    Savin
    @savingoyal

    And once your service is up and responding to the auth endpoint, you can configure metaflow by setting ~/.metaflowconfig/config.json to:

        {
            "METAFLOW_BATCH_JOB_QUEUE": "...",
            "METAFLOW_DATASTORE_SYSROOT_S3": "...",
            "METAFLOW_SERVICE_URL": "...",
            "METAFLOW_AWS_SANDBOX_INTERNAL_SERVICE_URL": "...",
            "METAFLOW_BATCH_CONTAINER_REGISTRY": "...",
            "METAFLOW_ECS_S3_ACCESS_IAM_ROLE": "...",
            "METAFLOW_AWS_SANDBOX_REGION": "...",
            "METAFLOW_AWS_SANDBOX_API_KEY": "...",
            "METAFLOW_AWS_SANDBOX_ENABLED": true
        }

    @zcking this might serve as a workaround if you want to set the METAFLOW_SERVICE_URL in the config. METAFLOW_AWS_SANDBOX_INTERNAL_SERVICE_URL will point to your private subnet.

    Zachary King
    @zcking
    thank you! the @environment decorator and the METAFLOW_AWS_SANDBOX_INTERNAL_SERVICE_URL seem like what I need; I'll take a look at them next
    actually, I may not be understanding the environment decorator right. What's the point of an environment variable decorator when the value is still hard-coded (e.g. in the Python module itself)...? I suppose you could use a Parameter instead, but that feels like a hack
    Savin
    @savingoyal
    The environment decorator is step-specific - the variables are only available in the steps on which you define them. Parameters are available to all steps and are a flow-level concept.
    You can also do @environment(vars={'a': os.environ.get('a')}) to pass local env vars on to Batch.
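    So for the database-config case you mentioned, something like this sketch might work (DATABASE_URL is just an illustrative name):

        import os
        from metaflow import FlowSpec, step, batch, environment

        class DbConfigFlow(FlowSpec):

            # Forward a local environment variable into the Batch container for
            # this step only; the value is read on your machine at launch time.
            @environment(vars={'DATABASE_URL': os.environ.get('DATABASE_URL', '')})
            @batch
            @step
            def start(self):
                print('DB is', os.environ.get('DATABASE_URL'))
                self.next(self.end)

            @step
            def end(self):
                pass

        if __name__ == '__main__':
            DbConfigFlow()
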
    OrBarda
    @OrBarda
    Hi, is Metaflow supported on Kubernetes?
    Savin
    @savingoyal
    Not currently. There is a feature request for it on our GitHub page. Please upvote it.
    OrBarda
    @OrBarda
    Can you share a link to it?
    Savin
    @savingoyal
    OrBarda
    @OrBarda
    Thanks
    Romain
    @romain-intel
    @zcking : A quick note on top of what @savingoyal already said. Depending on your exact need for the metadata service, you can also use local metadata (rather than the service metadata). This brings all metadata information back to your local machine (laptop). You do lose a few things with this (but maybe you are not using them): (a) to inspect things (using the client API), you have to use your laptop (you can't launch a SageMaker notebook and expect it to read the info on your laptop), and (b) you can't use the client API in a step running on Batch (the client API relies on the metadata information and won't be able to find it when running on Batch, since that information is only available on your laptop). Does that make sense?
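    For example, a quick sketch (the flow and file names are hypothetical): pick the local metadata provider on the command line, then inspect runs later with the client API from the same laptop:

        # Run with local metadata instead of the metadata service
        # (command shown for context; the file name is hypothetical):
        #     python myflow.py --metadata local run

        # Later, on the same laptop, inspect the run with the client API:
        from metaflow import Flow

        run = Flow('MyFlow').latest_run
        print(run.id, run.successful)
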
    chuzhe-bot
    @chuzhe-bot
    Hi! While I was checking the docs, I was wondering: does Metaflow support scaling out on an arbitrary server cluster, instead of AWS? Is there any link that I should read? Thanks!!!
    Savin
    @savingoyal
    Currently we only support AWS Batch. Please open an issue on GitHub for feature requests.
    Daniel Perez
    @sandman21dan
    Do you have/support/intend to adopt a Terraform template for the infra, as opposed to CloudFormation?
    Daniel Perez
    @sandman21dan
    Also, I've followed this to compose my own infra and set everything up, but I'm at a loss as to what METAFLOW_SERVICE_URL should be or point to. Maybe I do need to set up an LB in front of my metadata service?
    Savin
    @savingoyal
    @sandman21dan Would you mind opening an issue on GitHub for Terraform support? We don’t have any current plans but it is an interesting thought.
    Daniel Perez
    @sandman21dan
    Done @savingoyal Netflix/metaflow#38
    Savin
    @savingoyal
    Thanks
    @queueburt Do you know where in the console the service URL is visible?
    @sandman21dan In the meanwhile, can you set up an ELB and point the URL env var to it?
    Savin
    @savingoyal
    Or an NLB. We will spruce up the docs a bit to clarify.
    Daniel Perez
    @sandman21dan
    Couldn't make it work; I gave up trying to set up the infra manually and have gone the CF route for now. I'll let you know how I get on.
    Daniel Perez
    @sandman21dan
    Using metaflow configure aws I've run up to: Please enter the default container image to use:
    Is this meant to be something along the lines of python:3.7?
    Savin
    @savingoyal
    Yes indeed. We will make this optional in the next release. Metaflow doesn't necessarily need you to specify an image, but the configuration utility doesn't present it as optional. @seeravikiran FYI.
    Daniel Perez
    @sandman21dan

    Got into a state where:

    Validating your flow...
        The graph looks good!
    Running pylint...
        Pylint is happy!
    S3 datastore operation _put_s3_object failed (Unable to locate credentials). Retrying 7 more times..

    I guess I need local auth to AWS. We make use of local profiles for our different creds environments; is there any way to tell Metaflow which creds profile to use?

    Daniel Perez
    @sandman21dan
    Whohoooo got it running via the AWS_PROFILE and AWS_DEFAULT_REGION environment variables
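    In case anyone else hits this, a sketch of what worked (the profile and region names are placeholders):

        # In the shell that launches the flow:
        #     export AWS_PROFILE=my-metaflow-profile
        #     export AWS_DEFAULT_REGION=us-west-2
        #     python myflow.py run
        #
        # or, equivalently, from Python before Metaflow creates its boto3 clients:
        import os

        os.environ.setdefault('AWS_PROFILE', 'my-metaflow-profile')
        os.environ.setdefault('AWS_DEFAULT_REGION', 'us-west-2')
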
    Zachary King
    @zcking
    @sandman21dan yeah, it uses the default profile; glad you got it working! Not sure what the Metaflow contributors think, but I reckon it'd be nice to be able to configure which profile Metaflow uses in the future. For example, a METAFLOW_AWS_PROFILE environment variable in the config or something
    btw, for anyone interested, I'm working on writing the Terraform code to get AWS set up, for those of you who prefer TF over CloudFormation. I'll share a repository later when it's finished :)
    Daniel Perez
    @sandman21dan
    @zcking Indeed, considering most orgs using this would be multi-account
    @zcking I'd be massively interested in this
    Daniel Perez
    @sandman21dan

    How do you manage non-conda dependencies on AWS?

    Just following the tutorial #6

    This results in a failure remotely: No module named 'pandas'

    Does it copy the dependencies from local?

    I'm managing my local dependencies using pipenv; could that be the reason they might not be packaged for AWS?

    Or should this simply be using conda-based deps, and there's a bug in this step?

    Ville Tuulos
    @tuulos
    @zcking +1 for Terraform! Let us know when it works :)
    Romain
    @romain-intel
    @sandman21dan: You are correct that it does not package the local dependencies unless you use the @conda decorator. In the sandbox environment we provide, we package pandas in the image running on Batch, so it works there; but if you run in your own environment, you need to either (a) use an image that has the dependencies you need/want, (b) use os.system('pip install ...') (or something equivalent) in the steps where you need something you don't have, or (c) use the @conda decorator to replicate your environment. For simplicity and reproducibility, we recommend (c).
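    As a sketch of option (c), using the tutorial's pandas dependency (the flow name and version pins are just examples); you would then launch the flow with --environment=conda:

        from metaflow import FlowSpec, step, batch, conda, conda_base

        # Pin the Python version for every step in the flow.
        @conda_base(python='3.7.4')
        class StatsFlow(FlowSpec):

            # pandas is installed into an isolated conda environment for this
            # step, both locally and inside the AWS Batch container.
            @conda(libraries={'pandas': '0.25.3'})
            @batch
            @step
            def start(self):
                import pandas as pd
                print(pd.__version__)
                self.next(self.end)

            @step
            def end(self):
                pass

        if __name__ == '__main__':
            StatsFlow()
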
    Zachary King
    @zcking
    what about custom Python packages that do not exist on PyPi etc.? For example, an internal library. How do you recommend tackling those?
    Daniel Perez
    @sandman21dan
    @zcking I'd say you'd have to bake them into a custom image, and have AWS Batch use that image
    @romain-intel I've used conda to add the missing packages from step 2 in the tutorial and it ran just fine e2e
    Savin
    @savingoyal
    @zcking how do you currently ship your internal packages?
    It would be amazing if you could link your Terraform template to Netflix/metaflow#38
    You can also ship your packages to Batch if you have a local conda channel containing your internal packages. No Metaflow-specific config needed :)
    Zachary King
    @zcking
    currently we use S3 to store eggs and tarballs, so pretty basic. I think the local conda channel you mentioned would be suitable for my purposes, thanks!
    Rob Hilton
    @queueburt
    @sandman21dan Did you get sorted with the METAFLOW_SERVICE_URL? It looks like you switched up to using CFN, the URL should be printed next to 'ServiceUrl'. Just to close the loop on the self-provisioned infra, the service URL becomes the public IP of the ECS service you create on Port 8080, e.g. 'http://55.55.55.55:8080'