    saw
    @raijinspecial
    I think that is most likely, let me double check that I'm not messing with that
    ENTRYPOINT ["/src/cortex/serve/run.sh"]
    should this be at the bottom of the Dockerfile?
    Robert Lucian Chiriac
    @RobertLucian
    @raijinspecial no need to add the ENTRYPOINT instruction at the end of the Dockerfile - if left unspecified, it will inherit the base image's one. But yeah, that's the instruction all Cortex images have in their Dockerfiles. So something else is wrong.
    saw
    @raijinspecial
    What is the correct PYTHONPATH?
    I will try ENV PYTHONPATH="${PYTHONPATH}:/src"
    Robert Lucian Chiriac
    @RobertLucian
    @raijinspecial the PYTHONPATH can vary depending on whether the python_path field is set in the cortex.yaml config. But at the very least, it should have /src and /mnt/project. In /src you have the cortex package, and in /mnt/project you have the project's files (aka the root directory of the Cortex project as the user sees it).
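    For illustration, a custom Dockerfile built on a Cortex base image could set that path like this (a sketch only - any paths beyond /src and /mnt/project depend on your project):
    # Dockerfile sketch (illustrative)
    FROM cortexlabs/python-predictor-cpu-slim:0.18.1
    # make the cortex package and the mounted project importable
    ENV PYTHONPATH="${PYTHONPATH}:/src:/mnt/project"
    # no ENTRYPOINT needed - it is inherited from the base image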
    saw
    @raijinspecial
    awesome, I will try that, thank you so much for your patience
    Teimor Epstein
    @teimor_gitlab
    Hi, do you have any documentation about best practices and the correct infrastructure for Cortex when working on multiple models with a large team? Or how do you manage Cortex in your organization/team?
    David Eliahu
    @deliahu

    @teimor_gitlab Thanks for reaching out. The most straightforward way is to have a single Cortex cluster shared by the entire team. Only one person would need to be responsible for managing the cluster (e.g. creating it with cortex cluster up, etc), and everyone else can connect their cortex CLI to the cluster by running cortex env configure. All users of the CLI (other than the cluster manager) do not need any special permissions attached to their AWS credentials. See here for how to connect your CLI to an existing cluster, and here for a discussion of IAM permissions.

    Does that address your question? Feel free to follow up here if you have additional questions, or I'd be happy to jump on a call too if you think that would be helpful
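    For illustration, the split of responsibilities might look roughly like this (a sketch - the exact flags and prompts depend on your Cortex version):

    # cluster manager (needs AWS permissions to create the infrastructure)
    cortex cluster up

    # every other team member (points their CLI at the existing cluster)
    cortex env configure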

    Malik Mujtaba
    @M-M-Mujtaba
    Hello, I am a student trying to deploy GPT2-xl at scale for my FYP. We have been successful in basic deployment but we are looking to optimize it further. If anyone with this experience can provide insight into optimization with Cortex that would be helpful.
    Teimor Epstein
    @teimor_gitlab
    @deliahu Thanks for the information & support
    Robert Lucian Chiriac
    @RobertLucian
    @M-M-Mujtaba I think you might have a couple of options here:
    1. Use the ONNX runtime in conjunction with HuggingFace transformers (e.g. GPT-2). With the ONNX runtime, you can achieve higher levels of performance. More on that here. (See the sketch after this list.)
    2. Depending on your target - higher throughput (in case there are lots of concurrent clients), lower inference latency, or cutting the base costs as much as possible while still keeping the service up and running - you could go with a cost-efficient GPU such as g4dn.xlarge for throughput, or with a cheap CPU instance to cut costs at the expense of latency.
    3. One other good option may be using AWS Inferentia instances (inf1.xlarge or inf1.2xlarge). With these instances, you can achieve ~4 times the throughput (as tested with ResNet-50) compared to the most cost-efficient alternative, which would be the g4dn.xlarge instance, at the same price. The only thing you have to research is whether you can actually make GPT-2 work with Inferentia ASICs at this point in time - their drivers are still in their early days, so not all model architectures are supported yet. Since GPT-2 is just a transformer, maybe this tutorial of theirs is relevant. As for our documentation on using AWS Inferentia ASICs, check this out.
    4. Yet another option is to go with TensorFlow. In that case, you can look at this guide explaining how to optimize TF models.
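    For option 1, a minimal export sketch (assuming the transformers, torch, and onnxruntime packages; model name, shapes, and opset are illustrative):

    import torch
    import onnxruntime
    from transformers import GPT2Model, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2Model.from_pretrained("gpt2").eval()

    # trace the model once with example inputs and export it to ONNX
    inputs = tokenizer("hello world", return_tensors="pt")
    torch.onnx.export(
        model,
        (inputs["input_ids"],),
        "gpt2.onnx",
        input_names=["input_ids"],
        output_names=["last_hidden_state"],
        dynamic_axes={"input_ids": {0: "batch", 1: "sequence"}},
        opset_version=11,
    )

    # the exported model can then be served with the ONNX runtime
    session = onnxruntime.InferenceSession("gpt2.onnx")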
    Malik Mujtaba
    @M-M-Mujtaba
    @RobertLucian Thank you, this answer is quite insightful. We will experiment with this information and ask further questions if required.
    Phan Nguyên Bảo
    @pnbao

    Hi all,
    I got this error while deploying an API using cortex v0.18.1:

    Traceback (most recent call last):
      File "/src/cortex/serve/start_uvicorn.py", line 28, in <module>
        workers=int(os.environ["CORTEX_WORKERS_PER_REPLICA"]),
      File "/opt/conda/envs/env/lib/python3.6/os.py", line 669, in __getitem__
        raise KeyError(key) from None
    KeyError: 'CORTEX_WORKERS_PER_REPLICA'

    my cortex.yaml looks like this:

    - name: my-api
      predictor:
        type: python
        path: predictor.py
        config:
          device: cuda
          bucket: my-bucket
          model_key: model/my-model.pkl
        image: me/cortex-base:0.17.1
      compute:
        cpu: 1
        gpu: 1
      autoscaling:
        min_replicas: 1
        max_replicas: 10
        max_replica_concurrency: 4
        upscale_stabilization_period: '0s'

    Am I missing anything?

    Vishal Bollu
    @vishalbollu

    @pnbao It appears to be a version mismatch. CORTEX_WORKERS_PER_REPLICA was renamed to CORTEX_PROCESSES_PER_REPLICA in v0.18. The v0.18.1 Cortex operator sets the environment variable CORTEX_PROCESSES_PER_REPLICA, but your image (which I assume was built on cortex 0.17.1 base images) was expecting the value to be stored in CORTEX_WORKERS_PER_REPLICA.

    Can you try rebuilding your image me/cortex-base:0.17.1 using the 0.18.1 Cortex images as a base and let us know if the issue still persists?
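    Purely as an illustration of the mismatch (this is not Cortex's code), the 0.17-style lookup fails because only the new variable is set; a version-tolerant lookup would read both names:

    import os

    # the 0.18 operator sets CORTEX_PROCESSES_PER_REPLICA, while a 0.17-based
    # image reads CORTEX_WORKERS_PER_REPLICA - hence the KeyError above
    processes = int(
        os.environ.get(
            "CORTEX_PROCESSES_PER_REPLICA",
            os.environ.get("CORTEX_WORKERS_PER_REPLICA", "1"),
        )
    )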

    Phan Nguyên Bảo
    @pnbao
    Hi @vishalbollu, thanks for your suggestion. I have successfully deployed the API by rebuilding my image using the cortex 0.18.1 base.
    Vishal Bollu
    @vishalbollu
    @pnbao That is great to hear. There is a version check but it occurs a little too late in v0.17. In v0.18 it was updated to check the version much earlier so you should run into version mismatch errors before running into errors due to discrepancies. Hopefully, that should make debugging these kinds of errors a little easier.
    Abhijith Nair
    @abhipn
    Hello, I am trying to deploy multiple models and get inference from each model. Is it possible to do it via cortex? I haven't come across any documentation about deploying multiple models. Also, does cortex support GCP?
    Robert Lucian Chiriac
    @RobertLucian

    @abhipn by multiple models, I assume you're referring to the idea of deploying numerous models within a single API. For that, the answer is yes, as of version 0.18, Cortex supports deploying multiple models per API endpoint. We have a guide to show you how that's done for all 3 types of predictors (PythonPredictor, TensorFlowPredictor, ONNXPredictor). This is the guide: https://docs.cortex.dev/guides/multi-model.

    Or, if you don't want to deploy multiple models per API endpoint, you could just have multiple APIs, each with its own model. This may be better if the pre-processing is very different across models. You'd generally go with the multi-model API endpoint when the models you are serving are interchangeable and don't require any code change to the predictor's implementation.
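    As a rough illustration (not taken from the guide - check the docs for the exact predictor signature in your version, and note that the config layout below is just an assumption), a multi-model PythonPredictor could look like:

    import pickle

    class PythonPredictor:
        def __init__(self, config):
            # config comes from predictor.config in cortex.yaml; here it is
            # assumed to map model names to local .pkl paths
            self.models = {}
            for name, path in config["models"].items():
                with open(path, "rb") as f:
                    self.models[name] = pickle.load(f)

        def predict(self, payload, query_params):
            # pick the model per request, e.g. GET /my-api?model=sentiment
            model = self.models[query_params["model"]]
            return model.predict(payload)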

    Also, we are currently working on a way to deploy a large number of models by employing a dynamic caching mechanism for models. This will be available soon. Here's the ticket that tracks it: cortexlabs/cortex#619.

    As for GCP support, Cortex doesn't support it, but it is on our roadmap to support it eventually. Here's the ticket that tracks it: cortexlabs/cortex#114. Could you tell us more about the reasons for wanting GCP support?

    Abhijith Nair
    @abhipn
    @RobertLucian Thank you. I'll check out the guides. The main reason for wanting GCP is that we have been using it for all our machine learning projects.
    Saketh Saxena
    @sakethsaxena

    hey!

    So I'm trying to run cortex (0.18.1) locally using cortex deploy, with a custom Docker image built from cortexlabs/python-predictor-cpu-slim:0.18.1.

    When I deploy the API locally and get the API status, it shows as error and the logs give: __init__: Unable to locate credentials: runtime exception

    I believe this is because I'm trying to download my model from S3 inside the __init__ method of PythonPredictor. I'm not sure if I came across something in the docs where I can specify AWS credentials, but I believe that is the problem.

    Saketh Saxena
    @sakethsaxena

    My bad, I was using expired access keys in the environment.

    Vishal Bollu
    @vishalbollu

    @sakethsaxena Glad to hear that you got it working. Hmm... that is a fairly unintuitive error message.

    FYI, there are a few ways to pass AWS credentials to your Predictor:

    1. You can write your AWS credential environment variables to a .env file in the same folder as your cortex.yaml. The .env file should look like:
      AWS_ACCESS_KEY_ID=value
      AWS_SECRET_ACCESS_KEY=value
    2. You can pass in the credentials in the yaml via predictor.env (albeit a little dangerous to write the credentials in the cortex.yaml file) - see the sketch below.

    And if you are deploying locally, cortex configure --env local will prompt you for AWS credentials. These credentials will be loaded into your Predictor.
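    For option 2, a minimal sketch of what that could look like in cortex.yaml (the values are placeholders - avoid committing real keys):

    - name: my-api
      predictor:
        type: python
        path: predictor.py
        env:
          AWS_ACCESS_KEY_ID: <your-access-key-id>
          AWS_SECRET_ACCESS_KEY: <your-secret-access-key>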

    Saketh Saxena
    @sakethsaxena
    @vishalbollu thanks! I used cortex configure --env local
    Mitchell Scott
    @beam_me_up_Scotty_gitlab

    Hey everyone! Just discovered cortex and I'm super excited to see if I can use it for the service I'm developing.

    I'm creating a service that has one or many large pytorch NLP models per user. These models would be used quite infrequently (once a month to a few times per day), so a small delay for inferencing is not the end of the world, but hosting all those models with dedicated APIs while they're not being used would result in a pretty astronomical AWS bill.

    With cortex, is it possible to auto-scale models to 0 if they haven't been used within the past hour? I imagine this could be done with smart use of cortex deploy and cortex delete maybe?

    Is this the kind of thing that would be enabled with ticket #619 (https://github.com/cortexlabs/cortex/issues/619)?

    Phan Nguyên Bảo
    @pnbao
    Hi all,
    I am trying to deploy a large model that requires high computation (e.g. a 16GB GPU). Currently, I am testing with a p3.2xlarge instance, but the price is too high when it scales to handle multiple requests (even when using spot instances).
    Not so relevant, but can I convert the model to be used on an Inferentia instance for better inference?
    balakrishna222111
    @balakrishna222111
    Hello team,
    has anyone deployed Cortex in Docker?
    If yes, please tell me the process.
    Robert Lucian Chiriac
    @RobertLucian

    @beam_me_up_Scotty_gitlab we are really happy to see our users excited about using Cortex.

    You are correct to suggest that cortexlabs/cortex#619 is the ticket that would address this. How would it work? You'd be able to specify a dir S3 path in your cortex.yaml config within which a bunch of models can be found (there can be 1 model, 2 models, or even 10k models in there).

    Alongside that, you'll also specify a cache_size value that will control the number of models the API can hold in memory at any point in time - say you have 10k models in that S3 path and the cache size is set to something like 15. Whenever a request needs a model that's not in memory yet, another model gets evicted and the new one takes its place - this is done based on an LRU policy. This is totally transparent to the user.

    The thing to keep in mind is that there will always be at least one API replica running. The good thing about this, as you have correctly pointed out, is that you'd no longer have to have one API per model, which would lead to astronomical AWS bills.
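    A toy illustration of the LRU idea (this is not Cortex's actual implementation):

    from collections import OrderedDict

    class ModelCache:
        def __init__(self, cache_size):
            self.cache_size = cache_size
            self.models = OrderedDict()  # model name -> loaded model

        def get(self, name, load_fn):
            if name in self.models:
                self.models.move_to_end(name)  # mark as most recently used
                return self.models[name]
            if len(self.models) >= self.cache_size:
                self.models.popitem(last=False)  # evict the least recently used
            self.models[name] = load_fn(name)  # load_fn downloads/loads the model
            return self.models[name]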

    We are currently working on this feature. How urgent is this feature for you?

    7 replies
    Robert Lucian Chiriac
    @RobertLucian

    @pnbao could you tell us more about the model you are trying to deploy? Knowing the specifics might give us the chance to offer more accurate suggestions. When it comes to Inf, there are certain limitations that should be considered. Also, what kind of traffic (volumes) are you expecting to get?

    Technically, it should be possible to convert the model to be used with Inferentia, and you should be getting a better bang for the buck with it. Keep in mind that there's a bug (cortexlabs/cortex#1123) with Inferentia that only allows a single Inf chip per API replica - this should still be okay since each Inf chip comes with plenty of cache memory.

    We have a small walkthrough for Inferentia with Cortex here: https://docs.cortex.dev/deployments/inferentia
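    Compiling a PyTorch model for Inferentia generally goes through the AWS Neuron SDK; a rough sketch (assuming the torch-neuron package, with ResNet-50 as a stand-in model - your model and input shapes will differ, and not every architecture compiles):

    import torch
    import torch_neuron  # registers the torch.neuron namespace
    from torchvision import models

    # ResNet-50 as a stand-in; replace with your own model in eval mode
    model = models.resnet50(pretrained=True).eval()
    example = torch.zeros([1, 3, 224, 224])

    # compile for the Inferentia (Neuron) runtime and save the compiled artifact
    model_neuron = torch.neuron.trace(model, example_inputs=[example])
    model_neuron.save("resnet50_neuron.pt")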

    Robert Lucian Chiriac
    @RobertLucian
    @balakrishna222111 if by Docker you mean deploying an API locally using your local Docker client/runtime, then the answer is yes, you can do this. The only limitation in doing this is that your API won't be able to scale and it won't benefit from Cortex's built-in monitoring tool. We have two relevant reads for you here:
    7 replies
    sp-davidpichler
    @sp-davidpichler
    What's the best way to add additional capacity to an existing cortex cluster?
    Robert Lucian Chiriac
    @RobertLucian
    @sp-davidpichler are you referring to configuring the instances' disk space sizes on the fly on a running cluster?
    sp-davidpichler
    @sp-davidpichler
    What I want to do is increase the maximum number of instances
    OK I just found https://docs.cortex.dev/troubleshooting/stuck-updating so I'll follow the instructions in here.
    Robert Lucian Chiriac
    @RobertLucian
    @sp-davidpichler in that case, you open your cluster.yaml config, increase max_instances to your desired size, and then run cortex cluster configure --config cluster.yaml. That will probably take a few minutes to update. The CLI command is described here:
    https://docs.cortex.dev/miscellaneous/cli#cluster-configure
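    For illustration, the relevant part of cluster.yaml might look like this (other fields omitted; values are just examples):

    instance_type: t3.medium
    min_instances: 0
    max_instances: 25  # raise this to the new ceiling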

    "OK I just found https://docs.cortex.dev/troubleshooting/stuck-updating so I'll follow the instructions in here."

    Yep, you are right.

    sp-davidpichler
    @sp-davidpichler
    Is there any way to generate the config file for a given cortex cluster?
    Robert Lucian Chiriac
    @RobertLucian

    @sp-davidpichler you can run the cortex cluster info command. If you don't provide a config file, then the prompt will show up and you'll have to fill in the cluster's name and region, and in return, it will output the cluster's configuration to stdout. You can expect the output to be like this:

    aws access key id:                             ********************
    cluster version:                               master
    cluster name:                                  <my-cortex-cluster-name>
    aws region:                                    eu-central-1
    availability zones:                            [eu-central-1a, eu-central-1c, eu-central-1b]
    s3 bucket:                                     <my-bucket>
    instance type:                                 t3.medium
    min instances:                                 0
    max instances:                                 1
    tags:                                          {"cortex.dev/cluster-name": "<my-cortex-cluster-name>"}
    instance volume size (Gi):                     50
    instance volume type:                          gp2
    instance volume iops:                          <null>
    use spot instances:                            yes
    spot instance distribution:                    [t3.medium]
    spot on demand base capacity:                  0
    spot on demand percentage above base capacity: 0
    spot max price ($ per hour):                   0.048
    spot instance pools:                           1
    on demand backup:                              yes
    cloudwatch log group:                          <my-log-group-name>
    subnet visibility:                             public
    nat gateway:                                   none
    api load balancer scheme:                      internet-facing
    operator load balancer scheme:                 internet-facing
    telemetry:                                     true
    operator image:                                xxx.dkr.ecr.eu-central-1.amazonaws.com/cortexlabs/operator:latest
    manager image:                                 xxx.dkr.ecr.eu-central-1.amazonaws.com/cortexlabs/manager:latest
    downloader image:                              xxx.dkr.ecr.eu-central-1.amazonaws.com/cortexlabs/downloader:latest
    request monitor image:                         xxx.dkr.ecr.eu-central-1.amazonaws.com/cortexlabs/request-monitor:latest
    cluster autoscaler image:                      xxx.dkr.ecr.eu-central-1.amazonaws.com/cortexlabs/cluster-autoscaler:latest
    metrics server image:                          xxx.dkr.ecr.eu-central-1.amazonaws.com/cortexlabs/metrics-server:latest
    inferentia image:                              xxx.dkr.ecr.eu-central-1.amazonaws.com/cortexlabs/inferentia:latest
    neuron rtd image:                              xxx.dkr.ecr.eu-central-1.amazonaws.com/cortexlabs/neuron-rtd:latest
    nvidia image:                                  xxx.dkr.ecr.eu-central-1.amazonaws.com/cortexlabs/nvidia:latest
    fluentd image:                                 xxx.dkr.ecr.eu-central-1.amazonaws.com/cortexlabs/fluentd:latest
    statsd image:                                  xxx.dkr.ecr.eu-central-1.amazonaws.com/cortexlabs/statsd:latest
    istio proxy image:                             xxx.dkr.ecr.eu-central-1.amazonaws.com/cortexlabs/istio-proxy:latest
    istio pilot image:                             xxx.dkr.ecr.eu-central-1.amazonaws.com/cortexlabs/istio-pilot:latest
    istio citadel image:                           xxx.dkr.ecr.eu-central-1.amazonaws.com/cortexlabs/istio-citadel:latest
    istio galley image:                            xxx.dkr.ecr.eu-central-1.amazonaws.com/cortexlabs/istio-galley:latest

    For the moment, the CLI doesn't support outputting this to a specialized file (such as json or yaml), but we do have a ticket for addressing this cortexlabs/cortex#1161. It's on our radar.

    sp-davidpichler
    @sp-davidpichler
    Got it, thank you Robert
    Robert Lucian Chiriac
    @RobertLucian
    Anytime! Let us know if you've got any other questions down the road.
    Saketh Saxena
    @sakethsaxena
    Hi,
    I'm trying to deploy an API on a p2.xlarge instance using a custom Docker image built from cortexlabs/python-predictor-gpu-slim:0.18.1, and when I deploy it, I get the following error:
    __init__: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.: runtime exception
    It seems like the Docker container is not using/mounting the instance's GPU?
    Robert Lucian Chiriac
    @RobertLucian
    @sakethsaxena the GPU should be visible to the Docker container; what I'm more concerned about is the CUDA driver. Could you show us your Docker image?
    Saketh Saxena
    @sakethsaxena
    @RobertLucian
    This is the Dockerfile I use:
    # Dockerfile
    FROM cortexlabs/python-predictor-gpu-slim:0.18.1
    
    COPY custom_pypackage custom_pypackage
    
    # install pip dependencies
    RUN pip install --no-cache-dir pandas \
        && conda install -y conda-forge::rdkit \
        && pip install --no-cache-dir -r custom_pypackage/requirements.txt \
        && pip install --no-cache-dir ./custom_pypackage \
        && conda clean -a
    Robert Lucian Chiriac
    @RobertLucian
    @elliotzeitgold we are addressing cortexlabs/cortex#1218 in PR cortexlabs/cortex#1218. Thanks again for reporting this!
    1 reply
    @sakethsaxena thanks for showing us the Dockerfile. Looks okay to me. Could you show us the cortex.yaml too and the __init__ constructor of your predictor? We'd want to see if we can reproduce it.
    David Eliahu
    @deliahu

    @sakethsaxena @RobertLucian another possibility is that the CPU-only version of PyTorch is listed in custom_pypackage/requirements.txt. For example, I believe that if just pytorch is listed, then it will not install the GPU version. Since Cortex uses CUDA 10.1, it should work if you list torch==1.6.0+cu101 and torchvision==0.7.0+cu101 and update the command to pip install --no-cache-dir -f https://download.pytorch.org/whl/torch_stable.html -r custom_pypackage/requirements.txt. See https://pytorch.org/ for full installation instructions.

    Alternatively, it should be possible to in-line it in your Dockerfile with pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html (in which case I believe you could leave the simple pytorch listed in requirements.txt and the requirement should already be satisfied), or conda install -y pytorch torchvision cudatoolkit=10.1 -c pytorch.
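    For example, the relevant Dockerfile lines could become something like this (a sketch based on the commands above - pin whatever versions your code actually needs):

    RUN pip install --no-cache-dir torch==1.6.0+cu101 torchvision==0.7.0+cu101 \
            -f https://download.pytorch.org/whl/torch_stable.html \
        && pip install --no-cache-dir -r custom_pypackage/requirements.txt \
        && pip install --no-cache-dir ./custom_pypackage \
        && conda clean -a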

    Let us know if this was the issue; if not, we'll be happy to try to reproduce it on our end!

    8 replies
    balakrishna222111
    @balakrishna222111
    Hello @RobertLucian and team,
    has anyone deployed a Cortex cluster?
    If so, please tell me the process or share the documentation.
    I don't want to deploy GPU instances in AWS;
    I only want to deploy CPU instances in AWS.
    Robert Lucian Chiriac
    @RobertLucian

    @balakrishna222111 in case you want to go with the aws provider and not the local provider, then you only have to configure your cluster.yaml to use CPU instances. In your cluster.yaml, you'd set the instance_type field to a CPU-only instance type, such as t3.medium - it can be any other CPU instance though. More on that can be found here: https://docs.cortex.dev/cluster-management/config

    Does this answer your question?

    17 replies