    Vishal Bollu
    @vishalbollu

    No, you don't need kubectl; the cortex cluster info --debug command should have run all of the necessary Kubernetes commands.

    cortex-debug/k8s/pod contains the description for each pod that was running on your cluster at the time you ran the cortex cluster info --debug command.

    I would find the pod(s) for your API. You can filter for your pods in the list by searching for apiName=iris-classifier in the labels section for each pod. Once you find your API, can you share the events section for that pod?

    Additionally, you can find the resource utilization for the pod in cortex-debug/k8s/pods.metrics. You can find the metrics for your API pod by searching the file for your API name.
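    If it helps, here's a small sketch for pulling the relevant lines out of the extracted debug archive (it assumes the files are plain text and laid out as described above):

    # rough sketch: print lines mentioning your API in the debug files
    from pathlib import Path

    API_NAME = "iris-classifier"  # replace with your API name
    DEBUG_DIR = Path("cortex-debug/k8s")

    for file_name in ["pod", "pods.metrics"]:
        path = DEBUG_DIR / file_name
        if not path.exists():
            continue
        print(f"=== {path} ===")
        for line in path.read_text().splitlines():
            if API_NAME in line:
                print(line)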

    noisyneuron
    @noisyneuron
    hey! has anybody deployed gpt-2 using the gpt-2-simple python package? I'm looking for help with this. Not that great with python, so hoping that I can use this, similar to how the author has used it in gpt-2-cloud-run .. links: https://github.com/minimaxir/gpt-2-simple & https://github.com/minimaxir/gpt-2-cloud-run
    Robert Lucian Chiriac
    @RobertLucian

    @noisyneuron so judging from what I see with those 2 projects, this is what I can tell:

    1. https://github.com/minimaxir/gpt-2-simple is for generating/fine-tuning a GPT-2 model. This is supposed to be done separately on a dev machine. The resulting model should then be loaded in a Cortex API - by using any of the available predictors. Since this would result in a SavedModel, you'd probably just go with the TensorFlow predictor.

    2. https://github.com/minimaxir/gpt-2-cloud-run appears to be a project intended for GCP's Cloud Run - which I think is roughly GCP's equivalent of AWS Lambda. Either way, it would be incompatible with Cortex and redundant anyway.

    In conclusion, the only thing you need is the fine-tuned model from step 1. We've already got an example using GPT-2. Check it out here.
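    For reference, step 1 with gpt-2-simple usually looks something like this sketch (the dataset path and step count are placeholders); the checkpoint it writes under checkpoint/run1 is what you'd then export for serving, along the lines of the Cortex GPT-2 example:

    # fine-tune GPT-2 on a dev machine with gpt-2-simple (dataset and steps are placeholders)
    import gpt_2_simple as gpt2

    model_name = "124M"
    gpt2.download_gpt2(model_name=model_name)  # fetch the base GPT-2 weights

    sess = gpt2.start_tf_sess()
    gpt2.finetune(sess,
                  dataset="my_corpus.txt",  # your fine-tuning corpus
                  model_name=model_name,
                  steps=1000)               # checkpoints are written to checkpoint/run1

    gpt2.generate(sess)  # sanity-check the fine-tuned model locally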

    noisyneuron
    @noisyneuron
    @RobertLucian thank you for the helpful response! I have a fine-tuned model already -- I wanted to use gpt-2-simple since it is an easier interface for me to understand, as I don't know much about the internals of tensorflow. How would I use the example? I see you have a tutorial here -- not sure how to plug my model into this?
    David Eliahu
    @deliahu

    @noisyneuron Cortex's Python Predictor interface is pretty flexible, check out the docs here: https://docs.cortex.dev/deployments/predictors#python-predictor

    In summary, you can download / initialize your model in __init__() (storing any state, like the loaded model itself, in self), and then generate and return your prediction in predict(). Do you have a script that you use to generate predictions locally? If so, you should be able to fit it into the pattern described above.
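    To make that concrete, a minimal Python Predictor around a gpt-2-simple checkpoint could look roughly like the sketch below (the run_name, request format, and generation parameters are assumptions, and the checkpoint directory needs to be available to the API):

    # rough sketch of the __init__/predict pattern with a gpt-2-simple checkpoint
    import gpt_2_simple as gpt2

    class PythonPredictor:
        def __init__(self, config):
            # load the model once at startup and keep it in self
            self.sess = gpt2.start_tf_sess()
            gpt2.load_gpt2(self.sess, run_name="run1")  # checkpoint/run1 from fine-tuning

        def predict(self, payload):
            # generate and return a prediction per request
            text = gpt2.generate(
                self.sess,
                run_name="run1",
                prefix=payload["text"],  # assumes a request body like {"text": "..."}
                length=100,
                return_as_list=True,
            )[0]
            return {"generation": text}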

    Vaclav Kosar
    @vackosar
    @vishalbollu thanks. It wouldn't be bad to have a "debug" section in the documentation, in case anyone else needs it.
    Vaclav Kosar
    @vackosar
    @deliahu when I configure the cluster with "instance_volume_size: 250", does that mean each instance will have 250 GB of ephemeral storage? Where on the filesystem is that space available? Is it available in the working directory of execution, or mounted somewhere like "/mount/volumeX"?
    Vishal Bollu
    @vishalbollu
    @vackosar Yes, instance_volume_size: 250 means that each instance will have 250 GB of storage. That space is available in the working directory of execution.
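    If you want to verify that from inside a running API, a quick sanity check of the space available in the working directory looks like this:

    # check disk space available in the working directory
    import shutil

    total, used, free = shutil.disk_usage(".")
    print(f"total: {total / 1e9:.0f} GB, used: {used / 1e9:.0f} GB, free: {free / 1e9:.0f} GB")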
    Vishal Bollu
    @vishalbollu
    Thanks for the recommendation. We can add a debugging guide covering various general-purpose debugging strategies.
    sm-hossein
    @sm-hossein
    Hi everyone,
    We have some machine learning services that we want to migrate to Cortex, but our communication is based on gRPC, as I mentioned before. We run our services in local mode only (not AWS) and we want to contribute to developing gRPC support in local mode (we have no access to AWS services). Can anyone help us get started with it?
    Vishal Bollu
    @vishalbollu

    @sm-hossein Cortex currently uses FastAPI/Uvicorn to respond to HTTP requests. Some of the Cortex API configuration, metrics, monitoring, and autoscaling functionality assumes that FastAPI/Uvicorn is being used.

    I don't believe adding gRPC support is on the immediate roadmap. However, it may be possible to get gRPC working for your use case.

    It looks like you have two options:

    1. Deploy a gRPC container that accepts gRPC requests and makes HTTP requests to Cortex APIs. This approach adds an extra network hop and some organizational complexity (a rough sketch of the forwarding piece follows this list). We would love to take a look at this code if gRPC becomes a part of the roadmap.

    2. Build a new Docker image from scratch that uses gRPC, and configure predictor.image in your API configuration to use your Docker image. You can look at this code to see how your API configuration is used to deploy Docker containers. If you take this approach, you will have to do additional work to get some of the Cortex features, such as metrics (request count and average latency), working.
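    To illustrate option 1, here's a very rough sketch of a standalone gRPC front that forwards requests to a Cortex API over HTTP. The generated proto stubs and servicer registration are omitted, and CORTEX_ENDPOINT is a placeholder for your API's endpoint:

    # gRPC-to-HTTP forwarding sketch (proto stubs omitted; CORTEX_ENDPOINT is a placeholder)
    from concurrent import futures

    import grpc
    import requests

    CORTEX_ENDPOINT = "http://<your-cortex-api-endpoint>"

    def forward_to_cortex(payload: dict) -> dict:
        # the extra HTTP hop mentioned in option 1
        response = requests.post(CORTEX_ENDPOINT, json=payload)
        response.raise_for_status()
        return response.json()

    def serve():
        server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
        # add_YourServiceServicer_to_server(YourServicer(), server)  # from your generated stubs
        server.add_insecure_port("[::]:50051")
        server.start()
        server.wait_for_termination()

    if __name__ == "__main__":
        serve()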

    sm-hossein
    @sm-hossein
    @vishalbollu Thank you for your comprehensive explanation! We'll try the second approach and share the result with you.
    Christopher Shelley
    @basiclaser
    hi i updated cortex a while back to use the local cortex feature. i see that on AWS some old cortex instances are running but they're not listed in my cortex get - how can i remove all EC2 instances without them restarting?
    Vishal Bollu
    @vishalbollu
    @basiclaser you can try cortex cluster down to take down the cluster on AWS. You will be prompted for your cluster name and region. If you forgot your cluster name, you can find it by looking at the value of the tag alpha.eksctl.io/cluster-name associated with any of the cortex cluster's EC2 instances.
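    If it helps, here's a small boto3 sketch for looking up that tag across your EC2 instances (the region is a placeholder):

    # find cluster names via the alpha.eksctl.io/cluster-name tag (region is a placeholder)
    import boto3

    ec2 = boto3.client("ec2", region_name="us-west-2")
    reservations = ec2.describe_instances(
        Filters=[{"Name": "tag-key", "Values": ["alpha.eksctl.io/cluster-name"]}]
    )["Reservations"]

    cluster_names = set()
    for reservation in reservations:
        for instance in reservation["Instances"]:
            for tag in instance.get("Tags", []):
                if tag["Key"] == "alpha.eksctl.io/cluster-name":
                    cluster_names.add(tag["Value"])

    print(cluster_names)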
    Christopher Shelley
    @basiclaser
    ayyyy @vishalbollu thanks that got it - i accidentally ran up a $550 bill :P
    David Eliahu
    @deliahu
    @basiclaser I'm sorry to hear that. You might want to create a support ticket with AWS and explain your situation; there is a chance they will reimburse you.
    alexdiment
    @alexdiment
    Hi, is there a way of having not two but three different containers running for a TF predictor (one for TF Serving, one for the predictor, and one with a custom preprocessing tool)? I'm struggling to combine the latter two into one Dockerfile without conflicts; it would be marvelous if I could just run the pre-processing one separately and communicate with it through a pre-defined port.
    Christopher Shelley
    @basiclaser
    oh i double checked this morning and the EC2 are still running :'( not sure how to stop this at all
    Robert Lucian Chiriac
    @RobertLucian

    oh i double checked this morning and the EC2 are still running :'( not sure how to stop this at all

    You will have to go to CloudFormation and delete the stack(s) for the associated region. Once triggered, that will take some time (10-15 minutes), so you'll have to keep an eye on it. Did you already try cortex cluster down as @vishalbollu has suggested?

    @alexdiment you can do the preprocessing in the predict method upon receiving the payload. Then, once the preprocessing is done, the result is handed off to client.predict, which does the actual inference. Why not have the pre-processing done in the predict method? Could you tell us more about what you're trying to achieve?
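    For example, that pattern with the TensorFlow Predictor looks roughly like this sketch (preprocess is a placeholder for your own logic):

    # preprocessing inside predict() before handing off to the TF Serving client
    class TensorFlowPredictor:
        def __init__(self, tensorflow_client, config):
            self.client = tensorflow_client  # talks to the TF Serving container

        def predict(self, payload):
            model_input = preprocess(payload)        # custom preprocessing
            return self.client.predict(model_input)  # actual inference

    def preprocess(payload):
        # placeholder: turn the raw request payload into model input
        return payload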
    alexdiment
    @alexdiment
    The problem is that the pre-processing stage is not just a Python function, but a complex tool with dependencies (kaldi ASR) which I have trouble compiling and installing in dependencies.sh or in a modified predictor Dockerfile. It has, however, a docker implementation (lowerquality/gentle), which runs easily. It would be nice if I could have that container running as such, and in my predict method I would just send requests to it.
    Robert Lucian Chiriac
    @RobertLucian

    @alexdiment I hear you. I see, it looks like the local install can only run on MacOS, whereas the container can be run on anything. I wonder if giving access to the docker socket would be a good idea for you. Like, make it such that you can start containers within the serving container with this Python SDK: https://docker-py.readthedocs.io/en/stable/.

    You'd start the container in the constructor and then interact with it like you'd usually do in the predict method.
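    Something along these lines (this assumes the Docker socket is reachable from inside the serving container; the port mapping and the /transcriptions endpoint are my assumptions about the gentle container):

    # sketch: start the gentle container from the predictor and talk to it over HTTP
    import docker
    import requests

    class PythonPredictor:
        def __init__(self, config):
            client = docker.from_env()
            self.container = client.containers.run(
                "lowerquality/gentle",
                detach=True,
                ports={"8765/tcp": 8765},  # expose gentle's HTTP server
            )

        def predict(self, payload):
            # hand the payload off to the side-car container over HTTP
            response = requests.post(
                "http://localhost:8765/transcriptions?async=false", files=payload
            )
            return response.json()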

    What do you think?

    alexdiment
    @alexdiment
    Ok, didn't realise I can start a container from the inside. Thanks, I'll give it a try!
    Robert Lucian Chiriac
    @RobertLucian
    @alexdiment I think I might not have been clear. At the moment, this is not possible because the docker socket isn't bind-mounted. But it could be if we were to configure it this way. We're exploring this possibility.
    alexdiment
    @alexdiment
    I see! I appreciate your efforts.
    Robert Lucian Chiriac
    @RobertLucian

    @alexdiment While we think of ways to support the use of your own container with Cortex, it looks like it may be possible to create a Cortex-compatible image based on https://hub.docker.com/r/lowerquality/gentle/dockerfile.

    Looking at their Dockerfile, https://hub.docker.com/r/lowerquality/gentle/dockerfile, we see that they're compiling a bunch of stuff and then they use the python3 executable to run the server.

    In contrast, we've got the Cortex Dockerfile, https://github.com/cortexlabs/cortex/blob/master/images/python-predictor-cpu/Dockerfile, which is built from the same base image (ubuntu:18.04) and which uses a conda-installed Python runtime with conda-provisioned packages.

    In order for this to work, we'd change the base image of the Cortex Dockerfile from ubuntu:18.04 to lowerquality/gentle, and then in dependencies.sh we'd run the server in the background using the system-wide Python 3 runtime rather than the conda-provisioned one that is used by default. You will probably have to inspect the Python-related environment variables before and after creating the customized version of the serving image, so that the right variables are set when running the Python server. Here's what I'm thinking of:

    # dependencies.sh
    
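    # run gentle's server in the background with the system-wide python3 (not the conda one)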
    cd /gentle && PYTHONPATH=/my/dir /usr/bin/python3 server.py &
    Laksh1997
    @Laksh1997
    Hi all
    long time no see
    I was wondering how we could log input and output to cloudwatch
    ?debug=true doesn't seem to work now
    Vishal Bollu
    @vishalbollu
    @Laksh1997 We recently removed the automatic logging of input and output via the debug query parameter. The easiest way to log input and output to CloudWatch is by adding print statements to your Predictor implementation.
    Laksh1997
    @Laksh1997
    I see
    what about just log statements?
    like logger.debug
    Robert Lucian Chiriac
    @RobertLucian
    @Laksh1997 yeah, that should work. I haven't personally tried it, but it should, just like prints already work. Let me know if you run into any issues.
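    For reference, a minimal sketch of logging from a predictor (note that in a plain Python process the root logger defaults to WARNING, so DEBUG messages may need the level configured explicitly):

    # log input/output from a predictor (the prediction value is a placeholder)
    import logging

    logging.basicConfig(level=logging.DEBUG)
    logger = logging.getLogger(__name__)

    class PythonPredictor:
        def __init__(self, config):
            pass

        def predict(self, payload):
            logger.debug("input: %s", payload)    # or simply print(payload)
            prediction = {"result": "..."}        # placeholder for your model's output
            logger.debug("output: %s", prediction)
            return prediction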
    David Eliahu
    @deliahu
    @alexdiment We have a GitHub issue which is related to your request: cortexlabs/cortex#930. Actually your use case is more general-purpose than the one we originally envisioned when we created that issue, and I think it would be great to support it (and would actually make the API cleaner), so I have updated the issue and the proposed API to reflect your suggestion!
    Juan Zamora
    @autonomo_cr_twitter
    hi, can the cortex infrastructure be used with a custom cluster? I have stuff on an inap cluster and am wondering how to use this if I am not using AWS
    David Eliahu
    @deliahu
    @autonomo_cr_twitter At this time, it is only possible to run Cortex on AWS or locally. You can run "locally" on any cloud instance; however, local Cortex does not support many of the production features such as autoscaling, rolling updates, etc.
    alexdiment
    @alexdiment
    @RobertLucian it worked, amazing! Thank you so much, I've spent like a week trying to combine these images.
    @deliahu This is great to hear!
    Robert Lucian Chiriac
    @RobertLucian
    @alexdiment this is really great! Happy to hear it works for you! :)
    ishan shahzad
    @ishanShahzad
    hi everyone, i need to ask
    how can i use aws simple queue service with cortex?
    David Eliahu
    @deliahu
    @ishanShahzad I'd be happy to advise, do you mind describing your use case a little more? What problem are you attempting to solve, and how are you hoping Cortex will fit in?
    ishan shahzad
    @ishanShahzad
    suppose I set max_instances to 5. Then, as API requests grow, the number of instances grows up to a maximum of 5. What will happen after 5 instances if API requests keep increasing? I need to put all the requests in a queue; I don't want to increase my max_instances limit.
    i hope u understand that
    @deliahu
    ishan shahzad
    @ishanShahzad
    @deliahu u there?
    David Eliahu
    @deliahu

    @ishanShahzad Sorry for the delay, I was in a meeting which just ended.

    The situation you described will be handled automatically by Cortex, so it should not be necessary to use SQS on top of Cortex for your use case. There is a queue within Cortex, so if the instances are at the max_instances limit, requests will be queued while waiting for previous requests to complete. Eventually, if the queue grows too large, requests will be responded to with HTTP error code 503 (the queue size is configurable via max_replica_concurrency, which is described here).
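    On the client side, one common way to cope with those 503s (not Cortex-specific, just a general pattern) is to retry with backoff; API_URL and the payload format are placeholders:

    # retry-with-backoff on 503 responses (API_URL is a placeholder)
    import time
    import requests

    API_URL = "http://<your-cortex-api-endpoint>"

    def predict_with_retries(payload, max_retries=5):
        for attempt in range(max_retries):
            response = requests.post(API_URL, json=payload)
            if response.status_code != 503:
                response.raise_for_status()
                return response.json()
            time.sleep(2 ** attempt)  # back off while the queue drains
        raise RuntimeError("API still overloaded after retries")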

    ishan shahzad
    @ishanShahzad
    @deliahu thank you so much. let me check and get back to you
    David Eliahu
    @deliahu
    :+1: