    David Eliahu
    @deliahu
    Ok so it sounds like a memory issue then
    Laksh1997
    @Laksh1997
    hmmm
    yeah and also when I do like 10,000 x 20 it takes much much longer to return
    David Eliahu
    @deliahu
    do you have to use json? there may be other serialization methods that don't use as much memory. Alternatively, you can try with more memory
    Laksh1997
    @Laksh1997
    well I return a df by df.to_dict()
    I don't know what else I could use
    David Eliahu
    @deliahu
    is your client that is calling the API written in Python?
    Laksh1997
    @Laksh1997
    yes
    uses requests
    David Eliahu
    @deliahu
    I can't vouch for this article since I only found it just now by Googling, but I would try some of these listed here:
    it's a pretty old article so there might be better solutions out there now, but this is a good starting point
    Laksh1997
    @Laksh1997
    thanks, will look into it!
    David Eliahu
    @deliahu
    You'll want to change the response type of your API to be bytes
    Laksh1997
    @Laksh1997
    Okay thanks!
    as in change the configuration file?
    David Eliahu
    @deliahu
    no changes need to be made to the configuration file, you can just return the "bytes" object
    it should set the appropriate headers on the response, but if you need more control over the response type header, you can return the Starlette object as shown in one of the examples I linked
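    For illustration, a rough sketch of both options (the predictor shape follows the Cortex Python predictor interface; serialize_to_bytes is a hypothetical helper):

        from starlette.responses import Response

        class PythonPredictor:
            def __init__(self, config):
                pass

            def predict(self, payload):
                body = serialize_to_bytes(payload)  # hypothetical helper returning bytes
                # option 1: return the bytes directly and let the response headers be set for you
                # return body
                # option 2: wrap in a Starlette Response for explicit control over the content-type header
                return Response(content=body, media_type="application/octet-stream")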
    Laksh1997
    @Laksh1997
    What do you mean?
    I'm going to try messagepack
    David Eliahu
    @deliahu
    You should be able to return the msgpack-serialized object from your predict(), and then read the body (as bytes) using the requests library. I don't think you'll have to make any changes to the headers.
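    For example, a minimal sketch with msgpack (the msgpack package on PyPI; url and inp below are placeholders for your own request arguments):

        import msgpack
        import pandas as pd
        import requests

        # server side, inside the predictor: pack the DataFrame's dict form into compact binary
        def predict(self, payload):
            df = run_model(payload)             # hypothetical helper producing a DataFrame
            return msgpack.packb(df.to_dict())  # bytes

        # client side: read the raw bytes body and unpack it
        res = requests.post(url, json=inp)
        df_dict = msgpack.unpackb(res.content, strict_map_key=False)  # allow the integer index keys from to_dict()
        df = pd.DataFrame(df_dict)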
    Laksh1997
    @Laksh1997
    @deliahu cuda works now thanks!
    so requests.post(**args).body ?
    Laksh1997
    @Laksh1997
    @deliahu The request seems to work but I can't seem to read it on the client side
    In Predictor I'm returning
            import pyarrow

            # serialize the DataFrame with pyarrow and return the raw bytes
            context = pyarrow.default_serialization_context()
            df_bytestring = context.serialize(df).to_buffer().to_pybytes()
            return df_bytestring
    On the client side, I do:
    import pyarrow
    import requests

    context = pyarrow.default_serialization_context()
    res = requests.post(**inp)
    # deserialize the response body back into a DataFrame
    context.deserialize(res.content)
    And I get: OSError: buffer_index out of range.
    However, when I do the serialization and deserialization all on the client side (i.e. in a Python notebook), it works fine
    Laksh1997
    @Laksh1997
    @deliahu when I upgrade to the latest pandas and pyarrow on client side it works
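    That fits with pyarrow's serialization format not being guaranteed compatible across versions; a reasonable safeguard is to pin the client to the same pyarrow and pandas versions the API uses (version numbers are placeholders):

        pip install "pyarrow==X.Y.Z" "pandas==A.B.C"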
    David Eliahu
    @deliahu
    @Laksh1997 sounds good, so is everything now working as you expect?
    Laksh1997
    @Laksh1997
    Yep - everything works!
    Cheers
    David Eliahu
    @deliahu
    :+1:
    Laksh1997
    @Laksh1997
    It's also much better and handles the 10000 x 2000 df easily
    cheers for that!
    David Eliahu
    @deliahu
    awesome, glad to hear it!
    Antuan Vazquez
    @antuanvazquez_gitlab

    Hello, we were told here that any AWS IAM user with AdministratorAccess could manage a cluster (including taking it down) regardless of which user deployed the cluster. But this doesn't seem to be the case.

    SEE THE FOLLOWING ERRORS:

    sudo cortex cluster info --config cluster.yaml

    cloudformation stack name                                  status
    eksctl-cortex-alita-cluster                                CREATE_COMPLETE
    eksctl-cortex-alita-nodegroup-ng-cortex-operator           CREATE_COMPLETE
    eksctl-cortex-alita-nodegroup-ng-cortex-worker-on-demand   CREATE_COMPLETE

    syncing cluster configuration ...

    error: You must be logged in to the server (Unauthorized)


    sudo cortex cluster down --config cluster.yaml

    your cluster named "cortex-alita" in us-east-1 will be spun down and all apis will be deleted, are you sure you want to continue? (y/n): y

    ○ deleting api gateway ✓
    ○ deleting dashboard ✓
    ○ spinning down the cluster ...
    error: You must be logged in to the server (Unauthorized)

    [ℹ] eksctl version 0.19.0
    [ℹ] using region us-east-1
    [ℹ] deleting EKS cluster "cortex-alita"
    [ℹ] deleted 0 Fargate profile(s)
    [✔] kubeconfig has been updated
    [ℹ] cleaning up LoadBalancer services
    Error: cannot list Kubernetes Services: Unauthorized

    David Eliahu
    @deliahu
    @antuanvazquez_gitlab this is a known issue (cortexlabs/cortex#1316), and we are planning to look into it before our next release. In the meantime, is it a possibility to use the same credentials you used to create the cluster? If not, there may be an aws and/or kubectl command that can be run to grant access to the other IAM user, but we haven't looked into that yet (we can take a look today if that's a blocker for you, or feel free to let us know if you know how to grant access to the eks cluster)
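    For reference, EKS manages cluster access through the aws-auth ConfigMap; an untested sketch using eksctl (the account ID and user name are placeholders, and system:masters grants full admin access):

        # map an additional IAM user into the cluster's aws-auth ConfigMap
        eksctl create iamidentitymapping \
          --cluster cortex-alita \
          --region us-east-1 \
          --arn arn:aws:iam::<ACCOUNT_ID>:user/<USER_NAME> \
          --group system:masters \
          --username <USER_NAME>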
    Antuan Vazquez
    @antuanvazquez_gitlab

    @deliahu We can get by for now by using a shared user. (Not ideal, but it will do) Another issue that we would love for you to look into is the following:

    We have Service Control Policies that prevent the use of the RunInstances API call unless the instance has certain tags that we use for cost tracking. The current method you have for adding tags to Cortex-deployed resources does NOT add the tags to the eksctl-cortex-nodegroup-ng-cortex-worker-on-demand launch template. It would be nice if the tags we specify in the cluster.yaml file got propagated to this template, and probably to every resource that generates a charge from AWS, so that we can properly account for cost. For now I will try to manually add those tags to the launch template after the cluster is deployed so that we can reactivate our SCPs.

    Also, I did find a bug with the current tagging mechanism. It appears that if the value of any key-value pair contains a space, cortex applies the tags successfully to some resources (we verified that it added the tag to an S3 bucket correctly), but the cortex cluster deployment errors out claiming that the tags are not formatted correctly. I tried every possible YAML and JSON syntax; nothing worked.
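
    For context, the tags are specified in the cluster configuration file, roughly like this (field layout per the Cortex cluster config docs of this era; the values, including the space, are illustrative):

        # cluster.yaml
        cluster_name: cortex-alita
        region: us-east-1
        tags:
          cost-center: ml-platform
          team: "data science"   # a value containing a space triggers the error described above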

    David Eliahu
    @deliahu
    @antuanvazquez_gitlab we will definitely look into these issues before our next release; I'll keep you posted!
    thanks for bringing this to our attention
    we actually just added validations to the tags to prevent users from requesting "illegal" values (cortexlabs/cortex#1355). But we only enforced compatibility with AWS in the validation, and AWS supports spaces, so we'll look into what needs to be done to support it with cortex
    regarding the tags not propagating to the EC2 instances, we'll check that out in the next day or two and keep you posted
    David Eliahu
    @deliahu

    @antuanvazquez_gitlab I just confirmed that in our next release (v0.20), the tags specified in your cluster configuration file will propagate to all EC2 instances that Cortex creates (this was addressed in cortexlabs/cortex#1345). We are hoping to release v0.20 in the next week or two. If you need this to work before then, let me know and we should be able to build a custom image for you.

    I have not yet explored using tags that contain spaces, but I'm hoping to do that before our v0.20 release as well, and will keep you posted.

    Anthony Becker
    @becker929

    Hi Cortex team. Thanks for creating this awesome tool. I have a question for you.

    We are working with Cortex and finding that our API is not autoscaling well. We're thinking that this could be because our particular application is utilizing more GPU than is standard.

    We want to continue using Cortex because it's so wonderfully streamlined, so we are thinking that we can manually override the autoscaling on the EKS cluster. We want to use Horizontal Pod Autoscaler so that we can use the custom metrics.

    Could this be as easy as disabling the Cluster Autoscaler and applying the HPA? Or will there be monsters in these seas? Thanks so much in advance.

    David Eliahu
    @deliahu

    @becker929 thanks for reaching out, and I'm glad you're enjoying using Cortex!

    Assuming you are running a recent version of Cortex, the autoscaling is triggered based on the number of in-flight requests, and not CPU or GPU utilization. We actually used to use the HPA, but even with custom metrics, we weren't satisfied with the level of control we had, so we ended up implementing our own autoscaling algorithm. Here is the documentation: https://docs.cortex.dev/deployments/realtime-api/autoscaling

    Have you tried changing the default autoscaling configuration? Do you mind sharing the behavior that is not working well for you, and how you plan on addressing that with custom metrics? I'd also be happy to jump on a call if you think that would be easier, feel free to email me at david@cortex.dev
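
    For reference, those knobs live in the autoscaling section of the API spec; a rough sketch (field names as documented for this era of Cortex; check the linked docs for defaults and exact semantics):

        # cortex.yaml
        - name: my-api
          predictor:
            type: python
            path: predictor.py
          autoscaling:
            min_replicas: 1
            max_replicas: 10
            target_replica_concurrency: 1.0   # average in-flight requests per replica the autoscaler aims for
            max_replica_concurrency: 1024     # in-flight requests a single replica will accept before rejecting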

    Anthony Becker
    @becker929
    @deliahu Thanks for your quick response. It seems that you have had some extensive conversation with another team member here. I think we could benefit from a call. I will email you :)