    Laksh1997
    @Laksh1997
    And I get: OSError: buffer_index out of range.
    However when I do the serialization and deserialization all on the client side (ie in a python notebook) it works fine
    Laksh1997
    @Laksh1997
    @deliahu when I upgrade to the latest pandas and pyarrow on client side it works
    David Eliahu
    @deliahu
    @Laksh1997 sounds good, so is everything now working as you expect?
    Laksh1997
    @Laksh1997
    Yep - everything works!
    Cheers
    David Eliahu
    @deliahu
    :+1:
    Laksh1997
    @Laksh1997
    It's also much better and handles the 10000 x 2000 df easily
    cheers for that!
    David Eliahu
    @deliahu
    awesome, glad to hear it!
    Antuan Vazquez
    @antuanvazquez_gitlab

    Hello, we were told here that any AWS IAM user with AdministratorAccess could manage a cluster (including taking it down) regardless of which user deployed the cluster. But this doesn't seem to be the case.

    SEE THE FOLLOWING ERRORS:

    sudo cortex cluster info --config cluster.yaml

    cloudformation stack name status
    eksctl-cortex-alita-cluster CREATE_COMPLETE
    eksctl-cortex-alita-nodegroup-ng-cortex-operator CREATE_COMPLETE
    eksctl-cortex-alita-nodegroup-ng-cortex-worker-on-demand CREATE_COMPLETE

    syncing cluster configuration ...

    error: You must be logged in to the server (Unauthorized)


    sudo cortex cluster down --config cluster.yaml

    your cluster named "cortex-alita" in us-east-1 will be spun down and all apis will be deleted, are you sure you want to continue? (y/n): y

    ○ deleting api gateway ✓
    ○ deleting dashboard ✓
    ○ spinning down the cluster ...
    error: You must be logged in to the server (Unauthorized)

    [ℹ] eksctl version 0.19.0
    [ℹ] using region us-east-1
    [ℹ] deleting EKS cluster "cortex-alita"
    [ℹ] deleted 0 Fargate profile(s)
    [✔] kubeconfig has been updated
    [ℹ] cleaning up LoadBalancer services
    Error: cannot list Kubernetes Services: Unauthorized

    David Eliahu
    @deliahu
    @antuanvazquez_gitlab this is a known issue (cortexlabs/cortex#1316), and we are planning to look into it before our next release. In the meantime, is it a possibility to use the same credentials you used to create the cluster? If not, there may be an aws and/or kubectl command that can be run to grant access to the other IAM user, but we haven't looked into that yet (we can take a look today if that's a blocker for you, or feel free to let us know if you know how to grant access to the eks cluster)
    Antuan Vazquez
    @antuanvazquez_gitlab

    @deliahu We can get by for now by using a shared user. (Not ideal, but it will do) Another issue that we would love for you to look into is the following:

    We have Service Control Policies that prevent the use of the RunInstances API call unless the instance has certain tags that we use for cost tracking. The current method you have for adding tags to Cortex-deployed resources does NOT add the tags to the eksctl-cortex-nodegroup-ng-cortex-worker-on-demand launch template. It would be nice if the tags we specify in the cluster.yaml file were propagated to this template, and probably to every resource that generates an AWS charge, so that we can properly account for cost. For now I will try to manually add those tags to the launch template after the cluster is deployed so that we can reactivate our SCPs. Also, I found a bug with the current tagging mechanism: if the value of any key-value pair contains a space, cortex manages to apply the tags successfully to some resources (we verified that it added the tag to an S3 bucket correctly), but the cortex cluster deployment errors out claiming that the tags are not formatted correctly. I tried every YAML and JSON syntax I could think of; nothing worked.
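    For reference, the tagging in question is specified in the cluster configuration roughly like this (a sketch; the tag keys and values are illustrative, and as described above, a value containing a space currently fails validation):

```yaml
# cluster.yaml (sketch; tag keys/values are illustrative)
cluster_name: cortex-alita
region: us-east-1

tags:
  cost-center: ml-team      # applied successfully to some resources, e.g. the S3 bucket
  project: "Alita Serving"  # a value with a space currently errors out during cluster deployment
```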

    David Eliahu
    @deliahu
    @antuanvazquez_gitlab we will definitely look into these issues before our next release; I'll keep you posted!
    thanks for bringing this to our attention
    we actually just added validations to the tags to prevent users from requesting "illegal" values (cortexlabs/cortex#1355). But we only enforced compatibility with AWS in the validation, and AWS supports spaces, so we'll look into what needs to be done to support it with cortex
    regarding the tags not propagating to the EC2 instances, we'll check that out in the next day or two and keep you posted
    David Eliahu
    @deliahu

    @antuanvazquez_gitlab I just confirmed that in our next release (v0.20), the tags specified in your cluster configuration file will propagate to all EC2 instances that Cortex creates (this was addressed in cortexlabs/cortex#1345). We are hoping to release v0.20 in the next week or two. If you need this to work before then, let me know and we should be able to build a custom image for you.

    I have not yet explored using tags that contain spaces, but I'm hoping to do that before our v0.20 release as well, and will keep you posted.

    Anthony Becker
    @becker929

    Hi Cortex team. Thanks for creating this awesome tool. I have a question for you.

    We are working with Cortex and finding that our API is not autoscaling well. We're thinking that this could be because our particular application is utilizing more GPU than is standard.

    We want to continue using Cortex because it's so wonderfully streamlined, so we are thinking that we can manually override the autoscaling on the EKS cluster. We want to use Horizontal Pod Autoscaler so that we can use the custom metrics.

    Could this be as easy as disabling the Cluster Autoscaler and applying the HPA? Or will there be monsters in these seas? Thanks so much in advance.

    David Eliahu
    @deliahu

    @becker929 thanks for reaching out, and I'm glad you're enjoying using Cortex!

    Assuming you are running a recent version of Cortex, the autoscaling is triggered based on the number of in-flight requests, and not CPU or GPU utilization. We actually used to use the HPA, but even with custom metrics, we weren't satisfied with the level of control we had, so we ended up implementing our own autoscaling algorithm. Here is the documentation: https://docs.cortex.dev/deployments/realtime-api/autoscaling

    Have you tried changing the default autoscaling configuration? Do you mind sharing the behavior that is not working well for you, and how you plan on addressing that with custom metrics? I'd also be happy to jump on a call if you think that would be easier, feel free to email me at david@cortex.dev
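    For reference, the request-based autoscaling mentioned above is configured per API in cortex.yaml, roughly like this (a sketch; verify the exact field names against the linked autoscaling docs for your Cortex version):

```yaml
# cortex.yaml (sketch; check field names against the autoscaling docs)
- name: my-api
  predictor:
    type: python
    path: predictor.py
  autoscaling:
    min_replicas: 1
    max_replicas: 10
    target_replica_concurrency: 1  # desired number of in-flight requests per replica
```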

    Anthony Becker
    @becker929
    @deliahu Thanks for your quick response. It seems that you have had some extensive conversation with another team member here. I think we could benefit from a call. I will email you :)
    Naga Hemanth
    @hemanthsunny
    Hello Cortex team! Thanks a ton for this great tool.
    Vishal Bollu
    @vishalbollu
    @hemanthsunny I'm happy to hear that you are enjoying Cortex, thanks for reaching out!
    Antuan Vazquez
    @antuanvazquez_gitlab
    Hi all, is there a way to choose the VPC CIDR that gets assigned to the Cortex VPC?
    Vishal Bollu
    @vishalbollu
    @antuanvazquez_gitlab it isn't possible to configure the VPC CIDR at the moment. By default the VPC CIDR is 192.168.0.0/16. Can you explain your use case for wanting to configure it? Is it for setting up VPC peering or having a larger IP range?
    Antuan Vazquez
    @antuanvazquez_gitlab
    @vishalbollu we are setting up VPC peering. But the issue that we are running into is that this CIDR overlaps with our Site-to-Site VPN CIDR... We are still exploring what kind of issues this could bring up.
    balakrishna222111
    @balakrishna222111
    Anyone know how to do this?
    I have an AWS EKS cluster within the dev VPC
    now I have a few EC2 instances in the test-vpc
    so now I need to add the test-vpc instances to the existing cluster. Can we do that?
    Vishal Bollu
    @vishalbollu
    @antuanvazquez_gitlab Are you setting up VPC peering just between the Site-to-Site VPN CIDR and the Cortex VPC? Are there more VPCs involved than just the two?
    @balakrishna222111 Have you considered setting up VPC peering between the EKS cluster VPC and your test-vpc? You can find the VPC peering setup instructions here.
    balakrishna222111
    @balakrishna222111
    @vishalbollu
    VPC peering is done and it's working
    the only thing we need to do is add the test-vpc nodes to the cluster
    Antuan Vazquez
    @antuanvazquez_gitlab
    We have a complex networking scenario with multiple VPCs from multiple AWS accounts connected together. But we think we can solve the issue with relatively minimal effort by changing the CIDR of our site-to-site VPN. I'll reach back out if this doesn't solve the conflict. Thanks!
    sp-davidpichler
    @sp-davidpichler
    Will the local environment of cortex support testing batch endpoints?
    Robert Lucian Chiriac
    @RobertLucian
    @antuanvazquez_gitlab I see. That would be great if you could change the CIDR of your site-to-site VPN. If that doesn't work, then you may have a couple of alternatives, all of which would involve the use of a 3rd VPC:
    Antuan Vazquez
    @antuanvazquez_gitlab
    Thanks @RobertLucian. We'll look into those options as well!
    balakrishna222111
    @balakrishna222111
    @RobertLucian can you please check my issue?
    Vishal Bollu
    @vishalbollu
    @balakrishna222111 Apologies, I misunderstood your question. I don't think it is possible to add external nodes to the Cortex cluster. Can you please explain why you are trying to add external instances to the Cortex cluster?
    balakrishna222111
    @balakrishna222111
    @vishalbollu it's a requirement for one of our projects
    Vishal Bollu
    @vishalbollu

    @balakrishna222111 The short answer is that it isn't supported at the moment. This is a relatively new use case so it would be helpful if you can explain your use case in further detail.

    What are the advantages of being able to add external nodes to the Cortex cluster? Did you want to add other nodes because Cortex currently only supports one node type per cluster? If Cortex were to support adding external nodes, were you hoping to run Cortex APIs on them?

    Robert Lucian Chiriac
    @RobertLucian

    @sp-davidpichler it is not on our roadmap to support batch endpoints for local environments.

    On the other hand, by making a couple of modifications to your batch API project, you can test it as a RealtimeAPI instead:

    1. In your cortex.yaml, change kind from BatchAPI to RealtimeAPI.
    2. In your predictor.py script, make sure the job_spec and batch_id are not used.
    3. In your predictor.py script, modify your predict method to process the payload accordingly, using the same prediction engine (i.e. self.model or however you set it up in the constructor).
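    Step 1 above amounts to a one-line change in the API configuration, roughly (a sketch; the API name is illustrative):

```yaml
# cortex.yaml (sketch; names are illustrative)
- name: my-batch-api
  kind: RealtimeAPI  # was: BatchAPI
  predictor:
    type: python
    path: predictor.py
```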

    All the documentation for this can be found here:

    To streamline this process while testing, you can pass into predictor.config (in your cortex.yaml config) a development variable that is set to true or false depending on whether predictor.py should run as a batch API or a realtime API. That config is then passed into the predictor's constructor as config. Here's an example for the PythonPredictor class.
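    A minimal sketch of that pattern: a PythonPredictor whose constructor reads a development flag from predictor.config. All names here are illustrative, not taken from the Cortex docs verbatim.

```python
# Hypothetical sketch of the development-flag pattern described above.
# The "development" key and the process() helper are illustrative.

class PythonPredictor:
    def __init__(self, config):
        # config is the dictionary from predictor.config in cortex.yaml
        self.development = config.get("development", False)
        # set up the prediction engine here, e.g. self.model = ...

    def predict(self, payload):
        if self.development:
            # realtime testing path: payload is a single request body
            return self.process(payload)
        # batch path: payload is a list of items from the job
        return [self.process(item) for item in payload]

    def process(self, item):
        # placeholder for the actual prediction logic
        return {"input": item, "prediction": None}
```

    The same predictor.py can then be deployed either way by flipping the flag in predictor.config.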

    David Eliahu
    @deliahu
    @antuanvazquez_gitlab I wanted to follow up on your previous message regarding using spaces in AWS tags: we just merged cortexlabs/cortex#1374 which enables spaces (as well as the other special characters that AWS supports), so this will be in our next release
    Naga Hemanth
    @hemanthsunny
    I'm a beginner in Cortex & ML. It would be great if anyone could help me with this -
    I'm trying to implement multiple pre-trained models of different predictor types from a single RealtimeApi. Is it possible to use PythonPredictor & TensorflowPredictor in the same predictor.py file?
    Currently, I'm taking help from here
    Robert Lucian Chiriac
    @RobertLucian

    @hemanthsunny the answer is no. You can only have a single predictor type per API deployment. The reason is not technical, but architectural. If you need another predictor type (presumably for another model for a different ML framework), you can then just create another API. You don't necessarily have to have multiple cortex.yaml configs. You can have something like:

    # cortex.yaml
    
    - name: python-predictor-ila
      predictor:
        type: python
        path: predictor.py
        # ...
    
    - name: tensorflow-predictor-xhg
      predictor:
        type: tensorflow
        path: predictor.py
        model_path: s3://<model-bucket-path>
        # ...
    
    # ...

    Do you think that this is not clear enough in the documentation? We can change it slightly to address this if it proves to be unclear.

    Jinhyung Park
    @jinhyung
    Hello! :) I'm new to Cortex - one quick question... can we specify local path for model_path option?
    Jinhyung Park
    @jinhyung
    Mmm.. based on repo search results, model_path should always be an S3 path. It would be great if we could pass a local path to it, for example by using a 'local://' prefix.
    Vishal Bollu
    @vishalbollu
    @jinhyung the model_path can be a local path only if you are deploying the api to the local environment (the API will be running locally on the machine). If you are attempting to deploy an API to the Cortex cluster, the model_path must be an S3 path.
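    A sketch of what that could look like (the path and names are illustrative; this would only work when deploying to the local environment):

```yaml
# cortex.yaml (sketch; a local model_path only works in the local environment)
- name: my-api
  predictor:
    type: tensorflow
    path: predictor.py
    model_path: ./models/my_model  # for a cluster deployment this must be an s3:// path
```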
    Jinhyung Park
    @jinhyung
    oh?! can I pass a local path when I deploy locally?
    @vishalbollu could you please give me an example input for model_path of context.yaml?
    *cortex.yaml
    model_path: /mnt/models/a_model would work?