```python
context = pyarrow.default_serialization_context()
df_bytestring = context.serialize(df).to_buffer().to_pybytes()
return df_bytestring
```

```python
context = pyarrow.default_serialization_context()
res = requests.post(**inp)
context.deserialize(res.content)
```

```
OSError: buffer_index out of range.
```
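For reference, a minimal local round trip with the same (now-deprecated) pyarrow API looks like the sketch below. If this succeeds but the HTTP version fails, one common culprit is the transport re-encoding the bytes (e.g. wrapping them in a JSON payload); posting `df_bytestring` as the raw request body with `Content-Type: application/octet-stream` avoids that. The sample DataFrame is a stand-in for the real `df`:

```python
import pandas as pd
import pyarrow

# Stand-in for the real DataFrame.
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

context = pyarrow.default_serialization_context()

# Client side: DataFrame -> bytes.
df_bytestring = context.serialize(df).to_buffer().to_pybytes()

# Server side: bytes -> DataFrame. This round trip succeeds when the
# bytes arrive unmodified; if the transport alters or truncates them,
# deserialization fails with errors like the one above.
df_restored = context.deserialize(df_bytestring)
assert df.equals(df_restored)
```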
Hello, we were told here that any AWS IAM user with AdministratorAccess could manage a cluster (including taking it down), regardless of which user deployed the cluster. But this doesn't seem to be the case. See the following errors:
```
$ sudo cortex cluster info --config cluster.yaml
cloudformation stack name    status
syncing cluster configuration ...
error: You must be logged in to the server (Unauthorized)
```

```
$ sudo cortex cluster down --config cluster.yaml
your cluster named "cortex-alita" in us-east-1 will be spun down and all apis will be deleted, are you sure you want to continue? (y/n): y
￮ deleting api gateway ✓
￮ deleting dashboard ✓
￮ spinning down the cluster ...
error: You must be logged in to the server (Unauthorized)
[ℹ]  eksctl version 0.19.0
[ℹ]  using region us-east-1
[ℹ]  deleting EKS cluster "cortex-alita"
[ℹ]  deleted 0 Fargate profile(s)
[✔]  kubeconfig has been updated
[ℹ]  cleaning up LoadBalancer services
Error: cannot list Kubernetes Services: Unauthorized
```
There is a `kubectl` command that can be run to grant access to the other IAM user, but we haven't looked into it yet (we can take a look today if that's a blocker for you, or feel free to let us know if you know how to grant access to the EKS cluster).
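For reference, we believe something along these lines should work, though we haven't verified it; it adds the IAM user to the cluster's `aws-auth` ConfigMap, and the account ID and username below are placeholders:

```
eksctl create iamidentitymapping \
  --cluster cortex-alita \
  --region us-east-1 \
  --arn arn:aws:iam::<account-id>:user/<other-user> \
  --group system:masters \
  --username <other-user>
```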
@deliahu We can get by for now by using a shared user (not ideal, but it will do). Another issue that we would love for you to look into is the following:
We have Service Control Policies that prevent the use of the RunInstances API call unless the instance has certain tags that we use for cost tracking. The current method you have for adding tags to Cortex-deployed resources does NOT add the tags to the `eksctl-cortex-nodegroup-ng-cortex-worker-on-demand` launch template. It would be nice if the tags we specify in the cluster.yaml file were propagated to this template, and probably to every resource that generates an AWS charge, so that we can properly account for cost. For now I will try to manually add those tags to the launch template after the cluster is deployed so that we can reactivate our SCPs.

Also, I did find a bug with the current tagging mechanism. It appears that if the value of any key-value pair has a space, Cortex manages to apply the tags successfully to some resources (we verified that it added the tag to an S3 bucket correctly), but the cluster deployment errors out, claiming that the tags are not formatted correctly. I tried every YAML and JSON syntax variation I could think of; nothing worked.
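For concreteness, this is the kind of configuration that triggers it (hypothetical tag names and values, using the map syntax from the cluster configuration docs; note the space in the second value):

```yaml
# cluster.yaml (excerpt)
tags:
  cost-center: ml-inference
  team: data science   # a value containing a space fails cluster deployment
```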
@antuanvazquez_gitlab I just confirmed that in our next release (v0.20), the tags specified in your cluster configuration file will propagate to all EC2 instances that Cortex creates (this was addressed in cortexlabs/cortex#1345). We are hoping to release v0.20 in the next week or two. If you need this to work before then, let me know and we should be able to build a custom image for you.
I have not yet explored using tags that contain spaces, but I'm hoping to do that before our v0.20 release as well, and will keep you posted.
Hi Cortex team. Thanks for creating this awesome tool. I have a question for you.
We are working with Cortex and finding that our API is not autoscaling well. We're thinking that this could be because our particular application is utilizing more GPU than is standard.
We want to continue using Cortex because it's so wonderfully streamlined, so we are thinking that we can manually override the autoscaling on the EKS cluster. We want to use the Horizontal Pod Autoscaler so that we can use custom metrics.
Could this be as easy as disabling the Cluster Autoscaler and applying the HPA? Or will there be monsters in these seas? Thanks so much in advance.
@becker929 thanks for reaching out, and I'm glad you're enjoying using Cortex!
Assuming you are running a recent version of Cortex, the autoscaling is triggered based on the number of in-flight requests, and not CPU or GPU utilization. We actually used to use the HPA, but even with custom metrics, we weren't satisfied with the level of control we had, so we ended up implementing our own autoscaling algorithm. Here is the documentation: https://docs.cortex.dev/deployments/realtime-api/autoscaling
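As a starting point, the relevant knobs live under each API's `autoscaling` section in `cortex.yaml`; here is a sketch with illustrative values (field names as of recent versions; see the linked docs for the full list and the defaults):

```yaml
# cortex.yaml (excerpt)
- name: my-api            # hypothetical API name
  predictor:
    type: python
    path: predictor.py
  autoscaling:
    min_replicas: 1
    max_replicas: 10
    target_replica_concurrency: 1   # desired in-flight requests per replica
    max_replica_concurrency: 1024   # max queued + processing requests per replica
    window: 60s                     # how far back to average in-flight requests
```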
Have you tried changing the default autoscaling configuration? Do you mind sharing the behavior that is not working well for you, and how you plan on addressing that with custom metrics? I'd also be happy to jump on a call if you think that would be easier, feel free to email me at email@example.com
@balakrishna222111 The short answer is that it isn't supported at the moment. This is a relatively new use case, so it would be helpful if you could explain yours in further detail.
What are the advantages of being able to add external nodes to the Cortex cluster? Did you want to add other nodes because Cortex currently supports only one node type per cluster? If Cortex were to support adding external nodes, were you hoping to run Cortex APIs on them?
@sp-davidpichler it is not on our roadmap to support batch endpoints for local environments.
On the other hand, by making a couple of modifications to your batch API project, you can test it as a Realtime API instead:

- In your `predictor.py` script, make sure `batch_id` is not used.
- In your `predictor.py` script, modify your `predict` method to process the payload accordingly, using the same prediction engine (i.e. `self.model`, or however you set it up in the constructor).
All the documentation for this can be found here:
To streamline this process while testing, you can pass into `predictor.config` (in your `cortex.yaml` config) a `development` variable that is set to either `true` or `false`, depending on whether `predictor.py` should be run as a batch API or a realtime API. That config is then passed into the predictor's constructor as `config`. Here's an example of how that could look (the `load_model` helper and the branching logic below are illustrative):
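```yaml
# cortex.yaml (excerpt)
- name: my-api            # hypothetical API name
  predictor:
    type: python
    path: predictor.py
    config:
      development: true   # flip to false to exercise the batch-style code path
```

```python
# predictor.py
class PythonPredictor:
    def __init__(self, config):
        # `config` is the predictor.config section from cortex.yaml
        self.development = config.get("development", True)
        self.model = load_model()  # hypothetical helper; set up your prediction engine however you normally do

    def predict(self, payload):
        if self.development:
            # realtime-style: payload is a single sample
            return self.model.predict([payload])[0]
        # batch-style: payload is a list of samples
        return self.model.predict(payload)
```

Adjust the branching to match however your batch `predict` method consumes its payload.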
@hemanthsunny the answer is no. You can only have a single predictor type per API deployment. The reason is not technical but architectural. If you need another predictor type (presumably for another model from a different ML framework), you can just create another API. You don't necessarily have to have multiple `cortex.yaml` configs; you can have something like:
```yaml
# cortex.yaml
- name: python-predictor-ila
  predictor:
    type: python
    path: predictor.py
    # ...
- name: tensorflow-predictor-xhg
  predictor:
    type: tensorflow
    path: predictor.py
    model_path: s3://<model-bucket-path>
    # ...
# ...
```
Do you think that this is not clear enough in the documentation? We can change it slightly to address this if it proves to be unclear.