    baaastijn
    @baaastijn
    GitHub project for DeOldify : https://github.com/jantic/DeOldify
    Nicocg70
    @NicoCG70_twitter
    You could say that before even if it was an experimental platform :\
    baaastijn
    @baaastijn

    Hello there !

    We are really glad to announce the launch of OVHcloud AI Training as a paid beta, fully integrated into our Public Cloud OpenStack ecosystem.

    You may know that data-related projects are booming, and we want to provide you with all the tools required to build your own dream data and AI platform.

    Over the past 2 years we have added new services (Apache Hadoop big data clusters, Data Processing powered by Apache Spark, ML Serving to deploy models in production, …).

    Now, with AI Training, we simplify the life of data scientists and data engineers for neural network training on GPUs.

    You just have to push your code in a Docker image and your data to Object Storage, and the platform takes care of the rest (data sync, workload deployment, user management, ...).

    If you don't know Docker, it doesn't matter: you can use our prebuilt images with your favorite framework (HuggingFace, PyTorch, TensorFlow, MXNet, Fast.ai, ...) with JupyterLab notebooks and VS Code.

    All of that with, as always, simple and attractive pricing: 1.75€ /hour /GPU NVIDIA V100s (more flavors to come…)

    What does it solve? Lots of things! Easier scaling, orchestration, no more infrastructure to manage, cost control, … so you can focus on code 😊

    Don’t hesitate to try it out and to share feedback or questions! It’s in your Public Cloud control panel (AI Training in the left menu) or via the CLI.

    Product page : https://www.ovhcloud.com/en/public-cloud/ai-training/
    Documentation : https://docs.ovh.com/gb/en/ai-training/
    Roadmap : https://github.com/ovh/public-cloud-roadmap/projects/2

    Have a good day !

    Sriharsh Bhyravajjula
    @darthbhyrava

    Hello!

    My company is already using OVH servers on the cloud, and we're really interested in ML Serving, too. Could someone let me know what GPUs the ml1-*-standard nodes come with?

    baaastijn
    @baaastijn
    hello @darthbhyrava sorry for the delay
    we do not provide GPU on ML serving so far.
    we are refactoring the backend to provide exact same flavors as you can find in AI Training
    in the near future the GPU will be « NVIDIA V100s » for serving :)
    PS : if you have a bit of time, you may try this alternative : build a docker image with your model and an API inside, with Flask for example, and deploy it via AI Training. Example tutorial : https://github.com/christophe-rannou/sample_vision_api_yolov4
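A minimal sketch of the "model plus API in a Docker image" idea, for illustration only. It swaps Flask for Python's standard-library `http.server` so it runs without extra dependencies. `predict`, `PredictHandler`, `serve` and port 8080 are assumptions made for this example, not taken from the tutorial linked above, and `predict` is only a stand-in for a real model call.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List


def predict(values: List[float]) -> dict:
    """Placeholder for a real model: returns the mean of the inputs."""
    mean = sum(values) / len(values) if values else 0.0
    return {"prediction": mean}


class PredictHandler(BaseHTTPRequestHandler):
    """Answers POST requests carrying a JSON body such as {"values": [1, 2, 3]}."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(predict(payload.get("values", []))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Blocks forever; port 8080 is an assumption, not a documented default."""
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()
```

Calling `serve()` as the container's entry point would expose the endpoint; a real image would load the trained model once at startup instead of using the placeholder.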
    Bertrand BENOIT
    @bertrand-benoit_gitlab
    Hi everyone, I think there is a bug in the ovh Python package => when performing a PUT request with no data (for instance for ai/job/.../kill), the ovh Python package translates **kwargs as {}, which leads to a body being added to the request.
    And OVH API answers: ovh.exceptions.BadParametersError: You provided an input body while none was expected
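The behaviour described above can be illustrated with a small sketch. It is not the ovh package's actual code: both function names are hypothetical and only show how serialising empty keyword arguments produces a `{}` body, and how a guard avoids it.

```python
import json
from typing import Any, Optional


def encode_body_naive(**kwargs: Any) -> Optional[str]:
    """Always serialises kwargs, so a call without arguments still yields '{}'."""
    return json.dumps(kwargs)


def encode_body_guarded(**kwargs: Any) -> Optional[str]:
    """Returns None (no request body) when the caller passed no parameters."""
    return json.dumps(kwargs) if kwargs else None


if __name__ == "__main__":
    print(repr(encode_body_naive()))    # '{}'
    print(repr(encode_body_guarded()))  # None
```

With the guarded variant, a PUT without data would be sent with no body at all, which is what the API error message asks for.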
    Bertrand BENOIT
    @bertrand-benoit_gitlab
    Hello, 1/ is it possible to get read access to the source code of the ovhai CLI?
    2/ which OVH API call allows creating a new application token (equivalent to ovhai token create token_test --role operator -l customer=xxx) ?
    Thanks.
    Guillaume Salou
    @jagwar
    Hello,
    1/ Not open source (yet, we will schedule it)
    2/ Will be in APIv6 after the next prod (/token) (in 2 weeks).
    Don't hesitate if you have another question :)
    Bertrand BENOIT
    @bertrand-benoit_gitlab
    Hello, thanks for your answer.
    Indeed, I have some more questions :
    1/ Is it possible to create block storage via API
    2/ and then to attach/mount it on AI Training instance (in addition to some Object storage ones)
    3/ do you have any advice to allow multi-user usage on a single AI Training instance ?
    Thanks
    Guillaume Salou
    @jagwar
    1) Yes it is. Have a look at APIV6 or Openstack APIs.
    2) No, but we can discuss on discord about your need if you want: https://discord.gg/uZpsNAK5aU
    3) Yes, you have two ways to do it: you can create OpenStack users (AI Training reader role) or you can create a token with the CLI.
    Antoine ROLLET
    @antoinerollet_gitlab
    AI training has been down for a few hours. I get an "internal error" when trying to connect to jobs' urls, although I can start/stop jobs. Any update on this?
    Christophe Rannou
    @ChrisRannou_twitter
    Hey @antoinerollet_gitlab, it's a cookie issue, it's fixed, you'll need to relogin
    Antoine ROLLET
    @antoinerollet_gitlab
    @ChrisRannou_twitter Thanks a lot, salutations
    Bertrand BENOIT
    @bertrand-benoit_gitlab
    Hello everyone, could you confirm that we can create API keys (https://eu.api.ovh.com/createToken/) only with the main account credentials? I can't find a way to use a sub-user's credentials.
    Bertrand BENOIT
    @bertrand-benoit_gitlab
    And 2. is it possible to clean/destroy or hide old jobs to get a clean dashboard? Many thanks.
    Adrien Carreira
    @XciD
    Hello @bertrand-benoit_gitlab, yes, only with the main account. If the purpose is to interact with AI Solutions, you can create a token through our API directly.
    As for cleaning up old jobs: not for now. We will consider this in the future.
    Bertrand BENOIT
    @bertrand-benoit_gitlab

    Hello @XciD thanks for the answer.

    Is it possible to get jobs filtered according to some fields (I only see status and updatedAfter atm)? For instance, we want to create an AI Training instance for a specific user; is it possible to list only the jobs of a specific user, to control the footprint of the result (instead of listing 100% of the jobs and filtering on our side)?

    Is it possible to renew the AI Training instance timeout? Or is there any possibility to postpone the shutdown?

    Antoine ROLLET
    @antoinerollet_gitlab
    Also, is it possible to relaunch a job but switch between GPUs and CPUs?
    Bertrand BENOIT
    @bertrand-benoit_gitlab

    Hello, some other questions:
    1/ we tried to mount the same 'Object Storage' (for the same user, RW permission, and same prefix) on several AI Training instances, but the files/data created on them do not seem to be updated in real time.
    During our first tests, after the end/death of the AI Training instances, there was nothing on the share; but it seems OK now (files are available after creating a new AI Training instance with the same share).

    2/ we tried it simultaneously on 2 running AI Training instances, and nothing is visible on the other side (file1 created on instance1 is not shown on instance2, and vice versa); maybe another way to see the same issue.
    => after creating new AI Training instances with the same share, we can see the files created from both previous instances.

    How could we quickly fix that please (to get a real-time workspace/share space)?

    Bertrand BENOIT
    @bertrand-benoit_gitlab
    @antoinerollet_gitlab (to be confirmed by OVH members), from what I have experienced, 'relaunching' a job in reality starts a new job with the same specs as the original one; so I'm not sure you can "switch between GPUs and CPUs"
    Christophe Rannou
    @ChrisRannou_twitter

    Hello @bertrand-benoit_gitlab, right now it is indeed only possible to filter jobs based on status and update date. We will add the ability to filter based on users in our backlog.
    It is not possible at the moment to update the timeout of a job, the feature is planned but was not prioritised so far, we'll move it up.
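Until server-side filtering by user exists, the filtering has to happen on the client. A minimal sketch, assuming the jobs have already been fetched as a list of dicts; the `user` and `status` field names and the sample data are hypothetical, not the actual API schema.

```python
from typing import Iterable, List, Mapping, Optional


def jobs_for_user(
    jobs: Iterable[Mapping],
    user: str,
    status: Optional[str] = None,
) -> List[Mapping]:
    """Keeps the jobs owned by `user`, optionally restricted to one status."""
    selected = [job for job in jobs if job.get("user") == user]
    if status is not None:
        selected = [job for job in selected if job.get("status") == status]
    return selected


# Hypothetical sample data standing in for a full job listing.
sample_jobs = [
    {"id": "job-1", "user": "alice", "status": "RUNNING"},
    {"id": "job-2", "user": "bob", "status": "DONE"},
    {"id": "job-3", "user": "alice", "status": "DONE"},
]
```

For example, `jobs_for_user(sample_jobs, "alice", status="DONE")` keeps only `job-3`.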

    About the sharing: when mounting an Object Storage container on a job, you need to add the cache option so that every job shares the same mount. If the cache option is not specified, each job actually mounts an independent copy of the same Object Storage container.

    Christophe Rannou
    @ChrisRannou_twitter
    @antoinerollet_gitlab about relaunching, @bertrand-benoit_gitlab is exactly right: it is in fact a resubmit of the exact same job spec, and you cannot update it in the manager. However, if you use the ovhai CLI you can do so by resubmitting from an existing job and overriding the resources:
    ovhai job run --from <job-id> --cpu 2
    Bertrand BENOIT
    @bertrand-benoit_gitlab
    Thanks for your answers.
    Eric B.
    @Pooouf_twitter

    Hi,
    I'm testing the Notebook AI beta. I'm not sure if this is the right place to ask questions about this service. Please tell me if there is a more appropriate chat.

    Question: I'm trying to connect to the Jupyter Notebook with my credentials.
    The doc states:

    Any person who is in the same Public Cloud project as you will also have access to this notebook using their own credentials

    My NIC-handle is allowed in the Public Cloud / Project Management / Contacts and Rights section with Read/Write access. Even so, the Jupyter sign-in fails.

    Invalid user or password

    If I create a user in Public Cloud / Project Management / Users & Roles (with AI Training Operator and Object Storage operator), I can sign in onto the Jupyter Notebook.

    What am I doing wrong?

    Eric B.
    @Pooouf_twitter
    Ok, I have been invited to the Discord channel of AI Notebooks. I'll ask there too
    Eric B.
    @Pooouf_twitter
    FYI, the answer is that it's based on Public Cloud users (OpenStack users), not your OVH account
    Cev
    @Hillan_gitlab
    Hello ! I belong to a french tech company and we are testing GPU Cloud solutions ; we would really like to go with OVH but for the ML Serving service the fact that it is CPU only and not Tensorflow 2 compatible is a show stopper. Do you have any ETA for those ? Keep up the good work by the way !
    Cev
    @Hillan_gitlab
    @jagwar @XciD any help ?
    Christophe Rannou
    @ChrisRannou_twitter
    Hello @Hillan_gitlab. The ML Serving part is currently being refactored. The first version of this refactor should be available by the end of the year. You can follow our public roadmap here: https://github.com/ovh/public-cloud-roadmap/projects/2 (new announcements should be published soon).
    In the meantime it is possible to use AI Training to deploy your API while leveraging GPUs. You can find an example of how to do so here: https://github.com/christophe-rannou/sample_vision_api_yolov4 and I can provide additional help if needed.
    Do not hesitate to join us on the Discord server: https://discord.com/invite/vXVurFfwe9 (on which we are more active lately)
    Selimonder
    @Selimonder

    Hello,

    I am trying to use OVH AI notebook. It seems great; but I am not able to install any pip package because of permissions :( We are also unable to feed a custom image for AI notebook. Is there a workaround?

    Guillaume Salou
    @jagwar
    Hello @Selimonder , it's weird because it's authorized, which framework did you choose ?
    Selimonder
    @Selimonder
    This issue has been addressed (all-in-one image), thanks! (sorry just received the notification)
    sushi163
    @sushi163:matrix.org
    Hello everyone! I launched a notebook a few days ago and I was able to access it this morning. However, now I'm not, the page is taking forever to load. Any idea why this could be so? Thanks!
    sushi163
    @sushi163:matrix.org
    UPDATE : Now page is opening but I get this error.
    Guillaume Salou
    @jagwar
    Hello @sushi163:matrix.org we had an issue with our datastore backend today. The issue is fixed now but in the meantime, we have stopped all the jobs and notebooks.
    Guillaume Salou
    @jagwar
    If you were using ai-notebook, you just have to start it, your data has been automatically saved.
    sushi163
    @sushi163:matrix.org
    Yes, now it works fine!! Thanks!
    Yûki Vachot
    @NyxiumYuuki
    Hi, is it possible to add a collaborative flag for jupyterlab to enable Real Time Collaboration ? Seems like the flag isn't supported yet ?
    Antoine ROLLET
    @antoinerollet_gitlab
    We're experiencing issues with Notebooks since yesterday. When starting one, it gets stuck during data pulling. It stays stuck at 'Completed: 0' and 'Transfered: 0 B'. After a while the notebook's state switches to 'Failed'. Are you aware of this issue?
    Guillaume Salou
    @jagwar
    Hello, Antoine. Sorry for the outage. We have identified a bug. We have to fix it. We will keep you informed when it's done.
    Antoine ROLLET
    @antoinerollet_gitlab
    Thanks Guillaume. Good luck :-)
    Guillaume Salou
    @jagwar
    Hey @antoinerollet_gitlab it should be good now! Raise your hand if it's not
    baaastijn
    @baaastijn
    Hello @/all , following our customer request we are moving from Gitter to Discord for all AI discussions :) please use https://discord.com/invite/vXVurFfwe9
    Beyond AI, you'll also be able to discuss OVHcloud Databases, Data Processing, Object Storage, domain names, ...
    thank you !