    impolitepanda
    @impolitepanda
    Let me know if that works for you as well.
    To use --packages with the API (as this is what you're using, if I remember correctly), you have to add it to your engine_parameters.
    EricB
    @Pooouf

    I added spark.jars.packages=org.postgresql:postgresql:42.2.20 to spark.conf
    That was the key part I was missing in order to get rid of the "No suitable driver" errors.

    Now, I need to let the ODP service access my DB. What is the IP address (range) from which the Spark job will try to reach the DB, so I can allow that address?
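A minimal sketch, assuming a Scala job, of what the JDBC read can look like once the PostgreSQL driver is resolved through spark.jars.packages; the host, database, table name, and environment variables below are placeholders, not values from this conversation.

```scala
// Sketch: reading a PostgreSQL table once
// spark.jars.packages=org.postgresql:postgresql:42.2.20 has been set
// in spark.conf (or engine_parameters). Host, database, table and
// credentials are placeholders.
import org.apache.spark.sql.SparkSession

object JdbcReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("jdbc-read-sketch").getOrCreate()

    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://db.example.com:5432/mydb") // placeholder host/db
      .option("driver", "org.postgresql.Driver")
      .option("dbtable", "public.my_table")                        // placeholder table
      .option("user", sys.env.getOrElse("DB_USER", ""))            // credentials from env vars
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .load()

    df.show(10)
    spark.stop()
  }
}
```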

    EricB
    @Pooouf
    Our DB server runs in a Hosted Private Cloud at OVH. I figured out that creating the Public Cloud project for ODP also created a new vRack, so ODP was in a different vRack from our database. I moved the project to the same vRack as the DB, but that is not enough.
    impolitepanda
    @impolitepanda
    Hello @Pooouf! Sorry, I missed your message last week. I think you will not be happy with the answer I can give you at this moment, though. Data Processing doesn't support vRack yet. We are working on it, but as we are talking about a "managed by OVH" service, interconnecting with customers' vRacks raises a lot of security questions we need to answer, so progress is quite slow on that front.
    As for the IP range, it's also something we were working on, but because the service is highly dynamic, it's not really a suitable solution. In addition, vRack interconnectivity requests from customers were far more numerous, so we decided to shift to that subject first, as it would solve most problems. We will keep you updated as soon as we can give you an ETA, but it's probably not coming in June or July.
    EricB
    @Pooouf
    Oh
    So I cannot reach the DB directly. I need to deploy an API to bridge data between ODP jobs and the DB, at least until the 4th quarter, correct?
    impolitepanda
    @impolitepanda
    Unfortunately, that's the only way to expose your data from your databases for Data Processing to use, yes. That, or you could duplicate your data in Swift/S3, for instance, but that's probably more of a pain for you than creating an API. Basically, any approach where your credentials can be stored inside your code (or a properties file) for your job, and that exposes an endpoint you feel is secure enough, will work. Sorry I can't give you a better answer than that yet :(
    EricB
    @Pooouf
    Ok. We'll try to extract data to files in the object storage, and retrieve results there too.
    CameronTodd1123
    @CameronTodd1123
    Hello, does anyone know the specs of the Spark clusters in terms of disk space for each node? I received an error about a node running out of disk and just wanted to know. I have to use .cache() a lot for my iterative algorithms, but I could switch it out to .checkpoint() to the S3 bucket and free up the nodes' disk.
    impolitepanda
    @impolitepanda
    Hello @CameronTodd1123! We just checked and this information is indeed missing from our capabilities and limitations documentation. We will fix that asap. To answer you, the current limit for local storage is 50 GB per executor. It clearly seems that's not enough for your needs. Do you have an estimate of what you would need?
    Also, the 50 GB is used for your job's local storage but also for log storage. With the log rotation we recently implemented (you can now find all logs from drivers and executors in your Swift odp-logs container), the amount of space taken by logs isn't that much at maximum, but it still needs to be accounted for.
    CameronTodd1123
    @CameronTodd1123
    Thanks for the fast response. I'm still tuning my job and working out how much I can afford to keep in memory vs offload to disk, so I don't have an exact number for required disk. I do have one step where the data size explodes far beyond my average memory requirement and so spills to disk. Anyway, thanks for the response; now I can change my code based on all the specs.
    impolitepanda
    @impolitepanda
    Ok! Just a bit of a warning regarding the use of checkpoints to S3. OVHcloud's current S3 offering is based on OpenStack Swift and is only eventually consistent, so it could create consistency problems in your jobs if you use it (trying to read a file immediately after writing it will probably fail). We are also working on integrating the next-gen S3 offering (based on OpenIO), but we can't give you an ETA on that yet. So checkpoints might be usable for you, depending on how and where you use them, but we cannot guarantee that they will work as you expect.
    To sum up, OVHcloud's S3 is good as a data source or to store the final result of your job, but for intermediary storage it will probably fail.
    CameronTodd1123
    @CameronTodd1123
    Ah yep! I had read this might be a problem. Thanks for bringing it to my attention again. A lot of the Spark graph algorithms, and my custom graph algorithm, require the use of checkpoints; this was a lot easier last time, when it was managed by Hadoop with HDFS. How would you recommend we manage this on the OVH setup, with eventually consistent S3 and/or local disk?
    impolitepanda
    @impolitepanda
    So, this is one of the problems we are having with Data Processing at the moment (and why we are working on the OpenIO integration). There is no good solution to support checkpoints/savepoints for now. If you have your own HDFS, you can try to use it with webhdfs, but we have never tried it (and performance will definitely drop a lot if it works).
    Also, just a correction on what I said before regarding local disk space: it's actually not 50 GB per executor, but 9 GiB per core per executor. So at maximum, as we support up to 16 cores per executor, you can have 144 GiB per executor. But if you run only 2 cores per executor, you will only have access to 18 GiB.
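A minimal sketch of the cache/checkpoint trade-off discussed above, assuming a Scala job with a placeholder iterative workload; localCheckpoint() is shown here as one way to truncate lineage using executor-local disk (9 GiB per core, e.g. 8 cores → 72 GiB) instead of the eventually consistent object storage. This is an illustration under those assumptions, not the platform's recommended pattern.

```scala
// Sketch of cache vs. checkpoint for an iterative job.
// Local disk per executor = 9 GiB x cores (8 cores -> 72 GiB, 16 -> 144 GiB).
// The dataset and update rule are placeholders.
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object IterativeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("iterative-sketch").getOrCreate()
    val sc = spark.sparkContext

    var ranks = sc.parallelize(1 to 1000000).map(i => (i, 1.0)) // placeholder data

    for (iter <- 1 to 20) {
      val updated = ranks.mapValues(_ * 0.85 + 0.15)
                         .persist(StorageLevel.MEMORY_AND_DISK)

      if (iter % 5 == 0) {
        // localCheckpoint() truncates the lineage using executor-local storage,
        // so nothing is written to the eventually consistent object storage
        // (trade-off: the checkpointed data is lost if an executor dies).
        updated.localCheckpoint()
      }

      updated.count()      // materialize before dropping the previous RDD
      ranks.unpersist()
      ranks = updated
    }

    spark.stop()
  }
}
```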
    CameronTodd1123
    @CameronTodd1123
    Ok. Thanks for your comments! Very helpful!
    CameronTodd1123
    @CameronTodd1123
    Has anyone else had problems where the number of executors shown in the Data Processing dashboard does not match the Spark dashboard? I'm currently running a large Spark job for which I requested 2 executors with 8 cores each, plus the driver node. But in the Spark dashboard, when reviewing my job under Executors, I only see the driver node and 1 executor with 8 cores, so I'm missing half the CPUs and memory. I'm submitting my job through the ovh-spark-submit command line.
    impolitepanda
    @impolitepanda
    @CameronTodd1123 could you send us your job ID by any chance? We've never heard about that and definitely need to investigate.
    CameronTodd1123
    @CameronTodd1123
    Sure, it's happened to me twice now. Here is my recent job id - 108711d3-4d05-458e-b39d-cad75dc5794e
    impolitepanda
    @impolitepanda
    Thanks. Can I ask you for the other job ID as well, in that case? If two jobs did that, there might be a pattern that would allow us to trace the issue's root cause faster.
    We already started investigating with our own tests but couldn't reproduce the issue yet. In the meantime, we will increase the number of tests to improve the odds of the issue appearing. We will keep you updated once the issue has been found and give you an ETA on a fix asap.
    CameronTodd1123
    @CameronTodd1123
    I can't remember the job ID, sorry; it was a few weeks ago and I thought it was a glitch at the time.
    impolitepanda
    @impolitepanda
    ok, no problem :) We will keep you updated as soon as we have an answer
    CameronTodd1123
    @CameronTodd1123
    Thanks!
    CameronTodd1123
    @CameronTodd1123
    Hi again, I have another live example of a difference in the resources requested. I'm currently running a job for which I requested 3 executors plus the driver. However, in the Spark UI I only see the driver and 2 executors. The currently running job ID is 10cf90b9-5bb7-4270-9af7-bef949993d6b.
    CameronTodd1123
    @CameronTodd1123
    Ah, ok, it appears the other executor stood up 10-15 minutes later... not sure if that's normal behaviour.
    David Morin
    @morind
    Hello @CameronTodd1123. Sorry, we had temporary issues with the connection to Swift/the network. Just one executor was impacted.
    We've found a fix to mitigate this kind of problem and recover faster.
    We have to do more tests and will push it to production if the tests are OK.
    We'll keep you updated as soon as it's done.
    David Morin
    @morind
    Hello @CameronTodd1123
    We found the root cause of the problem that was appearing randomly. In fact, it was located on one node only.
    The problem is fixed now. We did some tests to reproduce the problem.
    These tests, executed regularly, have been succeeding since the fix was deployed to production last Tuesday (Jul 06).
    In parallel, the feature I've mentioned in my last post to mitigate this kind of problem in the future is currently in validation.
    CameronTodd1123
    @CameronTodd1123
    Ok Thanks @morind .
    Regarding my problem above about writing intermediary results to S3 and it being eventually consistent: can we write to the local disk of the server, e.g. /tmp/? Also, can you add a way to configure the disk space of the Spark cluster we request?
    CameronTodd1123
    @CameronTodd1123
    Does anyone know if it's possible to read a file, e.g. a config file, from the same S3 bucket that your Spark code is submitted from, without the S3 credentials? It says here https://docs.ovh.com/ie/en/data-processing/object-storage-java/ "Everything in OVHcloud object storage container in which you uploaded your code, will be downloaded to the Data Processing cluster....". So if I have a main config file for my job in my code S3 bucket, how would I read this file in Java?
    Hugo Larcher
    @Hugoch
    Hello @CameronTodd1123, if your config file is alongside your code, you can read it directly from the local file system.
    CameronTodd1123
    @CameronTodd1123
    What would the path be? Say my code and config file are at "s3://code/sparkjob.jar" and "s3://code/sparkjob.json".
    Hugo Larcher
    @Hugoch
    that would be /opt/spark/workdir/sparkjob.json
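A minimal sketch of reading that file, written in Scala rather than Java for consistency with the other examples here; it assumes the config file was uploaded alongside the jar as sparkjob.json, per the example above, and that whatever JSON parsing happens afterwards is up to you.

```scala
// Sketch: the contents of the code container are downloaded next to the jar,
// so a file uploaded as sparkjob.json can be read from the driver's local
// working directory. Path and file name follow the example above.
import scala.io.Source

object ConfigReadSketch {
  def main(args: Array[String]): Unit = {
    val configPath = "/opt/spark/workdir/sparkjob.json"
    val source = Source.fromFile(configPath)
    try {
      val configJson = source.mkString
      println(configJson) // parse with your JSON library of choice
    } finally {
      source.close()
    }
  }
}
```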
    CameronTodd1123
    @CameronTodd1123
    Thanks, that worked.
    Hugo Larcher
    @Hugoch

    Thanks, that worked.

    nice!

    Eric B.
    @Pooouf_twitter
    Hi,
    Let's say I have a job and I would like that job to read/write my database. One way would be to open a VPN tunnel to the network where my database is; in other words, the ODP job needs to open a VPN connection.
    It probably requires that the driver environment embeds a VPN client, and that the Spark driver has enough rights to start/use the VPN client.
    Is this possible? Or is there some other way to achieve a VPN connection from the Spark nodes?
    Hugo Larcher
    @Hugoch
    Hello @Pooouf_twitter, for now you cannot reach a VPN from a Data Processing job. We have plans to make Data Processing jobs able to reach your vRack network. When that becomes possible, you will be able to reach resources inside your vRack; you would then need to set up a gateway between your vRack and your VPN. So for now you need to allow access to your DB from the Data Processing public IPs.
    Eric B.
    @Pooouf_twitter
    Ok. Do you have an ETA for the vRack?
    Hugo Larcher
    @Hugoch
    Unfortunately we don't have an ETA to share yet
    Eric B.
    @Pooouf_twitter
    :/
    I guess it means: don't hold your breath
    thanks anyway
    Hugo Larcher
    @Hugoch
    Exactly. Don't wait; deploy your job using public routing if possible for this particular project.
    rest-solution
    @rest-solution:matrix.org
    [m]
    Hello, I have a spark.stop() at the end of my code but my Spark (Scala) jobs don't finish; the status remains InProgress.
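For context, a minimal sketch of the expected shutdown pattern; the explicit exit shown in the comment is a common workaround when non-daemon threads keep the driver JVM alive, and it does not rule out a platform-side cause like the one being investigated in the reply below.

```scala
// Sketch of the expected end-of-job pattern. If the JVM is kept alive by
// non-daemon threads (HTTP clients, thread pools, ...), an explicit exit
// after spark.stop() is a common workaround; it does not exclude a
// platform-side issue.
import org.apache.spark.sql.SparkSession

object ShutdownSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("shutdown-sketch").getOrCreate()
    try {
      spark.range(0, 1000).count() // placeholder workload
    } finally {
      spark.stop()
      // Workaround for lingering non-daemon threads, if needed:
      // sys.exit(0)
    }
  }
}
```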
    David Morin
    @morind
    Hi @rest-solution:matrix.org. You're right, it should work. We have to investigate the logs and will probably need a code sample based on your use case. We can switch to the private channel through the support ticket now.
    Could you please switch to Discord for future communications? We have moved from Gitter to Discord. Here is the link: https://discord.gg/VVvZg8NCQM
    Thanks
    See you on support about your ticket.