    hubert stefani
    @hstefani_gitlab
    (the thing is right there, under our noses, though!!)
    impolitepanda
    @impolitepanda
    As I was saying, I wasn't aware of it either, and I work on the product, so... shame on me ^^
    baaastijn
    @baaastijn
    yes, on my side I also "discovered that menu", because it sits quite far down the page
    on a 13'' screen it doesn't show up right away
    impolitepanda
    @impolitepanda
    Regarding your question about the CLI arguments, are the help/Readme enough for you? Don't hesitate to ask if I can clarify anything
    Pirionfr
    @Pirionfr
    @hstefani_gitlab if I understand your problem correctly, you have parameters of the form --<parameter_name>=<value>?
    hubert stefani
    @hstefani_gitlab
    yes, that's it
    no big deal
    Pirionfr
    @Pirionfr
    ok, we'll look into it
    Pirionfr
    @Pirionfr
    @hstefani_gitlab I have a workaround for you regarding your problem with arguments of the form --foo=bar
    you need to add -- before the file name
    example: ./ovh-spark-submit --class ovhcloud.com.test --driver-cores 1 --driver-memory 4G --executor-cores 1 --executor-memory 4G --num-executors 1 -- swift://bucket/my-jar-file.jar --foo=bar
    hubert stefani
    @hstefani_gitlab
    noted, thanks (in the meantime I had developed my own workaround as well).
    for the ovh-spark-submit client, what is the equivalent of spark-submit's --jars option (for example, to add a JDBC driver)?
    Pirionfr
    @Pirionfr
    that option is not available at the moment
    Pirionfr
    @Pirionfr
    we have a workaround that we haven't tested yet
    the idea is to use the Spark job's properties file
    and to add spark.jars=jars/your_jar1.jar,...
    and you upload the libs into the jars directory of your container
    we are going to test it
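    A minimal sketch of this workaround (file and directory names are illustrative, not confirmed in the thread): the properties file would list the extra jars relative to the container root,
        spark.jars=jars/your_jar1.jar,jars/your_jar2.jar
    with the container laid out as:
        my_container/
            my-jar-file.jar
            my-job.properties
            jars/
                your_jar1.jar
                your_jar2.jar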
    hubert stefani
    @hstefani_gitlab
    where do you specify the job's properties file?
    Pirionfr
    @Pirionfr
    the file must be at the same level as the jar
    and must be loaded in the code
    we are testing at the moment
    we will get back to you as soon as we have the complete solution
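    A sketch of what "loaded in the code" could look like, assuming the properties file ends up in the working directory next to the jar (untested, as the team says above; the file name is hypothetical):
        import java.io.FileInputStream
        import java.util.Properties
        import org.apache.spark.sql.SparkSession

        // Read the job properties file shipped next to the jar (hypothetical name)
        val props = new Properties()
        props.load(new FileInputStream("my-job.properties"))

        // Apply every entry (e.g. spark.jars=...) before the session is created
        val builder = SparkSession.builder()
        props.stringPropertyNames().forEach(k => builder.config(k, props.getProperty(k)))
        val spark = builder.getOrCreate()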
    hubert stefani
    @hstefani_gitlab
    perfect.
    Pirionfr
    @Pirionfr
    another solution is to build a fat jar
    there is also the SparkContext.addJar(...) method, but it hasn't been tested
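    An untested sketch of that second option (the jar path is hypothetical, e.g. a JDBC driver uploaded to the container):
        import org.apache.spark.sql.SparkSession

        val spark = SparkSession.builder().getOrCreate()
        // Register an extra jar with the running context; untested with ovh-spark-submit
        spark.sparkContext.addJar("jars/postgresql-42.2.12.jar")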
    Mojtaba Imani
    @mojtabaimani
    Hi, OVHcloud Data Processing (ODP) is on production now. You can check out my introduction blog post here: https://www.ovh.com/blog/try-the-new-ovhcloud-data-processing-service/
    Miguel Ângelo
    @xmovu_twitter
    Hey there. Anyone from OVH here? Specially, OVH US. Quite urgent. 30 minutes for a server to be shutdown permanently by OVH's abuse team. Plenty of customer data there. We're just a webhosting company from UK with all abuse reports handled, can't find any help, tried to reach everyone everywhere.
    Pirionfr
    @Pirionfr
    hi @xmovu_twitter, this channel is a tech channel for the Data Processing project. You can open a ticket for your problem in the manager: https://us.ovhcloud.com/manager/dedicated/#/ticket
    clemencedubuc
    @clemencedubuc
    Hi !
    I have a problem with my Spark job on OVH. I tried to submit a job with "ovh-spark-submit" but I get the error "Unable to submit job: Error 404: This service does not exist".
    thank you for your help
    I launch: ./ovh-spark-submit --projectid 1234 --driver-cores 1 --driver-memory 4G --executor-cores 1 --executor-memory 4G --num-executors 1 swift://my_container/my_job.py
    Kureeru
    @Kureeru
    Hello! Can you check that you have met all the requirements from the documentation, please?
    clemencedubuc
    @clemencedubuc
    Hi! Yes, I already checked, and it's okay, I found my mistake. Now I want to use Kafka with my application. How can I import the package? It is possible to use --packages with spark-submit, but it does not work with ovh-spark-submit
    David Morin
    @morind
    Hello! Yes, --packages is not yet implemented in ovh-spark-submit. With the current version, to use Kafka you can create a fat jar that includes your Kafka dependencies.
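    A sketch of how such a fat jar could be built with sbt-assembly, for the Java/Scala case (versions and names are illustrative; the thread does not specify a build tool):
        // build.sbt (sketch)
        name := "my-spark-job"
        scalaVersion := "2.12.10"

        libraryDependencies ++= Seq(
          // Spark itself is provided by the platform, so keep it out of the fat jar
          "org.apache.spark" %% "spark-sql" % "2.4.3" % "provided",
          // the Kafka integration must be bundled, since --packages is not available
          "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.4.3"
        )
    With addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10") in project/plugins.sbt, running sbt assembly then produces the jar to upload to the container.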
    clemencedubuc
    @clemencedubuc
    Hi! Thank you for your answer. I'm just wondering: my code is in Python, can I still use a fat jar?
    David Morin
    @morind
    Sorry, my bad. I thought it was in Java. So, we will implement --packages and --repositories. We are working on it and will get back to you as soon as these options are available.
    clemencedubuc
    @clemencedubuc
    Ok thank you for your help
    clemencedubuc
    @clemencedubuc
    Hi! I have another question. Where should I put my truststore and my keystore for Kafka? In the jar or in the container? Because I tried the container but it does not work...
    David Morin
    @morind
    Hello! Yes, you're right. The content of the container is available on all workers, so you can put the truststore and keystore in it.
    Question: which values have you defined for ssl.truststore.location and ssl.keystore.location in kafkaParams?
    https://spark.apache.org/docs/2.4.3/streaming-kafka-0-10-integration.html#ssl--tls
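    Following that guide, a sketch of the SSL entries (broker address, passwords and store file names are placeholders; the locations must match where the files land on the executors, e.g. the container root):
        import org.apache.kafka.common.serialization.StringDeserializer

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "my-broker:9093",           // placeholder
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "my-consumer-group",                 // placeholder
          "security.protocol" -> "SSL",
          "ssl.truststore.location" -> "truststore.jks",     // shipped in the container
          "ssl.truststore.password" -> "trustStorePassword", // placeholder
          "ssl.keystore.location" -> "keystore.jks",         // shipped in the container
          "ssl.keystore.password" -> "keyStorePassword"      // placeholder
        )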
    clemencedubuc
    @clemencedubuc
    Hi David! Sorry I did not answer! Yes, the problem was the name. I have been facing a new difficulty for a week and I don't know why. I can reach Kafka with Spark on my local machine and it works well, but when I use OVH Data Processing with Java and a jar, Kafka seems to block at "discover group coordinator" and I get errors like "broker may not be available". I don't know where the problem is. Do you think it comes from my dependencies or my Kafka?
    Or maybe it is my Kafka properties. I have no idea, so I'm asking you; maybe you know this problem. Thank you
    David Morin
    @morind
    Hi @clemencedubuc! Thanks for your feedback. About your current issue concerning the connection to Kafka, I have a few questions, because the error message "discover group coordinator.." is quite generic in Kafka:
    • Which version of Kafka are you using? broker side and consumer side (Spark)
    • Which port number is your Kafka listening on? (9092, 9093..)
    • Do you have a reference to ZooKeeper in your Kafka properties (consumer side)?
    clemencedubuc
    @clemencedubuc
    Hi! I am using Kafka 2.5.0 on both sides. I am listening on ports 9092, 9093 and 9094. I have no reference to ZooKeeper in my Kafka properties on the consumer side.
    clemencedubuc
    @clemencedubuc
    Hi everyone, I have another question. When I submit multiple jobs in parallel, the execution slows down a lot. To avoid that, Spark would have to run in cluster mode, but I don't know whether OVH's Spark can work like that. Thank you for your help.
    David Morin
    @morind
    Hi @clemencedubuc! Yes, we use client mode to follow the progress of the Spark job and interact with it. The driver is located in the same cluster as the executors, so everything runs on the same network, without the latency problem you can get with a local driver and remote executors.
    Concerning the way executors retrieve your app: with OVHcloud Data Processing these files are downloaded locally onto each executor and then referenced from there, so no further download is required.
    Latency and file downloads are the main drawbacks of client mode, but we have mitigated their impact.
    Do you have any idea what the bottleneck is, or where it is located?
    Is it the bootstrap of your job (the job is slow to start)? Or at the end of the job, during the retrieval of data and executor status by the driver?
    I agree that this is not always easy to investigate, but perhaps you can find some information in the logs, metrics (Grafana) or the Spark Web UI.
    The parallelism defined for your job could also be helpful: number of executors, memory per executor, ..?
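    For reference, those parallelism knobs are the flags already shown earlier in the thread; the values below are only illustrative:
        ./ovh-spark-submit --projectid 1234 --driver-cores 1 --driver-memory 4G --executor-cores 2 --executor-memory 8G --num-executors 4 swift://my_container/my_job.py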