    Ivan Ermilov
    @earthquakesan
    To have several apps run in parallel, you will need to use YARN.
    I did not understand your question regarding streaming, can you elaborate?
    waterponey
    @waterponey
    I was wondering how it would work if you wanted to have several concurrent streaming applications. One possibility would be to statically declare one Spark cluster per app and define the resources that Swarm would allocate to each Spark cluster at startup. The other would be to define one specific "streaming Spark cluster" in Swarm and let the Spark resource manager do the allocation for each streaming app.
    Ivan Ermilov
    @earthquakesan
    you will still need to specify the amount of resources per executor for your apps, even when you use a resource manager
    waterponey
    @waterponey
    I'm not sure which level of resource manager you're talking about, Swarm or Spark?
    Ivan Ermilov
    @earthquakesan
    do you mean docker swarm?
    waterponey
    @waterponey
    yes
    I mean you deploy a docker container containing a Spark cluster on a Swarm cluster, or am I missing something?
    Ivan Ermilov
    @earthquakesan
    The proper setup would be as follows:
    1. Deploy a Spark cluster with YARN/Mesos, restrict CPU/memory usage in Swarm, and configure the resource managers to respect that limit (unfortunately you need to do it manually or with Ansible). Let's say you give it 64 cores and 256G of RAM.
    2. When deploying Spark apps inside your cluster, restrict CPU and memory usage per application (e.g. number of executors, cores per executor, memory per executor). If you want to run 2 applications, then set it up so that each app consumes 32 cores and 128G of RAM (see the spark-submit sketch below).
    that's correct, I am clarifying that you are not using swarm as a general term for a cluster %)
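    A minimal sketch of what step 2 looks like in practice, assuming spark-submit on YARN (the class name, jar and the exact executor split are placeholders, not from the repo):
        # Submit one of the two apps, capping it at 32 cores and 128G of RAM in total.
        # 4 executors x 8 cores x 32G each is only one illustrative way to split that.
        # com.example.StreamingApp and app.jar are hypothetical placeholders.
        spark-submit \
          --master yarn \
          --deploy-mode cluster \
          --num-executors 4 \
          --executor-cores 8 \
          --executor-memory 32G \
          --class com.example.StreamingApp \
          app.jar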
    waterponey
    @waterponey
    ok so now I'm confused, why would I need to keep Docker Swarm if I have to deploy YARN or Mesos?
    Ivan Ermilov
    @earthquakesan
    you deploy YARN in swarm as well
    that's just another container
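    For example, a rough sketch with docker service create, assuming a bde2020 ResourceManager image and an attachable overlay network named spark-net (both are assumptions, not a tested setup):
        # Run the YARN ResourceManager as one more service in the swarm,
        # capped at the resources the Spark/YARN cluster is allowed to use.
        # Image tag, network name and limits are assumptions; adjust to your setup.
        docker service create \
          --name resourcemanager \
          --network spark-net \
          --limit-cpu 64 \
          --limit-memory 256G \
          bde2020/hadoop-resourcemanager:2.0.0-hadoop2.7.4-java8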
    Stephen Baynham
    @CannibalVox
    Hey I'm sorry, I'm trying to run https://github.com/big-data-europe/docker-hive and the data nodes are failing with:
        Datanode denied communication with namenode because hostname cannot be resolved (ip=10.0.0.10, hostname=10.0.0.10): DatanodeRegistration(0.0.0.0:50010, datanodeUuid=d64d014a-4467-4065-95e6-596590148f75, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-ab7b488d-d3c3-470d-aea0-c6e6ac6708b9;nsid=2064164277;c=0)
    If you have any clarity I'd appreciate it!
    Ivan Ermilov
    @earthquakesan

    Hi Stephen! @CannibalVox

    Which docker-compose are you using? From master branch?

    SasidharT
    @SasidharT
    Hi guys, has anyone deployed a Spark cluster using docker-compose.yml on ECS in AWS?
    Anton Kulaga
    @antonkulaga
    @earthquakesan is it possible for you to publish a Docker image with Spark 2.3.1? I have some minor dependency clashes with the 2.3.0 container when using SANSA-RDF dependencies
    Anton Kulaga
    @antonkulaga
    Any plans for Alluxio docker-swarm configs?
    comboo
    @wings-xue
    I can't open http://<dockerhadoop_IP_address>:8088/, and when I look at run.sh there is no run command for YARN?
    Peter Viskovics
    @jr.visko_gitlab
    Hi,
    when running docker pull bde2020/hive I face this problem:
    hive_1 | mkdir: Call From 57c424f72c21/172.22.0.2 to 57c424f72c21:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    hive_1 | mkdir: Call From 57c424f72c21/172.22.0.2 to 57c424f72c21:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    hive_1 | chmod: Call From 57c424f72c21/172.22.0.2 to 57c424f72c21:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    hive_1 | chmod: Call From 57c424f72c21/172.22.0.2 to 57c424f72c21:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    and beeline cannot connect to localhost:10000
    ➜ epam-qa-metrics git:(master) ✗ docker-compose exec hive bash
    root@57c424f72c21:/opt# /opt/hive/bin/beeline -u jdbc:hive2://localhost:10000
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    Connecting to jdbc:hive2://localhost:10000
    18/11/14 13:11:30 [main]: WARN jdbc.HiveConnection: Failed to connect to localhost:10000
    Could not open connection to the HS2 server. Please check the server URI and if the URI is correct, then ask the administrator to check the server status.
    Error: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10000: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)
    Beeline version 2.3.2 by Apache Hive
    beeline> CREATE TABLE pokes (foo INT, bar STRING);
    No current connection
    this is my docker version:
    ➜ ~ docker -v
    Docker version 18.06.1-ce, build e68fc7a
    can anyone help please?
    Peter Viskovics
    @jr.visko_gitlab
    The issue is resolved, but I think it may be worth mentioning on bde2020/hive that this image is just part of the whole project, which can be found at https://github.com/big-data-europe/docker-hive
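    For anyone hitting the same errors, a minimal sketch of bringing up the whole stack instead of the single image (the hive-server service name is assumed from the repo's docker-compose.yml):
        # Clone the full project and start all of the services the image depends on,
        # rather than running bde2020/hive on its own.
        git clone https://github.com/big-data-europe/docker-hive
        cd docker-hive
        docker-compose up -d
        # Then beeline should reach HiveServer2; "hive-server" is the service name
        # assumed from the repo's compose file.
        docker-compose exec hive-server bash -c \
          "/opt/hive/bin/beeline -u jdbc:hive2://localhost:10000"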
    Andrii Gakhov
    @gakhov
    Hi guys! For anyone interested in learning space-efficient data structures and fast algorithms that are extremely useful in modern Big Data applications, take a look at my recently published book "Probabilistic Data Structures and Algorithms for Big Data Applications" (ISBN: 978-3748190486). In this book, you can find algorithms and data structures for Membership querying (Bloom filter, Counting Bloom filter, Quotient filter, Cuckoo filter), Cardinality (Linear counting, probabilistic counting, LogLog, HyperLogLog, HyperLogLog++), Frequency (Majority algorithm, Frequent, Count Sketch, Count-Min Sketch), Rank (Random sampling, q-digest, t-digest), and Similarity (LSH, MinHash, SimHash). Check it out on Amazon or on the book's webpage.
    Zhoodar
    @zhoodar
    Hello there, is anyone familiar with this issue: big-data-europe/docker-hadoop#38
    It appeared when I tried to write data into a remote HDFS.
    purbanow
    @purbanow
    Hi guys, when running a job with a 1 Spark worker + Hadoop setup everything goes well, but when I'm trying to run with 2 workers I'm getting: JvmPauseMonitor: Detected pause in JVM or host machine (eg GC)
    any ideas?
    Juan Santillana
    @ratasxy_twitter
    Hi, I have a question about big-data-europe/hbase-docker regarding the use of port 9090
    Should I just expose the port?
    Xining Li
    @xiningli
    Hello, I am new here.
    Diego Quintana
    @diegoquintanav
    Should I just expose the port?
    to connect a client to hdfs from the docker host, yes
    @ratasxy_twitter ^
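    i.e. publish 9090 (the Thrift port) for the HBase/Thrift service in the repo's compose file. A quick sanity check from the host, assuming the mapping is in place (netcat on the host is an assumption):
        # After adding a "9090:9090" port mapping to the HBase/Thrift service
        # in the compose file, verify it is reachable from the docker host.
        docker-compose ps          # the service should show 0.0.0.0:9090->9090/tcp
        nc -vz localhost 9090      # requires netcat on the host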

    I'm also running into problems with that repo. How should I connect a client using the Java API?

                    import org.apache.hadoop.conf.Configuration;
                    import org.apache.hadoop.hbase.HBaseConfiguration;
                    import org.apache.hadoop.hbase.client.HBaseAdmin;
                    // Point the client at the ZooKeeper quorum published by the compose setup.
                    Configuration config = HBaseConfiguration.create();
                    config.set("hbase.zookeeper.quorum", "localhost");
                    config.set("hbase.zookeeper.property.clientPort", "2181");
                    HBaseAdmin.checkHBaseAvailable(config);

    Returns org.apache.hadoop.hbase.MasterNotRunningException: java.net.UnknownHostException: can not resolve hbase-master,16000,1592487871967

    Diego Quintana
    @diegoquintanav
    I'm getting that error
    Diego Quintana
    @diegoquintanav
    any ideas?
    billsteve
    @billsteve
    could you "ping hbase-master" or "telnet hbase-master 16000"?
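    e.g. from the machine where the Java client runs (the /etc/hosts line is just the usual workaround when the client runs on the docker host and the master's ports are published there; adjust to your setup):
        # Check whether the client machine can resolve and reach the HBase master.
        ping -c 3 hbase-master
        telnet hbase-master 16000
        # If the name does not resolve and the client runs on the docker host with
        # the master's ports published, a common workaround is a hosts entry:
        echo "127.0.0.1 hbase-master" | sudo tee -a /etc/hosts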
    luigi-asprino
    @luigi-asprino
    Hi all, can anyone help me with this issue: big-data-europe/docker-hadoop#79
    Anatoly Danilov
    @anatolyD
    Hey guys, the 3.0.0 is announced in the README.md, although I fail to find it in the Docker registry. Has anyone had it working?
    billsteve
    @billsteve
    Which image?
    yunkai
    @JanYunkai
    java.net.ConnectException: Call to 5830b8c484ea/127.0.0.1:42667 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: 5830b8c484ea/127.0.0.1:42667
    docker-compose-standalone.yml
    yunkai
    @JanYunkai
    skip
    ktpktr0
    @ktpktr0
    Hey guys, I installed big-data-europe/docker-hadoop on Swarm and it runs successfully. But I found a problem: the datanode IP in hdfs dfsadmin -report does not match the actual datanode. This error prevents me from installing HBase correctly. I don't know if anyone has encountered such a problem
    Marcel-Jan Krijgsman
    @Marcel-Jan
    Hi everyone,
    I created a shared Hadoop-Spark-Hive docker-compose based on your great Dockerized Hadoop, Spark and Hive versions.
    https://github.com/Marcel-Jan/docker-hadoop-spark
    And I blogged about how I got it working here:
    https://marcel-jan.eu/datablog/2020/10/25/i-built-a-working-hadoop-spark-hive-cluster-on-docker-here-is-how/
    Ben Baysinger
    @BennyBaysinger_twitter
    @Marcel-Jan Have you been able to get the Namenode UI to allow upload and download of data?
    Marcel-Jan Krijgsman
    @Marcel-Jan
    @BennyBaysinger_twitter I've just tried. I can create a directory, but I can't upload a file. I'll look into that.