    Jonathan Dinu
    @hopelessoptimism
    @amitkrdutta it should be alright; I haven't done it with a t2.large, however, so I unfortunately cannot validate it myself
    amitkrdutta
    @amitkrdutta
    @Jay-Oh-eN when I run the following pssh command from the master: pssh -h /root/spark-ec2/slaves yum install -y python27 python27-devel
    I get Could not open hosts file: No such file or directory. Do we have to set up a hosts file? I followed the exact steps in the document.
    amitkrdutta
    @amitkrdutta
    After that I tried adding a hosts.txt file with the public DNS names and ran the command pssh -h /root/spark-ec2/slaves/hosts.txt yum install -y python27 python27-devel, but that too threw an error:
    [1] 19:43:40 [FAILURE] ec2-xyz1.us-west-2.compute.amazonaws.com Exited with error code 255
    [2] 19:43:40 [FAILURE] ec2-xyz2.us-west-2.compute.amazonaws.com Exited with error code 255
    [3] 19:43:40 [FAILURE] ec2-xyz3.us-west-2.compute.amazonaws.com Exited with error code 255
    Can you please help?
    amitkrdutta
    @amitkrdutta
    This message was deleted
    Jonathan Dinu
    @hopelessoptimism
    @amitkrdutta what version of Spark and Hadoop did you start the cluster with? The /root/spark-ec2/slaves file is what the Spark ec2 setup scripts create. But this is for Spark version 1.4.1 (prebuilt for Hadoop 2.4 or earlier).
    amitkrdutta
    @amitkrdutta
    @Jay-Oh-eN I am using 1.6 with Hadoop 2.4
    amitkrdutta
    @amitkrdutta
    @Jay-Oh-eN Thanks for your reply. The issue is resolved ... I think the cluster was not installed properly. I recreated it and it worked fine. Now I can see the slaves folder...
    vishal mishra
    @vishalmishra14
    hello
    I'm having a problem while installing Spark locally.
    Please help me.
    Whenever I type pyspark at the command line in Ubuntu, it says: no such command found
    Jonathan Dinu
    @hopelessoptimism
    @vishalmishra14 have you followed along through videos 1.5 - 1.7 here? http://my.safaribooksonline.com/video/operating-systems-and-server-administration/apache/9780134393490
    vishal mishra
    @vishalmishra14
    Yes.
    Please help me.
    Jonathan Dinu
    @hopelessoptimism
    @vishalmishra14 which version of Spark did you download/install?
    @vishalmishra14 it is most likely a PATH issue, but I would need more information to help. You can run pyspark from the ./bin/ folder of the download: http://spark.apache.org/docs/latest/programming-guide.html#linking-with-spark
    vishal mishra
    @vishalmishra14
    I have installed spark-1.6.2-bin-hadoop2.4.
    My bash_profile is:
    export SPRAK_HOME=/home/vishal/spark-1.6.2-bin-hadoop2.4
    export PYTHONPATH=/home/vishal/spark-1.6.2-bin-hadoop2.4/python/:$PYTHONPATH
    export PATH=/home/vishal/Anaconda2-4.1.0-Linux-x86_64.sh/bin:$PATH
    export PATH="/home/vishal/anaconda2/bin:$PATH"
    vishal mishra
    @vishalmishra14
    vishal@ubuntu:~$ cd /home/vishal/bin/pyspark
    bash: cd: /home/vishal/bin/pyspark: No such file or directory
    Jonathan Dinu
    @hopelessoptimism
    @vishalmishra14 it looks like you misspelled your SPARK_HOME: export SPRAK_HOME=/home/vishal/spark-1.6.2-bin-hadoop2.4
    @vishalmishra14 also, the materials were developed with Spark 1.4.1. It should work with Spark 1.6.*, but I unfortunately cannot make any guarantees or troubleshoot issues that arise from using any version other than 1.4.1
    @vishalmishra14 but in either case you should be able to cd into the Spark download itself to run pyspark. It looks like you are trying to run pyspark from your computer's bin/ directory rather than the bin/ directory inside the Spark download. If you have downloaded Spark to your home directory: cd /home/vishal/spark-1.6.2-bin-hadoop2.4/, then ./bin/pyspark. Here are the official docs for 1.6.2: http://spark.apache.org/docs/latest/programming-guide.html#using-the-shell
    vishal mishra
    @vishalmishra14
    vishal@ubuntu:~/spark-1.6.2-bin-hadoop2.4/bin$ pysark.cmd
    pysark.cmd: command not found
    vishal@ubuntu:~/spark-1.6.2-bin-hadoop2.4/bin$ pyspark
    pyspark: command not found
    vishal@ubuntu:~/spark-1.6.2-bin-hadoop2.4/bin$ spark-shell
    spark-shell: command not found
    vishal@ubuntu:~/spark-1.6.2-bin-hadoop2.4/bin$
    I am using Ubuntu in a virtual machine on a Windows 10 laptop.
    Jonathan Dinu
    @hopelessoptimism
    @vishalmishra14 if you are in an Ubuntu virtual machine you shouldn't need .cmd at the end of the commands. Unfortunately, I can't debug much more with the information I have. Just make sure that you can find the pyspark executable through a series of ls and cd commands; once you find it you should be able to run it. I recommend following along with the official docs if you are using 1.6.2: http://spark.apache.org/docs/latest/quick-start.html
    @vishalmishra14 the other option is to go back through the videos and double check that you followed each step correctly
    Jonathan Dinu
    @hopelessoptimism
    This message was deleted
    [image attachment: pyspark.png]
    @vishalmishra14 it looks like you are not executing the program correctly. It should all be in one command: ~/spark-1.6.2-bin-hadoop2.4/bin/pyspark
    Jonathan Dinu
    @hopelessoptimism
    @vishalmishra14 hope that helps! And to get up to speed quickly in a Linux command-line environment, if you are not familiar with one, I recommend: https://www.learnenough.com/command-line-tutorial
    vishal mishra
    @vishalmishra14
    Thanks, Jonathan ... I hope you are not offended ... sorry for bothering you for so long.
    Jonathan Dinu
    @hopelessoptimism
    @vishalmishra14 oh no, not offended at all! That's what I am here for. It's just hard to debug operating-system/computer issues remotely, especially since there is a nearly unlimited number of possible combinations of versions/OS/virtual machines/backgrounds/etc., so it is hard to know for sure what the issues are.
    vishal mishra
    @vishalmishra14

    In [1]: import pyspark as ps

    In [2]: sc
    Out[3]: ''

    In [4]: sc =ps.Spark

    AttributeError Traceback (most recent call last)

    <ipython-input-4-e285b1799222> in <module>()
    ----> 1 sc =ps.Spark

    AttributeError: 'module' object has no attribute 'Spark'

    vishal mishra
    @vishalmishra14
    I was getting an error for sc = ps.Spark,
    so what I did was write: from pyspark import SparkConf, SparkContext, SparkFiles, SparkJobInfo, SparkStageInfo
    Is it OK to work with that?
    vishal mishra
    @vishalmishra14
    And it gives me an error if I type sc. to see all the commands I can use with SparkContext.
    vishal mishra
    @vishalmishra14
    Do I have to set the path for py4j in the bash file?
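    One way to make py4j visible without editing the bash file is to extend sys.path from Python itself before importing pyspark; a minimal sketch, assuming the Spark download location from the messages above (the py4j zip name varies by Spark version):

        import glob
        import os
        import sys

        # Assumed download location from the messages above; adjust as needed.
        spark_home = os.environ.get("SPARK_HOME", "/home/vishal/spark-1.6.2-bin-hadoop2.4")

        # Make pyspark and its bundled py4j importable from a plain Python shell.
        sys.path.append(os.path.join(spark_home, "python"))
        sys.path.extend(glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")))

        import pyspark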
    vishal mishra
    @vishalmishra14

    vish = person('vishal', 'google')

    In [13]: vish.name
    Out[13]: 'vishal'

    In [14]: vish.company
    Out[14]: 'google'

    In [15]: def say_hello(self):
       ....:     return "hello my name is {0} and I work at {1}".format(self.name, self.company)
       ....:

    In [16]: person.say_hello = say_hello

    In [17]: vish.say_hello()
    Out[17]: 'hello my name is vishal and I work at google'

    In [18]: import person as p

    ImportError Traceback (most recent call last)

    <ipython-input-18-e97d19571fee> in <module>()
    ----> 1 import person as p

    ImportError: No module named person

    Jonathan Dinu
    @hopelessoptimism
    @vishalmishra14 what are you trying to do with ps.Spark? I don't believe ps.Spark is a module in the library. To use the PySpark library you almost always initialize it with: ps.SparkContext()
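    A minimal sketch of that initialization, assuming a plain Python shell where pyspark is importable and no other context is running:

        import pyspark as ps

        # Create the entry point to Spark; 'local[*]' uses all local cores.
        # Only one SparkContext may be active per process.
        sc = ps.SparkContext('local[*]', appName='sandbox')

        # Quick smoke test: distribute a small range and sum it.
        print(sc.parallelize(range(10)).sum())  # 45

        sc.stop()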
    @vishalmishra14 to import person you need to have a separate Python file named person.py in the same directory. This is just a standard Python module import; you can read more here: https://docs.python.org/2/tutorial/modules.html
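    For instance, a hypothetical person.py matching the attributes used in the session above could look like:

        # person.py -- save it next to where IPython is launched,
        # then `import person as p` works as a standard module import.
        class person(object):
            def __init__(self, name, company):
                self.name = name
                self.company = company

            def say_hello(self):
                return "hello my name is {0} and I work at {1}".format(self.name, self.company)

    After that, p.person('vishal', 'google').say_hello() behaves like the interactive example above.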
    vishal mishra
    @vishalmishra14
    This message was deleted
    vishal mishra
    @vishalmishra14
    [screenshot attachment: Screenshot from 2016-07-24 13:17:10.png]
    Venkat
    @venkat01
    Hi Jonathan. I am at lesson 3.3, and rdd_csv_corrrect = rdd_no_header.map(lambda line: csv.reader([line]).next()) is giving an error on .next(). I am using Python 3 here. The error is AttributeError: '_csv.reader' object has no attribute 'next'.
    Can you help me here?
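    In Python 3 the reader's .next() method was removed in favor of the builtin next(), so one fix is (a sketch reusing the lesson's names; rdd_no_header is assumed to exist from the earlier lesson steps):

        import csv

        # Python 3: csv.reader objects have no .next() method;
        # the builtin next() pulls the first (and only) parsed row instead.
        # rdd_no_header comes from the lesson and is assumed to be defined.
        rdd_csv_corrrect = rdd_no_header.map(lambda line: next(csv.reader([line])))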
    ALLEN KENDRA
    @ALLENKENDRA5_twitter
    Hello here.
    Can someone just explain to me what this group is for?