These are chat archives for thunder-project/thunder

3rd
Nov 2015
andrew giessel
@andrewgiessel
Nov 03 2015 15:36
hello friends! I am trying to submit python scripts to an ec2 thunder cluster that I launched with thunder-ec2
I have opened port 7077 on master
but get what appears to be connection timeouts
i’m using thunder-submit in case the versions need to match
the spark version i’m running is brew’s 1.5.1 and i’m using thunder-python-0.5.1
if anyone’s tried this, I’d love some pointers
Jeremy Freeman
@freeman-lab
Nov 03 2015 15:38
yo! just so i understand, are you submitting from your laptop to the ec2 cluster?
andrew giessel
@andrewgiessel
Nov 03 2015 15:38
ya
Hi dude
Jeremy Freeman
@freeman-lab
Nov 03 2015 15:38
ah, so i’m fairly certain that doesn’t work, at least not as things are configured now
andrew giessel
@andrewgiessel
Nov 03 2015 15:39
I am trying to basically figure out how to submit jobs non-REPL style
Jeremy Freeman
@freeman-lab
Nov 03 2015 15:39
you’d log in to the cluster’s master
and then call thunder-submit on the master
andrew giessel
@andrewgiessel
Nov 03 2015 15:39
ah, i see
Jeremy Freeman
@freeman-lab
Nov 03 2015 15:39
basically, thunder-ec2 login and then thunder-submit
andrew giessel
@andrewgiessel
Nov 03 2015 15:39
can i use spark-submit in client mode?
Jeremy Freeman
@freeman-lab
Nov 03 2015 15:39
remote job submission is being explored by some other libraries (IBM is working on something called livy I think)
possible, yes =)
andrew giessel
@andrewgiessel
Nov 03 2015 15:40
ok =)
Jeremy Freeman
@freeman-lab
Nov 03 2015 15:40
i haven’t explored the necessary configuration for that in detail
you can definitely have a driver (doing job submission) that’s not the master (scheduling the workers)
but i’m fairly certain there are requirements about everyone being on the same network
andrew giessel
@andrewgiessel
Nov 03 2015 15:40
the idea is that we have a whole infrastructure (using celery) that runs python functions on a schedule or on demand - i’d like those things to run spark jobs if needed
Jeremy Freeman
@freeman-lab
Nov 03 2015 15:40
ah interesting
so the infrastructure is running on EC2?
andrew giessel
@andrewgiessel
Nov 03 2015 15:41
yes
Jeremy Freeman
@freeman-lab
Nov 03 2015 15:41
and you submit jobs to it from a local machine?
and want to have spark jobs be part of that process?
yeah, would love to hear more if you dig in
andrew giessel
@andrewgiessel
Nov 03 2015 15:41
yes, except it’s not a local machine doing the submitting
it’s a task queue system
so it’s a service that runs on a host
Jeremy Freeman
@freeman-lab
Nov 03 2015 15:41
you could do something kludgy like execute a remote command on the master via ssh
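The ssh idea mentioned above could be sketched roughly like this from a Python worker (a hypothetical helper; the hostname, key path, and script path are placeholders, not real values from the conversation):

```python
# Rough sketch of the "remote command over ssh" idea: a worker process
# shells out to the cluster master and runs thunder-submit there.
# submit_via_ssh, the hostname, key file, and script path are all
# illustrative assumptions, not thunder's API.
import subprocess

def submit_via_ssh(master_host, script, key_file="~/.ssh/mykey.pem"):
    """Build the ssh command line that runs thunder-submit on the master."""
    remote_cmd = "thunder-submit {}".format(script)
    return ["ssh", "-i", key_file, "root@{}".format(master_host), remote_cmd]

cmd = submit_via_ssh("ec2-203-0-113-1.compute-1.amazonaws.com",
                     "/root/jobs/analysis.py")
print(" ".join(cmd))
# To actually execute it, something like: subprocess.check_call(cmd)
```

The driver then runs on the master itself, which sidesteps the same-network requirement discussed below.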
andrew giessel
@andrewgiessel
Nov 03 2015 15:42
and listens for enqueued jobs on an AMQP server (those jobs are enqueued by node, or internally)
ya, the remote command occurred to me too
thanks for the quick response
another approach which might make this work is to do something crazy like dual-purpose all our task workers as spark cluster nodes as well
aka trade devops induced insanity for ec2 costs
andrew giessel
@andrewgiessel
Nov 03 2015 15:48
FWIW, yes, thunder-submit works locally on master
Jeremy Freeman
@freeman-lab
Nov 03 2015 16:46
cool thanks for confirming
that other idea sounds pretty crazy :scream:
Shanmugam Ramasamy
@shansrockin
Nov 03 2015 23:12
Hello. I am a new user of thunder. It’s pretty cool software, which I want to learn. I just have a small question. I created a customized 2D dataset in MATLAB and imported it into thunder as a Series object. However, I am not able to compute its dimensions. Not sure why. The traceback says TypeError: argument 3 to map() must support iteration

The following are the commands I typed:

>>> data = tsc.loadSeries(seriespath + '/kmeansdataset.mat', inputFormat='mat', varName='data', minPartitions=5)
>>> data.first()
(0, array([ 0.53766714, -0.86365282]))
>>> data
Series
nrecords: 100
dtype: float64
dims: None (inspect to compute)
Index: [0 1]
>>> data.dims
TypeError: argument 3 to map() must support iteration
(raised at line 17, in merge: self.min = tuple(map(min, self.min, value)))
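For what it's worth, the traceback suggests the record keys here are plain ints (the first record's key is 0), while the dims computation expects tuple keys when it does `tuple(map(min, self.min, value))`, so `map` gets a non-iterable third argument. A standalone sketch of the failure (`merge_min` is a simplified, hypothetical stand-in for thunder's merge step, not its actual code; under Python 2 the message is "argument 3 to map() must support iteration", Python 3 raises a similar TypeError):

```python
# Simplified mimic of the merge step in the dims computation, which does
# roughly: self.min = tuple(map(min, self.min, value)).
# merge_min is an illustrative helper, not thunder's API.

def merge_min(current_min, key):
    # element-wise minimum of the running min and one record's key;
    # this only works if the key is itself iterable (e.g. a tuple)
    return tuple(map(min, current_min, key))

print(merge_min((2, 3), (1, 5)))  # tuple keys work fine: (1, 3)

try:
    merge_min((2, 3), 0)  # a plain-int key, like the ones in this dataset
except TypeError as err:
    print("TypeError:", err)  # map's third argument is not iterable
```

So the error points at how the keys were produced by the .mat load rather than at the dims code itself.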