These are chat archives for thunder-project/thunder

7th
Oct 2015
alexandrelaborde
@AlexandreLaborde
Oct 07 2015 00:00
yeaahh
i just placed the files in /home/manager/orange/img
in orange and loadImages didn't complain.
I think it did some "I will do it later" Spark thing
Jason Wittenbach
@jwittenbach
Oct 07 2015 00:02
so did it work?
alexandrelaborde
@AlexandreLaborde
Oct 07 2015 00:02
progress nonetheless
Jason Wittenbach
@jwittenbach
Oct 07 2015 00:03
if you have that same share mounted on all of the machines, it should be good
alexandrelaborde
@AlexandreLaborde
Oct 07 2015 00:04
but calling data.dims sent me back to the Py4JJavaError FileNotFoundError: [Errno 2] No such file or directory: '/home/manager/orange/img/img0000.tif
I assume this happened in the workers since I got the same error 3 times
Jason Wittenbach
@jwittenbach
Oct 07 2015 00:05
there just needs to be a single path that gets you to the data on all the machines
workers, master, and driver
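A minimal sketch of checking that one path resolves on every machine. The helper name `path_report` is hypothetical, and the commented Spark lines assume an existing SparkContext called `sc`; this is not the snippet referred to elsewhere in this chat.

```python
import os
import socket

def path_report(path):
    """Describe how `path` looks from the machine running this function."""
    return {
        "host": socket.gethostname(),
        "exists": os.path.exists(path),
        "is_dir": os.path.isdir(path),
    }

# On the driver:
print(path_report("/home/manager/orange/img"))

# On the workers (assumes a SparkContext named `sc`):
#   reports = sc.parallelize(range(16), 16).map(
#       lambda _: path_report("/home/manager/orange/img")).collect()
# Tasks are not guaranteed to land on every worker, so compare the
# hosts in `reports` against the list of worker machines.
```

If any host reports `exists: False`, the share is probably not mounted at the same location there.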
alexandrelaborde
@AlexandreLaborde
Oct 07 2015 00:07
assuming there is a problem there, if it works for the master shouldn't it work for at least one worker ? the one that is on the same machine as the master ?
Jason Wittenbach
@jwittenbach
Oct 07 2015 00:09
yeah
though it definitely has to work on the driver as well
in fact, it probably doesn’t strictly have to work on the master
alexandrelaborde
@AlexandreLaborde
Oct 07 2015 00:13
ok this is even weirder.. if I do data=tsc.loadImages(path,inputFormat='tif') it goes instantaneously
and looking at data gives me
data
Images
nrecords: 5625
dtype: None (inspect to compute)
dims: None (inspect to compute)
and that is actually the number of images that I have
and then the data.dims doesn't work because the files weren't found
Jason Wittenbach
@jwittenbach
Oct 07 2015 00:21
that is strange
not sure what would cause that behavior
alexandrelaborde
@AlexandreLaborde
Oct 07 2015 00:26
And I get the same error if I do loadImagesAsSeries or convertImagesToSeries
Jeremy Freeman
@freeman-lab
Oct 07 2015 00:26
@AlexandreLaborde that last bit suggests the filesystem is able to list the files, but not read them
some of this sounds like a permissions issue...
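One way to separate "can list" from "can read" is to try both explicitly. The sketch below uses only the standard library; the function name `list_vs_read` and the temporary demo directory are illustrative assumptions, not part of Thunder.

```python
import os
import tempfile

def list_vs_read(directory, suffix=".tif", nbytes=4):
    """Return (listed, unreadable): the names the directory listing shows,
    and the subset that fails when actually opened and read."""
    listed = sorted(f for f in os.listdir(directory) if f.endswith(suffix))
    unreadable = []
    for name in listed:
        try:
            with open(os.path.join(directory, name), "rb") as fh:
                fh.read(nbytes)
        except (IOError, OSError) as err:
            unreadable.append((name, str(err)))
    return listed, unreadable

# Demo on a throwaway directory; point it at the shared folder instead.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "img0000.tif"), "wb") as fh:
    fh.write(b"II*\x00")
listed, unreadable = list_vs_read(demo_dir)
print(len(listed), "listed,", len(unreadable), "unreadable")
```

Run as the same user that runs the Spark workers, a non-empty `unreadable` list would point toward permissions rather than a missing mount.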
alexandrelaborde
@AlexandreLaborde
Oct 07 2015 00:26
so anything that has to actually work with the images
@freeman-lab It really does
Since Py4J is a Java-related thing, which version of Java are you using ? I am using openjdk-7
Well... It is 1:30 AM so I am giving up for today.. @jwittenbach I will mount this thing as NFS and then see if that helps. Thanks for your time
Jeremy Freeman
@freeman-lab
Oct 07 2015 00:31
yeah that should be fine
i meant more, different permissions for different parts of your system
master can't read from workers
or something like that
Jason Wittenbach
@jwittenbach
Oct 07 2015 00:36
@AlexandreLaborde Hope you can find some success tomorrow. Before switching to NFS, might want to check that code snippet I posted earlier….and check things like permissions
also might be worth trying some simple PySpark things to make sure your Spark install is good
alexandrelaborde
@AlexandreLaborde
Oct 07 2015 00:40
i ran your code
and it worked
Jason Wittenbach
@jwittenbach
Oct 07 2015 00:41
ah cool
so it works from the Driver and the Workers?
and finds all the files?
alexandrelaborde
@AlexandreLaborde
Oct 07 2015 00:43
sorry who is the driver ? orange ?
Jason Wittenbach
@jwittenbach
Oct 07 2015 00:45
whoever you are trying to run tsc.loadImages from
alexandrelaborde
@AlexandreLaborde
Oct 07 2015 00:48
orange then
yes it worked on it as well
Jason Wittenbach
@jwittenbach
Oct 07 2015 00:51
dang
could be permission then
just curious: does PySpark work?
alexandrelaborde
@AlexandreLaborde
Oct 07 2015 12:41

@jwittenbach @freeman-lab I changed from SSHFS to NFS and it appears to have solved the permissions issue... As before, loadImages now returns the correct number of images, but when I check the dims it raises a new error.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-b1134f8a59b7> in <module>()
----> 1 data.dims

/tmp/spark-1d5624d6-9802-4213-bcb3-34cfd491ed08/userFiles-fa784cf1-71ee-4fff-a390-3fb484fa77c2/thunder_python-0.5.1-py2.7.egg/thunder/rdds/images.py in dims(self)
     27     def dims(self):
     28         if self._dims is None:
---> 29             self.populateParamsFromFirstRecord()
     30         return self._dims
     31 

/tmp/spark-1d5624d6-9802-4213-bcb3-34cfd491ed08/userFiles-fa784cf1-71ee-4fff-a390-3fb484fa77c2/thunder_python-0.5.1-py2.7.egg/thunder/rdds/images.py in populateParamsFromFirstRecord(self)
     40 
     41     def populateParamsFromFirstRecord(self):
---> 42         record = super(Images, self).populateParamsFromFirstRecord()
     43         self._dims = Dimensions.fromTuple(record[1].shape)
     44         return record

/tmp/spark-1d5624d6-9802-4213-bcb3-34cfd491ed08/userFiles-fa784cf1-71ee-4fff-a390-3fb484fa77c2/thunder_python-0.5.1-py2.7.egg/thunder/rdds/data.py in populateParamsFromFirstRecord(self)
     76         from numpy import asarray
     77 
---> 78         record = self.rdd.first()
     79         self._dtype = str(asarray(record[1]).dtype)
     80         return record

/home/user/Downloads/spark/python/pyspark/rdd.pyc in first(self)
   1293         ValueError: RDD is empty
   1294         """
-> 1295         rs = self.take(1)
   1296         if rs:
   1297             return rs[0]

/home/user/Downloads/spark/python/pyspark/rdd.pyc in take(self, num)
   1275 
   1276             p = range(partsScanned, min(partsScanned + numPartsToTry, totalParts))
-> 1277             res = self.context.runJob(self, takeUpToNumLeft, p, True)
   1278 
   1279             items += res

/home/user/Downloads/spark/python/pyspark/context.pyc in runJob(self, rdd, partitionFunc, partitions, allowLocal)
    896         port = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, partitions,
    897                                           allowLocal)
--> 898         return list(_load_from_socket(port, mappedRDD._jrdd_deserializer))
    899 
    900     def show_profiles(self):

/home/user/Downloads/spark/python/pyspark/rdd.pyc in _load_from_socket(port, serializer)
    139     try:
    140         rf = sock.makefile("rb", 65536)
--> 141         for item in serializer.load_stream(rf):
    142             yield item
    143     finally:

/home/user/Downloads/spark/python/pyspark/serializers.pyc in load_stream(self, stream)
    137         while True:
    138             try:
--> 139                 yield self._read_with_length(stream)
    140             except EOFError:
    141                 return

/home/user/Downloads/spark/python/pyspark/serializers.pyc in _read_with_length(self, stream)
    162         if len(obj) < length:
    163             raise EOFError
--> 164         return self.loads(obj)
    165 
    166     def dumps(self, obj):

/home/user/Downloads/spark/python/pyspark/serializers.pyc in loads(self, obj, encoding)
    419     else:
    420         def loads(self, obj, encoding=None):
--> 421             return pickle.loads(obj)
    422 
    423 

/home/user/anaconda/lib/python2.7/site-packages/PIL/Image.pyc in __setstate__(self, state)
    633         Image.__init__(self)
    634         self.tile = []
--> 635         info, mode, size, palette, data = state
    636         self.info = info
    637         self.mode = mode

ValueError: too many values to unpack

As far as I know, this error means the assignment is unpacking more values than there are names on the left-hand side. I am digging deeper into Image.py so that I can see what is inside state

alexandrelaborde
@AlexandreLaborde
Oct 07 2015 13:05
this is the content of state when calling data.dims after data=tsc.loadImages(path,inputFormat='tif').
{
    'info': {'compression': 'raw', 'dpi': (2.54, 2.54)},
    'category': 0,
    'palette': None,
    'fp': <_io.BytesIO object at 0x7fac30206d10>,
    'decodermaxblock': 65536,
    'decoderconfig': (),
    'tag': <PIL.TiffImagePlugin.ImageFileDirectory object at 0x7fac30213890>,
    'filename': '',
    'ifd': <PIL.TiffImagePlugin.ImageFileDirectory object at 0x7fac30213890>,
    '_TiffImageFile__frame': 0,
    'readonly': 1,
    '_compression': 'raw',
    '_TiffImageFile__fp': <_io.BytesIO object at 0x7fac30206d10>,
    '_planar_configuration': 1,
    '_TiffImageFile__next': 0,
    'tile': [('raw', (0, 0, 900, 900), 227, ('I;16B', 0, 1))],
    'im': None,
    'size': (900, 900),
    'mode': 'I;16B',
    '_TiffImageFile__first': 8
}
do you guys spot anything unusual ?
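The failing line in the traceback unpacks `state` into five names, while the `state` printed here is a dict with far more than five keys. Iterating a dict yields its keys, so the unpacking fails regardless of what the keys hold. A small sketch with a stand-in dict (only six of the keys, values shortened):

```python
# Stand-in for the pickled state shown in the chat (subset of its keys).
state = {
    "info": {"compression": "raw"},
    "mode": "I;16B",
    "size": (900, 900),
    "palette": None,
    "tile": [],
    "readonly": 1,
}

try:
    # Same shape as the failing line in Image.__setstate__.
    info, mode, size, palette, data = state
except ValueError as err:
    message = str(err)
    print(message)
```

One possible reading, not confirmed anywhere in this chat, is that the object was pickled by code that stores the instance dict and unpickled by code whose `__setstate__` expects a five-item sequence, which would be consistent with differing imaging library versions between machines.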
Jeremy Freeman
@freeman-lab
Oct 07 2015 13:48
hm i don't, though we have run into occasional issues with incompatible tif formats
can you load these tifs locally (not with thunder) using PIL?
alexandrelaborde
@AlexandreLaborde
Oct 07 2015 14:06
Yes I can
I did im = Image.open ( '/tmp/folder/f1/images0005.tif') and im.show() displayed the image properly
the problem appears to be in Image.__setstate__
Jeremy Freeman
@freeman-lab
Oct 07 2015 14:14
is the version of PIL on your master and workers the same?
alexandrelaborde
@AlexandreLaborde
Oct 07 2015 14:20
yap.. everybody has PIL v 1.1.7
Jeremy Freeman
@freeman-lab
Oct 07 2015 14:24
mind posting an example tif somewhere?
alexandrelaborde
@AlexandreLaborde
Oct 07 2015 14:26
I was about to ask you for a tif myself:)
Jeremy Freeman
@freeman-lab
Oct 07 2015 14:26
oh is this still loading the example tifs? or your own?
in that case, you already have some =)
I was loading some images from our LSM trials
Jeremy Freeman
@freeman-lab
Oct 07 2015 14:32
ah ok cool, looks like i don't have access to that drive
alexandrelaborde
@AlexandreLaborde
Oct 07 2015 15:08
sorry about that
this should work
does thunder support 16bit TIF images ?
alexandrelaborde
@AlexandreLaborde
Oct 07 2015 15:14
we have our own tiff writer.. do you think it may have some problem and that is causing all of this ?
Jeremy Freeman
@freeman-lab
Oct 07 2015 15:16
that could definitely be an issue!
though if PIL loads it for you locally, I'm surprised it doesn't work within Thunder
that link isn't working either
nevermind, got it
alexandrelaborde
@AlexandreLaborde
Oct 07 2015 15:19
ok :smile: I am testing with the example fish images
ok our images are the problem
Jeremy Freeman
@freeman-lab
Oct 07 2015 15:21
maybe... i seem to be loading your image fine =)
alexandrelaborde
@AlexandreLaborde
Oct 07 2015 15:21
I copied your examples to my shared folder and it works
Jeremy Freeman
@freeman-lab
Oct 07 2015 15:21
data = tsc.loadImages('impro0005.tif', inputFormat='tif')
data.dims
>> (672, 456)
data.nrecords
>> 1
that's on my local machine
alexandrelaborde
@AlexandreLaborde
Oct 07 2015 15:22
ok let me check if I can load locally
alexandrelaborde
@AlexandreLaborde
Oct 07 2015 15:27
I can open locally but not in the cluster
I am going to open locally from a local thunder inside one of the cluster's machines
Jeremy Freeman
@freeman-lab
Oct 07 2015 15:28
interesting, could try
if i had to bet, i'd say that there is some library, either PIL or a lower level tif library, that has a different version on your worker nodes
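A hedged sketch of how one might compare library versions across nodes. The helper `module_report` is hypothetical; the commented Spark lines assume a SparkContext named `sc`. Old PIL and Pillow releases do not all expose `__version__`, so the import path in `__file__` is often the more telling field.

```python
import importlib
import socket
import sys

def module_report(names):
    """Report which copy of each module this machine imports."""
    report = {"host": socket.gethostname(), "python": sys.version.split()[0]}
    for name in names:
        try:
            mod = importlib.import_module(name)
            report[name] = (getattr(mod, "__version__", None),
                            getattr(mod, "__file__", None))
        except ImportError as err:
            report[name] = ("import failed", str(err))
    return report

# On the driver:
print(module_report(["PIL", "numpy"]))

# On the workers (assumes a SparkContext named `sc`):
#   sc.parallelize(range(16), 16).map(
#       lambda _: module_report(["PIL", "numpy"])).collect()
```

Differences in the reported paths (for example a system site-packages on one host and an anaconda directory on another) would support the version-mismatch guess.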
alexandrelaborde
@AlexandreLaborde
Oct 07 2015 15:35
ok I can run it locally in all of the machines
but not on the cluster
Jeremy Freeman
@freeman-lab
Oct 07 2015 15:40
and can you load the test example data across the cluster?
alexandrelaborde
@AlexandreLaborde
Oct 07 2015 15:47
yes
fish images
data
Images
nrecords: 20
dtype: uint8
dims: min=(0, 0, 0), max=(75, 86, 1), count=(76, 87, 2)
i will convert the fish images from uint8 to uint16 and see if that is the problem
alexandrelaborde
@AlexandreLaborde
Oct 07 2015 16:17
@freeman-lab @jwittenbach I guess the problem is the 16bit. I converted my 16bit tif to 8bit and I can manipulate it in the cluster just fine
do you have any 16bit image that you know works on Thunder ?
alexandrelaborde
@AlexandreLaborde
Oct 07 2015 16:49
I definitely suspect the combination of PIL and uncompressed 16bit tif (stackoverflow thread)
which type of images do you usually work with ?
Jason Wittenbach
@jwittenbach
Oct 07 2015 17:00
the example files are 8-bit TIF
alexandrelaborde
@AlexandreLaborde
Oct 07 2015 17:42
@jwittenbach Do you have any 16bit image that works on thunder ? I do not care if it is a black square.. just that it works
Jason Wittenbach
@jwittenbach
Oct 07 2015 18:04
@AlexandreLaborde Ah sorry, can’t say that I do
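For a throwaway 16-bit test image, a black square can be built by hand with the standard library. This is a sketch under assumptions: little-endian byte order (the state printed earlier shows 'I;16B', i.e. big-endian), no compression, a single strip, and only a minimal set of tags. The function name `black_square_tiff16` is hypothetical, and whether Thunder 0.5.1 or PIL on a cluster reads the result is not established in this chat.

```python
import struct

SHORT, LONG = 3, 4  # TIFF field type codes

def _entry(tag, ftype, value):
    """One 12-byte IFD entry holding a single inline value."""
    if ftype == SHORT:
        # SHORT values are left-justified in the 4-byte value field.
        return struct.pack("<HHIHH", tag, SHORT, 1, value, 0)
    return struct.pack("<HHII", tag, LONG, 1, value)

def black_square_tiff16(side):
    """Return the bytes of a side x side, 16-bit grayscale, all-zero TIFF."""
    n_tags = 9
    data_offset = 8 + 2 + n_tags * 12 + 4  # header + IFD, pixels follow
    n_bytes = side * side * 2
    entries = [  # must be sorted by tag number
        _entry(256, LONG, side),         # ImageWidth
        _entry(257, LONG, side),         # ImageLength
        _entry(258, SHORT, 16),          # BitsPerSample
        _entry(259, SHORT, 1),           # Compression: none
        _entry(262, SHORT, 1),           # Photometric: BlackIsZero
        _entry(273, LONG, data_offset),  # StripOffsets
        _entry(277, SHORT, 1),           # SamplesPerPixel
        _entry(278, LONG, side),         # RowsPerStrip
        _entry(279, LONG, n_bytes),      # StripByteCounts
    ]
    header = struct.pack("<2sHI", b"II", 42, 8)  # IFD starts at byte 8
    ifd = (struct.pack("<H", len(entries)) + b"".join(entries)
           + struct.pack("<I", 0))               # no further IFDs
    return header + ifd + b"\x00" * n_bytes

tiff_bytes = black_square_tiff16(900)
print(len(tiff_bytes))
# To save it: open("black16.tif", "wb").write(tiff_bytes)
```

Loading the saved file first locally and then across the cluster would show whether 16-bit depth alone triggers the failure, independent of the custom writer.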