Davis Bennett
@d-v-b
I don't understand how reading from a single file can be faster than reading from multiple files in a parallel context, but whatever works!
Kyle
@kr-hansen
It's a difference of 3-7 seconds or so from my testing, so not much, but it is faster in both cases on Spark than it is in local mode.
Gilles Vanwalleghem
@Yassum
Hello, I was wondering if anyone has any idea how to produce this kind of plot: http://research.janelia.org/zebrafish/trajectories.html
As far as I can see from the paper, this is supposed to be covered in the online methods, but they are a bit sparse on details.
mshahabi
@mshahabi
Hello all, I am new to thunder and have a quick question: is thunder an appropriate tool for large-scale multivariable clustering on a mixture of spatial and temporal data? Here is what my data looks like: ['lable', 'Origin [lat, long]', 'Destination [lat, long]', 'distance', 'departure_time', 'total_travel_time', 'arrival_time']
mellertd
@mellertd
Hello everyone. My organization is trying to get Thunder going on a YARN cluster using virtual environments. Has anyone tried this? It seems to be a little outside the norm from what I have found, and it is certainly different from how things were done at Janelia when I was there. I am wondering whether this is an okay idea that we should continue to pursue.
Boaz Mohar
@boazmohar
@mellertd As far as I understand, YARN mode only changes how you deploy your Spark cluster; at Janelia it runs in standalone mode. As far as thunder is concerned, the two are the same. You would need a Jupyter notebook and a way to get a Spark context in it, which might be different from how it is done at Janelia. Is this the part you need help with?
mellertd
@mellertd
I guess the question had more to do with the virtual environment issue. I recall that at Janelia, Thunder and its dependencies were installed on all the nodes, so you could launch standalone clusters and everything worked. For various reasons, we don't want to maintain installs on all the nodes and would rather let users manage their own environments. This is possible with YARN mode, but it is rather clunky and is not officially supported in interactive mode (i.e. in Jupyter).
I am just wondering if anyone has had any experience with this, because there seem to be many degrees of freedom in getting things to work well.
Boaz Mohar
@boazmohar
I have used a virtual environment with thunder. You should set the environment variable PYSPARK_PYTHON, and then both the driver and the workers will see the same Python virtual environment.
mellertd
@mellertd
It is actually much more complicated than that.
This is what we are currently trying:
Hm, this chat is unusable on iOS Safari; I'll paste a link when I can get to my laptop.
Our tests seem to work with virtualenv, but it is very slow. Haven't gotten it to work with Conda yet, but it should not work any differently. I am currently trying to figure out how we might speed things up
Boaz Mohar
@boazmohar
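I am definitely not an expert here, and I have not used any of this with spark-submit. But for interactive mode I have used this code to make sure I am in the right environment for both the driver and the workers:
import numpy
# Path of the numpy module loaded by the driver
print(numpy.__file__)
def test1(x):
    # Runs on a worker, so this reports the worker's numpy path
    import numpy
    return numpy.__file__
# sc is the SparkContext already available in the notebook
data = sc.range(10).map(test1).collect()
print(data[0])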
Setting PYSPARK_PYTHON to the Python binary of the conda environment worked: export PYSPARK_PYTHON=/groups/svoboda/home/moharb/anaconda2/envs/py35/bin/python
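The same interpreter path can also be passed as Spark configuration rather than as a shell export. This is a minimal sketch, assuming Spark 2.1 or later on YARN: the helper names `pyspark_env_conf` and `apply_to_environment` and the example path are hypothetical, while `spark.pyspark.python`, `spark.yarn.appMasterEnv.*` and `spark.executorEnv.*` are standard Spark configuration keys.

```python
import os


def pyspark_env_conf(python_path):
    """Build Spark settings that point driver and executors at one interpreter.

    `python_path` is a hypothetical path to the python binary of a virtualenv
    or conda environment that is reachable from every node.
    """
    return {
        # Python binary used by PySpark in both the driver and the executors.
        "spark.pyspark.python": python_path,
        # Environment variable set for the YARN application master.
        "spark.yarn.appMasterEnv.PYSPARK_PYTHON": python_path,
        # Environment variable set for every executor process.
        "spark.executorEnv.PYSPARK_PYTHON": python_path,
    }


def apply_to_environment(python_path, environ=os.environ):
    """Set PYSPARK_PYTHON locally; call this before creating the SparkContext."""
    environ["PYSPARK_PYTHON"] = python_path
    return environ


conf = pyspark_env_conf("/path/to/envs/py35/bin/python")
for key, value in sorted(conf.items()):
    print(key, "=", value)
```

Each pair can then be given to `SparkConf().set(key, value)` or to `spark-submit` as `--conf key=value`. Whether this removes the slowness reported above is not something the sketch can tell.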
Sid-Sloth
@Sid-Sloth
As we all know, Spark supports two kinds of operations: transformations and actions. So how can I know whether an API in thunder is a transformation or an action? @boazmohar
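As a rough rule in Spark itself, a transformation returns another distributed object (for example an RDD) and is lazy, while an action returns a local value and triggers the computation. The sketch below turns that rule into a heuristic check on the class name of a result, so it runs without Spark installed. `looks_lazy` and the stand-in `RDD` class are hypothetical, and whether a particular thunder method is lazy still has to be confirmed against thunder's own source.

```python
# Class names treated as "distributed, probably lazy" results (an assumption).
LAZY_TYPE_NAMES = {"RDD", "DataFrame"}


def looks_lazy(result):
    """Return True if `result` is an instance of a class (or subclass of a
    class) named like a Spark distributed collection, which suggests the call
    that produced it was a transformation rather than an action."""
    return any(cls.__name__ in LAZY_TYPE_NAMES for cls in type(result).__mro__)


class RDD(object):
    """Stand-in for pyspark.RDD so the example runs without Spark.

    Unlike the real class it evaluates eagerly; only the return types matter
    for this illustration.
    """

    def __init__(self, items):
        self.items = list(items)

    def map(self, func):
        # Transformation-like: returns another RDD.
        return RDD(func(x) for x in self.items)

    def collect(self):
        # Action-like: returns a plain local list.
        return list(self.items)


rdd = RDD(range(3))
print(looks_lazy(rdd.map(lambda x: x + 1)))  # True
print(looks_lazy(rdd.collect()))             # False
```

The check inspects the whole method resolution order, so subclasses of a class named `RDD` are recognised as well.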
Vimmal Sivakumar
@bluefoo19_twitter
After installing thunder, what do I have to do to start interacting with the software?
srinu989
@srinu989

Hi

We are trying to connect to Hive tables from Python on Windows, but we are facing issues while connecting. We are sending our full details; please help with this.
We installed Python version 2.7.15 and Anaconda version 2.7.14.
We installed the packages below:

pip install sasl
pip install thrift
pip install thrift-sasl
pip install PyHive

We wrote the code below to connect to Hive tables from a Python script:

from pyhive import hive
conn = hive.Connection(host="172.16.17.196", port=10000, username="mapr", database="default")
cursor = conn.cursor()
cursor.execute("SHOW DATABASES")
for result in cursor.fetchall():
    use_result(result)

We are getting the error below:

==========================

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\mapr\Anaconda2\lib\site-packages\pyhive\hive.py", line 64, in connect
    return Connection(*args, **kwargs)
  File "C:\Users\mapr\Anaconda2\lib\site-packages\pyhive\hive.py", line 162, in __init__
    self._transport.open()
  File "C:\Users\mapr\Anaconda2\lib\site-packages\thrift_sasl\__init__.py", line 79, in open
    message=("Could not start SASL: %s" % self.sasl.getError()))
thrift.transport.TTransport.TTransportException: Could not start SASL: Error in sasl_client_start (-4) SASL(-4): no mechanism available: Unable to find a callback: 2
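The loop in the script above calls `use_result`, which is never defined. Below is a minimal sketch of the query step written against the generic DB-API cursor interface, with a stub connection so that it runs without a Hive server. `list_databases`, `StubConnection` and `StubCursor` are hypothetical names. The sketch does not address the SASL error itself, which is raised earlier, while the transport is being opened.

```python
def list_databases(conn):
    """Run SHOW DATABASES on a DB-API style connection and return the names."""
    cursor = conn.cursor()
    try:
        cursor.execute("SHOW DATABASES")
        # Each row is a sequence whose first column is the database name.
        return [row[0] for row in cursor.fetchall()]
    finally:
        cursor.close()


class StubCursor(object):
    """Hypothetical stand-in for a Hive cursor, used only for this example."""

    def __init__(self, names):
        self._names = names
        self._rows = []

    def execute(self, query):
        # Only the one query used above is understood by the stub.
        if query.strip().upper() == "SHOW DATABASES":
            self._rows = [(name,) for name in self._names]

    def fetchall(self):
        return list(self._rows)

    def close(self):
        pass


class StubConnection(object):
    """Hypothetical stand-in for a Hive connection."""

    def __init__(self, names):
        self._names = names

    def cursor(self):
        return StubCursor(self._names)


for name in list_databases(StubConnection(["default", "sales"])):
    print(name)
```

With PyHive installed, the same function could be called with the `hive.Connection(...)` object from the script. PyHive's `Connection` also accepts an `auth` argument, but changing it only helps if the Hive server is configured for the matching authentication mode, which the messages here do not say.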

srinu989
@srinu989
Please help with this issue, if anybody knows.
anciubo
@anciubo
Hi you all, when trying to run img_mean = data.seriesMean().pack() from one of the examples, I got the following error: AttributeError: 'Series' object has no attribute 'seriesMean'. Is there any updated documentation for thunder?
Thanks