Ali Emirhan Kurt
@AEKurt
@dos65 ty
Vadim Chelyshov
@dos65
@AEKurt Also, I made a new release - 1.1.3. It includes fixes for the problem you discovered.
Ali Emirhan Kurt
@AEKurt
Thank you. If I discover anything else, I will inform you.
balauppalapati
@balauppalapati

Hi. I have migrated my project to 2.12. I am using Spark 2.4.0 with mist 1.1.3. I created an assembly jar and submitted it to mist-cli. None of the jobs are listed as functions in the mist UI. Anomalies I have noticed:

  1. When I submitted the jar to mist-cli, only GET URLs were listed. Earlier, POST URLs along with their corresponding inputs used to be listed as well.
  2. Found this type of error corresponding to each job in infoprovider.log,
    2019-07-30 20:37:17 INFO FunctionInfoProviderActor:107 - Responding with err on GetFunctionInfo(com.scienaptic.spark.jobs.PMMLEvaluatorJob$,/home/bala/Downloads/mist/mist-1.1.3/data/artifacts/iris_0.0.1.jar,pmml-evaluator,EnvInfo(PythonEntrySettings(python,python))): class java.lang.NoClassDefFoundError Could not initialize class com.scienaptic.spark.jobs.PMMLEvaluatorJob$
    whereas the given job exists at the corresponding path.

I'm fairly sure the conf is correct, given that a similar config was working fine earlier. Is there anything I am missing?

Vadim Chelyshov
@dos65
Are you sure that you use mist and spark distros that were built for scala 2.12?
balauppalapati
@balauppalapati
To run mist locally, picked it from http://repo.hydrosphere.io/hydrosphere/static/mist-1.1.3.tar.gz
Similarly, I included the mist 2.12 build in my project's dependencies. I am skeptical about the Spark build. I tried with https://archive.apache.org/dist/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz.
I also tried the Spark build listed in mist release 1.1.0 - http://repo.hydrosphere.io/hydrosphere/static/spark/spark-2.4.0-bin-hadoop2.7-scala-2.12.tgz
Vadim Chelyshov
@dos65
balauppalapati
@balauppalapati
Ok. Will try with this build
Vadim Chelyshov
@dos65
And I'm not fully sure about the Spark build from repo.hydrosphere.io. I just built it from sources to check that mist works on 2.12. I can't remember whether I tested python on it or not.
balauppalapati
@balauppalapati
I tried it with the above mist and the Spark build listed in the release. It worked.
Thanks @dos65. I think it would be better to update the docs for 2.12.
Vadim Chelyshov
@dos65
Great!
You're right about the docs, these things should be mentioned :)
Rajan Chauhan
@cw-rajanchauhan
I am trying to build functionality that can give the user an estimated time for job completion.
To achieve that, the app-id used by Spark is required. Currently the mist REST endpoint doesn't expose this information in the API response; instead it returns the mist job id.
Ref : https://spark.apache.org/docs/2.4.3/monitoring.html#rest-api
Vadim Chelyshov
@dos65
@cw-rajanchauhan
There is a field `workerId`, and it's used as the Spark app-id. This field only appears when the state of the job changes to `running`.
Rajan Chauhan
@cw-rajanchauhan
`workerId` is used as the app name, but it can easily be used to extract the app-id via the Spark REST API.
Thanks for pointing that out, though.
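That lookup can be sketched as follows, assuming the standard Spark monitoring REST API (GET `/api/v1/applications` on the driver UI port, which returns each application's `id` and `name`); the `find_app_id_by_name` helper and the `localhost:4040` URL are illustrative, not part of mist:

```python
import json
from urllib.request import urlopen

def find_app_id_by_name(apps, app_name):
    """Pick the Spark app-id out of the /api/v1/applications payload by app name."""
    for app in apps:
        if app.get("name") == app_name:
            return app.get("id")
    return None

# Against a live Spark driver UI this would be used roughly like:
# apps = json.load(urlopen("http://localhost:4040/api/v1/applications"))
# app_id = find_app_id_by_name(apps, worker_id)  # worker_id taken from mist's API
```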
SemanticBeeng
@SemanticBeeng
Hello @dos65 - I am looking to understand how Mist cooperates with the various cluster managers that Spark works with, and more precisely how worker management works. What is the best place to learn about this, please?
SemanticBeeng
@SemanticBeeng

Also, I would like to get the data out in Apache Arrow format rather than marshalling it into any of the channel types currently available (Kafka and the like): instead I would like to take advantage of the integration between Spark and Arrow, and of Arrow's ability to serialize large data chunks efficiently and in platform-independent ways.

How would we do that in Mist? Can we discuss please?

Please see these for an understanding of Apache Arrow use cases and for inspiration on how Mist could be advanced to take part in them.

https://twitter.com/semanticbeeng/status/1139789288856571904
https://twitter.com/semanticbeeng/status/1153219404009791488
https://twitter.com/semanticbeeng/status/1139926450914680834
https://twitter.com/semanticbeeng/status/1159373385635381248
https://twitter.com/semanticbeeng/status/1162751545613705216

Vadim Chelyshov
@dos65

@SemanticBeeng
Hi. Mist works with any cluster manager that Spark supports, but it's hard to say that it cooperates with them, except for some internal things. To execute a job, it just submits a special spark-driver application into Spark.

To configure the cluster and other Spark settings, use the `sparkConf` section in the context configuration. See more here - https://hydrosphere.io/mist-docs/contexts.html
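As a rough illustration only - a hypothetical sketch, not taken verbatim from the linked docs; the context name and values are invented, and the exact layout should be checked against the contexts documentation:

```hocon
# Hypothetical context configuration: cluster and other Spark settings
# go into the spark-conf block of a named context.
my-context {
  spark-conf {
    spark.master = "spark://cluster-host:7077"  # assumed cluster URL
    spark.executor.memory = "2g"
  }
}
```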

Vadim Chelyshov
@dos65

About Arrow. Unfortunately, it's not possible.
First, returning large responses from functions isn't possible by design. The response size is limited, and responses should be used only to provide the most important information. For example, instead of returning a large resulting dataset, you may return an hdfs path where you saved it.
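A minimal Python sketch of that pattern, where the function writes the data out and returns only a small JSON-friendly summary; the `result_payload` helper and the output path are illustrative, not part of mist-lib:

```python
def result_payload(out_path, row_count):
    """Return where the data went plus light metadata, never the data itself."""
    return {"path": out_path, "rows": row_count, "format": "parquet"}

# Inside a Spark job this would be used roughly like (requires a live session):
#   df.write.mode("overwrite").parquet(out_path)
#   return result_payload(out_path, df.count())
```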

Also, it's almost impossible to support different output formats in functions. The main communication channel for mist is HTTP, so it's hard to return anything meaningful except JSON.

SemanticBeeng
@SemanticBeeng
I do not mean to "return" large data sets: in fact, large data sets should not be serialized if it can be avoided, especially while still in the middle of the processing logic, as in the case of feature extraction.

I am thinking of using something like Apache Crail https://crail.incubator.apache.org/overview/ for high-efficiency zero-copy transfer of large data sets between processes (and even machines / containers).

Crail uses DiSNI, a user-level network and storage stack for the Java virtual machine. DiSNI allows data to be exchanged in a zero-copy fashion between Java I/O memory and remote storage resources over RDMA.

If you have an interest in this direction, please let me know.

Al Malek
@almalek
I would like to run mist with Scala 2.13.1. Or can I add Scala 2.12 on my Mac and run mist?
Mithun Raj Arackal
@mithunvillae

@dos65 I was trying to run the example python job given in the repo
https://github.com/Hydrospheredata/hello_mist. I was able to upload the artifact to the mist server, and I can see the function in the UI. But when I try to run the hello-mist job, it throws the following error:

ERROR 2019-12-16T16:28:00.574 FailedEvent with Error:
java.lang.Exception: Error in python code:
at io.hydrosphere.mist.python.PythonCmd$class.invoke(PythonExecuter.scala:84)
at io.hydrosphere.mist.python.PythonFunctionExecutor.invoke(PythonExecuter.scala:142)
at io.hydrosphere.mist.worker.runners.python.PythonRunner.run(PythonRunner.scala:16)
at io.hydrosphere.mist.worker.WorkerActor$$anonfun$io$hydrosphere$mist$worker$WorkerActor$$run$1.apply(WorkerActor.scala:97)
at io.hydrosphere.mist.worker.WorkerActor$$anonfun$io$hydrosphere$mist$worker$WorkerActor$$run$1.apply(WorkerActor.scala:97)
at io.hydrosphere.mist.worker.CancellableFuture$$anon$1.run(CancellableFuture.scala:17)
at java.lang.Thread.run(Thread.java:748)

I haven't changed anything in the example repo or in the mist source code, and still the issue persists.
The example was working for Scala. I also wrote custom functions in Scala and was able to run them without any issues.

Mithun Raj Arackal
@mithunvillae
I tried older versions of mist, viz. 1.0.0, 1.1.1, 1.1.2, from http://repo.hydrosphere.io/hydrosphere/static/
But the issue still persists.
Vadim Chelyshov
@dos65
@mithunvillae There might be a problem with the python executable. Do you have python2 on your machine?
If not, you can fix it by changing this line - https://github.com/Hydrospheredata/hello_mist/blob/master/python/conf/10context.conf#L5 - to set the correct name of the python executable.
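For illustration only - a hypothetical sketch of what such a change might look like; the actual key name and layout come from the linked 10context.conf, and `spark.pyspark.python` is an assumed Spark-level setting:

```hocon
# Hypothetical: point the context at a python executable that actually
# exists on the machine (python3 instead of a missing python2).
spark-conf {
  "spark.pyspark.python" = "python3"  # assumed key; check the linked conf line
}
```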
Mithun Raj Arackal
@mithunvillae
@dos65 I just checked my machine. It does have python2.
Mithun Raj Arackal
@mithunvillae
I tried changing it to python3. Still receiving the same error.
Vadim Chelyshov
@dos65
Hmm, I need some time to try to reproduce it and to remember how to get more information out of the pythonExecutor.
Mithun Raj Arackal
@mithunvillae
Cool. Thanks :)
Vadim Chelyshov
@dos65

@mithunvillae

It's hard to say what is happening. Some error occurs with py4j and it doesn't report any error back. It just exits with a non-zero exit code.

There should be a message like "Running python" above the FailedEvent line.
Could you build the shell command from that log line and run it manually from the $MIST_HOME directory? There may be some stderr output that could help.

For example, I have the following log message: Running python task: List(python2, /home/dos65/projects/mist/target/mist-run-1.1.3/mist-worker.jar, --module, python_execute_script, --gateway-port, 37903), env List((PYTHONPATH,../../spark_local/spark-2.4.0-bin-hadoop2.7/python:../../spark_local/spark-2.4.0-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip), (PYSPARK_PYTHON,python2), (PYSPARK_DRIVER_PYTHON,python2))

And this is the command built from it: PYTHONPATH=../../spark_local/spark-2.4.0-bin-hadoop2.7/python:../../spark_local/spark-2.4.0-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip PYSPARK_PYTHON=python2 PYSPARK_DRIVER_PYTHON=python2 python2 /home/dos65/projects/mist/target/mist-run-1.1.3/mist-worker.jar --module python_execute_script --gateway-port 37903

Mithun Raj Arackal
@mithunvillae
@dos65 When I run the command using python3 I'm getting the following error
Command:
PYTHONPATH=/home/lenovo/Installations/spark-2.4.4-bin-hadoop2.7/python:/home/lenovo/Installations/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip PYSPARK_PYTHON=python3 PYSPARK_DRIVER_PYTHON=python3 python3 /home/lenovo/Installations/mist-1.1.3/mist-worker.jar --module python_execute_script --gateway-port 36309
Traceback (most recent call last):
  File "/home/lenovo/Installations/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 929, in _get_connection
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/lenovo/Installations/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1067, in start
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/lenovo/Installations/mist-1.1.3/mist-worker.jar/__main__.py", line 17, in <module>
  File "/home/lenovo/Installations/mist-1.1.3/mist-worker.jar/__main__.py", line 13, in _main
  File "/home/lenovo/Installations/mist-1.1.3/mist-worker.jar/mistpy/python_execute_script.py", line 55, in execution_cmd
  File "/home/lenovo/Installations/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 179, in java_import
  File "/home/lenovo/Installations/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 983, in send_command
  File "/home/lenovo/Installations/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 931, in _get_connection
  File "/home/lenovo/Installations/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 937, in _create_connection
  File "/home/lenovo/Installations/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1079, in start
py4j.protocol.Py4JNetworkError: An error occurred while trying to connect to the Java server (127.0.0.1:36309)
When I ran with python2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/lenovo/Installations/mist-1.1.3/mist-worker.jar/__main__.py", line 17, in <module>
  File "/home/lenovo/Installations/mist-1.1.3/mist-worker.jar/__main__.py", line 13, in _main
  File "/home/lenovo/Installations/mist-1.1.3/mist-worker.jar/mistpy/python_execute_script.py", line 55, in execution_cmd
  File "/home/lenovo/Installations/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 179, in java_import
  File "/home/lenovo/Installations/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 983, in send_command
  File "/home/lenovo/Installations/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 931, in _get_connection
  File "/home/lenovo/Installations/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 937, in _create_connection
  File "/home/lenovo/Installations/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1079, in start
py4j.protocol.Py4JNetworkError: An error occurred while trying to connect to the Java server (127.0.0.1:36309)
Mithun Raj Arackal
@mithunvillae
It's working with Spark version 2.4.0. I tried 2.4.3, but even that had the same issues. Thanks @dos65
Vadim Chelyshov
@dos65
@mithunvillae Got it. I've just hit the same problem with Spark 2.4.4.
Mithun Raj Arackal
@mithunvillae

@dos65 I'm getting the following error when running a Spark job:

ERROR 2019-12-24T15:13:03.514 FailedEvent with Error:
java.lang.IllegalArgumentException: Checksum of downloaded artifact spark_0.0.1.jar different from /home/lenovo/test/mist/target/mist-run-1.1.3/worker-default_70a56f68-debc-491f-928c-10daeedda4c6_1/spark_0.0.1.jar

I've uploaded a 360MB jar to the mist server. I'm able to run functions contained in smaller jars, but every function within the big jar ends up with the above error. I tried running the mist server from source, but ended up with the same error.

Vadim Chelyshov
@dos65
@mithunvillae Could you try adding the following setting to mist's config: `mist.workers.max-artifacts-size = 360m`?
Its default value is `250m`.
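As a config fragment (the setting name and values come from this thread; only the comment is mine):

```hocon
# Allow artifacts up to 360MB; the built-in default is 250m.
mist.workers.max-artifacts-size = 360m
```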
Mithun Raj Arackal
@mithunvillae
@dos65 Thanks. It worked.
daffydee13
@daffydee13
Hello, I am trying to run mist in Cloudera:
image.png
I am getting the following error:
image.png
daffydee13
@daffydee13
I upgraded the Cloudera quickstart VM to Spark 2.2; the Scala version is 2.11.8.
Need urgent help!
@dos65
deveshkataria
@deveshkataria
Hello, anyone up?
I am unable to install mist-cli.
It says no package or download link found.
Pavel Borobov
@blvp
@deveshkataria Could you please send us the full log after trying to install it with pip install mist-cli?
I tried to install it in a fresh virtualenv and it worked for both Python 2.7 and 3.7.