extern "C" functions, but it is a bit of work, and if the Python library is fast enough and isn't what makes you slow, I don't see a reason to go that route.
py.local is finalizers; since we expose Python objects through nice Scala wrappers, we depend on the garbage collector telling us when the Scala wrappers are no longer needed so that we can decrement the reference count for the Python value.
@bogorman_twitter: this is a known limitation, since the proxy collections aren't exactly Python lists (but instead a sequence type), whereas copy collections are directly created as native lists
if the API you're using requires a list, copies are unfortunately the only way to go
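To make the proxy/copy distinction above concrete, here is a minimal sketch. It assumes ScalaPy is on the classpath with a working Python 3 install; the object name ProxyVsCopy is made up for illustration, and the snippet only prints what isinstance reports rather than promising a particular output.

```scala
import me.shadaj.scalapy.py
import me.shadaj.scalapy.py.SeqConverters

// Hypothetical demo object, not part of ScalaPy itself.
object ProxyVsCopy {
  def main(args: Array[String]): Unit = {
    val builtins = py.module("builtins")
    val xs = Seq(1, 2, 3)

    // Proxy: Python sees a sequence-like view backed by the Scala collection.
    val proxy = xs.toPythonProxy
    // Copy: the elements are copied into a newly created Python list.
    val copy = xs.toPythonCopy

    // Ask Python whether each value is a native list.
    println(builtins.isinstance(proxy, builtins.list))
    println(builtins.isinstance(copy, builtins.list))
  }
}
```

If an API checks for a list explicitly, passing the copy is the option that should satisfy it, at the cost of copying the data.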
@shadaj:matrix.org I have Python 3.7 installed on my machine. I'm not using sbt but Gradle for the build. Our code uses Scala 2.11. I'm getting the following stack trace despite passing the JVM argument jna.library.path with the value of Python's lib directory:
java.lang.IllegalArgumentException: Can't determine class with native methods from the current context (class me.shadaj.scalapy.interpreter.CPythonAPIInterface$$anonfun$1)
  at com.sun.jna.Native.findDirectMappedClass(Native.java:1473)
  at com.sun.jna.Native.register(Native.java:1443)
  at me.shadaj.scalapy.interpreter.CPythonAPIInterface
Does ScalaPy support Windows + Python 3.10? The manual sbt config seems to invoke python3-config which doesn't appear to be a thing on Windows, and python-native-libs seems to request sys.abiflags which gives an error.
Also, since it looks like configs are set during compile time instead of runtime, is it possible to distribute programs using ScalaPy as a JAR (including cross-platform support) or must users compile from source?
@ducky64: I haven't tested ScalaPy myself with Windows, but in theory everything should just work (as in there is nothing hardcoded for *nix). You might want to try using python-native-libs as described in https://scalapy.dev/docs/, or otherwise you may need to hardcode the dependency.
You can set the system property scalapy.python.library to point to a specific Python dependency at runtime (as long as you do this before calling any ScalaPy APIs). We don't have built-in support for automatic configuration, but it may be interesting to use python-native-libs at runtime to power cross-platform discovery.
Python 3.10 isn't officially supported yet, and you'll need to set the scalapy.python.library property or the SCALAPY_PYTHON_LIBRARY environment variable manually to try using it. But I'll look into testing with that in CI and making support official!
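As a rough sketch of the runtime configuration described above: the property has to be set before the first ScalaPy call. The object name ConfigureScalaPy, the configure helper, and the "python3.10" default are assumptions for illustration; the right library name depends on the machine.

```scala
import me.shadaj.scalapy.py

// Hypothetical entry point, not part of ScalaPy itself.
object ConfigureScalaPy {
  // Sets the property ScalaPy reads to locate the Python native library.
  def configure(library: String): Unit =
    System.setProperty("scalapy.python.library", library)

  def main(args: Array[String]): Unit = {
    // Assumption: the library name comes from the first argument,
    // falling back to "python3.10".
    configure(args.headOption.getOrElse("python3.10"))

    // First ScalaPy call happens only after the property is set.
    val pySys = py.module("sys")
    println(pySys.version)
  }
}
```

Because the value is read at runtime, the same JAR could in principle be shipped to machines with different Python installs, with each one passing its own library name.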
Hey @shadaj:matrix.org, firstly, I really appreciate the work that you have put into building ScalaPy! After using it extensively in day-to-day tasks, it has proven to be a great asset in reducing the manual effort of converting Pandas code to Scala.
I have been trying to run a custom PySpark script from a Scala-based notebook on Databricks, but I'm facing an issue when I try to pass a Spark session to a function in a custom Python package. ScalaPy throws a type mismatch error. I have attached the error below for your reference.
Importing required libraries using ScalaPy:
val pd = py.module("pandas")
val s3fs = py.module("s3fs")
val py_spark_sql = py.module("pyspark.sql")
val pyspark_package = py.module("pyspark_tier2_test.pyspark_tier2_test")
This is the error I get when I pass the spark session to the function in my custom package:
val result_df = pyspark_package.py_driver_func(py_spark_sql.SparkSession)
command-3273291744808514:1: error: type mismatch;
 found   : org.apache.spark.SparkSession
 required: me.shadaj.scalapy.py.Any
Pandas works perfectly with ScalaPy, but I have a requirement to make PySpark scripts run with ScalaPy in order to make things more scalable and distributed!
Can you please suggest a fix or point me in the right direction? Any help will be much appreciated!
py_spark_sql.SparkSession should automatically be py.Any since it's just a member of another Python module. I wonder if the Databricks notebook environment is doing something funky. Could you try printing out the type of py_spark_sql.SparkSession in a variable before using it?
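One way to run the check suggested above might look like the sketch below. The describe helper and the InspectValue object are made up for illustration, and the example assumes pyspark is importable from the Python install ScalaPy uses. The explicit py.Dynamic type annotation is deliberate: if the notebook resolves the name to something other than a ScalaPy value, the mismatch should show up on that line.

```scala
import me.shadaj.scalapy.py

// Hypothetical helper object, not part of ScalaPy itself.
object InspectValue {
  // Reports the JVM class of the wrapper together with the value's
  // string form (ScalaPy values stringify through Python).
  def describe(value: py.Any): String =
    s"${value.getClass.getName}: $value"

  def main(args: Array[String]): Unit = {
    val pySparkSql = py.module("pyspark.sql")

    // Store the member in a typed variable before passing it anywhere.
    val sparkSession: py.Dynamic = pySparkSql.SparkSession
    println(describe(sparkSession))
  }
}
```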
"Exception in thread "main" me.shadaj.scalapy.py.PythonException: <class 'ModuleNotFoundError'> No module named 'xgboost'"
xgboost right next to numpy and the modules that do work?
3-2011.11. Has anyone had experience with this approach who could share any pointers? Many thanks in advance!
To make this slightly easier, I've removed pyenv from the equation and pushed this skeleton example to GitHub.
At this point, upon executing runMain hello in the sbt shell, the error begins with:
java.lang.UnsatisfiedLinkError: Unable to load library 'python3': dlopen(libpython3.dylib, 0x0009): tried: '/Applications/IntelliJ IDEA CE.app/Contents/jbr/Contents/Home/bin/../lib/jli/libpython3.dylib' (no such file) ...
And it's correct, that file doesn't exist; it's actually /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9.dylib, but how do I get this to be picked up?
Next, I've switched to my local install of Anaconda by changing the path to Python in
lazy val python = Python("/opt/anaconda3/bin/python3.9")
and the existing example works fine.
However, when I then try to experiment with numpy, at runtime a particular library can't be loaded:
[info] INTEL MKL ERROR: dlopen(/opt/anaconda3/lib/libmkl_intel_thread.1.dylib, 0x0009): Library not loaded: @rpath/libiomp5.dylib
I notice that /opt/anaconda3/lib/libiomp5.dylib does exist, although /opt/anaconda3/lib/libmkl_intel_thread.1.dylib does not.
Has anyone experienced a similar problem?
[info] INTEL MKL ERROR: dlopen(/opt/anaconda3/lib/libmkl_intel_thread.1.dylib, 0x0009): Library not loaded: @rpath/libiomp5.dylib
[info] Referenced from: /opt/anaconda3/lib/libmkl_intel_thread.1.dylib
[info] Reason: tried: '/Applications/IntelliJ IDEA CE.app/Contents/jbr/Contents/Home/bin/../lib/jli/libiomp5.dylib' (no such file), '/usr/lib/libiomp5.dylib' (no such file).
[info] Intel MKL FATAL ERROR: Cannot load libmkl_intel_thread.1.dylib.
On the executable /opt/anaconda3/bin/python3.9 it would appear (from using …) that LC_RPATH is correct:
Load command 14
      cmd LC_RPATH
  cmdsize 272
     path /opt/anaconda3/lib (offset 12)
/opt/anaconda3/lib/libmkl_intel_thread.1.dylib itself, I see:
Load command 10
      cmd LC_LOAD_DYLIB
  cmdsize 48
     name @rpath/libiomp5.dylib (offset 24)
I'm in a world of macOS/rpath pain now and well out of my depth, but none of the above looks incorrect to me.
Would anyone care to venture why it doesn't pick up