@bogorman_twitter: this is a known limitation, since the proxy collections aren't exactly Python lists (but instead a sequence type), whereas copy collections are directly created as native lists
if the API you're using requires a list, copies are unfortunately the only way to go
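For context, a minimal sketch of the difference (assuming ScalaPy's standard SeqConverters import):

```scala
import me.shadaj.scalapy.py
import me.shadaj.scalapy.py.SeqConverters

val xs = Seq(1, 2, 3)

// Proxy: Python sees a sequence-like wrapper around the Scala collection, no copying.
val proxy = xs.toPythonProxy

// Copy: the elements are materialized into a native Python list up front.
val copy = xs.toPythonCopy

// If an API insists on a real list, you can also convert a proxy on the Python side:
val asList = py.module("builtins").list(proxy)
```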
@shadaj:matrix.org I have Python 3.7 installed on my machine. I'm not using sbt but Gradle for the build, and our code is on Scala 2.11. I'm getting the following stack trace despite passing the JVM argument jna.library.path set to Python's lib directory:
java.lang.IllegalArgumentException: Can't determine class with native methods from the current context (class me.shadaj.scalapy.interpreter.CPythonAPIInterface$$anonfun$1)
    at com.sun.jna.Native.findDirectMappedClass(Native.java:1473)
    at com.sun.jna.Native.register(Native.java:1443)
    at me.shadaj.scalapy.interpreter.CPythonAPIInterface$$anonfun$1.apply(CPythonAPI.scala:20)

Does ScalaPy support Windows + Python 3.10? The manual sbt config seems to invoke python3-config, which doesn't appear to exist on Windows, and python-native-libs seems to request sys.abiflags, which gives an error.
Also, since it looks like configs are set during compile time instead of runtime, is it possible to distribute programs using ScalaPy as a JAR (including cross-platform support) or must users compile from source?
@ducky64: I haven't tested ScalaPy myself on Windows, but in theory everything should just work (as in, nothing is hardcoded for *nix). You might want to try using python-native-libs as described in https://scalapy.dev/docs/, or otherwise you may need to hardcode the dependency.
You can set the system property scalapy.python.library to point to a specific Python dependency at runtime (as long as you do this before calling any ScalaPy APIs). We don't have built-in support for automatic configuration, but it may be interesting to use python-native-libs at runtime to power cross-platform discovery.
Python 3.10 isn't officially supported yet, and you'll need to set the scalapy.python.library property or the SCALAPY_PYTHON_LIBRARY environment variable manually to try using it. But I'll look into testing with that in CI and making support official!
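A minimal sketch of that runtime configuration (the library name "python3.10" here is an illustrative value; use whatever matches your install):

```scala
import me.shadaj.scalapy.py

object Main {
  def main(args: Array[String]): Unit = {
    // Must happen before the first ScalaPy call, which is when the
    // interpreter library gets loaded.
    System.setProperty("scalapy.python.library", "python3.10")

    // First ScalaPy call; the property above is read at this point.
    val sys = py.module("sys")
    println(sys.version)
  }
}
```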
Hey @shadaj:matrix.org, firstly, I really appreciate the work you have put into building ScalaPy! After using it extensively in day-to-day tasks, it has proven to be a great asset in reducing the manual effort of converting Pandas code to Scala.
I have been trying to run a custom PySpark script from a Scala-based notebook on Databricks, but I'm facing an issue when I pass a Spark session to a function in a custom Python package: ScalaPy throws a type mismatch error. I've attached the error below for your reference.
Importing the required libraries using ScalaPy:
val pd = py.module("pandas")
val s3fs = py.module("s3fs")
val py_spark_sql = py.module("pyspark.sql")
val pyspark_package = py.module("pyspark_tier2_test.pyspark_tier2_test")
This is the error I get when I pass the spark session to the function in my custom package:
val result_df = pyspark_package.py_driver_func(py_spark_sql.SparkSession)
command-3273291744808514:1: error: type mismatch;
found : org.apache.spark.SparkSession
required: me.shadaj.scalapy.py.Any
Pandas works perfectly with ScalaPy, but I have a requirement to make PySpark scripts run with ScalaPy in order to make things more scalable and distributed!
Can you please suggest a fix or point me in the right direction? Any help will be much appreciated!
py_spark_sql.SparkSession should automatically be a py.Any since it's just a member of another Python module. I wonder if the Databricks notebook environment is doing something funky. Could you try printing out the type of py_spark_sql.SparkSession (py_spark_sql.SparkSession.getClass)?
Also, could you try storing py_spark_sql.SparkSession in a variable before using it?
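Something like the following, reusing the module values from the snippet above (a sketch, not tested on Databricks):

```scala
// Store the attribute access in a val; it should infer as py.Dynamic,
// which is a subtype of py.Any.
val sparkSessionClass = py_spark_sql.SparkSession
println(sparkSessionClass.getClass) // expect a ScalaPy wrapper class, not a Spark class

// Then pass the stored value to the driver function.
val result_df = pyspark_package.py_driver_func(sparkSessionClass)
```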
"Exception in thread "main" me.shadaj.scalapy.py.PythonException: <class 'ModuleNotFoundError'> No module named 'xgboost'"
Is xgboost installed right next to numpy and the modules that do work?
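One way to check from ScalaPy itself which interpreter and module search path are actually in play (a sketch; sys.executable and sys.path are standard Python attributes):

```scala
import me.shadaj.scalapy.py

// If the directory containing xgboost isn't on this path, the import will
// fail even though it works in your normal Python shell.
val sys = py.module("sys")
println(sys.executable)
println(sys.path)
```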
I'm using Python 3.9.9 and anaconda 3-2011.11. Has anyone had experience with this approach and could share any pointers? Many thanks in advance!
To make this slightly easier, I've removed pyenv from the equation and pushed this skeleton example to GitHub.
At this point, upon executing runMain hello in the sbt shell, the error begins with:
java.lang.UnsatisfiedLinkError: Unable to load library 'python3':
dlopen(libpython3.dylib, 0x0009): tried: '/Applications/IntelliJ IDEA CE.app/Contents/jbr/Contents/Home/bin/../lib/jli/libpython3.dylib' (no such file) ...
And it's correct, that file doesn't exist; it's actually /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9.dylib. But how do I get this to be picked up?
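One thing to try is pointing JNA at the framework's lib directory and pinning the library name in build.sbt. A sketch using the path from the error above; the property values are assumptions about this particular install:

```scala
// build.sbt
fork := true

// Tell JNA where to look for the Python dylib...
javaOptions += "-Djna.library.path=/Library/Frameworks/Python.framework/Versions/3.9/lib"

// ...and optionally pin the exact library name instead of the bare "python3" default.
javaOptions += "-Dscalapy.python.library=python3.9"
```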
Next, I've switched to my local install of anaconda by changing the path to python in build.sbt
:
lazy val python = Python("/opt/anaconda3/bin/python3.9")
and the existing example works fine.
However, when I then try to experiment with numpy at runtime, a particular library can't be loaded:
[info] INTEL MKL ERROR: dlopen(/opt/anaconda3/lib/libmkl_intel_thread.1.dylib, 0x0009): Library not loaded: @rpath/libiomp5.dylib
I notice that /opt/anaconda3/lib/libiomp5.dylib does exist, although /opt/anaconda3/lib/libmkl_intel_thread.1.dylib does not.
Has anyone experienced a similar problem?
[info] INTEL MKL ERROR: dlopen(/opt/anaconda3/lib/libmkl_intel_thread.1.dylib, 0x0009): Library not loaded: @rpath/libiomp5.dylib
[info] Referenced from: /opt/anaconda3/lib/libmkl_intel_thread.1.dylib
[info] Reason: tried: '/Applications/IntelliJ IDEA CE.app/Contents/jbr/Contents/Home/bin/../lib/jli/libiomp5.dylib' (no such file), '/usr/lib/libiomp5.dylib' (no such file).
[info] Intel MKL FATAL ERROR: Cannot load libmkl_intel_thread.1.dylib.
On the executable /opt/anaconda3/bin/python3.9 it would appear (from using otool) that LC_RPATH is correct:
Load command 14
cmd LC_RPATH
cmdsize 272
path /opt/anaconda3/lib (offset 12)
and in /opt/anaconda3/lib/libmkl_intel_thread.1.dylib itself, I see:
Load command 10
cmd LC_LOAD_DYLIB
cmdsize 48
name @rpath/libiomp5.dylib (offset 24)
I'm in a world of macOS/rpath pain now and well out of my depth, but none of the above looks incorrect to me.
Would anyone care to venture why it doesn't pick up @rpath/libiomp5.dylib from /opt/anaconda3/lib?
Try setting javaOptions specifically for the Docker image (I believe javaOptions in Universal should work) to -Djna.library.path=$pythonLibsDir, where $pythonLibsDir is replaced with the Python installation path that python3-config prints in the container.
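For example, a build.sbt sketch for sbt-native-packager (the path here is illustrative; substitute what python3-config reports inside your container):

```scala
// build.sbt: applies only to the packaged app, i.e. what runs in the Docker image
javaOptions in Universal += "-Djna.library.path=/usr/local/lib"
```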
import ai.kien.python.Python
Python().scalapyProperties.fold(
ex => println(s"Error while getting ScalaPy properties: $ex"),
props => props.foreach { case(k, v) => System.setProperty(k, v) }
)