-Ylog-classpath
scalapy-numpy is unfortunately quite a bit out of date, but there is active work that will bring back static typing for NumPy and TensorFlow soon! In the meantime, you'll have to use the dynamically-typed APIs or define your own facades.
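If you go the hand-written facade route, a minimal sketch looks like this (it uses Python's built-in math module purely as an illustration; on Scala 2 the @py.native macro annotation needs -Ymacro-annotations enabled):
import me.shadaj.scalapy.py

// a tiny hand-written facade; method bodies are py.native stubs
@py.native trait PyMath extends py.Object {
  def sqrt(x: Double): Double = py.native
  def factorial(n: Int): Int = py.native
}

val pymath = py.module("math").as[PyMath]
println(pymath.sqrt(2.0))    // 1.4142135623730951
println(pymath.factorial(5)) // 120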
Hi. I know the facadeGen is alpha, but I just tried to run it and I get this error. Any suggestion on how to get it running? I just want to generate some facades for a few modules and play with it. I am only trying to generate facades for the "builtins" module at the moment while testing. I can see in your scalacon-live folder that the facadeGen did work at that point in time. Thanks.
[info] running (fork) me.shadaj.scalapy.facadegen.Main
[error] Exception in thread "main" me.shadaj.scalapy.py.PythonException: <class 'TypeError'> list object expected; got SequenceProxy
[error] at me.shadaj.scalapy.interpreter.CPythonInterpreter$.$anonfun$throwErrorIfOccured$2(CPythonInterpreter.scala:328)
[error] at me.shadaj.scalapy.interpreter.Platform$.Zone(Platform.scala:10)
[error] at me.shadaj.scalapy.interpreter.CPythonInterpreter$.$anonfun$throwErrorIfOccured$1(CPythonInterpreter.scala:314)
[error] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
Why does ScalaPy use .finalize alongside py.local? It is deprecated in Java 9; maybe people continue to use it either way?
There is a "better" (hackish) way to deal with that and have the memory GC-managed: allocate an Array[Byte] of a certain dimension, and then with ByteArray.at(0) you get the pointer to the start of the array's data section. When the original array is GC-collected, you have successfully freed the memory, so you don't have to use malloc.
import scala.scalanative.runtime.ByteArray

val arr = ByteArray.alloc(size) // size: how many bytes you need
val ptr = arr.at(0)             // your pointer to the start of the data section
// hold on to `arr`; you don't want to lose the reference to the ByteArray object ahead of time
Another option is extern "C" functions, but it is a bit of work, and if the Python library is fast enough and it's not what makes you slow, I don't see the reason to go that route.
The alternative to py.local is finalizers; since we expose Python objects through nice Scala wrappers, we are dependent on the garbage collector telling us when the Scala wrappers are no longer needed and therefore we can decrement the reference count for the Python value.
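For contrast, a small sketch of the two lifetimes (assuming the usual me.shadaj.scalapy.py import; the string module is just a stand-in for any Python value):
import me.shadaj.scalapy.py

// this wrapper's Python value is only released once the JVM GC finalizes it
val outside = py.module("string").ascii_lowercase

py.local {
  // values created in this block are released deterministically when it exits
  val inside = py.module("string").ascii_uppercase
  println(inside)
}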
@bogorman_twitter: this is a known limitation, since the proxy collections aren't exactly Python lists (but instead a sequence type), whereas copy collections are directly created as native lists
if the API you're using requires a list, copies are unfortunately the only way to go
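Concretely, something like this (a sketch using ScalaPy's SeqConverters; the proxy exposes the Scala collection through Python's sequence protocol, while the copy materializes a real Python list):
import me.shadaj.scalapy.py.SeqConverters

val values = Seq(1, 2, 3)
val proxy = values.toPythonProxy // cheap view, but shows up as SequenceProxy, not list
val copy  = values.toPythonCopy  // real Python list, accepted by APIs that require a list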
@shadaj:matrix.org I have Python 3.7 installed on my machine. I'm not using sbt but Gradle for the build, and our code is on Scala 2.11. I'm getting the following stack trace despite passing the JVM argument jna.library.path set to Python's lib directory:
java.lang.IllegalArgumentException: Can't determine class with native methods from the current context (class me.shadaj.scalapy.interpreter.CPythonAPIInterface$$anonfun$1)
at com.sun.jna.Native.findDirectMappedClass(Native.java:1473)
at com.sun.jna.Native.register(Native.java:1443)
at me.shadaj.scalapy.interpreter.CPythonAPIInterface$$anonfun$1.apply(CPythonAPI.scala:20)
Does ScalaPy support Windows + Python 3.10? The manual sbt config seems to invoke python3-config, which doesn't appear to be a thing on Windows, and python-native-libs seems to request sys.abiflags, which gives an error.
Also, since it looks like configs are set during compile time instead of runtime, is it possible to distribute programs using ScalaPy as a JAR (including cross-platform support) or must users compile from source?
@ducky64: I haven't tested ScalaPy myself with Windows, but in theory everything should just work (as in there is nothing hardcoded for *nix). You might want to try using python-native-libs
as described in https://scalapy.dev/docs/, or otherwise you may need to hardcode the dependency.
You can set the system property scalapy.python.library
to point to a specific Python dependency at runtime (as long as you do this before calling any ScalaPy APIs). We don't have built-in support for automatic configuration, but it may be interesting to use python-native-libs
at runtime to power cross-platform discovery.
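A rough sketch of that idea, assuming python-native-libs exposes Python().scalapyProperties as in its README, and that this code runs before the first ScalaPy call:
import ai.kien.python.Python

// discover the local interpreter and translate it into ScalaPy/JNA system properties
Python().scalapyProperties.foreach { props =>
  props.foreach { case (key, value) =>
    if (System.getProperty(key) == null) System.setProperty(key, value)
  }
}

// only after the properties are set is it safe to touch ScalaPy
import me.shadaj.scalapy.py
println(py.module("sys").version)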
Python 3.10 isn't officially supported yet, and you'll need to set the scalapy.python.library
property or the SCALAPY_PYTHON_LIBRARY
environment variable manually to try using it. But I'll look into testing with that in CI and making support official!
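For example (the exact library name is an assumption; use whatever matches your install):
// as a JVM flag: -Dscalapy.python.library=python3.10
// or from code, before the first ScalaPy call:
System.setProperty("scalapy.python.library", "python3.10")
// or as an environment variable: export SCALAPY_PYTHON_LIBRARY=python3.10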
Hey @shadaj:matrix.org, firstly, I really appreciate the work that you have put into building ScalaPy! After using it extensively in day-to-day tasks, it has proven to be a great asset in reducing the manual effort of converting Pandas code to Scala.
I have been trying to run a custom PySpark script from a Scala-based notebook on Databricks, but I am facing an issue when I try to pass a Spark session to a function in a custom Python package: ScalaPy throws a type mismatch error. I have attached the error below for your reference.
Importing required libraries using ScalaPy
val pd = py.module("pandas")
val s3fs = py.module("s3fs")
val py_spark_sql = py.module("pyspark.sql")
val pyspark_package = py.module("pyspark_tier2_test.pyspark_tier2_test")
This is the error I get when I pass the spark session to the function in my custom package:
val result_df = pyspark_package.py_driver_func(py_spark_sql.SparkSession)
command-3273291744808514:1: error: type mismatch;
found : org.apache.spark.SparkSession
required: me.shadaj.scalapy.py.Any
Pandas works perfectly with ScalaPy, but I have a requirement to make PySpark scripts run with ScalaPy in order to make things more scalable and distributed!
Can you please suggest a fix or point me in the right direction? Any help will be much appreciated!
py_spark_sql.SparkSession should automatically be py.Any since it's just a member of another Python module. I wonder if the Databricks notebook environment is doing something funky. Could you try printing out the type of py_spark_sql.SparkSession (py_spark_sql.SparkSession.getClass)?
Also, could you try storing py_spark_sql.SparkSession in a variable before using it?
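Something along these lines, reusing the names from your snippet (the explicit py.Any ascription is the point; it keeps the notebook from resolving Scala's own SparkSession instead of the Python attribute):
val sparkSessionPy: py.Any = py_spark_sql.SparkSession
println(sparkSessionPy.getClass)

val result_df = pyspark_package.py_driver_func(sparkSessionPy)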
"Exception in thread "main" me.shadaj.scalapy.py.PythonException: <class 'ModuleNotFoundError'> No module named 'xgboost'"
Is xgboost installed right next to numpy and the modules that do work?
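One quick way to check is to ask the embedded interpreter itself which Python it is and where it looks for modules (a sketch):
import me.shadaj.scalapy.py

println(py.module("sys").executable) // which Python ScalaPy actually loaded
println(py.module("sys").path)       // where it searches for modules like xgboost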
3.9.9 and anaconda3-2021.11. Has anyone had experience with this approach and been able to share any pointers? Many thanks in advance!
To make this slightly easier, I've removed pyenv from the equation and pushed this skeleton example to github.
At this point, upon executing runMain hello in the sbt shell, the error begins with:
java.lang.UnsatisfiedLinkError: Unable to load library 'python3':
dlopen(libpython3.dylib, 0x0009): tried: '/Applications/IntelliJ IDEA CE.app/Contents/jbr/Contents/Home/bin/../lib/jli/libpython3.dylib' (no such file) ...
And it's correct, that file doesn't exist; it's actually /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9.dylib. But how do I get this to be picked up?
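One thing to try is pointing JNA and ScalaPy at that directory explicitly before the interpreter is loaded, for example in build.sbt for a forked run (a sketch; the directory and library name are taken from the path in your message, not verified):
// build.sbt
fork := true
javaOptions ++= Seq(
  "-Djna.library.path=/Library/Frameworks/Python.framework/Versions/3.9/lib",
  "-Dscalapy.python.library=python3.9"
)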