Lorenzo Gabriele
@lolgab
Hi @shadaj:matrix.org,
I was playing with ScalaPy in Scala Native, and whatever I do I leak memory (unless I use py.local).
I wanted to ask you why memory can't be freed automatically in Scala Native.
Is it because Scala Native doesn't support finalize? It has been deprecated since Java 9, but maybe people continue to use it anyway?
If it is because you need to call malloc, there is a "better" (hackish) way to deal with that and still have the memory managed by the GC.
You can allocate an Array[Byte] of a certain size, and then with ByteArray.at(0) you get a pointer to the start of the array's data section. When the original array is garbage collected, the memory is freed with it, so you don't have to use malloc.
import scala.scalanative.runtime.ByteArray

val arr = ByteArray.alloc(size) // GC-managed allocation (size = number of bytes), no malloc needed
val ptr = arr.at(0)             // pointer to the start of the array's data section
// Keep a reference to `arr` alive for as long as `ptr` is in use;
// once the GC collects the array, the memory behind the pointer is gone.
ScalaPy could be the thing that allows us in Scala Native to have a proper ecosystem of utility libraries to draw from, while still using pure Scala for the main program. It would be great if it worked out of the box without memory leaks, so you could call Python libraries and forget about it.
I'm particularly interested in the cloud native ecosystem, which is covered really thoroughly in Python!
Eric K Richardson
@ekrich
@lolgab Could you explain a bit more your take on ScalaPy and how this fits into Scala Native and Cloud Native Foundation tools?
Lorenzo Gabriele
@lolgab
@ekrich This answer from Odersky plus the two replies explains very well why Scala Native would play nicely with Python: https://contributors.scala-lang.org/t/scala-native-next-steps/4216/75
About Cloud Native tools, there's nothing special about Python there; it is just that Cloud Native is the area I'm personally interested in, and official client libraries for major clouds like AWS exist only for a few languages: C++, Go, Java, JavaScript, .NET, Node.js, PHP, Python, and Ruby.
Scala Native can use C++ libraries if you write C++ glue code with extern "C" functions, but that is a bit of work, and if the Python library is fast enough that it isn't your bottleneck, I don't see a reason to go that route.
Other dynamic languages like PHP, Ruby, etc. could probably be integrated with a ScalaPy-like library, but I think the Python ecosystem is bigger, and we already have ScalaPy!
Eric K Richardson
@ekrich
@lolgab I see, thanks, that makes total sense.
shadaj
@shadaj:matrix.org
[m]
@lolgab: the reason for needing py.local is finalizers; since we expose Python objects through nice Scala wrappers, we depend on the garbage collector telling us when the Scala wrappers are no longer needed, and therefore when we can decrement the reference count for the Python value
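For illustration, the finalizer approach works roughly like the sketch below (generic JVM code with hypothetical names, not ScalaPy's actual internals; decRef stands in for a native call like CPython's Py_DecRef):

object NativeStub {
  def decRef(pyObject: Long): Unit = () // would decrement the Python refcount
}

final class PyWrapper(val pyObject: Long /* PyObject* */) {
  // Invoked by the JVM GC when this wrapper becomes unreachable; without
  // finalize (or an equivalent GC hook), nothing tells us when to release
  // the Python reference -- which is exactly what Scala Native lacks.
  override protected def finalize(): Unit = NativeStub.decRef(pyObject)
}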
@joesan: take a look at https://scalapy.dev/docs/static-types for some examples

@bogorman_twitter: this is a known limitation, since the proxy collections aren't exactly Python lists (but instead a sequence type), whereas copy collections are directly created as native lists

if the API you're using requires a list, copies are unfortunately the only way to go

Lorenzo Gabriele
@lolgab
@shadaj:matrix.org Any plan to migrate away from finalizers? They are flagged for removal in Java 17 and will never be supported in Scala Native.
Something like AutoCloseable would work, but at a developer-experience cost.
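For context, the AutoCloseable route would look roughly like this (a generic sketch, not a proposed ScalaPy API; the pointer and release call are placeholders):

import scala.util.Using

// Hypothetical wrapper that releases its Python value when closed.
final class ManagedPyValue(val pyObject: Long) extends AutoCloseable {
  override def close(): Unit = () // stand-in for decrementing the Python refcount
}

object UsageDemo {
  def main(args: Array[String]): Unit =
    // The developer-experience cost: every value must be scoped explicitly.
    Using.resource(new ManagedPyValue(0L)) { v =>
      println(v.pyObject) // the value is released as soon as this block exits
    }
}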
shadaj
@shadaj:matrix.org
[m]
@lolgab: I think the solution would be to switch to something like phantom references, with a separate thread responsible for clearing out freed Python values. But one way or another we need some support from Scala Native to have the GC notify ScalaPy of such changes
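As a rough illustration of that approach (a generic JVM sketch with placeholder names, not ScalaPy code):

import java.lang.ref.{PhantomReference, ReferenceQueue}
import scala.collection.mutable

object PyCleaner {
  private val queue = new ReferenceQueue[AnyRef]
  // Holding the references here keeps them reachable until they are enqueued,
  // and remembers which Python pointer each one guards.
  private val pending = mutable.Map.empty[PhantomReference[AnyRef], Long]

  def register(wrapper: AnyRef, pyPointer: Long): Unit = synchronized {
    pending(new PhantomReference[AnyRef](wrapper, queue)) = pyPointer
  }

  private def decRef(ptr: Long): Unit = () // stand-in for the native refcount decrement

  // A daemon thread drains the queue as wrappers get collected.
  private val cleaner = new Thread(() => {
    while (true) {
      val ref = queue.remove().asInstanceOf[PhantomReference[AnyRef]]
      synchronized(pending.remove(ref)).foreach(decRef)
    }
  })
  cleaner.setDaemon(true)
  cleaner.start()
}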
Dhruva Bharadwaj
@dhruva-clari
@shadaj:matrix.org I have Python 3.7 installed on my machine. I'm not using sbt but Gradle for the build, and our code is on Scala 2.11. I'm getting the following stack trace despite passing the jna.library.path JVM argument pointing at Python's lib directory:
java.lang.IllegalArgumentException: Can't determine class with native methods from the current context (class me.shadaj.scalapy.interpreter.CPythonAPIInterface$$anonfun$1)
    at com.sun.jna.Native.findDirectMappedClass(Native.java:1473)
    at com.sun.jna.Native.register(Native.java:1443)
    at me.shadaj.scalapy.interpreter.CPythonAPIInterface$$anonfun$1.apply(CPythonAPI.scala:20)
    at me.shadaj.scalapy.interpreter.CPythonAPIInterface$$anonfun$1.apply(CPythonAPI.scala:19)
    at scala.collection.immutable.Stream.map(Stream.scala:418)
    at me.shadaj.scalapy.interpreter.CPythonAPIInterface.<init>(CPythonAPI.scala:19)
    at me.shadaj.scalapy.interpreter.CPythonAPI$.<init>(CPythonAPI.scala:111)
    at me.shadaj.scalapy.interpreter.CPythonAPI$.<clinit>(CPythonAPI.scala)
    at me.shadaj.scalapy.interpreter.CPythonInterpreter$.<init>(CPythonInterpreter.scala:9)
    at me.shadaj.scalapy.interpreter.CPythonInterpreter$.<clinit>(CPythonInterpreter.scala)
    at me.shadaj.scalapy.py.package$.<init>(package.scala:15)
    at me.shadaj.scalapy.py.package$.<clinit>(package.scala)
shadaj
@shadaj:matrix.org
[m]
@dhruva-clari: ah, unfortunately ScalaPy only supports Scala 2.12 and up, so you'll likely need to upgrade to use ScalaPy
also, the KaTeX parse error in your paste is just the chat renderer tripping over the $$ in $$anonfun$1, not anything JNA is doing
Richard Lin
@ducky64

Does ScalaPy support Windows + Python 3.10? The manual sbt config seems to invoke python3-config which doesn't appear to be a thing on Windows, and python-native-libs seems to request sys.abiflags which gives an error.

Also, since it looks like configs are set during compile time instead of runtime, is it possible to distribute programs using ScalaPy as a JAR (including cross-platform support) or must users compile from source?

shadaj
@shadaj:matrix.org
[m]

@ducky64: I haven't tested ScalaPy myself with Windows, but in theory everything should just work (as in there is nothing hardcoded for *nix). You might want to try using python-native-libs as described in https://scalapy.dev/docs/, or otherwise you may need to hardcode the dependency.

You can set the system property scalapy.python.library to point at a specific Python shared library at runtime (as long as you do this before calling any ScalaPy APIs). We don't have built-in support for automatic configuration, but it may be interesting to use python-native-libs at runtime to power cross-platform discovery; see the sketch below.
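A minimal sketch of that runtime configuration (the library name "python3.9" is an illustrative value; use the one matching your installation):

import me.shadaj.scalapy.py

object Main {
  def main(args: Array[String]): Unit = {
    // Must be set before the first ScalaPy call, which is what loads the interpreter.
    System.setProperty("scalapy.python.library", "python3.9")

    val sys = py.module("sys")
    println(sys.version)
  }
}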

Python 3.10 isn't officially supported yet, and you'll need to set the scalapy.python.library property or the SCALAPY_PYTHON_LIBRARY environment variable manually to try using it. But I'll look into testing with that in CI and making support official!

Darsh Selarka
@darshselarka1497

Hey @shadaj:matrix.org, firstly, I really appreciate the work you have put into building ScalaPy! After using it extensively in day-to-day tasks, it has proven to be a great asset in reducing the manual effort of converting Pandas code to Scala.

I have been trying to run a custom PySpark script from a Scala-based notebook on Databricks, but I'm facing an issue when I try to pass a Spark session to a function in a custom Python package: ScalaPy throws a type mismatch error. I have attached the error below for your reference.

Importing the required libraries using ScalaPy:

val pd = py.module("pandas")
val s3fs = py.module("s3fs")
val py_spark_sql = py.module("pyspark.sql")
val pyspark_package = py.module("pyspark_tier2_test.pyspark_tier2_test")

This is the error I get when I pass the Spark session to the function in my custom package:

val result_df = pyspark_package.py_driver_func(py_spark_sql.SparkSession)

command-3273291744808514:1: error: type mismatch;
 found   : org.apache.spark.SparkSession
 required: me.shadaj.scalapy.py.Any

Pandas works perfectly with ScalaPy, but I have a requirement to make PySpark scripts run with ScalaPy in order to make things more scalable and distributed!
Can you please suggest a fix or point me in the right direction? Any help will be much appreciated!

shadaj
@shadaj:matrix.org
[m]
@darshselarka1497: hmm, this is odd; py_spark_sql.SparkSession should automatically be py.Any since it's just a member of another Python module. I wonder if the Databricks notebook environment is doing something funky. Could you try printing out the type of py_spark_sql.SparkSession (py_spark_sql.SparkSession.getClass)?
Darsh Selarka
@darshselarka1497
@shadaj:matrix.org This is the output for the class type
class me.shadaj.scalapy.py.AnyDynamics$$anon$15$$anon$16
shadaj
@shadaj:matrix.org
[m]
@darshselarka1497: hmm, that's weird, in theory things should compile then; maybe you can try storing py_spark_sql.SparkSession in a variable before using it?
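Concretely, that suggestion looks like the sketch below (reusing the names from the earlier snippet; whether binding to a val sidesteps whatever the notebook environment is doing is exactly what's being tested):

// Bind the Python class to a val with an explicit ScalaPy type first,
// then pass the val into the Python function.
val sparkSessionClass: py.Any = py_spark_sql.SparkSession
val result_df = pyspark_package.py_driver_func(sparkSessionClass)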
Pascal Méheut
@pascal_meheut_gitlab
Hi. I'm running ScalaPy with Python 3.9 in Anaconda on a Mac. It works fine; I just had to configure jna.library.path manually because python3-config returns something wrong. But some modules cannot be imported: numpy and pandas work fine, but when I try to import feather or xgboost, I get the message
"Exception in thread "main" me.shadaj.scalapy.py.PythonException: <class 'ModuleNotFoundError'> No module named 'xgboost'"
shadaj
@shadaj:matrix.org
[m]
@pascal_meheut_gitlab: hmm, are these packages installed in your Anaconda environment as well? Those should work out of the box; if you're using a virtualenv you need to follow the instructions at https://scalapy.dev/docs/#virtualenv though
Pascal Méheut
@pascal_meheut_gitlab
Yes, these packages are installed: they are my bread and butter. I'm not using virtualenv at all, just Anaconda.
shadaj
@shadaj:matrix.org
[m]
that's surprising, is the installation of xgboost right next to numpy and the modules that do work?
Pascal Méheut
@pascal_meheut_gitlab
Yes. Everything is in $HOME/opt/anaconda3/envs/lbo/lib/python3.9/site-packages
lbo being my environment name. I'll test on another Mac and on Linux & Windows tomorrow.
Pascal Méheut
@pascal_meheut_gitlab
Ok, this was a problem with my installation. I removed Anaconda, reinstalled it, recreated the environment and now it works. Thanks.
Pascal Méheut
@pascal_meheut_gitlab
Another question: has anybody written a facade, or explained how to use a Pandas DataFrame?
mn98
@mn98
Hi all, I'm trying to get started with ScalaPy and I'm experiencing issues similar to others' with respect to libraries not being found.
I've tried a few of the proposed solutions but have been miserably unsuccessful in getting it to work. What's slightly different about my setup is that I've used pyenv to install Python 3.9.9 and Anaconda3-2021.11. Has anyone had experience with this approach and been able to share any pointers? Many thanks in advance!
mn98
@mn98

To make this slightly easier, I've removed pyenv from the equation and pushed this skeleton example to GitHub.
At this point, upon executing runMain hello in the sbt shell, the error begins with:

java.lang.UnsatisfiedLinkError: Unable to load library 'python3':
dlopen(libpython3.dylib, 0x0009): tried: '/Applications/IntelliJ IDEA CE.app/Contents/jbr/Contents/Home/bin/../lib/jli/libpython3.dylib' (no such file) ...

And it's correct, that file doesn't exist; the library actually lives at /Library/Frameworks/Python.framework/Versions/3.9/lib/libpython3.9.dylib. But how do I get this to be picked up?

I get the same link error when running either in the sbt shell or within IntelliJ IDEA, so I don't think it's an IDE issue.
mn98
@mn98
I'd be curious to know if what I've pushed to github works out of the box for people.
Kien Dang
@kiendang
add fork := true to your build.sbt and things should work fine
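For reference, that's a one-line addition to the build: forking makes sbt run the program in a separate JVM, so the native-library configuration applies to the program rather than to sbt itself. A minimal build.sbt sketch (the Scala and ScalaPy version numbers are illustrative):

scalaVersion := "2.13.8"
libraryDependencies += "dev.scalapy" %% "scalapy-core" % "0.5.2"
// Run the app in a forked JVM so JNA picks up the Python library correctly.
fork := true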
mn98
@mn98
@kiendang thank you very much!!
mn98
@mn98
I updated my minimal example on github, which may be useful for others trying to get off the ground.
mn98
@mn98

Next, I've switched to my local install of anaconda by changing the path to python in build.sbt:

lazy val python = Python("/opt/anaconda3/bin/python3.9")

and the existing example works fine.
However, when I then try to experiment with numpy at runtime a particular library can't be loaded:

[info] INTEL MKL ERROR: dlopen(/opt/anaconda3/lib/libmkl_intel_thread.1.dylib, 0x0009): Library not loaded: @rpath/libiomp5.dylib

I notice that /opt/anaconda3/lib/libiomp5.dylib does exist, although /opt/anaconda3/lib/libmkl_intel_thread.1.dylib does not.
Has anyone experienced a similar problem?

mn98
@mn98
Correction, both libraries are present under /opt/anaconda3/lib yet they are not loaded at runtime.
mn98
@mn98
I tried a few of these suggestions in the anaconda docs but unfortunately they haven't resolved my issue.
The full error message reads:
[info] INTEL MKL ERROR: dlopen(/opt/anaconda3/lib/libmkl_intel_thread.1.dylib, 0x0009): Library not loaded: @rpath/libiomp5.dylib
[info]   Referenced from: /opt/anaconda3/lib/libmkl_intel_thread.1.dylib
[info]   Reason: tried: '/Applications/IntelliJ IDEA CE.app/Contents/jbr/Contents/Home/bin/../lib/jli/libiomp5.dylib' (no such file), '/usr/lib/libiomp5.dylib' (no such file).
[info] Intel MKL FATAL ERROR: Cannot load libmkl_intel_thread.1.dylib.
mn98
@mn98

On the executable /opt/anaconda3/bin/python3.9 it would appear (from using otool) that LC_RPATH is correct:

Load command 14
          cmd LC_RPATH
      cmdsize 272
         path /opt/anaconda3/lib (offset 12)

and in /opt/anaconda3/lib/libmkl_intel_thread.1.dylib itself, I see:

Load command 10
          cmd LC_LOAD_DYLIB
      cmdsize 48
         name @rpath/libiomp5.dylib (offset 24)

I'm in a world of macOS/rpath pain now and well out of my depth, but none of the above looks incorrect to me.
Would anyone care to venture why it doesn't pick up @rpath/libiomp5.dylib from /opt/anaconda3/lib?

mn98
@mn98
I put the minimal anaconda/numpy example on this branch. Again, I'd be curious to know if that just works out of the box for folks with a local Anaconda install on macOS.
alicebrb
@alicebrb
Hello, I'm trying to use ScalaPy to integrate a Python ARIMA library with the Scala code in the rest of my project. It works fine, but when I try to integrate it with our Jenkins pipeline, I get an error from the Sonar analysis.
I have one class in my project that contains the ScalaPy code; all the other classes are plain Scala.
I tried to ignore this specific file, but with no success, and the goal would be to analyse all the Scala and ScalaPy code in SonarQube.
Andy Czerwonka
@andyczerwonka
Looking at the Getting Started docs, I'm curious if there are details somewhere on how to get this up and running in a Docker container?
shadaj
@shadaj:matrix.org
[m]
@alicebrb: interesting, it seems that the Sonar analysis doesn't like code that's generated by the macro. I'll see if there's some way to ensure that positions are assigned to the generated code, but in the meantime you'll probably have to disable the analysis on files that involve ScalaPy calls.
@andyczerwonka: not yet, but things should "just work" with a container containing both SBT and a Python installation. The ScalaPy website itself is built in Netlify's Docker containers, and we run some ScalaPy code as part of that to generate the outputs of the code examples. So it works there at least.
if you end up putting together a setup for Docker, it would be great to add that to the docs!
@mn98: I noticed that you seem to be running the examples from IntelliJ. Do you run into the same error when running SBT on the command line?
mn98
@mn98
@shadaj:matrix.org yes, unfortunately the error is identical whether I run sbt on the command line or in the sbt console within IntelliJ
it's odd, because Anaconda+numpy works in isolation, ScalaPy works with Anaconda, and my branch works for at least one other user, yet Anaconda+numpy won't work for me with ScalaPy
Andy Czerwonka
@andyczerwonka
@shadaj:matrix.org Today, we don't ship sbt as part of our container; we package things up using .enablePlugins(DockerPlugin, JavaAgent, JavaAppPackaging, AshScriptPlugin).