Amos Bird
@amosbird
I wish CPython had better support for module unloading. What happens when a long-running Python process keeps jitting new functions...
will the unused code be freed?
btw, I'm trying to dynamically build a jitclass. I wonder if there is a way to first create a Python class with an __init__ method defined, like a dataclass, from a field-name string such as 'x, y, z'
Hameer Abbasi
@hameerabbasi
I was wondering if there's a reliable function-to-ast converter in Numba.
Amos Bird
@amosbird
hmm
Hameer Abbasi
@hameerabbasi
If not, how does it implement the core decorator mechanism?
Alexander Clausen
@sk1p
Hameer Abbasi
@hameerabbasi
Ah, so it works with bytecode, and not the AST.
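
For readers wondering what "works with bytecode" can look like in practice, here is a minimal sketch using only the standard library's `dis` module. It is not Numba's own code. It only illustrates that a decorator receives a function object whose compiled bytecode can be inspected without any source text or AST. The names `show_bytecode` and `add` are hypothetical.

```python
import dis


def show_bytecode(func):
    """Hypothetical decorator: record the opcode names of the decorated function."""
    # func.__code__ holds the compiled bytecode; no source text or AST is needed.
    func.opnames = [ins.opname for ins in dis.get_instructions(func)]
    return func


@show_bytecode
def add(a, b):
    return a + b


# The exact opcode names vary between CPython versions.
print(add.opnames)
```

The decorated function still behaves normally; the decorator only reads `func.__code__`, which is the same starting point a bytecode-based compiler would have.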
luk-f-a
@luk-f-a
@amosbird , a python class can be created dynamically with the type function, and therefore a jitclass too. However, personally, whenever I think "I need to use type()" I stop and think really hard if I'm going the right way. There's usually a simpler way, unless you're working on a framework and trying to provide a lot of "magic" to end users.
Amos Bird
@amosbird

unless you're working on a framework and trying to provide a lot of "magic" to end users.

exactly

I tried with this for now

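from numba import types
from numba.experimental import jitclass  # older Numba versions: from numba import jitclass

def xinit(self, x, y):
    self.x = x
    self.y = y

# Build the class dynamically, then compile it as a jitclass.
C = type('gogogo', (), {"__init__": xinit})
C = jitclass([("x", types.int32[:]), ("y", types.int32[:])])(C)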

but still cannot find a way to create the xinit function dynamically

luk-f-a
@luk-f-a
did the above work? what do you mean by creating xinit dynamically?
Amos Bird
@amosbird
for example, if the end user provides a column type string such as "x array(int), y array(int)", I'd like to create the jitclass for them
so I need to parse the string and build the class with the correct __init__ method so that it can be jitclassed
and the above example works with the hardcoded xinit
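
One possible way to generate the `__init__` function from the user's string is to build its source text and compile it with `exec`. The sketch below is an assumption about how this could be done, not something confirmed in the conversation. The helper name `make_class` and the parsing rules (comma-separated `name type` pairs) are hypothetical, and the final jitclass step is left as a comment because the mapping from `array(int)` to a Numba type is not specified and has not been tested here.

```python
import keyword


def make_class(name, schema):
    """Build a plain Python class whose __init__ takes one argument per column.

    `schema` is assumed to look like "x array(int), y array(int)".
    """
    # Split the schema into (column name, type string) pairs.
    columns = [tuple(part.split(None, 1)) for part in schema.split(",")]
    names = [col[0] for col in columns]
    for n in names:
        # Reject anything that is not a safe Python identifier before using exec.
        if not n.isidentifier() or keyword.iskeyword(n):
            raise ValueError(f"invalid column name: {n!r}")

    # Generate the source of __init__ and compile it.
    args = ", ".join(names)
    body = "\n".join(f"    self.{n} = {n}" for n in names)
    source = f"def __init__(self, {args}):\n{body}\n"
    namespace = {}
    exec(source, namespace)

    cls = type(name, (), {"__init__": namespace["__init__"]})
    return cls, columns


C, columns = make_class("gogogo", "x array(int), y array(int)")
obj = C([1, 2], [3, 4])
print(obj.x, obj.y)  # [1, 2] [3, 4]

# Assumed next step, mirroring the hardcoded snippet above (not run here):
# spec = [(n, types.int32[:]) for n, _ in columns]
# C = jitclass(spec)(C)
```

Validating the column names before calling `exec` matters here because the schema string comes from the end user.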
luk-f-a
@luk-f-a
and both the names and types could change? so the string could be a array(float), b array(float)?
Amos Bird
@amosbird
exactly
the user provided string is arbitrary
luk-f-a
@luk-f-a
I'm working on something similar, so I can share my thought process so far.
in one case, I have user data from csv files. I just need to store the data, and I don't need methods. For this case, I'm using numpy record arrays.
this user-defined data is operated on by user-defined functions. so the users know that they can expect an input parameter called data that will have the fields they provided in the csv. They operate on that data writing data.x or data['x'].
pro: records are mutable, but you can make them immutable by making a readonly view. cons: no methods, so you won't be able to write data.run_operation().
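
A minimal sketch of the record-array approach described above, in interpreted NumPy code. The column names `x` and `y` and the sample values are hypothetical stand-ins for user CSV data.

```python
import numpy as np

# Hypothetical user data, as if loaded from a CSV with columns x and y.
dtype = np.dtype([("x", np.int32), ("y", np.float64)])
data = np.array([(1, 0.5), (2, 1.5), (3, 2.5)], dtype=dtype).view(np.recarray)

print(data.x)      # attribute access to a whole column
print(data["y"])   # column access by field name
print(data[0].x)   # a single record also exposes its fields

# A read-only view: writes through `frozen` raise, while `data` stays writable.
frozen = data.view()
frozen.flags.writeable = False
try:
    frozen.x[0] = 99
except ValueError as err:
    print("write rejected:", err)
```

The view shares memory with the original array, so making it read-only costs no copy.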
luk-f-a
@luk-f-a
in the second case, the user defines the fields and needs to be able to run overloaded functions. For that, and following a great recommendation from Stuart, I'll use namedtuples.
Namedtuples automatically give you the constructor you want. You can easily overload functions to act differently depending on the namedtuple type, while keeping a single function name. It might also be possible to overload methods, but I haven't tried and I'm not sure. I've never used numba's overload_method.
Amos Bird
@amosbird
Yeah, I remember I've achieved that using namedtuple
but I'm in favor of jitclass
luk-f-a
@luk-f-a
any particular reason? I'm in the process of building the second use case based on named tuples, so I'm interested.
I think we ruled out jitclasses because we couldn't get them to serialize, so we couldn't use multiprocessing.
any info on downsides of namedtuples approach is highly appreciated! :-)
Amos Bird
@amosbird
hmm, I cannot think of one right now but I can assure you that it at least works in a C++ embedded env
it's just that I think building custom user data via jitclass is more convenient, maybe I'm totally wrong
luk-f-a
@luk-f-a
for construction based on strings, namedtuple should be very convenient, since namedtuple accepts a string naturally, without type gymnastics.
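
To illustrate the point about strings, here is a short standard-library sketch using the 'x, y, z' field string from earlier in the conversation. The class name `Row` is hypothetical.

```python
from collections import namedtuple

# namedtuple accepts the comma-separated field string directly.
Row = namedtuple("Row", "x, y, z")

row = Row(1, 2, 3)          # positional constructor, generated automatically
same = Row(x=1, y=2, z=3)   # keyword constructor
print(row.x, row[1], row == same)  # 1 2 True
print(Row._fields)                 # ('x', 'y', 'z')
```

Note that a namedtuple carries field names only; any per-column type information from the user's schema would have to be tracked separately.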
will your users declare their own functions on the data? or are functions fixed and provided by the framework?
Amos Bird
@amosbird
users declare their own functions on the data. but the data is provided by the framework and with the schema verified
it's actually from a database
so the string signature is a sql table definition and the data just mimics the table in memory
luk-f-a
@luk-f-a
if the users don't need to append or delete records, I can recommend the record array. we have something similar in our first use case: csv data (like a database), which is passed to user-provided functions.
it's more lightweight than jitclasses, it can be serialized (so you can use dask to parallelize calculations), and I think it might even be possible to use prange on a record array.
plus you can use numpy operations and slicing on the whole batch of data, like all_data['field'] will give you a vector with field for all records.
we've been working like this for about 6 months, and it's worked very well
Amos Bird
@amosbird
is the user able to access columns via table.x when using record array?
luk-f-a
@luk-f-a
yes, both table.x and table['x'] work.
Amos Bird
@amosbird
cool.
btw, does it provide an easy way of constructing a rec array via string signature?
luk-f-a
@luk-f-a
in interpreted code, there are 4 ways to build the dtype: https://docs.scipy.org/doc/numpy/user/basics.rec.html
one of them is pure string, but I don't think that's what you need.
if you have your data as a pandas DataFrame, it's a one-step conversion
so it's very easy to construct in interpreted code. inside jitted code, the constructor is broken. I'm working on it currently. Maybe I'll be able to fix it.
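
As a sketch of constructing a record array from a string signature in interpreted code, the example below parses a hypothetical "name type" format into the list-of-tuples form of a structured dtype. The function name `dtype_from_signature` and the signature format are assumptions; the type names must be ones that `np.dtype` understands.

```python
import numpy as np


def dtype_from_signature(signature):
    """Turn a hypothetical "name type, name type" string into a structured dtype."""
    fields = []
    for part in signature.split(","):
        name, type_name = part.split()
        fields.append((name, np.dtype(type_name)))
    return np.dtype(fields)


dt = dtype_from_signature("x int32, y float64")
table = np.zeros(3, dtype=dt).view(np.recarray)
table.x[:] = [1, 2, 3]
print(dt.names)          # ('x', 'y')
print(table.x.tolist())  # [1, 2, 3]

# For comparison, NumPy's pure-string form assigns default field names:
print(np.dtype("i4, f8").names)  # ('f0', 'f1')
```

The default `f0`, `f1` names are probably why the pure-string form is not a good fit when the user supplies the column names.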
Amos Bird
@amosbird
cool, thanks
luk-f-a
@luk-f-a
update: it's not broken, it's just not implemented, but it's not trivial. If I could find out how numpy does it, I could copy it, but it seems it's buried in C code and I cannot find where.