Vicente
@masip85

I am sorry. I've been reading this. This library provides I/O:
https://stacresearch.com/system/files/resource/files/STAC-Summit-1-Nov-2017-Intel-Totoni.pdf
But this would only work on Intel processors, is that right?

can anyone give a tip about this?

stuartarchibald
@stuartarchibald
@masip85 What specifically are you asking about? Intel vs. other CPU manufacturers or CPU vs GPUs?
Vicente
@masip85
I am asking about the content of my link, where an I/O process is JIT-compiled.
stuartarchibald
@stuartarchibald
That didn't answer the question :) However, HPAT, which is what that link is talking about is now Intel SDC https://github.com/IntelPython/sdc, and I don't recall seeing anything that'd prevent it being compiled for any CPU, perhaps just try building it?
Fundamentally, it's just calling out to some C code to do the IO, have you considered just doing this in your case?
stuartarchibald
@stuartarchibald
@masip85 do you have an example of what you are trying to do with some actual code and an example file/input of what you need to read ? Something that can be run locally? This would really help with working out if there's something Numba can do here.
If it doesn't need to be cross platform, and you are just reading a line of a file that is sometimes updated, you can probably just fopen() fread() fclose() it on linux.
Vicente
@masip85
@stuartarchibald , I am carrying out a real-time DSP process. Every x seconds I read a buffer through a memory-mapped NumPy array. That is fast; in fact the read costs almost nothing. Then I do some simple processing and save a subsample of it.
But there is a control signal from the hardware I am using. This control signal is in a file that cannot be mmapped, so I have to open it as a regular file. This takes too much time: 10-100 times more.
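For reference, a minimal sketch of the mmap-style read described above (illustrative only; the file path, dtype and subsampling factor are invented):

import numpy as np

# Map the hardware buffer file instead of going through regular file I/O.
buf = np.memmap("/dev/shm/dsp_buffer.bin", dtype=np.float32, mode="r")

# Simple processing: take every 10th sample and materialise a copy.
subsample = np.array(buf[::10])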
Vicente
@masip85
Well, I'll dive into that library to see how they do the reading, and I'll try to extract it.
stuartarchibald
@stuartarchibald
They are just calling a C library from Numba.
If you can express the read you want to do in terms of fopen, fread and fclose, you can probably just do it with ctypes.
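A rough, untested sketch of that idea: bind libc's fopen/fread/fclose with ctypes and read the control file into a preallocated NumPy buffer. The path, buffer size and error handling are invented for illustration; functions declared this way can typically also be called from inside an @njit function via Numba's ctypes support.

import ctypes
import ctypes.util

import numpy as np

libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the libc I/O routines we need.
libc.fopen.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
libc.fopen.restype = ctypes.c_void_p
libc.fread.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t,
                       ctypes.c_void_p]
libc.fread.restype = ctypes.c_size_t
libc.fclose.argtypes = [ctypes.c_void_p]
libc.fclose.restype = ctypes.c_int

def read_control_file(path, nbytes):
    # Read up to nbytes from path into a NumPy buffer via libc.
    buf = np.empty(nbytes, dtype=np.uint8)
    fp = libc.fopen(path.encode(), b"rb")
    if not fp:
        raise OSError("could not open " + path)
    try:
        nread = libc.fread(buf.ctypes.data, 1, nbytes, fp)
    finally:
        libc.fclose(fp)
    return buf[:nread]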
Vicente
@masip85
ok. I'll try. thank you very much
Andreas Sodeur
@asodeur
I am using a caching decorator that works like this:
GLOBAL_VAR_NAME = 'mypackage.mygv'
result_type = MyType()

def caching_wrapper(arg):
    candidate = _load_global(result_type, GLOBAL_VAR_NAME)  # returns Optional(result_type)
    if candidate is None:
        result = expensive_function(arg)
        _store_global(result, GLOBAL_VAR_NAME)
    else:
        result = _cast(candidate, result_type)

    return result
where _load_global and _store_global load and store from an ir.GlobalVariable with linkage 'linkonce'. Will the global variable be shared between different jitted functions that call caching_wrapper?
Andreas Sodeur
@asodeur
So far this seems to be the case, but it probably depends on when Numba creates new LLVM modules (which is something I never had to worry about until now).
Pearu Peterson
@pearu

In an unbox function I have:

pyaddr = c.pyapi.object_getattr_string(obj, "_wrapper_address")
...

If obj does not have the required attribute, pyaddr will contain NULL and an exception should be raised.
How do I accomplish that within the unbox function? I found guard_null but I was not able to get it working.

Siu Kwan Lam
@sklam
@pearu An @unbox function should return a NativeValue(…, is_error=an_error_bit)
grep for NativeValue.*is_error in the code base for example
You can use cgutils.is_null() to check if pyaddr is null.
guard_null assumes a different calling convention than the code in @unbox.
Pearu Peterson
@pearu
This is what I currently have:
@unbox(FunctionType)
def unbox_function_type(typ, obj, c):
    print(f'UNBOX_function_type({typ}, {obj})')
    fnty = lower_nbtype(typ)
    # Assume obj is CFunc
    pyaddr = c.pyapi.object_getattr_string(obj, "_wrapper_address")
    # TODO: pyaddr == NULL, e.g. when obj is pure Python function
    ptr = c.pyapi.long_as_voidptr(pyaddr)
    # TODO: decref pyaddr?
    fptr = c.builder.bitcast(ptr, fnty.as_pointer())
    return NativeValue(fptr, is_error=c.pyapi.c_api_error())
You'll need an if-else branch for when there's an error.
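For illustration, a rough, untested sketch of such a branch, reusing the names from the snippet above (FunctionType, lower_nbtype) and Numba's cgutils helpers (the cgutils import path may differ between Numba versions):

from numba.core import cgutils
from numba.extending import unbox, NativeValue

@unbox(FunctionType)
def unbox_function_type(typ, obj, c):
    fnty = lower_nbtype(typ)
    pyaddr = c.pyapi.object_getattr_string(obj, "_wrapper_address")
    # object_getattr_string returns NULL and sets the Python exception when
    # the attribute is missing, so we only need to propagate the error bit.
    addr_is_null = cgutils.is_null(c.builder, pyaddr)
    fptr = cgutils.alloca_once(c.builder, fnty.as_pointer())
    with c.builder.if_else(addr_is_null) as (then, otherwise):
        with then:
            # Leave a null function pointer; is_error tells the caller to bail out.
            c.builder.store(cgutils.get_null_value(fnty.as_pointer()), fptr)
        with otherwise:
            ptr = c.pyapi.long_as_voidptr(pyaddr)
            c.builder.store(c.builder.bitcast(ptr, fnty.as_pointer()), fptr)
            c.pyapi.decref(pyaddr)
    return NativeValue(c.builder.load(fptr), is_error=addr_is_null)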
Pearu Peterson
@pearu
yes, thanks, this is a good example.
Siu Kwan Lam
@sklam

where _load_global and _store_global load and store from an ir.GlobalVariable with linkage 'linkonce'. Will the global variable be shared between different jitted functions that call caching wrapper?

@asodeur, linkonce_odr linkage should merge the symbols. LLVM linkage is quite tricky and I usually rely on trial-and-error
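A tiny llvmlite sketch (not from the chat) of declaring such a global with linkonce_odr linkage, so identical definitions emitted into separate modules get merged at link time:

from llvmlite import ir

module = ir.Module(name="caching_example")

# One definition per module; linkonce_odr lets the linker fold duplicates together.
gv = ir.GlobalVariable(module, ir.IntType(64), name="mypackage.mygv")
gv.linkage = "linkonce_odr"
gv.initializer = ir.Constant(ir.IntType(64), 0)

print(module)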

Andreas Sodeur
@asodeur
@sklam and Numba creates a single LLVM module for all jitted functions?
Siu Kwan Lam
@sklam
One LLVM module per user function, and it gets linked into all depending LLVM modules at the end.
Andreas Sodeur
@asodeur
perfect, thx
Graham Markall
@gmarkall
Do we expect python -m numba.runtests numba.cuda.tests.cudadrv to work? As a whole, the test suite works for me (running numba.runtests only), but with the aforementioned invocation I reliably get:
test_broadcast (numba.cuda.tests.cudadrv.test_cuda_array_slicing.CudaArraySetting) ... /home/nfs/gmarkall/numbadev/numba/numba/cuda/compiler.py:272: RuntimeWarning: divide by zero encountered in long_scalars
  blkct = (self.ntasks + tpbm1) // tpb
Fatal Python error: Segmentation fault

Current thread 0x00007fdc68cb1700 (most recent call first):
  File "/home/nfs/gmarkall/numbadev/numba/numba/cuda/cudadrv/driver.py", line 293 in safe_cuda_api_call
  File "/home/nfs/gmarkall/numbadev/numba/numba/cuda/cudadrv/driver.py", line 1622 in launch_kernel
  File "/home/nfs/gmarkall/numbadev/numba/numba/cuda/cudadrv/driver.py", line 1578 in __call__
  File "/home/nfs/gmarkall/numbadev/numba/numba/cuda/compiler.py", line 611 in _kernel_call
  File "/home/nfs/gmarkall/numbadev/numba/numba/cuda/compiler.py", line 537 in __call__
  File "/home/nfs/gmarkall/numbadev/numba/numba/cuda/compiler.py", line 275 in __call__
  File "/home/nfs/gmarkall/numbadev/numba/numba/cuda/cudadrv/devicearray.py", line 581 in _do_setitem
  File "/home/nfs/gmarkall/numbadev/numba/numba/cuda/cudadrv/devicearray.py", line 530 in __setitem__
  File "/home/nfs/gmarkall/numbadev/numba/numba/cuda/cudadrv/devices.py", line 225 in _require_cuda_context
  File "/home/nfs/gmarkall/numbadev/numba/numba/cuda/tests/cudadrv/test_cuda_array_slicing.py", line 209 in test_broadcast
  File "/home/nfs/gmarkall/miniconda3/envs/numbaenv/lib/python3.7/unittest/case.py", line 628 in run
  File "/home/nfs/gmarkall/miniconda3/envs/numbaenv/lib/python3.7/unittest/case.py", line 676 in __call__
  File "/home/nfs/gmarkall/miniconda3/envs/numbaenv/lib/python3.7/unittest/suite.py", line 122 in run
  File "/home/nfs/gmarkall/miniconda3/envs/numbaenv/lib/python3.7/unittest/suite.py", line 84 in __call__
  File "/home/nfs/gmarkall/miniconda3/envs/numbaenv/lib/python3.7/unittest/runner.py", line 176 in run
  File "/home/nfs/gmarkall/numbadev/numba/numba/testing/main.py", line 123 in run
  File "/home/nfs/gmarkall/miniconda3/envs/numbaenv/lib/python3.7/unittest/main.py", line 271 in runTests
  File "/home/nfs/gmarkall/numbadev/numba/numba/testing/main.py", line 354 in run_tests_real
  File "/home/nfs/gmarkall/numbadev/numba/numba/testing/main.py", line 369 in runTests
  File "/home/nfs/gmarkall/miniconda3/envs/numbaenv/lib/python3.7/unittest/main.py", line 101 in __init__
  File "/home/nfs/gmarkall/numbadev/numba/numba/testing/main.py", line 163 in __init__
  File "/home/nfs/gmarkall/numbadev/numba/numba/testing/__init__.py", line 75 in run_tests
  File "/home/nfs/gmarkall/numbadev/numba/numba/_runtests.py", line 28 in _main
  File "/home/nfs/gmarkall/numbadev/numba/numba/runtests.py", line 9 in <module>
  File "/home/nfs/gmarkall/miniconda3/envs/numbaenv/lib/python3.7/runpy.py", line 85 in _run_code
  File "/home/nfs/gmarkall/miniconda3/envs/numbaenv/lib/python3.7/runpy.py", line 193 in _run_module_as_main
Segmentation fault
It's a bit of a heisenbug... What I can determine so far is that the call to determine the threads per block (cuOccupancyMaxPotentialBlockSize?) is somehow returning 0
then the kernel gets configured with 0 threads, which the runtime doesn't like
stuartarchibald
@stuartarchibald
I can't reproduce but can see other failures on just invoking that line.
Wonder if it's due to you having newer hardware?
I see:
numba/cuda/tests/cudadrv/test_linker.py", line 93, in test_set_registers_57
    self.assertEquals(57, compiled._func.get().attrs.regs)
AssertionError: 57 != 56
Graham Markall
@gmarkall
However, if I try to print out tpb (as mentioned in the divide-by-zero warning), then I don't get a segfault, just a ZeroDivisionError.
hmm, will poke around some more
adding a check:
diff --git a/numba/cuda/compiler.py b/numba/cuda/compiler.py
index d242e11..0b3ea5f 100644
--- a/numba/cuda/compiler.py
+++ b/numba/cuda/compiler.py
@@ -268,6 +268,8 @@ class ForAll(object):
             kernel = self.kernel

         tpb = self._compute_thread_per_block(kernel)
+        if tpb == 0:
+            raise RuntimeError("AWWW")
         tpbm1 = tpb - 1
         blkct = (self.ntasks + tpbm1) // tpb
does get me through the test suite... I didn't think of that before, but knowing it works for you pointed me in the right direction.
also, I then get:
test_set_registers_57 (numba.cuda.tests.cudadrv.test_linker.TestLinker) ... ok
what hardware do you have, @stuartarchibald ?
I wonder if it's related to the fact that some tests are CUDATestCase instances, which call numba.cuda.reset() in tearDown().
Graham Markall
@gmarkall
Do we know what the rationale for the CUDATestCase class is? (more specifically than "some tests need the context resetting afterwards", that is :-) )
stuartarchibald
@stuartarchibald
locally GTX 750 Ti
I'm not sure, I'd have to go digging in the code :)
Graham Markall
@gmarkall
was just trying to avoid doing that myself :-)
stuartarchibald
@stuartarchibald
there's nothing high up in my memory I'm afraid
Graham Markall
@gmarkall
Going to open an issue to keep track of this. There seem to be various ways to hit it, and given that slightly different conditions make different tests fail, I suspect a valgrind run would be informative, but I need to find a relatively low-pain way to do that and get far enough to trigger the bug.