Matt Groth
@mgroth0
matthewgroth@Matthews-MBP surefire-reports % clinfo
Number of platforms                               1
  Platform Name                                   Apple
  Platform Vendor                                 Apple
  Platform Version                                OpenCL 1.2 (Nov 13 2021 00:45:09)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event
  Platform Name                                   Apple
Number of devices                                 1
  Device Name                                     Apple M1 Max
  Device Vendor                                   Apple
  Device Vendor ID                                0x1027f00
  Device Version                                  OpenCL 1.2
  Driver Version                                  1.2 1.0
  Device OpenCL C Version                         OpenCL C 1.2
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               32
  Max clock frequency                             1000MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple (kernel)     32
  Preferred / native vector sizes
    char                                                 1 / 1
    short                                                1 / 1
    int                                                  1 / 1
    long                                                 1 / 1
    half                                                 0 / 0        (n/a)
    float                                                1 / 1
    double                                               1 / 1        (n/a)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (n/a)
  Address bits                                    64, Little-Endian
  Global memory size                              45812989952 (42.7GiB)
  Error Correction support                        No
  Max memory allocation                           8589934592 (8GiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             1 bytes
  Alignment of base address                       32768 bits (4096 bytes)
  Global Memory cache type                        None
  Image support                                   Yes
    Max number of samplers per kernel             32
    Max size for 1D images from buffer            268435456 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   256 bytes
    Pitch alignment for 2D image buffers          256 pixels
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Max number of constant args                     31
  Max constant buffer size                        1073741824 (1024MiB)
  Max size of kernel argument                     4096 (4KiB)
  Queue properties
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1000ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                (n/a)
  Device Extensions                               cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Apple
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [P0]
  clCreateContext(NULL, ...) [default]            Success [P0]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Apple
    Device Name                                   Apple M1 Max
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Apple
    Device Name                                   Apple M1 Max
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  Invalid device type for platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Apple
    Device Name                                   Apple M1 Max
matthewgroth@Matthews-MBP surefire-reports %
Back to my own Java. isOpenCLAvailable() returns false
Matt Groth
@mgroth0
(screenshot: Screen Shot 2022-02-05 at 1.37.33 AM.png)
Matt Groth
@mgroth0
tried manually extracting libaparapi_x86_64.dylib and adding the dir I put it in to DYLD_LIBRARY_PATH. Same results.
Matt Groth
@mgroth0
tried manually executing NativeLoader.load() and got
Caused by: java.lang.UnsatisfiedLinkError: /private/var/folders/fq/hkrz_j5j5458x6yty_c9k0v40000gn/T/Aparapi153186659225701714/libaparapi.dylib: dlopen(/private/var/folders/fq/hkrz_j5j5458x6yty_c9k0v40000gn/T/Aparapi153186659225701714/libaparapi.dylib, 0x0001): tried: '/Users/mathewgroth/Desktop/libaparapi.dylib' (no such file), '/private/var/folders/fq/hkrz_j5j5458x6yty_c9k0v40000gn/T/Aparapi153186659225701714/libaparapi.dylib' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e')), '/usr/lib/libaparapi.dylib' (no such file)
    at java.base/jdk.internal.loader.NativeLibraries.load(Native Method)
    at java.base/jdk.internal.loader.NativeLibraries$NativeLibraryImpl.open(NativeLibraries.java:383)
    at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:227)
    at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:169)
note this part: libaparapi.dylib' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e')
if it's really the case that this library is just not compatible with the new Macs, maybe it should say so in the readme?
CoreRasurae
@CoreRasurae
@mgroth0 That's tough luck. Aparapi does not yet support ARM64 on macOS, only x86_64.
Until that becomes available
you would need to compile the Aparapi-native project on ARM64 macOS and then adjust Aparapi-JNI to detect the ARM64 architecture on macOS and load the appropriate dylib
finally, the generated aparapi-jni.jar could be used with Aparapi and should work on ARM64 macOS too.
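A minimal sketch of the kind of detection Aparapi-JNI would need (the property values are what stock JVMs report; "libaparapi_arm64.dylib" is a hypothetical artifact name, since no arm64 binary ships yet):

    // Sketch only: the arm64 dylib name below is hypothetical
    String os = System.getProperty("os.name").toLowerCase();
    String arch = System.getProperty("os.arch").toLowerCase();

    if (os.contains("mac")) {
        // Apple Silicon JVMs report "aarch64"; Intel (or Rosetta 2) JVMs report "x86_64"
        String lib = arch.equals("aarch64") ? "libaparapi_arm64.dylib"
                                            : "libaparapi_x86_64.dylib";
        // ... extract `lib` from the jar and System.load() it ...
    }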
CoreRasurae
@CoreRasurae
@freemo:qoto.org Hi, are you around?
CoreRasurae
@CoreRasurae
@mgroth0 I don't own a Mac, so I cannot be of much help
Matt Groth
@mgroth0
I'm honestly afraid of trying to compile anything native myself :/ I just have basically zero experience there and imagine there's a significant learning curve
I try to stay on the java side, which is why this lib is so appealing!
CoreRasurae
@CoreRasurae
@mgroth0 I've heard about Rosetta 2, which can deal with x86_64 binaries and translate them to arm64
@mgroth0 maybe you can try an x86_64 JVM with Aparapi
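One quick way to check which architecture the running JVM reports (an x86_64 JVM under Rosetta 2 reports x86_64, so the bundled x86_64 dylib should load):

    // Prints "aarch64" on a native Apple Silicon JVM, "x86_64" on an Intel/Rosetta 2 JVM
    System.out.println(System.getProperty("os.arch"));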
Jeffrey Phillips Freeman
@freemo
@CoreRasurae am now (in Egypt so limited connectivity)
CoreRasurae
@CoreRasurae
Hi
@freemo long time no see
Jeffrey Phillips Freeman
@freemo
@CoreRasurae been in Egypt for 3 months
KrystilizeNevaDies
@KrystilizeNevaDies
Heya, I am getting an NPE at MethodModel#1633:
Cannot invoke "com.aparapi.internal.model.ClassModel$AttributePool$CodeEntry.getExceptionPoolEntries()" because the return value of "com.aparapi.internal.model.ClassModel$ClassModelMethod.getCodeEntry()" is null
Any ideas why/how to fix?
I can give a full stacktrace or my code if that helps
Using 3.0.1-SNAPSHOT
Same thing on 3.0.0
grfrost
@grfrost
Does your kernel have a non-abstract 'run' method?
@KrystilizeNevaDies the exception above seems to imply that there is no 'bytecode' associated with the 'run' method. This can happen if you don't have a run method on your kernel, or if you have an abstract run method (expecting a subclass to implement it).
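For reference, a minimal sketch of a kernel with a concrete run method, so there is bytecode for Aparapi to translate (array sizes and the kernel body are illustrative):

    import com.aparapi.Kernel;
    import com.aparapi.Range;

    public class DoubleIt {
        public static void main(String[] args) {
            final float[] in = new float[1024];
            final float[] out = new float[1024];

            Kernel kernel = new Kernel() {
                @Override
                public void run() {   // concrete run(), so bytecode exists to translate
                    int gid = getGlobalId();
                    out[gid] = in[gid] * 2f;
                }
            };
            kernel.execute(Range.create(in.length));
            kernel.dispose();
        }
    }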
grfrost
@grfrost
@mgroth0 as @CoreRasurae suggested you may have to be the M1 'leader' here ;) Unless you want to send one of us a shiny new laptop ;) Do you have Xcode on your Mac? If so I can probably walk you through the steps to build, but sadly it would require some basic knowledge of C++ compilation.
@mgroth0 so I build on Mac using cmake (not using maven - have I mentioned today how much I hate maven ;) ). So if you can set up Xcode + clang + cmake then I could talk you through the build process.
epiovesam
@epiovesam
Hello! I'm using Ubuntu 20.04 + latest Aparapi (3.0.0) + Oracle JDK 1.8 + NVIDIA RTX 2060 (Driver 470), and Aparapi says that I'm running an unsupported OpenCL 3.0 version... I've installed OpenCL 2.1 on Ubuntu with success, but Aparapi still says that I'm running OpenCL 3.0. I guess it's picking up the NVIDIA OpenCL driver and not the OpenCL 2.1 version which I've installed - is there a way to force Aparapi to use a specific libopencl.so file? Or any other clue? Thanks in advance
CoreRasurae
@CoreRasurae
@epiovesam There is no need to go back with the OpenCL version; you can keep the latest drivers, even if OpenCL 3.0. We have marked it as unsupported because we haven't done any special validation, but it seems to work. So I would suggest that you try it and only revert to a prior version if any issue arises. A special note must be made regarding the new Aparapi API for dealing with the workgroup size: OpenCL reports a default maximum workgroup size, but the actual maximum workgroup size allowed may be smaller, depending on the actual compiled kernel and the device's OpenCL driver. To find the actual workgroup size allowed for a given kernel, please use kernel.getKernelMaxWorkGroupSize(device);
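A sketch of how that query fits together (assuming kernel and device are already chosen, and that the global size is a multiple of the local size):

    // Ask the driver what this particular compiled kernel allows,
    // then build the Range with that as the local size
    int maxLocal = kernel.getKernelMaxWorkGroupSize(device);
    Range range = Range.create(device, globalSize, maxLocal);
    kernel.execute(range);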
epiovesam
@epiovesam
@CoreRasurae thank you very much for the clarifications, we'll keep and try with OCL 3.0. Regards
epiovesam
@epiovesam
Hi again. I'm having some issues but, first of all, I want to make sure that I'm not misunderstanding some Aparapi concepts.
Some time ago I was using OpenCL 1.2 + Ubuntu 16 + Aparapi 2.0 + an old Nvidia GTX GPU, and I was able to call Kernel.execute with a range in the millions and 268 passes, like Kernel.execute(16000000, 268).
But today, with OpenCL 3.0 + Ubuntu 20 + Aparapi 3.0 + Nvidia RTX 2060, it says: "!!!!!!! Kernel overall local size: 1024 exceeds maximum kernel allowed local size of: 256 failed". If I follow the getKernelMaxWorkGroupSize(device) info = 256 and pass a range of only 256 it works... but then I'm limited to 256 kernels per "pass" and obliged to iterate (16000000/256) calls, which slows processing a lot compared to the OCL 1.2 + Aparapi 2.0 setup...
Could you please help me understand what I'm doing wrong in this new config (or if I was doing something wrong in the old config)? Thanks again
CoreRasurae
@CoreRasurae
@epiovesam There is no problem with the old or the new config. The reality is that NVIDIA changed the behavior of their drivers, and it has nothing to do with OpenCL 3.0. Due to that NVIDIA driver behavior change we had to make Aparapi 3.0.0, which also required a small API change. So what you need is to use the call kernel.getKernelMaxWorkGroupSize(device); to adjust the kernel's allowed max work group size. There is nothing you can do with the new NVIDIA drivers to work with higher workgroup sizes. I mean, the kernel may run to the end with a previous Aparapi version and the new NVIDIA driver while using a 1024 workgroup size, but results are not guaranteed to be consistent/correct, so it is not recommended.
You may have to find other ways of optimizing the kernel execution time.
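Note that the 256 cap applies to the local (workgroup) size, not to the global range, so the whole range can still go in a single call. A sketch, assuming the global range divides evenly by the allowed local size:

    // The cap is on the workgroup (local) size; the global range can stay large
    int maxLocal = kernel.getKernelMaxWorkGroupSize(device);   // e.g. 256
    Range range = Range.create(device, 16000000, maxLocal);    // 16000000 % 256 == 0
    kernel.execute(range, 268);                                // 268 passes, as before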
Freemo
@freemo:qoto.org
[m]
Hi, I'm back behind internet again if anyone needs anything
epiovesam
@epiovesam
@CoreRasurae thanks for the info! Understood, and we'll try to find other ways. Regards
Scuffi
@plusgithub
Hey all, just wondering if there's any way to fetch an object based on the global id in a kernel so I can use it? Right now I've tried a few methods but get errors as only primitive datatypes are supported in arrays. If anyone has any idea whether there would/could be a way to fetch and use objects, that would be great, thanks :)
grfrost
@grfrost

@plusgithub We can't use Java objects directly on the GPU due to the way that Java allocates them from non-contiguous memory on the heap. We can only rely on arrays of primitives being laid out appropriately. So alas, no. If your 'objects' are just holding primitives, you can use a trick whereby you represent the objects as parallel arrays of primitives.

So given say

class Record {
    int x, y;
    float value;
}

And you had an array of Records to process.....

You can allocate an array of ints for the (x and y)'s and another array for the float values

int Recordxys[] = new int[records.length * 2];
float Recordvalues[] = new float[records.length];

Then copy data from your records array to your parallel int and float arrays prior to kernel dispatch, and back again after the kernel completes.

Hopefully you have enough work to do in the kernel to warrant this extra map step.
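A fleshed-out sketch of that round trip (names and the kernel body are illustrative, not from any Aparapi API):

    import com.aparapi.Kernel;
    import com.aparapi.Range;

    public class RecordKernelDemo {
        static class Record {
            int x, y;
            float value;
        }

        public static void main(String[] args) {
            Record[] records = new Record[1024];
            for (int i = 0; i < records.length; i++) {
                records[i] = new Record();
                records[i].x = i;
                records[i].y = i * 2;
            }

            // Flatten the objects into parallel primitive arrays before dispatch
            final int[] xys = new int[records.length * 2];
            final float[] values = new float[records.length];
            for (int i = 0; i < records.length; i++) {
                xys[i * 2] = records[i].x;
                xys[i * 2 + 1] = records[i].y;
            }

            Kernel kernel = new Kernel() {
                @Override
                public void run() {
                    int gid = getGlobalId();
                    values[gid] = xys[gid * 2] + xys[gid * 2 + 1]; // e.g. value = x + y
                }
            };
            kernel.execute(Range.create(records.length));
            kernel.dispose();

            // Copy the results back onto the objects after the kernel completes
            for (int i = 0; i < records.length; i++) {
                records[i].value = values[i];
            }
        }
    }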

Dan Marcovecchio
@BlackHat0001
Does aparapi have any plans to support opencl 3.0?
grfrost
@grfrost
@BlackHat0001 When you say support, do you mean support all features of the OpenCL 3.0 kernel language (which is likely never going to happen), or allow one to run Aparapi against an OpenCL 3.0 compatible runtime? Do you have a particular vendor in mind (I assume NVidia/Intel)? As Aparapi maps bytecode to OpenCL kernels, we need to ensure that we create code that works all the way back to OpenCL 1.0 (well, 1.2 probably). We could snoop the device and possibly include patches for specific new features (pipes/lane-aware instructions), but that is a lot of work and leads to testing issues. We can only test on devices we have access to.
Do you have a specific OpenCL 3.0 feature in mind? Just curious.
Dan Marcovecchio
@BlackHat0001
@grfrost Apologies, it appears my problem led me to blame some ridiculous idea about OpenCL compatibility. I do not actually require full OpenCL 3.0 support.
Although I am having trouble selecting specific devices. I can only seem to use my GPU and I can't figure out how to multithread over the CPU
grfrost
@grfrost

@BlackHat0001 Sorry I was away. Did you sort this out?

To be able to use Aparapi/OpenCL on your CPU (as well as GPU) you will need a CPU-based OpenCL runtime.

Intel has one (https://www.intel.com/content/www/us/en/developer/articles/tool/opencl-drivers.html), I think Apple also has one. NVidia does not; AMD used to, but I don't think they do anymore.

What platform are you on? Mac/Windows/Linux? What processor, x64/aarch64?
I can recommend the Intel one (as an ex-AMDer it hurts me to type that ;) ); it maps to AVX vector instructions pretty well.

There is an open source project called PoCL, which works OK. So if you really wanted to you could build it. http://portablecl.org/

CoreRasurae
@CoreRasurae
@BlackHat0001 Or do you mean to use CPU multi-threading to share the GPU across multiple GPU jobs? If the latter is the case, then you can do so, yes. Just ensure that you have an independent Kernel instance per CPU thread and that your GPU has enough memory to accommodate all the kernel calls simultaneously.
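A sketch of the one-kernel-per-thread pattern (pool size, array sizes, and the kernel body are illustrative):

    import com.aparapi.Kernel;
    import com.aparapi.Range;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class PerThreadKernels {
        public static void main(String[] args) {
            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (int job = 0; job < 4; job++) {
                pool.submit(() -> {
                    // Each task builds and owns its own Kernel instance;
                    // instances are never shared across threads
                    final float[] data = new float[4096];
                    Kernel kernel = new Kernel() {
                        @Override
                        public void run() {
                            int gid = getGlobalId();
                            data[gid] = data[gid] * 2f;
                        }
                    };
                    kernel.execute(Range.create(data.length));
                    kernel.dispose();
                });
            }
            pool.shutdown();
        }
    }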
CoreRasurae
@CoreRasurae

@BlackHat0001 There are several methods to select the specific device, although I recommend this one:

    public static List<OpenCLDevice> listDevices(OpenCLDevice.TYPE type) {
        final ArrayList<OpenCLDevice> results = new ArrayList<>();

        for (final OpenCLPlatform p : OpenCLPlatform.getUncachedOpenCLPlatforms()) {
            for (final OpenCLDevice device : p.getOpenCLDevices()) {
                if (type == null || device.getType() == type) {
                    results.add(device);
                }
            }
        }

        return results;
    }

and from there (as an example only):

    final Range range = Range.create2D(device, sideGlobal, sideGlobal, sideLocal, sideLocal);
    kernel.execute(range);
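To wire those together, a hypothetical example that picks the first GPU returned by listDevices (assumes at least one GPU is present and that kernel, sideGlobal, and sideLocal are defined):

    List<OpenCLDevice> gpus = listDevices(OpenCLDevice.TYPE.GPU);
    if (!gpus.isEmpty()) {
        OpenCLDevice device = gpus.get(0);
        Range range = Range.create2D(device, sideGlobal, sideGlobal, sideLocal, sideLocal);
        kernel.execute(range);
    }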