Matt Groth
bestDevice() gave me Java Alternative Algorithm
Matt Groth
@grfrost back when you were helping me, you had me make a program called infocl.cpp, which I ran on my old mac and gave you the output of. I've just run it on the new mac as well, and got this result:
Found 1 platform
Platform Apple
  Found 1 device
      Device Apple M1 Max supports OpenCL 1.2
Matt Groth
Tried the example repository and got the following:
matthewgroth@Matthews-MBP aparapi-examples % mvn clean package exec:java
[INFO] Scanning for projects...
[WARNING] Some problems were encountered while building the effective model for com.aparapi:aparapi-examples:jar:3.0.0
[WARNING] 'build.plugins.plugin.version' for org.codehaus.mojo:exec-maven-plugin is missing. @ line 125, column 21
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[INFO] Inspecting build with total of 1 modules...
[INFO] Installing Nexus Staging features:
[INFO]   ... total of 1 executions of maven-deploy-plugin replaced with nexus-staging-maven-plugin
[INFO] --------------------< com.aparapi:aparapi-examples >--------------------
[INFO] Building Aparapi Examples 3.0.0
[INFO] --------------------------------[ jar ]---------------------------------
[WARNING] The POM for com.aparapi:aparapi:jar:3.0.0-SNAPSHOT is missing, no dependency information available
[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  0.318 s
[INFO] Finished at: 2022-02-05T00:57:40-05:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project aparapi-examples: Could not resolve dependencies for project com.aparapi:aparapi-examples:jar:3.0.0: com.aparapi:aparapi:jar:3.0.0-SNAPSHOT was not found in https://oss.sonatype.org/content/repositories/snapshots during a previous attempt. This failure was cached in the local repository and resolution is not reattempted until the update interval of ossrh.snapshots has elapsed or updates are forced -> [Help 1]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
matthewgroth@Matthews-MBP aparapi-examples %
Matt Groth
It looks like the v3.0.0 tag of the examples depends on aparapi 3.0.0-SNAPSHOT, according to https://git.qoto.org/aparapi/aparapi-examples/-/blob/v3.0.0/pom.xml
but the README explains that if we use specific tags we don't need to install the jar ourselves, right?
did a little pom.xml edit to make it depend on the regular 3.0.0 jar instead of the SNAPSHOT and tried again. Got some ugly errors:
Found the source error in a test report:
# Created at 2022-02-05T01:08:11.584
Unrecognized VM option 'MaxPermSize=256m'

# Created at 2022-02-05T01:08:11.588
Error: Could not create the Java Virtual Machine.

# Created at 2022-02-05T01:08:11.589
Error: A fatal exception has occurred. Program will exit.
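For reference, the pom.xml edit described above (pointing the examples at the released jar instead of the SNAPSHOT) would look roughly like this; the surrounding POM content is assumed:

```xml
<!-- aparapi-examples/pom.xml: depend on the released artifact -->
<dependency>
  <groupId>com.aparapi</groupId>
  <artifactId>aparapi</artifactId>
  <version>3.0.0</version> <!-- was 3.0.0-SNAPSHOT -->
</dependency>
```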
Matt Groth
found a Stack Overflow answer saying that MaxPermSize was removed in Java 8. Is the examples repo not compatible with Java 8+?
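A guess at the cause: the examples' Surefire configuration likely passes -XX:MaxPermSize in its argLine, and that flag was removed in Java 8. Assuming that is where it comes from, overriding the argLine (or simply running `mvn clean package -DskipTests`) should get past it:

```xml
<!-- Hypothetical override in aparapi-examples/pom.xml, assuming the flag
     comes from the Surefire plugin's argLine -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- drop -XX:MaxPermSize=256m, which Java 8+ JVMs reject -->
    <argLine></argLine>
  </configuration>
</plugin>
```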
Matt Groth
installed clinfo with brew and ran it:
matthewgroth@Matthews-MBP surefire-reports % clinfo
Number of platforms                               1
  Platform Name                                   Apple
  Platform Vendor                                 Apple
  Platform Version                                OpenCL 1.2 (Nov 13 2021 00:45:09)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event
  Platform Name                                   Apple
Number of devices                                 1
  Device Name                                     Apple M1 Max
  Device Vendor                                   Apple
  Device Vendor ID                                0x1027f00
  Device Version                                  OpenCL 1.2
  Driver Version                                  1.2 1.0
  Device OpenCL C Version                         OpenCL C 1.2
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               32
  Max clock frequency                             1000MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple (kernel)     32
  Preferred / native vector sizes
    char                                                 1 / 1
    short                                                1 / 1
    int                                                  1 / 1
    long                                                 1 / 1
    half                                                 0 / 0        (n/a)
    float                                                1 / 1
    double                                               1 / 1        (n/a)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (n/a)
  Address bits                                    64, Little-Endian
  Global memory size                              45812989952 (42.7GiB)
  Error Correction support                        No
  Max memory allocation                           8589934592 (8GiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             1 bytes
  Alignment of base address                       32768 bits (4096 bytes)
  Global Memory cache type                        None
  Image support                                   Yes
    Max number of samplers per kernel             32
    Max size for 1D images from buffer            268435456 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   256 bytes
    Pitch alignment for 2D image buffers          256 pixels
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Max number of constant args                     31
  Max constant buffer size                        1073741824 (1024MiB)
  Max size of kernel argument                     4096 (4KiB)
  Queue properties
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1000ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                (n/a)
  Device Extensions                               cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Apple
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [P0]
  clCreateContext(NULL, ...) [default]            Success [P0]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Apple
    Device Name                                   Apple M1 Max
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Apple
    Device Name                                   Apple M1 Max
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  Invalid device type for platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Apple
    Device Name                                   Apple M1 Max
matthewgroth@Matthews-MBP surefire-reports %
Matt Groth
Back to my own java. isOpenCLAvailable() returns false
Screen Shot 2022-02-05 at 1.37.33 AM.png
Matt Groth
tried manually extracting libaparapi_x86_64.dylib and adding the directory I put it in to DYLD_LIBRARY_PATH. Same results.
Matt Groth
tried manually executing NativeLoader.load() and got
Caused by: java.lang.UnsatisfiedLinkError: /private/var/folders/fq/hkrz_j5j5458x6yty_c9k0v40000gn/T/Aparapi153186659225701714/libaparapi.dylib: dlopen(/private/var/folders/fq/hkrz_j5j5458x6yty_c9k0v40000gn/T/Aparapi153186659225701714/libaparapi.dylib, 0x0001): tried: '/Users/mathewgroth/Desktop/libaparapi.dylib' (no such file), '/private/var/folders/fq/hkrz_j5j5458x6yty_c9k0v40000gn/T/Aparapi153186659225701714/libaparapi.dylib' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e')), '/usr/lib/libaparapi.dylib' (no such file)
    at java.base/jdk.internal.loader.NativeLibraries.load(Native Method)
    at java.base/jdk.internal.loader.NativeLibraries$NativeLibraryImpl.open(NativeLibraries.java:383)
    at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:227)
    at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:169)
note this part: libaparapi.dylib' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e')
if it's really the case that this library is just not compatible with the new Macs, maybe it should say so in the README?
@mgroth0 That's tough luck. Aparapi does not yet support ARM64 on macOS, only x86_64.
Until that becomes available,
you would need to compile the Aparapi-native project on an ARM64 macOS machine and then adjust Aparapi-JNI to detect the ARM64 architecture on macOS and load the appropriate dylib.
Finally, the generated aparapi-jni.jar could be used with Aparapi and should work on ARM64 macOS too.
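As a sketch of the detection step mentioned above, a JVM can distinguish the two cases via system properties; the class name here is hypothetical, and the exact `os.arch` strings are what current JDKs are known to report:

```java
public class ArchCheck {
    public static void main(String[] args) {
        String os = System.getProperty("os.name").toLowerCase();
        String arch = System.getProperty("os.arch").toLowerCase();
        // On an Apple Silicon JVM, os.arch is "aarch64"; an x86_64 JVM
        // running under Rosetta 2 reports "x86_64", so an x86_64 dylib
        // would be the right one to load in that case.
        boolean appleSilicon = os.contains("mac")
                && (arch.equals("aarch64") || arch.equals("arm64"));
        System.out.println("os=" + os + ", arch=" + arch
                + ", appleSilicon=" + appleSilicon);
    }
}
```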
@freemo:qoto.org Hi, are you around?
@mgroth0 I don't own a Mac, so I cannot be of much help
Matt Groth
I'm honestly afraid of trying to compile anything native myself :/ I just have basically zero experience there and imagine there's a significant learning curve
I try to stay on the java side, which is why this lib is so appealing!
@mgroth0 I've heard about Rosetta 2, which can translate x86_64 binaries to arm64
@mgroth0 maybe you can try an x86_64 JVM with Aparapi
Jeffrey Phillips Freeman
@CoreRasurae am now (in egypt so limited connectivity)
@freemo long time no see
Jeffrey Phillips Freeman
@CoreRasurae been in egypt for 3 months
Heya, I am getting an NPE at MethodModel#1633:
Cannot invoke "com.aparapi.internal.model.ClassModel$AttributePool$CodeEntry.getExceptionPoolEntries()" because the return value of "com.aparapi.internal.model.ClassModel$ClassModelMethod.getCodeEntry()" is null
Any ideas why/how to fix?
I can give a full stacktrace or my code if that helps
Using 3.0.1-SNAPSHOT
Same thing on 3.0.0
Does your kernel have a non-abstract 'run' method?
@KrystilizeNevaDies the exception above seems to imply that there is no bytecode associated with the 'run' method. This can happen if you don't have a run method on your kernel, or if you have an abstract run method (expecting a subclass to implement it).
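To illustrate the point above, a minimal kernel sketch with a concrete run method; this assumes the standard Aparapi Kernel API and a project with the aparapi dependency on the classpath:

```java
import com.aparapi.Kernel;

// run() must be concretely overridden on the kernel class itself,
// so Aparapi can find its bytecode to translate to OpenCL.
public class SquareKernel extends Kernel {
    final float[] in;
    final float[] out;

    public SquareKernel(float[] in, float[] out) {
        this.in = in;
        this.out = out;
    }

    @Override
    public void run() {
        int i = getGlobalId();
        out[i] = in[i] * in[i];
    }
}
```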
@mgroth0 as @CoreRasurae suggested you may have to be the M1 'leader' here ;) Unless you want to send one of us a shiny new laptop ;) Do you have xcode on your mac? If so I can probably walk you through the steps to build, but sadly it would require some basic knowledge of C++ compilation.
@mgroth0 so I build on Mac using cmake (not using maven - have I mentioned today how much I hate maven ;) ). So if you can setup xcode + clang +cmake then I could talk you through the build process.
Hello! I'm using Ubuntu 20.04 + latest Aparapi (3.0.0) + Oracle JDK 1.8 + NVIDIA RTX 2060 (driver 470), and Aparapi says that I'm running an unsupported OpenCL 3.0 version... I've installed OpenCL 2.1 on Ubuntu successfully, but Aparapi still says that I'm running OpenCL 3.0. I guess it's picking up the NVIDIA OpenCL driver and not the OpenCL 2.1 version I installed - is there a way to force Aparapi to use a specific libopencl.so file? Or any other clue? Thanks in advance
@epiovesam There is no need to go back on the OpenCL version; you can keep the latest drivers, even if they report OpenCL 3.0. We have marked it as unsupported because we haven't done any special validation, but it seems to work. So I would suggest that you try it and only revert to a prior version if any issue arises. A special note must be made regarding the new Aparapi API for dealing with the workgroup size: OpenCL reports a default maximum workgroup size, but the actual maximum workgroup size allowed may be smaller, depending on the actual compiled kernel and the device's OpenCL driver. To find the actual workgroup size allowed for a given kernel, please use kernel.getKernelMaxWorkGroupSize(device);
@CoreRasurae thank you very much for the clarifications, we'll keep and try with OCL 3.0. Regards
Hi again. I'm having some issues but, first of all, I want to make sure that I'm not misunderstanding some Aparapi concepts.
Some time ago I was using OpenCL 1.2 + Ubuntu 16 + Aparapi 2.0 + Nvidia gtx old gpu , and I was able to call Kernel.execute with millions of range kernels with 268 passes, like Kernel.execute(16000000,268).
But today, with OpenCL 3.0 + Ubuntu 20 + Aparapi 3.0 + Nvidia RTX 2060, it says: "!!!!!!! Kernel overall local size: 1024 exceeds maximum kernel allowed local size of: 256 failed". If I follow the getKernelMaxWorkGroupSize(device) info = 256 and pass only 256 range kernels, it works... but then I'm limited to 256 kernels per "pass" and obligated to iterate (16000000/256) calls, which slows the processing a lot compared to the OCL 1.2 + Aparapi 2.0 setup...
Could you please help me to understand what I'm doing wrong in this new config (or if I was doing something wrong in the old config)? Thanks again
@epiovesam There is no problem with the old or the new config. The reality is that NVIDIA changed the behavior of their drivers, and it has nothing to do with OpenCL 3.0. Due to that NVIDIA driver behavior change, we had to make Aparapi 3.0.0, which also required a small API change. So what you need is to use the call kernel.getKernelMaxWorkGroupSize(device); to adjust the kernel's allowed max work group size. There is nothing you can do to make the new NVIDIA drivers work with higher workgroup sizes. The kernel may run to the end with a previous Aparapi version and the new NVIDIA driver while using a 1024 workgroup size, but the results are not guaranteed to be consistent/correct, so it is not recommended.
You may have to find other ways of optimizing the kernel execution time.
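A sketch of the adjustment discussed above, assuming Aparapi 3.0.x with the aparapi dependency on the classpath; the class name is hypothetical. The key point is that only the local (workgroup) size needs to shrink to the per-kernel limit — the global range can stay large, so there is no need to split the work into one execute call per 256 items:

```java
import com.aparapi.Kernel;
import com.aparapi.Range;
import com.aparapi.device.Device;

public class ChunkedExecution {
    public static void execute(Kernel kernel, int globalSize, int passes) {
        Device device = Device.best();
        // Per-kernel limit; may be smaller than the device maximum
        // (e.g. 256 vs 1024 in the discussion above)
        int maxLocal = kernel.getKernelMaxWorkGroupSize(device);
        // Round the global size up to a multiple of the local size,
        // since OpenCL requires global % local == 0
        int global = ((globalSize + maxLocal - 1) / maxLocal) * maxLocal;
        kernel.execute(Range.create(device, global, maxLocal), passes);
    }
}
```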