matthewgroth@Matthews-MBP surefire-reports % clinfo
Number of platforms                               1
  Platform Name                                   Apple
  Platform Vendor                                 Apple
  Platform Version                                OpenCL 1.2 (Nov 13 2021 00:45:09)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event

  Platform Name                                   Apple
Number of devices                                 1
  Device Name                                     Apple M1 Max
  Device Vendor                                   Apple
  Device Vendor ID                                0x1027f00
  Device Version                                  OpenCL 1.2
  Driver Version                                  1.2 1.0
  Device OpenCL C Version                         OpenCL C 1.2
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               32
  Max clock frequency                             1000MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple (kernel)     32
  Preferred / native vector sizes
    char                                          1 / 1
    short                                         1 / 1
    int                                           1 / 1
    long                                          1 / 1
    half                                          0 / 0 (n/a)
    float                                         1 / 1
    double                                        1 / 1 (n/a)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (n/a)
  Address bits                                    64, Little-Endian
  Global memory size                              45812989952 (42.7GiB)
  Error Correction support                        No
  Max memory allocation                           8589934592 (8GiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             1 bytes
  Alignment of base address                       32768 bits (4096 bytes)
  Global Memory cache type                        None
  Image support                                   Yes
    Max number of samplers per kernel             32
    Max size for 1D images from buffer            268435456 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   256 bytes
    Pitch alignment for 2D image buffers          256 pixels
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Max number of constant args                     31
  Max constant buffer size                        1073741824 (1024MiB)
  Max size of kernel argument                     4096 (4KiB)
  Queue properties
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1000ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                (n/a)
  Device Extensions                               cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Apple
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [P0]
  clCreateContext(NULL, ...) [default]            Success [P0]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Apple
    Device Name                                   Apple M1 Max
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Apple
    Device Name                                   Apple M1 Max
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  Invalid device type for platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Apple
    Device Name                                   Apple M1 Max
matthewgroth@Matthews-MBP surefire-reports %
Caused by: java.lang.UnsatisfiedLinkError: /private/var/folders/fq/hkrz_j5j5458x6yty_c9k0v40000gn/T/Aparapi153186659225701714/libaparapi.dylib: dlopen(/private/var/folders/fq/hkrz_j5j5458x6yty_c9k0v40000gn/T/Aparapi153186659225701714/libaparapi.dylib, 0x0001): tried:
  '/Users/mathewgroth/Desktop/libaparapi.dylib' (no such file),
  '/private/var/folders/fq/hkrz_j5j5458x6yty_c9k0v40000gn/T/Aparapi153186659225701714/libaparapi.dylib' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e')),
  '/usr/lib/libaparapi.dylib' (no such file)
	at java.base/jdk.internal.loader.NativeLibraries.load(Native Method)
	at java.base/jdk.internal.loader.NativeLibraries$NativeLibraryImpl.open(NativeLibraries.java:383)
	at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:227)
	at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:169)

The key line: libaparapi.dylib is a mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e').
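The error above means the bundled native library was built for x86_64 while the JVM is running natively on arm64; the usual fixes are to run an x86_64 JDK under Rosetta or to use an arm64 build of libaparapi.dylib. As a first diagnostic step, a minimal sketch for checking which architecture your JVM reports (the class name here is hypothetical):

```java
public class ArchCheck {
    public static String jvmArch() {
        // Typically "aarch64" on a native Apple Silicon JVM,
        // "x86_64" on an Intel Mac or under Rosetta 2.
        return System.getProperty("os.arch");
    }

    public static void main(String[] args) {
        System.out.println("JVM architecture: " + jvmArch());
    }
}
```

If this prints aarch64, the JVM cannot load an x86_64-only dylib, which matches the dlopen failure above.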
Cannot invoke "com.aparapi.internal.model.ClassModel$AttributePool$CodeEntry.getExceptionPoolEntries()" because the return value of "com.aparapi.internal.model.ClassModel$ClassModelMethod.getCodeEntry()" is null
Use kernel.getKernelMaxWorkGroupSize(device); to query the maximum work group size allowed for the kernel. So there is nothing you can do with the new NVIDIA drivers to work with higher workgroup sizes. The kernel may run to completion with a previous Aparapi version and the new NVIDIA driver at a 1024 workgroup size, but the results are not guaranteed to be consistent/correct, so it is not recommended.
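A minimal sketch of clamping a local work group size to a device/kernel maximum, as returned by kernel.getKernelMaxWorkGroupSize(device) mentioned above; the helper class and method names here are hypothetical, and the Aparapi call itself is elided so the snippet stays self-contained:

```java
public class WorkGroupSize {
    // Pick the largest power-of-two local size that both divides the
    // global size evenly and does not exceed the device/kernel maximum
    // (e.g. 256 on the drivers discussed above, 1024 on newer ones).
    public static int clampLocalSize(int globalSize, int deviceMax) {
        int local = Integer.highestOneBit(deviceMax);
        while (local > 1 && globalSize % local != 0) {
            local >>= 1;
        }
        return local;
    }

    public static void main(String[] args) {
        // deviceMax would come from kernel.getKernelMaxWorkGroupSize(device)
        System.out.println(clampLocalSize(4096, 256));
    }
}
```

Exceeding the reported maximum may appear to work but, as noted above, gives no correctness guarantee.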
@plusgithub We can't use Java objects directly on the GPU due to the way that Java allocates them from non-contiguous memory on the heap. We can only rely on arrays of primitives being laid out appropriately. So alas, no. But if your 'objects' are just holding primitives, you can use a trick whereby you represent the objects as parallel arrays of primitives.
So given, say, a Record class holding an (x, y) pair of ints and a float value, and an array of Records to process, you can allocate an array of ints for the (x, y)s and another array for the float values:
int[] recordXYs = new int[records.length * 2];
float[] recordValues = new float[records.length];
Then copy data from your records array into your parallel int and float arrays prior to kernel dispatch, and back again after the kernel completes.
Hopefully you have enough work to do in the kernel to warrant this extra map step.
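The parallel-arrays trick above can be sketched as follows; the Record class is the hypothetical type from the discussion, and the actual Aparapi kernel dispatch is left as a comment so the example stays self-contained:

```java
public class ParallelArrays {
    // Hypothetical record type: an (x, y) pair of ints plus a float value.
    public static class Record {
        public final int x, y;
        public final float value;

        public Record(int x, int y, float value) {
            this.x = x;
            this.y = y;
            this.value = value;
        }
    }

    // Flatten the (x, y) pairs into one contiguous int array: [x0, y0, x1, y1, ...]
    public static int[] flattenXY(Record[] records) {
        int[] xys = new int[records.length * 2];
        for (int i = 0; i < records.length; i++) {
            xys[2 * i] = records[i].x;
            xys[2 * i + 1] = records[i].y;
        }
        return xys;
    }

    // Flatten the float values into their own contiguous array.
    public static float[] flattenValues(Record[] records) {
        float[] values = new float[records.length];
        for (int i = 0; i < records.length; i++) {
            values[i] = records[i].value;
        }
        return values;
    }

    public static void main(String[] args) {
        Record[] records = { new Record(1, 2, 3.5f), new Record(4, 5, 6.5f) };
        int[] xys = flattenXY(records);
        float[] values = flattenValues(records);
        // Here you would dispatch an Aparapi kernel over xys/values, e.g.
        //   kernel.execute(Range.create(records.length));
        // and then copy results back from the primitive arrays into Records.
        System.out.println(xys.length + " ints, " + values.length + " floats");
    }
}
```

The copy in each direction is O(n), so this only pays off when the kernel does substantially more work per element than the mapping itself.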
@BlackHat0001 Sorry I was away. Did you sort this out?
To be able to use Aparapi/OpenCL on your CPU (as well as your GPU) you will need a CPU-based OpenCL runtime.
Intel has one (https://www.intel.com/content/www/us/en/developer/articles/tool/opencl-drivers.html), and I think Apple also has one. NVIDIA does not; AMD used to, but I don't think they do anymore.
What platform are you on? Mac/Windows/Linux? What processor x64/aarch64?
I can recommend the Intel one (as an ex-AMDer it hurts me to type that ;) ); it maps to AVX vector instructions pretty well.
There is also an open-source project called PoCL, which works OK, so if you really wanted to you could build it yourself: http://portablecl.org/