Christopher Moore
@rosbif
@nemequ Thank you, that works fine.
Christopher Moore
@rosbif
@nemequ Do you remember that the Intel compiler generated bad code for some of my GFNI implementations with SIMDE_VECTORIZE and Michael (or maybe you) conditionally #ifdefed out SIMDE_VECTORIZE everywhere?
I would like to remove as many of these #ifdefs as possible.
I am also wondering whether this may have been caused by the extra complexity the compiler faces when the implementations use simde-prefixed intrinsics. I would like to remove the prefixes, except where they are necessary or offer real advantages.
Unfortunately I don't have the Intel compiler installed and I don't really want to install it on my old laptop which has very limited resources.
So this might entail many iterations through CI which could eat up valuable tokens, as I understand was the case with Travis.
What do you think?
simba611
@simba611
Hello, I was trying to implement the _mm512_mask_or_pd function, and while generating tests I encountered "SIMDE_MATH_NAN" in one of the test vectors. Another test vector contained "Not enough space to write value (given 53 bytes, need 331 bytes)
SIMDE_FLOAT64_C(-14935775276824340548037239704596964,".
Can someone advise me on what to do with these?
Evan Nemerson
@nemequ
@simba611, it sounds like you're trying to do bitwise operations on floats, which is very tricky. Even if you assume IEEE 754 representation, you have to be very careful about crafting the values so they don't write tons of data to the test cases. It's generally better to avoid this by creating tests which generate and check integer values: use functions like _mm512_castsi512_pd and _mm512_castsi512_ps to convert to floats, perform the bitwise operation, then use _mm512_castpd_si512 / _mm512_castps_si512 to convert the results back to integer types.
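The integer round trip described above can be sketched in plain scalar C. This is only an illustration of the idea for a single 64-bit lane, not SIMDe test code: the helper names (f64_to_bits, bits_to_f64, f64_or_bits, check_or_lane) are hypothetical, and memcpy stands in for the cast intrinsics. It assumes an IEEE 754 binary64 double.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a double's bits as an integer.
 * No numeric conversion takes place. */
static uint64_t f64_to_bits(double v) {
  uint64_t bits;
  assert(sizeof(double) == sizeof(uint64_t));
  memcpy(&bits, &v, sizeof bits);
  return bits;
}

/* Hypothetical helper: reinterpret an integer's bits as a double. */
static double bits_to_f64(uint64_t bits) {
  double v;
  memcpy(&v, &bits, sizeof v);
  return v;
}

/* Scalar stand-in for one lane of a bitwise OR on doubles.
 * The result is returned as bits so that a NaN payload is never
 * passed through floating-point code. */
static uint64_t f64_or_bits(double a, double b) {
  return f64_to_bits(a) | f64_to_bits(b);
}

/* Test-style check: inputs and the expected value are all integers,
 * so the comparison is exact even when the result is a NaN. */
static int check_or_lane(uint64_t a_bits, uint64_t b_bits,
                         uint64_t expected_bits) {
  return f64_or_bits(bits_to_f64(a_bits), bits_to_f64(b_bits)) == expected_bits;
}
```

For instance, OR-ing the bit patterns of 1.25 and 2.0 sets every exponent bit and leaves a non-zero mantissa, which is a NaN with a specific payload. Comparing the result as a double would not work (a NaN never compares equal to anything), while the integer comparison is exact.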
simba611
@simba611
Thanks a lot, I will look into that.
Also, the x86 svml test for intel-all-gcc-10 is timing out for me. Is that something I should be worried about?
Evan Nemerson
@nemequ
It's being executed with an emulator (all the intel-all-* tests are) so it's pretty slow, though I haven't seen a timeout before.
Are you running docker inside of a vm or something?
simba611
@simba611
no, native ubuntu 18.04
Evan Nemerson
@nemequ
What model is your CPU?
simba611
@simba611
Intel i7, 9th generation. I don't remember the exact model. Do you want me to check?
Evan Nemerson
@nemequ
Nah, I wouldn't expect any Coffee Lake i7 to be a problem.
simba611
@simba611
Intel(R) Core(TM) i7-9750H CPU
Evan Nemerson
@nemequ
Can you run the test individually? Something like time sde64 -- ./test/x86/svml-native-c should tell you how long it takes…
simba611
@simba611
Sure, I will check.
It ran fine individually:
real 0m8.181s
user 0m6.678s
sys 0m1.397s
Evan Nemerson
@nemequ
Hm, not even particularly slow. I'm not sure what is going on there…
simba611
@simba611
Hi, I attempted to code _mm512_mask_or_pd and made the changes you suggested with respect to using integers for testing.
Could you please look them over to make sure everything is fine, since this is my first attempt at implementing a function?
The changes I made are reflected in these two commit patches -
  1. simba611/simde@ae16975
    and then to take care of the problem due to floats,
  2. simba611/simde@5e1a1f1
Evan Nemerson
@nemequ
@simba611, looks good to me.
simba611
@simba611
It took slightly longer than expected to get used to the notation. I will send a pull request for this and start working on more functions.
simba611
@simba611
Hi, I submitted the pull request, but one required test failed, for emscripten, with errors of the type "constant expression evaluates to -123188372992032540 which cannot be narrowed to type 'int_fast32_t' (aka 'int') [-Wc++11-narrowing]". I will make sure to test with emscripten in the future. Are there any specific compilers I should always test with, or should I always test with all of them?
Also, can someone please explain the cause of this error?
Evan Nemerson
@nemequ
@simba611, the answer is a bit long, so I commented on the PR.
simba611
@simba611
I understand now. I noticed the use of simde_mm512_set_epi64 in other functions but didn't completely understand it. I will fix this error and keep this in mind for next time.
Thanks for the comment.
Evan Nemerson
@nemequ
NP, this is supposed to be a learning experience and it's a very easy mistake to make :). I should have noticed it when you sent the patch yesterday, too.
Evan Nemerson
@nemequ
@simba611, the patch looks good to me, and MSVC is okay with it now, too. Pretty confident I'll be able to merge it once CI finishes.
simba611
@simba611
Thanks. Should I go ahead and implement the rest of the functions from the DQ family?
Evan Nemerson
@nemequ
AVX-512DQ is pretty big, but yeah, we need any functions which aren't already implemented.
Evan Nemerson
@nemequ
@simba611, here is an up-to-date list of missing functions: https://github.com/simd-everywhere/implementation-status/blob/main/x86.md
simba611
@simba611
Thanks, I started working on the mask(z)_xxx_p* type functions because they are all very similar. I am done with xor and or in this category, and I'm planning to work on the mullo family now.
simba611
@simba611
Is there anything in particular you would like me to work on, or should I continue like this?
simba611
@simba611

Thanks, I started working on the mask(z)_xxx_p* type functions because they are all very similar. I am done with xor and or in this category, and I'm planning to work on the mullo family now.

Finished this, planning on working on the insert family.

Milot Mirdita
@milot-mirdita
Evan Nemerson
@nemequ
Hm, not really sure what to put there…
Milot Mirdita
@milot-mirdita
I would suggest to put a redirect to the blog, beats a 404
Evan Nemerson
@nemequ
@mr-c, I'm going to try to expand this (a lot) into something more reusable, but: https://paste.centos.org/view/e1245e50
simba611
@simba611
Hi, I implemented _mm512_insertf32x8 along with its mask(z) variants. They were slightly different from the other functions I have implemented so far.
Could you please review it? This is the commit patch -
simba611/simde@384831a
Evan Nemerson
@nemequ
@simba611, you can just file a PR to get it reviewed; it's easier to comment on the code that way. What you have looks mostly right, though you should be using float instead of int32_t in the tests. Using integers instead of floats is really only for bitwise operations. Those regularly create NaNs, and the NaNs they create likely will not have exactly the same bit pattern as SIMDE_MATH_NAN / SIMDE_MATH_NANF, so there isn't a good way to recreate the exact bit pattern without using integer types. Also, instead of choosing a couple of fixed values for the immediate-mode parameters, you can use the SIMDE_CONSTIFY_* macros in the tests. For an example, see https://github.com/simd-everywhere/simde/blob/854f91319dacd1d91c5990c0334ff774eaf5c6b4/test/x86/xop.c#L7011
Evan Nemerson
@nemequ
The CONSTIFY stuff is newer than a lot of the tests, especially the x86 ones (which are generally the oldest), so a lot of tests don't use them, but newer tests should. They are pretty straightforward, see simde-constify.h. Basically, they create a switch which chooses a constant value based on a non-constant one. So for simde_mm512_insertf32x8 the imm8 parameter should be one of two possible values (0 or 1), so SIMDE_CONSTIFY_2_ would be appropriate.
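The "switch which chooses a constant value based on a non-constant one" idea can be sketched roughly as follows. This is an illustration only, not the contents of simde-constify.h: DEMO_CONSTIFY_2, demo_insert_half, demo_vec4, and demo_vec2 are all hypothetical names, and the "intrinsic" here is an ordinary function standing in for one whose last argument would have to be a compile-time constant.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4-lane and 2-lane "vectors" for the demo. */
typedef struct { int32_t v[4]; } demo_vec4;
typedef struct { int32_t v[2]; } demo_vec2;

/* Stand-in for an intrinsic that takes an immediate: copies b into
 * the lower (imm8 == 0) or upper (imm8 == 1) half of a. */
static demo_vec4 demo_insert_half_impl(demo_vec4 a, demo_vec2 b, int imm8) {
  assert(imm8 == 0 || imm8 == 1);
  a.v[imm8 * 2 + 0] = b.v[0];
  a.v[imm8 * 2 + 1] = b.v[1];
  return a;
}
#define demo_insert_half(a, b, imm8) demo_insert_half_impl((a), (b), (imm8))

/* Hypothetical constify-style macro: a switch over the run-time value
 * `imm` in which every case calls `func` with a literal constant. */
#define DEMO_CONSTIFY_2(func, result, default_case, imm, ...) \
  do { \
    switch (imm) { \
      case 0: (result) = func(__VA_ARGS__, 0); break; \
      case 1: (result) = func(__VA_ARGS__, 1); break; \
      default: (result) = (default_case); break; \
    } \
  } while (0)

/* A test can now loop over imm at run time; out-of-range values fall
 * back to the default case (here, the unmodified input a). */
static demo_vec4 demo_insert_half_runtime(demo_vec4 a, demo_vec2 b, int imm) {
  demo_vec4 r;
  DEMO_CONSTIFY_2(demo_insert_half, r, a, imm, a, b);
  return r;
}
```

With a = {1, 2, 3, 4} and b = {9, 8}, calling demo_insert_half_runtime with imm 0 replaces the first two lanes and imm 1 replaces the last two, while each underlying call still receives a literal 0 or 1.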
simba611
@simba611
Thanks. Will correct that.
Evan Nemerson
@nemequ
I just pushed a wip/sve branch for SVE support if anyone wants to play around a bit.
Evan Nemerson
@nemequ
It's a pretty early prototype, but it works on gcc 10.
Bolarinwa Saheed Olayemi
@refactormyself
Hi SIMDe community, is there a publicly hosted Docker image? I don't have the disk space to build the Docker image myself. I have tried several approaches, except for the cloud.
Evan Nemerson
@nemequ
There isn't, no, but it's pretty easy to go from a basic image to something which will build SIMDe; our requirements are pretty light (basically, a C compiler). The docker image is just large because we install lots of emulators and compilers, but if you just want to target one or two architectures you really just need meson, git, and either gcc or clang.
@refactormyself, if you start with a basic debian or ubuntu image, I think apt-get install meson git gcc g++ will get you everything you need without wasting a ton of space. Or, if you just want to use SIMDe in your code you don't really need anything additional; just put a copy of simde in your source directory and include the header(s) you want, no build system necessary.
Milot Mirdita
@milot-mirdita
Also https://github.com/simd-everywhere/simde-no-tests is really neat to include as a git subtree/submodule. Makes everything super easy.
Evan Nemerson
@nemequ
I finally merged the beginning of an SVE implementation. Only a few functions so far, but at least the scaffolding is in place if anyone wants to have a go at implementing some functions.
Milot Mirdita
@milot-mirdita
The x86, arm and wasm (to a lesser extent) folders are each getting quite large. Maybe they should also have repos like simde-no-tests? (Or I should switch to the amalgamated builds).
Evan Nemerson
@nemequ
Hm, I guess we could add simde-no-tests-* repos for all the different ISAs if you want. It's not really any extra effort, just a bit of time to set up.
Evan Nemerson
@nemequ
Maybe simde-no-tests-{x86,neon,sve,wasm} (or just simde-{x86,neon,sve,wasm}), in addition to simde-no-tests.