    Evan Nemerson
    @nemequ
    clang's output looks much better than GCC's here: https://godbolt.org/z/es3pz5. Like I said in the issue, I think that will only happen for the simpler functions, though :(
    Evan Nemerson
    @nemequ
    I responded in the issue (before I read this). Yes, I'll go ahead and make the changes to SIMDE_NATURAL_VECTOR_SIZE and add those LTE/GTE macros
    rosbif
    @rosbif
    @nemequ Remark: I think LE/GE would be more in line with standard abbreviations than LTE/GTE but I don't really care. Could you let me know when you have done this, please? Or do you want me to do it? Then I'll create a new branch (pr/445 as you suggested) based on the new master, integrate cgt and push it to my fork where you can see and review it. When we agree on this I'll modify the others in the same vein. Would this be OK? I also have corrections to get_low and my C++/C include modifications to skel. Should I generate a separate PR for these? Maybe I need to do cgtz as well as there is nothing to simplify in cgt.
    Evan Nemerson
    @nemequ
    @rosbif it's working its way through staging now: simd-everywhere/simde@a077ff3
    Evan Nemerson
    @nemequ

    But LT/GT mean different, in this case less useful, things. I don't really mind adding LT/GT if you need them.

    Sounds good to me.

    rosbif
    @rosbif
    But I wrote LE/GE.
    Evan Nemerson
    @nemequ
    Oops, sorry. I've always seen both ways quite a bit, and have a personal preference for LTE/GTE, but you may be right that LT/GT are more common…
    rosbif
    @rosbif
    LE/GE ;-)
    Evan Nemerson
    @nemequ
    Argh. The good news is I got it right in the code I'm changing :)
    Sorry, I have my head buried in something else right now. Anyways, I'll push the change in a bit, should be in master within an hour or so.
    rosbif
    @rosbif
    OK, no problem, thank you.
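The LE/GE helpers discussed above can be sketched roughly as follows. This is a minimal illustration, not SIMDe's actual code: the `DEMO_` names, the default of 128 bits, and the `demo_chunk_bits` function are all hypothetical stand-ins. It also shows why inclusive comparisons are the handy ones here: a check like "at least 128 bits" would need an awkward threshold if only strict LT/GT were available.

```c
/* Hypothetical stand-in for SIMDE_NATURAL_VECTOR_SIZE: the widest vector
 * width, in bits, that the target handles natively (0 means no SIMD). */
#ifndef DEMO_NATURAL_VECTOR_SIZE
#define DEMO_NATURAL_VECTOR_SIZE 128
#endif

/* Inclusive comparisons. Both evaluate to false when the size is 0, so
 * "no SIMD" never matches either test. */
#define DEMO_NATURAL_VECTOR_SIZE_LE(x) \
  ((DEMO_NATURAL_VECTOR_SIZE > 0) && (DEMO_NATURAL_VECTOR_SIZE <= (x)))
#define DEMO_NATURAL_VECTOR_SIZE_GE(x) \
  ((DEMO_NATURAL_VECTOR_SIZE > 0) && (DEMO_NATURAL_VECTOR_SIZE >= (x)))

/* Picks the chunk width (in bits) a function would process per step.
 * The helpers expand to plain integer expressions, so they work in #if. */
static int demo_chunk_bits(void) {
#if DEMO_NATURAL_VECTOR_SIZE_GE(256)
  return 256;
#elif DEMO_NATURAL_VECTOR_SIZE_GE(128)
  return 128;
#else
  return 64;
#endif
}
```

Building with `-DDEMO_NATURAL_VECTOR_SIZE=256` would select the first branch instead, assuming the compiler accepts `-D` in the usual way.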
    rosbif
    @rosbif
    I think it would be a good idea to translate these into semi-global macros e.g. SIMDE_NEON_CONVERT_64_TO_128 which can be easily globally enabled or disabled. This would be useful in the case of my clang bug or if we find that they are worse than auto_vectorization on some platforms. I could put this in types.h. What do you think?
    rosbif
    @rosbif
    Maybe even four semi-global macros : SIMDE_NEON_CONVERT_8X8_TO_8X16, SIMDE_NEON_CONVERT_16X4_TO_16X8, SIMDE_NEON_CONVERT_32X2_TO_32X4, and SIMDE_NEON_CONVERT_64X1_TO_64X2 because ISTM that the first is almost certainly beneficial but the last may not be.
    Evan Nemerson
    @nemequ
    @rosbif, I commented in the PR (sorry, I thought I did that earlier today). I don't think you need to be that precise; just SIMDE_NEON_ENABLE_VECTOR_EXPANSION to cover everything should be fine. As I wrote in the PR, I'm not comfortable with the idea of mixing sizes… I think that's asking for trouble.
    rosbif
    @rosbif
    OK I'll make that SIMDE_ARM_NEON_ENABLE_VECTOR_EXPANSION as you wrote in the PR.
    rosbif
    @rosbif
    I'll make the other one SIMDE_ARM_NEON_ENABLE_SIMPLIFICATION but feel free to change it if you wish.
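One plausible reading of the "vector expansion" switch, sketched in plain C: a 64-bit operation is carried out by widening both inputs to 128 bits, running the 128-bit operation, and keeping the low half, with a single macro to turn the trick off globally. Everything here is an assumption made for illustration: the `DEMO_`/`demo_` names are hypothetical, the 128-bit add is a scalar loop standing in for a native instruction, and the real code in the PR may look quite different.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical global switch, modeled on the idea behind
 * SIMDE_ARM_NEON_ENABLE_VECTOR_EXPANSION. Set to 0 to disable. */
#ifndef DEMO_ENABLE_VECTOR_EXPANSION
#define DEMO_ENABLE_VECTOR_EXPANSION 1
#endif

typedef struct { uint8_t v[8];  } demo_u8x8;   /* 64-bit vector  */
typedef struct { uint8_t v[16]; } demo_u8x16;  /* 128-bit vector */

/* Scalar stand-in for a native 128-bit lane-wise add (wraps modulo 256). */
static demo_u8x16 demo_add_u8x16(demo_u8x16 a, demo_u8x16 b) {
  demo_u8x16 r;
  for (int i = 0; i < 16; i++)
    r.v[i] = (uint8_t)(a.v[i] + b.v[i]);
  return r;
}

/* 64-bit add, optionally implemented via the 128-bit operation. */
static demo_u8x8 demo_add_u8x8(demo_u8x8 a, demo_u8x8 b) {
  demo_u8x8 r;
#if DEMO_ENABLE_VECTOR_EXPANSION
  /* Widen: inputs go in the low half, the high half is zero-filled. */
  demo_u8x16 wa = {{0}}, wb = {{0}};
  memcpy(wa.v, a.v, sizeof a.v);
  memcpy(wb.v, b.v, sizeof b.v);
  demo_u8x16 wr = demo_add_u8x16(wa, wb);
  /* Narrow: keep only the low 64 bits of the result. */
  memcpy(r.v, wr.v, sizeof r.v);
#else
  /* Fallback: plain loop, left to the compiler's auto-vectorizer. */
  for (int i = 0; i < 8; i++)
    r.v[i] = (uint8_t)(a.v[i] + b.v[i]);
#endif
  return r;
}
```

Both branches give the same result for lane-wise operations like this one, so the switch only changes how the work is done, which is what makes a single global on/off macro workable.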
    Milot Mirdita
    @milot-mirdita

    I found a weird compile error on GCC 5.4 (the Ubuntu 16.04 default):

    $ cat test.cpp
    #define SIMDE_ENABLE_NATIVE_ALIASES
    #include "x86/sse2.h"
    #include "smmintrin.h"
    
    int main(int argc, const char** argv) { return 0; }

    This fails with any optimization flag (e.g. g++ test.cpp -O3 -msse2) and compiles without one. This is somehow happening in MMseqs2 since we include SIMDe and then some stdc++ file includes x86intrin.h.

    I have a workaround: moving the include that pulls in SIMDe further down, so it gets included after libstdc++ does its thing.
    rosbif
    @rosbif
    @nemequ I have pushed a new version of #445 to the pr/445 branch of my fork for you to comment on. Tests are in progress (with a few stupid errors).
    Evan Nemerson
    @nemequ
    @milot-mirdita hm, I'm not sure there is much we can do about that if you're using native aliases. I'll look into it, but you may have to just make sure to include the stdc++ stuff before SIMDe.
    @rosbif ok, I'll take a look in a bit; I added a bunch of implementations to the neon headers last night, working on cleaning them up and committing them now…
    @milot-mirdita the functions should be safe since the native aliases use macros, but if they clobber our macros we can't control what gets executed...
    Milot Mirdita
    @milot-mirdita
    Yes, I figured that there isn't much you can do about this. Probably worth adding to the caveats, though, since the error message is very confusing.
    Evan Nemerson
    @nemequ
    Yes, definitely.
    But I'll play with it, maybe I can find a workaround
    rosbif
    @rosbif
    @nemequ For #445 Travis CI is clean now in pr/445, even WASM which is unfamiliar and was completely untested :-)
    rosbif
    @rosbif
    Oops, I see there are some C&P errors in the native aliases. I shall fix them but could you tell me what else you think I should change, please?
    What does "2 annotations" signify in the failure e-mail?
    rosbif
    @rosbif
    Native aliases are OK now.
    Evan Nemerson
    @nemequ
    @rosbif, not sure what you're referring to. maybe two different builds failed? I really don't look at the e-mails, TBH.
    BTW, is it just me or has Travis been really slow lately?
    Looks like you figured it out, all builds are passing \o/
    rosbif
    @rosbif
    I was referring to #445 which I have redone in pr/445 with our agreed modifications. Please review.
    Travis CI doesn't seem slow to me.
    Evan Nemerson
    @nemequ
    I got that, I'm just not sure what the "2 annotations" thing means. Either way, it looks like you figured it out :)
    @rosbif, I'll take a look in ~ 1 hour, there are just a couple little things I need to take care of first.
    rosbif
    @rosbif
    I don't have the e-mail on this machine. One of the tests passed but it said "2 annotations" in parentheses. It isn't important but I wondered what it meant.
    Yes I imagine you're very busy.
    Evan Nemerson
    @nemequ
    Unfortunately right now I am :(. I'm going to be pretty swamped for the next month or so…
    BTW, now that you've had a chance to use the new SIMDE_NATURAL_VECTOR_SIZE stuff, are you good with it?
    rosbif
    @rosbif
    Yes, but I wonder why it isn't defined to 64 on an MMX only machine.
    Maybe SSE (or even SSE2 because SSE isn't much use to me as I'm not a floating point person) is considered a minimum for SIMDe.
    Evan Nemerson
    @nemequ
    An MMX-only machine implies x86; baseline for x86_64 is SSE2. My understanding is that on 32-bit CPUs MMX isn't really safe because some compilers aren't smart enough to handle register allocation properly and if you mix floating point and MMX code they may clobber values.
    TBH I'm rather mystified by this; you would think that the compiler would be able to track which registers are used for FP and which are used for MMX… I think modern compilers can do this, but I know MSVC will issue warnings if you mix FP and MMX, and since I'm not really familiar with all the details I'm really nervous about using MMX for anything on x86.
    rosbif
    @rosbif
    OK, thanks for the explanation. I hadn't realized that MMX-only implies a 32-bit machine.
    Evan Nemerson
    @nemequ
    Probably a 32-bit machine from the late 90s, or maybe early 2000s. I'd still like to figure out how to handle it properly, but it's really not a practical concern.
    That's why I think implementing MMX using SSE would be awesome, though.
    rosbif
    @rosbif
    Implementing MMX on 128-bit SIMD would also let MMX code run with SIMD on AltiVec or WASM ;-)
    Evan Nemerson
    @nemequ
    Yep, which should be a very nice speed-up on those platforms. I'm just not sure how much MMX code is out there at this point…