    Steven R. Brandt
    @stevenrbrandt
    schedule deriv_order_8 as my_deriv at foo {
        LANG: C
        READS: mygf(everywhere)
        WRITES: mygf_x(interior), mygf_yy(interior)
    }
    or something?
    Roland Haas
    @rhaas80
    probably would have bad cache behaviour.
    Steven R. Brandt
    @stevenrbrandt
    Derivatives often do, but we still need them. I just think it would be good to have a convenient way to compute spatial derivatives, just as we have a convenient way to compute time derivatives.
    It just seems to me like coding nth order derivatives is something that should be done once.
    Separating that out might make it easier to switch to a DG framework
    What I'm trying to figure out is, what is the right way to do it? Should it be done with a schedule block, as above, from Thorn FiniteDifference or something? Should it just be a function call that checks to see that it has a function everywhere?
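As a rough illustration of "coding the derivative once", here is a minimal 1D sketch of a reusable 8th-order centred first derivative that reads everywhere and writes only the interior. The function name mirrors the schedule block above, but the signature, the `std::vector` storage and the constants `kWeights8` and `kRadius8` are assumptions made for this example, not an existing Cactus or CarpetX API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Centred 8th-order first-derivative weights for offsets 1..4
// (standard finite-difference coefficients).
constexpr double kWeights8[4] = {4.0 / 5.0, -1.0 / 5.0, 4.0 / 105.0,
                                 -1.0 / 280.0};
constexpr std::size_t kRadius8 = 4;  // stencil radius, i.e. required ghost width

// Hypothetical helper: reads f everywhere, writes df only in the interior.
// The first and last kRadius8 entries of df are left untouched.
void deriv_order_8(const std::vector<double>& f, std::vector<double>& df,
                   double h) {
  assert(f.size() == df.size());
  assert(f.size() >= 2 * kRadius8 + 1);
  for (std::size_t i = kRadius8; i + kRadius8 < f.size(); ++i) {
    double sum = 0.0;
    for (std::size_t k = 1; k <= kRadius8; ++k)
      sum += kWeights8[k - 1] * (f[i + k] - f[i - k]);
    df[i] = sum / h;
  }
}
```

A caller would fill `f` including its ghost points, call `deriv_order_8(f, df, h)`, and use `df` only on the interior, which matches the READS/WRITES declarations in the schedule block above.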
    Steven R. Brandt
    @stevenrbrandt
    I'm also wondering if there's a way to overlap sync and computation. What if we had loops like loop_innerint and loop_outerint? The idea is that the "inner interior" is the region that is safe to compute for a stencil code that needs ghosts even if the ghosts aren't present yet, and the "outer interior" is just the region where the stencil would require ghosts.
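The inner/outer split can be sketched in 1D as follows. The names `loop_innerint` and `loop_outerint` are taken from the message above; everything else (the `std::vector` storage, the periodic `fill_ghosts` stand-in for a sync, and `step_overlapped`) is a hypothetical illustration and not CarpetX code. The calls run sequentially here; in a real code the sync would be started asynchronously so that it overlaps with the inner loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kGhost = 1;  // ghost width; the stencil radius is 1

// Three-point Laplacian applied to the half-open index range [lo, hi).
void laplacian(const std::vector<double>& u, std::vector<double>& out,
               std::size_t lo, std::size_t hi) {
  for (std::size_t i = lo; i < hi; ++i)
    out[i] = u[i - 1] - 2.0 * u[i] + u[i + 1];
}

// Stand-in for a sync: fill ghosts from the opposite side (periodic domain).
void fill_ghosts(std::vector<double>& u) {
  const std::size_t n = u.size();
  for (std::size_t g = 0; g < kGhost; ++g) {
    u[g] = u[n - 2 * kGhost + g];
    u[n - kGhost + g] = u[kGhost + g];
  }
}

// "Inner interior": the stencil touches no ghost, so no sync is needed.
void loop_innerint(const std::vector<double>& u, std::vector<double>& out) {
  laplacian(u, out, 2 * kGhost, u.size() - 2 * kGhost);
}

// "Outer interior": the strips next to the ghosts, which need synced data.
void loop_outerint(const std::vector<double>& u, std::vector<double>& out) {
  const std::size_t n = u.size();
  laplacian(u, out, kGhost, 2 * kGhost);
  laplacian(u, out, n - 2 * kGhost, n - kGhost);
}

void step_overlapped(std::vector<double>& u, std::vector<double>& out) {
  assert(u.size() >= 4 * kGhost && out.size() == u.size());
  loop_innerint(u, out);  // independent of the ghosts
  fill_ghosts(u);         // the "sync"
  loop_outerint(u, out);  // must wait for the sync
}
```

The result should match a conventional "sync, then loop over the whole interior" step, since the inner loop never reads a ghost point.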
    jaykalinani
    @jaykalinani

    Hi all,

    I performed the spherical shock test with GRHydroToyGPU, and the movies of rho, eps, press and velz (made via VisIt using 'slice' operation) can be found in the following Gdrive folder:

    https://drive.google.com/drive/folders/1s2waj7PdOqvUCkBhyH2ikMmSReYw0H8b?usp=sharing

    Hope you are able to access the movies, but if not, please let me know. For rho and press, I have also included the mesh. The initial data for this test case ("spherical shock") is set via GRHydroToyGPU/src/initialize.cxx. The parfile used for this test has also been added to the repository.

    About the results, looking at the movie of rho, it looks like the inner regions collapse while the outer layers are pushed out. However, towards the end of the simulation, at the last time stamp, there is a small blob of matter that appears in the center. Similar behaviour can be noticed in eps. Therefore, I am not completely sure if these results are correct.

    @lennoggi and @fedelopezar did you maybe work on performing such a test in the last few days?

    Maybe we could compare these results with the same test performed with HydroToyGPU? What do you think?
    Thanks!

    Roland Haas
    @rhaas80
    Reminder: those (you know who you are) that missed filling out the when2meet poll https://www.when2meet.com/?12915229-NGyc2 and asked for it to be re-sent please fill out the poll so that an up-to-date set of possible meeting times could be distilled.
    Federico G Lopez Armengol
    @fedelopezar
    Hi @jaykalinani , I'll reproduce the test and let you know.
    Federico G Lopez Armengol
    @fedelopezar
    @jaykalinani I've evolved for a bit longer and, indeed, it looks like the center gets reset to atmo. Here rho and press: https://drive.google.com/drive/folders/1LY_iHA4n3HhxGq6LvhpweXG1v3E72xbu?usp=sharing
    For that version of the flesh, CarpetX-specific declarations are generated and made available via DECLARE_CCTK_ARGUMENTS_func_name.
    But only for functions that declare their indexing in the interface.ccl using the index property, e.g. index={CCV}
    The above example would generate constexpr array<int, dim> CCV_centered = {1, 1, 0}; which could be used in loop_int(), etc.
    Erik Schnetter
    @eschnett
    i don't think functions should declare a centering, that is too limiting.
    often, you want to have multiple loops in a function, and they can have different centerings.
    it would be good if there was a way to access the centering of a variable efficiently. then one could write
    loop<gxx>(.....),
    and the loop macro could determine the centering from the variable name gxx.
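One way the `loop<gxx>(...)` idea could look is sketched below, with the centering attached to a tag type per variable so the loop can look it up at compile time. This is only an illustration: the tag types `gxx` and `flux_z`, their centerings, and this `loop_int` signature are assumptions, not the CarpetX interface. It follows the convention quoted above where 1 means cell-centred (C) and 0 means vertex-centred (V).

```cpp
#include <array>
#include <cassert>

constexpr int dim = 3;

// Hypothetical per-variable tags carrying their centering (1 = C, 0 = V).
struct gxx {
  static constexpr std::array<int, dim> centering() { return {{0, 0, 0}}; }
};
struct flux_z {
  static constexpr std::array<int, dim> centering() { return {{1, 1, 0}}; }
};

// Loop over all points of a box with `ncells` cells per direction, using the
// centering of Var to decide how many points exist in each direction.
template <typename Var, typename F>
void loop_int(const std::array<int, dim>& ncells, F&& body) {
  constexpr std::array<int, dim> c = Var::centering();
  std::array<int, dim> npoints{};
  for (int d = 0; d < dim; ++d) {
    assert(ncells[d] >= 0);
    npoints[d] = ncells[d] + (1 - c[d]);  // vertices: one more point than cells
  }
  for (int k = 0; k < npoints[2]; ++k)
    for (int j = 0; j < npoints[1]; ++j)
      for (int i = 0; i < npoints[0]; ++i)
        body(i, j, k);
}
```

With this, a function can contain several loops with different centerings, e.g. `loop_int<gxx>(ncells, ...)` followed by `loop_int<flux_z>(ncells, ...)`, without declaring a single centering for the whole function.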
    Roland Haas
    @rhaas80
    @eschnett: those are additions to interface.ccl so what it does is change:
    CCTK_REAL foo tag="index={1 1 1}"
    to
    CCTK_REAL foo index={CCC}
    not schedule.ccl. The code in DECLARE_CCTK_ARGUMENTS_funcname then would set up the correct layout and G5FIndex (or whatever the name is) for the gf_foo helper object. So it covers most of the boilerplate at the top of functions.
    @stevenrbrandt: one likely needs a:
    CCTK_INDEX_DECLARE_ONCE or so to create all possible layout objects (for CCC, CCV, CVC, CCV etc) and then re-use those in the constructor calls for gf_foo etc. This is b/c the compilers are not smart enough to reduce multiple instances of identical objects (the layout object) to a single object. It must be the same object (this is why there is a gf5index and not I think a gf2index).
    Erik Schnetter
    @eschnett
    ah, misunderstanding. steve said "functions", and i thought he meant "scheduled functions", but he meant "grid functions". all is good, ignore my comment.
    yes, you are correct regarding the layouts. these would be declared as part of DECLARE_CCTK_ARGUMENTS.
    Roland Haas
    @rhaas80
    the pull request would benefit from you taking a look though. Namely the bit that deals with adding code to rdwr.pl which would create the helper object declarations (but needs to use a delayed setting type method similar to LoopControl overwritten CCTK_LOOP3_ALL).
    Erik Schnetter
    @eschnett
    i will have a look
    Roland Haas
    @rhaas80
    Minutes for call from 2021-09-28 (reconstructed from my paper notes):
    • there was some work on GRHydroX
    • Erik and Bruno suggest asking Ian Hawke about the issue of where to put the lapse in the flux and RHS calculation
    • discussed option for GPU acceleration in prolongation ops, Roland volunteers to take a stab at this.
    Erik Schnetter
    @eschnett
    thanks!
    Federico G Lopez Armengol
    @fedelopezar
    CactusAMReX
    Meeting minutes 10/13/2021
    • S.Brandt modifications in the flesh regarding indexing are still to merge. There are other modifications of the flesh still to merge, regarding a multipatch system (E.Schnetter).
    • J.Kalinani ran the Balsara test but did not reproduce the analytical solution. Some suggestions: repeat with periodic BC and a larger domain, at least in dim X.
    • S.Cupp update on profiling bbh runs, and comparing them with Carpet performance. CarpetX seems slower. Looking into synchronization issues. Some suggestions: Check out nlevels in VisIt; try hpctoolkit to find bottlenecks.
    • Hands-On: Implementing HydroInitial to extend the ID setup of HydroBase. In the end, the thorn compiles and runs.
    Federico G Lopez Armengol
    @fedelopezar
    PS: S.Cupp results and updates can be found at https://docs.einsteintoolkit.org/et-docs/CarpetX
    Steven R. Brandt
    @stevenrbrandt
    In my 2D runs, loop_bnd() does nothing. Digging deeper, I find that bbox=0,0,0,0,0,0 at all times. Not sure where this gets set.
    Roland Haas
    @rhaas80

    Concerning whether storing a w_lorentz to save a sqrt may help (though if that is the reason then I would rather have evolution codes store it), I wrote a quick test code that computes sqrt on many doubles and also just copies them around. The code is in this gist (all one Makefile, running "make" will run the test; I use MPI_Wtime to get time information so you may need to have an MPI stack around): https://gist.github.com/rhaas80/3eef8605b26ec76035b766f2420e4f6e

    On my workstation (old) I get:

    sqrt took 404.949ms for 7.45654e+07
    copy took 113.237ms for 8.28504e+07

    on melete05 (not so old) I get:

    sqrt took 135.887ms for 7.45654e+07
    copy took 68.6313ms for 8.28504e+07

    the float number at the end is just there to avoid a too aggressive compiler optimizing away the calculations because the results are never used.

    The sqrt instructions used on the two systems end up being sqrtpd (%rdi,%rax,1),%xmm0 and vsqrtpd (%rdi,%rax,1),%ymm0 respectively, also displaying the age of my workstation.
    Erik Schnetter
    @eschnett
    interesting. i am surprised.
    how many numbers were there? one grid function (38^3) of such numbers?
    Roland Haas
    @rhaas80
    2**26 doubles, so about 67 million (64 Mi) doubles
    memory from posix_memalign (64 bytes) and telling the compiler that things are aligned (though I forgot to tell it that the array size is a multiple of say 8 doubles as well so there is peel off code at the end of the loop). Those numbers are for an omp simd'ed loop, even for the straight copy, since I found that that was (surprisingly) faster than memcpy.

    Note that this is value dependent. If I fill the input array a with only 0.0 values instead of 1.234567 then numbers change to:

    sqrt took 110.065ms for 0
    copy took 114.57ms for 0

    and I surely hope that the 4ms difference is just random noise.
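For readers who want to try a comparison like this without the gist, here is a simplified sketch of the same kind of measurement. It is not the gist's code: it uses `std::chrono` instead of MPI_Wtime and plain `std::vector` storage without the alignment hints or `omp simd`, and the helper names `sqrt_all`, `copy_all` and `time_ms` are made up for this example. The checksum plays the role of the trailing number in the output above, keeping the compiler from discarding the work.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Write sqrt(a[i]) into b[i].
void sqrt_all(const std::vector<double>& a, std::vector<double>& b) {
  for (std::size_t i = 0; i < a.size(); ++i) b[i] = std::sqrt(a[i]);
}

// Plain element-wise copy of a into b.
void copy_all(const std::vector<double>& a, std::vector<double>& b) {
  for (std::size_t i = 0; i < a.size(); ++i) b[i] = a[i];
}

// Time one call of `kernel` in milliseconds. The checksum of the output is
// returned through `checksum` so the result is actually used.
template <typename Kernel>
double time_ms(Kernel kernel, const std::vector<double>& a,
               std::vector<double>& b, double& checksum) {
  assert(a.size() == b.size());
  const auto t0 = std::chrono::steady_clock::now();
  kernel(a, b);
  const auto t1 = std::chrono::steady_clock::now();
  checksum = std::accumulate(b.begin(), b.end(), 0.0);
  return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Filling `a` with 1.234567 versus 0.0 and calling `time_ms(sqrt_all, a, b, sum)` and `time_ms(copy_all, a, b, sum)` would reproduce the shape of the experiment; absolute timings will depend on the machine, compiler flags and array size.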

    Erik Schnetter
    @eschnett
    it gets slower if you use zeros?
    Roland Haas
    @rhaas80
    It got faster when I used zeros. The sqrt goes from 400ms to 110ms. The copy was 1ms slower than before but that is most likely just noise (I ran this on my workstation which has other things to do).
    my guess (I could cite stackoverflow but do not really think it a citable source of information :-) ) is that sqrt's runtime depends e.g. on the number of bits set in the float's mantissa (similar to div).
    and a sqrt(0) may just short-circuit in the CPU to be just a copy (I have checked that even with all 0. data in the array the function does still have sqrt instructions in the object file, i.e. that the compiler does not magically, ignoring all the -O0 and -fno-lto options I passed, optimize across function and object boundaries).
    Erik Schnetter
    @eschnett
    again: i did not know this...