A. M. Jokisaari
@amjokisaari
aaahhh hrm. so I launched an interactive job via $ srun --pty -p knlall -t 1:00:00 /bin/bash
I will do it again to see if I can find out what the defaults are on the node.
Ah, $OMP_NUM_THREADS was not defined. I have no idea what Slurm will do for the interactive KNL job.
Trevor Keller
@tkphd
OK... that should reserve the whole node for your use. When you log in, type "echo $OMP_NUM_THREADS". It should give a blank line, meaning the variable is not set. That would take all available cores, though I'm not sure whether it takes KNL hyperthreading into account.
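A minimal sketch (not part of hiperc) for checking from inside the job what OpenMP will actually default to; omp_get_max_threads() reports the thread count the next parallel region would use when OMP_NUM_THREADS is unset:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* With OMP_NUM_THREADS unset, most runtimes default to the number of
       logical CPUs they detect, which on KNL includes hyperthreads. */
    printf("max threads for next parallel region: %d\n", omp_get_max_threads());
    printf("logical processors detected: %d\n", omp_get_num_procs());

    #pragma omp parallel
    {
        #pragma omp single
        printf("threads actually spawned: %d\n", omp_get_num_threads());
    }
    return 0;
}

Compile with something like cc -fopenmp check_threads.c and run it inside the srun session.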
Andrew Reid
@reid-a
Some (all?) KNL devices have up to four-way hyperthreading, so 25600% is theoretically achievable if you're very cache-friendly and have zero pipeline stalls.
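For reference, the arithmetic behind that figure, assuming a 64-core KNL SKU (core counts vary by part):

\[
64 \text{ cores} \times 4\,\tfrac{\text{hardware threads}}{\text{core}} = 256 \text{ threads}
\;\Longrightarrow\; 256 \times 100\% = 25600\% \text{ CPU.}
\]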
Trevor Keller
@tkphd
OK, if you have the patience, please export OMP_NUM_THREADS=32; make run; tail runlog.csv, then export OMP_NUM_THREADS=64; make run; tail runlog.csv
A. M. Jokisaari
@amjokisaari
rofl. if I have the patience. :+1:
way ahead of you
Trevor Keller
@tkphd
Oh... touché, @reid-a. CPU=25600% means 256 threads, not 25; and the program is tiny, so I am indeed cache-friendly.
A. M. Jokisaari
@amjokisaari
Oh, crikey. The Bebop instructions specifically say to limit to 128 or we might crash the nodes.
I think Bebop is still sort of in the shakedown stage.
Trevor Keller
@tkphd
Oh, come on. Crash it for science!
A. M. Jokisaari
@amjokisaari
DO IT FOR SCIENCE, MORTY
Trevor Keller
@tkphd
Haha!
A. M. Jokisaari
@amjokisaari

ok, with OMP_NUM_THREADS=32, we get

[jokisaar@knl-0193 phi-openmp-diffusion]$ tail runlog.csv
10000,10000.000000,0.000286,9.274177,2.227248,0.046050,0.101204,11.964242
20000,20000.000000,0.000574,18.508916,4.367604,0.068953,0.105850,23.679478
30000,30000.000000,0.000863,27.712661,6.503964,0.091987,0.110746,35.357514
40000,40000.000000,0.001152,36.919618,8.641958,0.116303,0.115921,47.044215
50000,50000.000000,0.001442,46.118503,10.776314,0.141849,0.121364,58.719061
60000,60000.000000,0.001732,55.324621,12.910489,0.168610,0.127085,70.402023
70000,70000.000000,0.002023,64.535966,15.046884,0.195611,0.132956,82.093101
80000,80000.000000,0.002313,73.741281,17.182138,0.223681,0.138957,93.778446
90000,90000.000000,0.002604,82.992712,19.341044,0.252895,0.144961,105.537492
100000,100000.000000,0.002895,92.192959,21.475859,0.282898,0.150975,117.233624

and that gave me a cpu% of 3200
Trevor Keller
@tkphd
Meta: To disable notifications for this chatroom, please click the slider-looking button next to your avatar in the top-right, select "Notifications", and override to the desired verbosity.
A. M. Jokisaari
@amjokisaari

with OMP_NUM_THREADS=64, we get

[jokisaar@knl-0193 phi-openmp-diffusion]$ tail runlog.csv
10000,10000.000000,0.000286,4.923778,1.254317,0.040954,0.003207,6.556908
20000,20000.000000,0.000574,9.829164,2.421580,0.062855,0.005595,12.986660
30000,30000.000000,0.000863,14.759755,3.597916,0.085913,0.008138,19.450536
40000,40000.000000,0.001152,19.660510,4.761356,0.110065,0.010777,25.874405
50000,50000.000000,0.001442,24.572766,5.923231,0.136151,0.013556,32.307354
60000,60000.000000,0.001732,29.458316,7.080866,0.163546,0.016455,38.710936
70000,70000.000000,0.002023,34.351207,8.243031,0.190764,0.019482,45.127532
80000,80000.000000,0.002313,39.262969,9.408731,0.219527,0.022538,51.568219
90000,90000.000000,0.002604,44.163932,10.573124,0.248994,0.025615,57.997612
100000,100000.000000,0.002895,49.052405,11.732975,0.278920,0.028673,64.411918

and a cpu% of 6400

Trevor Keller
@tkphd
ooh, pretty good scaling!
A. M. Jokisaari
@amjokisaari
would you like any additional data?
i could do 8...
or 128
Trevor Keller
@tkphd
Yes, please repeat with either unset OMP_NUM_THREADS or OMP_NUM_THREADS=128 (the first might crash your node), then tail runlog?
wait, you already did
A. M. Jokisaari
@amjokisaari
I did the unset one. now I'm doing 128.
Trevor Keller
@tkphd
256 threads gave 61.58906 seconds (bottom-right value)
Thanks!
A. M. Jokisaari
@amjokisaari

OMP_NUM_THREADS=128 gives me a %cpu of 12800 and this result:

[jokisaar@knl-0193 phi-openmp-diffusion]$ tail runlog.csv
10000,10000.000000,0.000286,4.699317,1.313448,0.079539,0.003133,6.530344
20000,20000.000000,0.000574,9.336970,2.435180,0.119604,0.004645,12.760757
30000,30000.000000,0.000863,13.947611,3.561319,0.161630,0.006270,18.970599
40000,40000.000000,0.001152,18.538394,4.700199,0.208136,0.008267,25.179616
50000,50000.000000,0.001442,23.114247,5.820901,0.254010,0.010547,31.355529
60000,60000.000000,0.001732,27.668618,6.928849,0.689720,0.013254,37.888158
70000,70000.000000,0.002023,32.211126,8.041696,0.737063,0.015686,44.023515
80000,80000.000000,0.002313,36.814283,9.161558,0.787185,0.017582,50.228136
90000,90000.000000,0.002604,41.385938,10.275205,0.838038,0.019464,56.394135
100000,100000.000000,0.002895,45.994756,11.394499,0.890074,0.021359,62.605318

looks like 64 is good but 128 is no better for this problem.
Trevor Keller
@tkphd
Sure does.
So, hyperthreading does not appear to help the code as written.
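A quick back-of-the-envelope check of that conclusion, using the bottom-right runlog.csv values quoted in this thread (taken to be cumulative wall time in seconds, per the "61.58906 seconds (bottom-right value)" remark) with the 32-thread run as baseline:

#include <stdio.h>

int main(void)
{
    /* Wall times copied from the chat above, not computed here. */
    const int    threads[] = { 32, 64, 128, 256 };
    const double wall_s[]  = { 117.233624, 64.411918, 62.605318, 61.58906 };

    for (int i = 0; i < 4; ++i) {
        double speedup    = wall_s[0] / wall_s[i];                               /* vs. 32 threads    */
        double efficiency = (wall_s[0] * threads[0]) / (wall_s[i] * threads[i]); /* vs. ideal scaling */
        printf("%3d threads: %8.3f s  speedup %.2fx  efficiency %3.0f%%\n",
               threads[i], wall_s[i], speedup, 100.0 * efficiency);
    }
    return 0;
}

Going from 32 to 64 threads stays above 90% parallel efficiency, while 128 and 256 threads add essentially nothing beyond that, which matches the hyperthreading observation.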
A. M. Jokisaari
@amjokisaari
I wonder if the problem is too small?
Running into overhead issues, or I/O is bottlenecking, or something?
Trevor Keller
@tkphd
We've reached the edge of my knowledge here. Possibly pipelining, as @reid-a suggested. A larger problem might make better use of the system, but there are other optimizations to be done as well. I'll do more reading and either make changes or document how to profile these machines. Thanks for your help!
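One self-contained way to test the "problem too small?" hypothesis, sketched here with a placeholder five-point sweep rather than hiperc's own code: time only the stencil, with no file output, at different grid sizes and OMP_NUM_THREADS settings.

/* Sketch only, not hiperc code. Compile with e.g. cc -O2 -fopenmp sweep_bench.c */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(int argc, char *argv[])
{
    const int nx = (argc > 1) ? atoi(argv[1]) : 4096; /* grid width  */
    const int ny = (argc > 2) ? atoi(argv[2]) : 4096; /* grid height */
    const int steps = 100;
    const double w = 0.2; /* D*dt/h^2, illustrative value only */

    double *a = calloc((size_t)nx * ny, sizeof(double));
    double *b = calloc((size_t)nx * ny, sizeof(double));
    a[(ny/2) * nx + nx/2] = 1.0; /* point source, just to have non-zero data */

    double t0 = omp_get_wtime();
    for (int step = 0; step < steps; ++step) {
        #pragma omp parallel for
        for (int j = 1; j < ny - 1; ++j)
            for (int i = 1; i < nx - 1; ++i)
                b[j*nx + i] = a[j*nx + i]
                            + w * (a[j*nx + i-1] + a[j*nx + i+1]
                                 + a[(j-1)*nx + i] + a[(j+1)*nx + i]
                                 - 4.0 * a[j*nx + i]);
        double *tmp = a; a = b; b = tmp; /* swap old and new fields */
    }
    double elapsed = omp_get_wtime() - t0;

    printf("%d x %d grid, %d threads: %.3f s for %d sweeps\n",
           nx, ny, omp_get_max_threads(), elapsed, steps);

    free(a); free(b);
    return 0;
}

If a much larger grid keeps scaling at 128 or 256 threads where the benchmark problem plateaus, that would point at problem size or overhead rather than the stencil itself.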
A. M. Jokisaari
@amjokisaari
sure, let me know if/when you'd like me to test an update.
Trevor Keller
@tkphd
Will do.
Trevor Keller
@tkphd
Hi everyone, just want to let you know that the OpenCL diffusion code is running correctly. There are a couple of open issues, but I consider them optimizations rather than bugs. I'll be starting on spinodal decomposition benchmarks next week.
Trevor Keller
@tkphd
Implicitly, I meant to say that this marks the end of the first round of platform implementations: all the platforms I wanted to test have correct implementations of the diffusion equation, producing identical results to the serial code using whatever parallelism model and hardware are of interest. This means I can generate and publish benchmarking results, and move on to either (a) optimization of the diffusion code, or more likely (b) porting the diffusion codes into the next stage of modeling complexity, which is spinodal decomposition as defined by the CHiMaD Phase Field group. I would note that the OpenACC (#49) and KNL (#107) implementations would benefit from optimization, if anybody has interest in tackling them.
Andrew Reid
@reid-a
Next up: FPGAs!
Trevor Keller
@tkphd
Maybe OpenCL already has that covered? On the software side, anyway. I'm sure you & I can figure out the hardware config :grinning:
Trevor Keller
@tkphd
Interesting new paper comparing image convolution (i.e., one step of the diffusion algorithm) on KNL using three different software platforms: https://arxiv.org/abs/1711.09791. They achieve good scaling on large images, so it is possible to saturate all available threads.
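As a sketch of that equivalence, assuming the usual explicit five-point discretization of ∂c/∂t = D∇²c with grid spacing h and time step Δt, one diffusion update is a convolution of the field with a fixed 3×3 kernel:

\[
c^{n+1} = K * c^{n}, \qquad
K = \begin{pmatrix} 0 & w & 0 \\ w & 1-4w & w \\ 0 & w & 0 \end{pmatrix}, \qquad
w = \frac{D\,\Delta t}{h^{2}},
\]

so each time step is one pass of the same convolution the paper benchmarks.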
Trevor Keller
@tkphd
Well, hey now! A friend in NIST ITL gave me access to their KNC and KNL testbeds. Things could get interesting.
Trevor Keller
@tkphd
Due to the lapse in funding and subsequent shutdown of our government, the Federal employees among us are furloughed until funding is restored. We will respond to open discussions after NIST re-opens for business.
Trevor Keller
@tkphd
... and we're back :smiley:
Looks interesting
Haskell rules
Daniel Wheeler
@wd15
Paper to go with the code: https://arxiv.org/pdf/1204.4779.pdf
Trevor Keller
@tkphd
Looks interesting, but also looks abandoned :worried: