Dan Lewis
@lucentdan
Think this will be useful for the new generation of HPC planned at RPI 2018+
Trevor Keller
@tkphd
No problem, @lucentdan. Welcome!
A. M. Jokisaari
@amjokisaari

ok. The diffusion code runs on KNL!

runlog.csv results:

iter sim_time wrss conv_time step_time IO_time soln_time run_time
0 0 0 0 0.188137 0.057628 0 0.246949
10000 10000 0.000286 4.493929 1.365069 0.12165 0.005621 6.65671
20000 20000 0.000574 8.895637 2.39418 0.187032 0.006831 12.781632
30000 30000 0.000863 13.398053 3.401486 0.255395 0.008045 19.002456
40000 40000 0.001152 17.789476 4.41478 0.327418 0.009311 25.117928
50000 50000 0.001442 22.126154 5.438066 0.402769 0.012329 31.182672
60000 60000 0.001732 26.484548 6.458279 0.478117 0.013839 37.286988
70000 70000 0.002023 30.873932 7.447651 0.555889 0.015224 43.361118
80000 80000 0.002313 35.321395 8.449465 0.635728 0.016628 49.513117
90000 90000 0.002604 39.700359 9.443941 0.720135 0.018102 55.588724
100000 100000 0.002895 44.014251 10.427562 0.803771 0.019487 61.58906

diffusion.0100000.png
that's the final result.

I'm watching the output of top while running this, and I'm getting

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22566 jokisaar 20 0 17.009g 35144 2232 R 25600 0.0 154:24.30 diffusion
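A note on reading that line: top sums %CPU over all threads of a process, so it can exceed 100%. One quick way to see the thread count directly (a sketch, using the PID from the listing above):

ps -o nlwp= -p 22566    # print the number of threads in the diffusion process
top -H -p 22566         # or show each thread as its own row in top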

Andrew Reid
@reid-a
Where's the +1 button on this thing?
+1!
Trevor Keller
@tkphd
PID PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22566 20 0 17.009g 35144 2232 R 25600 0.0 154:24.30 diffusion
@reid-a type :+1:
Andrew Reid
@reid-a
:+1:
A. M. Jokisaari
@amjokisaari
@tkphd, ooh formatting. I shall endeavor to do that the next time
Trevor Keller
@tkphd
:+1:
A. M. Jokisaari
@amjokisaari
dumb me, but how do I interpret that %CPU? That's indicating threaded running, right? (MPI would give multiple lines in top)
Trevor Keller
@tkphd
Thank you so much for the KNL time and data! Looks like my code is 25.6/60=42% efficient, so we'll iterate :smile:
A. M. Jokisaari
@amjokisaari
haha. you are welcome! looking forward to further testing and really seeing how the phase field benchmarks work too!
Trevor Keller
@tkphd
Yes, CPU of 100% is one core, 25.6e3 is 25 cores, 60e3 would be 100% load.
A. M. Jokisaari
@amjokisaari
so these KNL nodes have 64 cores.
Trevor Keller
@tkphd
Right, just looked that up. 64e3 would be 100% load.
So my efficiency is only 40%.
A. M. Jokisaari
@amjokisaari
ok. so it was running 32.
Trevor Keller
@tkphd
Oh... interesting.
A. M. Jokisaari
@amjokisaari
right? if it's 25600%?
i added a github issue to include somewhere easily visible in the documentation how to specify the # of cores you want KNL to use.
Trevor Keller
@tkphd
Well, maybe. Depends how you launched it, and whether $OMP_NUM_THREADS was defined or not, and whether the core count was restricted by SLURM.
OpenMP takes all of them by default, which is the behavior I want. But yes, I will happily comment on that in the docs.
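For reference, a minimal sketch of how that default plays out with the make run target used throughout this thread:

# Unset: most OpenMP runtimes start one thread per hardware thread,
# which on 4-way hyperthreaded KNL is 64 cores x 4 = 256 threads.
unset OMP_NUM_THREADS
make run
# Set: exactly 64 threads, one per physical core.
export OMP_NUM_THREADS=64
make run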
A. M. Jokisaari
@amjokisaari
aaahhh hrm. so I launched an interactive job via $ srun --pty -p knlall -t 1:00:00 /bin/bash
i will do it again to see if i can find out what the defaults are on the node.
ah. $OMP_NUM_THREADS was not defined. I have no idea what slurm will do for the interactive KNL job
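One way to make the CPU count explicit at launch instead (a sketch; the exact flags honored depend on the site's Slurm configuration):

srun --pty -p knlall -t 1:00:00 -N 1 -c 64 /bin/bash    # request 64 CPUs on one node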
Trevor Keller
@tkphd
OK... that should reserve the whole node for your use. When you log in, type "echo $OMP_NUM_THREADS". It should give a blank line, meaning the variable is not set. That would take all available cores, though I'm not sure whether it takes KNL hyperthreading into account.
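For the record, echo prints a blank line whether the variable is unset or merely set to empty; a shell one-liner that tells the two apart (a sketch):

if [ -z "${OMP_NUM_THREADS+x}" ]; then echo unset; else echo "set to '$OMP_NUM_THREADS'"; fi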
Andrew Reid
@reid-a
Some (all?) KNL devices have up to four-way hyperthreading, so 25600% is theoretically achievable if you're very cache-friendly and have zero pipeline stalls.
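The arithmetic backs that up: 64 cores x 4 hardware threads per core = 256 hardware threads, and top charges 100% per fully busy thread, so 25600% is exactly the ceiling. On the node, lscpu reports the topology directly:

lscpu | grep -E '^(CPU\(s\)|Thread\(s\) per core|Core\(s\) per socket)'    # should show 256 CPUs, 4 threads/core, 64 cores if the 4-way figure holds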
Trevor Keller
@tkphd
OK, if you have the patience, please export OMP_NUM_THREADS=32; make run; tail runlog.csv, then export OMP_NUM_THREADS=64; make run; tail runlog.csv
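If more data points are wanted later, the same measurement as a loop (a sketch, assuming the same Makefile run target):

for t in 8 16 32 64 128; do
    export OMP_NUM_THREADS=$t
    make run
    echo "threads=$t:"; tail -n 1 runlog.csv
done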
A. M. Jokisaari
@amjokisaari
rofl. if I have the patience. :+1:
way ahead of you
Trevor Keller
@tkphd
Oh... touché, @reid-a. CPU=25600% means 256 hardware threads, not 25 cores; and the program is tiny, so I am indeed cache-friendly.
A. M. Jokisaari
@amjokisaari
oh crikey. Bebop instructions specifically say to limit to 128 or we might crash the nodes
i think Bebop is still sort of in the shakedown stage.
Trevor Keller
@tkphd
Oh, come on. Crash it for science!
A. M. Jokisaari
@amjokisaari
DO IT FOR SCIENCE, MORTY
Trevor Keller
@tkphd
Haha!
A. M. Jokisaari
@amjokisaari

ok, with OMP_NUM_THREADS=32, we get

[jokisaar@knl-0193 phi-openmp-diffusion]$ tail runlog.csv
10000,10000.000000,0.000286,9.274177,2.227248,0.046050,0.101204,11.964242
20000,20000.000000,0.000574,18.508916,4.367604,0.068953,0.105850,23.679478
30000,30000.000000,0.000863,27.712661,6.503964,0.091987,0.110746,35.357514
40000,40000.000000,0.001152,36.919618,8.641958,0.116303,0.115921,47.044215
50000,50000.000000,0.001442,46.118503,10.776314,0.141849,0.121364,58.719061
60000,60000.000000,0.001732,55.324621,12.910489,0.168610,0.127085,70.402023
70000,70000.000000,0.002023,64.535966,15.046884,0.195611,0.132956,82.093101
80000,80000.000000,0.002313,73.741281,17.182138,0.223681,0.138957,93.778446
90000,90000.000000,0.002604,82.992712,19.341044,0.252895,0.144961,105.537492
100000,100000.000000,0.002895,92.192959,21.475859,0.282898,0.150975,117.233624

and that gave me a cpu% of 3200
Trevor Keller
@tkphd
Meta: To disable notifications for this chatroom, please click the slider-looking button next to your avatar in the top-right, select "Notifications", and override to the desired verbosity.
A. M. Jokisaari
@amjokisaari

with OMP_NUM_THREADS=64, we get

[jokisaar@knl-0193 phi-openmp-diffusion]$ tail runlog.csv
10000,10000.000000,0.000286,4.923778,1.254317,0.040954,0.003207,6.556908
20000,20000.000000,0.000574,9.829164,2.421580,0.062855,0.005595,12.986660
30000,30000.000000,0.000863,14.759755,3.597916,0.085913,0.008138,19.450536
40000,40000.000000,0.001152,19.660510,4.761356,0.110065,0.010777,25.874405
50000,50000.000000,0.001442,24.572766,5.923231,0.136151,0.013556,32.307354
60000,60000.000000,0.001732,29.458316,7.080866,0.163546,0.016455,38.710936
70000,70000.000000,0.002023,34.351207,8.243031,0.190764,0.019482,45.127532
80000,80000.000000,0.002313,39.262969,9.408731,0.219527,0.022538,51.568219
90000,90000.000000,0.002604,44.163932,10.573124,0.248994,0.025615,57.997612
100000,100000.000000,0.002895,49.052405,11.732975,0.278920,0.028673,64.411918

and a cpu% of 6400

Trevor Keller
@tkphd
ooh, pretty good scaling!
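To put a number on it, from the run_time column of the two tables above: 117.23 s at 32 threads against 64.41 s at 64 threads is a speedup of about 1.82x, roughly 91% of the ideal 2x. As a one-liner:

echo 'scale=4; 117.233624 / 64.411918 / 2' | bc    # ~0.9100, i.e. ~91% parallel efficiency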
A. M. Jokisaari
@amjokisaari
would you like any additional data?
i could do 8...
or 128
Trevor Keller
@tkphd
Yes, please repeat with either unset OMP_NUM_THREADS or OMP_NUM_THREADS=128 (the first might crash your node), then tail runlog?
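A possible follow-up alongside the raw thread count (a sketch; OMP_PROC_BIND and OMP_PLACES are standard OpenMP environment variables, though their effect on this code is untested here):

export OMP_NUM_THREADS=128
export OMP_PROC_BIND=spread OMP_PLACES=cores    # spread threads across physical cores
make run; tail runlog.csv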