Trevor Keller
@tkphd
@reid-a type :+1:
Andrew Reid
@reid-a
:+1:
A. M. Jokisaari
@amjokisaari
@tkphd, ooh, formatting. I shall endeavor to do that next time
Trevor Keller
@tkphd
:+1:
A. M. Jokisaari
@amjokisaari
dumb me, but how do I interpret that %CPU? That's indicating threaded running, right? (mpi would give multiple lines in top)
Trevor Keller
@tkphd
Thank you so much for the KNL time and data! Looks like my code is 25.6/60=42% efficient, so we'll iterate :smile:
A. M. Jokisaari
@amjokisaari
haha. you are welcome! looking forward to further testing and really seeing how the phase field benchmarks work too!
Trevor Keller
@tkphd
Yes, CPU of 100% is one core, 25.6e3 is 25 cores, 60e3 would be 100% load.
A. M. Jokisaari
@amjokisaari
so these KNL nodes have 64 cores.
Trevor Keller
@tkphd
Right, just looked that up. 64e3 would be 100% load.
So my efficiency is only 40%.
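As a sanity check, here is a minimal OpenMP sketch (purely illustrative, not the hiperc code itself) that reports the thread count a parallel region actually uses; top's %CPU for a busy region is roughly 100% per active thread, so the two numbers should line up.

#include <omp.h>
#include <stdio.h>

/* Report how many threads the runtime actually uses: top's %CPU for a
 * busy parallel region is roughly 100% per active thread. */
int main(void)
{
    int nthreads = 0;
    #pragma omp parallel
    {
        #pragma omp single
        nthreads = omp_get_num_threads();
    }
    printf("threads in use: %d (expect roughly %d%% CPU in top)\n",
           nthreads, 100 * nthreads);
    return 0;
}

Built with something like gcc -fopenmp, this makes it easy to compare the reported thread count against the %CPU figure.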
A. M. Jokisaari
@amjokisaari
ok. so it was running 32.
Trevor Keller
@tkphd
Oh... interesting.
A. M. Jokisaari
@amjokisaari
right? if it's 25600%?
i added a github issue to include somewhere easily visible in the documentation how to specify the # of cores you want KNL to use.
Trevor Keller
@tkphd
Well, maybe. Depends how you launched it, and whether $OMP_NUM_THREADS was defined or not, and whether the core count was restricted by SLURM.
OpenMP takes all of them by default, which is the behavior I want. But yes, I will happily comment on that in the docs.
A. M. Jokisaari
@amjokisaari
aaahhh hrm. so I launched an interactive job via $ srun --pty -p knlall -t 1:00:00 /bin/bash
i will do it again to see if i can find out what the defaults are on the node.
ah. $OMP_NUM_THREADS was not defined. I have no idea what slurm will do for the interactive KNL job
Trevor Keller
@tkphd
OK... that should reserve the whole node for your use. When you log in, type "echo $OMP_NUM_THREADS". It should give a blank line, meaning the variable is not set. That would take all available cores, though I'm not sure whether it takes KNL hyperthreading into account.
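A related sketch (again illustrative, not from the repo) checks the same things from inside a program: whether OMP_NUM_THREADS is set, how many threads the runtime would use by default, and how many logical CPUs it can see.

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

/* If OMP_NUM_THREADS is unset, most OpenMP runtimes default to one
 * thread per available logical CPU, subject to any limits SLURM or
 * the launcher imposes. */
int main(void)
{
    const char *env = getenv("OMP_NUM_THREADS");
    printf("OMP_NUM_THREADS: %s\n", env ? env : "(not set)");
    printf("runtime default: %d threads\n", omp_get_max_threads());
    printf("logical CPUs visible: %d\n", omp_get_num_procs());
    return 0;
}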
Andrew Reid
@reid-a
Some (all?) KNL devices have up to four-way hyperthreading, so 25600% is theoretically achievable if you're very cache-friendly and have zero pipeline stalls.
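For reference, using the 64-core figure from above: 64 cores × 4 hardware threads per core × 100% per thread = 25,600% CPU at full occupancy.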
Trevor Keller
@tkphd
OK, if you have the patience, please export OMP_NUM_THREADS=32; make run; tail runlog.csv, then export OMP_NUM_THREADS=64; make run; tail runlog.csv
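For anyone who prefers a single driver over repeated make runs, here is a rough C/OpenMP sketch of the same kind of thread-scaling study (a toy sweep; the problem size and loop body are made up and are not the hiperc diffusion benchmark):

#include <omp.h>
#include <stdio.h>

#define NX 4096
#define NY 4096

static double field[NX][NY];

/* Toy thread-scaling study in the spirit of the chat: time the same
 * parallel sweep at several thread counts instead of re-running
 * "make run" under different OMP_NUM_THREADS settings. */
int main(void)
{
    const int counts[] = {32, 64, 128};
    for (size_t i = 0; i < sizeof counts / sizeof counts[0]; i++) {
        omp_set_num_threads(counts[i]);
        double start = omp_get_wtime();
        #pragma omp parallel for
        for (int x = 0; x < NX; x++)
            for (int y = 0; y < NY; y++)
                field[x][y] = 0.25 * (double)(x + y);
        double elapsed = omp_get_wtime() - start;
        printf("%3d threads: %.6f s\n", counts[i], elapsed);
    }
    return 0;
}

Here omp_set_num_threads plays the same role as exporting OMP_NUM_THREADS before each run.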
A. M. Jokisaari
@amjokisaari
rofl. if I have the patience. :+1:
way ahead of you
Trevor Keller
@tkphd
Oh... touché, @reid-a. CPU=25600% means 256 threads, not 25; and the program is tiny, so I am indeed cache-friendly.
A. M. Jokisaari
@amjokisaari
oh crikey. bebop instructions specifically say to limit to 128 or we might crash the nodes
i think bebop is still in sort of the shakedown stage.
Trevor Keller
@tkphd
Oh, come on. Crash it for science!
A. M. Jokisaari
@amjokisaari
DO IT FOR SCIENCE, MORTY
Trevor Keller
@tkphd
Haha!
A. M. Jokisaari
@amjokisaari

ok, with OMP_NUM_THREADS=32, we get

[jokisaar@knl-0193 phi-openmp-diffusion]$ tail runlog.csv
10000,10000.000000,0.000286,9.274177,2.227248,0.046050,0.101204,11.964242
20000,20000.000000,0.000574,18.508916,4.367604,0.068953,0.105850,23.679478
30000,30000.000000,0.000863,27.712661,6.503964,0.091987,0.110746,35.357514
40000,40000.000000,0.001152,36.919618,8.641958,0.116303,0.115921,47.044215
50000,50000.000000,0.001442,46.118503,10.776314,0.141849,0.121364,58.719061
60000,60000.000000,0.001732,55.324621,12.910489,0.168610,0.127085,70.402023
70000,70000.000000,0.002023,64.535966,15.046884,0.195611,0.132956,82.093101
80000,80000.000000,0.002313,73.741281,17.182138,0.223681,0.138957,93.778446
90000,90000.000000,0.002604,82.992712,19.341044,0.252895,0.144961,105.537492
100000,100000.000000,0.002895,92.192959,21.475859,0.282898,0.150975,117.233624

and that gave me a cpu% of 3200
Trevor Keller
@tkphd
Meta: To disable notifications for this chatroom, please click the slider-looking button next to your avatar in the top-right, select "Notifications", and override to the desired verbosity.
A. M. Jokisaari
@amjokisaari

with OMP_NUM_THREADS=64, we get

[jokisaar@knl-0193 phi-openmp-diffusion]$ tail runlog.csv
10000,10000.000000,0.000286,4.923778,1.254317,0.040954,0.003207,6.556908
20000,20000.000000,0.000574,9.829164,2.421580,0.062855,0.005595,12.986660
30000,30000.000000,0.000863,14.759755,3.597916,0.085913,0.008138,19.450536
40000,40000.000000,0.001152,19.660510,4.761356,0.110065,0.010777,25.874405
50000,50000.000000,0.001442,24.572766,5.923231,0.136151,0.013556,32.307354
60000,60000.000000,0.001732,29.458316,7.080866,0.163546,0.016455,38.710936
70000,70000.000000,0.002023,34.351207,8.243031,0.190764,0.019482,45.127532
80000,80000.000000,0.002313,39.262969,9.408731,0.219527,0.022538,51.568219
90000,90000.000000,0.002604,44.163932,10.573124,0.248994,0.025615,57.997612
100000,100000.000000,0.002895,49.052405,11.732975,0.278920,0.028673,64.411918

and a cpu% of 6400

Trevor Keller
@tkphd
ooh, pretty good scaling!
A. M. Jokisaari
@amjokisaari
would you like any additional data?
i could do 8...
or 128
Trevor Keller
@tkphd
Yes, please repeat with either unset OMP_NUM_THREADS or OMP_NUM_THREADS=128 (the first might crash your node), then tail runlog?
wait, you already did
A. M. Jokisaari
@amjokisaari
I did the unset one. now I'm doing 128.
Trevor Keller
@tkphd
256 threads gave 61.58906 seconds (bottom-right value)
Thanks!
A. M. Jokisaari
@amjokisaari

OMP_NUM_THREADS=128 gives me a %CPU of 12800 and this result:

[jokisaar@knl-0193 phi-openmp-diffusion]$ tail runlog.csv
10000,10000.000000,0.000286,4.699317,1.313448,0.079539,0.003133,6.530344
20000,20000.000000,0.000574,9.336970,2.435180,0.119604,0.004645,12.760757
30000,30000.000000,0.000863,13.947611,3.561319,0.161630,0.006270,18.970599
40000,40000.000000,0.001152,18.538394,4.700199,0.208136,0.008267,25.179616
50000,50000.000000,0.001442,23.114247,5.820901,0.254010,0.010547,31.355529
60000,60000.000000,0.001732,27.668618,6.928849,0.689720,0.013254,37.888158
70000,70000.000000,0.002023,32.211126,8.041696,0.737063,0.015686,44.023515
80000,80000.000000,0.002313,36.814283,9.161558,0.787185,0.017582,50.228136
90000,90000.000000,0.002604,41.385938,10.275205,0.838038,0.019464,56.394135
100000,100000.000000,0.002895,45.994756,11.394499,0.890074,0.021359,62.605318

looks like 64 is good but 128 is no better for this problem.
Trevor Keller
@tkphd
Sure does.
So, hyperthreading does not appear to help the code as written.
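Working the numbers from the three tails above (and taking the bottom-right runlog.csv value as total wall time in seconds, as the chat does), here is a small sketch of the speedup and efficiency arithmetic relative to the 32-thread run:

#include <stdio.h>

/* Speedup and parallel efficiency relative to the 32-thread run, taking
 * the bottom-right runlog.csv value as total wall time in seconds. */
int main(void)
{
    const int    threads[] = {32, 64, 128};
    const double seconds[] = {117.233624, 64.411918, 62.605318};
    for (int i = 0; i < 3; i++) {
        double speedup    = seconds[0] / seconds[i];
        double efficiency = speedup * threads[0] / threads[i];
        printf("%3d threads: %9.3f s  speedup %.2fx  efficiency %.0f%%\n",
               threads[i], seconds[i], speedup, 100.0 * efficiency);
    }
    return 0;
}

By that measure the 64-thread run is roughly 91% efficient relative to 32 threads, while 128 threads buys essentially nothing, consistent with hyperthreading not helping here.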
A. M. Jokisaari
@amjokisaari
i wonder if the problem is too small?