    bpetit
    @bpetit
    The latest data I got from a working group of IT professionals on measurement (the Boavizta project, I can invite you if you are interested) is that on general-purpose servers the CPU represents at least 90% of the machine's power usage (fans, power supply losses, etc. included)
    (that was data from machines built in the last decade, iirc)
    but this is surely less true for ML use cases or bitcoin mining... ^^
    René Ribaud
    @uggla
    yes, GPUs will greatly impact this, I guess.
    bpetit
    @bpetit
    there is a project to open-source this group's research: https://github.com/Boavizta/environmental-footprint-data (not live yet, just a bunch of resources)
    maybe we could allow scaphandre to send relevant data to a research-focused database containing hardware specs, to get more reliable and accurate data about what consumes what on a given machine
    René Ribaud
    @uggla
    I worked a bit on Redfish (the BMC management interface); I will check whether it exposes power information. I know there is info about the power supplies, but probably not about consumption.
    By the way, I have good contacts at the HPE solution center (I worked for them before). They have a lot of server hardware; maybe they can help a bit with consumption and possible measurements.
    bpetit
    @bpetit
    That would be great. Maybe we should synchronize with boavizta on that, as they are actually working on classifying the hardware by consumption "profiles"
    Pierre Rust
    @PierreRust
    on the GPU / AI side, there is codecarbon https://github.com/mlco2/codecarbon which works on that. However, they don't do any splitting of the CPU or GPU energy consumption.
    @uggla Redfish gives the consumption of the whole server, and there is a Prometheus exporter for that; we are using it to monitor a set of servers
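A minimal sketch of reading that whole-server figure straight from a BMC, assuming it implements the classic Redfish `Power` resource (`/redfish/v1/Chassis/<id>/Power`, with `PowerControl[].PowerConsumedWatts`); some newer BMCs expose this data through a different resource instead. The chassis ID varies by vendor, so the default `1` is an assumption, the function name is hypothetical, and the grep-based extraction is a crude stand-in for a real JSON parser such as `jq`.

```shell
# Print the first PowerConsumedWatts value reported by a Redfish BMC.
# $1: BMC host, $2: "user:password",
# $3: chassis ID (defaults to "1", an assumption valid on some vendors only)
redfish_power_watts() {
  host=$1
  creds=$2
  chassis=${3:-1}
  # -s: silent, -k: accept the self-signed certificates many BMCs use
  curl -sk -u "$creds" "https://${host}/redfish/v1/Chassis/${chassis}/Power" |
    grep -o '"PowerConsumedWatts"[[:space:]]*:[[:space:]]*[0-9.]*' |
    head -n 1 |
    grep -o '[0-9.]*$'
}

# Example (hypothetical host and credentials):
# redfish_power_watts bmc.example.org admin:secret
```

This only gives the server-level total; splitting it between components or processes still needs something like RAPL on the host.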
    René Ribaud
    @uggla
    @PierreRust good to know.
    Florimond Manca
    @florimondmanca
    This got mentioned in a Python newsletter I follow... RAPL applied to individual functions in code: https://github.com/powerapi-ng/pyRAPL This particular repo seems related to the PowerAPI research team. I find it somewhat interesting that things related to sustainability (even from 1000 ft away and without a clear “why”) start popping up in programming newsletters like this...
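RAPL, which both pyRAPL and Scaphandre build on, exposes cumulative energy counters rather than power. A minimal sketch of turning two readings into average watts, assuming the Linux powercap sysfs interface (`/sys/class/powercap/intel-rapl:0/energy_uj` and `max_energy_range_uj`, usually readable only by root on recent kernels); the function names are hypothetical.

```shell
# Average power (W) between two RAPL energy readings in microjoules,
# taken $3 seconds apart. $4 is max_energy_range_uj, used to undo a
# counter wraparound between the two readings.
rapl_watts() {
  e1=$1 e2=$2 dt=$3 max=$4
  delta=$((e2 - e1))
  if [ "$delta" -lt 0 ]; then
    delta=$((delta + max)) # the counter wrapped past its maximum
  fi
  awk -v d="$delta" -v t="$dt" 'BEGIN { printf "%.2f\n", d / t / 1000000 }'
}

# Sample the package-0 domain over $1 seconds (default 1).
# Needs read access to energy_uj (root on recent kernels).
sample_rapl() {
  zone=/sys/class/powercap/intel-rapl:0
  interval=${1:-1}
  max=$(cat "$zone/max_energy_range_uj")
  e1=$(cat "$zone/energy_uj")
  sleep "$interval"
  e2=$(cat "$zone/energy_uj")
  rapl_watts "$e1" "$e2" "$interval" "$max"
}
```

For example, `rapl_watts 1000000 11000000 1 262143328850` prints `10.00`: 10 J consumed over one second.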
    bpetit
    @bpetit
    Thanks! What's the newsletter? I looked at that project at some point. It's somewhat comparable to codecarbon, afaik
    René Ribaud
    @uggla
    I'm going to have a look, even though I'm not an expert.
    René Ribaud
    @uggla
    There is a doubled "and" here: "...can collect the metrics in different ways (sensors) and and"
    Regarding "measuring the power consumption of the virtual machines running on the host (working for Qemu/KVM so far)": you should add a note saying that a "file" must be shared between the host and the guest.
    René Ribaud
    @uggla
    maybe a paragraph that explains the current problems in getting an accurate measurement (consumption measurement in public clouds where the above file cannot be shared, GPUs, other devices not related to the CPU). However, you can say that even if the measurement is not fully accurate, it gives a good idea, lets you compare different solutions and find ways to reduce consumption. In any case, it is better than no measurement at all. And of course we are working to improve it, and people can join to help.
    René Ribaud
    @uggla
    You can also say that energy measurement was not really a concern, especially in the DC, so hardware lacks a standard sensor solution to measure it. It's just disappointing not to be able to get the consumption of every hardware part in a standard/common way.
    bpetit
    @bpetit
    Thanks for the feedback !
    I get the ideas, but I'm not sure how to talk about them without going into too much detail. (I'd like to keep the post short, and maybe write more detailed ones later)
    I fixed the typo btw, thanks !
    René Ribaud
    @uggla
    no pb, this is just a suggestion; the main idea is not to give users too high expectations, as there are limitations to measuring consumption, mainly due to the hardware.
    bpetit
    @bpetit
    I'm not that afraid of raising expectations, as it's pretty clear it's a moving project, that it's perfectible and open to contributions. The idea is also to create some impulse around those topics and get more people interested in contributing. So if they think something is possible and it's not yet, or won't be, maybe they'll leave, but they'll have a better idea of the topic, which could create movement later. And in the best-case scenario, they may contribute :)
    I mean almost everything is virtually possible (on those topics at least), it's more about how much time and workforce is allocated to make it happen.
    René Ribaud
    @uggla
    ok anyway the blog post is good, so no worries.
    bpetit
    @bpetit
    thanks for the feedback and thanks for the ideas, it makes me think more precisely about the next posts anyway :)
    Does someone have an idea on hubblo-org/scaphandre#59 ?
    I mean the person reporting can't load the intel_rapl kernel modules, but sees files related to them in some folders named /lib/modules/5.4.0-62-generic/modules.order:XXXX:kernel
    on my machine (same OS but a more recent kernel), the files are in:
    /lib/modules/5.8.0-38-generic/kernel/arch/x86/events/rapl.ko
    /lib/modules/5.8.0-38-generic/kernel/drivers/powercap/intel_rapl_common.ko
    /lib/modules/5.8.0-38-generic/kernel/drivers/powercap/intel_rapl_msr.ko
    I don't know where the modules.order:XXXXX folders are coming from
    Could this be related to some DKMS magic or something ?
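Before digging into where the module files live, it can help to tell apart "module present but not loaded" from "loaded, and the powercap zones are exposed". A minimal sketch, assuming the standard `/proc/modules` format (the file `lsmod` reads) and the powercap sysfs tree at `/sys/class/powercap`; the function names and the optional path arguments are hypothetical, added only so the checks can be pointed at a saved copy.

```shell
# Print the names of loaded kernel modules whose name contains "rapl".
# $1 defaults to /proc/modules.
loaded_rapl_modules() {
  awk '$1 ~ /rapl/ { print $1 }' "${1:-/proc/modules}"
}

# Succeed if the intel-rapl powercap control type is exposed in sysfs,
# which is where RAPL-based tools read their counters from.
# $1 defaults to /sys/class/powercap.
rapl_sysfs_present() {
  [ -d "${1:-/sys/class/powercap}/intel-rapl" ]
}
```

If `loaded_rapl_modules` prints nothing and `rapl_sysfs_present` fails, the next question is whether the module files exist on disk at all.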
    bpetit
    @bpetit
    I decided to ask you once I started reading patches in the kernel mailing list :laughing:
    Time to stop digging my own grave ! :D
    René Ribaud
    @uggla
    The guy seems to have a prompt with a '$'. Maybe a stupid question, but is he root?
    bpetit
    @bpetit
    I didn't check, but the "Module rapl not found in directory" error makes me think more of a missing link or reference. I get the same one if I try to load a nonexistent module.
    René Ribaud
    @uggla
    yes, I agree, a permission problem would raise a different message.
    To my mind, it looks like he doesn't have the module file
    the locate output does not show any .ko files (maybe its database is not up to date).
    maybe rapl is part of another package that is not installed? can you check on your system?
    bpetit
    @bpetit
    I did, and it was available by default, just not loaded. But I guess it can be different if you configure your distribution installation differently.
    René Ribaud
    @uggla
    doesn't he have the same kernel as you? an older distro?
    bpetit
    @bpetit
    same distro but he has an older kernel
    maybe he's using a kernel he compiled ? and didn't embed those drivers ?
    René Ribaud
    @uggla
    $ find /lib/modules/$(uname -r) -type f -name '*.ko' | grep rapl
    $
    files missing, unless it's the wrong path
    so yes, maybe he recompiled the kernel and did not compile this module
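The `find` check above only matches files ending exactly in `.ko`; some distributions ship compressed modules (`.ko.xz`, `.ko.zst`), which that pattern would miss. A slightly broader sketch, with a hypothetical function name and optional arguments for the kernel release and the modules root:

```shell
# List RAPL-related module files for a kernel release, including
# compressed ones (.ko.xz, .ko.zst, ...).
# $1: kernel release (defaults to the running kernel)
# $2: modules root (defaults to /lib/modules)
find_rapl_modules() {
  release=${1:-$(uname -r)}
  root=${2:-/lib/modules}
  find "$root/$release" -type f \( -name '*rapl*.ko' -o -name '*rapl*.ko.*' \) 2>/dev/null | sort
}
```

An empty result for the running kernel, as in the output above, points to the modules not being installed rather than to a loading or permission problem.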
    bpetit
    @bpetit
    I asked him, let's see.
    René Ribaud
    @uggla
    seems it used to be in the extra modules package on earlier kernels
    I think he just needs to install linux-modules-extra-5.4.0-26-generic
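Assuming, as suggested above, that on this Ubuntu kernel the RAPL modules ship in `linux-modules-extra-<release>`, a minimal sketch of the fix for whatever kernel is actually running. The helper name is hypothetical; the install and load commands are left as comments because they need root.

```shell
# Name of Ubuntu's "extra" modules package for a kernel release
# (defaults to the running kernel).
extra_modules_pkg() {
  printf 'linux-modules-extra-%s\n' "${1:-$(uname -r)}"
}

# Then, as root:
# apt-get install "$(extra_modules_pkg)"
# modprobe -a intel_rapl_common intel_rapl_msr
# ls /sys/class/powercap   # intel-rapl zones should now be listed
```

Deriving the package name from `uname -r` keeps it in step with the running kernel, so the release string doesn't have to be copied by hand.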