Joe Grund
@jgrund
yeah, I’ve read that you need to copy files to do that
Brian J. Murrell
@brianjmurrell
or you can do it without a folder with a rename, copy, rm. it's not atomic though, obviously. it would be an interesting feature to be able to transparently re-stripe files, in place so that any processes that have it open don't notice the way they would with mv/cp/rm. i'm sure it's not an enhancement that has not been thought of.
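A minimal sketch of the copy-based re-stripe described above, assuming the `lfs setstripe -c <count> <file>` command is available to create an empty file with the new layout. The function names and the `.restripe.tmp` suffix are hypothetical, and the order here (copy, then rename over the original) is one variation of the rename/copy/rm idea. As noted, processes that already have the old file open will not see the new copy, and writes made during the copy are lost.

```python
import os
import shutil
import subprocess
from typing import Callable


def lfs_setstripe(path: str, stripe_count: int) -> None:
    """Create an empty file at `path` with the requested stripe count.

    Assumes the Lustre `lfs` utility is installed and `path` does not exist yet.
    """
    subprocess.run(["lfs", "setstripe", "-c", str(stripe_count), path], check=True)


def restripe_by_copy(
    path: str,
    stripe_count: int,
    create_with_layout: Callable[[str, int], None] = lfs_setstripe,
) -> None:
    """Re-stripe `path` by copying it into a new file and renaming it back."""
    tmp = path + ".restripe.tmp"  # hypothetical temporary name
    create_with_layout(tmp, stripe_count)
    try:
        # Open the new file without recreating it, so its layout is kept.
        with open(path, "rb") as src, open(tmp, "r+b") as dst:
            shutil.copyfileobj(src, dst)
        shutil.copystat(path, tmp)  # carry over mode and timestamps
        os.replace(tmp, path)       # only this final rename is atomic
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```

The layout step is passed in as a callable so the copy/rename logic can be tried on a non-Lustre filesystem with a stand-in that just creates an empty file.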
yguvvala
@yguvvala
@uberlinuxguy I think that, at that capacity, re-striping by copying is a long process.
Jason Williams
@uberlinuxguy
Quick question: All of my realtime graphs in IML have suddenly stopped working.
Do you all know where I might look to figure out why they look like this now?
image.png
Will Johnson
@johnsonw
Hi @yguvvala and @uberlinuxguy, can you open a support issue in Jira?
Jason Williams
@uberlinuxguy
Ok, sounds good.
Zeeshan Ali Shah
@zeeshanali
For a new Lustre cluster, what is the big difference between installing it via IML and manually? We have 6 OSS with 5 OST each, 2 MDS, and 2 LNET routers, a setup of about 10PB.
And will IML also configure the zfspool over the OSTs, or does that have to be done manually first?
Tom Nabarro
@tanabarr
currently zfspools have to be configured manually, then they can be used as volumes to create the OSTs and other targets on
Zeeshan Ali Shah
@zeeshanali
thanks Tom
Zeeshan Ali Shah
@zeeshanali
in large scale production, is it preferable to go via IML or manual?
Joe Grund
@jgrund
Can you provide details on expected setup?
Zeeshan Ali Shah
@zeeshanali
sure, we have 6 OSS with 12 OSTs, 1080 disks in total; 2 MDS with 1 MDT, where the 2nd MDS will be used as the MGS with separate storage for the MGT. Furthermore we have 2 LNet routers and 2 CIFS routers
4 OSTs per 2 OSS
also the OSTs will be based on zfs
and my second question is about that. The 2 OSS will be active/active, but as zfs pools can only be imported on a single OSS host, how do we achieve active/active HA in this case?
From what I read, for active/active both HA hosts should have access to the same set of disks/volumes.
Joe Grund
@jgrund
hrrm, I haven’t heard of active/active for ZFS
Here’s a doc on how IML expects an HA setup in managed mode: https://whamcloud.github.io/Online-Help/docs/Install_Guide/ig_ch_03_building.html
Joe Grund
@jgrund
We also support a monitor-only mode, where you would set up the FS according to your specific HA needs and IML will monitor things like server states and Lustre stats (but will not actively manage your HA setup)
Brian J. Murrell
@brianjmurrell
active/active in the lustre context means that an OSS is active for a subset (usually half in the case of 2-node OSS pairs) of OSTs and its peers (the partner in the 2-node case again) are active for the remaining (other half in the 2-node case) OSTs. so it's active/active OSSes, not active/active OSTs.
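A small sketch of that definition for a 2-node OSS pair, with hypothetical host and OST names. It only models which OSS serves which OST before and after a failover. It is not how IML or the HA stack implements it.

```python
from typing import Dict, List, Optional, Tuple


def assign_primaries(oss_pair: Tuple[str, str], osts: List[str]) -> Dict[str, str]:
    """Split the OSTs between the two OSS nodes, alternating so each gets half."""
    first, second = oss_pair
    return {ost: (first if i % 2 == 0 else second) for i, ost in enumerate(osts)}


def active_hosts(
    primaries: Dict[str, str],
    oss_pair: Tuple[str, str],
    failed: Optional[str] = None,
) -> Dict[str, str]:
    """Return the OSS currently serving each OST.

    Each OST is active on exactly one OSS at a time. If `failed` is set,
    its OSTs move to the partner node.
    """
    first, second = oss_pair
    partner = {first: second, second: first}
    return {
        ost: (partner[host] if host == failed else host)
        for ost, host in primaries.items()
    }


pair = ("oss1", "oss2")  # hypothetical host names
primaries = assign_primaries(pair, ["OST0000", "OST0001", "OST0002", "OST0003"])
normal = active_hosts(primaries, pair)
after_failover = active_hosts(primaries, pair, failed="oss1")
```

In normal operation both OSS nodes are serving targets (active/active OSSes), while every individual OST, and so every zfs pool, is imported on only one host at any moment.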
Joe Grund
@jgrund
ah, so with that definition IML supports active/active in managed mode
Brian J. Murrell
@brianjmurrell
yes
Zeeshan Ali Shah
@zeeshanali
Thanks Brian and Joe
e73kiel
@e73kiel
Hi all, can I install IML on one of my MDS servers?
Joe Grund
@jgrund
Not at the moment, but we are looking at using containers to collocate iml with a storage-server.
I’ll update here if it works ok
Alex Talker
@AlexTalker
Hello! Can somebody help me understand how to deploy your software using Docker? I found the file https://github.com/whamcloud/integrated-manager-for-lustre/blob/master/docker/docker-compose.yml but docker-compose up tells me: ERROR: for setup Cannot create container for service setup: invalid mount config for type "bind": bind mount source path does not exist: /tmp/iml_pw. What is that file for?
Alex Talker
@AlexTalker
Okay, I figured that one out. Now I'm trying to rebuild the iml-node-libzfs package because it conflicts with my version of nodejs (and if I force its installation, it fails at runtime). But I'm stuck on the part where the libzfs-sys Rust package can't find libzfs_impl.h, just because it is in the libzfs folder in /usr/include. Does anybody know the full recipe for cooking this thing? I'm targeting CentOS 7.4.
LinuxLustre
@LinuxLustre
I am looking for some help on a project where I am trying to use the IML API to download Lustre performance metrics (bytes written to and read from the filesystem). I believe that this is possible because of this page: https://whamcloud.github.io/Online-Help/docs/api/rest_API.html. That document indicates that this task should be possible and it refers to downloading time series of data using the /metrics/ sub-URL, but I haven't been able to make this work yet. Would you be willing to help me see where I might be going wrong? I am running IML version 2.1.2, so that might be a complicating factor, but most things I am seeing with the API appear to be consistent with the current documentation. Thanks for any help you can give!
LinuxLustre
@LinuxLustre
Someone pointed me to this page:
whamcloud/integrated-manager-for-lustre#449
where @jgrund had already posted just what I needed. This URL got me the time-series data on read/write throughput:
https://url-to-manager-here/api/target/metric/?kind=OST&reduce_fn=sum&metrics=stats_read_bytes,stats_write_bytes&begin=2018-01-31T00:00:00.000Z
I wanted to say thanks and post it again in case this helps someone else in the future.
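For anyone scripting this, here is a minimal sketch that builds the same URL from its parts. The function name is hypothetical, and only the query parameters that appear in the URL above are used. Actually fetching the data (authentication, TLS settings, response format) is left out because it is not covered here.

```python
from typing import List
from urllib.parse import urlencode, urlunsplit


def target_metric_url(
    manager_host: str,
    metrics: List[str],
    begin: str,
    kind: str = "OST",
    reduce_fn: str = "sum",
) -> str:
    """Build the /api/target/metric/ URL for a time series of target metrics."""
    query = urlencode(
        {
            "kind": kind,
            "reduce_fn": reduce_fn,
            "metrics": ",".join(metrics),
            "begin": begin,
        },
        safe=",:",  # keep commas and colons readable, as in the URL above
    )
    return urlunsplit(("https", manager_host, "/api/target/metric/", query, ""))


url = target_metric_url(
    "url-to-manager-here",  # placeholder host, as in the message above
    ["stats_read_bytes", "stats_write_bytes"],
    "2018-01-31T00:00:00.000Z",
)
```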
Joe Grund
@jgrund

@AlexTalker Regarding deploying with docker, we intended it to be used with docker stack. Here is a doc for that:

https://whamcloud.github.io/Online-Help/docs/Install_Guide/ig_docker_stack.html

Alex Talker
@AlexTalker
@jgrund As far as I know, stack is good for deploying into a cluster, which seems to be possible for your architecture, but the project seems to be oriented toward standalone installation (as if I'd used rpm and, let's say, CentOS). Besides, debugging with compose is way easier. And I see no real difference for your project between these approaches.
@jgrund The production installation runs on a standalone server anyway, so I use this only as a temporary environment.
Joe Grund
@jgrund
Sure, no reason why you can’t use compose, just know stack is how we are using it for deployment
Alex Talker
@AlexTalker
@jgrund Also, can you tell me what you mean every time you write "test this please" in a PR? I get confused since I'm not part of your team and do not have access to the test infrastructure, while surely I do test everything manually, otherwise there's no PR
@jgrund Regarding stack, do you share services between nodes, or bind them all to one and the same node?
Joe Grund
@jgrund
@AlexTalker sorry, not intended to say you haven’t tested :) It’s how we trigger jenkins runs for external contributions using this plugin: https://wiki.jenkins.io/display/JENKINS/GitHub+pull+request+builder+plugin
Alex Talker
@AlexTalker
@jgrund Wow, can't you mention the bot (which seems to exist) so it will look more targeted? Or won't it work that way?
Joe Grund
@jgrund
Yeah, the phrasing is unfortunate
I’ll check if I can have a custom trigger
Alex Talker
@AlexTalker
@jgrund Thanks. Also, quite often the testing process seems to fail due to dependency installation issues; you might need to pay more attention to such cases.
Joe Grund
@jgrund
Retriggered that run
Joe Grund
@jgrund

@jgrund Regarding stack, do you share services between nodes, or bind them all to one and the same node?

All on one node, for now anyway

Alex Talker
@AlexTalker
@jgrund Also, regarding whamcloud/integrated-manager-for-lustre#917: I had a suspicion that this fix somehow doesn't work on a production installation, and I think I saw the reverse situation there (the volume node associated with the target on the passive node was deleted after it failed back). But I could not reproduce that failure on Docker; I ran the scenario a few times and it always ended successfully, so I think this one is good to go when you decide it is appropriate.
But I'll look into it tomorrow I think, should be okay
@jgrund Also, regarding the issue with device-scanner, if you remember: I dug into why multipath triggered events, and it seems that every time somebody opens a device-mapper device for writing and closes the file descriptor, an event is generated, even if nothing has been written. Since it requires deep kernel knowledge, I delegated this task, but you still might want to check whether the data you supply has actually changed and supply it only if it has.
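The "only supply data if it has changed" idea could look roughly like the sketch below. This is an illustration only, not device-scanner's actual code: the class name, the shape of the state, and the use of a JSON digest are all assumptions.

```python
import hashlib
import json
from typing import Any, Optional


class ChangeFilter:
    """Remember a digest of the last state sent and skip identical updates."""

    def __init__(self) -> None:
        self._last_digest: Optional[str] = None

    def should_emit(self, state: Any) -> bool:
        """Return True only when `state` differs from the last emitted state."""
        # sort_keys makes the digest independent of dict ordering.
        encoded = json.dumps(state, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(encoded).hexdigest()
        if digest == self._last_digest:
            return False
        self._last_digest = digest
        return True


change_filter = ChangeFilter()
state = {"dm-0": {"size": 1024}}  # hypothetical device state
first = change_filter.should_emit(state)                      # new state
repeat = change_filter.should_emit({"dm-0": {"size": 1024}})  # spurious event
changed = change_filter.should_emit({"dm-0": {"size": 2048}}) # real change
```

With a filter like this, an open/close on a device-mapper device that changes nothing would still generate a kernel event, but consumers downstream would not be sent a duplicate update.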