Joe Grund
@jgrund
Here’s a doc on how IML expects an HA setup in managed mode: https://whamcloud.github.io/Online-Help/docs/Install_Guide/ig_ch_03_building.html
Joe Grund
@jgrund
We also support a monitor-only mode, where you would set up the FS according to your specific HA needs and IML will monitor things like server states and Lustre stats (but will not actively manage your HA setup)
Brian J. Murrell
@brianjmurrell
active/active in the Lustre context means that an OSS is active for a subset (usually half in the case of 2-node OSS pairs) of OSTs and its peer (the partner in the 2-node case again) is active for the remaining (other half in the 2-node case) OSTs. So it's active/active OSSes, not active/active OSTs.
Joe Grund
@jgrund
ah, so with that definition IML supports active/active in managed mode
Brian J. Murrell
@brianjmurrell
yes
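A minimal sketch of the active/active OSS layout described above, assuming a 2-node pair that splits its OSTs in half. The function names, OSS names, and OST labels are all hypothetical and only illustrate the idea; this is not how IML or Lustre implements it.

```python
def assign_active_osts(osts, pair):
    """Split OSTs across a 2-node OSS pair so each OSS is active for about half."""
    primary, partner = pair
    half = (len(osts) + 1) // 2
    return {primary: list(osts[:half]), partner: list(osts[half:])}


def fail_over(assignment, failed):
    """Illustrative only: the surviving OSS takes over the failed peer's OSTs."""
    survivor = next(node for node in assignment if node != failed)
    return {survivor: assignment[survivor] + assignment[failed], failed: []}


# Hypothetical names for illustration.
layout = assign_active_osts(["OST0000", "OST0001", "OST0002", "OST0003"], ("oss1", "oss2"))
print(layout)
print(fail_over(layout, "oss1"))
```

Both OSSes serve OSTs at the same time, but each individual OST is active on only one OSS at any moment.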
Zeeshan Ali Shah
@zeeshanali
Thanks Brian and Joe
e73kiel
@e73kiel
Hi all, can I install IML on one of my MDS servers?
Joe Grund
@jgrund
Not at the moment, but we are looking at using containers to collocate iml with a storage-server.
I’ll update here if it works ok
Alex Talker
@AlexTalker
Hello! Can somebody help me understand how to deploy your software using Docker? I found the file https://github.com/whamcloud/integrated-manager-for-lustre/blob/master/docker/docker-compose.yml but docker-compose up tells me: ERROR: for setup Cannot create container for service setup: invalid mount config for type "bind": bind mount source path does not exist: /tmp/iml_pw. What is that file for?
Alex Talker
@AlexTalker
Okay, I figured that one out. Now I'm trying to rebuild the iml-node-libzfs package because it conflicts with my version of nodejs (and if I force its installation, it fails at runtime). But I'm stuck on the part where the libzfs-sys Rust package can't find libzfs_impl.h, just because it is in the libzfs folder in /usr/include. Does anybody know the full recipe for cooking this thing? I'm targeting CentOS 7.4.
LinuxLustre
@LinuxLustre
I am looking for some help on a project where I am trying to use the IML API to download Lustre performance metrics (bytes written to and read from the filesystem). I believe that this is possible because of this page: https://whamcloud.github.io/Online-Help/docs/api/rest_API.html. That document refers to downloading time series of data using the /metrics/ sub-URL, but I haven't been able to make this work yet. Would you be willing to help me see where I might be going wrong? I am running IML version 2.1.2, so that might be a complicating factor, but most things I am seeing with the API appear to be consistent with the current documentation. Thanks for any help you can give!
LinuxLustre
@LinuxLustre
Someone pointed me to this page:
whamcloud/integrated-manager-for-lustre#449
where @jgrund had already posted just what I needed. This URL got me the time-series data on read/write throughput:
https://url-to-manager-here/api/target/metric/?kind=OST&reduce_fn=sum&metrics=stats_read_bytes,stats_write_bytes&begin=2018-01-31T00:00:00.000Z
I wanted to say thanks and post it again in case this helps someone else in the future.
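For anyone scripting this, here is a minimal sketch of building that same URL in Python. The helper name `target_metric_url` is hypothetical, the host is the placeholder from the message above, and authentication and the actual HTTP request are left out because they depend on the installation.

```python
from urllib.parse import urlencode


def target_metric_url(manager_host, kind, metrics, begin, reduce_fn="sum"):
    """Build the /api/target/metric/ query URL quoted in the message above."""
    params = {
        "kind": kind,
        "reduce_fn": reduce_fn,
        "metrics": ",".join(metrics),
        "begin": begin,
    }
    # Keep commas and colons readable so the result matches the posted URL.
    query = urlencode(params, safe=",:")
    return f"https://{manager_host}/api/target/metric/?{query}"


url = target_metric_url(
    "url-to-manager-here",  # placeholder host, as in the original message
    "OST",
    ["stats_read_bytes", "stats_write_bytes"],
    "2018-01-31T00:00:00.000Z",
)
print(url)
```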
Joe Grund
@jgrund

@AlexTalker In regards to deploying with docker, we’ve intended it to be used with docker stack. Here is a doc for that:

https://whamcloud.github.io/Online-Help/docs/Install_Guide/ig_docker_stack.html

Alex Talker
@AlexTalker
@jgrund As far as I know, stack is good for deploying into a cluster, which seems possible with your architecture, but the project seems oriented toward standalone installation (if I used the rpm on, let's say, CentOS). Besides, debugging with compose is way easier. And I see no real difference between these approaches for your project.
@jgrund The production installation runs on a standalone server anyway, so I use this only as a temporary environment.
Joe Grund
@jgrund
Sure, no reason why you can’t use compose, just know stack is how we are using it for deployment
Alex Talker
@AlexTalker
@jgrund Also, can you tell me what you mean every time you write "test this please" in a PR? I get confused since I'm not part of your team and do not have access to the test infrastructure, though of course I do test everything manually; otherwise there would be no PR
@jgrund Regarding stack, do you share services between nodes or bind them all to one and the same?
Joe Grund
@jgrund
@AlexTalker sorry, not intended to say you haven’t tested :) It’s how we trigger jenkins runs for external contributions using this plugin: https://wiki.jenkins.io/display/JENKINS/GitHub+pull+request+builder+plugin
Alex Talker
@AlexTalker
@jgrund Wow, can't you mention the bot (which seems to exist) so it looks more targeted? Or won't it work that way?
Joe Grund
@jgrund
Yeah, the phrasing is unfortunate
I’ll check if I can have a custom trigger
Alex Talker
@AlexTalker
@jgrund Thanks. Also, quite often the testing process seems to fail due to dependency installation issues; you might need to pay more attention to such cases.
Joe Grund
@jgrund
Retriggered that run
Joe Grund
@jgrund

@jgrund Regarding stack, do you share services between nodes or bind them all to one and the same?

All on one node, for now anyway

Alex Talker
@AlexTalker
@jgrund Also, regarding whamcloud/integrated-manager-for-lustre#917: I had a suspicion that this fix somehow doesn't work on a production installation, and I think I saw the reverse situation there (the volume node associated with a target on the passive node was deleted after it failed back). But I could not reproduce that on Docker: I ran the scenario a few times and it always ended successfully, so I think this one is good to go whenever you decide it is appropriate.
But I'll look into it tomorrow I think, should be okay
@jgrund Also, regarding the issue with device-scanner, if you remember: I dug into why multipath triggered events, and it seems that every time somebody opens a device-mapper device for writing and closes the file descriptor, an event is generated, even if nothing has been written. Since it requires deep kernel knowledge, I delegated this task, but you still might want to check whether the data you supply has actually changed and supply it only if it has.
@jgrund You can reproduce it by writing with dd to a multipath disk: the event is triggered every time dd finishes.
Alex Talker
@AlexTalker
@jgrund I do not know for sure, but I think this triggers device-aggregator on the server side, and since nothing changes, it is useless to go any further than the device-scanner process
Joe Grund
@jgrund
I’ll take a look at that.

In regards to filtering, I have an issue that I still need to get to for that:

whamcloud/device-scanner#193

Alex Talker
@AlexTalker
The issue seems to cover my problem
Unfortunately, we are required to interact with devices this often by design. Yet another way to implement HA, you know
But if we figure out how to disable triggering this event, I'll let you know
Joe Grund
@jgrund
Are the UEvents emitted identical for each write?
Alex Talker
@AlexTalker
@jgrund The only thing that changes is the ID; there is no particular change in the data that I could notice via udevadm monitor -p. Can't remember what the ID is named now :/
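The "only forward it if the data actually changed" idea from this thread could be sketched roughly as below. This is a generic illustration, not device-scanner's code: `make_change_filter` is a hypothetical helper, and `EVENT_ID` is a placeholder for the per-event ID field whose real name is not given above.

```python
def make_change_filter(volatile_keys):
    """Return a function reporting whether a device's stable properties changed."""
    last_seen = {}

    def has_changed(device, properties):
        # Ignore fields that differ on every event (placeholder names only).
        stable = {k: v for k, v in properties.items() if k not in volatile_keys}
        if last_seen.get(device) == stable:
            return False
        last_seen[device] = stable
        return True

    return has_changed


has_changed = make_change_filter(volatile_keys={"EVENT_ID"})

# Illustrative event data; only the placeholder ID differs between the two.
first = {"EVENT_ID": "1", "DEVNAME": "/dev/dm-0", "ACTION": "change"}
repeat = {"EVENT_ID": "2", "DEVNAME": "/dev/dm-0", "ACTION": "change"}

print(has_changed("/dev/dm-0", first))
print(has_changed("/dev/dm-0", repeat))
```

With a filter like this, repeated events that differ only in the ignored field would be dropped before anything is sent further along.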
Amit Kumar
@ahkumar
@jgrund Hi Joe, I was able to get iml5 after reprovisioning the OS to remove the previous version of postgres. I have IML up and now I don't see the OSTs going offline
@jgrund this one looks great!!
Joe Grund
@jgrund
@ahkumar Ok, I can stop by and take a look at what’s happening next break
@ahkumar Thanks :)
Amit Kumar
@ahkumar
@jgrund thank you!!
Amit Kumar
@ahkumar
@jgrund Wondering if this doc https://whamcloud.github.io/Online-Help/docs/Contributor_Docs/cd_Installing_IML_On_Vagrant.html is available offline, so I can work offline?
Joe Grund
@jgrund

Not really. If you have IML installed, those docs are bundled in under the help link, but I suspect you want a standalone solution.

We have an open that describes a way to do so (which we plan to automate) here: whamcloud/Online-Help#142

But there are also plugins available that can download webpages for you in Chrome or Firefox if you want to go that route.

Amit Kumar
@ahkumar
@jgrund Cool, will try the other options you mention, thank you!!
Alex Talker
@AlexTalker
@jgrund whamcloud/device-scanner#270 We checked, and the udev event is triggered on closing a multipath device only on systems with IML. I've asked to have someone find out why systemd-udevd actually triggers the event.
Joe Grund
@jgrund
Ok, I’ll take a look