Joe Grund
@jgrund

It’s going to be a few weeks until I can get it promoted to 5.0.

I need to make sure the new version works well enough for managed and monitored modes and upgrades.
I think I have managed mode working well; I now need to spend some time making sure monitored mode works.

There are patches on the agent and manager sides as well that will need to land:

whamcloud/iml-agent#98
whamcloud/integrated-manager-for-lustre#947

@jgrund From here? https://copr-be.cloud.fedoraproject.org/results/managerforlustre/device-scanner-devel/epel-7-x86_64/

Yes, that’s where the devel patches are being built

Alex Talker
@AlexTalker
I'm still stuck on the "VolumeNode does not exists" problem, but I'm going on vacation next week, so I would really like to resolve it before then. Since I only debug monitored mode, your current direction for resolving this issue diverges a bit from mine,
which raises the problem that we are looking at different behaviors.
Also, as you might have seen, the problem with volumes in the UI hasn't been fixed for me.
Joe Grund
@jgrund
@AlexTalker can you share the rows in the database where you are seeing the VolumeNode error?
I’ll start integrating monitored mode support this week
Alex Talker
@AlexTalker
@jgrund Yeah, sure, just let me reproduce it
@jgrund Today I updated all agent-side code to the 5.0 packages, as they seem to be fresher than the devel repo, so I haven't yet checked whether the case can still be reproduced
Amit Kumar
@ahkumar
@jgrund I have an older 4.x install of IML that I am trying to wipe clean and upgrade. Can I just wipe the OS and install the new IML without having to remove the servers from the 4.x IML? I am just being careful, in case the agents on the production servers that were added in monitoring mode start behaving oddly when I add them back to the newly installed IML 5.x.
Joe Grund
@jgrund
@ahkumar wipe the manager node OS?
Amit Kumar
@ahkumar
@jgrund yes, wipe the manager node's OS only; the Lustre server OS will remain as is, as they are production ;)
Joe Grund
@jgrund
You shouldn’t need to wipe the manager OS, just update it in place
What OS version are you running?
Amit Kumar
@ahkumar
@jgrund running 4.1.5, with CentOS 7.4 on the Lustre servers and CentOS 7.5 on the IML manager node. The reason I want to wipe clean is that the current IML manager node, which I previously set up with 4.x, is in a state where no action is allowed, not even removing the production Lustre servers that were added in monitoring mode. Given the state this manager node is in, I thought a clean install would be nice
Amit Kumar
@ahkumar
@jgrund If wiping clean does not hurt, I would prefer that, rather than digging through any issues after the upgrade. My only concern is whether this could pose any issues. If so, I will follow the upgrade path.
Alex Talker
@AlexTalker
@ahkumar I'd rather recommend removing all nodes from IML if you want to re-install. If you just shut down the IML server node without de-registering the agents, this might lead to interesting behavior, I think.
Amit Kumar
@ahkumar
@AlexTalker thank you for the note. But the current state of my monitoring-only install of IML is such that I cannot remove the nodes. The action button is grayed out. Is there any other way to de-register the agents?
Alex Talker
@AlexTalker
@ahkumar Is there no "Force remove" button?
@ahkumar If not, on each node you must stop "iml-storage-server.target" and clean up the files in /var/lib/chroma/, I think
Amit Kumar
@ahkumar
@AlexTalker yup, there is no "Force Remove" button. Even when I hover over the actions button, there is no drop-down that gives me any option
Alex Talker
@AlexTalker
@ahkumar Then, obviously, you either want to wipe the database on the IML server or re-install the system
Amit Kumar
@ahkumar
@AlexTalker Would this be an appropriate link to follow to remove the agents? https://whamcloud.github.io/Online-Help/docs/Contributor_Docs/cd_UnInstall_IML.html It assumes that IML was installed in managed mode, though, so I might have to be careful to remove only the agent-based components, true?
Alex Talker
@AlexTalker
@ahkumar Well, this seems about right, but limit your actions to the agent. I mean, you don't need to disable corosync or delete the network, just remove the bloody agent.
@ahkumar And skip the step about removing it via UI
@ahkumar Just reinstall
Amit Kumar
@ahkumar
@AlexTalker thank you, I will just stick to stopping iml-storage-server.target and chroma-agent.service, removing python2-iml-agent-4.1.4-1.el7.noarch, and cleaning up /var/lib/chroma
@AlexTalker Hope this works ..
Alex Talker
@AlexTalker
@ahkumar Yeah, good luck
Amit Kumar
@ahkumar
@AlexTalker thank you!!
Amit Kumar
@ahkumar
@AlexTalker @jgrund removing the agent and reinstalling went smoothly. Thank you for your input and help!
Joe Grund
@jgrund
@ahkumar Thanks to @AlexTalker, glad it worked for you
Amit Kumar
@ahkumar
@jgrund yes, thank you @AlexTalker as well. One of these warnings keeps repeating. Can it be silenced? (/usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html)
@jgrund another question: can I dump the metrics (jobstats) to Elasticsearch, so I can have some historical info?
Joe Grund
@jgrund
@ahkumar, you should be able to script some ingest against the IML REST API
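As a rough illustration of the "script some ingest" idea, the sketch below covers only the Elasticsearch half: it turns a list of metric samples into a `_bulk` request body (newline-delimited JSON). How the samples are fetched from the IML REST API is left out, and the index name `iml-jobstats` and the sample field names are hypothetical, chosen only for the example.

```python
import json


def to_bulk_ndjson(samples, index="iml-jobstats"):
    """Build an Elasticsearch _bulk body: one action line plus one document line per sample."""
    lines = []
    for sample in samples:
        # Action line: index this document into the given index.
        lines.append(json.dumps({"index": {"_index": index}}))
        # Document line: the sample itself, with stable key order.
        lines.append(json.dumps(sample, sort_keys=True))
    # Every line, including the last, must end with a newline.
    return "".join(line + "\n" for line in lines)


# Hypothetical samples; real field names depend on what the IML API returns.
samples = [
    {"ts": "2019-06-28T12:30:40Z", "target": "lustre1-OST0001", "write_bytes": 1024},
    {"ts": "2019-06-28T12:30:50Z", "target": "lustre1-OST0001", "write_bytes": 2048},
]
body = to_bulk_ndjson(samples)
```

The resulting `body` would be sent with an HTTP POST to the cluster's `/_bulk` endpoint using the `Content-Type: application/x-ndjson` header.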
Amit Kumar
@ahkumar
@jgrund Great. I will use the API and script it; I found an example in the docs. Also wondering if you have an answer for the InsecureRequestWarning above?
Joe Grund
@jgrund
Will need to take a look further. Can you open an issue at https://github.com/whamcloud/integrated-manager-for-lustre/issues/new/choose with more detail?
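Until the cause is tracked down, a generic way to hide this particular message in a Python process you control is a standard `warnings` filter. This is only a sketch, and it assumes the warning is raised inside your own script (for example one that calls the API with certificate verification turned off); it does nothing for warnings logged by IML's own services, and it hides the message without making the request any more secure.

```python
import warnings


def silence_insecure_request_warning():
    """Ignore warnings whose message starts with urllib3's 'Unverified HTTPS request' text."""
    # The message argument is a regex matched against the start of the warning text.
    warnings.filterwarnings("ignore", message="Unverified HTTPS request")


# Call once, before making any HTTPS requests.
silence_insecure_request_warning()
```

Verifying the manager's certificate instead of silencing the warning is the safer long-term fix.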
Amit Kumar
@ahkumar
@jgrund sounds good will do!
Amit Kumar
@ahkumar
@jgrund Wondering how frequently "jobstats" is collected? And is it stored in the database, so that I can retrieve historical data on job IOPS etc.? If yes, is there an API I can use to get that information?
Alex Talker
@AlexTalker
@ahkumar Yes, if you grant access to postgres manually, you can find jobstat info; look for tables with numbers in their names. As for the API, AFAIK, the current one just streams information from the database over a websocket, and you can inspect it on the jobstat web UI page. Jobstats are collected every 10 seconds, I think.
Alex Talker
@AlexTalker
@ahkumar Here's my approach to request statistics from PostgreSQL:
  select s.dt, s."sum", e.name, e.id, t.name
  from chroma_core_sample_10 s, chroma_core_series e, chroma_core_managedtarget t
  where s.id = e.id and e.object_id = t.id and t.state = 'mounted'
  --and t.name = 'lustre1-OST0001'
  and s.dt >= '2019-06-28 12:30:40.000Z' and s.dt <= '2019-06-28 12:33:10.000Z'
  and e.name like '%_write_bytes%';
Change the parameters to fit your needs; you may remove some conditions to check if data is present
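To make "change parameters to fit your needs" a little less error-prone, the query above could be wrapped in a small helper that returns the SQL text plus a parameter list. This is a sketch only: the function name `build_sample_query` is hypothetical, and the `%s` placeholders assume a DB-API driver with the format parameter style (psycopg2 is one such driver). The table and column names are taken from the query as posted.

```python
def build_sample_query(start, end, metric_pattern="%_write_bytes%", target_name=None):
    """Return (sql, params) for the sample query, with an optional filter on target name."""
    clauses = [
        'select s.dt, s."sum", e.name, e.id, t.name',
        "from chroma_core_sample_10 s, chroma_core_series e, chroma_core_managedtarget t",
        "where s.id = e.id and e.object_id = t.id and t.state = 'mounted'",
        "and s.dt >= %s and s.dt <= %s",
        "and e.name like %s",
    ]
    params = [start, end, metric_pattern]
    if target_name is not None:
        # Equivalent to un-commenting the t.name condition in the original query.
        clauses.append("and t.name = %s")
        params.append(target_name)
    return "\n".join(clauses), params


sql, params = build_sample_query(
    "2019-06-28 12:30:40.000Z",
    "2019-06-28 12:33:10.000Z",
    target_name="lustre1-OST0001",
)
```

The pair would then be passed to a cursor as `cursor.execute(sql, params)`, which keeps the values out of the SQL string itself.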
Joe Grund
@jgrund
@ahkumar @AlexTalker There is also REST JSON API access if you don’t want to hit the db directly
bmerchant
@bmerchant
Is it possible to create a non-HA server with IML? We have a customer with a single MDS/MGS node, and the filesystem format fails at configuring corosync and doesn't even attempt to format or start targets
Joe Grund
@jgrund
Deploy Lustre from IML, but no HA?
Not at the moment. IML can monitor existing filesystems without needing HA
bmerchant
@bmerchant
That's what I figured, thanks for confirming. We went the manual install + monitor mode route