smichnowicz
@smichnowicz

I am posting here because my mail to nhc@lbl.gov bounced. I am using NHC 1.4.2 on a CentOS 7 system with bash 4.2.46(1)-release (x86_64-redhat-linux-gnu).

Our logs are filling up with many stack smashing errors. I traced the problem to /opt/nhc-1.4.2/sbin/nhc at about line 143, which produces a command like
kill -s USR2 -- -17525 17525
and that is where the problem occurs.

Is anyone able to give us any guidance as to how to resolve this issue?

Michael Jennings
@mej
@smichnowicz In order to send e-mail to nhc@lbl.gov, you need to be subscribed to the ML or use the online web forum.
As to your question, are you sure you're using the 1.4.2 release version? That line no longer exists in the nhc script in 1.4.2, in part due to the fact that it was triggering some weird bug in Bash.
Michael Jennings
@mej
@smichnowicz I changed the Group settings so that you can now send e-mail to nhc@lbl.gov even if you're not subscribed. Feel free to resend your e-mail if you'd like.
smichnowicz
@smichnowicz
Thanks for the response. We modified our nhc to reflect the latest changes, and the bash error went away. Regards, Simon
ytghazal
@ytghazal
Hello, I was wondering if there was a way to check the contents of a file, but only the recent contents.
Basically, we are using a negative match string in check_file_contents to check some logs. Unfortunately, after we solve the issue, the program does not create a new log file, so NHC keeps hitting the negative match string.
Michael Jennings
@mej
@ytghazal At present there isn't. check_file_contents() wasn't really written with log files in mind, and unfortunately bash doesn't have an ability to seek() within an existing file to a particular spot, so I'd have to skip a user-specified number of lines. And even then, that wouldn't allow me to track "recent" changes. Nothing in Linux/UNIX tracks when different parts of a file were written (at least not generically and in a way userland programs could query it), so the only way to track that would require (1) saving state, and (2) a way in bash to seek to a byte value in a file.
It would, however, be possible to write an external Perl/Python script or a C/Go program that NHC could invoke that would be capable of doing that sort of thing.
@ytghazal The external script/program could track the size of the file on last run somewhere (e.g., /var/state/<something>), then seek() to that position on startup and output the rest of the file. Then NHC's check_cmd_output() could be used to assert that your search string wasn't in the new portion of the file. Should be pretty trivial to write. In fact, apart from the tracking-where-to-seek-to portion, tail -c can do exactly that (dump the remainder of a file to stdout).
Michael Jennings
@mej
@ytghazal Actually, now that I'm thinking about it, you could do something like this: `* || read LOG_FILE_LINES < /tmp/log_file_lines.tmp && wc -l < /path/to/file > /tmp/log_file_lines.tmp && check_cmd_output -m '!/error you want to look for/' tail -n +$((LOG_FILE_LINES + 1)) /path/to/file`
@ytghazal That would read in the old line count, store the new line count for the next run, and then use the old line count to tell where to start reading from.
@ytghazal Note that I haven't tested this at all, but hopefully it's not too far off. :-)
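The external-helper idea described above (remember the file size from the last run, then print only what was appended since) can be sketched as a small bash function. This is a minimal sketch, not part of NHC: the function name `print_new_log_data` and both path arguments are hypothetical, and it assumes a `head` that supports `-c` (GNU and BSD do).

```shell
# print_new_log_data LOG STATE   (hypothetical helper, not part of NHC)
# Prints only the bytes appended to LOG since the previous call and
# records the current size in STATE for the next call.
print_new_log_data() {
    local log=$1 state=$2 old=0 new

    # Current size in bytes; give up if the log cannot be read.
    new=$(wc -c < "$log") || return 1
    new=$((new))    # strip padding that some wc implementations add

    # Load the size saved by the previous run, if there was one.
    if [ -r "$state" ]; then
        read -r old < "$state" || old=0
    fi
    case $old in
        ''|*[!0-9]*) old=0 ;;    # ignore an empty or corrupt state file
    esac

    # A smaller file means it was truncated or replaced: start over.
    if [ "$old" -gt "$new" ]; then
        old=0
    fi

    # Save the new size before printing anything.
    printf '%s\n' "$new" > "$state" || return 1

    # tail -c +N starts at byte N (1-based), so this skips the first
    # $old bytes. head -c stops at the size measured above, so data
    # written during this run is reported next time, not twice.
    tail -c "+$((old + 1))" "$log" | head -c "$((new - old))"
}
```

Wrapped in a script, this could take the place of the `tail` command in the `check_cmd_output` line above, with the state file kept somewhere persistent such as the `/var/state/<something>` location mentioned earlier. Tracking bytes rather than lines also avoids re-reading the last line seen on the previous run.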
novosirj
@novosirj
Hi folks. I'm trying to work with OpenHPC to get this software readmitted to the repository. It came in via Warewulf the first time, and then when the projects split, it was removed. Their concern is support for PBSPro, as it seems primarily geared to TORQUE. My answer to them was that I don't think anything NHC does is so complicated that it's different between the two, and I see the config file mentions PBSPro. Can I get confirmation that it does work with PBSPro? I use SLURM instead, so I've not personally tried it.
novosirj
@novosirj
I should say, they're really talking about PBSPro open source. Which to me seems like some kind of poor naming. :)
downloadico
@downloadico
Hello everyone! I'm rewriting my node health checker script for the bajillionth time. I came across NHC and was intrigued. Is it easy to get a fairly basic configuration up and running? Is there an "I'm too busy with other work to even breathe, so I need something quick" guide to NHC? I really just want to check that (1) my filesystems are mounted, (2) this machine can resolve users from LDAP/NIS, and (3) the resource manager daemon is running.
Michael Jennings
@mej
@downloadico If you run the nhc-genconf utility that comes with NHC, it'll generate a config file for you that has most of that stuff already covered. :-)
Just make sure all the filesystems you want to check are mounted when you run it.
Michael Jennings
@mej
There will also be some sample `check_ps_service` tests in there which you can easily modify to look for your RM daemon, whatever that happens to be. Then add a check like this to verify LDAP/NIS resolution: `check_cmd_status getent passwd <someuser>`
You can nuke all the other sample checks if that's all you care about. :-)
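Pulling the suggestions above together, a config covering just those three checks might look like the sketch below. It is wrapped in a short shell snippet that writes the lines to a scratch file (`nhc.conf.sample`, a hypothetical name) so they can be compared against what `nhc-genconf` generates. The `/home` mount point, the `slurmd` daemon name, and the `someuser` account are placeholders, and the option letters are assumed from NHC's sample config, so check them against the README before relying on them.

```shell
# Sketch only: write three example checks to a scratch file for review.
# The file name, mount point, daemon name, and user are placeholders.
sample_conf="${SAMPLE_CONF:-./nhc.conf.sample}"

cat > "$sample_conf" <<'EOF'
# 1. Filesystem is mounted read/write (mount point is an assumption).
* || check_fs_mount_rw -f /home

# 2. A known account resolves through LDAP/NIS (user is an assumption).
* || check_cmd_status -t 5 -r 0 getent passwd someuser

# 3. Resource manager daemon is running (daemon name is an assumption).
* || check_ps_service -u root slurmd
EOF

echo "Wrote $(grep -c '||' "$sample_conf") checks to $sample_conf"
```

The leading `*` on each line is the same match-every-host target used in the `check_cmd_output` example earlier in the chat.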
downloadico
@downloadico
thanks!