Gabriel Indurskis
@gabindu
Are there some load (or other) issues on cocalc.com today? I had a class this morning with about 15 out of 30 students unable to get into their projects (even though I had pre-started them, and they are members-only), and since then I've also had intermittent connection issues, sometimes getting a "Bad Gateway" error pointing towards the host (browser and Cloudflare are reported working, but the host is not), e.g. in the pdf-preview while editing a latex document. (by the way, is there a new gitter room for cocalc, or is this still the best place for it?) Thanks!
William Stein
@williamstein
This is still fine for CoCalc.
Yes, we had very substantial problems earlier today, though they have all been resolved as of about 2-3 hours ago. At present the load is minimal and everything is working very well.
The problems we had earlier today were not at all related to too much load from students. Mainly there was an issue with how Kubernetes was scheduling pods unevenly across our network...
Here's our actual load right now (these are 4-8 core machines, so each number is out of 4):
kucalc-prod-node-qy3l 20:45:45 up 2:53, 0 users, load average: 0.06, 0.23, 0.30
kucalc-prod-node-o2t0 20:45:45 up 2:53, 0 users, load average: 0.07, 0.23, 0.29
kucalc-prod-node-6ubb 20:45:45 up 2:53, 0 users, load average: 0.13, 0.33, 0.41
kucalc-prod-node-8yl2 20:45:45 up 2:53, 0 users, load average: 0.14, 0.66, 0.68
kucalc-prod-node-qknh 20:45:45 up 2:48, 0 users, load average: 0.18, 0.34, 0.39
kucalc-prod-node-0j2v 20:45:45 up 2:53, 0 users, load average: 0.20, 0.48, 0.53
kucalc-prod-node-u0by 20:45:45 up 2:53, 0 users, load average: 0.30, 0.46, 0.40
kucalc-prod-node-gd20 20:45:45 up 2:53, 0 users, load average: 0.30, 0.64, 0.56
kucalc-prod-node-1p1e 20:45:45 up 2:53, 0 users, load average: 0.31, 0.56, 0.51
kucalc-prod-node-mgnp 20:45:45 up 2:53, 0 users, load average: 0.33, 0.38, 0.40
kucalc-prod-node-1zyf 20:45:45 up 2:53, 0 users, load average: 0.44, 0.37, 0.35
kucalc-prod-node-vqpr 20:45:45 up 2:53, 0 users, load average: 0.47, 0.91, 0.87
kucalc-prod-node-6yoa 20:45:45 up 2:53, 0 users, load average: 0.53, 0.49, 0.45
kucalc-prod-node-i0xn 20:45:45 up 2:53, 0 users, load average: 0.60, 0.84, 1.02
kucalc-prod-node-4scz 20:45:45 up 2:53, 0 users, load average: 0.63, 0.79, 0.64
kucalc-prod-node-ej68 20:45:45 up 2:48, 0 users, load average: 0.67, 0.65, 0.54
kucalc-prod-node-k62a 20:45:45 up 2:48, 0 users, load average: 0.72, 0.48, 0.42
kucalc-prod-node-xrpb 20:45:45 up 2:53, 0 users, load average: 0.74, 0.42, 0.34
kucalc-prod-node-1e5z 20:45:45 up 2:53, 0 users, load average: 0.75, 0.62, 0.57
kucalc-prod-node-dmhh 20:45:45 up 2:53, 0 users, load average: 0.85, 0.74, 0.75
kucalc-prod-node-igq9 20:45:45 up 2:53, 0 users, load average: 0.91, 0.80, 0.70
kucalc-prod-node-gm97 20:45:45 up 2:53, 0 users, load average: 1.01, 1.83, 1.27
kucalc-prod-node-6btx 20:45:45 up 2:53, 0 users, load average: 1.02, 0.55, 0.47
kucalc-prod-node-8v7u 20:45:45 up 2:53, 0 users, load average: 1.05, 0.93, 0.75
kucalc-prod-node-u1dt 20:45:45 up 2:53, 0 users, load average: 1.48, 1.60, 1.48
kucalc-prod-node-gxs6 20:45:45 up 2:53, 0 users, load average: 2.54, 3.12, 2.54
kucalc-prod-node-qqnk 20:45:45 up 2:53, 0 users, load average: 2.58, 2.38, 1.90
kucalc-prod-node-d70z 20:45:45 up 2:53, 0 users, load average: 2.83, 2.17, 1.73
(these are gce n1-highmem-4's...)
(note -- some of the nodes are n1-highmem-8)
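To read the list above, it can help to divide each load average by the node's core count. A minimal sketch, assuming a hypothetical helper named load_per_core (not part of CoCalc) and that awk is available:

```shell
# load_per_core LOAD CORES
# Hypothetical helper: prints the load average divided by the number of cores,
# so a value near 1.00 means the node is roughly fully busy.
load_per_core() {
  awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Busiest node in the list above, assuming it is a 4-core n1-highmem-4.
load_per_core 2.83 4
```

For an n1-highmem-8 node, pass 8 as the second argument instead.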
Gabriel Indurskis
@gabindu
ok, great to hear that a) I wasn't imagining things, b) the problem wasn't on our college's IT side (which is not always a given...), and c) you got it under control now. Thanks!
William Stein
@williamstein
@gabindu it’s been a very hard week — I was up very late last night and we fully implemented and deployed a solution to another class of problems we were having (involving google pre-emptible instances, kubernetes daemonsets, mounting, race conditions, and missing features in kubernetes that people want...). Today everything is working really well, which is quite a relief and exciting.
This new backend provides a foundation for us to fairly easily implement features we’ve wanted for a long time. E.g., imagine paying to get a dedicated 64-vCPU virtual machine with 208GB RAM for 8 hours where all projects you want to run there, run there, and having it be about $1/hour billed at the minute-level resolution. And having that be fast and automated — it could be great for serious computations during meetings, etc....
Gabriel Indurskis
@gabindu
sorry to hear you had so much trouble this week - but I hope it'll be smooth(er) sailing from now on, it does sound like the backend switch was a good/necessary move for the future. Apart from the snafu yesterday morning, my students really are getting into using cocalc as well, so kudos (again) for a great system!
William Stein
@williamstein
Thanks!
Gabriel Indurskis
@gabindu
I know it's not officially supported, but I've been using the Dropbox commandline client without troubles (on a members-only, networked project) until now - but since the move to Kubernetes, I've had some trouble: whenever my project restarts, the DB client needs to be relinked to my account. I presume this might be because the project's virtual server instance (including ip address) changes? Is there any way we could fix this?
William Stein
@williamstein
Yes, the ip address of the project does change whenever it restarts, and there is absolutely no way around that. However, we do have a new (undocumented) facility for automatically running a script whenever a project starts. Just put anything you want run when your project starts in a file called project.init in your home directory.
(In general, look in /cocalc/supervisor/supervisord.conf to see what now happens when projects start.)
For the record I don’t like the name project.init so much, so we’ll likely change it, though maybe keep it working for backward compatibility...
Harald Schilly
@haraldschilly
@gabindu hi, I'm the one who added this as a test. Once I know a better name I'll change it. Also, there'll only be one entry for such a daemon script (which won't restart if the exit value is 0). Maybe we should have two, for Python and bash scripts, e.g. project_init.py and project_init.sh.
Gabriel Indurskis
@gabindu
cool, that's great to know, I've actually just put it in .bashrc for the moment (dropbox start; dropbox status), which works even if it was already running, and gives a quick feedback on the current status whenever I open a terminal. (By the way, this could be a neat feature for the future: add a custom tray icon/status area in the status bar somewhere which can be accessed by scripts.)
(to show the status of running daemons etc.)
Harald Schilly
@haraldschilly
yes, .bashrc is run when you open the terminal, but a restarted project doesn't run anything by default :-)
I think in your case it's sufficient if you make this project.init file with the content
#!/usr/bin/env bash
dropbox start
Gabriel Indurskis
@gabindu
very good point, so this is definitely better in the long run (just move the dropbox start into project.init, and leave the dropbox status in .bashrc. ;-) )
Harald Schilly
@haraldschilly
yes
and well, it will break when we rename the supposed name of this magic file ... but that's the cost of using unreleased features :-p
Gabriel Indurskis
@gabindu
oh, that's quite alright, I'm used to living on the edge. ;-)
But thinking about the Dropbox relinking issue some more, I begin to wonder if the changing IP address is really the culprit: most laptops and other devices will have wildly changing IP addresses. So it's a bit of a mystery to me why DB stops syncing. I guess I'll have to do more testing to identify when it actually happens - for the moment I've increased the timeout for this particular project so that it'll (maybe) never restart.
Thanks for the help and insight!
Harald Schilly
@haraldschilly
well, it definitely stops when the project stops. everything you run in your project is containerized.
I'm not a user of dropbox, but I'm curious if there is somewhere a logfile?
There might be some explanation inside of it (maybe in ~/.local/... or so)
Gabriel Indurskis
@gabindu
The default daemon script doesn't keep any log files unfortunately, but it keeps various config and account information files in ~/.dropbox
Harald Schilly
@haraldschilly

I just started it:

$ find ~/.dropbox/logs/
/home/user/.dropbox/logs/
/home/user/.dropbox/logs/0
/home/user/.dropbox/logs/1
/home/user/.dropbox/logs/1/1-59b028a2

but no idea if this stays there after a restart

Gabriel Indurskis
@gabindu
hm, yes, just checked in my own folder, there is indeed a logs folder, with two subfolders, but no files in it.
Harald Schilly
@haraldschilly
well, no idea actually. that logfile here above contains some binary information... weird
Gabriel Indurskis
@gabindu
I just did a test restarting the project and the dropbox daemon manually. It then indeed recreates logfiles in that directory - but they are all binary files. I tried gunzip and tar tfz as some obvious candidates, but file only returns data as file type.
Anyhow, the more important observation is that restarting the project manually does not affect the dropbox daemon, the account stays linked. So maybe the issue only arises when the project has been stopped for a longer period of time. The dropbox daemon keeps track of a hostkey in one of its config files in .dropbox, so my guess is that this somehow is recomputed and gives a different value under certain circumstances.
But let's not waste too much time with this now, I'll investigate this some more if and when it happens again. Until then, back to more important work. ;-) Thanks again!
Gabriel Indurskis
@gabindu
Just as a quick note to others who read this and want to use project.init (until it gets renamed): it has to be an executable file, so make sure to use chmod a+x project.init
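Putting the pieces from this thread together, a minimal sketch of the setup could look like the following. The file name project.init and its contents come from the messages above (the name may change, as noted); INIT_DIR is a hypothetical override added here only so the sketch can be tried outside the home directory.

```shell
# Sketch: create the (unreleased, possibly renamed) project.init startup script.
# INIT_DIR is a hypothetical override; by default the file goes in the home directory.
init_dir="${INIT_DIR:-$HOME}"
init_file="$init_dir/project.init"

# Write the script body suggested earlier in this thread.
cat > "$init_file" <<'EOF'
#!/usr/bin/env bash
dropbox start
EOF

# The file has to be executable, otherwise it will not be run at project start.
chmod a+x "$init_file"

# Show the resulting permissions as a quick check.
ls -l "$init_file"
```

This only creates the file; it does not start Dropbox itself. The dropbox status call can stay in .bashrc, as discussed above.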
William Stein
@williamstein
Projects are no longer attached to specific host computers — when you restart them (or stop and start them later) — they come up on the “best” (by some criterion) available host. That may or may not be the same computer you were using before.
This model is much more efficient. For example, when a pre-emptible node goes down, all the projects running on it fan out to the rest of the cluster and are running again in seconds, rather than waiting 10 minutes for the preemptible to fully boot up, bootstrap, join the mesh network, etc.
Gabriel Indurskis
@gabindu
I see, that makes a lot of sense - and might indeed explain the Dropbox relinking problem.
iara iapc
@iaraiapc
Hello everyone. Is there a command equivalent to "typeset (True)" for Jupyter? I want the outputs of the evaluations to be pretty.
oops, "typeset_mode(True)"
iara iapc
@iaraiapc
Hi W Stein
thanks for this ;) it works now
Gabriel Indurskis
@gabindu
Hi there, I just hit a weird problem from one moment to the other: typeset math in a Sage worksheet suddenly doesn't display anymore. For example: A=matrix(2,2,[1,2,3,4]); show(A); A only shows the non-typeset version of the matrix. I've tested with Chrome and Firefox, so it doesn't seem to be a browser issue (in particular since it did work just a few minutes ago...) Was there a (very) recent change in the backend to cause this? Thanks for checking! (my students are writing an exam this week on CoCalc, so I do hope we can fix this quickly...)
Gabriel Indurskis
@gabindu
ah, it just came back and now seems to work again. No idea if this was just a temporary issue on my end or a fix on your end, but thanks in any case (and sorry for the false-alert if it was indeed on my end). ;-)
Michael McNeil Forbes
@mforbes
Is there any way to use Notebooks in the CoCalc environment using something like the RISE extension? I can't find reference to this, so I suspect not, but we have some upcoming presentations where most students are using Jupyter notebooks and I thought it would be a nice way for them to show their work.
Ah. I missed sagemathinc/cocalc#2158. Could be very useful.
William Stein
@williamstein

Was there a (very) recent change in the backend to cause this?

I changed the frontend to not show raw stuff when rendering mathjax; maybe it was taking a little while to load...