    Trophime
    @Trophime

    @emepetres

    Dear @/all ,
    New orchestrator features will be deployed on the canary branch. That means that, to test your applications with the latest changes, you will need to change the plugin import from master to canary:
    http://raw.githubusercontent.com/MSO4SC/cloudify-hpc-plugin/master/plugin.yaml -> https://raw.githubusercontent.com/MSO4SC/cloudify-hpc-plugin/canary/plugin.yaml

    In the canary version, two changes have been added to enable authentication through a private key and through a tunnelled ssh connection (see README). These changes have allowed the integration with Atlas HPC, pending thorough tests from the Unistra side.
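
The master-to-canary switch described above amounts to a one-line change in each blueprint's imports. Here is a minimal sketch in Python, assuming the blueprint is a text file that lists the plugin URL under an `imports:` key. The sample blueprint text and the `switch_to_canary` helper are illustrative and not part of the plugin.

```python
import re

# Canary URL exactly as given in the announcement.
CANARY_URL = "https://raw.githubusercontent.com/MSO4SC/cloudify-hpc-plugin/canary/plugin.yaml"

# Matches the master-branch plugin URL with either http or https.
_MASTER_PATTERN = re.compile(
    r"https?://raw\.githubusercontent\.com/MSO4SC/cloudify-hpc-plugin/master/plugin\.yaml"
)


def switch_to_canary(blueprint_text: str) -> str:
    """Return the blueprint text with the master plugin import replaced by canary."""
    return _MASTER_PATTERN.sub(CANARY_URL, blueprint_text)


# Hypothetical blueprint fragment, used only to illustrate the substitution.
example = """imports:
  - http://raw.githubusercontent.com/MSO4SC/cloudify-hpc-plugin/master/plugin.yaml
"""

print(switch_to_canary(example))
```

Text that does not contain the master URL is returned unchanged, so the helper can be run over several blueprints without special-casing.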

    where is the README you are talking about?

    Guillaume Dollé
    @gdolle
    @nde It seems so. But I think it's related to the orchestrator and/or sregistry
    currently trying to pull from cesga sregistry is totally blocked
    @victorsndvg are you working on it ?
    Trophime
    @Trophime
    Regarding the trouble with Singularity: I've experienced this issue on my "canary" registry. I had to change some settings in nginx.conf:
    server {
      listen                *:80;
      server_name           localhost;
    
      client_max_body_size 8000M;
      client_body_buffer_size 8000M;
      client_body_timeout 120;
    
    ...
    as far as I remember the trouble was "removed" with http but not https
    But I think @victorsndvg can comment on this
    Javi Carnero
    @emepetres
    @Trophime that's great, @victorsndvg was looking into nginx config as well
    @Trophime could you post this info on the issue? -> MSO4SC/cloudify-hpc-plugin#65
    victorsndvg
    @victorsndvg
    yes, sorry @gdolle , give one hour more please
    Guillaume Dollé
    @gdolle
    Arf we planned a test session with students this morning, I think around 11h (time to prepare settings).
    Do you think it'll be ok in one hour ?
    victorsndvg
    @victorsndvg
    Hopefully yes
    Guillaume Dollé
    @gdolle
    @victorsndvg Just a remark: around 9h30 I ran three different deployments consecutively via the orchestrator and the downloads finished. (In case you changed some config)
    Guillaume Dollé
    @gdolle
    @emepetres @victorsndvg If you want to test via the portal, you can buy the new Feel++ CSM Toolbox on the marketplace. It's simpler to use.
    Javi Carnero
    @emepetres
    @gdolle is it the same app or a new one?
    Guillaume Dollé
    @gdolle
    It's a new one with very few inputs
    Javi Carnero
    @emepetres
    I mean, is it a new product in the marketplace?
    Guillaume Dollé
    @gdolle
    Yes
    I added 5 simple toolbox apps. You can take the CSM one; I know it works well if the download finishes.
    Javi Carnero
    @emepetres
    great, thanks
    victorsndvg
    @victorsndvg
    @gdolle, I'm not going to reset the server again this morning. Some random timeouts still occur, but simultaneous image downloads have been fixed.
    Please try not to download 20 x 3.5 GB simultaneously; that is too much load.
    Niyazi Cem Degirmenci
    @ncde
    On my side, my image can now be downloaded from tegner using shub.
    The image is 1.1 GB.
    It was failing at 93% before; now the download completes.
    Guillaume Dollé
    @gdolle
    Ok thx @victorsndvg, we will try with students.
    victorsndvg
    @victorsndvg
    :+1: good luck
    Guillaume Dollé
    @gdolle
    :+1: thx
    victorsndvg
    @victorsndvg
    Again, try to stagger the students so that they are not all downloading the images at the same time.
    Guillaume Dollé
    @gdolle
    Ok I'll mention that.
    victorsndvg
    @victorsndvg
    takes about 3-4 minutes per image
    Guillaume Dollé
    @gdolle
    ok
    Do you have an idea how many simultaneous downloads are ok?
    victorsndvg
    @victorsndvg
    1 is the best choice, but maybe 2 or 3 ... I haven't checked beyond that. The more simultaneous pulls, the more error-prone it gets.
    Guillaume Dollé
    @gdolle
    aw.. but is it because of nginx ?
    victorsndvg
    @victorsndvg
    All of it: the nginx conf (I need to refine it), the host VM resources, etc.
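
The advice above (1 pull at a time is safest, maybe 2 or 3) can be enforced on the client side when one machine launches several pulls. Here is a minimal sketch in Python that caps the number of simultaneous pulls with a thread pool. The `pull_image` placeholder and the limit of 2 are assumptions for illustration. Students pulling from separate machines would still need to coordinate among themselves.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Assumed cap, based on the "1 is best, maybe 2 or 3" remark in the chat.
MAX_SIMULTANEOUS_PULLS = 2


def pull_image(name):
    """Hypothetical placeholder for the real pull (e.g. a call to the singularity CLI)."""
    time.sleep(0.01)  # stands in for the download time
    return name


def pull_all(images, pull=pull_image, limit=MAX_SIMULTANEOUS_PULLS):
    """Pull every image, never running more than `limit` pulls at once.

    Results are returned in the same order as `images`.
    """
    with ThreadPoolExecutor(max_workers=limit) as pool:
        return list(pool.map(pull, images))


if __name__ == "__main__":
    print(pull_all(["image-a", "image-b", "image-c"]))
```

Setting `limit=1` makes the pulls strictly sequential, which matches the most conservative recommendation.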
    Trophime
    @Trophime
    I can no longer test apps in orchestrator-cli: "Deployment environment creation is pending..."
    is this normal?
    Trophime
    @Trophime
    @emepetres without a working cli I cannot debug any blueprint!!!
    victorsndvg
    @victorsndvg
    @Trophime, take a look at this example. You can use the cli without connecting to the server
    Trophime
    @Trophime
    Going back to my stuck deployment, I have some new info.
    I tried adding the debug flag. While it is pending in create_deployment_environment, the debug mode repeatedly reports:
    response header:  Server: nginx
    response header:  Date: Thu, 14 Jun 2018 07:29:30 GMT
    response header:  Content-Type: application/json
    response header:  Content-Length: 6101
    response header:  Connection: keep-alive
    Sending request: GET http://193.144.35.131:80/api/v3.1/executions/560e97ca-0217-444b-98e0-bf3452a16a88
    request header:  Connection: keep-alive
    request header:  Accept-Encoding: gzip, deflate
    request header:  Accept: */*
    request header:  User-Agent: python-requests/2.18.4
    request header:  Content-type: application/json
    request header:  Tenant: default_tenant
    request header:  Authorization: Basic YWRtaW46bXNvNHNjSXRlcmF0aW9uMnBvd2Vy
    reply:  "200 OK" {"status": "pending", "is_system_workflow": false, "parameters": {"deployment_plugins_to_install": [{"distribution_release": null, "install_arguments": null, "name": "hpc", "package_name": "cloudify-hpc-plugin", "distribution_version": null, "package_version": "1.1.1", "supported_platform": null, "source": "https://github.com/Trophime/cloudify-hpc-plugin/archive/add_checks.zip", "install": true, "executor": "central_deployment_agent", "distribution": null}, {"distribution_release": null, "install_arguments": null, "name": "agent", "package_name": null, "distribution_version": null, "package_version": null, "supported_platform": null, "source": null, "install": false, "executor": "central_deployment_agent", "distribution": null}], "workflow_plugins_to_install": [{"distribution_release": null, "install_arguments": null, "name": "default_workflows", "package_name": null, "distribution_version": null, "package_version": null, "supported_platform": null, "source": null, "install": false, "executor": "central_deployment_agent", "distribution": null}, {"distribution_release": null, "install_arguments": null, "name": "hpc", "package_name": "cloudify-hpc-plugin", "distribution_version": null, "package_version": "1.1.1", "supported_platform": null, "source": "https://github.com/Trophime/cloudify-hpc-plugin/archive/add_checks.zip", "install": true, "executor": "central_deployment_agent", "distribution": null}], "policy_configuration": {"policy_types": {"cloudify.policies.types.host_failure": {"source": "file:///opt/manager/resources/cloudify/policies/host_failure.clj", "properties": {"policy_operates_on_group": {"default": false, "description": "If the policy should maintain its state for the whole group\nor each node instance individually.\n"}, "is_node_started_before_workflow": {"default": true, "description": "Before triggering workflow, check if the node state is started"}, "service": {"default": ["service"], "description": "Service names whose events 
should be taken into consideration"}, "interval_between_workflows": {"default": 300, "description": "Trigger workflow only if the last workflow was triggered earlier than interval-between-workflows seconds ago.\nif < 0  workflows can run concurrently.\n"}}}, "cloudify.policies.types.ewma_stabilized": {"source": "file:///opt/manager/resources/cloudify/policies/ewma_stabilized.clj", "properties": {"upper_bound": {"default": true, "description": "boolean value for describing the semantics of the threshold.\nif 'true': metrics whose value is bigger than the threshold will cause the triggers to be processed.\nif 'false': metrics with values lower than the threshold will do so.\n"}, "is_node_started_before_workflow": {"default": true, "description": "Before triggering workflow, check if the node state is started"}, "ewma_timeless_r": {"default": 0.5, "description": "r is the ratio between successive events. The smaller it is, the smaller impact on the computed value the most recent event has.\n"}, "service": {"default": "service", "description": "The service name"}, "stability_time": {"default": 0, "description": "How long a threshold must be breached before the triggers will be processed"}, "policy_operates_on_group": {"default": false, "description":
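
The debug output above shows the client polling the execution endpoint and getting "pending" back each time. Here is a minimal sketch in Python of such a polling loop with a timeout, so a stuck execution is reported rather than waited on forever. The `fetch_status` callable, the timeout values and the set of terminal states are assumptions for illustration, and this is not the orchestrator CLI's actual code.

```python
import time

# Assumed set of states after which polling should stop.
TERMINAL_STATES = {"terminated", "failed", "cancelled"}


def wait_for_execution(fetch_status, timeout=60.0, interval=2.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll `fetch_status()` until it returns a terminal state.

    `fetch_status` is a hypothetical callable returning the "status" field
    of the execution (e.g. "pending"). Raises TimeoutError if no terminal
    state is seen within `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"execution still '{status}' after {timeout} s"
            )
        sleep(interval)


if __name__ == "__main__":
    # Simulated sequence of replies, standing in for real HTTP requests.
    replies = iter(["pending", "started", "terminated"])
    print(wait_for_execution(lambda: next(replies), interval=0))
```

Passing the status fetcher in as a callable keeps the loop independent of how the request is made and makes it easy to exercise without a server.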
    Javi Carnero
    @emepetres
    But other deployments don't fail, right?
    Have you tried in the canary orchestrator?
    It seems to me that it is a problem with this particular blueprint
    Pablo Díaz
    @pdiaz123
    Hi @all, we need to stop the VM which contains the orchestrator for maintenance, so please do not send new jobs through the portal until tomorrow at 10 am. The stop is scheduled for tomorrow July 6th from 8am to 10am. Jobs currently queued will end before this time (I expect)
    Atgeirr Flø Rasmussen
    @atgeirr
    I still have some tiny jobs pending, that were supposed to verify that the dev-portal was working. Killing them is ok.
    Pablo Díaz
    @pdiaz123
    The orchestrator is up and running again
    @atgeirr, your jobs are still in pending state
    Christophe Prud'homme
    @prudhomm
    @victorsndvg where does the logging/monitoring feature stand ?
    victorsndvg
    @victorsndvg
    I think the integration into the portal is planned for this week