    William Huhn
    @william_p_huhn_twitter
    So, the $XXX question: how do we estimate how much using AWS ParallelCluster will cost us?
    1 reply
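    (A hedged back-of-the-envelope sketch: roughly compute nodes x hourly rate x hours, plus the head node and FSx/S3 storage. The numbers below are placeholders, not real prices; check the EC2 pricing page for your region and instance type.)
    NODES=8; RATE=1.53; HOURS=4   # placeholders: 8 compute nodes at $1.53/hr for 4 hours
    awk -v n=$NODES -v r=$RATE -v h=$HOURS 'BEGIN { printf "compute ~ %.2f USD, excluding head node and storage\n", n*r*h }'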
    Sean Smith
    @sean-smith
    image.png
    @jayltee1_gitlab ^
    David Leonard
    @dbl4git
    TeamRole:~/environment $ # upload to your bucket
    TeamRole:~/environment $ aws s3 cp s3dkq4m2.mtx.gz s3://mybucket-${BUCKET_POSTFIX}/s3dkq4m2.mtx.gz
    upload failed: ./s3dkq4m2.mtx.gz to s3://mybucket-87da0bb5/s3dkq4m2.mtx.gz An error occurred (ExpiredToken) when calling the CreateMultipartUpload operation: The provided token has expired.
    TeamRole:~/environment $ aws s3 cp SEG_C3NA_Velocity.sgy s3://mybucket-${BUCKET_POSTFIX}/SEG_C3NA_Velocity.sgy
    upload failed: ./SEG_C3NA_Velocity.sgy to s3://mybucket-87da0bb5/SEG_C3NA_Velocity.sgy An error occurred (ExpiredToken) when calling the CreateMultipartUpload operation: The provided token has expired.
    TeamRole:~/environment $
    5 replies
    William Huhn
    @william_p_huhn_twitter
    How do we access the various environment variables we need to re-provide our token? They're no longer showing up when I provide my hash to EventEngine
    David Leonard
    @dbl4git
    Any answer to my error?
    Jason Taylor
    @jayltee1_gitlab
    image.png
    Open a new terminal. Copy the variables from your dashboard.
    @william_p_huhn_twitter - I had to click the link from yesterday's email again to get back in properly. But I still hit the error that @dbl4git has. Doing the above resolved it for me.
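    (For concreteness, a hedged sketch of what "copy the variables from your dashboard" amounts to: paste the temporary credentials shown on the Event Engine dashboard into the new terminal, then retry the upload. The values below are placeholders, not real keys.)
    export AWS_ACCESS_KEY_ID=AKIA...        # placeholder, copy the real value from the dashboard
    export AWS_SECRET_ACCESS_KEY=...        # placeholder
    export AWS_SESSION_TOKEN=...            # placeholder
    aws s3 cp s3dkq4m2.mtx.gz s3://mybucket-${BUCKET_POSTFIX}/s3dkq4m2.mtx.gz   # retry the failed upload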
    William Huhn
    @william_p_huhn_twitter
    @jayltee1_gitlab Yeah, that also worked for me.
    cb4github
    @cb4github
    How do I find my "S3 Dashboard" from my AWS Cloud9 web page?
    3 replies
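    (A hedged aside, in case "S3 Dashboard" refers to the S3 console: you can list the account's buckets straight from the Cloud9 terminal, or open the console in another browser tab at https://s3.console.aws.amazon.com/s3/home.)
    aws s3 ls    # lists the buckets in the account, including the mybucket-${BUCKET_POSTFIX} one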
    Pierre-Yves
    @perifaws

    Any answer to my error?

    Hi David, Sean posted a solution in reply to the question above.

    Diana Guttman
    @dianarg
    I am stuck on step i (curl POST) from yesterday's tutorial. It returns {"message": "Internal server error"}
    10 replies
    Tina Odaka
    @tinaok
    Hi, I'm at "Status: ComputeFleetHITSubstack - CREATE_IN_PROGRESS" and I would like to know how I can monitor what's going on, other than just waiting?
    2 replies
    RamNagappan
    @RamNagappan
    TeamRole:~/environment $ pcluster create my-fsx-cluster -c my-fsx-cluster.ini
    Beginning cluster creation for cluster: my-fsx-cluster
    Creating stack named: parallelcluster-my-fsx-cluster
    Status: RootInstanceProfile - CREATE_COMPLETE
    Status: MasterServerSubstack - CREATE_IN_PROGRESS ERROR: Could not retrieve CloudFormation stack data. Failed with error: The security token included in the request is expired
    TeamRole:~/environment $
    4 replies
    Tony Kew
    @tonykew
    A note regarding yesterday's Lab 2
    In section b, "Build an IAM Policy with CloudFormation", the right arrow does not go to the next item but jumps to section f, "Configure the function".
    1 reply
    Pierre-Yves
    @perifaws

    TeamRole:~/environment $ pcluster create my-fsx-cluster -c my-fsx-cluster.ini
    Beginning cluster creation for cluster: my-fsx-cluster
    Creating stack named: parallelcluster-my-fsx-cluster
    Status: RootInstanceProfile - CREATE_COMPLETE
    Status: MasterServerSubstack - CREATE_IN_PROGRESS ERROR: Could not retrieve CloudFormation stack data. Failed with error: The security token included in the request is expired
    TeamRole:~/environment $

    Hello, the keys are rotated by Cloud9 and ParallelCluster keeps an old copy in memory. You can check the status of the cluster creation using the command pcluster status my-fsx-cluster or check the Stack being created by CloudFormation on this page: https://console.aws.amazon.com/cloudformation/home?region=us-east-1
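    (A hedged illustration of the two options above, runnable from the Cloud9 terminal once the credentials have been refreshed; the stack name follows the parallelcluster-<cluster> pattern shown in the output.)
    pcluster status my-fsx-cluster
    aws cloudformation describe-stacks --stack-name parallelcluster-my-fsx-cluster \
        --query 'Stacks[0].StackStatus' --output text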

    7 replies

    A note regarding yesterday's Lab 2
    In section b, "Build an IAM Policy with CloudFormation", the right arrow does not go to the next item but jumps to section f, "Configure the function".

    Hi Tony, this is caused by Hugo; we'll find a workaround. For now, the best option is to browse sections using the left menu.

    lopl1360
    @lopl1360
    How long does generating the cluster take?
    Sean Smith
    @sean-smith

    How long does generating the cluster take?

    15-20 mins

    get some ☕️ while you wait
    Pierre-Yves
    @perifaws
    image.png
    William Huhn
    @william_p_huhn_twitter
    I'm a little unclear on the data layout for section d; let me see if I got this right. The data initially sits in an S3 bucket and is in a "released" state in the Lustre filesystem, with only the metadata in Lustre. When we access the data from within the cluster, it gets loaded into the Lustre filesystem and the instance cache storage. Then when we delete the cached version, it's still in the Lustre filesystem, so it's still (relatively) fast to access.
    2 replies
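    (A hedged way to see that behavior on the cluster, assuming the Lustre client tools are installed; the file path is just an example reusing the velocity file uploaded earlier.)
    lfs hsm_state /lustre/SEG_C3NA_Velocity.sgy    # "released": only the metadata lives in Lustre
    cat /lustre/SEG_C3NA_Velocity.sgy > /dev/null  # first read pulls the data in from the S3 bucket
    lfs hsm_state /lustre/SEG_C3NA_Velocity.sgy    # the file is now resident in the filesystem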
    Fabio Baruffa
    @fbaru-dev
    What kind of system counters can be read for monitoring the system? Thanks.
    5 replies
    I also have a second question: can the roofline model be generated automatically by reading the counters, or is an additional tool required? Thanks.
    2 replies
    William Huhn
    @william_p_huhn_twitter
    Out of curiosity, why was a c5.xlarge instance chosen for the head node in this tutorial when t2.micro is the default?
    2 replies
    aultj
    @aultj
    Hello. I was working through storage Lab 3 and succeeded until the sbatch step: ior is built and /lustre is mounted, but sbatch gives "command not found".
    7 replies
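    (A hedged troubleshooting step, assuming ParallelCluster's default Slurm install location of /opt/slurm.)
    which sbatch || export PATH=$PATH:/opt/slurm/bin   # Slurm binaries are under /opt/slurm on ParallelCluster
    sinfo                                              # sanity check that the Slurm controller responds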
    bsd43
    @bsd43
    I might have missed this in the talk, but what's the agent that's sending Prometheus data on the compute nodes? (Or the head node, for that matter?)
    Sean Smith
    @sean-smith
    There's node exporter, which is a Prometheus plugin; see https://sc20.hpcworkshops.com/10-monitoring/grafana.html
    @bsd43 ^
    1 reply
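    (A hedged quick check that the exporter is running, assuming node exporter's default port of 9100.)
    curl -s http://localhost:9100/metrics | head   # node exporter serves plain-text metrics on this endpoint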
    Lily Li
    @ligit2_gitlab
    Hi, trying Lab 4, I am not able to create the ParallelCluster. The error is:
    TeamRole:~/environment $ pcluster create perflab-lily -c ~/environment/my-perf-cluster-config.ini
    Beginning cluster creation for cluster: perflab-lily
    Creating stack named: parallelcluster-perflab-lily
    Status: RootInstanceProfile - CREATE_COMPLETE
    Status: ComputeFleetHITSubstack - CREATE_IN_PROGRESS ERROR: Could not retrieve CloudFormation stack data. Failed with error: The security token included in the request is expired
    TeamRole:~/environment $
    3 replies
    kraman-aws
    @kraman-aws
    image.png
    @ligit2_gitlab see ^^
    When you log in to Event Engine using your hash, you should see this.
    Lily Li
    @ligit2_gitlab
    Got it. Thanks.
    William Huhn
    @william_p_huhn_twitter
    You're still streaming.
    1 reply
    lopl1360
    @lopl1360
    Thanks everyone for providing this presentation and your great help during labs. I really enjoyed it and learnt a lot.
    1 reply
    Pierre-Yves
    @perifaws
    Thank you @lopl1360 ! We really appreciate your feedback.
    Jason Taylor
    @jayltee1_gitlab
    In Lab 4, exercise F, the ior.sbatch script does not have the full path to ior, so you get errors and the job completes instantly. You need to add the full path in the script, and then it will work.
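    (A hedged sketch of the change; the install path below is a placeholder, so use wherever ior was actually built on your cluster.)
    #!/bin/bash
    #SBATCH -N 2
    # Use the full path to the ior binary; a bare "ior" is not on the job's PATH.
    srun /shared/ior/bin/ior -w -r -o /lustre/ior_test_file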
    Pierre-Yves
    @perifaws
    Thanks @jayltee1_gitlab, we'll modify the instructions if they're not clear.
    Jason Taylor
    @jayltee1_gitlab
    Can I just check, our accounts will be closed tomorrow but the labs will still be available? How will we get credentials to be able to re-run the lab exercises?
    1 reply
    Pierre-Yves
    @perifaws
    Hi @jayltee1_gitlab, the accounts will be closed tomorrow, but the labs will be available later on (and improved). If you like, we can check whether someone at AWS is in contact with your institution for additional training and credits. We are also checking internally whether there are other things we can do. Will keep you posted.
    2 replies
    Tina Odaka
    @tinaok

    Hello, I have the following error on the Grafana dashboards; what did I do wrong?

    SharedCredsLoad: failed to load shared credentials file caused by: FailedRead: unable to open file caused by: open /usr/share/grafana/.aws/credentials: no such file or directory
    Object
    status:400
    statusText:"Bad Request"
    data:Object
    results:Object
    message:"SharedCredsLoad: failed to load shared credentials file
    caused by: FailedRead: unable to open file
    caused by: open /usr/share/grafana/.aws/credentials: no such file or directory"
    config:Object
    method:"POST"
    url:"api/tsdb/query"
    data:Object
    retry:0
    headers:Object
    hideFromInspector:false
    message:"SharedCredsLoad: failed to load shared credentials file
    caused by: FailedRead: unable to open file
    caused by: open /usr/share/grafana/.aws/credentials: no such file or directory"

    1 reply
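    (For reference, the error means Grafana's CloudWatch datasource is falling back to the shared-credentials-file auth mode and cannot find /usr/share/grafana/.aws/credentials. A hedged, generic workaround, if the instance has no IAM role granting CloudWatch access, is to create that file; the key values below are placeholders.)
    sudo mkdir -p /usr/share/grafana/.aws
    printf '[default]\naws_access_key_id = AKIA...\naws_secret_access_key = ...\n' | sudo tee /usr/share/grafana/.aws/credentials
    sudo systemctl restart grafana-server   # assumes Grafana runs as the grafana-server systemd service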
    Pierre-Yves
    @perifaws
    Checking, Tina.
    kraman-aws
    @kraman-aws
    @tinaok we are able to see the dashboard on your cluster from our end
    Fabrice Cantos
    @poulacou
    Could someone check why my job is not running in the "SIMULATIONS ON AWS BATCH" lab? Thank you.
    46 replies
    Foose
    @foosatraz
    I am trying to re-watch the "Best Practices for HPC in the Cloud: Part 2" but I get this error:
    image.png
    2 replies
    Yuankun Fu
    @qoofyk
    I just created a student educator account and want to use Cloud9 to connect to a school server. I copied the Cloud9 authorized key to the server, but when I try to create an environment to connect to the server, it shows: AWS Cloud9 couldn't connect to SSH server.
    3 replies
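    (Hedged checks on the target server, based on Cloud9's documented SSH requirements: the public key must be in ~/.ssh/authorized_keys, the host must be reachable on port 22 from the internet, and Node.js must be installed.)
    cat ~/.ssh/authorized_keys   # confirm the Cloud9 public key was actually appended
    node --version               # Cloud9 SSH environments need Node.js on the target host
    ss -tlnp | grep ':22'        # sshd must be listening, and the port reachable from outside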
    Yuankun Fu
    @qoofyk
    image.png
    Hi, since the workshop has finished, I registered with a student education account and am trying to run the labs. I am working on Lab 2 now and ran into a security token error at the step "G. ATTACH AN IAM ROLE TO LAMBDA - - ATTACH THE IAM POLICY TO THE ROLE". Since I am using the student education account, where can I find the "AWS_KEY" and "AWS_TOKEN"?
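    (A hedged note: with a regular account there is no Event Engine dashboard handing out temporary keys. One option is to create an access key for your own IAM user and use those values wherever the lab expects credentials; the user name below is a placeholder.)
    aws iam create-access-key --user-name my-lab-user   # placeholder user name
    export AWS_ACCESS_KEY_ID=AKIA...                    # from the command's output
    export AWS_SECRET_ACCESS_KEY=...                    # from the command's output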