    Hoja
    @Guptajakala
    [10/23/2019, 10:59:46 PM] DEBUG [ 'Database directory: /home/X/nni/SJ0LBrL3/db' ]
    [10/23/2019, 10:59:47 PM] INFO [ 'Datastore initialization done' ]
    [10/23/2019, 10:59:47 PM] INFO [ 'Rest server listening on: http://0.0.0.0:8080' ]
    [10/23/2019, 10:59:47 PM] INFO [ 'RestServer start' ]
    [10/23/2019, 10:59:47 PM] INFO [ 'Construct remote machine training service.' ]
    [10/23/2019, 10:59:47 PM] INFO [ 'RestServer base port is 8080' ]
    [10/23/2019, 10:59:49 PM] DEBUG [ 'GET: /check-status: body:\n{}' ]
    [10/23/2019, 10:59:49 PM] DEBUG [ 'PUT: /experiment/cluster-metadata: body:\n{\n "machine_list": [\n {\n "ip": "128.6.4.101",\n "port": 22,\n "username": "X",\n "passwd": "XXXX",\n "gpuIndices": "0,1,2,3",\n "maxTrialNumPerGpu": 1,\n "useActiveGpu": true\n },\n {\n "ip": "128.6.4.103",\n "port": 22,\n "username": "X",\n "passwd": "XXXX",\n "gpuIndices": "0,1,2,3",\n "maxTrialNumPerGpu": 1,\n "useActiveGpu": true\n }\n ]\n}' ]
    [10/23/2019, 10:59:49 PM] INFO [ 'NNIManager setClusterMetadata, key: machine_list, value: [{"ip":"128.6.4.101","port":22,"username":"X","passwd":"XXXX","gpuIndices":"0,1,2,3","maxTrialNumPerGpu":1,"useActiveGpu":true},{"ip":"128.6.4.103","port":22,"username":"X","passwd":"XXXX","gpuIndices":"0,1,2,3","maxTrialNumPerGpu":1,"useActiveGpu":true}]' ]
    [10/23/2019, 10:59:49 PM] DEBUG [ '------------------training service try keyboard-interactive mode---------------' ]
    [10/23/2019, 10:59:49 PM] DEBUG [ '------------------training service 336---------------' ]
    [10/23/2019, 10:59:49 PM] DEBUG [ 'Connecting to remote machines: [{"ip":"128.6.4.101","port":22,"username":"X","passwd":"XXXX","gpuIndices":"0,1,2,3","maxTrialNumPerGpu":1,"useActiveGpu":true},{"ip":"128.6.4.103","port":22,"username":"X","passwd":"XXXX","gpuIndices":"0,1,2,3","maxTrialNumPerGpu":1,"useActiveGpu":true}]' ]
    [10/23/2019, 10:59:49 PM] DEBUG [ '-------------------in trainingservice 466------------------' ]
    [10/23/2019, 10:59:49 PM] DEBUG [ '-------------------in trainingservice 471------------------' ]
    [10/23/2019, 10:59:49 PM] DEBUG [ '-------------------in trainingservice 471------------------' ]
    [10/23/2019, 10:59:49 PM] DEBUG [ '--------------ssh client--147------------' ]
    [10/23/2019, 10:59:49 PM] DEBUG [ '--------------ssh client--147------------' ]
    [10/23/2019, 10:59:49 PM] DEBUG [ '------------ssh client----158------------' ]
    [10/23/2019, 10:59:49 PM] DEBUG [ '-----------start to initialize client-----------' ]
    [10/23/2019, 10:59:49 PM] DEBUG [ '128.6.4.101' ]
    [10/23/2019, 10:59:49 PM] DEBUG [ 22 ]
    [10/23/2019, 10:59:49 PM] DEBUG [ 'X' ]
    [10/23/2019, 10:59:49 PM] DEBUG [ '------------ssh client----158------------' ]
    [10/23/2019, 10:59:49 PM] DEBUG [ '-----------start to initialize client-----------' ]
    [10/23/2019, 10:59:49 PM] DEBUG [ '128.6.4.103' ]
    [10/23/2019, 10:59:49 PM] DEBUG [ 22 ]
    [10/23/2019, 10:59:49 PM] DEBUG [ 'X' ]
    [10/23/2019, 10:59:49 PM] DEBUG [ '---------------ssh client keyboard-interactive-----------' ]
    [10/23/2019, 10:59:49 PM] DEBUG [ '---------------ssh client keyboard-interactive-----------' ]
    [10/23/2019, 10:59:51 PM] DEBUG [ '---------------ssh client error-----------' ]
    [10/23/2019, 10:59:51 PM] DEBUG [ { Error: All configured authentication methods failed
    at tryNextAuth (/home/X/anaconda3/envs/X/nni/node_modules/ssh2/lib/client.js:392:17)
    at SSH2Stream.onUSERAUTH_FAILURE (/home/X/anaconda3/envs/X/nni/node_modules/ssh2/lib/client.js:599:5)
    at SSH2Stream.emit (events.js:182:13)
    at parsePacket (/home/X/anaconda3/envs/X/nni/node_modules/ssh2-streams/lib/ssh.js:3930:10)
    at SSH2Stream._transform (/home/X/anaconda3/envs/X/nni/node_modules/ssh2-streams/lib/ssh.js:671:13)
    at SSH2Stream.Transform._read (_stream_transform.js:190:10)
    at SSH2Stream._read (/home/X/anaconda3/envs/X/nni/node_modules/ssh2-streams/lib/ssh.js:253:15)
    at SSH2Stream.Transform._write (_stream_transform.js:178:12)
    SparkSnail
    @SparkSnail
    Hi, I updated the code, which may fix this error. Could you please try git clone -b dev-debug-ssh https://github.com/SparkSnail/nni again? Thanks!
    Hoja
    @Guptajakala
    sure
    QuanluZhang
    @QuanluZhang
    @Guptajakala nnimanager.log will contain your password, so please remove it before posting.
    It is logged only so you can check whether the password being sent is correct.
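Before posting nnimanager.log, the passwords can be masked automatically. This is a minimal sketch, assuming the password shows up either as a JSON-style "passwd": "..." field (as in the cluster-metadata lines above) or as a bare string in the debug output; `redact_log` is a hypothetical helper, not part of NNI.

```python
import re
from typing import Optional

# Matches a JSON-style "passwd": "<value>" pair, as printed in the
# cluster-metadata lines of nnimanager.log. Assumes the value itself
# contains no double quotes.
_PASSWD_FIELD = re.compile(r'("passwd"\s*:\s*")[^"]*(")')


def redact_log(text: str, password: Optional[str] = None, mask: str = "***") -> str:
    """Return a copy of a log with password values replaced by mask.

    Every "passwd": "..." field is masked; if the real password is
    given, every literal occurrence of it is masked as well (this covers
    debug lines that print the password on its own).
    """
    redacted = _PASSWD_FIELD.sub(lambda m: m.group(1) + mask + m.group(2), text)
    if password:
        redacted = redacted.replace(password, mask)
    return redacted
```

You would read the log file, pass its contents (and, ideally, the real password) to the function, and post the returned text instead of the raw log.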
    Hoja
    @Guptajakala
    Thanks for the reminder!
    [10/23/2019, 11:18:14 PM] DEBUG [ 'Database directory: /home/XXX/nni/gD5ELk5w/db' ]
    [10/23/2019, 11:18:15 PM] INFO [ 'Datastore initialization done' ]
    [10/23/2019, 11:18:15 PM] INFO [ 'Rest server listening on: http://0.0.0.0:8080' ]
    [10/23/2019, 11:18:15 PM] INFO [ 'RestServer start' ]
    [10/23/2019, 11:18:15 PM] INFO [ 'Construct remote machine training service.' ]
    [10/23/2019, 11:18:15 PM] INFO [ 'RestServer base port is 8080' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ 'GET: /check-status: body:\n{}' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ 'PUT: /experiment/cluster-metadata: body:\n{\n "machine_list": [\n {\n "ip": "128.6.4.101",\n "port": 22,\n "username": "XXX",\n "passwd": "XXX!",\n "gpuIndices": "0,1,2,3",\n "maxTrialNumPerGpu": 1,\n "useActiveGpu": true\n },\n {\n "ip": "128.6.4.103",\n "port": 22,\n "username": "XXX",\n "passwd": "XXX!",\n "gpuIndices": "0,1,2,3",\n "maxTrialNumPerGpu": 1,\n "useActiveGpu": true\n }\n ]\n}' ]
    [10/23/2019, 11:18:17 PM] INFO [ 'NNIManager setClusterMetadata, key: machine_list, value: [{"ip":"128.6.4.101","port":22,"username":"XXX","passwd":"XXX!","gpuIndices":"0,1,2,3","maxTrialNumPerGpu":1,"useActiveGpu":true},{"ip":"128.6.4.103","port":22,"username":"XXX","passwd":"XXX!","gpuIndices":"0,1,2,3","maxTrialNumPerGpu":1,"useActiveGpu":true}]' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ '------------------training service try keyboard-interactive mode---------------' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ '------------------training service 336---------------' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ 'Connecting to remote machines: [{"ip":"128.6.4.101","port":22,"username":"XXX","passwd":"XXX!","gpuIndices":"0,1,2,3","maxTrialNumPerGpu":1,"useActiveGpu":true},{"ip":"128.6.4.103","port":22,"username":"XXX","passwd":"XXX!","gpuIndices":"0,1,2,3","maxTrialNumPerGpu":1,"useActiveGpu":true}]' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ '-------------------in trainingservice 466------------------' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ '-------------------in trainingservice 471------------------' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ '-------------------in trainingservice 471------------------' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ '--------------ssh client--147------------' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ '--------------ssh client--147------------' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ '------------ssh client----158------------' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ '-----------start to initialize client-----------' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ '128.6.4.101' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ 22 ]
    [10/23/2019, 11:18:17 PM] DEBUG [ 'XXX' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ '------------ssh client----158------------' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ '-----------start to initialize client-----------' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ '128.6.4.103' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ 22 ]
    [10/23/2019, 11:18:17 PM] DEBUG [ 'XXX' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ '---------------ssh client keyboard-interactive2-----------' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ 'XXX!' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ '---------------ssh client keyboard-interactive2-----------' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ 'XXX!' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ '--------------------initialize client success----------------' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ '-------------------in trainingservice 473------------------' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ '-------------------in trainingservice 484------------------' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ 'remoteExeCommand: command: [mkdir -p /tmp/nni/experiments/gD5ELk5w]' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ '--------------------initialize client success----------------' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ '-------------------in trainingservice 473------------------' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ '-------------------in trainingservice 484------------------' ]
    [10/23/2019, 11:18:17 PM] DEBUG [ 'remoteExeCommand
    QuanluZhang
    @QuanluZhang
    great, it works, right?
    Hoja
    @Guptajakala
    INFO: expand searchSpacePath: search_space.json to /home/X/debug/nni_remote_test/search_space.json
    INFO: expand codeDir: . to /home/X/debug/nni_remote_test/.
    INFO: Starting restful server...
    INFO: Successfully started Restful server!
    INFO: Setting remote config...
    Error message is {"error":"Error: Validate file name error: Error: file name in /home/X/debug/nni_remote_test/node_modules/console-control-strings/README.md~ is not valid!"}
    ERROR: Failed! Error is:
    SparkSnail
    @SparkSnail
    A file name in your codeDir is not valid.
    NNI does not support ~ in file names.
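To find such names before running nnictl, you could scan codeDir yourself. This is a minimal sketch: `find_invalid_names` is a hypothetical helper, and its allow-list (letters, digits, ".", "_", "-") is an assumption that may be stricter or looser than NNI's actual check, though it does flag names containing ~, such as README.md~.

```python
import os
import re
from typing import List

# ASSUMPTION: allow-list of letters, digits, ".", "_" and "-".
# NNI's real validation rule may differ; this is only meant to catch
# names such as "README.md~" before codeDir is uploaded.
_ALLOWED = re.compile(r"[A-Za-z0-9._-]+")


def find_invalid_names(code_dir: str) -> List[str]:
    """List paths under code_dir whose file or directory name fails the allow-list."""
    bad = []
    for root, dirs, files in os.walk(code_dir):
        for name in dirs + files:
            if not _ALLOWED.fullmatch(name):
                bad.append(os.path.join(root, name))
    return sorted(bad)
```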
    Hoja
    @Guptajakala
    command: source activate bowen && python main.py
    codeDir: .
    This is my setting. I didn't use ~.
    SparkSnail
    @SparkSnail
    /home/X/debug/nni_remote_test/node_modules/console-control-strings/README.md~
    Hoja
    @Guptajakala
    which step is using that file?
    there is no README.md~
    SparkSnail
    @SparkSnail
    INFO: expand codeDir: . to /home/X/debug/nni_remote_test/.
    Your codeDir contains this file, and NNI uploads your whole codeDir folder to the remote machine.
    If you do not need this file, you can delete or rename it.
    Hoja
    @Guptajakala
    OK, I just found it. It was hidden...
    SparkSnail
    @SparkSnail
    PS: why does your codeDir folder contain a node_modules folder?
    It seems unnecessary.
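One way to keep folders like node_modules and editor backup files out of the upload is to point codeDir at a cleaned copy of the trial code. This is a minimal sketch using the standard library's shutil.copytree with shutil.ignore_patterns; `make_staging_copy` and the exclusion patterns are illustrative assumptions, not NNI features.

```python
import shutil

# Illustrative exclusion patterns (an assumption; adjust to your project):
# node_modules, editor backups ending in "~", VCS and bytecode folders.
EXCLUDE = shutil.ignore_patterns("node_modules", "*~", ".git", "__pycache__")


def make_staging_copy(src_dir: str, dst_dir: str) -> str:
    """Copy src_dir to dst_dir, skipping anything matched by EXCLUDE.

    dst_dir must not exist yet; shutil.copytree raises FileExistsError otherwise.
    """
    shutil.copytree(src_dir, dst_dir, ignore=EXCLUDE)
    return dst_dir
```

You would then set codeDir to the staged directory; since the copy keeps the folder layout, a command such as python main.py would not need to change.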
    Hoja
    @Guptajakala
    Got it. Looks like it's working fine now!
    SparkSnail
    @SparkSnail
    great!
    Hoja
    @Guptajakala
    Yeah, I deleted it. So was keyboard-interactive authentication the cause of the SSH error?
    SparkSnail
    @SparkSnail
    Yes, it seems so.
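The two logs are consistent with that reading: the first run stopped after "ssh client keyboard-interactive" with "All configured authentication methods failed", while the patched build logged the password being sent during "keyboard-interactive2" right before "initialize client success". That suggests the remote sshd accepts passwords only through the keyboard-interactive method, where the client must answer the server's prompts. NNI's manager uses the Node.js ssh2 package (its stack trace points into node_modules/ssh2), so this Python sketch only illustrates the answering logic. The handler shape follows paramiko's `Transport.auth_interactive` convention, `make_kbd_interactive_handler` is hypothetical, and it assumes every prompt is a password prompt.

```python
from typing import Callable, List, Sequence, Tuple

# A keyboard-interactive prompt as (prompt_text, echo_flag).
Prompt = Tuple[str, bool]
Handler = Callable[[str, str, Sequence[Prompt]], List[str]]


def make_kbd_interactive_handler(password: str) -> Handler:
    """Build a handler that answers each keyboard-interactive prompt with the password.

    ASSUMPTION: every prompt is a password prompt (typical of a PAM
    "Password:" challenge); servers that also ask for one-time codes
    need real prompt inspection instead.
    """
    def handler(title: str, instructions: str, prompt_list: Sequence[Prompt]) -> List[str]:
        # The server expects exactly one response per prompt, in order;
        # an empty prompt list (sent by some servers as a final round)
        # gets an empty response list.
        return [password for _text, _echo in prompt_list]

    return handler
```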
    Hoja
    @Guptajakala
    thank you so much
    Will this patch be merged into the main branch at some point?
    SparkSnail
    @SparkSnail
    Sure, I will open a PR to fix this issue.
    Hoja
    @Guptajakala
    great, I will close the issue for now.
    QuanluZhang
    @QuanluZhang
    Please report the reason before closing it.
    Hoja
    @Guptajakala
    sure
    SparkSnail
    @SparkSnail
    Thanks for your effort in debugging.
    Hoja
    @Guptajakala
    thank you for the support!
    Signing off...
    Ce Gao
    @gaocegege
    Hi, I have a question about NNI's IR. Can it support graphs that contain control flow (e.g. RNNs)?
    apatsekin
    @apatsekin
    Hey guys! I would really appreciate it if you could look into this one: microsoft/nni#1620.
    SparkSnail
    @SparkSnail
    @stone-doc hello
    xuehui
    @xuehui1991
    @gaocegege Hello. For now, it does not support that.
    Also, the link you mentioned is for the old version. You could check the new NAS interface.
    xuehui
    @xuehui1991
    I think supporting control flow (e.g. RNNs) is difficult, especially when it has to be combined with other ops such as CNNs and attention. The shapes of subgraphs sampled by some NAS algorithms may not match. (Personal comments)
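The shape point can be made concrete with plain convolution arithmetic. This is a minimal sketch, not NNI's API: each hypothetical candidate op is described by (kernel, stride, padding), the output size along one axis is floor((H + 2p - k) / s) + 1, and a choice among candidates is only interchangeable if all of them yield the same output shape.

```python
from typing import Dict, Tuple


def conv2d_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of a 2-D convolution along one spatial axis (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def candidate_shapes(h: int, w: int,
                     candidates: Dict[str, Tuple[int, int, int]]) -> Dict[str, Tuple[int, int]]:
    """Map each candidate name, given as name -> (kernel, stride, padding),
    to its (H, W) output for an h x w input."""
    return {
        name: (conv2d_out(h, k, s, p), conv2d_out(w, k, s, p))
        for name, (k, s, p) in candidates.items()
    }


def shapes_agree(shapes: Dict[str, Tuple[int, int]]) -> bool:
    """True when every candidate produces the same output shape."""
    return len(set(shapes.values())) <= 1
```

For a 32x32 input, a 3x3 conv with padding 1 and a 5x5 conv with padding 2 both give 32x32, but a 5x5 conv without padding gives 28x28, so whatever follows the choice would see different shapes depending on which candidate was sampled.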
    Ce Gao
    @gaocegege
    @xuehui1991 Thanks for your reply, SGTM.
    BTW, the new NAS interface and features are really awesome.
    lastrei
    @lastrei
    Hi, can someone help with microsoft/nni#1938?
    Scarlett Li
    @scarlett2018
    @lastrei Thanks for raising it; the NNI devs will take a look at their earliest convenience.
    lastrei
    @lastrei
    @scarlett2018 thank you so much
    lastrei
    @lastrei
    And by the way,
    I set max_bin to "max_bin":{"_type":"randint","_value":[300,600]},
    but in the log I see "max_bin": 833. Why is it > 600?
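A quick way to confirm such reports is to check each trial's parameters against the search space. This is a minimal sketch, assuming NNI's documented semantics that randint draws from [lower, upper) (lower inclusive, upper exclusive) and uniform from [low, high]; `out_of_range` is a hypothetical helper, and the "lr" entry in the example is made up for illustration.

```python
from typing import Any, Dict, List


def out_of_range(search_space: Dict[str, Dict[str, Any]], params: Dict[str, Any]) -> List[str]:
    """Return names of sampled parameters that fall outside their declared range.

    ASSUMPTION: randint is [lower, upper) and uniform is [low, high];
    only randint, uniform and choice are checked, other types are skipped.
    """
    problems = []
    for name, spec in search_space.items():
        if name not in params:
            continue
        value, kind, bounds = params[name], spec["_type"], spec["_value"]
        if kind == "randint":
            lower, upper = bounds
            ok = lower <= value < upper
        elif kind == "uniform":
            low, high = bounds
            ok = low <= value <= high
        elif kind == "choice":
            ok = value in bounds
        else:
            continue  # loguniform, normal, ... are not checked here
        if not ok:
            problems.append(name)
    return problems
```

With the max_bin spec above, a sampled value of 833 would be reported as out of range, which helps separate a tuner problem from a typo in search_space.json.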