Shivam
@andy6975

Hi @henrykironde , I am sorry for replying so late. I had my mid sem exams.
I have the required servers up in my system along with the required modules of Python-3. Although, I have the following doubt:

  • I didn't understand the point of the MSAccess server. I work on Ubuntu, so isn't it the same as LibreOffice in this case?

Kindly guide with the proceedings. Thank you so much!

henry senyondo
@henrykironde
Thanks for the reply. The MSAccess server is only for Windows.
henry senyondo
@henrykironde
The next step would be understanding how we create scripts. We have a retriever autocreate function that helps us create script templates; we then manually edit the information in the scripts. To get a good feel for the function, please download the file from https://data.boston.gov/dataset/city-hall-electricity-usage and keep it in a clean directory named boston_cityhall_electricity, then run retriever autocreate -d [path to dir]
Once you clean it up, create a pull request to https://github.com/weecology/retriever-recipes
Shivam
@andy6975
Hi @henrykironde, I was installing retriever when I came across an issue. It was using the dist-packages of Python 3.5, when I want it to use Python 3.7 (the default python on my system). I tried looking for the installation path in setup.py but couldn't find it. Is there a distutils file? Or could you just tell me where to fix it? Thanks!
henry senyondo
@henrykironde
It looks like you may have your paths pointing to the dist-packages of Python 3.5. Could you provide the results of which retriever, followed by which python and whereis python?
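The same information can be gathered from inside Python, which also shows which interpreter is actually running. This is only a sketch: the helper name describe_environment is made up for illustration, and the list of commands is an assumption about what is worth checking.

```python
import shutil
import sys


def describe_environment(commands=("retriever", "python", "pytest")):
    """Return the running interpreter and the PATH location of each command."""
    report = {
        "interpreter": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
    }
    for name in commands:
        # shutil.which mirrors the shell's `which`: first match on PATH, or None.
        report[name] = shutil.which(name)
    return report


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the retriever entry sits under one Python version's directories while the interpreter line reports another, the install and the interpreter in use do not match.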
Shivam
@andy6975
Okay! I have solved the problem.
Shivam
@andy6975
Also, I have created a pull request as per the instructions. Thanks!
Saanidhya
@Saanidhyavats
Hi everyone! I am Saanidhya Vats, new to open source. Can anyone guide me on how to start contributing?
henry senyondo
@henrykironde
Hi @Saanidhyavats, thanks for your interest in the Data Retriever project as a starting point in open source. We do request that you read our code of conduct first: https://retriever.readthedocs.io/en/latest/code_of_conduct.html
I recommend that you clone the repo and set up the servers: https://retriever.readthedocs.io/en/latest/developer.html#developer-s-guide
Let me know when you are done. If you find any problem with the docs, open up an issue and I will help you with that: https://github.com/weecology/retriever/issues/new
Ethan White
@ethanwhite
[Ethan White, weecology] The new renv and its Python integration may be useful for rdataretriever: https://blog.rstudio.com/2019/11/06/renv-project-environments-for-r/
Jayce
@haoranjaycewang
Hi Everyone! My name is Jayce and I am currently a second year engineering science + prospective machine learning student at the University of Toronto. I really wanted to get involved in Data Retriever and begin my journey in open source programming through GSoC 2020 as I can see Data Retriever as an incredibly useful tool that I will probably use in the future! I'm currently in the process of setting up my servers (already cloned the repo) and I just wanted to confirm that I do not need to install MSAccess if I'm on Ubuntu from what I've read before? Or is there more to it?
henry senyondo
@henrykironde
Hi @haoranjaycewang , we are delighted to hear that you are interested in open source and more especially the Data Retriever. If you are using linux, you don't need MSAccess. Let me know in case anything is not working fine. Feel free to open up an issue for any problem you encounter.
Jayce
@haoranjaycewang
Hi @henrykironde, following the Developer's Guide (https://retriever.readthedocs.io/en/latest/developer.html#developer-s-guide) I have completed installing the required modules (setuptools, xlrd, Sphinx) and the required database infrastructures + their respective modules as per the setting up servers sections of that page, would the next step be to continue following that page to the Testing section or do you suggest alternate next steps to allow me to better understand the codebase and gain familiarity with Data Retriever
henry senyondo
@henrykironde
@haoranjaycewang, try running the tests, i.e. pytest -v. Once the tests run fine, you could try creating a new script. We use retriever autocreate -f <path-to-a-csv/tab file> or retriever autocreate -d <path-to-a directory having csv/tab files> to come up with a script. The best way is to test that on the wine quality dataset. Download the data from http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/ to a directory and run autocreate. Cross-check with the script that we have: https://github.com/weecology/retriever-recipes/blob/master/scripts/wine_quality.json
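As a rough illustration of the two autocreate forms, the sketch below builds the argument list for either a single file or a directory. The -f and -d flags are the ones quoted above; the helper name autocreate_command is hypothetical, and nothing here calls retriever itself.

```python
from pathlib import Path


def autocreate_command(path):
    """Build the `retriever autocreate` argument list for a file or a directory."""
    target = Path(path)
    # -f takes a single csv/tab file, -d a directory of such files.
    flag = "-d" if target.is_dir() else "-f"
    return ["retriever", "autocreate", flag, str(target)]


if __name__ == "__main__":
    print(autocreate_command("."))
```

The resulting list could then be handed to subprocess.run(..., check=True) once retriever is installed and on the PATH.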
henry senyondo
@henrykironde
We store the scripts in the https://github.com/weecology/retriever-recipes repository. @haoranjaycewang, once you are done trying with the wine dataset, you could create scripts for any open public datasets. Currently we do not have simple datasets that you could start with, so you are welcome to look at this dataset https://github.com/emorisse/FBI-Hate-Crime-Statistics/tree/master/2013
henry senyondo
@henrykironde
@haoranjaycewang I have added a data request for you: weecology/retriever#1402
Jayce
@haoranjaycewang
Ok so when i first tried to enter pytest -v i got an error saying i dont have it installed, so I installed it, and then tried running pytest -v in the retriever directory, however i got a bunch of errors:
============================= test session starts ==============================
platform linux2 -- Python 2.7.15+, pytest-3.3.2, py-1.5.2, pluggy-0.6.0 -- /usr/bin/python2
rootdir: /home/jaycewang/retriever, inifile: setup.cfg
collected 2 items / 5 errors
and each of the 5 errors start with ERROR collecting test/test_integration.py , ERROR collecting test/test_modified_scripts.py ...etc
however i noticed that the traceback simply reports that there was an ImportError:cannot import name timezone for all of the 5 errors
both my python --version and python3 --version give me 3.6.8
henry senyondo
@henrykironde
I think you have Python 2.7.15 in your path and pytest seems to be under Python 2.7.15 . timezone is a newer addition to Python and if you are using 3.6.8, it should be included.
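One way to rule out a pytest script that belongs to a different Python is to launch pytest as a module of the interpreter you intend to use. The sketch below builds that command and checks for datetime.timezone, which exists from Python 3.2 onward. It assumes the failing name in the traceback is datetime.timezone, which the log above does not confirm; both helper names are made up for illustration.

```python
import sys


def pytest_command(extra_args=("-v",)):
    """Build a command that runs pytest with the interpreter executing this script."""
    # `python -m pytest` imports pytest from this interpreter's site-packages,
    # so a pytest script installed for another Python version cannot take over.
    return [sys.executable, "-m", "pytest", *extra_args]


def has_datetime_timezone():
    """Report whether this interpreter provides datetime.timezone (Python 3.2+)."""
    try:
        from datetime import timezone  # noqa: F401
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    print(pytest_command())
    print(has_datetime_timezone())
```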
Jayce
@haoranjaycewang
@henrykironde I have managed to setup an environment with pytest 5.x.x and python 3+, however when I try to run pytest -v in the retriever directory, I get permission denied errors as the test_.../py files try to run mkdir, however when I run sudo pytest -v within the same virtual environment it reverts back to python2 and pytest3 which i realize is outdated.
============================= test session starts ==============================
platform linux -- Python 3.6.8, pytest-5.3.2, py-1.8.1, pluggy-0.13.1 -- /usr/bin/python3
rootdir: /home/jaycewang/retriever, inifile: setup.cfg
collected 2 items / 5 errors                                                   

==================================== ERRORS ====================================
__________________ ERROR collecting test/test_integration.py ___________________
test/test_integration.py:12: in <module>
    from retriever.lib.defaults import ENCODING, DATA_DIR
../.local/lib/python3.6/site-packages/retriever/__init__.py:10: in <module>
    create_home_dir()
../.local/lib/python3.6/site-packages/retriever/lib/engine_tools.py:45: in create_home_dir
    os.makedirs(dir)
/usr/lib/python3.6/os.py:220: in makedirs
    mkdir(name, mode)
E   PermissionError: [Errno 13] Permission denied: '/home/jaycewang/.retriever/raw_data'
------------------------------- Captured stdout --------------------------------
The Retriever lacks permission to access the ~/.retriever/ directory.
________________ ERROR collecting test/test_modified_scripts.py ________________
ImportError while importing test module '/home/jaycewang/retriever/test/test_modified_scripts.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
test/test_modified_scripts.py:16: in <module>
    from retriever.lib.scripts import SCRIPT_LIST, get_retriever_script_versions
E   ImportError: cannot import name 'get_retriever_script_versions'
___________________ ERROR collecting test/test_provenance.py ___________________
ImportError while importing test module '/home/jaycewang/retriever/test/test_provenance.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
test/test_provenance.py:7: in <module>
    from retriever.lib.defaults import ENCODING, RETRIEVER_REPOSITORY
E   ImportError: cannot import name 'RETRIEVER_REPOSITORY'
___________________ ERROR collecting test/test_regression.py ___________________
test/test_regression.py:13: in <module>
    import retriever as rt
../.local/lib/python3.6/site-packages/retriever/__init__.py:10: in <module>
    create_home_dir()
../.local/lib/python3.6/site-packages/retriever/lib/engine_tools.py:45: in create_home_dir
    os.makedirs(dir)
/usr/lib/python3.6/os.py:220: in makedirs
    mkdir(name, mode)
E   PermissionError: [Errno 13] Permission denied: '/home/jaycewang/.retriever/raw_data'
------------------------------- Captured stdout --------------------------------
The Retriever lacks permission to access the ~/.retriever/ directory.
___________________ ERROR collecting test/test_retriever.py ____________________
test/test_retriever.py:10: in <module>
    import retriever as rt
../.local/lib/python3.6/site-packages/retriever/__init__.py:10: in <module>
    create_home_dir()
../.local/lib/python3.6/site-packages/retriever/lib/engine_tools.py:45: in create_home_dir
    os.makedirs(dir)
/usr/lib/python3.6/os.py:220: in makedirs
    mkdir(name, mode)
E   PermissionError: [Errno 13] Permission denied: '/home/jaycewang/.retriever/raw_data'
henry senyondo
@henrykironde
At this stage, if you are not using python2 for any current project, I would recommend that you remove it. As for the permission issue, you may need to reinstall the packages as admin and give all users rights to those packages, or run with sudo.
Otherwise using docker is also another way to test locally.
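Before reaching for sudo, it can help to confirm where the permission problem actually sits. The sketch below walks up from a path such as ~/.retriever/raw_data (the one in the traceback above) to the nearest directory that exists and checks whether the current user may create entries in it. The function names are hypothetical, and os.access only approximates what mkdir will do.

```python
import os
from pathlib import Path


def nearest_existing(path):
    """Walk upward from path until a directory that exists is found."""
    current = Path(path).expanduser()
    while not current.exists() and current != current.parent:
        current = current.parent
    return current


def can_create(path):
    """Return True if the current user could create path, or can already write to it."""
    anchor = nearest_existing(path)
    # Creating a directory needs write and search (execute) permission on its parent.
    return os.access(anchor, os.W_OK | os.X_OK)


if __name__ == "__main__":
    print(nearest_existing("~/.retriever/raw_data"))
    print(can_create("~/.retriever/raw_data"))
```

If can_create returns False for a directory inside your own home, one possible cause is that the directory was first created by a command run with sudo and is now owned by root; ls -ld on the directory printed by nearest_existing would show the owner.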
Jayce
@haoranjaycewang
@henrykironde Ok, I have given permission to the directory to be accessed, now I am encountering new import errors:
```
=========================================================================================== test session starts ============================================================================================
platform linux -- Python 3.6.8, pytest-5.3.2, py-1.8.1, pluggy-0.13.1 -- /usr/bin/python3
rootdir: /home/jaycewang/retriever, inifile: setup.cfg
collected 102 items / 3 errors / 99 selected                                                                                                                                                               

================================================================================================== ERRORS ==================================================================================================
______________________________________________________________________________ ERROR collecting test/test_modified_scripts.py ______________________________________________________________________________
ImportError while importing test module '/home/jaycewang/retriever/test/test_modified_scripts.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
test/test_modified_scripts.py:16: in <module>
    from retriever.lib.scripts import SCRIPT_LIST, get_retriever_script_versions
E   ImportError: cannot import name 'get_retriever_script_versions'
_________________________________________________________________________________ ERROR collecting test/test_provenance.py _________________________________________________________________________________
ImportError while importing test module '/home/jaycewang/retriever/test/test_provenance.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
test/test_provenance.py:7: in <module>
    from retriever.lib.defaults import ENCODING, RETRIEVER_REPOSITORY
E   ImportError: cannot import name 'RETRIEVER_REPOSITORY'
_________________________________________________________________________________ ERROR collecting test/test_retriever.py __________________________________________________________________________________
ImportError while importing test module '/home/jaycewang/retriever/test/test_retriever.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
test/test_retriever.py:23: in <module>
    from retriever.lib.defaults import HOME_DIR, RETRIEVER_DATASETS, RETRIEVER_REPOSITORY
E   ImportError: cannot import name 'RETRIEVER_DATASETS'
============================================================================================= warnings summary =============================================================================================
/home/jaycewang/.local/lib/python3.6/site-packages/future/standard_library/__init__.py:65
  /home/jaycewang/.local/lib/python3.6/site-packages/future/standard_library/__init__.py:65: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp
```
henry senyondo
@henrykironde
Give me results for these commands which python , which retriever, which pytest, whereis python , ls -al /usr/bin/python and ls -al /usr/bin/python3
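The earlier traceback shows retriever being loaded from ../.local/lib/python3.6/site-packages rather than from the cloned repository, so one possible reading of the new ImportErrors is that the tests in the clone are running against an older installed copy that lacks names such as RETRIEVER_REPOSITORY. A small sketch for checking where a package would be imported from, without importing it (package_location is a made-up helper name):

```python
import importlib.util


def package_location(name):
    """Return the file a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not importable from this interpreter
    return spec.origin


if __name__ == "__main__":
    print(package_location("retriever"))
```

If the printed path is not inside the clone, reinstalling from the clone within the active environment is one way to bring the two back in line.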
Jayce
@haoranjaycewang
(retriever_env) jaycewang@jaycewang-UX490UA:~/retriever$ which retriever
/home/jaycewang/retriever_env/bin/retriever
(retriever_env) jaycewang@jaycewang-UX490UA:~/retriever$ which pytest
/home/jaycewang/.local/bin/pytest
(retriever_env) jaycewang@jaycewang-UX490UA:~/retriever$ whereis python
python: /usr/bin/python3.6 /usr/bin/python /usr/bin/python2.7-config /usr/bin/python3.6m /usr/bin/python3.6m-config /usr/bin/python3.6-config /usr/bin/python2.7 /usr/lib/python3.7 /usr/lib/python3.6 /usr/lib/python2.7 /etc/python3.6 /etc/python /etc/python2.7 /usr/local/lib/python3.6 /usr/local/lib/python2.7 /usr/include/python3.6 /usr/include/python2.7_d /usr/include/python3.6m /usr/include/python2.7 /usr/share/python /home/jaycewang/retriever_env/bin/python /home/jaycewang/retriever_env/bin/python3.6 /usr/share/man/man1/python.1.gz
(retriever_env) jaycewang@jaycewang-UX490UA:~/retriever$ ls -al /usr/bin/python
lrwxrwxrwx 1 root root 24 Jan  1 14:04 /usr/bin/python -> /etc/alternatives/python
(retriever_env) jaycewang@jaycewang-UX490UA:~/retriever$ ls -al /usr/bin/python3
lrwxrwxrwx 1 root root 9 Sep 29 14:28 /usr/bin/python3 -> python3.6
(retriever_env) jaycewang@jaycewang-UX490UA:~/retriever$ which python
/home/jaycewang/retriever_env/bin/python
(retriever_env) jaycewang@jaycewang-UX490UA:~/retriever$
henry senyondo
@henrykironde
@haoranjaycewang, your Python setup has some problems. It looks like you are exporting a lot of Python things in your ~/.profile or ~/.bash_profile.
(retriever_env) jaycewang@jaycewang-UX490UA:~/retriever$ whereis python
python: /usr/bin/python3.6 /usr/bin/python /usr/bin/python2.7-config /usr/bin/python3.6m /usr/bin/python3.6m-config /usr/bin/python3.6-config /usr/bin/python2.7 /usr/lib/python3.7 /usr/lib/python3.6 /usr/lib/python2.7 /etc/python3.6 /etc/python /etc/python2.7 /usr/local/lib/python3.6 /usr/local/lib/python2.7 /usr/include/python3.6 /usr/include/python2.7_d /usr/include/python3.6m /usr/include/python2.7 /usr/share/python /home/jaycewang/retriever_env/bin/python /home/jaycewang/retriever_env/bin/python3.6 /usr/share/man/man1/python.1.gz
You should remove all the old Python versions; if you cannot, use export PYTHON=<path to the correct python>
Jayce
@haoranjaycewang
ok i will do a clean reinstall of ubuntu since i have not used ubuntu in a while and reinstall python with $ sudo apt-get update $ sudo apt-get install python3.6
would that be ok?
i am not too familiar with exporting things and how that works in linux
henry senyondo
@henrykironde
Great, so I would recommend that you learn how that works, instead of reinstalling.
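For anyone unsure what exporting changes, the core idea is that the shell searches the directories in PATH from left to right and runs the first match. The sketch below imitates that with two temporary directories standing in for /usr/bin and a virtualenv's bin directory. The fake python files and helper names are illustrative only, and the behaviour shown is for POSIX systems.

```python
import os
import shutil
import stat
import tempfile


def make_fake_command(directory, name):
    """Create a small executable file called name inside directory."""
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def first_on_path(name, directories):
    """Resolve name the way the shell does: the first matching PATH entry wins."""
    return shutil.which(name, path=os.pathsep.join(directories))


system_dir = tempfile.mkdtemp()  # stands in for /usr/bin
venv_dir = tempfile.mkdtemp()    # stands in for retriever_env/bin
system_python = make_fake_command(system_dir, "python")
venv_python = make_fake_command(venv_dir, "python")

# Putting a directory first is what `export PATH=<dir>:$PATH` achieves.
print(first_on_path("python", [venv_dir, system_dir]) == venv_python)
print(first_on_path("python", [system_dir, venv_dir]) == system_python)
```

Lines such as export PATH=... in ~/.profile or ~/.bash_profile are applied at every login, which is why stale entries there keep affecting which python and pytest get picked up.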
Mayurdeep Pathak
@pathak-mayurdeep
Hello everyone! I'm Mayurdeep and I am a 4th year student of Computer Sc. Engg. at Jorhat Engineering College. I have been using Data Retriever in my final year project and I'd love to contribute in some way (bug fixes- if any?). Any suggestions..?
henry senyondo
@henrykironde
@pathak-mayurdeep we are glad to hear that you are using the Data Retriever for your final project. I recommend that you start by looking at the issues with the tag getting started: https://github.com/weecology/retriever/issues?q=is%3Aissue+is%3Aopen+label%3Agetting-started. Let me know in case you need any help.
veb7vmehra
@veb7vmehra
Hi Everyone, I am Vaibhav Mehra. I am quite interested in data analysis and want to contribute to this organization. I will love to contribute to the project named "Add support for more raw data formats" so @henrykironde and @ethanwhite, can you please tell me about any prerequisite required.
henry senyondo
@henrykironde
Hi @veb7vmehra the team at the Data Retriever is happy to hear that you are interested in taking part on the project, Add support for more raw data formats. Please take a look at the required skill section https://github.com/weecology/retriever/wiki/GSoC-2020-Project-Ideas#degree-of-difficulty-and-needed-skills. We expect students to have contributed to the project extensively before GSoC. A good start point would be looking at issues marked getting started. We also appreciate and welcome new ideas, these can be added to the issues.
Ashish Priyadarshi
@ashishpriyadarshiCIC
Hello everyone, I'm Ashish Priyadarshi and I'm an undergraduate student at Cluster Innovation Centre ,University of Delhi pursuing a B.Tech degree in ITMI(Information Technology and Mathematical innovation). I got familiar with Data Retriever last year through the help of @ethanwhite and @henrykironde but I couldn't continue to work on it further due to some reasons. I'm interested in working on "Data Retriever: Add support for more raw data formats" . Can someone please guide me towards getting familiar with Data retriever and working on the project?
pranu2502
@pranu2502
Hello everyone, I am Pranav, a computer science undergraduate at International Institute of Information Technology, Bangalore. I would love to contribute in any way possible. Can anyone please suggest where to start?
henry senyondo
@henrykironde
Hi @ashishpriyadarshiCIC and @pranu2502, thank you for your interest in the Data Retriever project. We have the developer docs, which are a good start, especially for @pranu2502. Set up the environment and try a few commands. To get familiar with the project, I would recommend starting with adding scripts to the Data Retriever (https://github.com/weecology/retriever/issues?q=is%3Aissue+is%3Aopen+label%3A%22Dataset+Request%22) or looking at issues marked getting started.
At any point, feel free to open up an issue for improvement or a new idea that you would want to work on.
Ashish Priyadarshi
@ashishpriyadarshiCIC
Thanks, I'll start working on it right away
Ashish Priyadarshi
@ashishpriyadarshiCIC
@henrykironde I had an issue while reinstalling; it showed this error:
mysql: [ERROR] Found option without preceding group in config file /home/ashish/.my.cnf at line 1!
mysql: [ERROR] Fatal error in defaults handling. Program aborted!
This is the .my.cnf file (attached as image.png).
henry senyondo
@henrykironde
Can you try to run with the options passed on the command line: retriever install mysql iris -u travis -p pas.... --port .... --host ....
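The MySQL message "Found option without preceding group" means an option line appears in the file before any [group] header such as [client]. Since the actual .my.cnf is only visible in the screenshot, the sketch below uses placeholder contents to show the difference. Python's configparser is used as a rough stand-in for MySQL's own option-file parser, which it only approximates, and has_group_headers is a made-up helper name.

```python
import configparser


def has_group_headers(text):
    """Return True if every option in an option-file text sits under a [group] header."""
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    try:
        parser.read_string(text)
    except configparser.MissingSectionHeaderError:
        return False
    return True


# Placeholder contents, not the real file from the screenshot.
broken = "user=example_user\npassword=example_password\n"
fixed = "[client]\n" + broken

if __name__ == "__main__":
    print(has_group_headers(broken))
    print(has_group_headers(fixed))
```

If the first line of ~/.my.cnf is an option rather than a header, adding a [client] line above it is the usual fix; passing the credentials on the command line, as suggested above, sidesteps the file only if the client no longer reads it.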