Connor Peet
@connor4312
They might link to it as an external resource for discoverability, but imo it's better for everyone to keep it separate
(as an outsider)
Now, I really should get off Gitter/Slack/Twitter/etc and get back to studying
Qasim Iqbal
@qasim
@connor4312 I agree to an extent. the one thing that doesn't happen in this scenario is being "authentic", kind of like http://api.uwaterloo.ca
and likewise. :cry:
Connor Peet
@connor4312
Eh, no reason they can't add a cname to the API
Except of course for liability (seeing it as an endorsement)
Zach Munro-Cape
@munrocape
we can build it and add a CNAME if they want to give us a subdomain haha
Hanchen
@g3wanghc
Should we scrape http://thevarsity.ca/ for News?
Hanchen
@g3wanghc
It looks pretty clean. They have news from Sunday, January 1, 2006 at the earliest.
Connor Peet
@connor4312
Varsity has an RSS feed, so you can just plug that into lxml https://thevarsity.ca/feed/
Would be a neat thing
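For anyone trying that, here's a rough sketch of what parsing the feed could look like. The chat suggests lxml, but the stdlib's xml.etree.ElementTree handles RSS 2.0 just as well for this shape; the sample payload below is a made-up stand-in for a live fetch of https://thevarsity.ca/feed/:

```python
# Sketch: parsing an RSS 2.0 feed like The Varsity's using only the stdlib.
# SAMPLE_RSS is a hypothetical stand-in for the body of an HTTP GET of the feed.
import xml.etree.ElementTree as ET

SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>The Varsity</title>
    <item>
      <title>Example headline</title>
      <link>https://thevarsity.ca/example</link>
      <pubDate>Sun, 01 Jan 2006 00:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""

def parse_feed(xml_text):
    """Return a list of {title, link, date} dicts from an RSS 2.0 payload."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "date": item.findtext("pubDate"),
        })
    return items

articles = parse_feed(SAMPLE_RSS)
```

The same `parse_feed` works on the real feed body once you fetch it; lxml's `etree` exposes a near-identical API if you need its extra speed or leniency.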
Qasim Iqbal
@qasim
CobaltBot you beautiful thing, it made the 2016-2017 tag w/ the new UTSG timetable!!! https://github.com/cobalt-uoft/datasets/releases
(With that, Cobalt now supports the 2016-2017 school year's courses, for UTSG at least. Now waiting for the other campuses to update.)
Qasim Iqbal
@qasim
Would love some input here on how to go about accomplishing a nice getting started guide cobalt-uoft/documentation#26
Connor Peet
@connor4312
I'd have it in multiple languages. Probably Python and Node. Most uni students will learn Python, and everyone knows at least a bit of JavaScript.
Paul Xu
@paulxuca
Thoughts about a wrapper for node?
Qasim Iqbal
@qasim
@paulxuca a good idea :) if you decide to make one, I can add a link to it in the getting started docs
Kashav Madan
@kshvmdn
@paulxuca waterloo's api has a few node clients too, might be of some use
here's another, though I don't think it's maintained anymore: https://github.com/rvaiya/node_uwapi
Qasim Iqbal
@qasim
@paulxuca this is wonderful 💕
Paul Xu
@paulxuca
<3 <3 <3
Daniel
@profoundhub
hey @paulxuca
Kashav Madan
@kshvmdn
http://frank-k.github.io/course-finder/ looks like it's using a Cobalt dataset :raised_hands:
Qasim Iqbal
@qasim
That's awesome 🔥
Chris Jin
@erjin
Hey guys, I'm here to say thanks to all who contributed to this open-source API. I just built an app completely on top of it. Hope I can contribute to cobalt-uoft one day.
Qasim Iqbal
@qasim
@erjin hey Chris ☺️ thanks for that, we all work very hard to bring easy access to UofT data for students and I'm really glad that you are benefiting from it. Let me know if you need ideas for how to contribute!
Chris Jin
@erjin
@qasim Yes, I definitely need your help. I found that labs and printers only had info for those in BA. How about SF and GB? Is there any way to get data from https://ssl.ecf.utoronto.ca/ecf/services/labstatus/? By the way, the app I built is called UTall; it basically includes everything in this API. Check it out if you're interested (both Android and iOS). Thank you!!!
Daniel
@profoundhub
@qasim nice App
Alex Adusei
@alexadusei
Hello there! Is this channel still alive?
Kashav Madan
@kshvmdn

@alexadusei heyheyhey, it is now! :wink:

what's up!

Alex Adusei
@alexadusei
Hey @kshvmdn. Glad to hear it :smile: I’ve been following this project for a little over a year now. I go to Queen’s University and have been thinking of starting an open API of our own.
Guess I just want to know how you guys went about starting this project. Any specific requirements/reasons behind certain decisions? Do you guys follow the same process as the Waterloo API contributors? Any resources I should look into for this that you found helpful when starting out?
Kashav Madan
@kshvmdn

Awesome :fire: – glad you wanna get Queen's involved!

@qasim might be able to provide more insight on how the project was started, but the first step is to definitely get some source of data. Not sure how much you know about Cobalt, so I'll explain it really briefly (this diagram might be able to explain the service layout in more detail). We have a set of scrapers (cobalt-uoft/uoft-scrapers) that periodically commit new data into cobalt-uoft/datasets. The API (cobalt-uoft/cobalt) pulls from this repository and dumps everything into the Mongo database that it queries and serves data from.

You definitely wanna start by building a scraper or two (this'll also help you determine what type of data is publicly and readily available). Not sure how experienced you are with building scrapers, so feel free to reach out if you need some help getting started! Some Laurier students did something similar actually; they forked Cobalt a few months back to start their own API – here's the fork: https://github.com/andrewmcewen/carbon, and their data source: https://github.com/andrewmcewen/datasets (don't think their scrapers are open source, but you definitely don't have to conform to the structure of uoft-scrapers if you don't want to). A friend from Mac also tried to do the same, but got shutdown by the school really quickly... ¯\_(ツ)_/¯

Also, be sure to add Queen's to CampusData rankings when you get something up and running! (join the mailing list too – seems to be kind of dead, but I'm sure they'd love to help you get started as well!)
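
A minimal sketch of what a first scraper along these lines could look like. The HTML below is a made-up stand-in for a buildings listing (the real markup on a page like queensu.ca/campusmap/overall will differ), and it uses only the stdlib's html.parser rather than any particular scraping library:

```python
# Hedged sketch of a first scraper: pull building names out of a listing page.
# SAMPLE_HTML is hypothetical; a real scraper would fetch the page over HTTP
# and adapt the tag/class matching to the site's actual markup.
from html.parser import HTMLParser

SAMPLE_HTML = """
<ul class="buildings">
  <li><a href="/map?b=1">Douglas Library</a></li>
  <li><a href="/map?b=2">Grant Hall</a></li>
</ul>
"""

class BuildingParser(HTMLParser):
    """Collect the text of every <a> inside a <ul class="buildings"> list."""
    def __init__(self):
        super().__init__()
        self.in_list = False
        self.in_link = False
        self.buildings = []

    def handle_starttag(self, tag, attrs):
        if tag == "ul" and ("class", "buildings") in attrs:
            self.in_list = True
        elif tag == "a" and self.in_list:
            self.in_link = True

    def handle_endtag(self, tag):
        if tag == "ul":
            self.in_list = False
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.buildings.append(data.strip())

parser = BuildingParser()
parser.feed(SAMPLE_HTML)
```

From here the natural next step (as uoft-scrapers does) is to serialize `parser.buildings` into per-building JSON files that a datasets repo can track.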

Alex Adusei
@alexadusei

Thanks for that, extremely helpful! I’ve been scouring through this chat to get a bit of info about how it got started actually (that diagram was helpful as well).

Definitely seems like I’ll start with building a couple of scrapers (haven’t delved into that, but I did a bit of research and it doesn’t seem overly difficult to implement, unless there are pain points you guys ran into that you want to share), and then work towards storage/automation with cron jobs. Is the API data input into Mongo also run periodically?

Also, prior to even scraping, how did you guys strategize where to scrape? Queen’s is pretty poor with data representation (a full list of our courses is one single PDF). Was thinking about somehow scraping our university registrar, SOLUS, for getting all course information (though you need to authenticate with student ID here). It’s different per school, but any thoughts on how you guys started off? Courses first, buildings next, etc?
Kashav Madan
@kshvmdn

Is the API data input into Mongo also run periodically?

Yeah! The API server runs on a job on the hour to determine if it should re-sync, see cobalt-uoft/cobalt/src/db/index.js for implementation details (it compares the current commit hash against that of the latest commit in the datasets repo).

how did you guys strategize where to scrape?

Usually if we found a source providing data we were interested in, we'd try building something around it. We managed everything using GitHub issues, so you can probably go through a few of those (in uoft-scrapers) to get an idea of our process. I guess the answer is that we just looked around. As you build scrapers, you'll start to realize what works in a scraper and what doesn't (for example, sites with complicated frontends that require lots of user interaction are typically harder to scrape, as you might expect).

Queen’s is pretty poor with data representation (a full list of our courses is one single PDF). Was thinking about somehow scraping our university registrar, SOLUS, for getting all course information (though you need to authenticate with student ID here).

It's not impossible, but definitely tougher to deal with scraping when authentication is in the way. We got pretty lucky in that a lot of the data we wanted to support was publicly available. I highly suggest confirming that you're allowed to run automated tools or scrapers on any service behind a wall, I know a lot of UofT services have stricter policies when authentication is involved.

It’s different per school, but any thoughts on how you guys started off? Courses first, buildings next, etc?

That's actually exactly how we started off! I'd say start with whatever seems easy to scrape, and then when you're up & running and have some interest from other developers, it becomes a lot easier to expand. The list of buildings on this page: http://www.queensu.ca/campusmap/overall seems to be scrape-able – might be a good place to start! We used to use the categories on the CampusData rankings page for inspiration!
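
The re-sync check described above actually lives in JavaScript (cobalt-uoft/cobalt/src/db/index.js); here's a Python sketch of the same hash-comparison idea, with the fetching and re-import steps stubbed out as hypothetical callables:

```python
# Sketch of the hourly re-sync job: compare the commit hash we last imported
# against the latest commit of the datasets repo, and re-import only on a
# mismatch. fetch_latest_hash and resync are hypothetical stand-ins for the
# real git lookup and Mongo re-import.

def should_resync(synced_hash, latest_hash):
    """Re-sync only when the datasets repo has moved past what we imported."""
    return synced_hash != latest_hash

def run_hourly_job(synced_hash, fetch_latest_hash, resync):
    """One tick of the hourly job: fetch the remote hash, compare, maybe re-import."""
    latest = fetch_latest_hash()
    if should_resync(synced_hash, latest):
        resync(latest)
        return latest      # the new locally synced hash
    return synced_hash     # up to date, nothing to do

# Example tick with stubbed-out I/O:
new_hash = run_hourly_job(
    "abc123",
    fetch_latest_hash=lambda: "def456",
    resync=lambda h: None,
)
```

The nice property of keying the sync off a commit hash is that the job is idempotent: running it twice in a row does nothing the second time.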

Alex Adusei
@alexadusei

Yeah! The API server runs on a job on the hour to determine if it should re-sync, see cobalt-uoft/cobalt/src/db/index.js for implementation details (it compares the current commit hash against that of the latest commit in the datasets repo).

Makes sense, will definitely take a look!

Usually if we found a source providing data we were interested in, we'd try building something around it. We managed everything using GitHub issues, so you can probably go through a few of those (in uoft-scrapers) to get an idea of our process. I guess the answer is that we just looked around. As you build scrapers, you'll start to realize what works in a scraper and what doesn't (for example, sites with complicated frontends that require lots of user interaction are typically harder to scrape, as you might expect).

That’s a good approach; great to see the open PRs and the schemas you guys tried out.

It's not impossible, but definitely tougher to deal with scraping when authentication is in the way. We got pretty lucky in that a lot of the data we wanted to support was publicly available. I highly suggest confirming that you're allowed to run automated tools or scrapers on any service behind a wall, I know a lot of UofT services have stricter policies when authentication is involved.

I’ll have to compliment you guys on how readily available your school data is :) Very good situation. Turns out we’ll probably have to authenticate over our antiquated system, but I’ll probably delve into that after starting with those other ones you recommended.

That's actually exactly how we started off! I'd say start with whatever seems easy to scrape, and then when you're up & running and have some interest from other developers, it becomes a lot easier to expand. The list of buildings on this page: http://www.queensu.ca/campusmap/overall seems to be scrape-able – might be a good place to start! We used to use the categories on the CampusData rankings page for inspiration!

Great resources with campus data (we’ll work towards those listings as well)

Also apologies about the late reply. All great tips for getting started though, thanks @kshvmdn! Do you have any idea how the Waterloo team got started or the approach they use for their own? I remember them being mentioned in this chat a couple times, wonder if there’s any inspiration gained from their approach
Alex Adusei
@alexadusei
Hello again :smile: Been off the map for this one a bit but huge progress made! Got about 4 data sources so far (buildings, news, textbooks, courses, and starting exams next). Just wanted to close the loop and thank you for the help, @kshvmdn!
I guess my last question is how you guys actually use your Python library uoft-scrapers to scrape periodically, and where those decisions came from? E.g. how often to scrape courses vs. news vs. textbooks?
Daniel
@profoundhub
Hey, how's it going? I took a break from chatting to work on other projects! I'm back now and maybe I'll be more active here again, what's new with you?
Alex Adusei
@alexadusei
Hello again! Went off the map a bit myself. Glad to see some small activity here. Our API is about done and we’re scaling up the team/ops now!
Brian Z Huang
@321iakez
hey! is this chat still active?
I was thinking the API could use a professors functionality for data on faculty members
would love to know if you guys are still interested in contributions