These are chat archives for biojs/biojs

8th
Apr 2018
Sarthak Sehgal
@sarthak-sehgal
Apr 08 2018 05:34
Hello @all,
Megh [backend contributor] and I are planning to shift the prototype to the NeCTAR server to test the speeds of data retrieval and rendering (a rough latency probe is sketched below). As discussed in the community call, we would further compare Node.js-MongoDB vs. Pyramid-PostgreSQL, if time permits. Currently, the prototype is hosted on a server in Bangalore, India (1 GB RAM), and the data retrieval speeds are blazingly fast. We hope to get the same (or even better!) results on the NeCTAR server in Australia (32 GB RAM).
You can view the current prototype here.
We will be starting off soon. Your suggestions and feedback are welcome. :smiley:
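[A rough latency probe for that comparison could look like this sketch. Assumptions: Node 18+ for the global fetch; both base URLs and the /components endpoint are hypothetical placeholders, not the real hosts.]

```js
// Time N sequential GETs against a candidate server and report the average.
// The /components endpoint is a hypothetical stand-in for the registry API.
async function probe(baseUrl, runs = 10) {
  const times = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    await fetch(`${baseUrl}/components`);
    times.push(Date.now() - start);
  }
  const avg = times.reduce((a, b) => a + b, 0) / times.length;
  console.log(`${baseUrl}: avg ${avg.toFixed(1)} ms over ${runs} runs`);
}

(async () => {
  await probe('http://bangalore-host.example'); // current server (placeholder URL)
  await probe('http://nectar-host.example');    // NeCTAR server (placeholder URL)
})();
```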
Abhinavvisen
@abhinavvisen
Apr 08 2018 05:43
Hi @sarthak-sehgal, I am thinking that instead of storing the data on one fixed server, we could store copies of it on various remote servers. Otherwise, a person accessing the data from the US might have problems with access speed.
Distributing it would increase the data access speed for users.
Sarthak Sehgal
@sarthak-sehgal
Apr 08 2018 05:45
Hey @abhinavvisen,
I think BioJS has only a single server, in Australia.
Abhinavvisen
@abhinavvisen
Apr 08 2018 05:49
Can we create multiple copies of the data and put them on different servers around the world?
Sarthak Sehgal
@sarthak-sehgal
Apr 08 2018 05:49
I think the fundamental problem with that approach is that we don't have servers around the world.
Abhinavvisen
@abhinavvisen
Apr 08 2018 05:50
You are right @sarthak-sehgal
Sarthak Sehgal
@sarthak-sehgal
Apr 08 2018 05:50
As far as I remember, the NeCTAR services are funded by the Australian government. Buying servers around the world is going to be very costly.
Abhinavvisen
@abhinavvisen
Apr 08 2018 05:51
Anyhow, if a user wants to access the data from the US, it might still be slow, right?
Björn Grüning
@bgruening
Apr 08 2018 05:52
I can offer a mirror server in Europe, also government-funded, if this is needed.
Abhinavvisen
@abhinavvisen
Apr 08 2018 05:52
That is excellent
Björn Grüning
@bgruening
Apr 08 2018 05:53
and I have some connections in the USA
Abhinavvisen
@abhinavvisen
Apr 08 2018 05:53
What we need is faster component access for the user.
Björn Grüning
@bgruening
Apr 08 2018 05:53
but I guess these problems can be fixed if one server is up and running and we see problems, right?
Sarthak Sehgal
@sarthak-sehgal
Apr 08 2018 05:54
@abhinavvisen
Not necessarily. Servers these days provide fast data interactions. The NeCTAR server does have potential; maybe it needs a bit of tweaking to set things up right. It's a 32 GB server, so it should work fine. That said, @rowlandm told me that it's usually not that fast, which is why we want to test the speeds.
Abhinavvisen
@abhinavvisen
Apr 08 2018 05:55
What I am suggesting is that we put the main data, like the component download data, on the main server, while keeping the extra information about individual components on the smaller servers.
Sarthak Sehgal
@sarthak-sehgal
Apr 08 2018 05:56
Also, the amount of data on the server is pretty small and is mostly plain text. Most of the component information is retrieved with a GitHub API call, which depends mostly on GitHub's servers and the user's internet connection.
Abhinavvisen
@abhinavvisen
Apr 08 2018 05:56
I talked to @yochannah the other day regarding speed. She also said that a person sitting in Europe might face difficulty accessing the data.
@sarthak-sehgal Did you create the prototype under the biojs organisation, or is it in your personal repo?
Sarthak Sehgal
@sarthak-sehgal
Apr 08 2018 06:00
@abhinavvisen It's a repository on my GitHub account.
You can view it here: https://github.com/sarthak-sehgal/registry-biojs
rowlandm
@rowlandm
Apr 08 2018 06:02
We already use Cloudflare. That will speed things up, but we should engineer our base to be fast regardless.
Abhinavvisen
@abhinavvisen
Apr 08 2018 06:02
Thanks @sarthak-sehgal
Sarthak Sehgal
@sarthak-sehgal
Apr 08 2018 06:14
I agree with @rowlandm. I've tried my best to minimize the delay in data rendering.
Hopefully the speeds will be maintained with the NeCTAR server.
rowlandm
@rowlandm
Apr 08 2018 06:25
There are ways to minimise the time it takes to retrieve the data. But to be honest, the data is tiny. As long as we "slice" the data appropriately, we will be fine.
From what people have explained to me, we are not slicing the data at all in the current BioJS website.
Sarthak Sehgal
@sarthak-sehgal
Apr 08 2018 06:27
@rowlandm, regarding that, I think we had discussed displaying only the top 10 components on the "Components" page and the top 3 components for each category (downloads, stars, and last modified), with a search bar for all others. I have implemented this in the prototype.
Also, as I mentioned in the proposal, the search itself would slice the data: the results would show only the names and tags of the components. This lets the user get an exact match for the search query and get results faster (data rendering is also minimized). Once the user clicks a component, a GitHub API call and a call for the visualization would let them see the whole data.
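[If the Node.js-MongoDB option wins the comparison, that kind of slicing could be a simple projection. A minimal sketch, assuming the official mongodb driver and a hypothetical components collection whose documents carry name and tags fields:]

```js
// Search sketch: return only each component's name and tags.
// Assumptions: mongodb driver installed; database/collection names are made up.
const { MongoClient } = require('mongodb');

async function searchComponents(query) {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  try {
    return await client
      .db('registry')               // hypothetical database name
      .collection('components')     // hypothetical collection name
      .find(
        {
          $or: [
            { name: { $regex: query, $options: 'i' } }, // match on name
            { tags: query },                            // or an exact tag
          ],
        },
        { projection: { _id: 0, name: 1, tags: 1 } }    // the "slice"
      )
      .limit(20)
      .toArray();
  } finally {
    await client.close();
  }
}
```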
Abhinavvisen
@abhinavvisen
Apr 08 2018 06:31
@sarthak-sehgal we could also use metadata: the user could describe the component, and the query would return the components matching that description.
That would help in case the user doesn't recognise or know the tags, since we have to keep the website usable for all kinds of users.
@sarthak-sehgal your idea sounds good too. Maybe we can combine both these features in our search bar.
Sarthak Sehgal
@sarthak-sehgal
Apr 08 2018 06:34
@abhinavvisen, yes, I agree.
Though I think the target users are mostly either beginners in bioinformatics or bioinformaticians who generally already know about the components.
Yes, we can combine it. Would be great. :smiley:
Abhinavvisen
@abhinavvisen
Apr 08 2018 06:38
@sarthak-sehgal we can also use AMP for our components page.
rowlandm
@rowlandm
Apr 08 2018 06:59
@sarthak-sehgal - yeah - that's right - sorry I missed your previous top 10 components comment
Alkesh Srivastava
@alkesh47
Apr 08 2018 08:29
I agree with @rowlandm. As pointed out already, the current amount of data is not that big, and given the current growth rate of newly added components, I too feel it would be a safe bet to prune the data coming from the backend for a better response time.
Rohit Gupta
@r0hit-gupta
Apr 08 2018 09:16
@all using a CDN like Cloudflare should solve the problem of latency.
This API sends almost exactly the data that is required.
We can slice it even further for the main components page.
Sarthak Sehgal
@sarthak-sehgal
Apr 08 2018 09:19
@r0hit-gupta, yes. As pointed out by @rowlandm, BioJS already uses Cloudflare.
Rohit Gupta
@r0hit-gupta
Apr 08 2018 09:22
We can set up multiple API endpoints for different slices of data, but the simplicity of the system would start diminishing then.
Sarthak Sehgal
@sarthak-sehgal
Apr 08 2018 09:24
@r0hit-gupta, do you plan to call the GitHub API for the list of contributors? It is not in the link you sent.
Rohit Gupta
@r0hit-gupta
Apr 08 2018 09:26
Yes, I need to work on that. This would be done on the server side only, at a periodic interval.
Sarthak Sehgal
@sarthak-sehgal
Apr 08 2018 09:28
If we use a GitHub API call for the contributors (which I plan to), I think we can retrieve a lot more data with a single call, like the number of stars, downloads, contributors, etc.
This will help reduce the data we get from the backend (data slicing): we can retrieve the same data from a single API call that we would have to make anyway.
rowlandm
@rowlandm
Apr 08 2018 09:32
It would be easier to call the API and store it in a central database, then do the calls from there for most of the data we need.
For a component-only focus, it might be OK to use a straight GitHub API call, provided that the API doesn't need authentication...
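[A sketch of that store-it-centrally pattern, assuming Node 18+ for the global fetch and biojs/biojs as a stand-in repository. One unauthenticated repos call already yields stars, forks, and issue counts, though contributors and download counts need separate calls:]

```js
// Fetch a repo's metadata once on the server, keep only the fields the
// registry needs, and hand that slice to the central database.
async function fetchRepoSlice(repo) {
  const res = await fetch(`https://api.github.com/repos/${repo}`, {
    headers: { Accept: 'application/vnd.github+json' },
  });
  if (!res.ok) throw new Error(`GitHub API returned ${res.status}`);
  const data = await res.json();
  return {
    name: data.full_name,
    stars: data.stargazers_count,
    forks: data.forks_count,
    openIssues: data.open_issues_count,
    lastPush: data.pushed_at,
  };
}

fetchRepoSlice('biojs/biojs').then((slice) => {
  // In the real setup this slice would be upserted into the central database.
  console.log(slice);
});
```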
Abhinavvisen
@abhinavvisen
Apr 08 2018 09:34
@sarthak-sehgal you are right. We can get multiple pieces of data from a single API call.
Sarthak Sehgal
@sarthak-sehgal
Apr 08 2018 09:35
@rowlandm Yes, I agree. Megh and I had planned exactly this. The prototype gets the data for the top 10 components and the top 3 components per category from the backend, but for a particular component, a call to the GitHub API is made.
rowlandm
@rowlandm
Apr 08 2018 09:36
:)
Abhinavvisen
@abhinavvisen
Apr 08 2018 09:36
In fact, is there any way we can link a single API call to multiple API calls?
It would make structuring the data simpler.
What do you suggest, @sarthak-sehgal @rowlandm @r0hit-gupta?
rowlandm
@rowlandm
Apr 08 2018 09:39
Not sure what you mean there.... @abhinavvisen
Sarthak Sehgal
@sarthak-sehgal
Apr 08 2018 09:40
@abhinavvisen, I didn't get the idea...can you elaborate?
Abhinavvisen
@abhinavvisen
Apr 08 2018 09:43
Like, if we make the API request for a particular component's contributors, shouldn't there be a way for the related API calls (say, for downloads, stars, etc.) to be triggered simultaneously without calling them explicitly?
Abhinavvisen
@abhinavvisen
Apr 08 2018 10:02
Or, in simple words, an API call that triggers multiple actions.
Alkesh Srivastava
@alkesh47
Apr 08 2018 10:11
If we are storing the data from API calls in the database, we would also need to regularly update the stored info, say the number of stars.
Rohit Gupta
@r0hit-gupta
Apr 08 2018 10:42
@sarthak-sehgal @abhinavvisen there is no single API provided by GitHub to fetch all the information at once.
GitHub APIs are rate limited. Calling the API from the client side would exhaust the quota in seconds.
For unauthenticated requests, the rate limit allows up to 60 requests per hour. Unauthenticated requests are associated with the originating IP address, not the user making the requests.
If we opt for authenticated requests, we risk exposing our OAuth tokens to the public.
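[For reference, the current quota can be inspected via GitHub's rate-limit endpoint, which does not itself count against the limit. A minimal check, assuming Node 18+ for the global fetch:]

```js
// Print the unauthenticated quota for this IP address.
async function checkRateLimit() {
  const res = await fetch('https://api.github.com/rate_limit');
  const { rate } = await res.json();
  console.log(`limit: ${rate.limit}, remaining: ${rate.remaining}`);
  console.log(`resets: ${new Date(rate.reset * 1000).toISOString()}`);
}

checkRateLimit();
```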
Rohit Gupta
@r0hit-gupta
Apr 08 2018 10:50
@abhinavvisen the prototype version of the backend works in a manner similar to what you are suggesting: multiple actions are triggered by a single command. Check out https://github.com/r0hit-gupta/workman
Sarthak Sehgal
@sarthak-sehgal
Apr 08 2018 11:15
@alkesh47, yes, if that's the case, a cron job will be implemented.
Alkesh Srivastava
@alkesh47
Apr 08 2018 11:18
@r0hit-gupta I guess we'll need some kind of caching mechanism to tackle that bit too
Rohit Gupta
@r0hit-gupta
Apr 08 2018 11:19
@alkesh47 a cron job is in place that updates all the packages at a given interval.
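[Such an updater could be as small as this sketch, assuming the node-cron package and a hypothetical updateAllPackages() that re-fetches and upserts every component:]

```js
const cron = require('node-cron');

// Hypothetical refresh routine: re-fetch metadata for every registered
// component and upsert it into the database.
async function updateAllPackages() {
  /* fetch + upsert logic goes here */
}

// Run at minute 0 of every hour (standard cron syntax).
cron.schedule('0 * * * *', () => {
  updateAllPackages().catch((err) => console.error('update failed:', err));
});
```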
Alkesh Srivastava
@alkesh47
Apr 08 2018 11:20
But does it update the database too?
Rohit Gupta
@r0hit-gupta
Apr 08 2018 11:20
Yes. It does.
Alkesh Srivastava
@alkesh47
Apr 08 2018 11:21
Well, that solves the updating part.
@r0hit-gupta What about caching GitHub API calls from the client side, that is, if we choose to do that?
Rohit Gupta
@r0hit-gupta
Apr 08 2018 11:27

@alkesh47 we can choose to do that, but it can turn out to be really problematic for us.
A component page would need to make around 3 requests to the GitHub API to fetch all the required information.
Since the unauthenticated limit is 60 requests per hour, this means a unique user could view only 20 packages an hour.
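[One mitigation sometimes used here is conditional requests: GitHub does not count a request against the quota when it answers 304 Not Modified, so a client-side cache keyed by ETag can stretch the limit for repeat views. A minimal sketch, assuming a browser context and an in-memory cache:]

```js
// ETag-based cache: repeat views of the same component cost no quota
// as long as the data hasn't changed on GitHub's side.
const etagCache = new Map(); // url -> { etag, body }

async function cachedGitHubGet(url) {
  const hit = etagCache.get(url);
  const headers = hit ? { 'If-None-Match': hit.etag } : {};
  const res = await fetch(url, { headers });
  if (res.status === 304) return hit.body; // 304s don't count against the limit
  const body = await res.json();
  const etag = res.headers.get('ETag');
  if (etag) etagCache.set(url, { etag, body });
  return body;
}
```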

Alkesh Srivastava
@alkesh47
Apr 08 2018 11:30
Well, that is the problem. I'm also not so sure how to fetch information for such a wide range and long list of packages via GitHub's API calls.
@r0hit-gupta
Rohit Gupta
@r0hit-gupta
Apr 08 2018 11:32
@alkesh47 I know. In my opinion, performing all these tasks on the backend would make things easier for us.
Alkesh Srivastava
@alkesh47
Apr 08 2018 11:36
@r0hit-gupta I also think that the sole purpose of the front end should be to render the response, leaving as much of the complex retrieval and storage functionality as possible to the backend. I guess everyone would agree on that.
Abhinavvisen
@abhinavvisen
Apr 08 2018 22:31
@r0hit-gupta OK
@alkesh47 that is correct
Abhinavvisen
@abhinavvisen
Apr 08 2018 23:24
Yes @alkesh47, dividing the tasks makes the construction easier. It is better to leave the complex functions to the backend.