These are chat archives for ushahidi/Community

20th Aug 2018
Sanjay Bhangar
@batpad
Aug 20 2018 06:26
hey @rjmackay sorry missed you last night - I just went to sleep at 4am before the messages :( - we do have the CLI importer running, though it was a bit convoluted. Posting the relevant files. I think we still have a problem importing the lat / lng values.

The mapping file ended up looking like this:

{
  "[district]": "[title]",
  "[id]": "[content]",
  "[person]": "[values][a357ed51-2b3c-45ba-b42e-cc09ff725b16][0]",
  "[location]": "[values][e5bad9b6-16c8-4ace-adf2-5c53614e0b1e][0]",
  "[contact]": "[values][241f2001-778f-4f91-a552-89486b4aaae2][0]",
  "[noofpeople]": "[values][cf18f70f-0752-497d-85b2-bd19bb016416][0]",
  "[dt_of_reporting]": "[values][90d63bc8-525e-447c-88c9-929b5c42ad63][0]",
  "[is_request_for_others]": "[values][077829cb-c636-4482-be91-76ff7f60ccc4][0]",
  "[needothers]": "[values][3487f5ec-bc75-4a0c-be82-4ebcabb823c2][0]",
  "[releifcamp]": "[values][85484420-8d22-4ea4-b919-d51699c8daef][0]",
  "[detailcloth]": "[values][28d962a9-a94c-4cc2-8b95-b0adcdbc3a54][0]",
  "[detailfood]": "[values][9e9fc3a7-25b5-483d-b189-b356af4b400a][0]",
  "[detailkit_util]": "[values][15de9b27-8bc4-4d53-adf1-3135ee545b1e][0]",
  "[detailmed]": "[values][76e14e3f-7362-488b-8925-63d90f3e6dc5][0]",
  "[detailrescue]": "[values][a350528b-f563-4303-a298-54f1b5f522b1][0]",
  "[detailtoilet]": "[values][09aad13b-902e-499f-88bf-f05ef606e0c6][0]",
  "[detailwater]": "[values][ae3b078b-15b3-4253-91de-a5ed9fe2ca96][0]"
}

Ushbot
@ushbot
Aug 20 2018 07:43

[Robbie Mackay, Ushahidi] I think for the location you might need to do something like
"[location][lat]": "[values][e5bad9b6-16c8-4ace-adf2-5c53614e0b1e][0][lat]",
"[location][lon]": "[values][e5bad9b6-16c8-4ace-adf2-5c53614e0b1e][0][lon]",

basically map the lat and lon separately

[Robbie Mackay, Ushahidi] I don't think the code can cope with pulling lat and lon from a single column but I might be wrong. That importer hasn't changed much since the initial release of v3 so I'm not 100% on the details anymore
Sanjay Bhangar
@batpad
Aug 20 2018 07:52
alright! cheers @rjmackay - we can give that a shot.
Ushbot
@ushbot
Aug 20 2018 08:06
[Robbie Mackay, Ushahidi] Cool. I'm just about done for the day but let me know how it goes
Ushbot
@ushbot
Aug 20 2018 10:02
[David Losada, Ushahidi] @batpad I’m around if any help through chat needed
Sajjad Anwar
@geohacker
Aug 20 2018 15:47
@rjmackay @tuxpiper @aoduor any of you around for an ushahidi sql question?
Ushbot
@ushbot
Aug 20 2018 15:49
[David Losada, Ushahidi] sure, go ahead
Sajjad Anwar
@geohacker
Aug 20 2018 15:50
@tuxpiper our projection for importing a ~40k-row CSV into the Ushahidi DB via the CLI is about 16 hours. we're wondering if writing insert statements directly against the MySQL database would be faster.
@tuxpiper and also, does Ushahidi v2 use the same schema? inserts are going to be slower as well, I guess.
Ushbot
@ushbot
Aug 20 2018 15:54
[David Losada, Ushahidi] it’s possible since everything is stored in the database in one way or other. However, the complexity to write the statements will depend on your survey. Any value that is not title or description is stored in its own row, in a table matching its datatype (i.e. numbers, strings and locations are all stored in different tables), and linked to the posts using a number of foreign keys that you would have to provide
[David Losada, Ushahidi] v2 and v3 have radically different database schemas
[David Losada, Ushahidi] my note was about v3
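To make David's point concrete, here is a rough Python sketch of the entity-attribute-value fan-out he describes, where each typed value of a post becomes its own row in a type-specific table. The table names and row shapes below are hypothetical illustrations, not the actual v3 schema.

```python
# Hypothetical illustration of the fan-out David describes: table names
# ("post_varchar", "post_int", "post_point") are made up, not the real schema.
def fan_out(post_id, values):
    """Yield one (table, row) pair per typed value of a post."""
    type_tables = {str: "post_varchar", int: "post_int", tuple: "post_point"}
    for attribute_id, value in values.items():
        yield type_tables[type(value)], {
            "post_id": post_id,             # foreign key back to the post
            "form_attribute_id": attribute_id,
            "value": value,
        }

rows = list(fan_out(17, {
    "contact": "+91 00000 00000",   # string  -> varchar table
    "noofpeople": 12,               # number  -> int table
    "location": (9.93, 76.26),      # lat/lon -> point table
}))
# one CSV row becomes a post row plus one row per value, hence many inserts
```

This is why hand-written SQL inserts get complicated: every non-title, non-description value needs its own insert into the right table, with the right foreign keys.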
Sajjad Anwar
@geohacker
Aug 20 2018 15:55
@tuxpiper do you have an example statement perhaps from the platform code?
Ushbot
@ushbot
Aug 20 2018 15:56
[David Losada, Ushahidi] unfortunately I don’t have that , the statements are built on the fly during execution, using an object to relational mapper
Sajjad Anwar
@geohacker
Aug 20 2018 15:56
yeah ok.
we've spent the last two days trying to figure out how to get our CSV survey data into the database, and we can't really spend another 16-20 hours waiting for this.
Ushbot
@ushbot
Aug 20 2018 15:57
[David Losada, Ushahidi] have you thought about using the REST API for pushing the data?
Sajjad Anwar
@geohacker
Aug 20 2018 15:57
we have. it's slow.
the hosted instance times out. on the self hosted one we increased the execution time and had some luck. but it takes a flat 20 mins to import 400 rows.
Ushbot
@ushbot
Aug 20 2018 15:59
[David Losada, Ushahidi] that’s surprising .. we’ve had much better results in the past . what does your survey look like? could you share that with us?
Sajjad Anwar
@geohacker
Aug 20 2018 15:59
we've tried the web upload, wrote a script to use the API, and now we're trying the CLI because that's the last thing we have.
@tuxpiper i could share a sample. but it's nothing complicated. ~20 columns.
Ushbot
@ushbot
Aug 20 2018 16:01
[David Losada, Ushahidi] sure, just to take a look and see if I spot anything. It looks like we’ve got things to learn from what you are doing.
Sajjad Anwar
@geohacker
Aug 20 2018 16:01
@tuxpiper i think we're also seeing that the MySQL schema creates a row for each column in the CSV. that's a lot of inserts, so it's time-consuming.
Ushbot
@ushbot
Aug 20 2018 16:03
[David Losada, Ushahidi] yes, I was referring to that in my comment. That’s designed to allow flexible data types while still using a vanilla relational database
Ushbot
@ushbot
Aug 20 2018 16:09
[David Losada, Ushahidi] we really want to help, it just looks like we are going to need more details in order to optimise for such a big data set. Taking a look at the survey would help
Sajjad Anwar
@geohacker
Aug 20 2018 16:11
description,contact,district,category,status,comments from requestee,date,lat,lng,situation,sid
Ushbot
@ushbot
Aug 20 2018 16:15
[David Losada, Ushahidi] yea that’s pretty short. The fact that multiple rows are created on the db shouldn’t be holding you back
Sajjad Anwar
@geohacker
Aug 20 2018 16:15
@tuxpiper ^ this is the schema
Ushbot
@ushbot
Aug 20 2018 16:18
[David Losada, Ushahidi] with the REST API approach.. any reason to suspect network latency? was your REST API client running in the same network as the server?
Sajjad Anwar
@geohacker
Aug 20 2018 16:19
pretty sure that's not affecting us
and it's really tiny csvs. nothing more than 1MB
Ushbot
@ushbot
Aug 20 2018 16:23
[David Losada, Ushahidi] sorry a bit confused there. When you mention that you were using the REST API , were you using it to import the CSV or to submit each post with a separate call?
Sajjad Anwar
@geohacker
Aug 20 2018 16:24
@tuxpiper no. we tried the web form. we weren't able to do batch post requests. so we didn't insert via single post requests.
Ushbot
@ushbot
Aug 20 2018 16:25
[David Losada, Ushahidi] ah ok I see
[David Losada, Ushahidi] and that’s why you were hitting web server timeouts
Sajjad Anwar
@geohacker
Aug 20 2018 16:25
@tuxpiper do you have an example of a post request? i think we looked at the client-side code and that wasn't really helpful.
Ushbot
@ushbot
Aug 20 2018 16:28
[David Losada, Ushahidi] yea, I can pull up an example shortly, we also have api documentation here: http://preview.ushahidi.com/platform/develop/api/index.html
Sajjad Anwar
@geohacker
Aug 20 2018 16:32
@tuxpiper you mean this endpoint? http://preview.ushahidi.com/platform/develop/api/index.html#posts-create-a-new-post-post that doesn't have a request body, so we looked up the keys from the requests the front-end makes. but it looks like we have to do that per row.
Sanjay Bhangar
@batpad
Aug 20 2018 16:33
Right, we did actually get the POST API working, but we were doing that to integrate our form -- maybe we also went down a rabbit hole assuming the CLI import would be fastest.
Ushbot
@ushbot
Aug 20 2018 16:42
[David Losada, Ushahidi] sorry, I’m back now, computer crash
[David Losada, Ushahidi] the problem you are experiencing with CSV bulk import is that it takes a long time to process 40k rows within a single web request. The web server is going to time out; such large import tasks should never happen in an HTTP request-response loop. Ideally, the web request would put a job in a queue and the work would happen in the background.
[David Losada, Ushahidi] that would also mean that you’d have to install a queue server, worker instances etc (a way more complicated setup than you have now)
[David Losada, Ushahidi] for that reason… I’m suggesting using the POST API
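The queue-based shape David sketches can be illustrated in-process. In a real deployment the queue would be a separate server and the workers separate processes, so treat this as a minimal illustration of the idea only:

```python
import queue
import threading

jobs = queue.Queue()
imported = []

def worker():
    # each worker pulls rows until it sees the None sentinel
    while True:
        row = jobs.get()
        if row is None:
            break
        imported.append(row)  # a real worker would POST the row here

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()

# the "web request" only enqueues work and returns immediately
for row in range(100):
    jobs.put(row)
for _ in threads:
    jobs.put(None)  # one sentinel per worker
for t in threads:
    t.join()
```

The web request stays fast because it only enqueues; the slow per-row work happens off the request-response path.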
Ushbot
@ushbot
Aug 20 2018 16:47
[David Losada, Ushahidi] also, the CLI that you are using now will be slower than the POST API , just because of the overhead of creating the PHP process 40k times
Ushbot
@ushbot
Aug 20 2018 16:48
[David Losada, Ushahidi] yes
[David Losada, Ushahidi] I was about to share some example payload, but @batpad mentioned you’ve got that already sorted out?
Sajjad Anwar
@geohacker
Aug 20 2018 16:48
will you be able to help with a sample request snippet? the parameters are unclear and we don't have a way to test the payload.
Ushbot
@ushbot
Aug 20 2018 16:49
[David Losada, Ushahidi] sure, I could even fine tune the example for your specific deployment, if you could share with me the URL of your deployment and I could look at the survey definition
Sajjad Anwar
@geohacker
Aug 20 2018 16:50
@tuxpiper that'll be great! we're hosted here https://volunteers.keralarescue.in/ - happy to give you admin access as well if that's easier.
@tuxpiper we also have a hosted instance, but we didn't find it much faster than this CLI import process. if you think POST could import much faster, we don't mind switching; that'll reduce a lot of devops overhead on our side.
Sanjay Bhangar
@batpad
Aug 20 2018 16:52
Hey - I don't have a working POST on the updated instance with all the field ids mapped :-) - I was doing that in a sorta horrid manual way - will be great to have a good canonical example
our code looked something like this: https://github.com/IEEEKeralaSection/rescuekerala/pull/554/files but was not the nicest - and at that point we weren't looking at the API for bulk CSV uploads.
But if that would be the best way to do it, we can definitely script our CSV uploads to just use the API to make individual POSTs.
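A per-row POST script could be sketched roughly as below. The attribute UUIDs are the ones from the mapping file earlier in this thread; FORM_ID, the column-to-field pairing, and the endpoint path are assumptions to verify against the actual survey and the API docs linked above.

```python
import json
import urllib.request

FORM_ID = 1  # hypothetical: look up the real survey id in the deployment
# UUIDs below come from the mapping file shared earlier in this thread
ATTR_CONTACT = "241f2001-778f-4f91-a552-89486b4aaae2"
ATTR_LOCATION = "e5bad9b6-16c8-4ace-adf2-5c53614e0b1e"

def build_post_payload(row):
    """Turn one CSV row (a dict) into a JSON body for POST /api/v3/posts."""
    return {
        "form": {"id": FORM_ID},
        "title": row["district"],         # per the mapping file: district -> title
        "content": row["description"],    # assumed pairing, verify per survey
        "values": {
            ATTR_CONTACT: [row["contact"]],
            # lat and lon mapped separately, per Robbie's suggestion above
            ATTR_LOCATION: [{"lat": float(row["lat"]), "lon": float(row["lng"])}],
        },
    }

def post_row(base_url, token, row):
    # untested sketch of the HTTP call itself; one request per CSV row
    req = urllib.request.Request(
        base_url + "/api/v3/posts",
        data=json.dumps(build_post_payload(row)).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer " + token},
    )
    return urllib.request.urlopen(req)
```

Testing `build_post_payload` against a couple of real CSV rows first, before looping over all 40k, would surface mapping mistakes cheaply.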
Ushbot
@ushbot
Aug 20 2018 16:54
[David Losada, Ushahidi] awesome , let me take a look at that
Sanjay Bhangar
@batpad
Aug 20 2018 16:54
@ushbot but that form is POSTing to a different instance and form
Ushbot
@ushbot
Aug 20 2018 16:58
[David Losada, Ushahidi] I see 👍 well, still it's pretty much what you would need. We may have to remap the form value IDs, but I will also look at some python code I have lying around that could prove useful for a more elegant matching.
Sajjad Anwar
@geohacker
Aug 20 2018 17:00
@ushbot yeah, that'll be great.
Ushbot
@ushbot
Aug 20 2018 17:07
[David Losada, Ushahidi] also, one trick you can consider is to shoot simultaneous POST /post API requests by either:
  1. using asyncio in python , but I think something may just explode unless you are an asyncio expert
  2. introducing a parameter so that you can bring up several processes for your CSV to API importer (let’s say 5), and have each process import different rows from the CSV file
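Option 2 above only needs a worker-id parameter and a stride, so that each process takes a disjoint slice of the CSV. A minimal sketch:

```python
def rows_for_worker(rows, worker_id, n_workers):
    """Give worker `worker_id` every n_workers-th row, starting at its offset."""
    return rows[worker_id::n_workers]

# e.g. launch 5 processes, each invoked with its own id:
#   python import.py --workers 5 --worker-id N     (hypothetical CLI)
all_rows = list(range(10))
slices = [rows_for_worker(all_rows, i, 5) for i in range(5)]
# slices are disjoint and together cover every row exactly once
```

Striding (`rows[worker_id::n_workers]`) rather than contiguous chunks also spreads any slow region of the file evenly across workers.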
Sanjay Bhangar
@batpad
Aug 20 2018 17:09
@tuxpiper yea sure, I'm sure we can figure out parallelizing it somehow. But the approach there seems sane? Also, how long should we expect an access_token to be valid for?
@tuxpiper are there any client libraries for the api .. ? in any language might be okay ..
Ushbot
@ushbot
Aug 20 2018 17:20
[David Losada, Ushahidi] @batpad the access token expires after one hour, the refresh token after 7 days
[David Losada, Ushahidi] yea the approach there looks pretty sane to me 👍 and we don't really have any higher-level client libraries for the API. We would love to see that sort of thing coming together from the community.
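Given the one-hour access-token lifetime mentioned above, a multi-hour import run will need to re-authenticate several times. One simple approach is to track the token's issue time and refresh slightly early (the 3600-second lifetime is from this thread; the refresh margin is an arbitrary choice):

```python
TOKEN_LIFETIME = 3600  # seconds; access token expires after one hour

class TokenClock:
    """Track when the current access token was issued; flag when to refresh."""
    def __init__(self, issued_at):
        self.issued_at = issued_at

    def needs_refresh(self, now, margin=60):
        # refresh a bit early so no in-flight POST carries an expired token
        return now - self.issued_at >= TOKEN_LIFETIME - margin

clock = TokenClock(issued_at=0)
assert not clock.needs_refresh(now=1800)  # half-way through: still fine
assert clock.needs_refresh(now=3550)      # inside the margin: refresh now
```

The importer would check `needs_refresh(time.time())` before each batch and re-run the auth request when it returns True.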
Ushbot
@ushbot
Aug 20 2018 17:35
[David Losada, Ushahidi] @geohacker @batpad what is the name of the survey in https://volunteers.keralarescue.in/ that you are trying to import to?
Sanjay Bhangar
@batpad
Aug 20 2018 17:35
@ushbot Rescue Survey
Ushbot
@ushbot
Aug 20 2018 17:37
[David Losada, Ushahidi] oh I see, you are just not filling in all those fields.. the CSV header you shared had far fewer fields
Sajjad Anwar
@geohacker
Aug 20 2018 17:38
@tuxpiper yeah some of them will be filled while our volunteers call and verify the requests.
Ushbot
@ushbot
Aug 20 2018 17:40
[David Losada, Ushahidi] could you possibly indicate to me how the CSV fields map to the fields in the survey?
Sajjad Anwar
@geohacker
Aug 20 2018 17:40
yes
@tuxpiper ^
Sanjay Bhangar
@batpad
Aug 20 2018 17:43
sorry - sec, i have a more complete mapping
this is all the labels and keys in the survey: https://gist.github.com/batpad/e745b2c6f1fed6fec51ea15608ea37dd
Sajjad Anwar
@geohacker
Aug 20 2018 17:44
ah nice
Sanjay Bhangar
@batpad
Aug 20 2018 17:47
cc @tuxpiper ^
Ushbot
@ushbot
Aug 20 2018 17:48
[David Losada, Ushahidi] oh awesome
Ushbot
@ushbot
Aug 20 2018 17:55
[David Losada, Ushahidi] what does the “Status” field in the CSV match to?
Sanjay Bhangar
@batpad
Aug 20 2018 17:57
sec
Ushbot
@ushbot
Aug 20 2018 17:59
[David Losada, Ushahidi] “comments from requestee” ?
Sanjay Bhangar
@batpad
Aug 20 2018 17:59
@tuxpiper what's the best way to update a record via the API? The PUT request? Is there by chance a way to update by only passing in fields that need to be changed? Or does it always need a full representation of the JSON?
David Losada
@tuxpiper
Aug 20 2018 18:00
it always needs the full document
Sanjay Bhangar
@batpad
Aug 20 2018 18:00
oh @tuxpiper my bad. sorry. some confusion with those CSV fields - lemme look closer.
Ushbot
@ushbot
Aug 20 2018 18:00
[David Losada, Ushahidi] it’s alright
[David Losada, Ushahidi] your code is really ok from what I can see , and you’ve got all the attribute IDs at hand
[David Losada, Ushahidi] I think all you need is to update those attribute IDs in the code and see how it works with a couple rows from the CSV
Sanjay Bhangar
@batpad
Aug 20 2018 18:01
@tuxpiper yea, this is json that maps the field ids to the CSV columns: https://gist.github.com/batpad/58ce4640634f66508238ae5d9a8e87b4
yup - we're working on a python script to read the CSV and POST ..
I think it's close to having something to test
Ushbot
@ushbot
Aug 20 2018 18:02
[David Losada, Ushahidi] my only other doubt is the “categories” field.. what does that map to? actual categories in your v3 deployment?
Sanjay Bhangar
@batpad
Aug 20 2018 18:04
Oh, interesting - I don't know if that's actual "categories" - we've just mapped it to a field called Category (Internal) -- I had assumed that was just a regular text field ..
Ushbot
@ushbot
Aug 20 2018 18:09
[David Losada, Ushahidi] if actual categories, we would need to check out how those are referred to in the API . I think it would be by their numeric ID in the database
Sanjay Bhangar
@batpad
Aug 20 2018 18:09
right - I think this is not meant to be actual categories
Ushbot
@ushbot
Aug 20 2018 18:58
[David Losada, Ushahidi] another thing that I just remembered: there is a rate limiter built into the API
[David Losada, Ushahidi] I’ll find the way to disable it for you
[David Losada, Ushahidi] in application/config/ratelimiter.php , you would set “cache” to “”
Sanjay Bhangar
@batpad
Aug 20 2018 19:13
oh! interesting ..
ty - will pass it on and see if that helps :) - I'm crashing out, but will tell folks to ask here if they run into anything - thank you so much for all your help @tuxpiper .
Ushbot
@ushbot
Aug 20 2018 19:33
[David Losada, Ushahidi] glad that we could help! I’m also calling it a day soon but we’ll have others watching this channel in case anything comes up