Ed Summers
@edsu
you should still be able to go back and grab tweets within the ~1 week window that the api supports
twarc.py --search '#ferguson' > ferguson.json
you just can't call it repeatedly and have it remember where it left off
i typically do this
twarc.py --stream "#ferguson" > stream.json
to get the current stream
Renato Gabriele
@remagio
Right
Ed Summers
@edsu
and then this to get the old ones
twarc.py --search "#ferguson" > search.json
and then you've got all you can get
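A minimal sketch of combining the two files afterwards, assuming both are line-oriented JSON (one tweet per line) and that each tweet carries a numeric `id` field. The `merge_tweets` helper is hypothetical and not part of twarc; lines without an `id` (such as stream notices) are skipped.

```python
import json


def merge_tweets(paths):
    """Merge line-oriented JSON tweet files, keeping the first copy of each id.

    Hypothetical helper, not part of twarc. Assumes one JSON object per line.
    """
    seen = set()
    merged = []
    for path in paths:
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines
                tweet = json.loads(line)
                tweet_id = tweet.get("id")
                if tweet_id is None or tweet_id in seen:
                    continue  # not a tweet, or already collected
                seen.add(tweet_id)
                merged.append(tweet)
    return merged
```

For example, `merge_tweets(["stream.json", "search.json"])` would return the tweets from both runs with overlaps removed.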
Renato Gabriele
@remagio
I see
Ed Summers
@edsu
i'm sorry that the behavior has changed, but it has the v0.x.x version number for a reason :-)
Renato Gabriele
@remagio
hahaha,
Do you manage multiple Twitter API tokens to run --stream concurrently?
Ed Summers
@edsu
it should be possible to write a little script that uses twarc as a library and does the logic of naming the files, and looking at the last id
no i typically use one filter stream to get multiple things
i guess you can do that though
Renato Gabriele
@remagio
Not always but it is
Ed Summers
@edsu
I started working on a script that would name the files, and do the id logic here https://github.com/edsu/twarc/blob/search_archive/utils/search_archive.py
if it seems like something worth finishing i can do that
Renato Gabriele
@remagio
Your idea is to archive files extracting tweets with that from saved json, right?
Ed Summers
@edsu
I'm sorry I do not understand.
Renato Gabriele
@remagio
Checking search_archive.py I realize it works on files saved by twarc
am I missing something?
Ed Summers
@edsu
search_archive.py doesn't work yet
the idea would be you could run it with a query and a directory where the twitter result files would live
search_archive.py --search ferguson --archive_dir=/mnt/tweets/ferguson
Renato Gabriele
@remagio
my fault, haha
Ed Summers
@edsu
the first time you run it, it would get as many tweets as it can and write them to /mnt/tweets/ferguson/tweets-0001.json
Renato Gabriele
@remagio
mismatch between our discussion and the source ;-)
Ed Summers
@edsu
the next time you run it, it will look at the first tweet in tweets-0001.json and use that twitter id as the minimum id to archive, and write the tweets as /mnt/tweets/ferguson/tweets-0002.json
that's what i was thinking anyway
i didn't like the way the old twarc named files
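A rough sketch of the naming and id logic described above, not the actual search_archive.py (which did not work yet at this point). It assumes the archive files are line-oriented JSON with the newest tweet first, and that tweets carry a numeric `id`. The function names are hypothetical, and the search call itself is left out.

```python
import json
import re
from pathlib import Path

# Matches tweets-0001.json, tweets-0002.json, ...
ARCHIVE_RE = re.compile(r"^tweets-(\d{4,})\.json$")


def archive_files(archive_dir):
    """Return existing archive files, oldest sequence number first."""
    directory = Path(archive_dir)
    if not directory.is_dir():
        return []
    files = [p for p in directory.iterdir() if ARCHIVE_RE.match(p.name)]
    return sorted(files, key=lambda p: int(ARCHIVE_RE.match(p.name).group(1)))


def next_archive_path(archive_dir):
    """Path of the next file to write, e.g. tweets-0002.json."""
    files = archive_files(archive_dir)
    last = int(ARCHIVE_RE.match(files[-1].name).group(1)) if files else 0
    return Path(archive_dir) / f"tweets-{last + 1:04d}.json"


def last_archived_id(archive_dir):
    """Id of the first tweet in the newest file, or None on the first run.

    Assumes results were written newest first, so the first line holds
    the highest id seen so far.
    """
    files = archive_files(archive_dir)
    if not files:
        return None
    with files[-1].open() as fh:
        first_line = fh.readline().strip()
    return json.loads(first_line)["id"] if first_line else None
```

On each run, the value from `last_archived_id` would serve as the minimum id for the new search, and the results would be written to `next_archive_path`.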
Renato Gabriele
@remagio
right
I think it works and ppl will like this
Ed Summers
@edsu
ok let me see if i can get it working
maybe you can try it out :-)
Renato Gabriele
@remagio
:-)
Anyway, THX!
Your contributions to this kind of stuff are great
Have a good day @edsu
Ed Summers
@edsu
hopefully i'll have something for you to try in an hour
thanks for working w/ me on it!
Renato Gabriele
@remagio
You are welcome
We could discuss NoSQL integration soon if you want
Ed Summers
@edsu
super to have someone in Italy trying it out btw -- makes a big difference to have it not just be a US thing
sure
Renato Gabriele
@remagio
I see, but we have ridiculous "numbers" compared with the US
I mean, media say "Incredible, Italian twitter scene crazy for next Presidente"