These are chat archives for rosshinkley/nightmare

26th
Jun 2016
Mingsterism
@mingsterism
Jun 26 2016 05:27
@rosshinkley hey ross. i have a function which loops and paginates to the next page. it's a recursive function. however, every time it loops, it instantiates a new instance of nightmare. is this good? looks like it will affect the performance.
function start(link) {
    var Nightmare = require('nightmare');
    var nightmare = Nightmare({ show: true })
    var a = true;

    nightmare
        .goto(link)
        .evaluate(getLinks, link)
        .end()
        .then(function(content) {
            //console.log(content.nxtLink);
            console.log(content.t);
            if (content.nxtLink != undefined) start(content.nxtLink);
        })
        .catch(function(err) {
            console.log(err);
        })
}

start(url)
Mingsterism
@mingsterism
Jun 26 2016 10:31
kindly disregard the earlier code. Appreciate it if you can give me your feedback on my new code.
Any tips for improvement? it works, but I don't know if it's performant, or if there are memory leaks. How do I judge quality of the code? Appreciate any advice. Thanks :)
also my concern is that it is always instantiating a new instance of nightmare every time. is that performant?
Samuel Rounce
@srounce
Jun 26 2016 11:57
@mingsterism the assignment & require call don't need to be made in the constructor, they can go above it
Also should be using this not referring to a named instance variable, I'm surprised that code even works let alone you haven't noticed the side effects of creating multiple instances of it
Mingsterism
@mingsterism
Jun 26 2016 12:25
@srounce could you clarify in regards to using this
im not too sure if i fully understand your comment.
Samuel Rounce
@srounce
Jun 26 2016 12:38
Look again at line 22
Mingsterism
@mingsterism
Jun 26 2016 12:39
@srounce yeah. are you saying i'm creating a new instance of nightmare everytime I loop?
Samuel Rounce
@srounce
Jun 26 2016 12:39
Nope
Mingsterism
@mingsterism
Jun 26 2016 12:40
or are you meaning that i'm creating many instances of bot, and they do not get closed
Samuel Rounce
@srounce
Jun 26 2016 12:40
You're referencing the variable you assign your instance to
bot1
bot1.nightmare...
Mingsterism
@mingsterism
Jun 26 2016 12:41
whats wrong with that
i did that to create a single instance of nightmare. so that all the other functions can access the same nightmare instance.
Ross Hinkley
@rosshinkley
Jun 26 2016 15:10
mmm, looking at the bin now...
the problem with line 22 is that you're referencing an instance of the prototype interior to the prototype... what you have there won't work out of the context you have it in
as bot1 won't be defined
Mingsterism
@mingsterism
Jun 26 2016 15:15
but it works though.
Ross Hinkley
@rosshinkley
Jun 26 2016 15:15
for now, yes
but if you split the code up
where RootNightmare was defined in its own file, and bot1 wasn't defined in the file (because it probably shouldn't be), it will break
Mingsterism
@mingsterism
Jun 26 2016 15:16
i'm trying to understand your statement. i cant seem to get it.
Ross Hinkley
@rosshinkley
Jun 26 2016 15:16
you probably want that to read this.nightmare...
Mingsterism
@mingsterism
Jun 26 2016 15:16
"the problem with line 22 is that you're referencing an instance of the prototype interior to the prototype... what you have there won't work out of the context you have it in
as bot1 won't be defined"
Ross Hinkley
@rosshinkley
Jun 26 2016 15:17
so... line 41, you define bot1
Mingsterism
@mingsterism
Jun 26 2016 15:17
yup
Ross Hinkley
@rosshinkley
Jun 26 2016 15:17
which is an instance of RootNightmare
Mingsterism
@mingsterism
Jun 26 2016 15:18
ok.
Ross Hinkley
@rosshinkley
Jun 26 2016 15:18
line 22, you use bot1 inside of an instance method main
let's say you wanted to have a second bot
Mingsterism
@mingsterism
Jun 26 2016 15:18
ahhh.
Ross Hinkley
@rosshinkley
Jun 26 2016 15:18
bot2 = new RootNightmare(); bot2.main('http://myawesometld.com') ...
Mingsterism
@mingsterism
Jun 26 2016 15:19
you mean i cant use main anymore? because linked to bot1?
Ross Hinkley
@rosshinkley
Jun 26 2016 15:19
that is one problem, yes
bot2 now changes bot1
and if you have instance variables on bot2, they won't update appropriately (eg, bot2.nightmare)
Mingsterism
@mingsterism
Jun 26 2016 15:20
hmm. what did you mean by that? "they won't update appropriately"
Ross Hinkley
@rosshinkley
Jun 26 2016 15:21
because all of your instances rely on the bot1 instance
give this a try
ah, dang, i missed one
hang on
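(A minimal sketch of the problem described above, where every instance relies on bot1. BrokenBot and FixedBot are hypothetical stand-ins for RootNightmare, and the visited array stands in for the real nightmare calls; the pasted code is not available here, so this only illustrates the pattern, not the original source.)

```javascript
// Hypothetical stand-in for RootNightmare; no browser is involved.
function BrokenBot() {
  this.visited = [];
}
// Broken: the method reaches for the outer variable brokenBot1
// instead of the instance it was called on.
BrokenBot.prototype.main = function (url) {
  brokenBot1.visited.push(url);
};

function FixedBot() {
  this.visited = [];
}
// Fixed: the method uses `this`, so each instance keeps its own state.
FixedBot.prototype.main = function (url) {
  this.visited.push(url);
};

var brokenBot1 = new BrokenBot();
var brokenBot2 = new BrokenBot();
brokenBot2.main('http://example.com/a');
console.log(brokenBot1.visited.length, brokenBot2.visited.length); // 1 0

var fixedBot1 = new FixedBot();
var fixedBot2 = new FixedBot();
fixedBot2.main('http://example.com/a');
console.log(fixedBot1.visited.length, fixedBot2.visited.length); // 0 1
```

Calling main on the second broken bot changes the first one, which is the "bot2 now changes bot1" effect. If brokenBot1 were not defined in the same file, the call would throw a ReferenceError instead.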
Mingsterism
@mingsterism
Jun 26 2016 15:22
ok. but why did you need self.
why not just change everything to this
Ross Hinkley
@rosshinkley
Jun 26 2016 15:22
because this changes depending on function scope
namely:
Mingsterism
@mingsterism
Jun 26 2016 15:22
ah. i see.
Ross Hinkley
@rosshinkley
Jun 26 2016 15:23
.then(function(v) {
  //this !== self
  console.log(v.nextUrl)
  self.main(v.nextUrl)
})
of course, there are a lot of asterisks here
but in general, this is set by how the current function is called, so it changes inside a nested callback
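(A small sketch of why self is needed inside a .then callback, assuming a hypothetical Bot constructor with a pages counter. It also shows an arrow function as an alternative, since arrow functions take this from the enclosing method.)

```javascript
function Bot() {
  this.pages = 0;
}

// Plain function callback: `this` is not the Bot instance in here
// (undefined in strict mode, the global object otherwise),
// so the instance is captured in `self` first.
Bot.prototype.countWithSelf = function () {
  var self = this;
  return Promise.resolve().then(function () {
    self.pages += 1;
    return this === self;
  });
};

// Arrow function callback: `this` comes from the enclosing method call.
Bot.prototype.countWithArrow = function () {
  return Promise.resolve().then(() => {
    this.pages += 1;
    return this instanceof Bot;
  });
};

var bot = new Bot();
bot.countWithSelf()
  .then(function (sameThis) {
    console.log(sameThis); // false
    return bot.countWithArrow();
  })
  .then(function (isBot) {
    console.log(isBot, bot.pages); // true 2
  });
```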
Mingsterism
@mingsterism
Jun 26 2016 15:25
is the code always re-instantiating a new instance of nightmare everytime it loops?
Ross Hinkley
@rosshinkley
Jun 26 2016 15:26
nightmare itself uses this technique
no
i don't think so.
it looks like your nightmare instance is set up in the RootNightmare constructor
Mingsterism
@mingsterism
Jun 26 2016 15:27
i see. ok.
is this structure a good way. any other room for improvement?
using constructors and all.
Ross Hinkley
@rosshinkley
Jun 26 2016 15:28
the only complaint i'd mount is with getTitlesAndNextUrl
i think
it's not intended to be run by the user
Mingsterism
@mingsterism
Jun 26 2016 15:29
you mean its like a classmethod?
or staticmethod?
Ross Hinkley
@rosshinkley
Jun 26 2016 15:29
it's an instance method the way it's set up now
meaning you could run bot1.getTitlesAndNextUrl()
which... would not be good
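(One way to keep a helper like getTitlesAndNextUrl off the instance, sketched under assumptions: the helper becomes a plain function in module scope, and only the constructor would be exported. RootBot, collect, and the page object shape are hypothetical; the real helper is presumably passed to nightmare's evaluate.)

```javascript
// Module-private helper: not attached to the prototype, so callers
// of RootBot cannot invoke it through an instance.
function extractTitlesAndNextUrl(page) {
  return { titles: page.titles, nextUrl: page.next };
}

function RootBot() {
  this.titles = [];
}

// Public instance method that uses the private helper internally.
RootBot.prototype.collect = function (page) {
  var result = extractTitlesAndNextUrl(page);
  this.titles = this.titles.concat(result.titles);
  return result.nextUrl;
};

var bot = new RootBot();
var next = bot.collect({ titles: ['a', 'b'], next: '/page/2' });
console.log(next, bot.titles.length, typeof bot.extractTitlesAndNextUrl);
// prints: /page/2 2 undefined
```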
Mingsterism
@mingsterism
Jun 26 2016 15:29
ok. got it.
but how about overall structure. you mentioned earlier if i place code in separate file it would not work?
if constructor is in different file from bot1?
Ross Hinkley
@rosshinkley
Jun 26 2016 15:30
otherwise, i'm sure there's a cleaner way to do iterative promises, but all i can think of right now is if you know the URLs up front, which judging from your source, you don't
and i'm having a bit of a lapse in memory :P
depends on how you're planning on using RootNightmare
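(A rough sketch of one way to chain "iterative promises" when the next URL is only known after each page loads. The visit function is hypothetical: with Nightmare it would wrap goto and evaluate on one shared instance, but here it is faked so the sketch runs without a browser.)

```javascript
// `visit` is assumed to return a promise for
// { titles: [...], nextUrl: string or undefined }.
function crawl(visit, url, collected) {
  collected = collected || [];
  return visit(url).then(function (page) {
    collected = collected.concat(page.titles);
    // Stop when there is no next page; otherwise chain the next visit.
    if (page.nextUrl === undefined) return collected;
    return crawl(visit, page.nextUrl, collected);
  });
}

// Fake three-page site, for demonstration only.
var fakeSite = {
  '/1': { titles: ['a', 'b'], nextUrl: '/2' },
  '/2': { titles: ['c'], nextUrl: '/3' },
  '/3': { titles: ['d'], nextUrl: undefined }
};
function fakeVisit(url) {
  return Promise.resolve(fakeSite[url]);
}

crawl(fakeVisit, '/1').then(function (titles) {
  console.log(titles.join(',')); // a,b,c,d
});
```

Because each step returns the promise for the next step, the caller gets a single promise that resolves once the last page has been read.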
Mingsterism
@mingsterism
Jun 26 2016 15:31
haha. ok no worries. you dont use much nightmare now?
Ross Hinkley
@rosshinkley
Jun 26 2016 15:31
if you're only planning on having one bot, then it might make sense to leave it
at a minimum, though, you want the stuff in main to reference this (either directly or through self)
Mingsterism
@mingsterism
Jun 26 2016 15:32
if i have many bots, will they all run in parallel? meaning multiple electron instances?
Ross Hinkley
@rosshinkley
Jun 26 2016 15:32
you can
but you'll be reasonably limited
there's an issue about this... let me see if i can find it
segmentio/nightmare#564, maybe?
that's not the one i'm thinking of
Mingsterism
@mingsterism
Jun 26 2016 15:37
@rosshinkley hmm. whats the reason of the limitation though.
does it mean i have to use something like digitalocean? will that help?
Ross Hinkley
@rosshinkley
Jun 26 2016 15:42
uh, it'd be like starting multiple instances of Chrome
Electron is pretty resource-intensive
Mingsterism
@mingsterism
Jun 26 2016 15:43
does nightmare create a headless browser instance of electron?
like phantom for selenium?
Ross Hinkley
@rosshinkley
Jun 26 2016 15:44
it creates an instance of electron, yes
nightmare and electron instances are 1:1
(for now)
Mingsterism
@mingsterism
Jun 26 2016 15:45
oh. so its not the same as phantom for selenium then?
because phantom was meant to be less resource intensive than selenium as it does not load a full browser
Ross Hinkley
@rosshinkley
Jun 26 2016 15:46
no, nightmare doesn't use webdriver, which i think is what phantom uses
i can't remember now
Mingsterism
@mingsterism
Jun 26 2016 15:49
ok. will check up on that.
thanks again ross. really appreciate the help.
Ross Hinkley
@rosshinkley
Jun 26 2016 15:50
np
Ross Hinkley
@rosshinkley
Jun 26 2016 16:00
ah, right. phantomjs is a specialized webkit build
i had the testing direction backwards :P
you can test from selenium because phantom has an implementation of the webdriver protocol embedded in it
Mingsterism
@mingsterism
Jun 26 2016 16:06
sry not able to fully understand everything you said. lol.
Ross Hinkley
@rosshinkley
Jun 26 2016 16:07
lol, that's okay
Phantom is web-driver-able
Mingsterism
@mingsterism
Jun 26 2016 16:07
but is phantom js headless, and nightmare is not?
Ross Hinkley
@rosshinkley
Jun 26 2016 16:07
yeah, phantom doesn't require any specialty UI framework parts
Mingsterism
@mingsterism
Jun 26 2016 16:07
does that mean nightmarejs resource usage is more than phantomjs?
Ross Hinkley
@rosshinkley
Jun 26 2016 16:08
nightmare relies on having a framebuffer available
Mingsterism
@mingsterism
Jun 26 2016 16:08
sry framebuffer?
Ross Hinkley
@rosshinkley
Jun 26 2016 16:08
eg, X11 or Xvfb
Mingsterism
@mingsterism
Jun 26 2016 16:09
ah i see.
Ross Hinkley
@rosshinkley
Jun 26 2016 16:09
hopefully that won't be the case forever, chromium has been working on that problem for a while
check this comment out
Mingsterism
@mingsterism
Jun 26 2016 16:16
cool. will check that out. thanks
Ross Hinkley
@rosshinkley
Jun 26 2016 16:17
also
since we're out in the weeds
segmentio/nightmare#593 has some loose talk about using a single Electron instance for multiple browser windows
it's on my list :)
Mingsterism
@mingsterism
Jun 26 2016 16:24
nice will check that out.
btw, one thing. regarding the code var self=this . could i change it to let self=this. any side effects?
Ross Hinkley
@rosshinkley
Jun 26 2016 16:26
not that i can think of offhand
let defines a variable to the containing block's scope
which in this case ... should be just fine
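(A tiny sketch of the scoping difference mentioned above. For let self = this at the top of a method the two behave the same; the difference only shows up inside a nested block. The scopes function is made up for illustration.)

```javascript
function scopes() {
  if (true) {
    var a = 'var'; // hoisted to the whole function
    let b = 'let'; // visible only inside this block
  }
  // `a` is still reachable here; `b` is not declared in this scope.
  return [typeof a, typeof b];
}

console.log(scopes().join(' ')); // string undefined
```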
Mingsterism
@mingsterism
Jun 26 2016 16:26
yeah. it seems fine with testing. alright will just give it a go.
cheers.
Ross Hinkley
@rosshinkley
Jun 26 2016 16:26
:)
Mingsterism
@mingsterism
Jun 26 2016 16:35
@rosshinkley hey Ross. for mongoose/mongo, are we meant to create a new document for every data point (eg: 1 mil data = 1 mil documents? )
looking at the docs, they mention

var Kitten = mongoose.model('Kitten', kittySchema);
A model is a class with which we construct documents. In this case, each document will be a kitten with properties and behaviors as declared in our schema. Let's create a kitten document representing the little guy we just met on the sidewalk outside:

var silence = new Kitten({ name: 'Silence' });
console.log(silence.name); // 'Silence'

Ross Hinkley
@rosshinkley
Jun 26 2016 16:36
yeah, you generally don't want to store everything in a single document
Mingsterism
@mingsterism
Jun 26 2016 16:36
should each data point be a single document?
Ross Hinkley
@rosshinkley
Jun 26 2016 16:37
memory serving, the size of a document is capped at 16mb
yeah
Mingsterism
@mingsterism
Jun 26 2016 16:38
so if im paginating through pages,
each page, i will grab all the titles. i must do a for loop for each title? create a new document within the loop?
eg :i grab 20 links per page. while on that page, loop through each of the 20 links, and create a new document for each. does that sound correct?
Ross Hinkley
@rosshinkley
Jun 26 2016 16:40
i guess? I don't know what your requirements are
Mingsterism
@mingsterism
Jun 26 2016 16:41
lets say im paginating through a job site. each page has 20 jobs > 20 links
i am grabbing all the links and paginating through each page. so i must create a sub loop on each page, to go through each link on each page?
to create a new mongo document for each job.
Ross Hinkley
@rosshinkley
Jun 26 2016 16:43
seems reasonable enough
are you going to go back later to the job links and put more data into the job document?
Mingsterism
@mingsterism
Jun 26 2016 16:44
hmm. probably not.
but if i wanted to could i?
Ross Hinkley
@rosshinkley
Jun 26 2016 16:44
i don't see why not
Mingsterism
@mingsterism
Jun 26 2016 16:47
@rosshinkley ross this was the actual mongoose code from the quick-start docs.
var Kitten = mongoose.model('jaberwaki', kittySchema);
var silence = new Kitten({name: 'Silence'})
console.log('saving silence now ...');
silence.save(function(err, fluffy) {
    console.log('evaluating silence now....');
    if (err) return console.error(err);
    silence.speak();
})
they manually created a new document var silence
how can i automate this creation.
array1 = [1, 2, 3, 4, 5, 6, 7]
var jobs = mongoose.model('jobs', jobSchema)
for (let loop=0; loop < array1.length; loop++) {
    var loop = new jobs({title: array1[loop]});
}
i was thinking something like this, but not too sure. because my script gives me array1 right now.
i'm not too sure how to instantiate a new document with var for every element in array1
Ross Hinkley
@rosshinkley
Jun 26 2016 16:51
you are already, although reusing loop like that probably isn't a great idea
also doing this synchronously is going to cause problems when you want to call .save()
Mingsterism
@mingsterism
Jun 26 2016 16:52
but this cant work. because im doing var loop is var 1 = new jobs({...})
var 1
var2
var3
in regards to save, could it be like
for (let loop=0; loop < array1.length; loop++) {
    var loop = new jobs({title: array1[loop]});
    loop.save()
}
Ross Hinkley
@rosshinkley
Jun 26 2016 17:05
but save isn't a synchronous operation
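(A sketch of what "save isn't synchronous" means for a plain for loop. fakeSave is a hypothetical stand-in that only mimics a callback-style save with a timer; it does not touch mongoose or a database.)

```javascript
var events = [];

// Hypothetical asynchronous save: completes on a later tick.
function fakeSave(title, callback) {
  setTimeout(function () {
    events.push('saved ' + title);
    callback(null, title);
  }, 0);
}

var titles = ['a', 'b', 'c'];
var remaining = titles.length;

for (var i = 0; i < titles.length; i++) {
  fakeSave(titles[i], function (err) {
    remaining -= 1;
    // Only the last callback knows that everything is done.
    if (remaining === 0) console.log(events.join(' | '));
  });
}
// The loop ends before any save has completed.
events.push('loop finished');
// prints: loop finished | saved a | saved b | saved c
```

The loop itself only starts the saves. Anything that needs the saved documents has to wait for the callbacks (or promises), which is where the manual counter comes from.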
Mingsterism
@mingsterism
Jun 26 2016 17:40
could i promise.resolve()
@rosshinkley
or any other suggestion apart from using for-loops
Ross Hinkley
@rosshinkley
Jun 26 2016 17:42
you could certainly prop up promises to handle saving
i think mongoose returns a promise out of the box
Mingsterism
@mingsterism
Jun 26 2016 17:43
sry what did you mean by that.
oh ok.
you mean mongoose instance?
Ross Hinkley
@rosshinkley
Jun 26 2016 17:43
the save method on a model instance returns a promise
i'm.... somewhat sure
and in that case, you might want to consider map/reduce
Mingsterism
@mingsterism
Jun 26 2016 17:44
yes. save does return a promise.
ah i see.
Ross Hinkley
@rosshinkley
Jun 26 2016 17:45
or... is it Promise.all?
i think?
since you don't need to maintain order
Mingsterism
@mingsterism
Jun 26 2016 17:45
ah. thats a good idea. did you mean mapping save to my array. then Promise.all() the array?
Ross Hinkley
@rosshinkley
Jun 26 2016 17:46
yep
Mingsterism
@mingsterism
Jun 26 2016 17:46
thats genius :) thanks. didnt think of that. lol
Mingsterism
@mingsterism
Jun 26 2016 17:55
@rosshinkley must i always create a new document object.
because in the docs, they have this
var Kitten = mongoose.model('Kitten', kittySchema);
var silence = new Kitten({ name: 'Silence' });
console.log(silence.name); // 'Silence'
if i just did Promise.all(titles.map(save)) is that fine though?
Mingsterism
@mingsterism
Jun 26 2016 18:31
hey ross. my portion of my code is here. http://hastebin.com/upazifidof.js
my output when i search my database is this.
> show collections
data4
> db.data4.find()
{ "_id" : ObjectId("57701ef03581d4740122c68c"), "__v" : 0 }
{ "_id" : ObjectId("57701ef33581d4740122c68d"), "__v" : 0 }
{ "_id" : ObjectId("57701ef53581d4740122c68e"), "__v" : 0 }
{ "_id" : ObjectId("57701ef83581d4740122c68f"), "__v" : 0 }
{ "_id" : ObjectId("57701efa3581d4740122c690"), "__v" : 0 }
>
so it did connect, and it did create documents. but its empty. meaning there must be some error at line 21 im guessing.
any ideas. thanks very much. @rosshinkley
Ross Hinkley
@rosshinkley
Jun 26 2016 18:33
uh, first guess
let newDoc = new self.model({title: 'fluffy'})
but your schema defines it as...
mongoose.Schema({job: String})
and by default, mongoose will drop members that aren't defined in the schema when saving
in other words, you have a naming mismatch
Mingsterism
@mingsterism
Jun 26 2016 18:34
ahhh.
let me try.
hope that solves it.
Ross Hinkley
@rosshinkley
Jun 26 2016 18:35
i still don't think you're going to have great luck with a synchronous for loop
fwiw
Mingsterism
@mingsterism
Jun 26 2016 18:36
it did work :) but yeah. i tried with Promise.map(save). the thing is i need to create a new document every time though.
i need to let newDoc = new ...
Ross Hinkley
@rosshinkley
Jun 26 2016 18:36
... you're doing that now
Mingsterism
@mingsterism
Jun 26 2016 18:37
yeah. but wanted to implement your idea earlier. Promise.all(jobs.map(save)).
Ross Hinkley
@rosshinkley
Jun 26 2016 18:37
yeah
so
off the cuff....
Promise.all(titles.map((title) => new self.model({title: title}).save())).then(...)
something like that?
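(The one-liner above can be tried without a database by swapping in a fake model. FakeJob and saveAll are hypothetical names; the only assumption carried over from the chat is that save() on a new document returns a promise.)

```javascript
// Hypothetical stand-in for a mongoose model whose save() returns a promise.
function FakeJob(fields) {
  this.title = fields.title;
}
FakeJob.prototype.save = function () {
  var doc = this;
  return new Promise(function (resolve) {
    setTimeout(function () { resolve(doc); }, 0);
  });
};

// Create one document per title, start every save, and wait for all of them.
function saveAll(Model, titles) {
  return Promise.all(titles.map(function (title) {
    return new Model({ title: title }).save();
  }));
}

saveAll(FakeJob, ['dev', 'ops', 'qa']).then(function (docs) {
  console.log(docs.length + ' saved: ' + docs.map((d) => d.title).join(', '));
  // prints: 3 saved: dev, ops, qa
});
```

A new document is still created for every title; the map callback is just the place where that happens. Promise.all returns results in input order even though the saves may finish in any order.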
Mingsterism
@mingsterism
Jun 26 2016 18:38
i seee. man. thanks ross.
that should work.
Ross Hinkley
@rosshinkley
Jun 26 2016 18:39
np :)
Mingsterism
@mingsterism
Jun 26 2016 18:39
feel so bad. just getting answers off you. :)
Ross Hinkley
@rosshinkley
Jun 26 2016 18:39
that might not be 100% right, but should be enough to get you going
nah, i'm happy to help
Mingsterism
@mingsterism
Jun 26 2016 18:40
cheers