Alice Jiang
@becausealice2
I don't know anything about StringIndexer off the top of my head, but I think one-hot encoding could work, @IssaMousssa
A quick glance at Google search results makes it seem like StringIndexer could also be a solution.
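Assuming the question was about encoding a categorical string column, the difference between the two suggestions is easier to see side by side. StringIndexer (a Spark ML transformer) maps each distinct string to an integer index, and one-hot encoding turns each category into a 0/1 vector. The sketch below is plain Python and does not use Spark's API. The helper names `string_index` and `one_hot` are hypothetical. Following Spark's default `frequencyDesc` ordering, the most frequent label gets index 0, and ties are broken alphabetically.

```python
from collections import Counter


def string_index(values):
    """Map each distinct string to an integer index.

    The most frequent label gets index 0; ties are broken alphabetically.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: i for i, label in enumerate(ordered)}


def one_hot(values, index):
    """Turn each value into a 0/1 list with a single 1 at its index."""
    size = len(index)
    rows = []
    for value in values:
        row = [0] * size
        row[index[value]] = 1
        rows.append(row)
    return rows


colors = ["red", "blue", "red", "green", "red", "blue"]
index = string_index(colors)
print(index)                       # {'red': 0, 'blue': 1, 'green': 2}
print(one_hot(colors, index)[:2])  # [[1, 0, 0], [0, 1, 0]]
```

In practice the two are often chained: index the strings first, then one-hot encode the indices. That way a model does not read a spurious order into the integer labels.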
Vishesh Mangla
@XtremeGood
Hi, is someone here?
And does anyone know TensorFlow 2.0?
Alice Jiang
@becausealice2
here, yes. TF2.0, not so much I'm afraid :(
Vishesh Mangla
@XtremeGood
Hi, can I get some help with posting on Stack Overflow?
I keep getting a formatting error again and again.
Alice Jiang
@becausealice2
what error?
Janus Reith
@janus-reith

Hi, I'm pretty new to ML and trying to find out if my use case is already implemented and available to use.
Specifically, I'd like something JS-based to use with Node, like brain.js or tensorflow-js.

I'm looking for a specific use case to implement: I have a text and want to extract multiple specifications from it.
It will probably use an LSTM, but I would need multiple outputs for my input.

It would be awesome if it could work in such a way:
Input: "Intel Core i7-7500U 2,70 GHz, 16 GB DDR4"
Output: { cpuManufactor: "Intel", cpuClockRate: "2,70", "memorySize": 16, "memoryType": "DDR4" }

The input wouldn't always have the same length or contain the same information.

If it wouldn't work this way I could settle for a less dynamic pattern like:
Output: { cpuManufactorIntel: 1, cpuClockRateIs270: 1, memorySizeIs16: 1, memoryTypeIsDDR4: 1 }

However, getting near the first example would be awesome.

As the concept behind this text classification is pretty general and not related to my specific data, I believe something similar must already be implemented somewhere.
However, I haven't really found anything yet.

Thanks in advance for your help!

Alice Jiang
@becausealice2
I've never seen anything like that specific example, @janus-reith, but it sounds like you're looking for some type of feature extraction?
Janus Reith
@janus-reith
@becausealice2 Yeah, I had to look it up, but this seems to be what I am looking for.
Janus Reith
@janus-reith

IMHO this should work without a dictionary and without actual language processing.

It should just need enough training data similar to "Intel Core i7-7500U 2,70 GHz, 16 GB DDR4" resulting in { cpuClockRate: "2,70", "memorySize": 16 },
or similar examples like "Intel Core i7-2500U 1,80 GHz, 8 GB DDR4" resulting in { cpuClockRate: "1,80", "memorySize": 8 }, so that new entries could be matched with a certain similarity.

Eric Leung
@erictleung
@janus-reith it sounds like you could benefit more from using regular expressions than from delving into deep learning. The space seems fairly constrained (e.g., you probably know all of the companies that make CPUs and can search for those strings). Then, when your regular expressions fail for some reason, like typos, you can manually review those cases and update your regular expressions to catch the rest. That seems easier and more actionable than using deep learning. But if you've already tried regexes, then never mind.
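Here is a minimal sketch of that workflow. It assumes a hand-maintained list of CPU manufacturers, and both the list and the `find_manufacturer` helper are hypothetical, for illustration only. The code searches each listing for a known name. Any row where nothing matches goes into a queue, so a human can review it and extend the rules.

```python
import re

# Hypothetical vocabulary; in practice you would list every vendor you care about.
CPU_MANUFACTURERS = ["Intel", "AMD", "Apple", "Qualcomm"]

# One case-insensitive alternation with word boundaries: \b(Intel|AMD|...)\b
MANUFACTURER_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, CPU_MANUFACTURERS)) + r")\b",
    re.IGNORECASE,
)


def find_manufacturer(text):
    """Return the canonical manufacturer name found in text, or None."""
    match = MANUFACTURER_RE.search(text)
    if match is None:
        return None
    found = match.group(1).lower()
    # Map the matched text back to the spelling used in the vocabulary.
    return next(name for name in CPU_MANUFACTURERS if name.lower() == found)


listings = [
    "Intel Core i7-7500U 2,70 GHz, 16 GB DDR4",
    "amd Ryzen 5 3600 3,60 GHz, 8 GB DDR4",
    "Intl Core i5-8250U 1,60 GHz, 8 GB DDR4",  # typo: no rule matches
]

results, needs_review = {}, []
for listing in listings:
    name = find_manufacturer(listing)
    if name is None:
        needs_review.append(listing)  # queue for a human to check
    else:
        results[listing] = name

print(results)
print(needs_review)  # ['Intl Core i5-8250U 1,60 GHz, 8 GB DDR4']
```

The review queue is the feedback loop described above. Each typo found there becomes a new rule or vocabulary entry.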

Just a reminder:

"Premature optimization is the root of all evil."
-- Donald Knuth

Alice Jiang
@becausealice2
@janus-reith I agree with @erictleung: if all you're doing is pulling apart the data, then deep learning is going to be a heavier solution than you need.
Janus Reith
@janus-reith
The CPU is just an example. There would be lots of different specifications. I was hoping to be able to use deep learning to avoid having to match each pattern separately.
It seemed like a classic use case to me.
But thanks for the hint, I'll investigate how far regex matching can get me here.
Still, I feel like I might have some misconception about the way some ML patterns work, as I'm quite baffled that my use case is neither a typical one nor (relatively) easy to achieve.
Janus Reith
@janus-reith
Might look into something ready to use like Amazon Comprehend if there is nothing open to use and no similar example to base my efforts on.
Eric Leung
@erictleung

@janus-reith my rule of thumb on when to use deep learning is for tasks that are easy for humans to do but difficult to tell someone to do.

For example, if you think of all the typical deep learning applications, all of them are difficult to just tell someone about, like driving a car or producing art/images. But I could tell someone how to look for computer memory with some simple rules like, "If you see a number in front of 'GB' that is next to some letters like 'DDR', then use that number for memory size."
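That memory rule translates almost word for word into a regular expression. The sketch below assumes memory is always written as a whole number, optional whitespace, `GB`, and then a `DDR` type such as `DDR4`. The pattern and the `extract_memory` name are illustrative and do not come from any library.

```python
import re

# "a number in front of 'GB' that is next to some letters like 'DDR'"
MEMORY_RE = re.compile(r"(\d+)\s*GB\s+(DDR\d*)", re.IGNORECASE)


def extract_memory(text):
    """Return {'memorySize': int, 'memoryType': str}, or {} if no match."""
    match = MEMORY_RE.search(text)
    if match is None:
        return {}
    return {
        "memorySize": int(match.group(1)),
        "memoryType": match.group(2).upper(),
    }


print(extract_memory("Intel Core i7-7500U 2,70 GHz, 16 GB DDR4"))
# {'memorySize': 16, 'memoryType': 'DDR4'}
print(extract_memory("Intel Core i7-7500U 2,70 GHz"))
# {}
```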

And again, the way technical specifications for computers are written is fairly standardized. The only thing you might have to worry about is the use of a comma instead of a decimal point in those numbers (1,6 versus 1.6).
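One simple way to handle the comma-versus-point issue is to normalize the separator before converting the number. The sketch assumes the clock rate always appears as a number followed by `GHz`, and the `extract_clock_rate` name is hypothetical.

```python
import re

# Digits, optionally a comma or point plus more digits, then "GHz".
CLOCK_RE = re.compile(r"(\d+(?:[.,]\d+)?)\s*GHz", re.IGNORECASE)


def extract_clock_rate(text):
    """Return the clock rate in GHz as a float, accepting 2,70 or 2.70."""
    match = CLOCK_RE.search(text)
    if match is None:
        return None
    # Treat a decimal comma as a decimal point before converting.
    return float(match.group(1).replace(",", "."))


print(extract_clock_rate("Intel Core i7-7500U 2,70 GHz, 16 GB DDR4"))  # 2.7
print(extract_clock_rate("Intel Core i7-7500U 2.70 GHz"))  # 2.7
```

This returns a number rather than the string "2,70" from the original example output. That makes values comparable across listings that use different separators.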

Here's an example of just using regex to extract product details http://ceur-ws.org/Vol-1267/LD4IE2014_Petrovski.pdf There is no code, per se, but it gives you an idea of how it is possible.
Janus Reith
@janus-reith
@erictleung Yeah, that's similar to how I understood it. For a human it is easy to recognize these patterns (if I had a lot of lines with "Core i7" and "Core i5", it would be clear that in "Core i3" the type is "i3").
However, the logic to match all these patterns would be harder to explain to a human, even though a human could easily understand the pattern without knowing the language.
I get that regex matching could be easier, as the number of different specifications will still be limited, so I could write a list of rules that catches a high percentage of the fields I need.
Still, I don't really get why this isn't a natural task for ML.
Eric Leung
@erictleung

@janus-reith machine learning is generally divided into two categories: supervised and unsupervised. In supervised learning, you have labels on your data and you want to correctly assign those labels to the data. In unsupervised learning, you're just searching for patterns. Your problem is maybe closest to supervised learning.

You've mentioned that input would be something like

"Intel Core i7-7500U 2,70 GHz, 16 GB DDR4"

and the example output would be

{ cpuClockRate: "2,70", "memorySize": 16 }

This is more clearly just normal text processing because you are simply extracting information from a set of text. In other words, the answer you're looking for is within your input data.

An appropriate machine learning task is spam filtering in emails. There are words within the emails that hint to you that it is spam, but the task is to categorize the data rather than simply extract weird words from an email. ML is also necessary because the number and types of words you may see in emails are unconstrained (i.e., you don't know all the words someone might use in an email).
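A toy spam classifier makes the contrast concrete. It uses multinomial naive Bayes with add-one smoothing. That is one common technique for this task, chosen here for illustration and not named in the discussion. The four training emails are made up. The output is a label (`spam` or `ham`) that never appears in the email text itself, unlike the product-spec extraction above.

```python
import math
from collections import Counter


def train(examples):
    """examples: list of (text, label). Returns per-label word counts and doc counts."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the label with the highest log posterior (add-one smoothing)."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for word in text.lower().split():
            # Add-one smoothing: unseen words still get a small nonzero probability.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up training data.
emails = [
    ("win a free prize now", "spam"),
    ("claim your free money", "spam"),
    ("meeting notes for tomorrow", "ham"),
    ("lunch tomorrow with the team", "ham"),
]
model = train(emails)
print(classify("free prize money", *model))       # spam
print(classify("team meeting tomorrow", *model))  # ham
```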

Here are some tips from Amazon on when to use ML:

  • You cannot code the rules (e.g., spam or not spam). If you find yourself writing a lot of rules to solve your problem, ML might help.
  • You cannot scale manual review by humans.

Again, for information extraction on your products, you should be able to code most, if not all, of the rules. And the number of samples will depend on your situation.

Zijing Zhang
@zzj0402_gitlab
import * as use from "@tensorflow-models/universal-sentence-encoder";
       ^

SyntaxError: Unexpected token *
    at Module._compile (internal/modules/cjs/loader.js:721:23)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:787:10)
    at Module.load (internal/modules/cjs/loader.js:653:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:593:12)
    at Function.Module._load (internal/modules/cjs/loader.js:585:3)
    at Function.Module.runMain (internal/modules/cjs/loader.js:829:12)
    at startup (internal/bootstrap/node.js:283:19)
    at bootstrapNodeJSCore (internal/bootstrap/node.js:622:3)
Node.js rejects the * here. How do I fix this?
Eric Leung
@erictleung
@zzj0402_gitlab what version of Node do you have? Some searching around suggests you need a specific version of Node, namely at least v12.2.0 https://stackoverflow.com/a/56350495/
Janus Reith
@janus-reith
@erictleung I ended up using machine learning to determine the necessary regexes. Now try to stop me :D
Janus Reith
@janus-reith
Thanks for your hints
Eric Leung
@erictleung
@janus-reith nice! :smiley:
Philip Durbin
@pdurbin
If anyone here happens to be in Boston, there's a "machine learning and big data" track at this free conference on Thursday and Friday (Aug 15-16) at Boston University: https://devconfus2019.sched.com/overview/type/Machine+Learning+%26+Big+Data
ali Fazeli
@AFZL95
you can check my blog posts about my experience as a data analyst at Huawei Technologies in my portfolio: https://faze.li/
Eric Leung
@erictleung
Occasionally people stop by and ask about mathematics for machine learning. I just ran into this book that may prove useful: https://mml-book.github.io/. It's a short-ish book of about 400 pages, and it is free to download. It will also be published eventually if you want a physical copy. I ran into it while reading through this thread on books to read for deep learning, in case you want to dig a bit deeper.
Eric Leung
@erictleung
Oh this is pretty cool. I'm a bit late to the party, but it looks like Kaggle (known for data science prediction competitions) has a YouTube playlist of their reading group https://www.youtube.com/playlist?list=PLqFaTIg4myu8t5ycqvp7I07jTjol3RCl9
Here are the details if you want to join in on the continuing discussions: https://twitter.com/rctatman/status/1131621843188604928
Eric Leung
@erictleung
For a gem, see the definition for "data science" 🧐🙃
Alice Jiang
@becausealice2
That's hysterical! Thanks for sharing @erictleung
Vishu
@bommojuvishu
Hi guys, I'm trying to deploy OpenCV on Heroku using Python and Flask. I'm getting the following error: ImportError: libSM.so.6: cannot open shared object file: No such file or directory
Is there any way to deploy OpenCV on Heroku?
Alice Jiang
@becausealice2
Give this a try @bommojuvishu
Koderkid1936
@Koderkid1936
chance = 0
while chance <= 3:
    guess = int(input("Guess: "))
    if guess == 9:
        print("try again")
        chance + 1
    else:
        print("try again")
    chance += 1
print(chance)  # why does this print 4 instead of 3 when you enter the integer 9?
Eric Leung
@erictleung
@Koderkid1936 your while loop condition allows chance to equal 3, so the loop runs once more and executes chance += 1 again, making chance equal to 4, as printed.
@Koderkid1936 this link might help you visualize what your code is doing. It will create diagrams of what your code is doing as you go through it line by line.
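You can see the loop count without typing any guesses: the snippet below counts how many times each loop condition lets the body run. In the original `if` branch, `chance + 1` computes a value but never stores it, so entering 9 does not change the count. Only `chance += 1` updates the variable. The helper `count_iterations` is hypothetical, just for illustration.

```python
def count_iterations(keep_going):
    """Run the same counting pattern as the guessing loop, without input()."""
    chance = 0
    iterations = 0
    while keep_going(chance):
        iterations += 1
        chance += 1  # the only statement that actually changes chance
    return iterations, chance


print(count_iterations(lambda c: c <= 3))  # (4, 4): body runs for chance 0, 1, 2, 3
print(count_iterations(lambda c: c < 3))   # (3, 3): body runs for chance 0, 1, 2

chance = 0
chance + 1     # computed, then discarded
print(chance)  # 0
chance += 1    # same as chance = chance + 1
print(chance)  # 1
```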
Koderkid1936
@Koderkid1936
@erictleung thanks a lot, much appreciated. I think I'm starting to understand the flow of the program: it executes the chance+=1 statement before the while loop... I think, but I will do more research. Thanks :thumbsup:
Eric Leung
@erictleung
Random comment. So I'm helping out with a data science boot camp in town and last time I checked, in one of their modules, they are using one of the freeCodeCamp new coder surveys as one of their datasets! So crazy that something I've had a hand in cleaning up has come full circle for me to see again :laughing:
Philip Durbin
@pdurbin
Can you please link me to that dataset? I'll take it under consideration for dataverse-sample-data. :)
Eric Leung
@erictleung
BuntyBru
@BuntyBru
Hi guys

Can anyone recommend good data science courses for complete beginners (someone who has never had any exposure to tech and is a business graduate)?
I did some Google searching for this,
but wanted to hear some personal reviews.

Thanks
Philip Durbin
@pdurbin
@erictleung oh, right, we talked about this FCC dataset at freeCodeCamp/2017-new-coder-survey#7 :) Do you want to hear more about dataverse-sample-data? :)