These are chat archives for FreeCodeCamp/DataScience

17th
Oct 2017
Akash Malla
@mallaakash
Oct 17 2017 01:43
@erictleung thanks.
But there is no standard format for writing resumes, so the resumes I received don't follow any common structure. That is making extraction tricky for me. Can you suggest something in this scenario?
CamperBot
@camperbot
Oct 17 2017 01:43
mallaakash sends brownie points to @erictleung :sparkles: :thumbsup: :sparkles:
:cookie: 553 | @erictleung |http://www.freecodecamp.com/erictleung
Davide Andreazzini
@david1983
Oct 17 2017 08:37
You can run the following function against your resumes and then use the extracted entities:

import sys

import six

# Imports the Google Cloud client library
from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types

def entity_sentiment_text(text):
    """Detects entity sentiment in the provided text."""
    client = language.LanguageServiceClient()

    if isinstance(text, six.binary_type):
        text = text.decode('utf-8')

    document = types.Document(
        content=text.encode('utf-8'),
        type=enums.Document.Type.PLAIN_TEXT)

    # Detect and send native Python encoding to receive correct word offsets.
    encoding = enums.EncodingType.UTF32
    if sys.maxunicode == 65535:
        encoding = enums.EncodingType.UTF16

    result = client.analyze_entity_sentiment(document, encoding)

    for entity in result.entities:
        print('Mentions: ')
        print(u'Name: "{}"'.format(entity.name))
        for mention in entity.mentions:
            print(u'  Begin Offset : {}'.format(mention.text.begin_offset))
            print(u'  Content : {}'.format(mention.text.content))
            print(u'  Magnitude : {}'.format(mention.sentiment.magnitude))
            print(u'  Sentiment : {}'.format(mention.sentiment.score))
            print(u'  Type : {}'.format(mention.type))
        print(u'Salience: {}'.format(entity.salience))
        print(u'Sentiment: {}\n'.format(entity.sentiment))
evaristoc
@evaristoc
Oct 17 2017 09:17

@david1983 thanks for the introduction to Google Cloud Natural Language! Not sure if the approach fits @mallaakash's case though. I haven't used the tool, so I don't know whether it is applicable either.

@mallaakash

  • First of all: what are you planning to extract from the resumes exactly, if we can know? Qualifications and skills?
  • Second: are you using a dictionary/vocabulary to compare against? The data is semi-structured, so one usual approach is to build a template vocabulary that reflects your search: something to compare against that indicates the main keywords guiding your extraction. That is the basis of text mining.
  • Regarding the sector of your interest, I have investigated a few companies' usual practices recently, e.g. https://www.textkernel.com/nl/ (also in English). If I am not wrong, Textkernel has a large database of words related to different sectors, classified accordingly. They also implement different parsing schemes based on cumulative experience of how people usually write resumes, so there is not one parser but many. The good news is that, again, the data is semi-structured: not everyone writes a resume the same way, but there are a few widely accepted standards, so parsing is hard but not impossible if you are ready to deal with false positives/negatives.
  • Following on from the last point, you will likely have to be prepared for some manual work. This kind of parsing is similar to web crawling: everything works fine until you hit an exception or someone changes something.
  • If you have a lot of resumes, computing power, time and enough money, I would be tempted to suggest something like recurrent or even convolutional neural networks. They could detect the common structures of the resumes, which I estimate could help you find where the main keywords are located. This is just an idea, though.

Be aware that creating the vocabulary and the parser could be a long-term task. If you can, try to find a template somewhere. That is also why the number of companies in this sector hasn't grown exponentially: they need the data (the vocabularies, the parsers and the classifiers), and although it is becoming more and more available, not everyone has it.
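To make the vocabulary-matching idea concrete, here is a minimal sketch. The vocabulary, categories and function name are all made up for illustration; a real template vocabulary would be far larger and sector-specific:

```python
import re

# Hypothetical template vocabulary mapping categories to keywords.
SKILL_VOCAB = {
    "languages": {"python", "java", "sql"},
    "ml": {"regression", "clustering", "neural networks"},
}

def extract_skills(resume_text, vocab=SKILL_VOCAB):
    """Return the vocabulary terms found in a resume, grouped by category."""
    text = resume_text.lower()
    found = {}
    for category, terms in vocab.items():
        # Whole-word matches only, to avoid e.g. "java" matching "javascript".
        hits = sorted(t for t in terms
                      if re.search(r"\b" + re.escape(t) + r"\b", text))
        if hits:
            found[category] = hits
    return found

print(extract_skills("Experienced in Python and SQL; applied clustering to customer data."))
# {'languages': ['python', 'sql'], 'ml': ['clustering']}
```

This is exactly where the false positives/negatives mentioned above come in: synonyms, abbreviations and misspellings all slip past a plain keyword match, which is why the vocabulary itself is the valuable asset.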

CamperBot
@camperbot
Oct 17 2017 09:17
evaristoc sends brownie points to @david1983 and @mallaakash :sparkles: :thumbsup: :sparkles:
:cookie: 25 | @mallaakash |http://www.freecodecamp.com/mallaakash
api offline
Nagilla Venkatesh
@nagillavenkatesh
Oct 17 2017 10:04
Hi all, can anyone help me with ELT (Extract, Load and Transform) in Amazon Redshift?
evaristoc
@evaristoc
Oct 17 2017 14:59
@nagillavenkatesh
It is SQL-based: https://aws.amazon.com/redshift. What exactly do you need? If you are using it, it is presumably because you are handling a lot of data, most likely structured, if I understand correctly.
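The usual ELT pattern in Redshift is to load raw files from S3 into a staging table with COPY, then transform inside the cluster with plain SQL. A rough sketch (the bucket, table and IAM role names are made up):

```sql
-- Load raw CSV files from S3 into a staging table (ELT: load first).
COPY staging_events
FROM 's3://my-bucket/raw/events/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS CSV
IGNOREHEADER 1;

-- Transform inside Redshift afterwards, using its own SQL engine.
CREATE TABLE events_clean AS
SELECT user_id,
       CAST(event_time AS TIMESTAMP) AS event_time,
       LOWER(event_type) AS event_type
FROM staging_events
WHERE user_id IS NOT NULL;
```

Loading first and transforming in-database (ELT rather than ETL) plays to Redshift's strength: the heavy transformation work runs on the cluster instead of on a separate ETL host.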