These are chat archives for FreeCodeCamp/DataScience

3rd Oct 2016
evaristoc
@evaristoc
Oct 03 2016 11:55

People

Dynamic Topic Modeling (wikipedia)

... will be my next stop, I think.

Quickly check only the summaries of these papers if you are interested in potential uses:

Combined with SNA, this could make for an interesting way of analysing how topics are started and adopted by the audience. Adding lifecycle analysis would also bring in the relation between the permanence of actors and the lifetime of topics.
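For anyone who wants to try it, here is a minimal sketch of a dynamic topic model run using gensim's LdaSeqModel; the tokenised documents, time slices and topic count below are invented purely for illustration:

```python
# Rough sketch only: documents, slice sizes and num_topics are made-up placeholders.
from gensim.corpora import Dictionary
from gensim.models import LdaSeqModel

# Tokenised documents, already ordered by time (oldest first)
docs = [
    ["topic", "model", "wikipedia"],
    ["network", "analysis", "actors"],
    ["lifecycle", "topics", "audience"],
    ["recidivism", "bias", "data"],
]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

# time_slice gives the number of documents in each consecutive period (here 2 + 2)
time_slice = [2, 2]
ldaseq = LdaSeqModel(corpus=corpus, id2word=dictionary,
                     time_slice=time_slice, num_topics=2)

# Inspect how the top words of each topic drift from one slice to the next
for t in range(len(time_slice)):
    print(ldaseq.print_topics(time=t))
```

On a real chat corpus the time slices would come from grouping messages by week or month, which is where the SNA and lifecycle angles could plug in.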

Other references on Dynamic Topic Modelling:

@jacqueline-homan Good luck with everything, really! Hope you find a pleasant place around us to discuss all aspects of Data Science (including the political/social impact) as well as your progress. We are REALLY interested.
Jacqueline S. Homan
@jacqueline-homan
Oct 03 2016 17:38

@evaristoc All good stuff and cool tools :smile: F# has a lot of cool tools too. But the coolest tool is our willingness to question whether or not we are starting with an accurate data model in the first place. It is perfectly acceptable to question everything when dealing with the raw data that was aggregated, and to get other engineers to weigh in with questions about it, too. (Again, this all boils down to asking the right questions - some of us might miss something vital that others of us catch.)

A silly and trite example would be to imagine the application of all these cool data science tools to a badly flawed raw data model -> treating the data model of a fruit and nut cake the same way we would treat the data model of a Ford truck brake assembly. The former is a recursive data tree while the latter is more like a binary tree. Incorrectly modeling the former as the latter would result in fatalities due to food allergies.
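To make that distinction a bit more concrete, here is a toy sketch; the types and data are hypothetical, not from the post above. The point is that a recipe is naturally a recursive tree (an ingredient can itself contain ingredients), so an allergen can hide at any depth, and a flat or fixed-shape model silently loses it:

```python
# Toy illustration with hypothetical types: a recursive ingredient tree,
# where an allergen can sit at any depth and must be found by walking the tree.
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Ingredient:
    name: str
    allergens: List[str] = field(default_factory=list)
    parts: List["Ingredient"] = field(default_factory=list)  # recursive sub-ingredients

def all_allergens(item: Ingredient) -> Set[str]:
    """Walk the whole tree so nothing nested is missed."""
    found = set(item.allergens)
    for part in item.parts:
        found |= all_allergens(part)
    return found

fruit_nut_cake = Ingredient("fruit and nut cake", parts=[
    Ingredient("batter", parts=[Ingredient("almond flour", allergens=["tree nuts"])]),
    Ingredient("candied fruit"),
])

print(all_allergens(fruit_nut_cake))  # {'tree nuts'} -- only visible via the recursive walk
```

A model that only kept the top-level ingredient names would never surface the nuts, which is exactly the kind of flaw no amount of downstream tooling can fix.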

So bottom line: Never feel that you can't question the raw data you are given to work with and re-model it if necessary.

Jacqueline S. Homan
@jacqueline-homan
Oct 03 2016 19:49

This post by @Lightwaves bears repeating:

We can either create a biased model without realizing it or our data can be biased
So we have to be hyper-aware of that

And this is why making sure we're starting out with accurate raw data is so important.

For example, what does our raw data model actually reflect, or is it a flawed raw data set to begin with?

Is race really a predictor of recidivism rates, or is it a more systemic, non-race-based issue: the only economic options for survival being "bad choice" vs. "worse choice" for those with criminal records, set against the backdrop of insufficient opportunities even for the general population of non-offenders in a shrinking jobs market (due in large part to technology and globalism making a significant part of the job-seeker pool redundant - or what Karl Marx called "surplus labor")?

Asking the broader question of "what happens to those who've been economically left out in post-Welfare Reform America" would require us to put on our researcher hats to gain more knowledge for responsible data aggregation and application of statistics and other data science tools.

Three significant and much overlooked facts:

  1. In the US, as horrible as the whole privatized for-profit Prison Industrial Complex is, prison is often the only option for obtaining food, housing, and medical care for a growing number of people.

  2. The overwhelming majority of the people in prison are people from poverty. (Does this mean that poor people are more inherently criminal and anti-social than the more advantaged and privileged? Does this mean that we're seeing the result of hopelessness and despair due to a real lack of opportunities and a real lack of a legit social safety net? How many offenders were actually innocent but couldn't afford lawyers and DNA testing that would prove they were wrongfully incarcerated and then rendered 100% unemployable due to having records after release on parole, etc? )

  3. Whether someone is innocent of the crime they're arrested for or not, pleading guilty can (and often does) mean the difference between doing only a few years prison time vs getting slapped with a sentence where you won't be eligible for release until you're way too old to work (even if you could find someone willing to hire you as an ex-felon).

The information coming back to us from those who volunteer to teach in prisons, if we're receptive to what they have to say, is that the inmates with the shortest sentences are those who copped a plea deal - regardless of whether they were innocent or not. (What does this say about how our legal system works and about recidivism?)

Christopher Hedges, who teaches literary skills to prison inmates, brings these little-known facts up several times.

Which is why the first question anyone in data science should be asking is the proverbial grail question: "who is benefiting?"