    Stas Kaino
    @StasKaino_twitter
    some of them still do...
    like this one
    11 Feb 2016 : Column 1715
    ale rimoldi
    @aoloe
    as i wrote above, the text is not trivial to parse.
    you can easily get all speakers plus some noise.
    you can probably also automatically remove most if not all the noise.
    for example, by removing the lines that start with a number... if this is true for all the noise lines in the document.
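A minimal sketch of the "remove the lines that start with a number" idea, assuming the extracted text has one item per line. The function name `drop_numbered_lines` and the sample text are made up for illustration and are not from the code discussed in this chat.

```python
import re


def drop_numbered_lines(text):
    """Return text without the lines whose first non-space character is a digit."""
    kept = [line for line in text.splitlines() if not re.match(r"\s*\d", line)]
    return "\n".join(kept)


# Made-up sample: two speaker lines with a column marker in between.
sample = "Speaker A\n11 Feb 2016 : Column 1715\nSpeaker B"
cleaned = drop_numbered_lines(sample)
print(cleaned)
```

This only helps if no wanted line starts with a digit, which is the caveat raised above.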
    Stas Kaino
    @StasKaino_twitter
    i wonder what is wrong with the code. the thing is, the first observation is 11 Feb 201 because of the elif section for the first observations. Hence it doesn't find h3 but then does find p. But that's why I placed if len(topic) > 0: for the observations before the for loop reaches the date (h3), len(topic) should be 0...
    ale rimoldi
    @aoloe
    i have the feeling that the code i posted before mostly works...
    if you have specific issues, please upload the code so that i can check it...
    ale rimoldi
    @aoloe
    ok, i'm leaving for the evening... i'll be back tomorrow!
    have a nice evening
    Stas Kaino
    @StasKaino_twitter
    Hello Ale
    Just sent an email... thanks
    ale rimoldi
    @aoloe
    hoi
    Stas Kaino
    @StasKaino_twitter
    i am trying your code
    but something is not working
    this was exactly what i was trying to do
    but i didn't know the correct syntax
    can you please help me get it working?
    it gives me this error
    TypeError: cannot use a string pattern on a bytes-like object
    Stas Kaino
    @StasKaino_twitter
    Dear Ale, I am simply wondering how I can use re.sub() on content. How does this content need to be prepared for the cleanup?
    ok I got it
    i had to remove the utf-8 encoding. uff...
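For reference, a small sketch of why that TypeError appears in Python 3 and two ways around it. The variable names and the sample value are assumptions for illustration. The point is that `re` requires the pattern and the input to be of the same type, either both `str` or both `bytes`.

```python
import re

# Bytes, e.g. what an .encode("utf-8") call or a binary read produces.
raw = "11 Feb 2016 : Column 1715".encode("utf-8")

try:
    re.sub(r"Column \d{4}", "", raw)  # str pattern applied to bytes
except TypeError as err:
    error_message = str(err)
    print(error_message)

# Option 1: work on text (decode, or simply never encode).
text = raw.decode("utf-8")
from_text = re.sub(r"Column \d{4}", "", text)

# Option 2: keep bytes, but then the pattern and replacement must be bytes too.
from_bytes = re.sub(rb"Column \d{4}", b"", raw)
```

Working on `str` throughout, as was done here by dropping the encoding step, is usually the simpler choice.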
    Stas Kaino
    @StasKaino_twitter
    it works but it still does not catch the lines...
    this section:
      I am very much aware of the requests for the last two debates. We are discussing that and will seek to find the best way of making sure it can happen.
      <br/>
     </p>
     <a class="anchor-column noCont" name="column_1754">
     </a>
     <p>
      <b>
       11 Feb 2016 : Column 1754
      </b>
     </p>
     <p>
      As for the business on Tuesday week, there should be plenty of time available. We have consideration of two sets of Lords amendments, but I am confident that there would be time for a debate to take place on that day. Looking back at the experience of the past few weeks, it has tended to work okay, but I continue to keep the matter under review.
    the one with 11 Feb 2016 - is still there. I wonder why?
    ale rimoldi
    @aoloe
    seems to work here... just make sure that you have all the spaces correct.
    Stas Kaino
    @StasKaino_twitter
    Ale, can you please explain why we include the <br/>\n\n</p> part in re.sub(r'<br/>\n\n</p><a class="anchor-column noCont" name="column_\d{4}"> </a><p><b>\d{2} \w{3} \d{4} : Column\d{4}</b></p><p>', '', content) ?
    also, do you know how to search text in the console? it searches, but nothing is highlighted in conda
    it is not working
    if you look at the end of the output and search for 11 Feb 2016 : Column 1754, it is still there
    Stas Kaino
    @StasKaino_twitter
    maybe you could point me to some resource where I could read, in simple language, how re.sub works? I am not sure where the spaces are wrong
    ale rimoldi
    @aoloe
    if i replace my sub with your sub i get a different result...
    Stas Kaino
    @StasKaino_twitter
    can you please repeat? I don't understand
    maybe you can just show me in my code why this is not working?
    ale rimoldi
    @aoloe
    you're missing a space after Column
    i've put your sub() in my code and i also got the wrong result.
    i was looking for the difference and found it after Column... i hope it was the only difference
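A short sketch of the difference, reduced to the `<b>` part of the pattern. The sample line is taken from the HTML quoted earlier in the chat and the variable names are illustrative. A literal space in a regular expression must match a literal space in the text, so `Column\d{4}` cannot match `Column 1754`.

```python
import re

line = "<b>11 Feb 2016 : Column 1754</b>"

# Pattern as first posted: no space between "Column" and the digits.
without_space = r"<b>\d{2} \w{3} \d{4} : Column\d{4}</b>"
# Corrected pattern: one space after "Column".
with_space = r"<b>\d{2} \w{3} \d{4} : Column \d{4}</b>"

print(re.search(without_space, line))           # no match
print(re.search(with_space, line) is not None)  # match

removed = re.sub(with_space, "", line)
```

When a pattern silently matches nothing, `re.sub` returns the input unchanged, which is why the column marker was still in the output.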
    Stas Kaino
    @StasKaino_twitter
    do you think I need this at the beginning?
    <br/>\n\n</p>
    you mean we need to use: re.sub(r'<a class="anchor-column noCont" name="column_\d{4}"> </a><p><b>\d{2} \w{3} \d{4} : Column \d{4}</b></p><p>', '', content) ?
    Stas Kaino
    @StasKaino_twitter
    ok.. space did work, but I still wonder if we need <br/>\n\n</p>
    ale rimoldi
    @aoloe
    you don't need that. but if you remove it from the pattern, you'll be left with a couple of newlines in the string... which is not very nice.
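A sketch comparing the two variants on a made-up `content` string. It assumes the markup is in the compact form that the pattern expects, with no indentation between tags, which may differ from the real page. The names `marker`, `short` and `full` are illustrative.

```python
import re

# Assumed compact form of the section quoted earlier in the chat.
content = (
    'it can happen.<br/>\n\n</p>'
    '<a class="anchor-column noCont" name="column_1754"> </a>'
    '<p><b>11 Feb 2016 : Column 1754</b></p><p>'
    'As for the business on Tuesday week'
)

marker = (
    r'<a class="anchor-column noCont" name="column_\d{4}"> </a>'
    r'<p><b>\d{2} \w{3} \d{4} : Column \d{4}</b></p><p>'
)

# Without the prefix: <br/>, the two newlines and the closing </p> stay behind.
short = re.sub(marker, '', content)

# With the prefix: the two halves of the paragraph are joined directly.
full = re.sub(r'<br/>\n\n</p>' + marker, '', content)

print(repr(short))
print(repr(full))
```

With the prefix included, the sentences are joined with no space between them. Passing `' '` instead of `''` as the replacement would avoid that, if it matters for the later parsing.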
    Stas Kaino
    @StasKaino_twitter
    I think it is better not to touch this as it corresponds to previous tags
    Stas Kaino
    @StasKaino_twitter
    Ale, do you know how I can make the console output show every list item on a new line?
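One possible way to do this, sketched with a made-up list named `items`: either unpack the list into `print` with a newline separator, or join the items into a single string first.

```python
# Made-up list standing in for the parsed results.
items = ["first item", "second item", "third item"]

# Option 1: unpack the list and let print() separate the items with newlines.
print(*items, sep="\n")

# Option 2: build one string; str() guards against non-string items.
output = "\n".join(str(item) for item in items)
print(output)
```

Both print the same thing. The second form is handy when the text also has to be written to a file.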