Qqwy / Wiebe-Marten
@Qqwy

When doing the following:

Nokogiri::HTML.fragment("<a href='https://foo.com?a=1&b=2'></a>").to_s
# => "<a href=\"https://foo.com?a=1&amp;b=2\"></a>"

in the output, the ampersand is escaped

Am I doing something wrong here?
(My real use case is iterating over all a[href]s in the document and altering the URLs.)
Nokogiri::HTML("<a href='https://foo.com'>foo</a>").search("//a").each do |n|
  n.attributes["href"].value = "https://foo.com?q=a&x=y"
end.to_s
Qqwy / Wiebe-Marten
@Qqwy
Hmm, I learned something new today!
Turns out that ampersands should always be escaped inside URLs.
I only hope that no double escaping will happen, where &amp; is expanded into &amp;amp; in this example
Shlomi Fish
@shlomif
hi all! The "tutorials" link here is broken - https://nokogiri.org/
Mike Dalessio
@flavorjones
expect a few minutes of CI downtime, updating to https://github.com/concourse/concourse/releases/tag/v6.0.0
guillermo haas-thompson
@memoht
I am strugglebussing to parse an XML feed with Nokogiri to create records in a Rails database. I've tried multiple times over the years to get an XML feed to parse and managed to avoid it by going other routes (CSV import, JSON files, hitting API). I have a new task for a side project that is forcing me to revisit XML parsing. I think my example is straightforward enough, and wondering if anyone has a few to help me through this. [Simplified example: https://gist.github.com/memoht/1dc78f0f005abbb8d01267519ce386f1]
Mike Dalessio
@flavorjones
@memoht Hi there! Sorry for the slow response, have been moving onto a new laptop and missed the notifications. I'm happy to try to help! FWIW I can parse the XML in your gist just fine using Nokogiri::XML(xml) ... can you be more specific about what you're trying to do?
guillermo haas-thompson
@memoht
My end goal is to parse the feed via a rake task a couple of times a day, and either create or update records in the Rails app by the referencenumber field in the XML. That is in the 2nd file of the gist.
guillermo haas-thompson
@memoht
Current state > Now able to create new Job records and skip if record exists (searching by referencenumber in XML field) but still unable to update existing records. So, progress-ish.
guillermo haas-thompson
@memoht
@flavorjones Gist updated with current state of affairs (the horror, look away) https://gist.github.com/memoht/e693d8bffc433e8d63d8cbc8d2ceebe0
guillermo haas-thompson
@memoht
Current state > Got it working via some assist, but not the most efficient. Taking advantage of first_or_initialize >> would still love to see a cleaner way of achieving this.
Mike Dalessio
@flavorjones
@memoht I'm still not sure what you need help with? The code you're using to parse each job record seems fine and similar to how I would approach the problem. Can you be more specific about what you're looking for help with?
guillermo haas-thompson
@memoht
@flavorjones TBH, I didn't know if my code was the cleanest. The section where it tries to create a new record, or update if record already exists (searching by ref_no field) felt like I didn't do it so well. It works, but I was wondering if there was a more efficient approach.
I spent a lot of effort trying to convert the data to a Hash first because I was more familiar with that. The experience has helped me understand Nokogiri a bit better.
Mike Dalessio
@flavorjones
OK, since you asked - having scraped many feeds and sites in my day, I do have one strong opinion about how to structure that code. Specifically, my preference is to have a clean separation between parsing and extracting the data and storing the data.
Mike Dalessio
@flavorjones
For example, with your current code, there's no way to test that the parsing is correct without also storing records in a database -- it's all done in one method, making it hard to figure out whether something is wrong with the parsing or if something is wrong with the database or ORM code.
Maybe imagine how you could take this same code, but restructure it to have one method that accepts XML and returns, e.g., an array of attribute hashes. Then a second method could accept the array of attribute hashes and update or insert records into the database.
Anyway. I'm not criticizing at all! Take this advice with a grain of salt, it's just what I've done in the past.
guillermo haas-thompson
@memoht
I agree and appreciate the input. When something goes sideways, it just does. I plan to iterate back over. I was surprised I got this to work. I didn't figure this out by reading the docs unfortunately, but more through searching and trial and error. I wish the docs covered a bit more items in detail (well more like detail helpful for newcomers. I did read through the actual RDocs as well, I just need to get better at that). Thanks, stay safe and have a great day. @flavorjones LLAP
Mike Dalessio
@flavorjones
Getting ready to ship v1.10.10 which will have precompiled Ruby 2.7 support for Windows (#2029)
Mike Dalessio
@flavorjones
CI is down for a bit, tearing down the infrastructure and rebuilding.
Tessy Joseph John
@tesssie
Is there any NewsML parser for Ruby?
Mike Dalessio
@flavorjones
@tesssie Sorry for the slow response. I'm not familiar with NewsML but it looks like it's a form of XML and so Nokogiri should parse it reasonably well. If you're looking for an example of how someone on the internet has done this, I googled and found https://github.com/rguiu/NewsML-to-Wordpress/blob/master/import_news.rb
Mike Dalessio
@flavorjones
Updating ci.nokogiri.org to concourse v6.5.0 this morning: https://github.com/concourse/concourse/releases/tag/v6.5.0
Mike Dalessio
@flavorjones
Update complete.
Mike Dalessio
@flavorjones
Updating ci.nokogiri.org to concourse v6.5.1: https://github.com/concourse/concourse/releases/tag/v6.5.1
Mike Dalessio
@flavorjones
... done
Mike Dalessio
@flavorjones
@jvshahid do you think you'll have a chance in the next few weeks to look at #2080? I'm happy to dig in if you don't, just want to make sure it doesn't fall through the cracks since it seems like a blocker for a v1.11 release
Julien Feltesse
@robotvert
Hi there
Before I open an issue on the repo I want to make sure I'm not missing something...
Basically, I have a Maven pom.xml file I need to manipulate, and it works fine, but whenever Nokogiri parses the doc it appends an XML declaration at the very end.
Is this something people have witnessed too? I tried to search for this but so far no results
Minimalistic example:
hello = Pathname("hello.xml")

puts hello.read
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
</project>
# => nil

require 'nokogiri'
# => true

doc = Nokogiri::XML(hello)

puts doc
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
</project>
<?xml version="1.0" encoding="UTF-8"?>
# => nil

puts doc.to_xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
</project>
<?xml version="1.0" encoding="UTF-8"?>
Julien Feltesse
@robotvert
And the NO_DECLARATION option on save trims the declaration at the top of the file while leaving the one at the bottom... :sweat_smile:
puts doc.to_xml(save_with: Nokogiri::XML::Node::SaveOptions::NO_DECLARATION)
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
</project>
<?xml version="1.0" encoding="UTF-8"?>
Mike Dalessio
@flavorjones
@robotvert I'm curious that you're passing in a Pathname. Instead, can you either pass in an IO object or a String, and see if the results change?
There's a known issue with how Pathname.read works -- it's incompatible with how IO.read works, and Nokogiri can't tell the difference. Some discussion is happening at sparklemotion/nokogiri#1821 on how to address this.
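One concrete difference between the two read methods can be seen with the standard library alone. Whether this is the exact mechanism behind the behavior reported above is an assumption; the sketch only shows that the two calls do not behave alike when given a length argument. The temporary file and its contents are made up for the example.

```ruby
require "pathname"
require "tempfile"

# Small throwaway file so the example is self-contained.
file = Tempfile.new("hello")
file.write("0123456789")
file.close

# An IO object keeps a cursor: each read(n) continues where the previous
# call stopped, and returns nil once the end of the file is reached.
io_chunks = File.open(file.path) do |io|
  [io.read(4), io.read(4), io.read(4), io.read(4)]
end

# Pathname#read(n) reopens the file on every call, so it returns the same
# leading bytes each time and never signals end-of-file with nil.
path = Pathname.new(file.path)
pathname_chunks = [path.read(4), path.read(4), path.read(4), path.read(4)]

p io_chunks       # => ["0123", "4567", "89", nil]
p pathname_chunks # => ["0123", "0123", "0123", "0123"]

file.unlink
```

A caller that only checks respond_to?(:read) and then reads in chunks cannot tell these two objects apart up front, which fits the "pretend to be IO objects" complaint.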
Julien Feltesse
@robotvert
oh wow such a trap
Mike Dalessio
@flavorjones
I agree, and I honestly think it should be considered a bug that all of these things pretend to be IO objects but have different semantics.
Julien Feltesse
@robotvert
yeah FTR I went with Pathname because it's just super simple since it provides .read and .write
I'll give it a shot, thanks @flavorjones !
Julien Feltesse
@robotvert
As you pointed out, using an IO object works just fine, e.g.
hello = IO.new(IO.sysopen("hello.xml"))
doc = Nokogiri::XML(hello)
puts doc
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
</project>
Thanks for the pointers!
Mike Dalessio
@flavorjones
cool cool cool, glad I was able to help
Samir Sabri
@hopewise_twitter
I am having an issue related to libc and nokogiri here: phusion/passenger-docker#296 who can help?
Mike Dalessio
@flavorjones
Hi, I can try to help. Would you like to chat here, or in the GitHub issue?