guillermo haas-thompson
@memoht
I spent a lot of effort trying to convert the data to a Hash first because I was more familiar with that. The experience has helped me understand Nokogiri a bit better.
Mike Dalessio
@flavorjones
OK, since you asked - having scraped many feeds and sites in my day, I do have one strong opinion about how to structure that code. Specifically, my preference is to have a clean separation between parsing and extracting the data and storing the data.
Mike Dalessio
@flavorjones
For example, with your current code, there's no way to test that the parsing is correct without also storing records in a database -- it's all done in one method, making it hard to figure out whether something is wrong with the parsing or if something is wrong with the database or ORM code.
Maybe imagine how you could take this same code, but restructure it to have one method that accepts XML and returns, e.g., an array of attribute hashes. Then a second method could accept the array of attribute hashes and update or insert records into the database.
Anyway. I'm not criticizing at all! Take this advice with a grain of salt, it's just what I've done in the past.
guillermo haas-thompson
@memoht
I agree and appreciate the input. When something goes sideways, it just does. I plan to iterate back over it. I was surprised I got this to work. Unfortunately I didn't figure this out by reading the docs, but more through searching and trial and error. I wish the docs covered a few more topics in detail (or rather, in the kind of detail that helps newcomers). I did read through the actual RDocs as well; I just need to get better at that. Thanks, stay safe and have a great day. @flavorjones LLAP
Mike Dalessio
@flavorjones
Getting ready to ship v1.10.10 which will have precompiled Ruby 2.7 support for Windows (#2029)
Mike Dalessio
@flavorjones
CI is down for a bit, tearing down the infrastructure and rebuilding.
Tessy Joseph John
@tesssie
Is there any NewsML parser for Ruby?
Mike Dalessio
@flavorjones
@tesssie Sorry for the slow response. I'm not familiar with NewsML but it looks like it's a form of XML and so Nokogiri should parse it reasonably well. If you're looking for an example of how someone on the internet has done this, I googled and found https://github.com/rguiu/NewsML-to-Wordpress/blob/master/import_news.rb
Mike Dalessio
@flavorjones
Updating ci.nokogiri.org to concourse v6.5.0 this morning: https://github.com/concourse/concourse/releases/tag/v6.5.0
Mike Dalessio
@flavorjones
Update complete.
Mike Dalessio
@flavorjones
Updating ci.nokogiri.org to concourse v6.5.1: https://github.com/concourse/concourse/releases/tag/v6.5.1
Mike Dalessio
@flavorjones
... done
Mike Dalessio
@flavorjones
@jvshahid do you think you'll have a chance in the next few weeks to look at #2080? I'm happy to dig in if you don't, just want to make sure it doesn't fall through the cracks since it seems like a blocker for a v1.11 release
Julien Feltesse
@robotvert
Hi there
Before I open an issue on the repo I want to make sure I'm not missing something...
Basically, I have a Maven pom.xml file I need to manipulate. It works fine, but whenever Nokogiri parses the doc it appends an XML declaration at the very end.
Is this something other people have seen too? I searched for it but found no results so far.
Minimal example:
hello = Pathname("hello.xml")

puts hello.read
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
</project>
# => nil

require 'nokogiri'
# => true

doc = Nokogiri::XML(hello)

puts doc
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
</project>
<?xml version="1.0" encoding="UTF-8"?>
# => nil

puts doc.to_xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
</project>
<?xml version="1.0" encoding="UTF-8"?>
Julien Feltesse
@robotvert
And the NO_DECLARATION option on save trims the declaration at the top of the file while leaving the one at the bottom... 😅
puts doc.to_xml(save_with: Nokogiri::XML::Node::SaveOptions::NO_DECLARATION)
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
</project>
<?xml version="1.0" encoding="UTF-8"?>
Mike Dalessio
@flavorjones
@robotvert I'm curious that you're passing in a Pathname. Instead, can you either pass in an IO object or a String, and see if the results change?
There's a known issue with how Pathname#read works -- it's incompatible with how IO#read works, and Nokogiri can't tell the difference. Some discussion is happening at sparklemotion/nokogiri#1821 on how to address this.
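The mismatch can be seen with the standard library alone. The sketch below is not Nokogiri's reader code; it only shows the difference in `read` semantics that a caller asking for chunks would run into. The file name and contents are placeholders.

```ruby
require "pathname"
require "tempfile"

# Throwaway file with placeholder contents.
file = Tempfile.new(["hello", ".xml"])
file.write("<project/>\n")
file.close

path = Pathname(file.path)

# Pathname#read delegates to File.read(path, length): every call starts
# over at the beginning of the file, so the caller never sees end-of-file.
pathname_first  = path.read(4096)
pathname_second = path.read(4096)

# IO#read(length) advances through the stream and returns nil at EOF.
io_first, io_second = File.open(path) { |io| [io.read(4096), io.read(4096)] }

puts pathname_second.inspect # same bytes as the first call
puts io_second.inspect       # nil

file.unlink
```

Code that reads in chunks until it gets `nil` terminates on the IO object, while the Pathname keeps handing back the start of the file.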
Julien Feltesse
@robotvert
oh wow such a trap
Mike Dalessio
@flavorjones
I agree, and I honestly think it should be considered a bug that all of these things pretend to be IO objects but have different semantics.
Julien Feltesse
@robotvert
yeah FTR I went with Pathname because it's just super simple since it provides .read and .write
I'll give it a shot, thanks @flavorjones !
Julien Feltesse
@robotvert
As you pointed out, using an IO object works just fine, e.g.
hello = IO.new(IO.sysopen("hello.xml"))
doc = Nokogiri::XML(hello)
puts doc
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
</project>
Thanks for the pointers!
Mike Dalessio
@flavorjones
cool cool cool, glad I was able to help
Samir Sabri
@hopewise_twitter
I am having an issue related to libc and nokogiri here: phusion/passenger-docker#296 who can help?
Mike Dalessio
@flavorjones
Hi, I can try to help. Would you like to chat here, or in the GitHub issue?
@hopewise_twitter My brief advice would be to make sure you build your gems on the same system on which they're running. This error message indicates that Nokogiri was compiled on a system using a different version of glibc.
Samir Sabri
@hopewise_twitter
Thanks, but how can I know the correct version of Nokogiri for Ubuntu 18.04?
Mike Dalessio
@flavorjones
@hopewise_twitter I'm not sure I understand your question, and I'm not familiar with anything that would be a "correct version". What I'm saying above is that you should compile Nokogiri (which is done at installation-time ... so you can read this as "install nokogiri") on the same system on which you're running Nokogiri. This error message indicates to me that you compiled (installed) Nokogiri on a different system from the one it's running on.
Samir Sabri
@hopewise_twitter
Yes, you are right, I have fixed the issue. Thanks
Mike Dalessio
@flavorjones
Updating ci.nokogiri.org, might be momentary downtime in the next few minutes
Will Wharton
@wartron
Hey guys, I'm dealing with some emails breaking rendering in Outlook. I don't think this is Nokogiri's fault, but I did notice that after we use it to add UTMs to emails, there's a change to the following line:
[Image: image.png]
I know its output is valid and more appropriately formatted, but I was wondering whether our email designer purposely wanted this to be one line to deal with Outlook rendering, and whether anyone has an idea for preventing the change.
Mike Dalessio
@flavorjones
Hi Will. Nokogiri is just a wrapper around the underlying parsing engine (either libxml2 if you're using CRuby, or xerces/nekohtml if you're using JRuby). So often there will be formatting changes that are beyond Nokogiri's control when parsing and then serializing a document.
In this case, I don't think you have a problem -- these kinds of magic comments can be multiple-line (whitespace is not semantic in HTML or XML!). For an example, see https://stackoverflow.design/email/base/mso/
@wartron didn't tag you in my reply above, so tagging you now
Will Wharton
@wartron
@flavorjones Thanks for the reply. I figured as much, and I keep trying to explain to my co-workers that we need to use litmus.com or something similar to test rendering HTML emails in all the variants of Outlook.
Mike Dalessio
@flavorjones
@/all I'm going to stop monitoring this channel, and I'm going to remove the github and CI integrations. If you'd like to chat, please move over to this Ruby Discord server: https://discord.gg/UyQnKrT
The Discord chat channel is #nokogiri-💎 at https://discord.gg/UyQnKrT
Akinori MUSHA
@knu
👍