07-18-2023, 03:22 PM
## Summary
I'm using Ruby (`ruby 2.1.2p95 (2014-05-08) [x86_64-linux-gnu]` on my machine, `ruby 1.9.3p484 (2013-11-22 revision 43786) [x86_64-linux]` in the production environment) and Nori to convert an XML document (first run through Nokogiri for some validation) into a Ruby Hash. I later discovered that Nori is dropping the attributes of the deepest XML elements.
## Issue Details and Reproduction
To do this, I'm using code similar to the following:

    xml = Nokogiri::XML(File.open('file.xml')) { |config| config.strict.noblanks }
    hash = Nori.new.parse xml.to_s
The code generally works as intended, except for one case. Whenever Nori parses the XML text, it drops element attributes from the leaf elements (i.e. elements that have no child elements).
For example, the following document:

    <?xml version="1.0"?>
    <root>
      <objects>
        <object>
          <fields>
            <id>1</id>
            <name>The name</name>
            <description>A description</description>
          </fields>
        </object>
      </objects>
    </root>
...is converted to the expected Hash (some output omitted for brevity):

    irb(main):066:0> xml = Nokogiri::XML(txt) { |config| config.strict.noblanks }
    irb(main):071:0> ap Nori.new.parse(xml.to_s), :indent => -2
    {
      "root" => {
        "objects" => {
          "object" => {
            "fields" => {
              "id" => "1",
              "name" => "The name",
              "description" => "A description"
            }
          }
        }
      }
    }
The problem shows up when element attributes are used on elements with no children. For example, the following document is *not* converted as expected:

    <?xml version="1.0"?>
    <root>
      <objects>
        <object id="1">
          <fields>
            <field name="Name">The name</field>
            <field name="Description">A description</field>
          </fields>
        </object>
      </objects>
    </root>
The same `Nori.new.parse(xml.to_s)`, as displayed by `awesome_print`, shows the attributes of the deepest `<field>` elements are *absent*:

    irb(main):131:0> ap Nori.new.parse(xml.to_s), :indent => -2
    {
      "root" => {
        "objects" => {
          "object" => {
            "fields" => {
              "field" => [
                [0] "The name",
                [1] "A description"
              ]
            },
            "@id" => "1"
          }
        }
      }
    }
The Hash only has their values as a list, which is *not* what I wanted. I expected the `<field>` elements to retain their attributes just like their parent elements (e.g. see `@id="1"` for `<object>`), not for their attributes to get chopped off.
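To make the expected shape concrete, here is a minimal sketch of the kind of output I was hoping for. It does not use Nori at all: it assumes REXML (which ships with the Ruby versions listed above) and a hypothetical helper, `element_to_hash`, of my own naming. The `@` prefix mirrors what Nori prints for `<object>`, and the `#text` key is just an arbitrary choice for holding a leaf's text. Treat it as an illustration of the target structure, not as a fix for Nori:

```ruby
require 'rexml/document'

# Hypothetical helper (not part of Nori): converts a REXML element into a
# String or Hash. Attribute names are prefixed with "@"; a leaf element that
# has attributes keeps its text under the (arbitrarily chosen) "#text" key.
def element_to_hash(element)
  attrs = {}
  element.attributes.each { |name, value| attrs["@#{name}"] = value }

  children = element.elements.to_a
  if children.empty?
    text = element.text.to_s.strip
    # A leaf without attributes collapses to its plain text.
    return attrs.empty? ? text : attrs.merge('#text' => text)
  end

  children.each_with_object(attrs) do |child, hash|
    value = element_to_hash(child)
    if hash.key?(child.name)
      # Repeated sibling names are collected into an Array.
      existing = hash[child.name]
      hash[child.name] = existing.is_a?(Array) ? existing << value : [existing, value]
    else
      hash[child.name] = value
    end
  end
end

xml = <<-XML
<?xml version="1.0"?>
<root>
  <objects>
    <object id="1">
      <fields>
        <field name="Name">The name</field>
        <field name="Description">A description</field>
      </fields>
    </object>
  </objects>
</root>
XML

doc = REXML::Document.new(xml)
result = { doc.root.name => element_to_hash(doc.root) }

fields = result['root']['objects']['object']['fields']['field']
puts fields[0]['@name']  # => Name
puts fields[0]['#text']  # => The name
```

With this shape, each `<field>` entry is a small Hash carrying both its `name` attribute and its text, rather than a bare String.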
Even if the document is modified to look as follows, it still doesn't work as expected:

    <?xml version="1.0"?>
    <root>
      <objects>
        <object id="1">
          <fields>
            <Name type="string">The name</Name>
            <Description type="string">A description</Description>
          </fields>
        </object>
      </objects>
    </root>
It produces the following Hash:

    {
      "root" => {
        "objects" => {
          "object" => {
            "fields" => {
              "Name" => "The name",
              "Description" => "A description"
            },
            "@id" => "1"
          }
        }
      }
    }
This output lacks the `type="whatever"` attributes for each field entry.
Searching eventually led me to [Issue #59][1], where the last post (from Aug 2015) says its author can't "find the bug in Nori's code."
## Conclusion
So, **my question is:** Are any of you aware of a way to work around the Nori issue (e.g. perhaps a setting) that would allow me to use my original schema (i.e. the one with attributes in elements with no children)? If so, can you share a code snippet that will handle this correctly?
I had to redesign my XML schema and change the code about three times to make it work, so if there's a way to get Nori to behave that I'm simply not aware of, I'd like to know what it is.
I'd like to *avoid* installing more libraries just to get this working with the schema structure I originally wanted to use, but I'm open to it if the library is proven to work. (I'd have to refactor the code once again...) Frameworks are definitely overkill for this, so please do *not* suggest [Ruby on Rails][2] or similar full-stack solutions.
Please note that my current solution, based on a (reluctantly) redesigned schema, is working, but it's more complicated to generate and process than the original one, and I'd like to go back to the simpler/shallower schema.
[1]:
[2]: