XML to Hash conversion: Nori drops the attributes of the deepest XML elements

#1
## Summary

I'm using Ruby (`ruby 2.1.2p95 (2014-05-08) [x86_64-linux-gnu]` on my machine, `ruby 1.9.3p484 (2013-11-22 revision 43786) [x86_64-linux]` in production environment) and Nori to convert an XML document (initially processed with Nokogiri for some validation) into a Ruby Hash, but I later discovered that Nori is dropping the attributes of the deepest XML elements.

## Issue Details and Reproducing

To do this, I'm using code similar to the following:

xml = Nokogiri::XML(File.open('file.xml')) { |config| config.strict.noblanks }
hash = Nori.new.parse xml.to_s

The code generally works as intended, except for one case. Whenever Nori parses the XML text, it drops element attributes from the leaf elements (i.e. elements that have no child elements).

For example, the following document:

<?xml version="1.0"?>
<root>
<objects>
<object>
<fields>
<id>1</id>
<name>The name</name>
<description>A description</description>
</fields>
</object>
</objects>
</root>

...is converted to the expected Hash (some output omitted for brevity):

irb(main):066:0> xml = Nokogiri::XML(txt) { |config| config.strict.noblanks }
irb(main):071:0> ap Nori.new.parse(xml.to_s), :indent => -2
{
"root" => {
"objects" => {
"object" => {
"fields" => {
"id" => "1",
"name" => "The name",
"description" => "A description"
}
}
}
}
}

The problem shows up when element attributes are used on elements with no children. For example, the following document is *not* converted as expected:

<?xml version="1.0"?>
<root>
<objects>
<object id="1">
<fields>
<field name="Name">The name</field>
<field name="Description">A description</field>
</fields>
</object>
</objects>
</root>

The same `Nori.new.parse(xml.to_s)`, as displayed by `awesome_print`, shows the attributes of the deepest `<field>` elements are *absent*:

irb(main):131:0> ap Nori.new.parse(xml.to_s), :indent => -2
{
"root" => {
"objects" => {
"object" => {
"fields" => {
"field" => [
[0] "The name",
[1] "A description"
]
},
"@id" => "1"
}
}
}
}

The Hash only has their values as a list, which is *not* what I wanted. I expected the `<field>` elements to retain their attributes just like their parent elements (e.g. see `@id="1"` for `<object>`), not for their attributes to get chopped off.

Even if the document is modified to look as follows, it still doesn't work as expected:

<?xml version="1.0"?>
<root>
<objects>
<object id="1">
<fields>
<Name type="string">The name</Name>
<Description type="string">A description</Description>
</fields>
</object>
</objects>
</root>

It produces the following Hash:

{
"root" => {
"objects" => {
"object" => {
"fields" => {
"Name" => "The name",
"Description" => "A description"
},
"@id" => "1"
}
}
}
}

Which lacks the `type="whatever"` attributes for each field entry.

Searching eventually led me to [Issue #59][1], where the last post (from Aug 2015) says its author can't "find the bug in Nori's code."

## Conclusion

So, **my question is:** Are any of you aware of a way to work around the Nori issue (e.g. perhaps a setting) that would allow me to use my original schema (i.e. the one with attributes in elements with no children)? If so, can you share a code snippet that will handle this correctly?

I had to redesign my XML schema and change the code about three times to make it work, so if there's a way to get Nori to behave and I'm simply not aware of it, I'd like to know what it is.

I'd like to *avoid* installing more libraries as much as possible just to get this working properly with the schema structure I originally wanted to use, but I'm open to the possibility if it's proven to work. (I'd have to re-factor the code once again...) Frameworks are definitely overkill for this, so please: do *not* suggest [Ruby on Rails][2] or similar full-stack solutions.

Please note that my current solution, based on a (reluctantly) redesigned schema, is working, but it's more complicated to generate and process than the original one, and I'd like to go back to the simpler/shallower schema.

[1]:

[To see links please register here]

"Issue #53"
[2]:

[To see links please register here]


#2
Nori is not actually dropping the attributes; they are just not being printed.

If you run the following Ruby script:

require 'nori'

data = Nori.new(empty_tag_value: true).parse(<<XML)
<?xml version="1.0"?>
<root>
<objects>
<object>
<fields>
<field name="Name">The name</field>
<field name="Description">A description</field>
</fields>
</object>
</objects>
</root>
XML

field_list = data['root']['objects']['object']['fields']['field']

# Print the list itself, then each item's text and attributes.
puts field_list.inspect
puts "text: '#{field_list[0]}' data: #{field_list[0].attributes}"
puts "text: '#{field_list[1]}' data: #{field_list[1].attributes}"

You should get the output:

["The name", "A description"]
text: 'The name' data: {"name"=>"Name"}
text: 'A description' data: {"name"=>"Description"}

This clearly shows that the attributes are there but are not displayed by the `inspect` method (`p(x)` being the same as `puts x.inspect`).


You will notice that `puts field_list.inspect` outputs `["The name", "A description"]`, but `field_list[0].attributes` returns the attribute keys and values.
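
The same behaviour can be reproduced without Nori. The sketch below uses a hypothetical `TaggedString` class as a stand-in for `Nori::StringWithAttributes` (assumed here to be a `String` subclass with an `attributes` accessor); it is not Nori's actual code, only an illustration of why `inspect` hides the extra state.

```ruby
# Hypothetical stand-in for Nori::StringWithAttributes: a String subclass
# that carries an attributes hash alongside its text.
class TaggedString < String
  attr_accessor :attributes
end

field = TaggedString.new("The name")
field.attributes = { "name" => "Name" }

# String#inspect renders only the characters, so the extra state is invisible.
puts field.inspect

# The attributes are still attached to the object.
puts field.attributes["name"]
```

Any printer that relies on `inspect` (including `p` and `pp`) will therefore show only the text unless `inspect` is overridden.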

If you would like `pp` to display the attributes, you can override the `inspect` method in `Nori::StringWithAttributes`:

class Nori
  class StringWithAttributes < String
    # Show the attributes next to the text instead of the text alone.
    def inspect
      [attributes, String.new(self)].inspect
    end
  end
end

Or, if you want to change the structure of the output, you can override `self.new` so that it returns a different data structure:

class Nori
  # Two-element array: [text, attributes].
  class MyText < Array
    def attributes=(data)
      self[1] = data
    end
    attr_accessor :text
    def initialize(text)
      self[0] = text
      self[1] = {}
    end
  end
  class StringWithAttributes < String
    # Hand back a MyText wherever Nori asks for a StringWithAttributes.
    def self.new(x)
      MyText.new(x)
    end
  end
end

And access the data as a tuple:

puts "text: '#{data['root']['objects']['object']['fields']['field'][0].first}' data: #{ data['root']['objects']['object']['fields']['field'][0].last}"

With this change the data can be serialized as JSON or YAML, because each text item becomes a two-element array of text and attributes. `pp` also works:

{"root"=>
{"objects"=>
{"object"=>
{"fields"=>
{"field"=>
[["The name", {"name"=>"Name"}],
["A description", {"name"=>"Description"}]]},
"bob"=>[{"@id"=>"id1"}, {"@id"=>"id2"}],
"bill"=>
[{"p"=>["one", {}], "@id"=>"bid1"}, {"p"=>["two", {}], "@id"=>"bid2"}],
"@id"=>"1"}}}}
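
If the goal is serialization, another option that avoids patching Nori's classes is to walk the parsed result and turn each string that carries attributes into a plain Hash. This is only a sketch under assumptions: `expand_attributes`, the `"#text"` key, and the `TaggedString` stand-in class are hypothetical names introduced here, and the `@`-prefixed keys merely mimic how attributes on parent elements appear in the output above. With real Nori output, the parsed Hash would be passed in place of `sample`.

```ruby
require 'json'

# Hypothetical stand-in for Nori::StringWithAttributes, so the sketch runs without Nori.
class TaggedString < String
  attr_accessor :attributes
end

# Recursively replace attribute-carrying strings with plain hashes.
def expand_attributes(node)
  case node
  when Hash
    node.each_with_object({}) { |(key, value), out| out[key] = expand_attributes(value) }
  when Array
    node.map { |item| expand_attributes(item) }
  when String
    attrs = node.respond_to?(:attributes) ? (node.attributes || {}) : {}
    # Strings without attributes stay plain strings.
    return String.new(node) if attrs.empty?
    attrs.each_with_object({ "#text" => String.new(node) }) do |(name, value), out|
      out["@#{name}"] = value
    end
  else
    node
  end
end

name = TaggedString.new("The name")
name.attributes = { "name" => "Name" }
sample = { "fields" => { "field" => [name, "plain"] }, "@id" => "1" }

plain = expand_attributes(sample)
puts JSON.generate(plain)
```

Because the result contains only core `Hash`, `Array`, and `String` objects, it prints the same way under `pp`, `awesome_print`, JSON, or YAML.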

The following script should do what you want: it registers a custom formatter so that `awesome_print` shows the attributes.

require 'awesome_print'
require 'nori'

# Copyright © 2016 G. Allen Morris III
#
# Awesome Print is freely distributable under the terms of MIT license.
# See LICENSE file.
#------------------------------------------------------------------------------
module AwesomePrint
  module Nori

    def self.included(base)
      base.send :alias_method, :cast_without_nori, :cast
      base.send :alias_method, :cast, :cast_with_nori
    end

    # Add Nori XML Node and NodeSet names to the dispatcher pipeline.
    #-------------------------------------------------------------------
    def cast_with_nori(object, type)
      cast = cast_without_nori(object, type)
      if defined?(::Nori::StringWithAttributes) && object.is_a?(::Nori::StringWithAttributes)
        cast = :nori_xml_node
      end
      cast
    end

    #-------------------------------------------------------------------
    def awesome_nori_xml_node(object)
      return %Q|["#{object}", #{object.attributes}]|
    end
  end
end

AwesomePrint::Formatter.send(:include, AwesomePrint::Nori)

data = Nori.new(empty_tag_value: true).parse(<<XML)
<?xml version="1.0"?>
<root>
<objects>
<object>
<fields>
<field name="Name">The name</field>
<field name="Description">A description</field>
</fields>
</object>
</objects>
</root>
XML

ap data

The output is:

{
"root" => {
"objects" => {
"object" => {
"fields" => {
"field" => [
[0] ["The name", {"name"=>"Name"}],
[1] ["A description", {"name"=>"Description"}]
]
}
}
}
}
}
