class Nokogiri::XML::SAX::Document
:markup: markdown

The SAX::Document class is used for registering types of events you are interested in
handling. All of the methods on this class are available as possible events while parsing an
XML document. To register for any particular event, subclass this class and implement the
methods you are interested in knowing about.

To only be notified about start and end element events, write a class like this:

    class MyHandler < Nokogiri::XML::SAX::Document
      def start_element name, attrs = []
        puts "#{name} started!"
      end

      def end_element name
        puts "#{name} ended"
      end
    end

You can use this event handler for any SAX-style parser included with Nokogiri.

See also:

- Nokogiri::XML::SAX
- Nokogiri::HTML4::SAX

### Entity Handling

⚠ Entity handling is complicated in a SAX parser! Please read this section carefully if
you're not getting the behavior you expect.

Entities will be reported to the user via callbacks to #characters, to #reference, or
possibly to both. The behavior is determined by a combination of _entity type_ and the value
of ParserContext#replace_entities. (Recall that the default value of
ParserContext#replace_entities is `false`.)

⚠ It is UNSAFE to set ParserContext#replace_entities to `true` when parsing untrusted
documents.

💡 For more information on entity types, see [Wikipedia's page on
DTDs](https://en.wikipedia.org/wiki/Document_type_definition#Entity_declarations).

| Entity type                 | #characters                | #reference                  |
|-----------------------------|----------------------------|-----------------------------|
| Char ref (e.g., `&#146;`)   | always                     | never                       |
| Predefined (e.g., `&amp;`)  | always                     | never                       |
| Undeclared †                | never                      | `#replace_entities == false` |
| Internal                    | always                     | `#replace_entities == false` |
| External †                  | `#replace_entities == true` | `#replace_entities == false` |

† In the case where the replacement text for the entity is unknown (e.g., an undeclared entity
or an external entity that could not be resolved because of network issues), then the
replacement text will not be reported. If ParserContext#replace_entities is `true`, this
means the #characters callback will not be invoked. If ParserContext#replace_entities is
`false`, then the #reference callback will be invoked, but with `nil` for the `content`
argument.
def cdata_block(string)
##
Called when CDATA blocks are found.
[Parameters]
- +string+ contains the CDATA content
def cdata_block(string)
end
def characters(string)
##
Called when character data is parsed, and for parsed entities when
ParserContext#replace_entities is +true+.
[Parameters]
- +string+ contains the character data or entity replacement text
⚠ Please see Document@Entity+Handling for important information about how entities are handled.
⚠ This method might be called multiple times for a contiguous string of characters.
def characters(string)
end
def comment(string)
##
Called when comments are encountered.
[Parameters]
- +string+ contains the comment data
def comment(string)
end
def end_document
##
Called when the document finishes parsing.
def end_document
end
def end_element(name)
##
Called at the end of an element.
[Parameters]
- +name+ (String) the name of the element being closed
def end_element(name)
end
def end_element_namespace(name, prefix = nil, uri = nil)
##
Called at the end of an element.
[Parameters]
- +name+ (String) is the name of the element
- +prefix+ (String, nil) is the namespace prefix for the element
- +uri+ (String, nil) is the associated URI for the element's namespace
def end_element_namespace(name, prefix = nil, uri = nil)
  # Deal with SAX v1 interface
  end_element([prefix, name].compact.join(":"))
end
def error(string)
##
Called on document errors.
[Parameters]
- +string+ contains the error message
def error(string)
end
def processing_instruction(name, content)
##
Called when processing instructions are found.
[Parameters]
- +name+ is the target of the instruction
- +content+ is the value of the instruction
def processing_instruction(name, content)
end
def reference(name, content)
##
Called when a parsed entity is referenced and not replaced.
[Parameters]
- +name+ (String) is the name of the entity
- +content+ (String, nil) is the replacement text for the entity, if known
⚠ Please see Document@Entity+Handling for important information about how entities are handled.
⚠ An internal entity may result in a call to both #characters and #reference.
Since v1.17.0
def reference(name, content)
end
def start_document
##
Called when the document starts parsing.
def start_document
end
def start_element(name, attrs = [])
##
Called at the beginning of an element.
[Parameters]
- +name+ (String) the name of the element
- +attrs+ (Array<Array<String>>) an assoc list of namespace declarations and attributes, e.g.:
    [ ["xmlns:foo", "http://sample.net"], ["size", "large"] ]
💡 If you're dealing with XML and need to handle namespaces, use the
#start_element_namespace method instead.
Note that the element namespace and any attribute namespaces are not provided, and so any
namespaced elements or attributes will be returned as strings including the prefix:
    parser.parse(<<~XML)
      <root xmlns:foo="http://foo.example.com/" xmlns="http://example.com/">
        <foo:bar foo:quux="xxx"/>
      </root>
    XML

    assert_pattern do
      parser.document.start_elements => [
        ["root", [["xmlns:foo", "http://foo.example.com/"], ["xmlns", "http://example.com/"]]],
        ["foo:bar", [["foo:quux", "xxx"]]],
      ]
    end
def start_element(name, attrs = [])
end
def start_element_namespace(name, attrs = [], prefix = nil, uri = nil, ns = []) # rubocop:disable Metrics/ParameterLists
##
Called at the beginning of an element.
[Parameters]
- +name+ (String) is the name of the element
- +attrs+ (Array<Nokogiri::XML::SAX::Parser::Attribute>) an array of structs with the following properties:
  - +localname+ (String) the local name of the attribute
  - +value+ (String) the value of the attribute
  - +prefix+ (String, nil) the namespace prefix of the attribute
  - +uri+ (String, nil) the namespace URI of the attribute
- +prefix+ (String, nil) is the namespace prefix for the element
- +uri+ (String, nil) is the associated URI for the element's namespace
- +ns+ (Array<Array<String>>) an assoc list of the namespace declarations on the element, as prefix and URI pairs
💡 If you're dealing with HTML or don't care about namespaces, try #start_element instead.
[Example]
    it "start_elements_namespace is called with namespaced attributes" do
      parser.parse(<<~XML)
        <root xmlns:foo="http://foo.example.com/">
          <foo:a foo:bar="hello"/>
        </root>
      XML

      assert_pattern do
        parser.document.start_elements_namespace => [
          [
            "root",
            [],
            nil, nil,
            [["foo", "http://foo.example.com/"]], # namespace declarations
          ], [
            "a",
            [Nokogiri::XML::SAX::Parser::Attribute(localname: "bar", prefix: "foo", uri: "http://foo.example.com/", value: "hello")], # prefixed attribute
            "foo", "http://foo.example.com/", # prefix and uri for the "a" element
            [],
          ]
        ]
      end
    end
def start_element_namespace(name, attrs = [], prefix = nil, uri = nil, ns = []) # rubocop:disable Metrics/ParameterLists
  # Deal with SAX v1 interface
  name = [prefix, name].compact.join(":")
  attributes = ns.map do |ns_prefix, ns_uri|
    [["xmlns", ns_prefix].compact.join(":"), ns_uri]
  end + attrs.map do |attr|
    [[attr.prefix, attr.localname].compact.join(":"), attr.value]
  end
  start_element(name, attributes)
end
def warning(string)
##
Called on document warnings.
[Parameters]
- +string+ contains the warning message
def warning(string)
end
def xmldecl(version, encoding, standalone)
##
Called when an \XML declaration is parsed.
[Parameters]
- +version+ (String) the version attribute
- +encoding+ (String, nil) the encoding of the document if present, else +nil+
- +standalone+ (String, nil) the standalone attribute ("yes" or "no") if present, else +nil+
def xmldecl(version, encoding, standalone)
end