class Nokogiri::XML::SAX::Document


:markup: markdown

The SAX::Document class is used for registering types of events you are interested in
handling. All of the methods on this class are available as possible events while parsing an
XML document. To register for any particular event, subclass this class and implement the
methods you are interested in knowing about.

To only be notified about start and end element events, write a class like this:

    class MyHandler < Nokogiri::XML::SAX::Document
      def start_element name, attrs = []
        puts "#{name} started!"
      end

      def end_element name
        puts "#{name} ended"
      end
    end

You can use this event handler for any SAX-style parser included with Nokogiri.

See also:

- Nokogiri::XML::SAX
- Nokogiri::HTML4::SAX

### Entity Handling

⚠ Entity handling is complicated in a SAX parser! Please read this section carefully if
you're not getting the behavior you expect.

Entities will be reported to the user via callbacks to #characters, to #reference, or
possibly to both. The behavior is determined by a combination of _entity type_ and the value
of ParserContext#replace_entities. (Recall that the default value of
ParserContext#replace_entities is `false`.)

It is UNSAFE to set ParserContext#replace_entities to `true` when parsing untrusted
documents.

💡 For more information on entity types, see [Wikipedia's page on
DTDs](https://en.wikipedia.org/wiki/Document_type_definition#Entity_declarations).

| Entity type | #characters | #reference |
|---|---|---|
| Char ref (e.g., `&#8217;`) | always | never |
| Predefined (e.g., `&amp;`) | always | never |
| Undeclared † | never | #replace_entities == false |
| Internal | always | #replace_entities == false |
| External † | #replace_entities == true | #replace_entities == false |

† In the case where the replacement text for the entity is unknown (e.g., an undeclared entity
or an external entity that could not be resolved because of network issues), then the
replacement text will not be reported. If ParserContext#replace_entities is `true`, this
means the #characters callback will not be invoked. If ParserContext#replace_entities is
`false`, then the #reference callback will be invoked, but with `nil` for the `content`
argument.
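The entity-handling table in this section can be transcribed into a small lookup, which may help when reasoning about which callbacks to implement. This is only a sketch: `ENTITY_CALLBACK_RULES` and `expected_callbacks` are hypothetical names and are not part of Nokogiri. The footnote about unknown replacement text still applies on top of what the lookup returns.

```ruby
# Which callbacks fire for each entity type, transcribed from the table in this section.
# Each rule is :always, :never, or the value of replace_entities that enables the callback.
ENTITY_CALLBACK_RULES = {
  char_ref:   { characters: :always, reference: :never },
  predefined: { characters: :always, reference: :never },
  undeclared: { characters: :never,  reference: false },
  internal:   { characters: :always, reference: false },
  external:   { characters: true,    reference: false },
}.freeze

# Hypothetical helper: returns the callbacks the table predicts for an entity type.
def expected_callbacks(entity_type, replace_entities: false)
  rules = ENTITY_CALLBACK_RULES.fetch(entity_type)
  rules.select { |_callback, rule| rule == :always || rule == replace_entities }.keys
end

p expected_callbacks(:internal)                         # [:characters, :reference]
p expected_callbacks(:external, replace_entities: true) # [:characters]
```

With the default `replace_entities` of `false`, an internal entity maps to both callbacks, which matches the warning on #reference.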

def cdata_block(string)

Called when CDATA blocks are found.
[Parameters]
- +string+ contains the CDATA content
def cdata_block(string)
end

def characters(string)


Called when character data is parsed, and for parsed entities when
ParserContext#replace_entities is +true+.

[Parameters]
- +string+ contains the character data or entity replacement text

⚠ Please see Document@Entity+Handling for important information about how entities are handled.

⚠ This method might be called multiple times for a contiguous string of characters.
def characters(string)
end
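Because #characters may be called several times for one contiguous run of text, a handler usually needs to buffer. The sketch below shows one way to do that. `TextCollector` is a hypothetical name, and the callbacks are invoked by hand so the example runs without a parser; in real use the class would subclass Nokogiri::XML::SAX::Document and be driven by a SAX parser.

```ruby
# Hypothetical handler that joins split #characters calls into one string per element.
# In real use this would subclass Nokogiri::XML::SAX::Document.
class TextCollector
  attr_reader :texts

  def initialize
    @texts = []
    @buffer = +""
  end

  def start_element(name, attrs = [])
    @buffer = +"" # start a fresh buffer for each element
  end

  def characters(string)
    @buffer << string # may be only a fragment of the text
  end

  def end_element(name)
    @texts << [name, @buffer]
    @buffer = +""
  end
end

# Simulate a parser delivering one text run in two pieces.
collector = TextCollector.new
collector.start_element("p")
collector.characters("Hello, ")
collector.characters("world")
collector.end_element("p")
p collector.texts # [["p", "Hello, world"]]
```

This simple version only keeps the text that directly precedes each closing tag; mixed content in nested elements would need a stack of buffers.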

def comment(string)

Called when comments are encountered
[Parameters]
- +string+ contains the comment data
def comment(string)
end

def end_document

Called when the document finishes parsing.
def end_document
end

def end_element(name)


Called at the end of an element.

[Parameters]
- +name+ (String) the name of the element being closed
def end_element(name)
end

def end_element_namespace(name, prefix = nil, uri = nil)


Called at the end of an element.

[Parameters]
- +name+ (String) is the name of the element
- +prefix+ (String, nil) is the namespace prefix for the element
- +uri+ (String, nil) is the associated URI for the element's namespace
def end_element_namespace(name, prefix = nil, uri = nil)
  # Deal with SAX v1 interface
  end_element([prefix, name].compact.join(":"))
end
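The default implementation above falls back to #end_element with a prefixed name. The expression it uses can be tried in isolation, as in this small sketch; `qualified_name` is a hypothetical helper name, not a Nokogiri method.

```ruby
# Same expression as the fallback above: nil prefixes are dropped by #compact,
# so unprefixed elements keep their bare name.
def qualified_name(name, prefix = nil)
  [prefix, name].compact.join(":")
end

p qualified_name("bar", "foo") # "foo:bar"
p qualified_name("root")       # "root"
```

Note that the namespace URI is not part of the name passed to #end_element, so two different prefixes bound to the same URI produce different strings.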

def error(string)

Called on document errors
[Parameters]
- +string+ contains the error
def error(string)
end
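Since #error (and #warning) do nothing by default, a handler that wants to report problems has to record them itself. A minimal sketch, assuming you only need the message strings: `DiagnosticsHandler` is a hypothetical name, the messages are made up for illustration, and the callbacks are invoked by hand instead of by a parser. In real use the class would subclass Nokogiri::XML::SAX::Document.

```ruby
# Hypothetical handler that stores diagnostics for inspection after parsing.
# In real use this would subclass Nokogiri::XML::SAX::Document.
class DiagnosticsHandler
  attr_reader :errors, :warnings

  def initialize
    @errors = []
    @warnings = []
  end

  def error(string)
    @errors << string.strip
  end

  def warning(string)
    @warnings << string.strip
  end

  def ok?
    @errors.empty?
  end
end

handler = DiagnosticsHandler.new
handler.warning("example warning text\n") # made-up message
handler.error("example error text\n")     # made-up message
p handler.ok?    # false
p handler.errors # ["example error text"]
```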

def processing_instruction(name, content)

Called when processing instructions are found
[Parameters]
- +name+ is the target of the instruction
- +content+ is the value of the instruction
def processing_instruction(name, content)
end
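The +content+ argument arrives as a single string. Some processing instructions, such as `xml-stylesheet`, conventionally carry attribute-like pairs in it, and a handler might want to split them. The sketch below shows one way; `pseudo_attributes` is a hypothetical helper, the regular expression only handles simple quoted values, and the `to_s` guard is a defensive assumption, not documented behavior.

```ruby
# Hypothetical helper: split 'key="value"' pairs out of a processing instruction's content.
def pseudo_attributes(content)
  pairs = content.to_s.scan(/([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/)
  pairs.map { |key, double_quoted, single_quoted| [key, double_quoted || single_quoted] }.to_h
end

attrs = pseudo_attributes(%q(type="text/xsl" href='style.xsl'))
puts attrs["type"] # text/xsl
puts attrs["href"] # style.xsl
```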

def reference(name, content)


Called when a parsed entity is referenced and not replaced.

[Parameters]
- +name+ (String) is the name of the entity
- +content+ (String, nil) is the replacement text for the entity, if known

⚠ Please see Document@Entity+Handling for important information about how entities are handled.

⚠ An internal entity may result in a call to both #characters and #reference.

Since v1.17.0
def reference(name, content)
end
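Because +content+ can be +nil+ when the replacement text is unknown, a #reference implementation should not assume a String. The sketch below records each reference and flags unresolved ones. `EntityLogger` is a hypothetical name, the entity names and replacement text are made up, and the callback is invoked by hand; in real use the class would subclass Nokogiri::XML::SAX::Document.

```ruby
# Hypothetical handler that tracks entity references and whether their text was known.
# In real use this would subclass Nokogiri::XML::SAX::Document.
class EntityLogger
  attr_reader :references

  def initialize
    @references = []
  end

  def reference(name, content)
    @references << { name: name, content: content, resolved: !content.nil? }
  end

  # Names of entities whose replacement text was not reported.
  def unresolved
    @references.reject { |ref| ref[:resolved] }.map { |ref| ref[:name] }
  end
end

logger = EntityLogger.new
logger.reference("company", "Example Corp") # made-up internal entity
logger.reference("missing", nil)            # made-up undeclared entity
p logger.unresolved # ["missing"]
```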

def start_document

Called when the document starts parsing.
def start_document
end

def start_element(name, attrs = [])


Called at the beginning of an element.

[Parameters]
- +name+ (String) the name of the element
- +attrs+ (Array<Array<String>>) an assoc list of namespace declarations and attributes, e.g.:
    [ ["xmlns:foo", "http://sample.net"], ["size", "large"] ]

💡If you're dealing with XML and need to handle namespaces, use the
#start_element_namespace method instead.

Note that the element namespace and any attribute namespaces are not provided, and so any
namespaced elements or attributes will be returned as strings including the prefix:

    parser.parse(<<~XML)
      <root xmlns:foo="http://foo.example.com/" xmlns="http://example.com/">
        <foo:bar foo:quux="xxx">hello world</foo:bar>
      </root>
    XML

    assert_pattern do
      parser.document.start_elements => [
        ["root", [["xmlns:foo", "http://foo.example.com/"], ["xmlns", "http://example.com/"]]],
        ["foo:bar", [["foo:quux", "xxx"]]],
      ]
    end
def start_element(name, attrs = [])
end
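The +attrs+ assoc list mixes namespace declarations with ordinary attributes. A handler that wants them apart could split on the `xmlns` prefix, as in this sketch. It uses only core Ruby and the sample value from the parameter description; the variable names are illustrative.

```ruby
# The sample assoc list from the +attrs+ description.
attrs = [["xmlns:foo", "http://sample.net"], ["size", "large"]]

# Separate namespace declarations ("xmlns" or "xmlns:prefix") from regular attributes.
declarations, attributes = attrs.partition do |key, _value|
  key == "xmlns" || key.start_with?("xmlns:")
end

# Convert each half to a Hash for lookup by name.
declarations = declarations.to_h
attributes = attributes.to_h

puts declarations["xmlns:foo"] # http://sample.net
puts attributes["size"]        # large
```

Converting to a Hash discards ordering guarantees you may care about and would collapse duplicate keys, so keep the original array if either matters.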

def start_element_namespace(name, attrs = [], prefix = nil, uri = nil, ns = []) # rubocop:disable Metrics/ParameterLists

Called at the beginning of an element.

[Parameters]
- +name+ (String) is the name of the element
- +attrs+ (Array<Attribute>) is an array of structs with the following properties:
  - +localname+ (String) the local name of the attribute
  - +value+ (String) the value of the attribute
  - +prefix+ (String, nil) the namespace prefix of the attribute
  - +uri+ (String, nil) the namespace URI of the attribute
- +prefix+ (String, nil) is the namespace prefix for the element
- +uri+ (String, nil) is the associated URI for the element's namespace
- +ns+ (Array<Array<String>>) is an assoc list of namespace declarations on the element

💡If you're dealing with HTML or don't care about namespaces, try #start_element instead.

[Example]
    it "start_elements_namespace is called with namespaced attributes" do
      parser.parse(<<~XML)
        <root xmlns:foo="http://foo.example.com/">
          <foo:a foo:bar="hello" />
        </root>
      XML

      assert_pattern do
        parser.document.start_elements_namespace => [
          [
            "root",
            [],
            nil, nil,
            [["foo", "http://foo.example.com/"]], # namespace declarations
          ], [
            "a",
            [Nokogiri::XML::SAX::Parser::Attribute(localname: "bar", prefix: "foo", uri: "http://foo.example.com/", value: "hello")], # prefixed attribute
            "foo", "http://foo.example.com/", # prefix and uri for the "a" element
            [],
          ]
        ]
      end
    end
def start_element_namespace(name, attrs = [], prefix = nil, uri = nil, ns = []) # rubocop:disable Metrics/ParameterLists
  # Deal with SAX v1 interface
  name = [prefix, name].compact.join(":")
  attributes = ns.map do |ns_prefix, ns_uri|
    [["xmlns", ns_prefix].compact.join(":"), ns_uri]
  end + attrs.map do |attr|
    [[attr.prefix, attr.localname].compact.join(":"), attr.value]
  end
  start_element(name, attributes)
end
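The fallback above flattens namespace-aware arguments into the assoc list that #start_element expects. The sketch below reproduces that mapping outside the class so it can be tried without a parser. `DemoAttribute` is a hypothetical stand-in for Nokogiri::XML::SAX::Parser::Attribute, `to_v1_attributes` is a hypothetical helper name, and representing a default namespace declaration with a +nil+ prefix is an assumption based on the use of #compact in the code above.

```ruby
# Hypothetical stand-in for Nokogiri::XML::SAX::Parser::Attribute.
DemoAttribute = Struct.new(:localname, :prefix, :uri, :value)

# Mirrors the mapping in the default #start_element_namespace shown above.
def to_v1_attributes(attrs, ns)
  # Namespace declarations become "xmlns" or "xmlns:prefix" entries.
  declarations = ns.map do |ns_prefix, ns_uri|
    [["xmlns", ns_prefix].compact.join(":"), ns_uri]
  end
  # Attributes become "prefix:localname" (or just "localname") entries.
  plain = attrs.map do |attr|
    [[attr.prefix, attr.localname].compact.join(":"), attr.value]
  end
  declarations + plain
end

attrs = [DemoAttribute.new("bar", "foo", "http://foo.example.com/", "hello")]
ns = [["foo", "http://foo.example.com/"], [nil, "http://example.com/"]] # nil prefix: assumed default namespace
p to_v1_attributes(attrs, ns)
# [["xmlns:foo", "http://foo.example.com/"], ["xmlns", "http://example.com/"], ["foo:bar", "hello"]]
```

The attribute's namespace URI is dropped in the conversion, which is why #start_element only sees prefixed names.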

def warning(string)

Called on document warnings
[Parameters]
- +string+ contains the warning
def warning(string)
end

def xmldecl(version, encoding, standalone)

Called when an XML declaration is parsed.

[Parameters]
- +version+ (String) the version attribute
- +encoding+ (String, nil) the encoding of the document if present, else +nil+
- +standalone+ ("yes", "no", nil) the standalone attribute if present, else +nil+
def xmldecl(version, encoding, standalone)
end
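Since +standalone+ arrives as "yes", "no", or +nil+, a handler may prefer to normalize it before storing. A minimal sketch: `DeclarationRecorder` is a hypothetical name, and the callback is invoked by hand with made-up values; in real use the class would subclass Nokogiri::XML::SAX::Document.

```ruby
# Hypothetical handler that records the XML declaration in a normalized form.
# In real use this would subclass Nokogiri::XML::SAX::Document.
class DeclarationRecorder
  STANDALONE = { "yes" => true, "no" => false }.freeze

  attr_reader :declaration

  def xmldecl(version, encoding, standalone)
    @declaration = {
      version: version,
      encoding: encoding,                  # nil when the declaration omits it
      standalone: STANDALONE[standalone],  # true, false, or nil when absent
    }
  end
end

recorder = DeclarationRecorder.new
recorder.xmldecl("1.0", "UTF-8", "yes") # made-up values
p recorder.declaration[:standalone] # true

recorder.xmldecl("1.0", nil, nil)
p recorder.declaration[:standalone] # nil
```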