class HTMLEntities
HTML entity encoding and decoding for Ruby
def decode(source)
Decode entities in a string into their UTF-8
equivalents. The string should already be in UTF-8 encoding.
Unknown named entities will not be converted.
def decode(source)
  (@decoder ||= Decoder.new(@flavor)).decode(source)
end
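The decoding behaviour described above can be illustrated with a minimal, standard-library-only sketch. It is not the library's Decoder: sketch_decode and SKETCH_ENTITIES are hypothetical names, and the entity table is deliberately tiny. It shows numeric and named references being replaced while an unknown named entity is left untouched.

```ruby
# Hypothetical, deliberately tiny entity table (name => code point).
# The real library ships complete per-flavor tables.
SKETCH_ENTITIES = {
  'amp' => 38, 'lt' => 60, 'gt' => 62, 'quot' => 34, 'eacute' => 233
}.freeze

# Sketch only: replaces hexadecimal, decimal and known named references
# with their UTF-8 characters. Out-of-range code points are not validated.
def sketch_decode(source)
  source.gsub(/&(?:#x([0-9a-fA-F]+)|#([0-9]+)|([a-zA-Z][a-zA-Z0-9]*));/) do
    match = Regexp.last_match
    if match[1]
      [match[1].to_i(16)].pack('U')        # hexadecimal reference
    elsif match[2]
      [match[2].to_i(10)].pack('U')        # decimal reference
    elsif SKETCH_ENTITIES.key?(match[3])
      [SKETCH_ENTITIES[match[3]]].pack('U') # known named entity
    else
      match[0]                              # unknown named entity: keep as-is
    end
  end
end
```

With this sketch, a reference such as &bogus; passes through unchanged, which mirrors the rule that unknown named entities are not converted.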
def encode(source, *instructions)
Encode codepoints into their corresponding entities. Various operations
are possible, and may be specified in order:
:basic :: Convert the five XML entities ('"<>&)
:named :: Convert non-ASCII characters to their named HTML 4.01 equivalent
:decimal :: Convert non-ASCII characters to decimal entities (e.g. &#1234;)
:hexadecimal :: Convert non-ASCII characters to hexadecimal entities (e.g. &#x12ab;)
You can specify the commands in any order, but they will be executed in
the order listed above to ensure that entity ampersands are not
clobbered and that named entities are replaced before numeric ones.
If no instructions are specified, :basic will be used.
Examples:
encode(str) - XML-safe
encode(str, :basic, :decimal) - XML-safe and 7-bit clean
encode(str, :basic, :named, :decimal) - 7-bit clean, with all
non-ASCII characters replaced with their named entity where possible, and
decimal equivalents otherwise.
Note: It is the program's responsibility to ensure that the source
contains valid UTF-8 before calling this method.
def encode(source, *instructions)
  Encoder.new(@flavor, instructions).encode(source)
end
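The fixed execution order is easier to see in a small sketch. The following is not the library's Encoder: sketch_encode, SKETCH_ORDER, SKETCH_BASIC and SKETCH_NAMED are hypothetical names, the named table holds a single entry, and writing the apostrophe as &#39; is an assumption made for the sketch. It shows why :basic has to run first: the ampersands introduced by later steps are never escaped again.

```ruby
# Canonical execution order, regardless of the order the caller used.
SKETCH_ORDER = [:basic, :named, :decimal, :hexadecimal].freeze

# Assumption for this sketch: the apostrophe is written as &#39;.
SKETCH_BASIC = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&#39;'
}.freeze

# Hypothetical one-entry named table; a real table is far larger.
SKETCH_NAMED = { "\u00e9" => '&eacute;' }.freeze

def sketch_encode(source, *instructions)
  instructions = [:basic] if instructions.empty?
  unknown = instructions - SKETCH_ORDER
  raise ArgumentError, "unknown instructions: #{unknown.inspect}" unless unknown.empty?

  result = source.dup
  # Array#& keeps the order of SKETCH_ORDER, not the caller's order.
  (SKETCH_ORDER & instructions).each do |step|
    result =
      case step
      when :basic       then result.gsub(/[&<>"']/) { |c| SKETCH_BASIC[c] }
      when :named       then result.gsub(/[^\x00-\x7F]/) { |c| SKETCH_NAMED.fetch(c, c) }
      when :decimal     then result.gsub(/[^\x00-\x7F]/) { |c| format('&#%d;', c.ord) }
      when :hexadecimal then result.gsub(/[^\x00-\x7F]/) { |c| format('&#x%x;', c.ord) }
      end
  end
  result
end
```

If :basic ran after :named in this sketch, the ampersand in &eacute; would itself be escaped; running the steps in the canonical order avoids that.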
def initialize(flavor='xhtml1')
Create a new HTMLEntities coder for the specified flavor.
Available flavors are 'html4', 'expanded' and 'xhtml1' (the default).
The only difference in functionality between html4 and xhtml1 is in the
handling of the apos (apostrophe) named entity, which is not defined in
HTML4.
'expanded' includes a large number of additional SGML entities drawn from
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MISC/SGML.TXT
it "maps SGML character entities from various public sets (namely, ISOamsa,
ISOamsb, ISOamsc, ISOamsn, ISOamso, ISOamsr, ISObox, ISOcyr1, ISOcyr2,
ISOdia, ISOgrk1, ISOgrk2, ISOgrk3, ISOgrk4, ISOlat1, ISOlat2, ISOnum,
ISOpub, ISOtech, HTMLspecial, HTMLsymbol) to corresponding Unicode
characters." (sgml.txt).
'expanded' is a strict superset of the XHTML entities: every xhtml named
entity encodes and decodes the same under :expanded as under :xhtml1.
def initialize(flavor='xhtml1')
  @flavor = flavor.to_s.downcase
  raise UnknownFlavor, "Unknown flavor #{flavor}" unless FLAVORS.include?(@flavor)
end
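The html4/xhtml1 difference can be sketched with two small lookup tables. This is an illustration only, not the library's implementation: sketch_decode_named, SKETCH_COMMON and SKETCH_TABLES are hypothetical names, only a handful of entities are included, and the 'expanded' flavor is omitted. The flavor is normalized the same way as in the constructor above (to_s.downcase).

```ruby
# Entities shared by both sketched flavors.
SKETCH_COMMON = { 'amp' => '&', 'lt' => '<', 'gt' => '>', 'quot' => '"' }.freeze

# Hypothetical per-flavor tables: apos exists only in the xhtml1 table.
SKETCH_TABLES = {
  'html4'  => SKETCH_COMMON,
  'xhtml1' => SKETCH_COMMON.merge('apos' => "'")
}.freeze

def sketch_decode_named(source, flavor = 'xhtml1')
  # Normalize like the constructor: accept symbols and mixed case.
  key = flavor.to_s.downcase
  table = SKETCH_TABLES.fetch(key) { raise ArgumentError, "Unknown flavor #{flavor}" }
  # whole is e.g. "&apos;"; whole[1..-2] strips the "&" and ";".
  # Names missing from the chosen table are left untouched.
  source.gsub(/&[a-z]+;/) { |whole| table.fetch(whole[1..-2], whole) }
end
```

Under the sketched 'html4' table the apos reference is treated like any other unknown name and is left in place, while under 'xhtml1' it becomes an apostrophe.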