class HTMLEntities


HTML entity encoding and decoding for Ruby

def decode(source)


Decode entities in a string into their UTF-8
equivalents. The string should already be in UTF-8 encoding.

Unknown named entities will not be converted.
def decode(source)
  (@decoder ||= Decoder.new(@flavor)).decode(source)
end
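
The decoding behaviour described above can be illustrated with a small, self-contained sketch. It is not the library's Decoder: `sketch_decode` and the tiny `SKETCH_NAMED` table are hypothetical names introduced here, and the real entity tables are much larger and depend on the flavor. The sketch only shows the documented behaviour that named and numeric entities become UTF-8 characters while unknown named entities are left as they are.

```ruby
# Hypothetical, deliberately tiny name table (assumption for illustration only).
SKETCH_NAMED = {
  'amp' => 38, 'lt' => 60, 'gt' => 62, 'quot' => 34, 'apos' => 39,
  'eacute' => 233
}.freeze

# Matches &name; or &#1234; or &#x12ab;
SKETCH_ENTITY = /&(?:([a-zA-Z][a-zA-Z0-9]*)|#(\d+)|#[xX]([0-9a-fA-F]+));/

def sketch_decode(source)
  source.gsub(SKETCH_ENTITY) do |match|
    name, dec, hex = $1, $2, $3
    codepoint =
      if name then SKETCH_NAMED[name]   # nil when the name is unknown
      elsif dec then dec.to_i
      else hex.to_i(16)
      end
    # Leave unknown names and out-of-range numbers untouched.
    if codepoint && codepoint <= 0x10FFFF
      [codepoint].pack('U')
    else
      match
    end
  end
end

puts sketch_decode('&lt;p&gt;caf&eacute; &amp; cr&#232;me &bogus;')
# prints: <p>café & crème &bogus;
```

Because the substitution is a single pass, a string such as `&amp;lt;` decodes to `&lt;` rather than being decoded twice.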

def encode(source, *instructions)


Encode codepoints into their corresponding entities. Various operations
are possible, and may be specified in order:

:basic :: Convert the five XML entities ('"<>&)
:named :: Convert non-ASCII characters to their named HTML 4.01 equivalent
:decimal :: Convert non-ASCII characters to decimal entities (e.g. &#1234;)
:hexadecimal :: Convert non-ASCII characters to hexadecimal entities (e.g. &#x12ab;)

You can specify the commands in any order, but they will be executed in
the order listed above to ensure that entity ampersands are not
clobbered and that named entities are replaced before numeric ones.

If no instructions are specified, :basic will be used.

Examples:
encode(str) - XML-safe
encode(str, :basic, :decimal) - XML-safe and 7-bit clean
encode(str, :basic, :named, :decimal) - 7-bit clean, with all
  non-ASCII characters replaced with their named entity where possible, and
  decimal equivalents otherwise.

Note: It is the program's responsibility to ensure that the source
contains valid UTF-8 before calling this method.
def encode(source, *instructions)
  Encoder.new(@flavor, instructions).encode(source)
end
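
The ordering rule for instructions can be sketched as follows. This is an illustration under stated assumptions, not the library's Encoder: `sketch_encode`, `SKETCH_ORDER`, `SKETCH_BASIC` and the one-entry `SKETCH_NAMES` table are hypothetical, and writing the apostrophe as `&apos;` is an assumption about the output. The point is that the caller's order is ignored and the steps always run as :basic, :named, :decimal, :hexadecimal, so ampersands produced by later steps are never escaped again.

```ruby
SKETCH_ORDER = [:basic, :named, :decimal, :hexadecimal].freeze

# Assumed replacements for the five XML characters.
SKETCH_BASIC = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
  '"' => '&quot;', "'" => '&apos;'
}.freeze

# Hypothetical one-entry name table; the real tables are far larger.
SKETCH_NAMES = { 'é' => '&eacute;' }.freeze

NON_ASCII = /[^\x00-\x7F]/

def sketch_encode(source, *instructions)
  instructions = [:basic] if instructions.empty?
  unknown = instructions - SKETCH_ORDER
  raise ArgumentError, "unknown instructions: #{unknown.inspect}" unless unknown.empty?

  result = source.dup
  # Walk the fixed order, not the order the caller supplied.
  SKETCH_ORDER.each do |step|
    next unless instructions.include?(step)
    result =
      case step
      when :basic       then result.gsub(/[&<>"']/) { |c| SKETCH_BASIC[c] }
      when :named       then result.gsub(NON_ASCII) { |c| SKETCH_NAMES.fetch(c, c) }
      when :decimal     then result.gsub(NON_ASCII) { |c| format('&#%d;', c.ord) }
      when :hexadecimal then result.gsub(NON_ASCII) { |c| format('&#x%x;', c.ord) }
      end
  end
  result
end

puts sketch_encode('<é>', :decimal, :basic)
# prints: &lt;&#233;&gt;
```

Had :decimal run before :basic, the result would contain `&amp;#233;`; running :named before :decimal is what lets `é` become `&eacute;` while characters without a name fall through to a numeric entity.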

def initialize(flavor='xhtml1')


Create a new HTMLEntities coder for the specified flavor.
Available flavors are 'html4', 'expanded' and 'xhtml1' (the default).

The only difference in functionality between html4 and xhtml1 is in the
handling of the apos (apostrophe) named entity, which is not defined in
HTML4.

'expanded' includes a large number of additional SGML entities drawn from
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MISC/SGML.TXT
That file "maps SGML character entities from various public sets (namely, ISOamsa,
ISOamsb, ISOamsc, ISOamsn, ISOamso, ISOamsr, ISObox, ISOcyr1, ISOcyr2,
ISOdia, ISOgrk1, ISOgrk2, ISOgrk3, ISOgrk4, ISOlat1, ISOlat2, ISOnum,
ISOpub, ISOtech, HTMLspecial, HTMLsymbol) to corresponding Unicode
characters." (sgml.txt).

'expanded' is a strict superset of the XHTML entities: every xhtml named
entity encodes and decodes the same under :expanded as under :xhtml1.
def initialize(flavor='xhtml1')
  @flavor = flavor.to_s.downcase
  raise UnknownFlavor, "Unknown flavor #{flavor}" unless FLAVORS.include?(@flavor)
end
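
The flavor check in the constructor can be tried in isolation with a stand-in class. `SketchCoder` is hypothetical; its `FLAVORS` list and `UnknownFlavor` class are assumptions standing in for the library's own constants, which are referenced but not defined in this section. The sketch shows that the flavor is normalised with `to_s.downcase`, so symbols and mixed-case strings are accepted, and that anything outside the list raises.

```ruby
class SketchCoder
  # Stand-ins for the constants the real class refers to (assumed values).
  class UnknownFlavor < StandardError; end
  FLAVORS = %w[html4 xhtml1 expanded].freeze

  attr_reader :flavor

  def initialize(flavor = 'xhtml1')
    @flavor = flavor.to_s.downcase
    raise UnknownFlavor, "Unknown flavor #{flavor}" unless FLAVORS.include?(@flavor)
  end
end

puts SketchCoder.new.flavor             # prints: xhtml1
puts SketchCoder.new(:Expanded).flavor  # prints: expanded

begin
  SketchCoder.new('html5')
rescue SketchCoder::UnknownFlavor => e
  puts e.message                        # prints: Unknown flavor html5
end
```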