= Tokens

The Tokens class represents a list of tokens returned from
a Scanner.

A token is not a special object, just a two-element Array
consisting of
* the token text (the original source of the token in a String) or
  a token action (begin_group, end_group, begin_line, end_line)
* the token kind (a Symbol representing the type of the token)

A token looks like this:

  ['# It looks like this', :comment]
  ['3.1415926', :float]
  ['$^', :error]

Some scanners also yield sub-tokens, represented by special
token actions, namely begin_group and end_group.

The Ruby scanner, for example, splits "a string" into:

  [
    [:begin_group, :string],
    ['"', :delimiter],
    ['a string', :content],
    ['"', :delimiter],
    [:end_group, :string]
  ]

Tokens is the interface between Scanners and Encoders:
the input is split and saved into a Tokens object, and the Encoder
then builds the output from this object.

Thus, the syntax below becomes clear:

  CodeRay.scan('price = 2.59', :ruby).html
  # the Tokens object is here -------^

See how small it is? ;)

Tokens gives you the power to handle pre-scanned code very easily:
you can convert it to a webpage, a YAML file, or dump it into a
gzipped string that you put in your DB.

It also allows you to generate tokens directly (without using a scanner),
to load them from a file, and still use any Encoder that CodeRay provides.

class CodeRay::Tokens < Array
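As a minimal sketch of the interface defined below (assuming the :html
encoder plugin is loadable), tokens can be generated by hand and then
encoded like any scanner output:

  require 'coderay'

  tokens = CodeRay::Tokens.new
  tokens.begin_group :string
  tokens.text_token '"', :delimiter
  tokens.text_token 'a string', :content
  tokens.text_token '"', :delimiter
  tokens.end_group :string

  tokens.encode :html  # same as tokens.html, see #method_missing below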
Starts a token group of the given kind.

def begin_group kind
  self << :begin_group << kind
end
Starts a line group of the given kind.

def begin_line kind
  self << :begin_line << kind
end
Returns the actual number of tokens: each token is stored flat as two
Array elements (text and kind), so this is half the Array size.

def count
  size / 2
end
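A small sketch of the flat storage, using #text_token as defined below:

  tokens = CodeRay::Tokens.new
  tokens.text_token '1', :integer
  tokens.text_token '+', :operator
  tokens.size   # => 4, the flat Array elements
  tokens.count  # => 2, the actual tokens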
Dumps the object into a String that can be saved
in files or databases.

The dump is created with Marshal.dump;
in addition, it is gzipped using GZip.gzip.

The returned String object includes Undumping,
so it has an #undump method. See Tokens.load.

You can configure the level of compression,
but the default value 7 should be what you want
in most cases as it is a good compromise between
speed and compression rate.

def dump gzip_level = 7
  dump = Marshal.dump self
  dump = GZip.gzip dump, gzip_level
  dump.extend Undumping
end
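A round-trip sketch, assuming Tokens.load (referenced above, defined
elsewhere in CodeRay) as the counterpart of #dump:

  tokens = CodeRay.scan('1 + 1', :ruby).tokens
  blob   = tokens.dump               # gzipped Marshal dump, fit for a DB column
  again  = CodeRay::Tokens.load blob
  again == tokens                    # => true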
Encode the tokens using encoder.

encoder can be
* a symbol like :html or :statistic
* an Encoder class
* an Encoder object

def encode encoder, options = {}
  unless encoder.is_a? Encoders::Encoder
    # Resolve a plugin name like :html to its class; an Encoder class passes through.
    encoder_class = encoder.respond_to?(:to_sym) ? Encoders[encoder] : encoder
    encoder = encoder_class.new options
  end
  encoder.encode_tokens self, options
end
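The three accepted forms, sketched with the HTML encoder; all are
equivalent here:

  tokens.encode :html
  tokens.encode CodeRay::Encoders::HTML
  tokens.encode CodeRay::Encoders::HTML.new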
Resolves the encoder plugin by name and encodes the tokens with it.

def encode_with encoder, options = {}
  Encoders[encoder].new(options).encode_tokens self
end
Ends a token group of the given kind.

def end_group kind
  self << :end_group << kind
end
Ends a line group of the given kind.

def end_line kind
  self << :end_line << kind
end
Ensure that all begin_group tokens have a corresponding end_group.

def fix
  raise NotImplementedError, 'Tokens#fix needs to be rewritten.'
  # tokens = self.class.new
  # # Check token nesting using a stack of kinds.
  # opened = []
  # for type, kind in self
  #   case type
  #   when :begin_group
  #     opened.push [:begin_group, kind]
  #   when :begin_line
  #     opened.push [:end_line, kind]
  #   when :end_group, :end_line
  #     expected = opened.pop
  #     if [type, kind] != expected
  #       # Unexpected end; decide what to do based on the kind:
  #       # - token was never opened: delete the end (just skip it)
  #       next unless opened.rindex expected
  #       # - token was opened earlier: also close tokens in between
  #       tokens << token until (token = opened.pop) == expected
  #     end
  #   end
  #   tokens << [type, kind]
  # end
  # # Close remaining opened tokens
  # tokens << token while token = opened.pop
  # tokens
end
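Since #fix currently raises, only a data sketch of the intended behavior:
an unbalanced flat stream like

  [:begin_group, :string, '"', :delimiter]

would be completed to

  [:begin_group, :string, '"', :delimiter, :end_group, :string]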
Replaces self with the fixed version, see #fix.

def fix!
  replace fix
end
Redirects unknown methods to encoder calls.

For example, if you call +tokens.html+, the HTML encoder
is used to highlight the tokens.

def method_missing meth, options = {}
  encode_with meth, options
rescue PluginHost::PluginNotFound
  super
end
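Sketched usage; the :statistic encoder is mentioned under #encode, and
+no_such_encoder+ is a hypothetical name:

  tokens.html             # == tokens.encode_with :html
  tokens.statistic        # == tokens.encode_with :statistic
  tokens.no_such_encoder  # raises NoMethodError via super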
Returns the tokens compressed by joining consecutive
tokens of the same kind.

This can not be undone, but should yield the same output
in most Encoders. It basically makes the output smaller.

Combined with dump, it saves space for the cost of time.

If the scanner is written carefully, this is not required -
for example, consecutive //-comment lines could already be
joined into one comment token by the scanner.

def optimize
  raise NotImplementedError, 'Tokens#optimize needs to be rewritten.'
  # last_kind = last_text = nil
  # new = self.class.new
  # for text, kind in self
  #   if text.is_a? String
  #     if kind == last_kind
  #       last_text << text
  #     else
  #       new << [last_text, last_kind] if last_kind
  #       last_text = text
  #       last_kind = kind
  #     end
  #   else
  #     new << [last_text, last_kind] if last_kind
  #     last_kind = last_text = nil
  #     new << [text, kind]
  #   end
  # end
  # new << [last_text, last_kind] if last_kind
  # new
end
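Again only a data sketch, since #optimize raises as well: consecutive
comment tokens like

  ['// line 1', :comment, "\n", :comment, '// line 2', :comment]

would be joined into

  ["// line 1\n// line 2", :comment]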
Replaces self with the optimized version, see #optimize.

def optimize!
  replace optimize
end
TODO: Scanner#split_into_lines

Makes sure that:
- newlines are single tokens
  (which means all other tokens are single-line)
- there are no open tokens at the end of the line

This makes it simple for encoders that work line-oriented,
like HTML with list-style numeration.

def split_into_lines
  raise NotImplementedError
end
Replaces self with the line-split version, see #split_into_lines.

def split_into_lines!
  replace split_into_lines
end
Split the tokens into parts of the given +sizes+.

The result will be an Array of Tokens objects. The parts have
the text size specified by the parameter. In addition, each
part closes all opened tokens. This is useful to insert tokens
between them.

This method is used by @Scanner#tokenize@ when called with an Array
of source strings.

def split_into_parts *sizes
  parts = []
  opened = []
  content = nil
  part = Tokens.new
  part_size = 0
  size = sizes.first
  i = 0
  for item in self
    case content
    when nil
      content = item
    when String
      if size && part_size + content.size > size
        # token must be cut
        if part_size < size
          # some part of the token goes into this part
          content = content.dup  # content may not be safe to change
          part << content.slice!(0, size - part_size) << item
        end
        # close all open groups and lines...
        closing = opened.reverse.flatten.map do |content_or_kind|
          case content_or_kind
          when :begin_group
            :end_group
          when :begin_line
            :end_line
          else
            content_or_kind
          end
        end
        part.concat closing
        begin
          parts << part
          part = Tokens.new
          size = sizes[i += 1]
        end until size.nil? || size > 0
        # ...and open them again.
        part.concat opened.flatten
        part_size = 0
        redo unless content.empty?
      else
        part << content << item
        part_size += content.size
      end
      content = nil
    when Symbol
      case content
      when :begin_group, :begin_line
        opened << [content, item]
      when :end_group, :end_line
        opened.pop
      else
        raise ArgumentError, 'Unknown token action: %p, kind = %p' % [content, item]
      end
      part << content << item
      content = nil
    else
      raise ArgumentError, 'Token input junk: %p, kind = %p' % [content, item]
    end
  end
  parts << part
  parts << Tokens.new while parts.size < sizes.size
  parts
end
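A usage sketch; the exact token boundaries depend on the scanner, but
the text sizes of the parts follow the arguments:

  tokens = CodeRay.scan('foo = bar', :ruby).tokens
  parts  = tokens.split_into_parts 4, 5
  parts.size                    # => 2
  parts.map { |p| p.to_s.size } # => [4, 5]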
Adds a text token: the text and its kind are pushed as a flat pair.

def text_token text, kind
  self << text << kind
end
Turns the tokens into a plain string, using the default Encoder,
which simply concatenates the token text.

def to_s
  encode CodeRay::Encoders::Encoder.new
end
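So a scan-and-stringify round trip should recover the source (a sketch):

  CodeRay.scan('1 + 1', :ruby).tokens.to_s  # => "1 + 1"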