class Syntax::Tokenizer
The base class of all tokenizers. It sets up the scanner and manages the
looping until all tokens have been extracted. It also provides convenience
methods to make sure adjacent tokens of identical groups are returned as
a single token.
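As a rough sketch of how a concrete tokenizer builds on this class (the class name and token groups below are invented for illustration; only #step is mandatory), a subclass scans the text held in the StringScanner at @text and reports what it finds through start_group:

require 'syntax'

class WordTokenizer < Syntax::Tokenizer
  def step
    if @text.scan( /\d+/ )
      start_group :number, @text.matched
    elsif @text.scan( /\w+/ )
      start_group :ident, @text.matched
    else
      # single characters of "plain" text; adjacent ones are merged
      start_group :normal, @text.scan( /./m )
    end
  end
end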
def self.delegate( sym )
def self.delegate( sym )
  define_method( sym ) { |*a| @text.__send__( sym, *a ) }
end
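As a brief, hypothetical illustration of the pattern: a subclass can call delegate at the class level to obtain thin wrappers around the underlying StringScanner, so that #step can call scanner methods directly (scan and matched are ordinary StringScanner methods; which ones a tokenizer chooses to delegate is up to it):

class MyTokenizer < Syntax::Tokenizer
  delegate :scan      # defines #scan,    forwarding to @text.scan
  delegate :matched   # defines #matched, forwarding to @text.matched

  def step
    if scan( /\s+/ )
      start_group :normal, matched
    else
      start_group :ident, scan( /\S+/ )
    end
  end
end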
def append( data )
def append( data )
  @chunk << data
end
def end_region( gr, data=nil )
def end_region( gr, data=nil )
  flush_chunk
  @group = gr
  @callback.call( Token.new( data||"", @group, :region_close ) )
end
def finish
Finish tokenizing. This flushes the buffer, yielding any remaining text
to the client.
def finish
  start_group nil
  teardown
end
def flush_chunk
def flush_chunk
  @callback.call( Token.new( @chunk, @group ) ) unless @chunk.empty?
  @chunk = "".dup
end
def option(opt)
def option(opt)
  @options ? @options[opt] : nil
end
def set( opts={} )
Specify a set of tokenizer-specific options. Each tokenizer may (or may
not) publish any options, but if a tokenizer does, those options may be
specified via this method.
def set( opts={} )
  ( @options ||= Hash.new ).update opts
end
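A sketch of how options flow through #set and #option; the option names here are invented for illustration, since each tokenizer defines its own:

tokenizer = MyTokenizer.new                    # any Tokenizer subclass
tokenizer.set :tab_width => 4, :strict => true
tokenizer.set :tab_width => 8                  # later calls merge into the same hash

# inside the tokenizer, e.g. in #setup or #step:
#   option( :tab_width )   #=> 8
#   option( :strict )      #=> true
#   option( :missing )     #=> nil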
def setup
Subclasses may override this method to provide implementation-specific
setup logic.
def setup
end
def start( text, &block )
Start tokenizing. This sets up the state in preparation for tokenization,
such as creating a new scanner for the text and saving the callback block.
def start( text, &block )
  @chunk = "".dup
  @group = :normal
  @callback = block
  @text = StringScanner.new( text )
  setup
end
def start_group( gr, data=nil )
Request that a new group be started. If the current group is the same
as the group being requested, a new group will not be created. If a new
group is created and the current chunk is not empty, the chunk's
contents will be yielded to the client as a token, and then cleared.

After the new group is started, if +data+ is non-nil it will be appended
to the chunk.
def start_group( gr, data=nil )
  flush_chunk if gr != @group
  @group = gr
  @chunk << data if data
end
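To make the merging behavior concrete, here is a hypothetical sequence of calls as they might occur inside #step, with the resulting tokens noted in comments:

start_group :comment, "# a "
start_group :comment, "comment"   # same group: appended, nothing yielded yet
start_group :ident,   "foo"       # group changed: one :comment token "# a comment"
                                  # is yielded, then "foo" begins an :ident chunk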
def start_region( gr, data=nil )
def start_region( gr, data=nil )
  flush_chunk
  @group = gr
  @callback.call( Token.new( data||"", @group, :region_open ) )
end
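Regions differ from plain groups in that they bracket whatever is produced in between with explicit :region_open and :region_close marker tokens, which a consumer can use to track nesting. A hypothetical #step fragment for a double-quoted string might pair start_region with end_region (defined above) like this:

if @text.scan( /"/ )
  start_region :string, @text.matched          # yields a :region_open token
  start_group  :string, @text.scan( /[^"]*/ )  # string body, buffered as usual
  end_region   :string, @text.scan( /"/ )      # yields a :region_close token
end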
def step
Subclasses must implement this method, which is called for each iteration
of the tokenization process.
def step
  raise NotImplementedError, "subclasses must implement #step"
end
def subgroup(n)
def subgroup(n)
  @text[n]
end
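For instance, after a successful scan against a pattern with capture groups, subgroup gives access to the individual captures (a hypothetical fragment):

if @text.scan( /(\w+)(\s*=\s*)(\d+)/ )
  start_group :key,    subgroup(1)
  start_group :normal, subgroup(2)
  start_group :number, subgroup(3)
end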
def subtokenize( syntax, text )
def subtokenize( syntax, text )
  tokenizer = Syntax.load( syntax )
  tokenizer.set @options if @options
  flush_chunk
  tokenizer.tokenize( text, &@callback )
end
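A hedged sketch of how an embedding tokenizer might hand a span of text to another syntax; the "javascript" name is only a placeholder and must correspond to a tokenizer actually registered with Syntax:

def step
  if @text.scan( %r{<script>(.*?)</script>}m )
    start_group :tag, "<script>"
    subtokenize "javascript", subgroup(1)   # placeholder syntax name
    start_group :tag, "</script>"
  else
    start_group :normal, @text.scan( /./m )
  end
end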
def teardown
Subclasses may override this method to provide implementation-specific
teardown logic.
def teardown
end
def tokenize( text, &block )
Begins tokenizing the given text, calling #step until the text has been
exhausted, and then calling #finish.
def tokenize( text, &block )
  start text, &block
  step until @text.eos?
  finish
end
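Putting it together, a caller drives everything through #tokenize. Using the hypothetical WordTokenizer sketched near the top, and assuming each yielded Token reports its group via #group (as the library's Token class does), this prints one line per merged token:

tokenizer = WordTokenizer.new
tokenizer.tokenize( "x1 = 42\n" ) do |token|
  puts "#{token.group}: #{token.inspect}"
end
# e.g. ident: "x1", then normal: " = ", then number: "42", then normal: "\n"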