class HexaPDF::Content::Processor

This class is used for processing content operators extracted from a content stream.

== General Information

When a content stream is read, operators and their operands are extracted. After extraction,
these operators are usually processed with a Processor instance that ensures that the needed
setup (like modifying the graphics state) is done before further processing.

== How Processing Works

The operator implementations (see the Operator module) are called first, and they ensure that
the processing state is consistent. For example, operators that modify the graphics state do
actually modify the #graphics_state object. However, operator implementations are only used
for this task and nothing more, so they are very specific and normally don’t need to be changed.

After that, methods corresponding to the operator names are invoked on the processor object (if
they exist). Each PDF operator name is mapped to a nicer message name via the
OPERATOR_MESSAGE_NAME_MAP constant. For example, the operator ‘q’ is mapped to
‘save_graphics_state’.

The task of these methods is to do something useful with the content itself; they don’t need
to concern themselves with ensuring the consistency of the processing state. For example, the
processor could use the processing state to extract the text, or to paint the content on a
canvas.

For inline images only the ‘BI’ operator, mapped to ‘inline_image’, is used. Although the
operators ‘ID’ and ‘EI’ also exist for inline images, they are not used because they are consumed
while parsing inline images and do not reflect separate operators.

== Text Processing

Two utility methods, #decode_text and #decode_text_with_positioning, are provided for extracting
text. Both can be invoked directly from the ‘show_text’ and ‘show_text_with_positioning’
methods.

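To illustrate the two-stage flow, here is a small self-contained sketch that does not use HexaPDF. `State`, `MiniProcessor`, its lambda-based `STATE_OPERATORS` table and the four-entry `MESSAGE_MAP` are hypothetical stand-ins for the graphics state, the Operator module and OPERATOR_MESSAGE_NAME_MAP; only the dispatch guard in `process` mirrors the real #process method.

```ruby
# Hypothetical processing state; HexaPDF's real state is a GraphicsState object.
State = Struct.new(:font_size, :depth)

class MiniProcessor
  # Stage 1 (hypothetical stand-in for the Operator module): keep the state consistent.
  STATE_OPERATORS = {
    'q' => ->(processor, *) { processor.state.depth += 1 },
    'Q' => ->(processor, *) { processor.state.depth -= 1 },
    'Tf' => ->(processor, _font, size) { processor.state.font_size = size },
  }.freeze

  # Stage 2 (hypothetical subset of OPERATOR_MESSAGE_NAME_MAP): nicer message names.
  MESSAGE_MAP = {
    'q' => :save_graphics_state,
    'Q' => :restore_graphics_state,
    'Tf' => :set_font_and_size,
    'Tj' => :show_text,
  }.freeze

  attr_reader :state

  def initialize
    @state = State.new(nil, 0)
  end

  def process(operator, operands = [])
    STATE_OPERATORS[operator]&.call(self, *operands)
    msg = MESSAGE_MAP[operator]
    send(msg, *operands) if msg && respond_to?(msg, true)
  end
end

# Implements only the hook it cares about; the font size is already up to date.
class TextLogger < MiniProcessor
  attr_reader :log

  def initialize
    super
    @log = []
  end

  def show_text(str)
    @log << "#{str} @ #{state.font_size}pt"
  end
end

logger = TextLogger.new
[['q', []], ['Tf', [:F1, 12]], ['Tj', ['Hello']], ['Q', []]].each do |op, operands|
  logger.process(op, operands)
end
logger.log # => ["Hello @ 12pt"]
```

The subclass never records the font size itself; it reads the value that the stage-one operator already stored, which is the division of labour described above.
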
def decode_horizontal_text(array)

Decodes the given array containing text and positioning information while assuming that the
writing direction is horizontal.

See: PDF2.0 s9.4.4

def decode_horizontal_text(array)
  font = graphics_state.font
  scaled_char_space = graphics_state.scaled_character_spacing
  scaled_word_space = (font.word_spacing_applicable? ? graphics_state.scaled_word_spacing : 0)
  scaled_font_size = graphics_state.scaled_font_size
  below_baseline = font.bounding_box[1] * scaled_font_size / \
    graphics_state.scaled_horizontal_scaling + graphics_state.text_rise
  above_baseline = font.bounding_box[3] * scaled_font_size / \
    graphics_state.scaled_horizontal_scaling + graphics_state.text_rise
  text = CompositeBox.new
  array.each do |item|
    if item.kind_of?(Numeric)
      graphics_state.tm.translate(-item * scaled_font_size, 0)
    else
      font.decode(item).each do |code_point|
        char = font.to_utf8(code_point)
        width = font.width(code_point) * scaled_font_size + scaled_char_space + \
          (code_point == 32 ? scaled_word_space : 0)
        matrix = graphics_state.ctm.dup.premultiply(*graphics_state.tm)
        fragment = GlyphBox.new(code_point, char,
                                *matrix.evaluate(0, below_baseline),
                                *matrix.evaluate(width, below_baseline),
                                *matrix.evaluate(0, above_baseline))
        text << fragment
        graphics_state.tm.translate(width, 0)
      end
    end
  end
  text.freeze
end
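
The horizontal advance follows the PDF text-space displacement tx = ((w0 - Tj/1000) * Tfs + Tc + Tw) * Th, applied here in two steps: once for each TJ number and once for each glyph. The standalone sketch below reproduces only that arithmetic. `WIDTHS`, `glyph_origins` and its keyword arguments are hypothetical; characters stand in for decoded code points, and `scaled_font_size` is assumed to equal Tfs / 1000 * Th, which is how it is used in the method above.

```ruby
# Hypothetical glyph widths in glyph space units (1/1000 of a text space unit).
WIDTHS = { 'A' => 667, 'V' => 667, ' ' => 250 }.freeze

# Returns [char, x] pairs: the offset along the baseline (the translation applied
# to the text matrix) at which each glyph of a TJ-style array starts.
def glyph_origins(array, font_size:, char_space: 0, word_space: 0, horizontal_scaling: 100)
  th = horizontal_scaling / 100.0
  scaled_font_size = font_size / 1000.0 * th # assumed meaning of scaled_font_size
  x = 0.0
  origins = []
  array.each do |item|
    if item.kind_of?(Numeric)
      # TJ numbers are thousandths of text space; positive values move left.
      x -= item * scaled_font_size
    else
      item.each_char do |char|
        origins << [char, x]
        # Glyph width plus character spacing, plus word spacing for the space code.
        x += WIDTHS.fetch(char) * scaled_font_size + char_space * th +
          (char == ' ' ? word_space * th : 0)
      end
    end
  end
  origins
end

glyph_origins(['A', 80, 'V'], font_size: 10)
```

Here ‘A’ starts at 0.0 and advances by 6.67 units; the kerning value 80 then pulls ‘V’ back to about 5.87.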

def decode_text(data)

Decodes the given text object and returns it as a UTF-8 string.

The argument may either be a simple text string (+Tj+ operator) or an array that contains
text strings together with positioning information (+TJ+ operator).

def decode_text(data)
  if data.kind_of?(Array)
    data = data.each_with_object(''.b) {|obj, result| result << obj if obj.kind_of?(String) }
  end
  font = graphics_state.font
  font.decode(data).map {|code_point| font.to_utf8(code_point) }.join
end
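
A standalone sketch of the same two steps, flattening a TJ array and mapping each code to text, is shown below. The `TO_UNICODE` table and `decode_text_sketch` are hypothetical: a single-byte mapping replaces the font's `decode` and `to_utf8` methods, and unknown codes become U+FFFD.

```ruby
# Hypothetical single-byte font: each code maps to the Unicode text it represents
# (in HexaPDF this is the job of font.decode and font.to_utf8).
TO_UNICODE = { 0x01 => 'fi', 0x20 => ' ', 0x41 => 'A', 0x56 => 'V',
               0x65 => 'e', 0x6C => 'l' }.freeze

def decode_text_sketch(data)
  if data.kind_of?(Array)
    # Keep only the strings of a TJ array; the positioning numbers carry no text.
    data = data.each_with_object(''.b) {|obj, result| result << obj if obj.kind_of?(String) }
  end
  # Unknown codes become U+FFFD REPLACEMENT CHARACTER.
  data.bytes.map {|code| TO_UNICODE.fetch(code, "\uFFFD") }.join
end

decode_text_sketch("AV")                    # => "AV"
decode_text_sketch(["\x01le", -250, " AV"]) # => "file AV"
```

Because positioning numbers are dropped, a large negative TJ value never produces a space on its own; only explicit space codes in the strings do.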

def decode_text_with_positioning(data)

Decodes the given text object and returns it as a CompositeBox object.

The argument may either be a simple text string (+Tj+ operator) or an array that contains
text strings together with positioning information (+TJ+ operator).

For each glyph a GlyphBox object is computed. For horizontal fonts the width is
predetermined but not the height. The latter is chosen to be the height and offset of the
font's bounding box.

def decode_text_with_positioning(data)
  data = Array(data)
  if graphics_state.font.writing_mode == :horizontal
    decode_horizontal_text(data)
  else
    decode_vertical_text(data)
  end
end
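
Each GlyphBox receives three points produced by `matrix.evaluate`: the lower-left, lower-right and upper-left corners, with the vertical extent taken from the font bounding box. Three corners define a parallelogram, so rotated or skewed text keeps its shape. The sketch below is hypothetical: `Matrix2D` is a minimal stand-in for the combined CTM and text matrix, and 100% horizontal scaling is assumed so that no scaling factor enters the height computation.

```ruby
# Hypothetical minimal PDF-style affine matrix [a b c d e f]:
#   x' = a*x + c*y + e,  y' = b*x + d*y + f
Matrix2D = Struct.new(:a, :b, :c, :d, :e, :f) do
  def evaluate(x, y)
    [a * x + c * y + e, b * x + d * y + f]
  end
end

# Returns the lower-left, lower-right and upper-left corners of a glyph box.
# The height comes from the font bounding box (glyph units), not from the glyph.
def glyph_corners(matrix, width:, font_size:, bbox_ymin:, bbox_ymax:, rise: 0)
  below = bbox_ymin * font_size / 1000.0 + rise
  above = bbox_ymax * font_size / 1000.0 + rise
  [matrix.evaluate(0, below), matrix.evaluate(width, below), matrix.evaluate(0, above)]
end

# Text placed at (100, 700), 10pt font, glyph 6 units wide, bounding box -200..800.
upright = Matrix2D.new(1, 0, 0, 1, 100, 700)
glyph_corners(upright, width: 6, font_size: 10, bbox_ymin: -200, bbox_ymax: 800)
# => [[100.0, 698.0], [106.0, 698.0], [100.0, 708.0]]
```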

def decode_vertical_text(_data)

Decodes the given array containing text and positioning information while assuming that the
writing direction is vertical.

def decode_vertical_text(_data)
  raise "Not yet implemented"
end

def initialize(resources = nil)

Initializes a new processor that uses the resources PDF dictionary for resolving resources
while processing operators.

It is not mandatory to set the resources dictionary on initialization, but it needs to be set
prior to processing operators!

def initialize(resources = nil)
  @operators = Operator::DEFAULT_OPERATORS.dup
  @graphics_state = GraphicsState.new
  @graphics_object = :none
  @original_resources = nil
  self.resources = resources
end

def paint_xobject(name)

Provides a default implementation for the 'Do' operator.

It checks if the XObject is a Form XObject and if so, processes the contents of the Form
XObject.

def paint_xobject(name)
  xobject = resources.xobject(name)
  return unless xobject[:Subtype] == :Form
  res = resources
  graphics_state.save
  graphics_state.ctm.premultiply(*xobject[:Matrix]) if xobject.key?(:Matrix)
  xobject.process_contents(self, original_resources: @original_resources)
  graphics_state.restore
  self.resources = res
end
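
The sequence in #paint_xobject (skip non-forms, save the state, concatenate the form's /Matrix with the CTM, process the form, restore the state and the previous resources) can be traced with the hash-based sketch below. It is not HexaPDF code: `FormSketch`, `multiply` and the `:Paints` key (a list of XObject names standing in for the form's content stream) are hypothetical, and the fallback to the original resources for forms without /Resources is an assumption based on the description of #resources=.

```ruby
# Hypothetical, hash-based sketch of the steps in paint_xobject.
class FormSketch
  IDENTITY = [1, 0, 0, 1, 0, 0].freeze

  attr_reader :ctm, :resources, :log

  def initialize(resources)
    @original_resources = resources
    @resources = resources
    @ctm = IDENTITY
    @saved_ctms = []
    @log = []
  end

  def paint_xobject(name)
    xobject = @resources.fetch(:XObject).fetch(name)
    return unless xobject[:Subtype] == :Form # images etc. are ignored
    res = @resources
    @saved_ctms.push(@ctm)                   # like graphics_state.save
    @ctm = multiply(xobject[:Matrix], @ctm) if xobject.key?(:Matrix)
    # Assumption: a form without /Resources falls back to the original resources.
    @resources = xobject[:Resources] || @original_resources
    @log << [name, @ctm]
    xobject.fetch(:Paints, []).each {|inner| paint_xobject(inner) }
    @ctm = @saved_ctms.pop                   # like graphics_state.restore
    @resources = res
  end

  private

  # PDF matrix product m1 x m2 for [a b c d e f] arrays (m1 is applied first).
  def multiply(m1, m2)
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    [a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
     c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
     e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2]
  end
end

page_resources = {
  XObject: {
    Fm1: { Subtype: :Form, Matrix: [1, 0, 0, 1, 50, 0], Paints: [:Fm2] }, # no /Resources
    Fm2: { Subtype: :Form, Matrix: [2, 0, 0, 2, 0, 0] },
    Im1: { Subtype: :Image },
  },
}
sketch = FormSketch.new(page_resources)
sketch.paint_xobject(:Fm1)
sketch.paint_xobject(:Im1)
sketch.log # => [[:Fm1, [1, 0, 0, 1, 50, 0]], [:Fm2, [2, 0, 0, 2, 50, 0]]]
```

The nested form ends up scaled by 2 and shifted by 50 units, and once painting returns the CTM and resources are back to their page-level values.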

def process(operator, operands = [])

Processes the operator with the given operands.

The operator is first processed with an operator implementation (if any) to ensure correct
operations and then the corresponding method on this object is invoked.

def process(operator, operands = [])
  @operators[operator].invoke(self, *operands) if @operators.key?(operator)
  msg = OPERATOR_MESSAGE_NAME_MAP[operator]
  send(msg, *operands) if msg && respond_to?(msg, true)
end
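
The guard in #process means that unknown operators, and mapped operators whose method is not implemented, are silently skipped. Because `respond_to?` is called with `true` as its second argument, the hook methods may also be private. The sketch below isolates that second stage; `Dispatcher` and its two-entry `MESSAGE_MAP` are hypothetical.

```ruby
# Hypothetical two-entry subset of an operator-to-message map.
MESSAGE_MAP = { 'Tj' => :show_text, 'BT' => :begin_text }.freeze

class Dispatcher
  attr_reader :calls

  def initialize
    @calls = []
  end

  # Same guard as in #process: only send messages the object actually handles.
  def process(operator, operands = [])
    msg = MESSAGE_MAP[operator]
    send(msg, *operands) if msg && respond_to?(msg, true)
  end

  private

  # Found despite being private, because respond_to? is asked with include_all = true.
  def show_text(str)
    @calls << [:show_text, str]
  end
end

d = Dispatcher.new
d.process('BT')          # mapped, but there is no begin_text method: ignored
d.process('Tj', ['Hi'])  # dispatched to the private show_text hook
d.process('XYZ', [1, 2]) # unknown operator: ignored
d.calls # => [[:show_text, "Hi"]]
```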

def resources=(res)

Sets the resources dictionary used during processing.

The first time resources are set, they are also stored as the "original" resources. This is
needed because form XObjects don't need to have a resources dictionary and can use the page's
resources dictionary instead.

def resources=(res)
  @original_resources = res if @original_resources.nil?
  @resources = res
end
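
The bookkeeping is small enough to lift out unchanged. The hypothetical `ResourceTracker` below copies the two assignments from #initialize and #resources= and uses plain hashes in place of resource dictionaries. Note one consequence of the `nil?` check: if the processor is created without resources, the first non-nil assignment becomes the original.

```ruby
# Hypothetical standalone copy of the original-resources bookkeeping.
class ResourceTracker
  attr_reader :resources, :original_resources

  def initialize(resources = nil)
    @original_resources = nil
    self.resources = resources
  end

  def resources=(res)
    @original_resources = res if @original_resources.nil? # only the first non-nil value sticks
    @resources = res
  end
end

tracker = ResourceTracker.new        # no resources yet
page = { Font: {} }
form = { XObject: {} }
tracker.resources = page             # becomes the original resources
tracker.resources = form             # e.g. while a form with its own /Resources is processed
tracker.original_resources.equal?(page) # => true
```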
