class HexaPDF::Revisions

Manages the revisions of a PDF document.

A PDF document has one revision when it is created. Later, new revisions are added when changes
are made. This allows for adding information/content to a PDF file without changing the original
content.

The order of the revisions is important. In HexaPDF the oldest revision always has index 0 and
the newest revision the highest index. This is also the order in which the revisions get
written.

Important: It is possible to manipulate the individual revisions and their objects oneself but
this should only be done if one is familiar with the inner workings of HexaPDF. Otherwise it is
best to use the convenience methods of this class to create, access or delete indirect objects.

See: PDF2.0 s7.5.6, HexaPDF::Revision
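
The ordering rule can be illustrated with a small plain-Ruby model. This is only a sketch, not HexaPDF code: each revision is assumed to be a Hash from object number to a value, and the helper +lookup+ is a hypothetical name.

```ruby
# Toy model, not HexaPDF code: each revision is a Hash mapping object numbers to values.
revisions = [
  {1 => "catalog v1", 2 => "page v1"},   # index 0: oldest revision
  {2 => "page v2"},                      # index 1: a later, incremental change
]

# Appending a revision leaves the older ones untouched.
revisions.push({3 => "annotation"})

# Resolve an object number by scanning from the newest revision to the oldest.
def lookup(revisions, oid)
  revisions.reverse_each do |rev|
    return rev[oid] if rev.key?(oid)
  end
  nil
end

page = lookup(revisions, 2)       # the entry in revision 1 shadows the one in revision 0
catalog = lookup(revisions, 1)    # only present in the oldest revision
missing = lookup(revisions, 99)   # unknown object number
```

The original content in revision 0 is still there after the later changes; it is merely shadowed during lookup.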

def add

Adds a new empty revision to the document and returns it.

*Note*: This method should only be used if one is familiar with the inner workings of HexaPDF
and the PDF specification.

def add
  if @revisions.empty?
    trailer = {}
  else
    trailer = current.trailer.value.dup
    trailer.delete(:Prev)
    trailer.delete(:XRefStm)
  end
  rev = Revision.new(@document.wrap(trailer, type: :XXTrailer))
  @revisions.push(rev)
  rev
end
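
The trailer handling in the method above can be tried out with a plain Hash. This is a sketch under assumptions: the keys mirror the method body, but the values are made up and no HexaPDF classes are involved.

```ruby
# Plain-Hash stand-in for the trailer of the current revision (values are made up).
current_trailer = {Size: 12, Root: "1 0 R", Prev: 4096, XRefStm: 5120}

# Same steps as the else-branch of #add: shallow-copy the trailer, then drop the
# :Prev and :XRefStm entries so that the new revision does not inherit them.
new_trailer = current_trailer.dup
new_trailer.delete(:Prev)
new_trailer.delete(:XRefStm)
```

Because the copy is made with +dup+, the trailer of the existing revision keeps its own entries.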

def add_object(obj)

:call-seq:
  revisions.add_object(object) -> object

Adds the given HexaPDF::Object to the current revision and returns it.

If +object+ is a direct object, an object number is automatically assigned.

def add_object(obj)
  if obj.indirect? && (rev_obj = current.object(obj.oid))
    if rev_obj.data == obj.data
      return obj
    else
      raise HexaPDF::Error, "Can't add object because there is already " \
        "an object with object number #{obj.oid}"
    end
  end
  obj.oid = next_oid unless obj.indirect?
  current.add(obj)
end
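
The three possible outcomes of the method above (number assigned, identical object accepted, conflict raised) can be sketched without HexaPDF. The following is a toy model: +ToyObject+ is a hypothetical stand-in, the current revision is a plain Hash, and treating object number 0 as "direct" is an assumption of the sketch.

```ruby
# Hypothetical stand-in for a PDF object; oid 0 is treated as "direct" in this sketch.
ToyObject = Struct.new(:oid, :data) do
  def indirect?
    oid != 0
  end
end

# current: Hash from object number to ToyObject, standing in for the current revision.
def add_object(current, obj, next_oid)
  if obj.indirect? && (existing = current[obj.oid])
    return obj if existing.data == obj.data
    raise ArgumentError, "object number #{obj.oid} is already in use"
  end
  obj.oid = next_oid unless obj.indirect?
  current[obj.oid] = obj
end

current = {1 => ToyObject.new(1, "a")}
fresh = add_object(current, ToyObject.new(0, "b"), 2)   # direct object: gets number 2
same = add_object(current, ToyObject.new(1, "a"), 3)    # identical data: returned as-is
conflict = begin
  add_object(current, ToyObject.new(1, "z"), 3)         # same number, different data
rescue ArgumentError => e
  e.message
end
```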

def all

Returns a list of all revisions.

*Note*: This method should only be used if one is familiar with the inner workings of HexaPDF
and the PDF specification.

def all
  @revisions
end

def current

Returns the current revision.

*Note*: This method should only be used if one is familiar with the inner workings of HexaPDF
and the PDF specification.

def current
  @revisions.last
end

def delete_object(ref)

:call-seq:
  revisions.delete_object(ref)
  revisions.delete_object(oid)

Deletes the indirect object specified by an exact reference or by an object number.

def delete_object(ref)
  @revisions.reverse_each do |rev|
    if rev.object?(ref)
      rev.delete(ref)
      break
    end
  end
end

def each(&block)

:call-seq:
  revisions.each {|rev| block } -> revisions
  revisions.each -> Enumerator

Iterates over all revisions from oldest to current one.

*Note*: This method should only be used if one is familiar with the inner workings of HexaPDF
and the PDF specification.

def each(&block)
  return to_enum(__method__) unless block_given?
  @revisions.each(&block)
  self
end

def each_object(only_current: true, only_loaded: false, &block)

:call-seq:
  revisions.each_object(only_current: true, only_loaded: false) {|obj| block } -> revisions
  revisions.each_object(only_current: true, only_loaded: false) {|obj, rev| block } -> revisions
  revisions.each_object(only_current: true, only_loaded: false) -> Enumerator

Yields every object and optionally the revision it is in.

If +only_loaded+ is +true+, only the already loaded objects of the PDF document are yielded.
This only matters when the document instance was created from an existing PDF document.

By default, only the current version of each object is returned, which implies that each object
number is yielded exactly once. If the +only_current+ option is +false+, all stored objects
from newest to oldest are returned, not only the current version of each object.

The +only_current+ option can make a difference because the document can contain multiple
revisions:

* Multiple revisions may contain objects with the same object and generation numbers, e.g.
  two (different) objects with oid/gen [3,0].

* Additionally, there may also be objects with the same object number but different
  generation numbers in different revisions, e.g. one object with oid/gen [3,0] and one with
  oid/gen [3,1].

*Note* that setting +only_current+ to +false+ is normally not necessary and should not be
done. If it is still done, one has to take care to avoid an invalid document state.

def each_object(only_current: true, only_loaded: false, &block)
  unless block_given?
    return to_enum(__method__, only_current: only_current, only_loaded: only_loaded)
  end
  yield_rev = (block.arity == 2)
  oids = {}
  @revisions.reverse_each do |rev|
    rev.each(only_loaded: only_loaded) do |obj|
      next if only_current && oids.include?(obj.oid)
      yield_rev ? yield(obj, rev) : yield(obj)
      oids[obj.oid] = true
    end
  end
  self
end
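
The effect of +only_current+ can be seen in a small plain-Ruby model. This is a sketch under assumptions: revisions are arrays of made-up [oid, gen, label] triples, and +collect_objects+ is a hypothetical helper that returns an Array instead of yielding.

```ruby
# Toy model: revisions as arrays of [oid, gen, label] triples, oldest revision first.
revisions = [
  [[1, 0, "catalog"], [3, 0, "old page"]],
  [[3, 1, "new page"], [4, 0, "font"]],
]

# Walks the revisions from newest to oldest, like the method above, and records each
# object number the first time it is seen.
def collect_objects(revisions, only_current: true)
  seen = {}
  result = []
  revisions.reverse_each do |rev|
    rev.each do |oid, gen, label|
      next if only_current && seen.key?(oid)
      result << [oid, gen, label]
      seen[oid] = true
    end
  end
  result
end

current_only = collect_objects(revisions).map(&:last)
everything = collect_objects(revisions, only_current: false).map(&:last)
```

In this model object number 3 appears once with +only_current+ set and twice without it, which is the situation the list above describes.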

def from_io(document, io)

Loads all revisions for the document from the given IO and returns the created Revisions
object.

If the +io+ object is +nil+, an empty Revisions object is returned.

def from_io(document, io)
  return new(document) if io.nil?
  parser = Parser.new(io, document)
  object_loader = lambda {|xref_entry| parser.load_object(xref_entry) }
  revisions = []
  begin
    offset = parser.startxref_offset
    seen_xref_offsets = {}
    while offset && !seen_xref_offsets.key?(offset)
      # PDF2.0 s7.5.5 states that :Prev needs to be indirect, Adobe's reference 3.4.4 says it
      # should be direct. Adobe's POV is followed here. Same with :XRefStm.
      xref_section, trailer = parser.load_revision(offset)
      seen_xref_offsets[offset] = true
      stm = trailer[:XRefStm]
      if stm && !seen_xref_offsets.key?(stm)
        if xref_section.max_oid == 0 && trailer[:Prev] > stm
          # Revision is completely empty, with xref stream in previous revision
          merge_revision = trailer[:Prev]
        end
        stm_xref_section, = parser.load_revision(stm)
        stm_xref_section.merge!(xref_section)
        xref_section = stm_xref_section
        seen_xref_offsets[stm] = true
      end
      if parser.linearized? && !trailer.key?(:Prev)
        merge_revision = offset
      end
      if merge_revision == offset && !revisions.empty?
        xref_section.merge!(revisions.first.xref_section)
        offset = trailer[:Prev] # Get possible next offset before overwriting trailer
        trailer = revisions.first.trailer
        revisions.shift
      else
        offset = trailer[:Prev]
      end
      revisions.unshift(Revision.new(document.wrap(trailer, type: :XXTrailer),
                                     xref_section: xref_section, loader: object_loader))
    end
  rescue HexaPDF::MalformedPDFError
    raise unless (reconstructed_revision = parser.reconstructed_revision)
    unless revisions.empty?
      reconstructed_revision.trailer.data.value = revisions.last.trailer.data.value
    end
    revisions << reconstructed_revision
  end
  document.version = parser.file_header_version rescue '1.0'
  new(document, initial_revisions: revisions, parser: parser)
end
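
The loop guard in the method above, which stops following +:Prev+ once an offset has already been visited, can be sketched in isolation. The offsets and the +trailers+ table below are made up for illustration; no parsing takes place.

```ruby
# Toy file layout: byte offset of each cross-reference section => its trailer.
trailers = {
  900 => {Prev: 500},
  500 => {Prev: 100},
  100 => {Prev: 900},   # malformed: points back to the newest section
}

offset = 900            # stands in for the startxref offset
seen = {}
order = []
while offset && !seen.key?(offset)
  seen[offset] = true
  order.unshift(offset)  # older sections go to the front, so index 0 ends up oldest
  offset = trailers.fetch(offset)[:Prev]
end
```

Without the +seen+ check the cyclic +:Prev+ entry would keep the loop running forever. Using +unshift+ matches the rule that the oldest revision has index 0.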

def initialize(document, initial_revisions: nil, parser: nil)

Creates a new revisions object for the given PDF document.

Options:

initial_revisions::
    An array of revisions that should initially be used. If this option is not specified, a
    single empty revision is added.

parser::
    The parser with which the initial revisions were read. If this option is not specified
    even though the document was read from an IO stream, some parts may not work, like
    incremental writing.

def initialize(document, initial_revisions: nil, parser: nil)
  @document = document
  @parser = parser
  @revisions = []
  if initial_revisions
    @revisions += initial_revisions
  else
    add
  end
end

def merge(range = 0..-1)

:call-seq:
  revisions.merge(range = 0..-1) -> revisions

Merges the revisions specified by the given range into one. Objects from newer revisions
overwrite those from older ones.

def merge(range = 0..-1)
  @revisions[range].reverse.each_cons(2) do |rev, prev_rev|
    prev_rev.trailer.value.replace(rev.trailer.value)
    rev.each do |obj|
      if obj.data != prev_rev.object(obj)&.data
        prev_rev.delete(obj.oid, mark_as_free: false)
        prev_rev.add(obj)
      end
    end
  end
  _first, *other = *@revisions[range]
  other.each {|rev| @revisions.delete(rev) }
  self
end
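
The "newer overwrites older" rule can be modelled with plain Hashes. This is a simplified sketch, not the algorithm used above: revisions are assumed to be Hashes from object number to value, trailers are ignored, and the +merge+ helper here is a hypothetical top-level method.

```ruby
# Merges the revisions selected by range into the oldest one of that range.
def merge(revisions, range = 0..-1)
  selected = revisions[range]
  merged = selected.reduce({}) {|acc, rev| acc.merge(rev) }  # later (newer) entries win
  first, *others = selected
  first.replace(merged)
  # Remove the merged-away revisions by identity, not by equality.
  others.each {|rev| revisions.delete_if {|r| r.equal?(rev) } }
  revisions
end

all_merged = merge([{1 => "a", 2 => "b"}, {2 => "b2"}, {3 => "c"}])
tail_merged = merge([{1 => "a", 2 => "b"}, {2 => "b2"}, {3 => "c"}], 1..2)
```

With the default range everything collapses into a single revision; with +1..2+ the oldest revision is left alone.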

def next_oid

Returns the next object identifier that should be used when adding a new object.

def next_oid
  @revisions.map(&:next_free_oid).max
end
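
A short sketch of why the maximum over all revisions is taken. It assumes, for illustration only, that a revision's next free number is one more than its largest object number; revisions are again plain Hashes.

```ruby
# Toy revisions: object number => value. The newest revision here is empty.
revisions = [{1 => "a", 2 => "b"}, {7 => "c"}, {}]

# Assumed rule for this sketch: next free number = largest object number + 1.
next_free = lambda {|rev| (rev.keys.max || 0) + 1 }

per_revision = revisions.map(&next_free)
next_oid = per_revision.max
```

Looking only at the newest revision would suggest object number 1, which is already used in an older revision.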

def object(ref)

:call-seq:
  revisions.object(ref) -> obj or nil
  revisions.object(oid) -> obj or nil

Returns the current version of the indirect object for the given exact reference or for the
given object number.

For references to unknown objects, +nil+ is returned but free objects are represented by a
PDF Null object, not by +nil+!

See: PDF2.0 s7.3.9

def object(ref)
  i = @revisions.size - 1
  while i >= 0
    if (result = @revisions[i].object(ref))
      return result
    end
    i -= 1
  end
  nil
end

def object?(ref)

:call-seq:
  revisions.object?(ref) -> true or false
  revisions.object?(oid) -> true or false

Returns +true+ if one of the revisions contains an indirect object for the given exact
reference or for the given object number.

Even though this method might return +true+ for some references, #object may return +nil+
because this method takes *all* revisions into account.

def object?(ref)
  @revisions.any? {|rev| rev.object?(ref) }
end

Namespace

HexaPDF

Included Modules

HexaPDF::Revisions::Enumerable

Defined in

lib/hexapdf/revisions.rb