Jekyll::Algolia::FileBrowser

def self.absolute_path(file)

Public: Return the absolute path of a Jekyll file

file - The Jekyll file to inspect

Jekyll handles the .path property of some files as relative to the root
(pages) and of others as absolute (posts and static assets). We make sure
we have a consistent way of accessing it.

def self.absolute_path(file)
  pathname = Pathname.new(file.path)
  return pathname.cleanpath.to_s if pathname.absolute?
  File.join(Configurator.get('source'), file.path)
end
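
A minimal standalone sketch of the same normalization. It assumes the Jekyll source directory is passed in explicitly instead of being read from `Configurator`. The name `absolute_path_for` is hypothetical.

```ruby
require 'pathname'

# Hypothetical standalone variant: `source` replaces Configurator.get('source')
def absolute_path_for(path, source)
  pathname = Pathname.new(path)
  # Absolute paths (posts, static assets) only need `.` and `..` cleaned up
  return pathname.cleanpath.to_s if pathname.absolute?
  # Relative paths (pages) are resolved against the Jekyll source directory
  File.join(source, path)
end

absolute_path_for('/site/_posts/../_posts/2018-01-01-hello.md', '/site')
# => "/site/_posts/2018-01-01-hello.md"
absolute_path_for('about.md', '/site')
# => "/site/about.md"
```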

def self.allowed_extension?(file)

Public: Check if the file has one of the allowed extensions

file - The Jekyll file

Jekyll can transform markdown files to HTML by default. With plugins, it
can convert many more file formats. By default we'll only index markdown
and raw HTML files, but this list can be extended using the
`extensions_to_index` config option.

def self.allowed_extension?(file)
  extensions = Configurator.algolia('extensions_to_index')
  extname = File.extname(file.path)[1..-1]
  extensions.include?(extname)
end
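
A sketch of the extension check as a pure function. It assumes the `extensions_to_index` list is passed in directly, and `allowed_extension_for?` is a hypothetical name. It uses `String#delete_prefix` (Ruby 2.5+) rather than slicing. Files without an extension then compare as `""` instead of `nil`, and in both cases they are rejected.

```ruby
# Hypothetical standalone variant: `extensions` replaces the
# `extensions_to_index` config value
def allowed_extension_for?(path, extensions)
  # File.extname returns ".md"; drop the leading dot to compare with "md"
  extname = File.extname(path).delete_prefix('.')
  extensions.include?(extname)
end

allowed_extension_for?('blog/intro.md', %w[md html])   # => true
allowed_extension_for?('assets/site.css', %w[md html]) # => false
```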

def self.collection(file)

Public: Returns the name of the collection

file - The Jekyll file

Only collection documents can have a collection name. Pages don't. Posts
are purposefully excluded as well, even though they are technically part
of a collection.

def self.collection(file)
  return nil unless file.respond_to?(:collection)
  collection_name = file.collection.label
  # Posts are a special kind of collection, but it's an implementation
  # detail from my POV, so I'll exclude them
  return nil if collection_name == 'posts'
  collection_name
end
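
The lookup relies on duck typing: pages simply don't respond to `#collection`. The sketch below shows this with hypothetical `Struct` stand-ins for Jekyll objects. They implement only the methods the check calls and are not Jekyll's real classes.

```ruby
# Hypothetical stand-ins for Jekyll objects: only the methods we call
FakeCollection = Struct.new(:label)
FakeDocument = Struct.new(:collection)
FakePage = Struct.new(:path) # has no #collection method

def collection_name(file)
  # Pages don't respond to #collection at all
  return nil unless file.respond_to?(:collection)
  label = file.collection.label
  # Posts are technically a collection but are reported as nil
  label == 'posts' ? nil : label
end

collection_name(FakeDocument.new(FakeCollection.new('recipes'))) # => "recipes"
collection_name(FakeDocument.new(FakeCollection.new('posts')))   # => nil
collection_name(FakePage.new('about.md'))                        # => nil
```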

def self.date(file)

Public: Returns a timestamp of the file date

file - The Jekyll file

All collections (including posts) will have a date taken either from the
front-matter or the filename prefix. If none is set, Jekyll will use the
current date.

For pages, only dates defined in the front-matter will be used.

Note that because the default date is the current one if none is
defined, we have to make sure the date is actually nil when we index it.
Otherwise the diff indexing mode will think that records have changed
when they haven't.

def self.date(file)
  date = file.data['date']
  return nil if date.nil?
  # The date is *exactly* the time when `jekyll algolia` was run.
  # What a coincidence! It's a safe bet to assume that the original date
  # was nil and has been overwritten by Jekyll
  return nil if date.to_i == Jekyll::Algolia.start_time.to_i
  date.to_i
end
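
The "default date" heuristic can be isolated as below. This is a sketch that assumes the run's start time is passed in explicitly instead of being read from `Jekyll::Algolia.start_time`. `date_timestamp` is a hypothetical name. Because both sides go through `to_i`, the comparison is done in whole seconds, and sub-second differences are ignored.

```ruby
# Hypothetical standalone variant: `start_time` replaces
# Jekyll::Algolia.start_time (the moment the indexing run started)
def date_timestamp(date, start_time)
  return nil if date.nil?
  # A date equal to the run start was most likely injected by Jekyll
  # as a default, so we treat it as "no date"
  return nil if date.to_i == start_time.to_i
  date.to_i
end

run_started = Time.now
date_timestamp(Time.at(1_514_764_800), run_started) # => 1514764800
date_timestamp(run_started, run_started)            # => nil
date_timestamp(nil, run_started)                    # => nil
```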

def self.excerpt_html(file)

Public: Returns the HTML version of the excerpt

file - The Jekyll file

Only collections (including posts) have an excerpt. Pages don't.

def self.excerpt_html(file)
  excerpt = excerpt_raw(file)
  return nil if excerpt.nil?
  return nil if excerpt.empty?
  excerpt.to_s.tr("\n", ' ').strip
end

def self.excerpt_raw(file)

Public: Returns the raw excerpt of a file, directly as returned by
Jekyll. Swallow any error that could occur when reading.

file - The Jekyll file

This might throw an exception if the excerpt is invalid. We also
silence all logger output, as Jekyll is quite verbose and would display
the potential Liquid error in the terminal even if we catch the actual
error.

def self.excerpt_raw(file)
  Logger.silent do
    return file.data['excerpt'].to_s
  end
rescue StandardError
  return nil
end
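
The "silence, then rescue" pattern can be sketched without Jekyll. `silently` below is a hypothetical stand-in for `Logger.silent` that swaps `$stderr` for a `StringIO`. The real logger may silence output differently. `BrokenExcerpt` only simulates an excerpt whose rendering fails.

```ruby
require 'stringio'

# Hypothetical stand-in for Logger.silent: discard anything written to
# $stderr while the block runs, and always restore it afterwards
def silently
  original = $stderr
  $stderr = StringIO.new
  yield
ensure
  $stderr = original
end

def safe_excerpt(data)
  silently { data['excerpt'].to_s }
rescue StandardError
  # Any rendering error means "no excerpt" rather than a crash
  nil
end

# Simulates an excerpt whose rendering fails (e.g. invalid Liquid)
class BrokenExcerpt
  def to_s
    warn 'Liquid Exception: something went wrong' # swallowed by silently
    raise ArgumentError, 'invalid excerpt'
  end
end

safe_excerpt({ 'excerpt' => 'Hello world' })   # => "Hello world"
safe_excerpt({ 'excerpt' => BrokenExcerpt.new }) # => nil, nothing printed
```

The `ensure` clause restores `$stderr` before the exception reaches the `rescue` in `safe_excerpt`, so later output is never lost.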

def self.excerpt_text(file)

Public: Returns the text version of the excerpt

file - The Jekyll file

Only collections (including posts) have an excerpt. Pages don't.

def self.excerpt_text(file)
  html = excerpt_html(file)
  Utils.html_to_text(html)
end

def self.excluded_by_user?(file)

Public: Check if the file has been excluded by the user

file - The Jekyll file

Files can be excluded either by setting the `files_to_exclude` option
or by defining a custom hook.

def self.excluded_by_user?(file)
  excluded_from_config?(file) || excluded_from_hook?(file)
end

def self.excluded_from_config?(file)

Public: Check if the file has been excluded by `files_to_exclude`

file - The Jekyll file

def self.excluded_from_config?(file)
  excluded_patterns = Configurator.algolia('files_to_exclude')
  jekyll_source = Configurator.get('source')
  # Transform the glob patterns into a real list of files
  excluded_files = []
  Dir.chdir(jekyll_source) do
    excluded_patterns.each do |pattern|
      Dir.glob(pattern).each do |match|
        excluded_files << File.expand_path(match)
      end
    end
  end
  excluded_files.include?(absolute_path(file))
end
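
A sketch of the same glob expansion that uses the `base:` keyword of `Dir.glob` (Ruby 2.5+) instead of `Dir.chdir`. This is a swapped-in technique, not the plugin's code. `Dir.chdir` changes the working directory of the whole process, while `base:` resolves patterns against a directory without side effects. The function names and the `source`/`patterns` parameters are hypothetical.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical: expand every glob pattern relative to `source` into a list
# of absolute file paths
def excluded_files(source, patterns)
  patterns.flat_map do |pattern|
    Dir.glob(pattern, base: source).map { |match| File.expand_path(match, source) }
  end
end

def excluded?(absolute_path, source, patterns)
  excluded_files(source, patterns).include?(absolute_path)
end

Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, 'drafts'))
  File.write(File.join(dir, 'drafts', 'wip.md'), '')
  File.write(File.join(dir, 'index.md'), '')
  excluded?(File.join(dir, 'drafts', 'wip.md'), dir, ['drafts/*.md']) # => true
  excluded?(File.join(dir, 'index.md'), dir, ['drafts/*.md'])         # => false
end
```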

def self.excluded_from_hook?(file)

Public: Check if the file has been excluded by running a custom user
hook

file - The Jekyll file

def self.excluded_from_hook?(file)
  Hooks.should_be_excluded?(file.path)
end

def self.indexable?(file)

Public: Check if the file should be indexed

file - The Jekyll file

There are many reasons a file should not be indexed. We exclude static
assets, 404 pages, pagination pages, files without an allowed extension
and files excluded by the user, keeping only the actual content.

def self.indexable?(file)
  return false if static_file?(file)
  return false if is_404?(file)
  return false if pagination_page?(file)
  return false unless allowed_extension?(file)
  return false if excluded_by_user?(file)
  true
end
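
The method is a chain of early-return guards. The sketch below expresses the same "first rejecting rule wins" idea as a list of lambdas over file paths. The rules are simplified, hypothetical approximations of the real checks. For example, static files are detected by extension here, not by class.

```ruby
# Hypothetical rejection rules, loosely modelled on the checks above; each
# takes a file path and returns true when the file must NOT be indexed
REJECTIONS = [
  ->(path) { %w[.css .js .png .jpg].include?(File.extname(path)) }, # static asset
  ->(path) { File.basename(path, File.extname(path)) == '404' },    # 404 page
  ->(path) { path.match?(%r{/page[0-9]*/index\.html$}) }            # pagination
].freeze

def indexable_path?(path)
  # none? stops at the first rule that rejects the file, like the guards
  REJECTIONS.none? { |rule| rule.call(path) }
end

indexable_path?('/blog/hello.md')    # => true
indexable_path?('/assets/site.css')  # => false
indexable_path?('/page2/index.html') # => false
```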

def self.is_404?(file)

Public: Check if the file is a 404 error page

file - The Jekyll file

404 pages are not a Jekyll default but a convention adopted by GitHub
Pages. We don't want to index those.
Source: https://help.github.com/articles/creating-a-custom-404-page-for-your-github-pages-site/

def self.is_404?(file)
  File.basename(file.path, File.extname(file.path)) == '404'
end
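
Comparing the basename without its extension means any source format of the error page is caught, whether `404.html` or `404.md`. Here is a standalone sketch with a hypothetical `error_page?` name that works on a plain path string.

```ruby
def error_page?(path)
  # Strip the directory and the extension, then compare what is left
  File.basename(path, File.extname(path)) == '404'
end

error_page?('404.html')    # => true
error_page?('docs/404.md') # => true
error_page?('4040.html')   # => false
```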

def self.metadata(file)

Public: Return a hash of all the file metadata

file - The Jekyll file

It contains both the raw metadata extracted from the front-matter and
more specific fields like the collection name, date timestamp, excerpts,
slug, type and url.

def self.metadata(file)
  raw_data = raw_data(file)
  specific_data = {
    collection: collection(file),
    date: date(file),
    excerpt_html: excerpt_html(file),
    excerpt_text: excerpt_text(file),
    slug: slug(file),
    type: type(file),
    url: url(file)
  }
  metadata = Utils.compact_empty(raw_data.merge(specific_data))
  metadata
end
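
The merge-then-compact step can be sketched as follows. `compact_empty` here is a hypothetical stand-in for `Utils.compact_empty`, assumed to drop `nil` values and empty strings, arrays and hashes. The real helper may differ. With `Hash#merge`, the specific fields win over any raw key of the same name.

```ruby
# Hypothetical stand-in for Utils.compact_empty: drop nil values and
# empty strings/arrays/hashes
def compact_empty(hash)
  hash.reject { |_key, value| value.nil? || (value.respond_to?(:empty?) && value.empty?) }
end

raw = { title: 'Hello', tags: [] }
specific = { slug: 'hello', date: nil, type: 'post' }
metadata = compact_empty(raw.merge(specific))
# keeps only :title, :slug and :type; the empty tags and nil date are gone
```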

def self.pagination_page?(file)

Public: Check if the page is a pagination page

file - The Jekyll file

`jekyll-paginate` automatically creates pages to paginate through posts.
We don't want to index those.

def self.pagination_page?(file)
  # paginate_path contains a special `:num` part that is the page number
  # We convert that to a regexp
  paginate_path = Configurator.get('paginate_path')
  paginate_path_as_regexp = paginate_path.gsub(':num', '([0-9]*)')
  regexp = %r{#{paginate_path_as_regexp}/index\.html$}
  # Make sure all file paths start with a / for comparison
  filepath = file.path
  filepath = "/#{filepath}" unless filepath[0] == '/'
  Utils.match?(filepath, regexp)
end
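
A standalone sketch of the `paginate_path` to regexp conversion. It assumes the configured value (for example `/page:num`) is passed in directly, and `pagination_path?` is a hypothetical name. Unlike the code above, it runs `Regexp.escape` on the configured value first. Without escaping, a `.` in `paginate_path` would match any character.

```ruby
# Hypothetical standalone variant: `paginate_path` replaces the
# Configurator.get('paginate_path') value (e.g. "/page:num")
def pagination_path?(filepath, paginate_path)
  # Escape regexp metacharacters first, then turn :num into a digit group
  pattern = Regexp.escape(paginate_path).gsub(':num', '([0-9]*)')
  regexp = %r{#{pattern}/index\.html$}
  # Make sure all file paths start with a / for comparison
  filepath = "/#{filepath}" unless filepath.start_with?('/')
  filepath.match?(regexp)
end

pagination_path?('page2/index.html', '/page:num')       # => true
pagination_path?('/about/index.html', '/page:num')      # => false
pagination_path?('/blog/page4/index.html', '/blog/page:num') # => true
```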

def self.raw_data(file)

Note that even if you define tags and categories in a collection item,
they will not be included in the data. Both are always empty arrays.

def self.raw_data(file)
  data = file.data.clone
  # Remove all keys where we have a specific getter
  data.each_key do |key|
    data.delete(key) if respond_to?(key)
  end
  # Also delete keys we manually handle
  data.delete('excerpt')
  # Convert all keys to symbols
  data = Utils.keys_to_symbols(data)
  data
end
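
The key filtering can be sketched with a small hypothetical module. `MiniBrowser` and its getters are illustrations, not the plugin's API. The sketch builds a new hash with `reject` instead of deleting keys from the hash being iterated. Note that `respond_to?` also matches methods every module has, such as `name`, so a front-matter key called `name` is dropped in this sketch. The original appears to share that behavior, since it uses the same check.

```ruby
module MiniBrowser
  # Hypothetical specific getters: any front-matter key with the same
  # name is dropped from the raw data
  def self.slug(_file); end
  def self.date(_file); end

  def self.raw_data(data)
    # Build a new hash instead of deleting keys while iterating
    cleaned = data.reject { |key, _value| respond_to?(key) || key == 'excerpt' }
    # Convert all keys to symbols
    cleaned.transform_keys(&:to_sym)
  end
end

MiniBrowser.raw_data({ 'title' => 'Hello', 'slug' => 'custom', 'excerpt' => '...' })
# => only the :title key survives
```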

def self.relative_path(file)

Public: Return the path of a Jekyll file relative to the Jekyll source

file - The Jekyll file to inspect

Jekyll handles the .path property of some files as relative to the root
(pages) and of others as absolute (posts and static assets). We make sure
we have a consistent way of accessing it.

def self.relative_path(file)
  pathname = Pathname.new(file.path)
  return file.path if pathname.relative?
  jekyll_source = Pathname.new(Configurator.get('source'))
  pathname.relative_path_from(jekyll_source).cleanpath.to_s
end
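
This is the mirror image of `absolute_path`, sketched as a standalone function. It assumes the source directory is passed in. `relative_path_for` is a hypothetical name. `Pathname#relative_path_from` does the prefix stripping.

```ruby
require 'pathname'

# Hypothetical standalone variant: `source` replaces Configurator.get('source')
def relative_path_for(path, source)
  pathname = Pathname.new(path)
  # Pages are already relative to the source: return them untouched
  return path if pathname.relative?
  # Posts and static assets are absolute: strip the source prefix
  pathname.relative_path_from(Pathname.new(source)).cleanpath.to_s
end

relative_path_for('/site/_posts/2018-01-01-hello.md', '/site') # => "_posts/2018-01-01-hello.md"
relative_path_for('about.md', '/site')                         # => "about.md"
```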

def self.slug(file)

Public: Returns the slug of the file

file - The Jekyll file

Slugs can be automatically extracted from collections, but for other
files we have to create them from the basename.

def self.slug(file)
  # We get the real slug from the file data if available
  return file.data['slug'] if file.data.key?('slug')
  # We create it ourselves from the filepath otherwise
  File.basename(file.path, File.extname(file.path)).downcase
end
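
A standalone sketch of the slug fallback. It takes the file's data hash and path directly, and `slug_for` is a hypothetical name. An existing `slug` key always wins. Otherwise, the basename without extension is lowercased.

```ruby
def slug_for(data, path)
  # A slug already present in the file data wins
  return data['slug'] if data.key?('slug')
  # Otherwise derive it from the file name, without extension, lowercased
  File.basename(path, File.extname(path)).downcase
end

slug_for({}, 'docs/Getting-Started.md')      # => "getting-started"
slug_for({ 'slug' => 'custom' }, 'About.md') # => "custom"
```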

def self.static_file?(file)

Public: Check if the specified file is a static Jekyll asset

file - The Jekyll file

We don't index static assets (js, css, images).

def self.static_file?(file)
  file.is_a?(Jekyll::StaticFile)
end

def self.type(file)

Public: Get the type of the document (page, post, collection, etc)

file - The Jekyll file

Pages are simple HTML and markdown documents in the tree.
Elements from a collection are called Documents.
Posts are a custom kind of Documents.

def self.type(file)
  type = file.class.name.split('::')[-1].downcase
  type = 'post' if type == 'document' && file.collection.label == 'posts'
  type
end
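
The type is derived from the last segment of the class name. The sketch below reproduces this with hypothetical `Fake::Page` and `Fake::Document` classes that stand in for Jekyll's. It also shows that, with this logic, items from collections other than posts come out as `document`.

```ruby
# Hypothetical stand-ins mimicking Jekyll's class names
module Fake
  Collection = Struct.new(:label)
  class Page; end
  class Document
    attr_reader :collection
    def initialize(label)
      @collection = Collection.new(label)
    end
  end
end

def type_for(file)
  # "Fake::Page" -> "page", "Fake::Document" -> "document"
  type = file.class.name.split('::').last.downcase
  # Documents from the special posts collection are reported as posts
  type = 'post' if type == 'document' && file.collection.label == 'posts'
  type
end

type_for(Fake::Page.new)               # => "page"
type_for(Fake::Document.new('posts'))  # => "post"
type_for(Fake::Document.new('recipes')) # => "document"
```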

def self.url(file)

Public: Returns the url of the file, starting from the root

file - The Jekyll file

def self.url(file)
  file.url
end
