module Jekyll::Algolia::FileBrowser
def self.absolute_path(filepath)
Public: Return the absolute path of a Jekyll file
def self.absolute_path(filepath) pathname = Pathname.new(filepath) return pathname.cleanpath.to_s if pathname.absolute? File.expand_path(File.join(Configurator.get('source'), filepath)) end
def self.allowed_extension?(file)
and raw HTML files but this list can be extended using the
can convert many more file formats. By default we'll only index markdown
Jekyll can transform markdown files to HTML by default. With plugins, it
file - The Jekyll file
Public: Check if the file has one of the allowed extensions
def self.allowed_extension?(file) extensions = Configurator.extensions_to_index extname = File.extname(file.path)[1..-1] extensions.include?(extname) end
def self.categories(file)
Public: Returns the list of tags of a file, defaults to an empty array
def self.categories(file) file.data['categories'] || [] end
def self.collection(file)
are purposefully excluded from it as well even if they are technically
Only collection documents can have a collection name. Pages don't. Posts
file - The Jekyll file
Public: Returns the name of the collection
def self.collection(file) return nil unless file.respond_to?(:collection) collection_name = file.collection.label # Posts are a special kind of collection, but it's an implementation # detail from my POV, so I'll exclude them return nil if collection_name == 'posts' collection_name end
def self.date(file)
Pages and other collection items can only have a date set in
Posts have their date coming from the filepath, or the front-matter.
file - The Jekyll file
Public: Returns a timestamp of the file date
def self.date(file) # Collections get their date from .date, while pages read it from .data. # Jekyll by default will set the date of collection to the current date, # but we overwrote this. date = if file.respond_to?(:date) file.date else file.data['date'] end return nil if date.nil? date.to_time.to_i end
def self.excerpt_html(file)
Public: Returns the HTML version of the excerpt
def self.excerpt_html(file) # If it's a post with a custom separator for the excerpt, we honor it return excerpt_raw(file) if use_default_excerpt?(file) # Otherwise we take the first matching node html = file.content selector = Configurator.algolia('nodes_to_index') first_node = Nokogiri::HTML(html).css(selector).first return nil if first_node.nil? first_node.to_s end
def self.excerpt_raw(file)
the potential Liquid error in the terminal, even if we catch the actual
silence all logger output as Jekyll is quite verbose and will display
This might throw an exception if the excerpt is invalid. We also
file - The Jekyll file
Jekyll. Swallow any error that could occur when reading.
Public: Returns the raw excerpt of a file, directly as returned by
def self.excerpt_raw(file) Logger.silent do return file.data['excerpt'].to_s.strip end rescue StandardError nil end
def self.excerpt_text(file)
file - The Jekyll file
Public: Returns the text version of the excerpt
def self.excerpt_text(file) html = excerpt_html(file) Utils.html_to_text(html) end
def self.excluded_from_config?(file)
Public: Check if the file has been excluded by `files_to_exclude`
def self.excluded_from_config?(file) excluded_patterns = Configurator.algolia('files_to_exclude') jekyll_source = Configurator.get('source') path = absolute_path(file.path) excluded_patterns.each do |pattern| pattern = File.expand_path(File.join(jekyll_source, pattern)) return true if File.fnmatch(pattern, path, File::FNM_PATHNAME) end false end
def self.excluded_from_hook?(file)
hook
Public: Check if the file has been excluded by running a custom user
def self.excluded_from_hook?(file) Hooks.should_be_excluded?(file.path) end
def self.indexable?(file)
There are many reasons a file should not be indexed. We need to exclude
file - The Jekyll file
Public: Check if the file should be indexed
def self.indexable?(file) return false if static_file?(file) return false if is_404?(file) return false if redirect?(file) return false unless allowed_extension?(file) return false if excluded_from_config?(file) return false if excluded_from_hook?(file) true end
def self.is_404?(file)
Source: https://help.github.com/articles/creating-a-custom-404-page-for-your-github-pages-site/
pages. We don't want to index those.
404 pages are not Jekyll defaults but a convention adopted by GitHub
file - The Jekyll file
Public: Check if the file is a 404 error page
def self.is_404?(file) ['404.md', '404.html'].include?(File.basename(file.path)) end
def self.metadata(file)
well as more specific fields like the collection name, date timestamp,
It contains both the raw metadata extracted from the front-matter, as
file - The Jekyll file
Public: Return a hash of all the file metadata
def self.metadata(file) raw_data = raw_data(file) specific_data = { collection: collection(file), tags: tags(file), categories: categories(file), date: date(file), excerpt_html: excerpt_html(file), excerpt_text: excerpt_text(file), slug: slug(file), type: type(file), url: url(file) } metadata = Utils.compact_empty(raw_data.merge(specific_data)) metadata end
def self.raw_data(file)
Note that even if you define tags and categories in a collection item,
def self.raw_data(file) data = file.data.clone # Remove all keys where we have a specific getter data.each_key do |key| data.delete(key) if respond_to?(key) end data.delete('excerpt') # Delete other keys added by Jekyll that are not in the front-matter and # not needed for search data.delete('draft') data.delete('ext') # Convert all values to a version that can be serialized to JSON data = Utils.jsonify(data) # Convert all keys to symbols data = Utils.keys_to_symbols(data) data end
def self.redirect?(file)
an HTML meta refresh. We need to exclude those files from indexing.
Plugins like jekyll-redirect-from add dynamic pages that only contain
file - The Jekyll file
Public: Check if the file is redirect page
def self.redirect?(file) file.respond_to?(:name) && file.name == 'redirect.html' end
def self.relative_path(filepath)
Public: Return the path of a Jekyll file relative to the Jekyll source
def self.relative_path(filepath) pathname = Pathname.new(filepath) config_source = Configurator.get('source') || '' jekyll_source = Pathname.new(File.expand_path(config_source)) # Removing any starting ./ if pathname.relative? fullpath = File.expand_path(File.join(jekyll_source, pathname)) return fullpath.gsub(%r{^#{jekyll_source}/}, '') end pathname.relative_path_from(jekyll_source).cleanpath.to_s end
def self.slug(file)
Slugs can be automatically extracted from collections, but for other
file - The Jekyll file
Public: Returns the slug of the file
def self.slug(file) # We get the real slug from the file data if available return file.data['slug'] if file.data.key?('slug') # We create it ourselves from the filepath otherwise File.basename(file.path, File.extname(file.path)).downcase end
def self.static_file?(file)
file - The Jekyll file
Public: Check if the specified file is a static Jekyll asset
def self.static_file?(file) file.is_a?(Jekyll::StaticFile) end
def self.tags(file)
Public: Returns the list of tags of a file, defaults to an empty array
def self.tags(file) file.data['tags'] || [] end
def self.type(file)
Elements from a collection are called Documents
Pages are simple html and markdown documents in the tree
file - The Jekyll file
Public: Get the type of the document (page, post, collection, etc)
def self.type(file) type = file.class.name.split('::')[-1].downcase type = 'post' if type == 'document' && file.collection.label == 'posts' type end
def self.url(file)
Public: Returns the url of the file, starting from the root
def self.url(file) file.url end
def self.use_default_excerpt?(file)
element), but in some cases, we'll fallback to Jekyll's default excerpt
Most of the time, we'll use our own excerpt (the first matching
file - The Jekyll file
this file
Public: Return true if the Jekyll default excerpt should be used for
def self.use_default_excerpt?(file) # Only posts can have excerpt return false unless type(file) == 'post' # User defined their own separator in the config custom_separator = file.excerpt_separator.to_s.strip return false if custom_separator.empty? # This specific post contains this separator file.content.include?(custom_separator) end