Package 'htmltomarkdown'

Title: High-Performance HTML to Markdown Converter
Description: High-performance HTML to Markdown converter powered by a Rust core engine via 'extendr'. Supports full conversion options, metadata extraction, inline image extraction, and visitor-based conversion. Provides bindings to the 'html-to-markdown' Rust library for native performance.
Authors: Na'aman Hirschfeld [aut, cre], The authors of the dependency Rust crates [ctb] (see inst/AUTHORS file for details)
Maintainer: Na'aman Hirschfeld <[email protected]>
License: MIT + file LICENSE
Version: 3.2.2
Built: 2026-04-16 15:27:22 UTC
Source: https://github.com/kreuzberg-dev/html-to-markdown

Help Index


Create conversion options for html-to-markdown.

Description

Builds a named list of conversion options to pass to the conversion functions. All arguments are optional; any argument left as NULL is omitted from the returned list.

Usage

conversion_options(
  heading_style = NULL,
  list_indent_type = NULL,
  list_indent_width = NULL,
  bullets = NULL,
  strong_em_symbol = NULL,
  escape_asterisks = NULL,
  escape_underscores = NULL,
  escape_misc = NULL,
  escape_ascii = NULL,
  code_language = NULL,
  encoding = NULL,
  autolinks = NULL,
  default_title = NULL,
  keep_inline_images_in = NULL,
  br_in_tables = NULL,
  hocr_spatial_tables = NULL,
  highlight_style = NULL,
  extract_metadata = NULL,
  whitespace_mode = NULL,
  strip_newlines = NULL,
  wrap = NULL,
  wrap_width = NULL,
  strip_tags = NULL,
  preserve_tags = NULL,
  convert_as_inline = NULL,
  sub_symbol = NULL,
  sup_symbol = NULL,
  newline_style = NULL,
  code_block_style = NULL,
  preprocessing = NULL,
  debug = NULL
)

Arguments

heading_style

Style for headings: "atx", "atx_closed", or "underlined".

list_indent_type

Indent type for lists: "spaces" or "tabs".

list_indent_width

Number of spaces/tabs per indent level.

bullets

Characters to use for bullet points.

strong_em_symbol

Character for strong/emphasis: "*" or "_".

escape_asterisks

Whether to escape asterisks.

escape_underscores

Whether to escape underscores.

escape_misc

Whether to escape miscellaneous characters.

escape_ascii

Whether to escape all ASCII punctuation characters.

code_language

Default language for code blocks.

encoding

Input encoding (e.g., "utf-8").

autolinks

Whether to use autolinks for URLs.

default_title

Whether to use default title attributes.

keep_inline_images_in

Parent tags in which inline images are preserved.

br_in_tables

Whether to use <br> tags for line breaks in table cells.

hocr_spatial_tables

Whether to use hOCR spatial table layout.

highlight_style

Highlight style: "double_equal", "html", "bold", or "none".

extract_metadata

Whether to extract metadata.

whitespace_mode

Whitespace handling: "normalized" or "strict".

strip_newlines

Whether to strip newlines.

wrap

Whether to wrap text.

wrap_width

Maximum line width for wrapping.

strip_tags

Tags to strip from output.

preserve_tags

Tags to preserve in output.

convert_as_inline

Whether to treat the input as inline content, without block-level formatting.

sub_symbol

Symbol for subscript.

sup_symbol

Symbol for superscript.

newline_style

Newline style: "spaces" or "backslash".

code_block_style

Code block style: "indented", "backticks", or "tildes".

preprocessing

Named list with preprocessing options.

debug

Whether to enable debug output.

Value

A named list of conversion options.
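
Examples

A minimal sketch of building an options list, assuming the package is installed. The option values are taken from the argument descriptions above; the numeric and logical types used for list_indent_width, wrap, and wrap_width are assumptions, as is the expectation that the list names match the argument names.

```r
# Build an options list, leaving every other argument at its NULL default.
if (requireNamespace("htmltomarkdown", quietly = TRUE)) {
  opts <- htmltomarkdown::conversion_options(
    heading_style = "atx",
    list_indent_type = "spaces",
    list_indent_width = 2,          # assumed numeric
    code_block_style = "backticks",
    wrap = TRUE,                    # assumed logical
    wrap_width = 80
  )

  # Arguments left as NULL should not appear in the result.
  str(opts)
}
```

Inspecting the result with str() is a quick way to confirm which options were actually set before handing the list to a conversion function.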


Convert HTML to Markdown.

Description

Converts a character string of HTML into its Markdown representation.

Usage

convert(html)

Arguments

html

A character string of HTML content.

Value

A character string of Markdown content.
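
Examples

A small usage sketch, assuming the package is installed. It calls convert() with a single HTML string, exactly as in the Usage signature above; the sample HTML is made up for illustration, and the exact Markdown produced is not shown because it depends on the library's defaults.

```r
# Illustrative input: a heading followed by a paragraph with bold text.
html <- "<h1>Release notes</h1><p>Fixed <strong>two</strong> bugs.</p>"

if (requireNamespace("htmltomarkdown", quietly = TRUE)) {
  # convert() takes one character string and returns one character string.
  md <- htmltomarkdown::convert(html)
  cat(md, "\n")
}
```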


Get the version of the html-to-markdown Rust core.

Description

Returns the version string of the underlying html-to-markdown Rust library.

Usage

version()

Value

A character string with the version.
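
Examples

A short sketch, assuming the package is installed, that prints the Rust core version next to the installed R package version. The two may or may not match; comparing them can be useful when reporting issues.

```r
if (requireNamespace("htmltomarkdown", quietly = TRUE)) {
  # Version reported by the Rust core, as returned by version().
  core_version <- htmltomarkdown::version()
  message("html-to-markdown core: ", core_version)

  # Version of the installed R package, read from its DESCRIPTION by R.
  pkg_version <- as.character(utils::packageVersion("htmltomarkdown"))
  message("R package: ", pkg_version)
}
```

The call is namespace-qualified because base R also defines an object named version.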