@DefaultKey(value="htmlTool")
public class HtmlTool
extends org.apache.velocity.tools.generic.SafeConfig
The methods utilise CSS selectors to refer to specific elements for manipulation.
Modifier and Type | Class | Description |
---|---|---|
static interface | HtmlTool.ExtractResult | A container to carry element extraction results. |
static interface | HtmlTool.IdElement | Representation of an HTML element with an ID and text content. |
static class | HtmlTool.JoinSeparator | Enum indicating separator handling strategy for document partitioning. |
Modifier and Type | Field | Description |
---|---|---|
static String | DEFAULT_SLUG_SEPARATOR | Default separator used to generate slug heading names. |
Constructor and Description |
---|
HtmlTool() |
Modifier and Type | Method | Description |
---|---|---|
String | addClass(String content, String selector, List<String> classNames) | Adds given class names to the elements in HTML. |
String | addClass(String content, String selector, List<String> classNames, int amount) | Adds given class names to the elements in HTML. |
String | addClass(String content, String selector, String className) | Adds given class to the elements in HTML. |
static List<String> | concat(List<String> elements, String text, boolean append) | Utility method to concatenate a String to a list of Strings. |
protected void | configure(org.apache.velocity.tools.generic.ValueParser values) | |
String | ensureHeadingIds(String pageType, String currentPage, String content, String idSeparator) | Transforms the given HTML content by adding IDs to all heading elements (h1-6) that do not have one. |
HtmlTool.ExtractResult | extract(String content, String selector, int amount) | Extracts HTML elements from the main HTML content. |
String | fixTableHeads(String content) | Fixes table heads: wraps rows with <th> (table heading) elements into <thead> element if they are currently in <tbody>. |
List<String> | getAttr(String content, String selector, String attributeKey) | Retrieves attribute value on elements in HTML. |
String | headingAnchorToId(String content) | Transforms the given HTML content by moving anchor (<a name="myheading">) names to IDs for heading elements. |
List<? extends HtmlTool.IdElement> | headingTree(String content, List<String> sections) | Reads all headings in the given HTML content as a hierarchy. |
String | normaliseWhitespace(String html) | Normalises the whitespace within this string: multiple spaces collapse to a single space, and all whitespace characters (e.g. newline, tab) convert to a simple space. |
org.jsoup.nodes.Document | parse(String content) | Parses body fragment to the <body> element. |
String | remove(String content, String selector) | Removes elements from HTML. |
String | reorderToTop(String content, String selector, int amount) | Reorders elements in HTML content so that selected elements are found at the top of the content. |
String | reorderToTop(String content, String selector, int amount, String wrapRemaining) | Reorders elements in HTML content so that selected elements are found at the top of the content. |
String | replace(String content, String selector, String replacement) | Replaces elements in HTML. |
String | replaceAll(String content, Map<String,String> replacements) | Replaces elements in HTML. |
String | replaceWith(String content, String selector, String newElement) | Replaces all elements in HTML corresponding to selector while preserving the content of this element. |
String | setAttr(String content, String selector, String attributeKey, String value) | Sets attribute to the given value on elements in HTML. |
static String | slug(String input) | Creates a slug (Latin text with no whitespace or other symbols) for a longer text (e.g. to use in URLs). |
List<String> | split(String content, String separatorCssSelector) | Splits the given HTML content into partitions based on the given separator selector. |
List<String> | split(String content, String separatorCssSelector, HtmlTool.JoinSeparator separatorStrategy) | Splits the given HTML content into partitions based on the given separator selector. The separators are either dropped or joined with before/after depending on the indicated separator strategy. |
List<String> | split(String content, String separatorCssSelector, String separatorStrategy) | Splits the given HTML content into partitions based on the given separator selector. |
List<String> | splitOnStarts(String content, String separatorCssSelector) | Splits the given HTML content into partitions based on the given separator selector. |
List<String> | text(String content, String selector) | Retrieves text content of the selected elements in HTML. |
String | wrap(String content, String selector, String wrapHtml, int amount) | Wraps elements in HTML with the given HTML. |
public static final String DEFAULT_SLUG_SEPARATOR
protected void configure(org.apache.velocity.tools.generic.ValueParser values)
Overrides: configure in class org.apache.velocity.tools.generic.SafeConfig
See Also: SafeConfig.configure(ValueParser)
@Nullable public String normaliseWhitespace(@Nullable String html)
Parameters:
html - HTML content to normalise.

public List<String> split(@Nonnull String content, @Nonnull String separatorCssSelector)
Parameters:
content - body HTML content to split (cannot be empty or null).
separatorCssSelector - CSS selector for separators (cannot be empty or null).
See Also: split(String, String, JoinSeparator)
public List<String> splitOnStarts(@Nonnull String content, @Nonnull String separatorCssSelector)
Note that the first part is removed if the split was successful. This is because the first part does not include the separator.
Parameters:
content - HTML content to split
separatorCssSelector - CSS selector for separators
See Also: split(String, String, JoinSeparator)
public List<String> split(@Nonnull String content, @Nonnull String separatorCssSelector, String separatorStrategy)
Parameters:
content - HTML content to split
separatorCssSelector - CSS selector for separators
separatorStrategy - strategy to drop or keep separators, one of "after", "before" or "no"
See Also: split(String, String, JoinSeparator)
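The three strategy names can be illustrated without any HTML parsing. The following is only a minimal sketch, not HtmlTool's implementation: it partitions a flat list of strings rather than a DOM, and the class `SplitSketch`, the enum `Join` and the helper `parse` are hypothetical names. It also assumes that "after" attaches each separator to the content that follows it and "before" to the content that precedes it, which this page does not spell out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

/** Hypothetical stand-in for the partitioning idea; not HtmlTool's code. */
class SplitSketch {

    /** Mirrors the three documented strategy names; the semantics are assumed. */
    enum Join { AFTER, BEFORE, NO }

    /** Maps "after", "before" or "no" to a strategy constant. */
    static Join parse(String strategy) {
        return Join.valueOf(strategy.trim().toUpperCase(Locale.ROOT));
    }

    /** Splits items into partitions at every item accepted by isSeparator. */
    static List<List<String>> split(List<String> items, Predicate<String> isSeparator, Join join) {
        List<List<String>> parts = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String item : items) {
            if (!isSeparator.test(item)) {
                current.add(item);
                continue;
            }
            if (join == Join.BEFORE) {
                current.add(item);           // separator closes the current partition
            }
            parts.add(current);
            current = new ArrayList<>();
            if (join == Join.AFTER) {
                current.add(item);           // separator opens the next partition
            }
        }
        parts.add(current);                  // content after the last separator
        return parts;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("intro", "<hr>", "a", "<hr>", "b");
        for (String name : new String[] {"no", "after", "before"}) {
            System.out.println(name + " -> " + split(items, "<hr>"::equals, parse(name)));
        }
    }
}
```

Under the "after" assumption the first partition never starts with a separator, which matches the note on splitOnStarts that the first part does not include the separator.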
public List<String> split(@Nonnull String content, @Nonnull String separatorCssSelector, @Nonnull HtmlTool.JoinSeparator separatorStrategy)
Note that the splitting algorithm tries to resolve nested elements so that the returned partitions are self-contained HTML elements. The nesting is normally contained within the first applicable partition.
Parameters:
content - Body HTML content to split
separatorCssSelector - CSS selector for separators
separatorStrategy - strategy to drop or keep separators

public String reorderToTop(String content, String selector, int amount)
Parameters:
content - HTML content to reorder
selector - CSS selector for elements to bring to the top of the content
amount - Maximum number of elements to reorder

public String reorderToTop(String content, String selector, int amount, String wrapRemaining)
Parameters:
content - HTML content to reorder
selector - CSS selector for elements to bring to the top of the content
amount - Maximum number of elements to reorder
wrapRemaining - HTML to wrap the remaining (non-reordered) part

@Nonnull public HtmlTool.ExtractResult extract(String content, String selector, int amount)
Parameters:
content - HTML content to extract elements from
selector - CSS selector for elements to extract
amount - Maximum number of elements to extract

public String setAttr(String content, String selector, String attributeKey, String value)
Parameters:
content - HTML content to set attributes on
selector - CSS selector for elements to modify
attributeKey - Attribute name
value - Attribute value

public org.jsoup.nodes.Document parse(@Nonnull String content)
Parses body fragment to the <body> element.
Parameters:
content - body HTML fragment (cannot be null).
Returns:
body element of the parsed content

public List<String> getAttr(String content, String selector, String attributeKey)
Parameters:
content - HTML content to read attributes from
selector - CSS selector for elements to find
attributeKey - Attribute name

public String addClass(String content, String selector, List<String> classNames, int amount)
Parameters:
content - HTML content to modify
selector - CSS selector for elements to add classes to
classNames - Names of classes to add to the selected elements
amount - Maximum number of elements to modify

public String addClass(String content, String selector, List<String> classNames)
Parameters:
content - HTML content to modify
selector - CSS selector for elements to add classes to
classNames - Names of classes to add to the selected elements

public String addClass(String content, String selector, String className)
Parameters:
content - HTML content to modify
selector - CSS selector for elements to add the class to
className - Name of class to add to the selected elements

public String wrap(String content, String selector, String wrapHtml, int amount)
Parameters:
content - HTML content to modify
selector - CSS selector for elements to wrap
wrapHtml - HTML to use for wrapping the selected elements
amount - Maximum number of elements to modify

public String remove(String content, String selector)
Parameters:
content - HTML content to modify
selector - CSS selector for elements to remove

public String replace(String content, String selector, String replacement)
Parameters:
content - HTML content to modify
selector - CSS selector for elements to replace
replacement - HTML replacement (must parse to a single element)

public String replaceAll(String content, Map<String,String> replacements)
Parameters:
content - HTML content to modify
replacements - Map of CSS selectors to their replacement HTML texts. CSS selectors find elements to be replaced with the HTML in the mapping. The HTML must parse to a single element.

public String replaceWith(String content, String selector, String newElement)
Replaces all elements in HTML corresponding to selector while preserving the content of this element.
Parameters:
content - HTML content to modify
selector - CSS selector for elements to replace
newElement - HTML replacement (must parse to a single element)

public List<String> text(@Nullable String content, @Nonnull String selector)
Parameters:
content - HTML content with the elements
selector - CSS selector for elements to extract contents from

public String headingAnchorToId(String content)
Transforms the given HTML content by moving anchor (<a name="myheading">) names to IDs for heading elements.
The anchors are used to indicate positions within an HTML page. In HTML5, however, the name attribute is no longer supported on the <a> tag. The positions within pages are indicated using the id attribute instead, e.g. <h1 id="myheading">.
The method finds anchors inside, immediately before or after the heading tags and uses their name as the heading id instead. The anchors themselves are removed.
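The transformation described above can be illustrated on its simplest case. This is only a sketch under stated assumptions, not how HtmlTool does it: the tool works with jsoup documents (see parse), whereas the hypothetical `AnchorToIdSketch` below uses a regular expression and handles just one layout, an empty named anchor placed directly inside a heading that has no attributes. Anchors immediately before or after a heading are not covered here.

```java
import java.util.regex.Pattern;

/** Hypothetical, text-based illustration of moving an anchor name to a heading id. */
class AnchorToIdSketch {

    // Matches e.g. <h2><a name="intro"></a>Intro</h2>.
    // Group 1: heading tag, group 2: anchor name, group 3: remaining heading content.
    private static final Pattern ANCHOR_IN_HEADING = Pattern.compile(
            "<(h[1-6])>\\s*<a name=\"([^\"]+)\">\\s*</a>(.*?)</\\1>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Rewrites matching headings so the anchor name becomes the heading id. */
    static String anchorToId(String html) {
        return ANCHOR_IN_HEADING.matcher(html).replaceAll("<$1 id=\"$2\">$3</$1>");
    }

    public static void main(String[] args) {
        System.out.println(anchorToId("<h2><a name=\"intro\"></a>Intro</h2><p>Text</p>"));
        // prints <h2 id="intro">Intro</h2><p>Text</p>
    }
}
```

A regular expression is used only to keep the example free of dependencies; for real content a parser-based approach is the safer choice.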
Parameters:
content - HTML content to modify

public static List<String> concat(List<String> elements, String text, boolean append)
Parameters:
elements - list of elements to append/prepend the text to
text - the given text to append/prepend
append - if true, text will be appended to the elements. If false, it will be prepended

public String ensureHeadingIds(String pageType, String currentPage, String content, String idSeparator)
Transforms the given HTML content by adding IDs to all heading elements (h1-6) that do not have one.
IDs on heading elements are used to indicate positions within an HTML page in HTML5. If a heading tag without an id is found, its "slug" is generated automatically based on the heading contents and used as the ID.
Note that the algorithm also modifies existing IDs that have symbols not allowed in CSS selectors, e.g. ":", ".", etc. The symbols are removed.
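To make the two ID rules above concrete, here is a minimal sketch of slug generation and ID clean-up. It is not the library's algorithm: `SlugSketch`, `slug(String, String)` and `sanitizeId` are hypothetical, the lower-casing and accent stripping are assumptions, and the set of characters kept in an existing ID (letters, digits, "-" and "_") is a guess, since the text only names ":" and "." as examples of removed symbols.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical illustration of heading slugs and ID clean-up; not HtmlTool's code. */
class SlugSketch {

    /** Builds a slug from heading text, joining the words with the given separator. */
    static String slug(String input, String separator) {
        // Decompose accented letters, then drop the combining marks.
        String ascii = Normalizer.normalize(input, Normalizer.Form.NFD)
                .replaceAll("\\p{M}+", "");
        // Turn every run of other symbols into one space and trim the ends.
        String words = ascii.toLowerCase(Locale.ENGLISH)
                .replaceAll("[^a-z0-9]+", " ")
                .trim();
        return words.replace(" ", separator);
    }

    /** Removes symbols such as ":" and "." from an existing ID (assumed allowed set). */
    static String sanitizeId(String id) {
        return id.replaceAll("[^A-Za-z0-9_-]", "");
    }

    public static void main(String[] args) {
        System.out.println(slug("Getting Started: H\u00e9llo World!", "-"));
        // prints getting-started-hello-world
        System.out.println(sanitizeId("section:1.2"));
        // prints section12
    }
}
```

The separator is passed explicitly here because the value of DEFAULT_SLUG_SEPARATOR is not stated on this page.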
Parameters:
pageType - The type of page.
currentPage - The name of the current page.
content - HTML content to modify.
idSeparator - the separator used to generate the slug ID.
Returns:
String representing HTML content with all heading elements having id attributes. If all headings already had IDs, the original content is returned.

public String fixTableHeads(String content)
Fixes table heads: wraps rows with <th> (table heading) elements into <thead> element if they are currently in <tbody>.
Parameters:
content - HTML content to modify

public static String slug(String input)
Parameters:
input - text to generate the slug from

public List<? extends HtmlTool.IdElement> headingTree(String content, List<String> sections)
Reads all headings in the given HTML content as a hierarchy, e.g. <h2> is nested under the preceding <h1>.
Only headings with IDs are included in the hierarchy. The result elements contain ID and heading text for each heading. The hierarchy is useful to generate a Table of Contents for a page.
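The nesting rule can be sketched with a stack of currently open headings. This is an illustration only, assuming the headings have already been read into a flat, document-ordered list; the class `HeadingTreeSketch` and its `Node` type are hypothetical and stand in for HtmlTool.IdElement, and the sections parameter is not modelled because its role is not described here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Hypothetical illustration of building a heading hierarchy; not HtmlTool's code. */
class HeadingTreeSketch {

    /** A heading with its level (1-6), ID, text and nested headings. */
    static final class Node {
        final int level;
        final String id;
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(int level, String id, String text) {
            this.level = level;
            this.id = id;
            this.text = text;
        }
    }

    /** Nests a flat list of headings so that smaller headings sit under bigger ones. */
    static List<Node> buildTree(List<Node> flat) {
        List<Node> roots = new ArrayList<>();
        Deque<Node> open = new ArrayDeque<>();   // chain of ancestors, innermost first
        for (Node heading : flat) {
            if (heading.id == null || heading.id.isEmpty()) {
                continue;                        // headings without IDs are left out
            }
            // Close headings of the same or a deeper level.
            while (!open.isEmpty() && open.peek().level >= heading.level) {
                open.pop();
            }
            if (open.isEmpty()) {
                roots.add(heading);
            } else {
                open.peek().children.add(heading);
            }
            open.push(heading);
        }
        return roots;
    }

    public static void main(String[] args) {
        List<Node> roots = buildTree(Arrays.asList(
                new Node(1, "intro", "Introduction"),
                new Node(2, "install", "Installation"),
                new Node(1, "usage", "Usage")));
        for (Node root : roots) {
            System.out.println(root.id + " has " + root.children.size() + " child heading(s)");
        }
    }
}
```

A table of contents can then be rendered by walking the returned roots and their children recursively.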
Parameters:
content - HTML content to extract heading hierarchy from
sections - list of all sections

Copyright © 2012–2023 Friederich Christophe. All rights reserved.