Class HtmlTool
- java.lang.Object
  - org.apache.velocity.tools.generic.SafeConfig
    - org.devacfr.maven.skins.reflow.HtmlTool
@DefaultKey("htmlTool") public class HtmlTool extends org.apache.velocity.tools.generic.SafeConfig
An Apache Velocity tool that provides utility methods to manipulate HTML code using the jsoup HTML5 parser. The methods utilise CSS selectors to refer to specific elements for manipulation.
- Since:
- 1.0
- Author:
- Andrius Velykis, Christophe Friederich
- See Also:
- jsoup HTML parser, jsoup CSS selectors
Nested Class Summary
Nested Classes
- static interface HtmlTool.ExtractResult: A container to carry element extraction results.
- static interface HtmlTool.IdElement: Representation of an HTML element with an ID and text content.
- static class HtmlTool.JoinSeparator: Enum indicating the separator handling strategy for document partitioning.
Field Summary
Fields
- static String DEFAULT_SLUG_SEPARATOR: Default separator used to generate slug heading names.
Constructor Summary
Constructors
- HtmlTool()
Method Summary
- String addClass(String content, String selector, String className): Adds the given class to the elements in HTML.
- String addClass(String content, String selector, List<String> classNames): Adds the given class names to the elements in HTML.
- String addClass(String content, String selector, List<String> classNames, int amount): Adds the given class names to the elements in HTML.
- String addClasses(String baseClass, String additionalClasses): Adds the given class names to a base class name.
- String addClasses(String baseClass, String... additionalClasses): Adds the given class names to a base class name.
- static List<String> concat(List<String> elements, String text, boolean append): Utility method to concatenate a String to a list of Strings.
- protected void configure(org.apache.velocity.tools.generic.ValueParser values)
- String ensureHeadingIds(String pageType, String currentPage, String content, String idSeparator): Transforms the given HTML content by adding IDs to all heading elements (h1-h6) that do not have one.
- HtmlTool.ExtractResult extract(String content, String selector, int amount): Extracts HTML elements from the main HTML content.
- String fixTableHeads(String content): Fixes table heads: wraps rows with <th> (table heading) elements into a <thead> element if they are currently in <tbody>.
- List<String> getAttr(String content, String selector, String attributeKey): Retrieves attribute values of elements in HTML.
- String headingAnchorToId(String content): Transforms the given HTML content by moving anchor (<a name="myheading">) names to IDs for heading elements.
- List<? extends HtmlTool.IdElement> headingTree(String content, List<String> sections): Reads all headings in the given HTML content as a hierarchy.
- String image(ISkinConfig config, String src, String alt, String border, String width, String height)
- String link(ISkinConfig config, String href, String name, String target, String className)
- String link(ISkinConfig config, String href, String name, String target, String img, String icon, String className)
- String normaliseWhitespace(String html): Normalises the whitespace within this string; multiple spaces collapse to a single space, and all whitespace characters (e.g. newline, tab) are converted to a simple space.
- org.jsoup.nodes.Document parse(String content): Parses a body fragment into the <body> element.
- String remove(String content, String selector): Removes elements from HTML.
- String reorderToTop(String content, String selector, int amount): Reorders elements in HTML content so that selected elements are found at the top of the content.
- String reorderToTop(String content, String selector, int amount, String wrapRemaining): Reorders elements in HTML content so that selected elements are found at the top of the content.
- String replace(String content, String selector, String replacement): Replaces elements in HTML.
- String replaceAll(String content, Map<String,String> replacements): Replaces elements in HTML.
- String replaceWith(String content, String selector, String newElement): Replaces all elements in HTML matching selector while preserving the content of each replaced element.
- String setAttr(String content, String selector, String attributeKey, String value): Sets an attribute to the given value on elements in HTML.
- static String slug(String input): Creates a slug (Latin text with no whitespace or other symbols) for a longer text, e.g. for use in URLs.
- List<String> split(String content, String separatorCssSelector): Splits the given HTML content into partitions based on the given separator selector.
- List<String> split(String content, String separatorCssSelector, String separatorStrategy): Splits the given HTML content into partitions based on the given separator selector.
- List<String> split(String content, String separatorCssSelector, HtmlTool.JoinSeparator separatorStrategy): Splits the given HTML content into partitions based on the given separator selector. The separators are either dropped or joined to the partition before or after them, depending on the indicated separator strategy.
- List<String> splitOnStarts(String content, String separatorCssSelector): Splits the given HTML content into partitions based on the given separator selector.
- List<String> text(String content, String selector): Retrieves the text content of the selected elements in HTML.
- String wrap(String content, String selector, String wrapHtml, int amount): Wraps elements in HTML with the given HTML.
Field Detail
-
DEFAULT_SLUG_SEPARATOR
public static final String DEFAULT_SLUG_SEPARATOR
Default separator used to generate slug heading names.
- See Also:
- Constant Field Values
-
Method Detail
-
configure
protected void configure(org.apache.velocity.tools.generic.ValueParser values)
- Overrides:
configure in class org.apache.velocity.tools.generic.SafeConfig
- See Also:
SafeConfig.configure(ValueParser)
-
normaliseWhitespace
@Nullable public String normaliseWhitespace(@Nullable String html)
Normalises the whitespace within this string: multiple spaces collapse to a single space, and all whitespace characters (e.g. newline, tab) are converted to a simple space.
- Parameters:
html - HTML content to normalise.
- Returns:
- the normalised string.
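The collapsing rule can be illustrated with the standard library alone. This is a minimal sketch, assuming that "whitespace" means any character matched by the regex class \s and that a null input is passed through unchanged (both parameter and return are @Nullable); the class name WhitespaceSketch is hypothetical, and the real method may delegate to a jsoup helper instead.

```java
import java.util.regex.Pattern;

/** Hypothetical stand-alone sketch of HtmlTool.normaliseWhitespace. */
class WhitespaceSketch {
    // One or more whitespace characters (space, tab, newline, etc.).
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    static String normaliseWhitespace(String html) {
        if (html == null) {
            return null; // assumption: null passes through, matching the @Nullable contract
        }
        // Replace every run of whitespace with a single plain space.
        return WHITESPACE_RUN.matcher(html).replaceAll(" ");
    }
}
```

For example, "<p>x</p>\n\t<p>y</p>" becomes "<p>x</p> <p>y</p>".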
-
split
public List<String> split(@Nonnull String content, @Nonnull String separatorCssSelector)
Splits the given HTML content into partitions based on the given separator selector. The separators themselves are dropped from the results.
- Parameters:
content - body HTML content to split (cannot be empty or null).
separatorCssSelector - CSS selector for separators (cannot be empty or null).
- Returns:
- a list of HTML partitions split on separator locations, but without the separators.
- Since:
- 1.0
- See Also:
split(String, String, JoinSeparator)
-
splitOnStarts
public List<String> splitOnStarts(@Nonnull String content, @Nonnull String separatorCssSelector)
Splits the given HTML content into partitions based on the given separator selector. The separators are kept as the first elements of the partitions.
Note that the first part is removed if the split was successful, because it precedes the first separator and therefore does not include one.
- Parameters:
content - HTML content to split
separatorCssSelector - CSS selector for separators
- Returns:
- a list of HTML partitions split on separator locations (except the first one), with separators at the beginning of each partition
- Since:
- 1.0
- See Also:
split(String, String, JoinSeparator)
-
split
public List<String> split(@Nonnull String content, @Nonnull String separatorCssSelector, String separatorStrategy)
Splits the given HTML content into partitions based on the given separator selector. The separators are either dropped or joined to the partition before or after them, depending on the indicated separator strategy.
- Parameters:
content - HTML content to split
separatorCssSelector - CSS selector for separators
separatorStrategy - strategy to drop or keep separators, one of "after", "before" or "no"
- Returns:
- a list of HTML partitions split on separator locations.
- Since:
- 1.0
- See Also:
split(String, String, JoinSeparator)
-
split
public List<String> split(@Nonnull String content, @Nonnull String separatorCssSelector, @Nonnull HtmlTool.JoinSeparator separatorStrategy)
Splits the given HTML content into partitions based on the given separator selector. The separators are either dropped or joined to the partition before or after them, depending on the indicated separator strategy.
Note that the splitting algorithm tries to resolve nested elements so that the returned partitions are self-contained HTML elements. The nesting is normally contained within the first applicable partition.
- Parameters:
content - body HTML content to split
separatorCssSelector - CSS selector for separators
separatorStrategy - strategy to drop or keep separators
- Returns:
- a list of HTML partitions split on separator locations. If no splitting occurs, the original content is returned as the single element of the list.
- Since:
- 1.0
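The three separator strategies can be illustrated on a simplified model in which the content is already a flat list of top-level HTML blocks and the CSS selector is replaced by a plain predicate. This sketch deliberately ignores the nested-element resolution described above, and the names SplitSketch, blocks and isSeparator are hypothetical. It assumes that AFTER joins a separator to the following partition (which is what splitOnStarts needs to keep separators at the start of each partition), BEFORE joins it to the preceding partition, and NO drops it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

/** Hypothetical flat-list model of HtmlTool.split and splitOnStarts. */
class SplitSketch {
    enum JoinSeparator { AFTER, BEFORE, NO }

    // Maps the string form ("after", "before", "no") to the enum.
    static JoinSeparator parseStrategy(String strategy) {
        return JoinSeparator.valueOf(strategy.toUpperCase(Locale.ROOT));
    }

    static List<String> split(List<String> blocks, Predicate<String> isSeparator, JoinSeparator strategy) {
        List<String> partitions = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String block : blocks) {
            if (!isSeparator.test(block)) {
                current.append(block);
                continue;
            }
            if (strategy == JoinSeparator.BEFORE) {
                current.append(block); // separator closes the preceding partition
            }
            partitions.add(current.toString());
            current = new StringBuilder();
            if (strategy == JoinSeparator.AFTER) {
                current.append(block); // separator opens the following partition
            }
        }
        // With no separator found, this yields the original content as the single element.
        partitions.add(current.toString());
        return partitions;
    }

    static List<String> splitOnStarts(List<String> blocks, Predicate<String> isSeparator) {
        List<String> parts = split(blocks, isSeparator, JoinSeparator.AFTER);
        // Drop the part before the first separator, but only if a split happened.
        return parts.size() <= 1 ? parts : parts.subList(1, parts.size());
    }
}
```

With blocks <p>a</p>, <hr/>, <p>b</p> and <hr/> as the separator, NO gives [<p>a</p>, <p>b</p>], BEFORE gives [<p>a</p><hr/>, <p>b</p>], AFTER gives [<p>a</p>, <hr/><p>b</p>], and splitOnStarts gives [<hr/><p>b</p>].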
-
reorderToTop
public String reorderToTop(String content, String selector, int amount)
Reorders elements in HTML content so that selected elements are found at the top of the content. The number of moved elements can be limited, e.g. to bring just the first selected element to the top.
- Parameters:
content - HTML content to reorder
selector - CSS selector for elements to bring to the top of the content
amount - maximum number of elements to reorder
- Returns:
- HTML content with reordered elements, or the original content if no such elements are found.
- Since:
- 1.0
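The effect of the amount limit can be sketched on a flat list of sibling elements, with a predicate standing in for the CSS selector. This is a simplified model under that assumption, not the DOM-based implementation; ReorderSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical flat-list model of HtmlTool.reorderToTop. */
class ReorderSketch {
    static List<String> reorderToTop(List<String> elements, Predicate<String> selector, int amount) {
        List<String> selected = new ArrayList<>();
        List<String> remaining = new ArrayList<>();
        for (String element : elements) {
            // Only the first `amount` matches are moved; later matches stay in place.
            if (selected.size() < amount && selector.test(element)) {
                selected.add(element);
            } else {
                remaining.add(element);
            }
        }
        // Selected elements keep their relative order and come first.
        selected.addAll(remaining);
        return selected;
    }
}
```

With amount 1, only the first matching element moves; any later matches keep their original positions in the remainder.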
-
reorderToTop
public String reorderToTop(String content, String selector, int amount, String wrapRemaining)
Reorders elements in HTML content so that selected elements are found at the top of the content. The number of moved elements can be limited, e.g. to bring just the first selected element to the top.
- Parameters:
content - HTML content to reorder
selector - CSS selector for elements to bring to the top of the content
amount - maximum number of elements to reorder
wrapRemaining - HTML to wrap the remaining (non-reordered) part
- Returns:
- HTML content with reordered elements, or the original content if no such elements are found.
- Since:
- 1.0
-
extract
@Nonnull public HtmlTool.ExtractResult extract(String content, String selector, int amount)
Extracts HTML elements from the main HTML content. The result consists of the extracted HTML elements and the remainder of the HTML content with these elements removed. The number of extracted elements can be limited, e.g. to extract just the first selected element.
- Parameters:
content - HTML content to extract elements from
selector - CSS selector for elements to extract
amount - maximum number of elements to extract
- Returns:
- HTML content of the extracted elements together with the remainder of the original content. If no elements are found, the remainder contains the original content.
- Since:
- 1.0
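The split between extracted elements and remainder can be sketched on a flat list of sibling elements with a predicate in place of the CSS selector. The Result class below is a hypothetical stand-in for HtmlTool.ExtractResult; its fields are not that interface's actual accessors.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical flat-list model of HtmlTool.extract. */
class ExtractSketch {
    // Stand-in for HtmlTool.ExtractResult: extracted elements plus the remainder.
    static final class Result {
        final List<String> extracted;
        final String remainder;

        Result(List<String> extracted, String remainder) {
            this.extracted = Collections.unmodifiableList(extracted);
            this.remainder = remainder;
        }
    }

    static Result extract(List<String> elements, Predicate<String> selector, int amount) {
        List<String> extracted = new ArrayList<>();
        StringBuilder remainder = new StringBuilder();
        for (String element : elements) {
            if (extracted.size() < amount && selector.test(element)) {
                extracted.add(element); // taken out of the remainder
            } else {
                remainder.append(element);
            }
        }
        // If nothing matched, the remainder is the original content.
        return new Result(extracted, remainder.toString());
    }
}
```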
-
setAttr
public String setAttr(String content, String selector, String attributeKey, String value)
Sets an attribute to the given value on elements in HTML.
- Parameters:
content - HTML content to set attributes on
selector - CSS selector for elements to modify
attributeKey - attribute name
value - attribute value
- Returns:
- HTML content with modified elements. If no elements are found, the original content is returned.
- Since:
- 1.0
-
parse
public org.jsoup.nodes.Document parse(@Nonnull String content)
Parses a body fragment into the <body> element.
- Parameters:
content - body HTML fragment (cannot be null).
- Returns:
- the body element of the parsed content
-
getAttr
public List<String> getAttr(String content, String selector, String attributeKey)
Retrieves attribute values of elements in HTML. Returns the attribute values of all elements matching the selector, since more than one element can match.
- Parameters:
content - HTML content to read attributes from
selector - CSS selector for elements to find
attributeKey - attribute name
- Returns:
- attribute values for all matching elements. If no elements are found, an empty list is returned.
- Since:
- 1.0
-
addClasses
@Nonnull public String addClasses(@Nonnull String baseClass, @Nonnull String additionalClasses)
Adds the given class names to a base class name.
- Parameters:
baseClass - base class name
additionalClasses - additional class names
- Returns:
- combined class names
-
addClasses
@Nonnull public String addClasses(@Nonnull String baseClass, @Nonnull String... additionalClasses)
Adds the given class names to a base class name.
- Parameters:
baseClass - base class name
additionalClasses - additional class names
- Returns:
- combined class names
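Combining class names can be sketched as joining non-blank values with single spaces, since names in an HTML class attribute are space-separated. This is an assumption about the combination rule: the description above does not say whether blanks are skipped, whether values are trimmed, or whether duplicates are removed (this sketch does not remove them). ClassNamesSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of HtmlTool.addClasses. */
class ClassNamesSketch {
    static String addClasses(String baseClass, String... additionalClasses) {
        List<String> parts = new ArrayList<>();
        appendIfPresent(parts, baseClass);
        for (String extra : additionalClasses) {
            appendIfPresent(parts, extra);
        }
        // Class names in an HTML class attribute are separated by spaces.
        return String.join(" ", parts);
    }

    // Assumption: null or blank values are ignored and surrounding whitespace is trimmed.
    private static void appendIfPresent(List<String> parts, String value) {
        if (value != null && !value.trim().isEmpty()) {
            parts.add(value.trim());
        }
    }
}
```

For example, addClasses("btn", "btn-primary", " active ") yields "btn btn-primary active".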
-
addClass
public String addClass(String content, String selector, List<String> classNames, int amount)
Adds the given class names to the elements in HTML.
- Parameters:
content - HTML content to modify
selector - CSS selector for elements to add classes to
classNames - names of classes to add to the selected elements
amount - maximum number of elements to modify
- Returns:
- HTML content with modified elements. If no elements are found, the original content is returned.
- Since:
- 1.0
-
addClass
public String addClass(String content, String selector, List<String> classNames)
Adds the given class names to the elements in HTML.
- Parameters:
content - HTML content to modify
selector - CSS selector for elements to add classes to
classNames - names of classes to add to the selected elements
- Returns:
- HTML content with modified elements. If no elements are found, the original content is returned.
- Since:
- 1.0
-
addClass
public String addClass(String content, String selector, String className)
Adds the given class to the elements in HTML.
- Parameters:
content - HTML content to modify
selector - CSS selector for elements to add the class to
className - name of the class to add to the selected elements
- Returns:
- HTML content with modified elements. If no elements are found, the original content is returned.
- Since:
- 1.0
-
wrap
public String wrap(String content, String selector, String wrapHtml, int amount)
Wraps elements in HTML with the given HTML.
- Parameters:
content - HTML content to modify
selector - CSS selector for elements to wrap
wrapHtml - HTML to use for wrapping the selected elements
amount - maximum number of elements to modify
- Returns:
- HTML content with modified elements. If no elements are found, the original content is returned.
- Since:
- 1.0
-
remove
public String remove(String content, String selector)
Removes elements from HTML.
- Parameters:
content - HTML content to modify
selector - CSS selector for elements to remove
- Returns:
- HTML content with the elements removed. If no elements are found, the original content is returned.
- Since:
- 1.0
-
replace
public String replace(String content, String selector, String replacement)
Replaces elements in HTML.
- Parameters:
content - HTML content to modify
selector - CSS selector for elements to replace
replacement - HTML replacement (must parse to a single element)
- Returns:
- HTML content with replaced elements. If no elements are found, the original content is returned.
- Since:
- 1.0
-
replaceAll
public String replaceAll(String content, Map<String,String> replacements)
Replaces elements in HTML.
- Parameters:
content - HTML content to modify
replacements - map of CSS selectors to their replacement HTML texts. Each CSS selector finds the elements to be replaced with the mapped HTML, which must parse to a single element.
- Returns:
- HTML content with replaced elements. If no elements are found, the original content is returned.
- Since:
- 1.0
-
replaceWith
public String replaceWith(String content, String selector, String newElement)
Replaces all elements in HTML matching selector while preserving the content of each replaced element.
- Parameters:
content - HTML content to modify
selector - CSS selector for elements to replace
newElement - HTML replacement (must parse to a single element)
- Returns:
- HTML content with replaced elements. If no elements are found, the original content is returned.
- Since:
- 2.0
-
text
public List<String> text(@Nullable String content, @Nonnull String selector)
Retrieves the text content of the selected elements in HTML. Each element's text, including that of its children, is rendered as it would be displayed on the web page.
- Parameters:
content - HTML content with the elements
selector - CSS selector for elements whose text is extracted
- Returns:
- a list of element texts as rendered for display. Empty list if no elements are found.
- Since:
- 1.0
-
link
public String link(ISkinConfig config, String href, String name, String target, String className)
-
link
public String link(ISkinConfig config, String href, String name, String target, String img, String icon, String className)
-
image
public String image(ISkinConfig config, String src, String alt, String border, String width, String height)
-
headingAnchorToId
public String headingAnchorToId(String content)
Transforms the given HTML content by moving anchor (<a name="myheading">) names to IDs for heading elements.
Anchors are used to indicate positions within an HTML page. In HTML5, however, the name attribute is no longer supported on the <a> tag. Positions within pages are indicated using the id attribute instead, e.g. <h1 id="myheading">.
The method finds anchors inside, immediately before, or immediately after the heading tags and uses their names as the heading ids instead. The anchors themselves are removed.
- Parameters:
content - HTML content to modify
- Returns:
- HTML content with modified elements. Anchor names are used for adjacent headings, and anchor tags are removed. If no elements are found, the original content is returned.
- Since:
- 1.0
-
concat
public static List<String> concat(List<String> elements, String text, boolean append)
Utility method to concatenate a String to a list of Strings. The text can be either appended or prepended.
- Parameters:
elements - list of elements to append/prepend the text to
text - the text to append/prepend
append - if true, the text is appended to the elements; if false, it is prepended
- Returns:
- list of elements with the text appended/prepended
- Since:
- 1.0
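The append/prepend switch can be sketched in a few lines. This sketch assumes that a new list is returned and that the input list is left unmodified, which the description above does not state; ConcatSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of HtmlTool.concat. */
class ConcatSketch {
    static List<String> concat(List<String> elements, String text, boolean append) {
        // Assumption: build a new list rather than modifying the input.
        List<String> result = new ArrayList<>(elements.size());
        for (String element : elements) {
            // append == true puts the text after each element, otherwise before it.
            result.add(append ? element + text : text + element);
        }
        return result;
    }
}
```

For example, concat(["a", "b"], ".html", true) gives ["a.html", "b.html"], while concat(["a", "b"], "#", false) gives ["#a", "#b"].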
-
ensureHeadingIds
public String ensureHeadingIds(String pageType, String currentPage, String content, String idSeparator)
Transforms the given HTML content by adding IDs to all heading elements (h1-h6) that do not have one.
In HTML5, IDs on heading elements indicate positions within an HTML page. If a heading tag without an id is found, a "slug" is generated automatically from the heading contents and used as the ID.
Note that the algorithm also modifies existing IDs that contain symbols not allowed in CSS selectors, e.g. ":", ".", etc. These symbols are removed.
- Parameters:
pageType - the type of page.
currentPage - the name of the current page.
content - HTML content to modify.
idSeparator - the separator used when generating slug IDs.
- Returns:
- a String representing the HTML content with an id attribute on every heading element. If all headings already had IDs, the original content is returned.
- Since:
- 1.0
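The per-heading ID rule can be sketched for a single heading: keep an existing ID but strip unsafe symbols, otherwise derive a slug from the heading text joined with idSeparator. Several details are assumptions: the conservative whitelist [A-Za-z0-9_-] standing in for "symbols not allowed in CSS selectors", the lower-casing, and ignoring pageType and currentPage entirely. HeadingIdSketch and headingId are hypothetical names.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the per-heading ID rule described for ensureHeadingIds. */
class HeadingIdSketch {
    // Assumption: anything outside this conservative set is treated as unsafe in CSS selectors.
    private static final Pattern UNSAFE_ID_CHARS = Pattern.compile("[^A-Za-z0-9_-]");

    static String headingId(String existingId, String headingText, String idSeparator) {
        if (existingId != null && !existingId.isEmpty()) {
            // Keep the existing ID but strip symbols such as ':' and '.'.
            return UNSAFE_ID_CHARS.matcher(existingId).replaceAll("");
        }
        // Decompose accented letters, then keep only Latin letters, digits and whitespace.
        String decomposed = Normalizer.normalize(headingText, Normalizer.Form.NFD);
        String words = decomposed.replaceAll("[^A-Za-z0-9\\s]", "").trim();
        // Join the words with the separator (quoted so "$" or "\" stay literal).
        return words.toLowerCase(Locale.ROOT)
                .replaceAll("\\s+", Matcher.quoteReplacement(idSeparator));
    }
}
```

For example, a heading "Getting Started" without an ID and idSeparator "_" would get "getting_started", while an existing ID "section:1.2" would become "section12".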
-
fixTableHeads
public String fixTableHeads(String content)
Fixes table heads: wraps rows with <th> (table heading) elements into a <thead> element if they are currently in <tbody>.
- Parameters:
content - HTML content to modify
- Returns:
- HTML content with all table heads fixed. If all heads were correct, the original content is returned.
- Since:
- 1.0
-
slug
public static String slug(String input)
Creates a slug (Latin text with no whitespace or other symbols) for a longer text, e.g. for use in URLs. Uses "-" as the whitespace separator.
- Parameters:
input - text to generate the slug from
- Returns:
- the slug of the given text, containing only alphanumeric symbols and "-"
- Since:
- 1.0
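The slug contract above (whitespace becomes "-", only alphanumerics and "-" remain) can be met with the standard library by decomposing accented letters with java.text.Normalizer and then filtering. This is a sketch, not the tool's actual code: SlugSketch is a hypothetical name, and lower-casing the result is an assumption the description does not state.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical sketch of HtmlTool.slug. */
class SlugSketch {
    private static final Pattern WHITESPACE = Pattern.compile("\\s");
    // Documented contract: only alphanumeric symbols and "-" survive.
    private static final Pattern NON_SLUG = Pattern.compile("[^A-Za-z0-9-]");

    static String slug(String input) {
        // 1. Each whitespace character becomes "-".
        String dashed = WHITESPACE.matcher(input).replaceAll("-");
        // 2. NFD splits accented letters into a base letter plus a combining mark.
        String decomposed = Normalizer.normalize(dashed, Normalizer.Form.NFD);
        // 3. Drop combining marks and every other non-slug symbol.
        String slug = NON_SLUG.matcher(decomposed).replaceAll("");
        // 4. Assumption: lower-case the result.
        return slug.toLowerCase(Locale.ENGLISH);
    }
}
```

For example, "Step 1.2: Setup" becomes "step-12-setup". Each whitespace character maps to its own "-", so two consecutive spaces would yield "--"; collapsing such runs would be a separate design choice.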
-
headingTree
public List<? extends HtmlTool.IdElement> headingTree(String content, List<String> sections)
Reads all headings in the given HTML content as a hierarchy. Subsequent smaller headings are nested within bigger ones, e.g. <h2> is nested under the preceding <h1>.
Only headings with IDs are included in the hierarchy. The result elements contain the ID and heading text for each heading. The hierarchy is useful for generating a table of contents for a page.
- Parameters:
content - HTML content to extract the heading hierarchy from
sections - list of all sections
- Returns:
- a list of top-level heading items (with id and text). The remaining headings are nested within these top-level items. Empty list if there are no headings in the content.
- Since:
- 1.0
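The nesting rule can be sketched with a stack of currently open headings, applied to headings already read from the page in document order. Heading is a hypothetical stand-in for HtmlTool.IdElement, HeadingTreeSketch is a hypothetical name, and the sections parameter is not modelled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of the nesting rule behind HtmlTool.headingTree. */
class HeadingTreeSketch {
    // Stand-in for HtmlTool.IdElement: level (1-6), id, text and nested headings.
    static final class Heading {
        final int level;
        final String id;
        final String text;
        final List<Heading> children = new ArrayList<>();

        Heading(int level, String id, String text) {
            this.level = level;
            this.id = id;
            this.text = text;
        }
    }

    static List<Heading> headingTree(List<Heading> headingsInOrder) {
        List<Heading> roots = new ArrayList<>();
        Deque<Heading> open = new ArrayDeque<>(); // chain of ancestors, innermost on top
        for (Heading heading : headingsInOrder) {
            if (heading.id == null || heading.id.isEmpty()) {
                continue; // only headings with IDs are part of the hierarchy
            }
            // Close every open heading that is not bigger than this one.
            while (!open.isEmpty() && open.peek().level >= heading.level) {
                open.pop();
            }
            if (open.isEmpty()) {
                roots.add(heading);
            } else {
                open.peek().children.add(heading);
            }
            open.push(heading);
        }
        return roots;
    }
}
```

A skipped level (an <h3> directly after an <h1>) simply nests the smaller heading under the nearest bigger one.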