How to get the text of an FLC paragraph field

Description

When a Fusion Lifecycle (FLC) item has a field of type Paragraph and its value is formatted (e.g. special characters, bold or italic text), the powerFLC cmdlets return the raw HTML instead of the user-readable text. This makes the value hard to read:

$changeOrders = Get-FLCItems -Workspace 'Change Orders'
$changeOrders[0].'Description of Change'
<#
<p><b>BoldText </b>&lt;<i> Italic</i></p><p></p><ul><li><i><u>Underlined</u></i></li></ul><p></p>
#>
 

To get the text as FLC displays it to the user, the HTML must be parsed to remove the formatting tags and decode the escaped characters. The easiest way to do this is with a library; this example uses HtmlAgilityPack. The first step is to load the library from the current directory using the Add-Type cmdlet:

Add-Type -Path HtmlAgilityPack.dll
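A relative path passed to Add-Type depends on the current location of the session. The following is a minimal sketch of a more defensive load, assuming HtmlAgilityPack.dll sits next to the calling script; the variable names $baseDir and $dllPath are illustrative only:

```powershell
# Assumption: HtmlAgilityPack.dll is stored next to the calling script.
# $PSScriptRoot is empty in an interactive session, so fall back to the
# current location in that case.
$baseDir = if ($PSScriptRoot) { $PSScriptRoot } else { (Get-Location).Path }
$dllPath = Join-Path $baseDir 'HtmlAgilityPack.dll'

# Fail early with a clear message if the library is missing.
if (-not (Test-Path -LiteralPath $dllPath)) {
    throw "HtmlAgilityPack.dll not found at '$dllPath'"
}

# Load the assembly only if the type is not already available in this session.
if (-not ('HtmlAgilityPack.HtmlDocument' -as [type])) {
    Add-Type -Path $dllPath
}
```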
 

After loading the library, create an HtmlAgilityPack.HtmlDocument instance and pass the HTML to its LoadHtml() method to parse it. The text contained in the HTML is then available through the DocumentNode.InnerText property.

$doc = [HtmlAgilityPack.HtmlDocument]::new()
$doc.LoadHtml($changeOrders[0].'Description of Change')
$doc.DocumentNode.InnerText
<#
BoldText &lt; ItalicUnderlined
#>
 

The text still contains an encoded character (here &lt;, which is an encoded < symbol). To decode all such entities in the string, use HtmlDecode:

# Windows PowerShell does not load System.Web by default
Add-Type -AssemblyName System.Web
$text = [System.Web.HttpUtility]::HtmlDecode($doc.DocumentNode.InnerText)
$text
<#
BoldText < ItalicUnderlined
#>
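In Windows PowerShell, System.Web.HttpUtility may be unavailable until the System.Web assembly has been loaded. As an alternative, the sketch below uses [System.Net.WebUtility]::HtmlDecode, which needs no additional assembly. This is a swapped-in call, not the one used in the example above, and the InnerText value is written as a literal ($innerText is an illustrative name) so the snippet runs on its own:

```powershell
# The InnerText value from the previous step, written as a literal.
$innerText = 'BoldText &lt; ItalicUnderlined'

# System.Net.WebUtility is available without loading an additional assembly.
$text = [System.Net.WebUtility]::HtmlDecode($innerText)
$text
<#
BoldText < ItalicUnderlined
#>
```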
 

HtmlDecode returns the same string with all entities decoded. The result should closely match the text shown to the user in the FLC interface.
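When several Paragraph fields or items need the same treatment, the parsing and decoding steps can be wrapped in a function. The following is a minimal sketch; ConvertTo-FlcPlainText is a hypothetical name and not part of powerFLC, and it assumes HtmlAgilityPack.dll has already been loaded with Add-Type:

```powershell
# Hypothetical helper, not part of powerFLC. Assumes HtmlAgilityPack.dll has
# already been loaded with Add-Type.
Add-Type -AssemblyName System.Web   # makes [System.Web.HttpUtility] available

function ConvertTo-FlcPlainText {
    param(
        [Parameter(ValueFromPipeline)]
        [string]$Html
    )
    process {
        # Empty or missing field values have nothing to parse.
        if ([string]::IsNullOrEmpty($Html)) { return '' }

        $doc = [HtmlAgilityPack.HtmlDocument]::new()
        $doc.LoadHtml($Html)

        # Strip the tags via InnerText, then decode entities such as &lt;
        [System.Web.HttpUtility]::HtmlDecode($doc.DocumentNode.InnerText)
    }
}
```

With the library loaded, calling ConvertTo-FlcPlainText -Html $changeOrders[0].'Description of Change' should return the same text as the step-by-step version.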

See also