A text-truncating function that preserves both HTML tags and whole words

Akshaya K Sahu’s answer to this question at Stack Overflow is a great example of a text-parsing function that can truncate any string containing HTML at a given length, while taking care of everything that matters, i.e.:

  • whole words (no word is cut in the middle),
  • properly closed HTML tags and
  • correct handling of UTF-8 encoding (multi-byte characters!)
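The UTF-8 point is the easiest one to get wrong, so here is a minimal sketch of the problem, assuming the mbstring extension is loaded. The sample string and the variable names are my own and are not part of the original answer. It compares the byte-based substr() with mb_strcut() and mb_substr():

```php
<?php
// Sample text chosen for illustration: "ż" takes two bytes in UTF-8.
$text = 'Zażółć gęślą jaźń';

// substr() counts bytes, so a 3-byte cut ends in the middle of "ż".
$byBytes = substr($text, 0, 3);

// mb_strcut() also counts bytes, but backs off to a character boundary.
$safeBytes = mb_strcut($text, 0, 3, 'UTF-8');

// mb_substr() counts characters instead of bytes.
$byChars = mb_substr($text, 0, 3, 'UTF-8');

var_dump(mb_check_encoding($byBytes, 'UTF-8')); // bool(false)
var_dump($safeBytes);                           // string(2) "Za"
var_dump($byChars);                             // string(4) "Zaż"
```

The function below relies on mb_strcut(), which is why a cut never leaves half of a character behind, and also one of the reasons why the resulting length is only approximate.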

I have nothing to add to it, so I am keeping a copy of the code here purely for my own reference, and only because the original answer lacks some comments.

/**
 * Truncates a string at the given length while keeping HTML tags closed and
 * words whole.
 *
 * Note that keeping both HTML tags and whole words intact in the truncated
 * string is quite a complicated process, so don't expect your string to be
 * truncated at exactly the given length. Treat this value as an approximate one.
 *
 * Source: http://stackoverflow.com/a/8741240/1469208
 *
 * @param  string  $html      Input string containing HTML code.
 * @param  integer $maxLength Approximate length of string. See above notice.
 * @return string             Parsed string, truncated at given length.
 */
public static function htmlSafeStringTruncate($html, $maxLength = 150)
{
    $printedLength = 0;
    $position = 0;
    $tags = array();
    $newContent = '';

    // Images are removed entirely before truncating.
    $html = preg_replace('/<img[^>]+>/i', '', $html);

    // Match the next tag or entity; tag names may contain digits (h1 to h6).
    while ($printedLength < $maxLength && preg_match('{</?([a-z][a-z0-9]*)[^>]*>|&#?[a-zA-Z0-9]+;}', $html, $match, PREG_OFFSET_CAPTURE, $position)) {
        list($tag, $tagPosition) = $match[0];

        // Print text leading up to the tag.
        $str = mb_strcut($html, $position, $tagPosition - $position);

        if ($printedLength + mb_strlen($str) > $maxLength) {
            // The limit falls inside this text: cut it, then drop the
            // trailing partial word.
            $newstr = mb_strcut($str, 0, $maxLength - $printedLength);
            $newstr = preg_replace('~\s+\S+$~', '', $newstr);
            $newContent .= $newstr;
            $printedLength = $maxLength;

            break;
        }

        $newContent .= $str;
        $printedLength += mb_strlen($str);

        if ($tag[0] == '&') {
            // Handle the entity.
            $newContent .= $tag;
        } else {
            // Handle the tag.
            $tagName = $match[1][0];

            if ($tag[1] == '/') {
                // This is a closing tag.
                $openingTag = array_pop($tags);

                // Check that tags are properly nested.
                assert($openingTag == $tagName);

                $newContent .= $tag;
            } else if ($tag[strlen($tag) - 2] == '/') {
                // This is a self-closing tag.
                $newContent .= $tag;
            } else {
                // This is an opening tag.
                $newContent .= $tag;

                $tags[] = $tagName;
            }
        }

        // Continue after the tag. preg_match() reports byte offsets,
        // so the tag length is measured in bytes (strlen) as well.
        $position = $tagPosition + strlen($tag);
    }

    // Print any remaining text.
    if ($printedLength < $maxLength && $position < strlen($html)) {
        $rest = mb_strcut($html, $position);
        $newstr = mb_strcut($rest, 0, $maxLength - $printedLength);

        // Drop the trailing partial word only if the text was actually cut.
        if ($newstr !== $rest) {
            $newstr = preg_replace('~\s+\S+$~', '', $newstr);
        }

        $newContent .= $newstr;
    }

    // Close any remaining open tags.
    while (!empty($tags)) {
        $newContent .= sprintf('</%s>', array_pop($tags));
    }

    return $newContent;
}
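The two finishing steps of the function, dropping a word that was cut in half and closing whatever tags are still open, can be tried out in isolation. This is only a minimal sketch: the helper names dropTrailingPartialWord() and closeOpenTags() are hypothetical, made up for this illustration, and do not appear in the original answer.

```php
<?php
// Hypothetical helper: removes the last whitespace-separated chunk,
// which after a hard cut is most likely an incomplete word.
function dropTrailingPartialWord($fragment)
{
    return preg_replace('~\s+\S+$~', '', $fragment);
}

// Hypothetical helper: closes tags in reverse order of opening,
// treating the array as a stack (last opened, first closed).
function closeOpenTags(array $openTags)
{
    $closing = '';

    while (!empty($openTags)) {
        $closing .= sprintf('</%s>', array_pop($openTags));
    }

    return $closing;
}

// A fragment cut in the middle of "world", with <p> and <b> still open.
$fragment = dropTrailingPartialWord('<p>Hello <b>wonderful wor');

echo $fragment . closeOpenTags(array('p', 'b'));
// prints: <p>Hello <b>wonderful</b></p>
```

Note that the regular expression removes the last word whether or not it was really cut, so it should only be applied to text that has actually been shortened.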
