A modified sentenceNormalizer by gregomm
Features:
1- Removes duplicated question marks, exclamation marks and periods
2- Capitalizes the first letter of each sentence
3- Splits sentences not only on "." but also on "?" and "!"
4- Puts a space at the end of each sentence
5- Retains newlines
--removed from original function--
Understands the meaning of "¡" and "¿" in languages like Spanish.
Understands the HTML entity versions of these symbols.
--removed from original function--
<?php
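function sentenceNormalizer($sentence_split) {
    // Collapse runs of "!", "?" or "." into a single character.
    $sentence_split = preg_replace(array('/[!]+/', '/[?]+/', '/[.]+/'),
                                   array('!', '?', '.'), $sentence_split);
    // Split on sentence terminators and newlines, keeping the delimiters.
    $textbad = preg_split("/(\!|\.|\?|\n)/", $sentence_split, -1, PREG_SPLIT_DELIM_CAPTURE);
    $newtext = array();
    foreach ($textbad as $string) {
        // Compare against '' because empty() would also discard the string "0".
        if ($string !== '') {
            $text = trim($string, ' ');
            $size = strlen($text);
            if ($size > 1) {
                // Sentence body: lowercase it, then capitalize the first letter.
                $newtext[] = ucfirst(strtolower($text));
            } elseif ($size == 1) {
                // Single character (usually a delimiter): keep a newline as-is,
                // otherwise append a space.
                $newtext[] = ($text == "\n") ? $text : $text . ' ';
            }
        }
    }
    return implode($newtext);
}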
?>
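To see what the two regex steps in the function above do on their own, here is a minimal sketch. The names $input, $collapsed and $parts are made up for this illustration and are not part of the original function.

<?php
// Illustration only: $input, $collapsed and $parts are hypothetical names.
$input = "hello world!!! how are you??";

// Step 1: collapse repeated "!", "?" and "." into one character each.
$collapsed = preg_replace(array('/[!]+/', '/[?]+/', '/[.]+/'),
                          array('!', '?', '.'), $input);
echo $collapsed, "\n";

// Step 2: split on the terminators and on newlines.
// PREG_SPLIT_DELIM_CAPTURE keeps the captured delimiters in the result.
$parts = preg_split('/([!.?\n])/', $collapsed, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($parts);
?>

Here $collapsed is "hello world! how are you?" and $parts holds "hello world", "!", " how are you", "?" and a trailing empty string. That is why the function skips empty pieces and trims the leading space before capitalizing.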