Recently I had to write search engine in hebrew and ran into huge amount of problems. My data was stored in MySQL table with utf8_bin encoding.
So, to be able to write hebrew in utf8 table you need to do
<?php
$prepared_text = addslashes(urf8_encode($text));
?>
But then I had to find if some word exists in stored text. This is the place I got stuck. Simple preg_match would not find text since hebrew doesnt work that easy. I've tried with /u and who kows what else.
Solution was somewhat logical and simple...
<?php
$db_text = bin2hex(stripslashes(utf8_decode($db_text)));
$word = bin2hex($word);
$found = preg_match_all("/($word)+/i", $db_text, $matches);
?>
I've used preg_match_all since it returns number of occurences. So I could sort search results acording to that.
Hope someone finds this useful!