kent at marketruler dot com
15 years ago
Note that fgetcsv, at least in PHP 5.3 and earlier, will NOT work with UTF-16 encoded files. Your options are to convert the entire file to ISO-8859-1 (latin1), or to read it line by line, convert each line to ISO-8859-1, and then parse it with str_getcsv (or a backwards-compatible implementation). If you need to read non-Latin alphabets, it is probably best to convert to UTF-8 instead.
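As a rough sketch of the "convert the entire file" option, the snippet below decodes a whole UTF-16 payload to UTF-8 and then parses it with fgetcsv over an in-memory stream. The function name read_utf16_csv_string is hypothetical (it is not part of PHP), it assumes the iconv extension is available, and it assumes big-endian order when no BOM is present.

```php
<?php
// Hypothetical helper (not part of PHP): decode a UTF-16 CSV payload
// to UTF-8, then parse it with fgetcsv over a temporary stream.
function read_utf16_csv_string($raw, $delimiter = ',')
{
    // Detect and strip the BOM. Assumption: big-endian when there is none.
    $from = 'UTF-16BE';
    $bom = substr($raw, 0, 2);
    if ($bom === "\xFF\xFE") {
        $from = 'UTF-16LE';
        $raw = (string) substr($raw, 2);
    } elseif ($bom === "\xFE\xFF") {
        $raw = (string) substr($raw, 2);
    }

    // Requires the iconv extension.
    $utf8 = iconv($from, 'UTF-8', $raw);
    if ($utf8 === false) {
        return false;
    }

    // fgetcsv needs a stream, so write the UTF-8 text to php://temp.
    $fh = fopen('php://temp', 'r+');
    fwrite($fh, $utf8);
    rewind($fh);

    $rows = array();
    while (($row = fgetcsv($fh, 0, $delimiter, '"', '\\')) !== false) {
        $rows[] = $row;
    }
    fclose($fh);

    return $rows;
}
```

You would call it with the raw file contents, e.g. read_utf16_csv_string(file_get_contents($path)), where $path is whatever file you are reading. This loads the whole file into memory, so it only suits files of moderate size.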

See str_getcsv for a backwards-compatible version of it for PHP < 5.3, and see utf8_decode for a function written by Rasmus Andersson which provides utf16_decode. The modification I added deals with the fact that the BOM appears at the top of the file but not on subsequent lines, so you need to store the endianness and pass it back in when decoding each subsequent line. This modified version returns the endianness through the by-reference $be argument (defaulting to big-endian if no BOM is found and none was passed in):

<?php
/**
 * Decode UTF-16 encoded strings.
 *
 * Can handle both BOM'ed data and un-BOM'ed data.
 * Assumes Big-Endian byte order if no BOM is available.
 * From: http://php.net/manual/en/function.utf8-decode.php
 *
 * @param string $str UTF-16 encoded data to decode.
 * @param bool|null $be Byte order: true for big-endian, false for little-endian.
 *                      Set from the BOM when one is found; pass it back in for later lines.
 * @return string ISO-8859-1 encoded data.
 * @access public
 * @version 0.1 / 2005-01-19
 * @author Rasmus Andersson {@link http://rasmusandersson.se/}
 * @package Groupies
 */
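function utf16_decode($str, &$be = null) {
    $len = strlen($str);
    if ($len < 2) {
        return $str;
    }
    $c0 = ord($str[0]);
    $c1 = ord($str[1]);
    $start = 0;
    if ($c0 == 0xFE && $c1 == 0xFF) {
        // Big-endian BOM
        $be = true;
        $start = 2;
    } else if ($c0 == 0xFF && $c1 == 0xFE) {
        // Little-endian BOM
        $be = false;
        $start = 2;
    }
    if ($be === null) {
        // No BOM and no stored byte order: assume big-endian
        $be = true;
    }
    $newstr = '';
    // Stop before a trailing odd byte so that $i + 1 is always a valid offset
    for ($i = $start; $i + 1 < $len; $i += 2) {
        if ($be) {
            $val = ord($str[$i]) << 8;
            $val += ord($str[$i + 1]);
        } else {
            $val = ord($str[$i + 1]) << 8;
            $val += ord($str[$i]);
        }
        // U+2028 (line separator) becomes "\n"; any other code point above
        // 0xFF cannot be represented and is truncated to its low byte
        $newstr .= ($val == 0x2028) ? "\n" : chr($val & 0xFF);
    }
    return $newstr;
}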
?>
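To illustrate the line-by-line approach with a stored byte order, here is a minimal, self-contained sketch. It is not the function above: the name utf16_lines_to_utf8 is hypothetical, it uses the iconv extension (assumed to be available) and outputs UTF-8 rather than ISO-8859-1, and it assumes big-endian order when there is no BOM. The byte order is read once from the BOM and then reused for every line, and lines are split on whole 2-byte code units so that a stray 0x0A byte inside another character is not mistaken for a newline.

```php
<?php
// Hypothetical helper (not part of PHP): split UTF-16 data into lines on
// 2-byte code unit boundaries and decode each line to UTF-8.
function utf16_lines_to_utf8($raw)
{
    $raw = (string) $raw;

    // Read the byte order once, from the BOM at the top of the data.
    // Assumption: big-endian when no BOM is present.
    $be = true;
    $bom = substr($raw, 0, 2);
    if ($bom === "\xFE\xFF") {
        $raw = (string) substr($raw, 2);
    } elseif ($bom === "\xFF\xFE") {
        $be = false;
        $raw = (string) substr($raw, 2);
    }

    // The stored byte order decides both the newline code unit and the
    // source encoding used for every subsequent line.
    $newline = $be ? "\x00\x0A" : "\x0A\x00";
    $from = $be ? 'UTF-16BE' : 'UTF-16LE';

    $lines = array();
    $line = '';
    foreach (str_split($raw, 2) as $unit) {
        if ($unit === $newline) {
            // Decode the finished line and drop a trailing "\r" (CRLF files).
            $lines[] = rtrim(iconv($from, 'UTF-8', $line), "\r");
            $line = '';
        } else {
            $line .= $unit;
        }
    }
    if ($line !== '') {
        // Last line without a terminating newline.
        $lines[] = rtrim(iconv($from, 'UTF-8', $line), "\r");
    }

    return $lines;
}
```

Each returned line can then be passed to str_getcsv. Note that splitting on newlines first will break CSV fields that contain quoted line breaks.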

Trying the "setlocale" trick did not work for me, e.g.

<?php
setlocale(LC_CTYPE, "en.UTF16");
$line = fgetcsv($file, ...);
?>

That may be because my platform didn't support that locale. In any case, fgetcsv only accepts a single character for the delimiter, enclosure, and so on, and complains if you pass in the two-byte UTF-16 version of that character, so I gave up on that approach rather quickly.

Hope this is helpful to someone out there.
