After significant changes to encoding handling in GDAL >= 1.9, it became quite challenging to handle non-Latin attributes (Cyrillic in particular) stored in the .dbf part of a shapefile. As a result, most Cyrillic and other non-Latin users have to stick with GDAL 1.8.1 for now.
The cause of the issue is described here (in Russian). Briefly: GDAL >= 1.9 attempts to re-encode the .dbf attributes to UTF-8 based on the LDID (Language Driver ID) written in the .dbf header. Unfortunately, the LDID is usually missing; in particular, QGIS does not write it to the .dbf files it creates. When the LDID is missing, GDAL >= 1.9 assumes the .dbf file is encoded in ISO8859_1 (Latin-1), which makes non-Latin characters unreadable.
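If you want to check whether a particular .dbf file has an LDID at all, here is a minimal Python sketch (my own, not part of GDAL or QGIS). It assumes the standard dBASE header layout, in which byte 29 of the 32-byte file header stores the language driver ID and 0 means that none was written. The function names `read_ldid` and `has_ldid` are hypothetical.

```python
LDID_OFFSET = 29   # assumed: byte 29 of the dBASE header holds the Language Driver ID
HEADER_SIZE = 32   # size of the fixed part of a dBASE file header


def read_ldid(dbf_path):
    """Return the raw LDID byte from a .dbf header (0 means no LDID was written)."""
    with open(dbf_path, "rb") as f:
        header = f.read(HEADER_SIZE)
    if len(header) < HEADER_SIZE:
        raise ValueError(f"{dbf_path} is too short to be a dBASE file")
    return header[LDID_OFFSET]


def has_ldid(dbf_path):
    """True if the header carries a non-zero LDID."""
    return read_ldid(dbf_path) != 0
```

For example, `has_ldid("my_file.dbf")` returning `False` means GDAL >= 1.9 has no LDID to go on, so the file is a candidate for the .cpg workaround.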
The workaround I’m currently using is to create an additional .cpg file that contains the ID of the encoding used. For example, if the encoding is Windows-1251, the .cpg file contains the following record: “1251” (without quotes). When the .cpg file is present, GDAL >= 1.9 + QGIS works just fine.
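Creating the .cpg file by hand is trivial, but if you prefer to script it, here is a minimal Python sketch. The helper name `write_cpg` is hypothetical; it simply writes the bare encoding ID into a file with the same base name as the .shp file.

```python
from pathlib import Path


def write_cpg(shp_path, encoding_id):
    """Create <name>.cpg next to <name>.shp containing only the encoding ID."""
    cpg_path = Path(shp_path).with_suffix(".cpg")
    # Write the bare ID (e.g. "1251"): no quotes and no trailing newline.
    cpg_path.write_text(str(encoding_id), encoding="ascii")
    return cpg_path
```

For example, `write_cpg("data/my_file.shp", 1251)` creates `data/my_file.cpg` containing `1251`.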
————————————————————————-
UPD: on some operating systems you will need to use the ID from the Additional ID column instead of the Encoding ID column.
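If you script the .cpg creation, you can keep both variants at hand. The sketch below copies a few rows of the encoding table in this post (Encoding ID to Additional ID) into a dictionary; the names `ADDITIONAL_IDS` and `cpg_value` are hypothetical, and which variant your system needs is something you have to test yourself.

```python
# A few rows from the encoding table in this post: Encoding ID -> Additional ID.
ADDITIONAL_IDS = {
    "1250": "Windows-1250",
    "1251": "Windows-1251",
    "1253": "Windows-1253",
    "20866": "koi8-r",
    "65001": "UTF-8",
}


def cpg_value(encoding_id, use_additional_id=False):
    """Return the text to put into the .cpg file (numeric ID or Additional ID)."""
    key = str(encoding_id)
    if not use_additional_id:
        return key
    try:
        return ADDITIONAL_IDS[key]
    except KeyError:
        raise KeyError(f"no Additional ID known for encoding {key}") from None
```

For example, `cpg_value(1251)` gives `"1251"` and `cpg_value(1251, use_additional_id=True)` gives `"Windows-1251"`.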
For Windows you may also try the “unofficial” version of QGIS from here (with the encoding issue solved). Note that its installer may be in Russian.
UPD2: There is another workaround. You can open the .dbf file in LibreOffice Calc (or OpenOffice Calc), specifying the required encoding, and save it from there. This writes the necessary header to the .dbf file, and QGIS will then open the attributes correctly. Note that this will also convert field names to upper case.
UPD3: there is a plugin available for fixing the encoding.
————————————————————————-
Here is a table of the encoding IDs (taken from here):
Encoding ID | Encoding name | Additional ID | Other names |
1252 | Western | iso-8859-1 (except when characters 128-159 are used; then use “Windows-1252”) | iso8859-1, iso_8859-1, iso-8859-1, ANSI_X3.4-1968, iso-ir-6, ANSI_X3.4-1986, ISO_646, irv:1991, ISO646-US, us, IBM367, cp367, csASCII, latin1, iso_8859-1:1987, iso-ir-100, ibm819, cp819, Windows-1252 |
20105 | us-ascii | us-ascii, ascii | |
28592 | Central European (ISO) | iso-8859-2 | iso8859-2, iso-8859-2, iso_8859-2, latin2, iso_8859-2:1987, iso-ir-101, l2, csISOLatin2 |
1250 | Central European (Windows) | Windows-1250 | Windows-1250, x-cp1250 |
1251 | Cyrillic (Windows) | Windows-1251 | Windows-1251, x-cp1251 |
1253 | Greek (Windows) | Windows-1253 | Windows-1253 |
1254 | Turkish (Windows) | Windows-1254 | Windows-1254 |
932 | Japanese (Shift-JIS) | shift_jis | shift_jis, x-sjis, ms_Kanji, csShiftJIS, x-ms-cp932 |
51932 | Japanese (EUC) | x-euc-jp | Extended_UNIX_Code_Packed_Format_for_Japanese, csEUCPkdFmtJapanese, x-euc-jp, x-euc |
50220 | Japanese (JIS) | iso-2022-jp | csISO2022JP, iso-2022-jp |
1257 | Baltic (Windows) | Windows-1257 | windows-1257 |
950 | Traditional Chinese (BIG5) | big5 | big5, csbig5, x-x-big5 |
936 | Simplified Chinese (GB2312) | gb2312 | GB_2312-80, iso-ir-58, chinese, csISO58GB231280, csGB2312, gb2312 |
20866 | Cyrillic (KOI8-R) | koi8-r | csKOI8R, koi8-r |
949 | Korean (KSC5601) | ks_c_5601 | ks_c_5601, ks_c_5601-1987, korean, csKSC56011987 |
1255 (logical) | Hebrew (ISO-logical) | Windows-1255 | iso-8859-8i |
1255 (visual) | Hebrew (ISO-Visual) | iso-8859-8 | ISO-8859-8 Visual, ISO-8859-8 , ISO_8859-8, visual |
862 | Hebrew (DOS) | dos-862 | dos-862 |
1256 | Arabic (Windows) | Windows-1256 | Windows-1256 |
720 | Arabic (DOS) | dos-720 | dos-720 |
874 | Thai | Windows-874 | Windows-874 |
1258 | Vietnamese | Windows-1258 | Windows-1258 |
65001 | Unicode UTF-8 | UTF-8 | UTF-8, unicode-1-1-utf-8, unicode-2-0-utf-8 |
65000 | Unicode UTF-7 | UNICODE-1-1-UTF-7 | utf-7, UNICODE-1-1-UTF-7, csUnicode11UTF7, utf-7 |
50225 | Korean (ISO) | ISO-2022-KR | ISO-2022-KR, csISO2022KR |
52936 | Simplified Chinese (HZ) | HZ-GB-2312 | HZ-GB-2312 |
28594 | Baltic (ISO) | iso-8859-4 | ISO_8859-4:1988, iso-ir-110, ISO_8859-4, ISO-8859-4, latin4, l4, csISOLatin4 |
28595 | Cyrillic (ISO) | iso_8859-5 | ISO_8859-5:1988, iso-ir-144, ISO_8859-5, ISO-8859-5, cyrillic, csISOLatinCyrillic, csISOLatin5 |
28597 | Greek (ISO) | iso-8859-7 | ISO_8859-7:1987, iso-ir-126, ISO_8859-7, ISO-8859-7, ELOT_928, ECMA-118, greek, greek8, csISOLatinGreek |
28599 | Turkish (ISO) | iso-8859-9 | ISO_8859-9:1989, iso-ir-148, ISO_8859-9, ISO-8859-9, latin5, l5, csISOLatin5 |
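If you are not sure which ID to put into the .cpg file, one quick check is to decode a few raw attribute bytes with several candidate encodings and see which result is readable. The sketch below is my own illustration in Python; the mapping from table IDs to Python codec names (`cp1251`, `koi8_r`, etc.) is my assumption, and the name `decode_candidates` is hypothetical.

```python
# Assumed mapping from some Encoding IDs in the table to Python codec names.
CANDIDATES = {
    "1250": "cp1250",
    "1251": "cp1251",
    "1252": "cp1252",
    "20866": "koi8_r",
    "65001": "utf_8",
}


def decode_candidates(raw, candidates=CANDIDATES):
    """Decode the same raw .dbf bytes with each candidate encoding."""
    results = {}
    for enc_id, codec in candidates.items():
        try:
            results[enc_id] = raw.decode(codec)
        except UnicodeDecodeError:
            results[enc_id] = None  # the bytes are not valid in this encoding
    return results
```

For example, the Windows-1251 bytes of the word "Привет" decode correctly under ID 1251, while under 1252 (Latin-1 territory) they turn into "Ïðèâåò", which is the kind of garbage GDAL >= 1.9 shows when it falls back to ISO8859_1.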
Hi Rebelious,
And where do we put the .cpg file?
Thanks!
Best regards
Hi! Put it alongside the corresponding shp-file: e.g. my_file.shp and my_file.cpg must be in the same folder.
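To apply this to a whole folder at once, here is a small Python sketch. The helper name `add_missing_cpg` and the default ID 1251 are my own assumptions; it only creates a .cpg file for shapefiles that do not already have one. Note that the `*.shp` pattern is case-sensitive on most Linux systems.

```python
from pathlib import Path


def add_missing_cpg(folder, encoding_id="1251"):
    """Write <name>.cpg next to every <name>.shp in `folder` that lacks one."""
    created = []
    for shp in sorted(Path(folder).glob("*.shp")):
        cpg = shp.with_suffix(".cpg")
        if cpg.exists():
            continue  # leave existing .cpg files untouched
        cpg.write_text(str(encoding_id), encoding="ascii")
        created.append(cpg)
    return created
```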
AFAIK it may not work in Windows.
Works also on Windows without any problems (tested).
No, it doesn’t work (Windows 7 64-bit, QGIS 1.8). Putting a .cpg file with the same name and the code 1250 changes nothing. That forces me to use QGIS 1.7.
Thanks anyway!
Actually it is more a GDAL issue than a QGIS one, so you need to go back not only to QGIS 1.7 but to GDAL 1.8.1 too. You may try to download the “unofficial” version of QGIS for Windows (with the encoding issue solved) from here [http://nextgis.ru/nextgis-qgis/], but it is possible that the installer is in Russian only.
Great! Russian installer is not a problem :-).
NextGIS works like a charm! Only one plugin had a problem (contour plugin 0.91).
Great that it worked for you!)))
This worked for me, though not with the Encoding ID but with the third column (Additional ID). In the .cpg file I just put UTF-8, and ever since everything works as desired. Thanks!
There is also a plugin for setting cpg files (and more) available: http://plugins.qgis.org/plugins/shapefile_encoding_fixer/
Thanks for the information!