Character encoding cleanup in SZARP

In the past half-year we did some profound cleanup in character encoding in SZARP. Many users may consider this as negligible, but in fact it could be a bit of hassle – especially for Polish users. What’s the point? SZARP software has been developed continuously for over 20 years, so for sure one will find in it parts of code written a dozen or so years ago. In that time character encoding were not as straight forward as it is today. A series of standards, gathered in ISO/IEC 8859, were commonly used. Year 2006 brought a change – a character encoding UTF-8 (which implements Unicode standard) started gaining popularity and in a few years it has become the dominant for the World Wide Web. Again, what’s the point? SZARP is using UTF-8 for some time.

Well, not exactly. It appeared that in many parts of code Latin-2 (ISO/IEC 8859-2) were still considered as default or even hard-coded! (it seemed appropriate 10 years ago) It also applied to SZARP’s encoding conversion module and that’s why it bit from time to time. The issue had to be put straight.

Introduced changes did not take a lot of code-lines (not more than a few hundreds), but possibly may affect almost every part of SZARP. For that reason they had been carefully tested. Nevertheless a few bugs spilled out month or two later.

Today, after a half year of our effort and decent testing, we announce that SZARP is fully capable of UTF-8 character encoding standard.