Character encoding problem: ARCHICAD exports the list in UTF-8, GDL DATA I/O reads the list as ANSI

Anonymous · ‎2021-06-09

Dear friends of ARCHICAD and GDL,

I have modified the "Pen and Colors" object (included in the German/Austrian library) so that it can also display the pen description and pen thickness in a graphic table. As there is no way to get the description and thickness directly, I export the pen-set configuration as a list (csv/txt) from the Attribute Manager and import the values into the GDL object using DATA I/O. So far everything works as it should...
BUT: in German we have special characters like ä, ö, ü, ß...

ARCHICAD exports the pen-set configuration encoded in UTF-8, but the DATA I/O GDL add-on reads txt files as ANSI (Windows-1252). So the imported values are displayed in the object with "funny" characters instead of Äs, Üs, Ös...
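
To make the effect concrete, here is a minimal Python sketch (Python is only used for illustration, it is not part of ARCHICAD or GDL). It assumes the export really is UTF-8 and the reader really decodes as Windows-1252, as described above, and shows why each umlaut turns into two "funny" characters.

```python
# Illustration only: UTF-8 bytes decoded with the wrong code page.
original = "Ä Ö Ü ä ö ü ß"

utf8_bytes = original.encode("utf-8")   # bytes as written by a UTF-8 export
misread = utf8_bytes.decode("cp1252")   # the same bytes read as Windows-1252

# Each umlaut occupies two bytes in UTF-8, so it shows up as two characters.
print(misread)

# The damage is reversible as long as nothing was lost in between.
repaired = misread.encode("cp1252").decode("utf-8")
print(repaired == original)
```

Because the bytes themselves are intact, converting the file to the encoding the reader expects is enough; the text does not have to be retyped.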

At the moment I do the character encoding conversion in Notepad++, but this is not usable for my users (architects, not developers).
Is there a way to define the character encoding the DATA I/O add-on should use when interpreting the list (txt/csv)?
Or is it possible to define the encoding of the exported txt file in the Attribute Manager?

...or do you know another way to get the characters displayed correctly in the object?

I hope you can help me solve this issue. Perhaps someone from GRAPHISOFT also reads this post, and in the future ARCHICAD will use the same character encoding for reading/writing and importing/exporting txt/ASCII files...

Thank you for your help,
Best regards from Vienna,
Yours, Klaus

Podolsky · ‎2021-06-09

I faced a similar problem when using the Text I/O add-on for GDL to save some data in Hebrew. While GSM files do support other languages (Hebrew, Russian), the text files only come out correctly in English. The reason, I think, is that this add-on is very, very old, from a time when support for different languages was a problem in many programs (the Windows 95 era or similar). I was only able to sort this out by writing a special translator: the script detects Hebrew letters in the given text and translates each one into another symbol sequence, for example א becomes ::A, ב becomes ::B, etc. When the script reads the encoded text file, it decodes it back to Hebrew. I didn't find another way to solve it.
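
The principle of such a translator can be sketched in a few lines. This is a Python illustration only, not the GDL script mentioned above; the mapping table and the function names `encode` and `decode` are hypothetical, and only the ::A / ::B idea comes from the post.

```python
# Hypothetical mapping: every non-ASCII letter gets a unique 3-character token.
ESCAPES = {"א": "::A", "ב": "::B", "ג": "::C"}
UNESCAPES = {token: letter for letter, token in ESCAPES.items()}


def encode(text: str) -> str:
    """Replace each mapped letter by its ASCII token before writing the file."""
    return "".join(ESCAPES.get(ch, ch) for ch in text)


def decode(text: str) -> str:
    """Turn the tokens back into the original letters after reading the file."""
    out = []
    i = 0
    while i < len(text):
        token = text[i:i + 3]
        if token in UNESCAPES:
            out.append(UNESCAPES[token])
            i += 3
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(encode("אב 12"))           # pure ASCII, safe in any 8-bit code page
print(decode(encode("אב 12")))   # round-trips to the original text
```

One limitation of this sketch: text that already contains a literal token such as "::A" would be decoded incorrectly, so the marker has to be something that never occurs in real data.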

Anonymous · ‎2021-06-09

Dear Podolsky,

Thank you for your fast reply.

Let me check whether I have understood you correctly...

1) You read/import the value from the txt list file using TEXT I/O or DATA I/O.

2) In the value of the new parameter/variable you search for "special characters" and change them to ASCII characters (decimal code below 128), e.g. "Ä" becomes "Ae", using a string replacement operation (I do not know the right command/syntax for this, perhaps you can help me).

3) In the object output you use the changed string, which contains only ASCII characters below 128, and these map to the same character in (nearly) all character encodings...
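
The replacement idea in step 2 can be sketched as follows. This is Python for illustration only, not GDL syntax; the table `UMLAUTS` and the function `to_ascii` are assumptions made for this example.

```python
# Hypothetical transliteration table for the German special characters.
UMLAUTS = {
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ä": "ae", "ö": "oe", "ü": "ue",
    "ß": "ss",
}
TABLE = str.maketrans(UMLAUTS)


def to_ascii(text: str) -> str:
    """Replace each special character by its ASCII spelling."""
    return text.translate(TABLE)


print(to_ascii("Außenwände grün"))
```

Unlike a reversible token scheme, this is a one-way change: "ue" in the result can no longer be told apart from a genuine "ue" in the original text.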

...have I understood your solution correctly?

Thank you for your help,
Yours, Klaus

Podolsky · ‎2021-06-09

Yes, you understood me correctly. The difference is that I use the text file to first export data from one GSM and then import it into another GSM. For example, I enter standard notes that are to be shown in title blocks, and the title block objects then read these notes and show them throughout the project.

In your case you already have a txt file generated by another part of the program (pen set names) and just want to read it. Probably the only way to make it work is to use search and replace in Notepad and change all umlauts to something else.

I can share my coder-decoder scripts for Hebrew with you. This is not exactly what you need, but it may help you better understand the principle of modifying texts via GDL commands.

Laszlo Nagy · ‎2021-06-11

Guys, how about the XML Add-On? Can you use that for this purpose?
It is a much newer GDL Add-On so it might support UTF-8.

Peter Baksa · ‎2021-06-14

Hi catch17,

The opened file is read as UTF-8 only if it has a BOM (for backwards compatibility).
Maybe you could give your users a simple bat/command file that adds the BOM.
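
As a rough sketch of what such a helper would have to do, here is a small Python version (Python instead of a bat file is an assumption of this example, and the function name `add_utf8_bom` is hypothetical). It prepends the three BOM bytes only if the file does not already start with them, and it assumes the file content is already valid UTF-8.

```python
import codecs
from pathlib import Path


def add_utf8_bom(path) -> bool:
    """Prepend a UTF-8 BOM to the file; return False if one is already there."""
    p = Path(path)
    data = p.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False  # nothing to do
    # Work on raw bytes so line endings and content stay untouched.
    p.write_bytes(codecs.BOM_UTF8 + data)
    return True
```

A call such as add_utf8_bom("pens.txt") (file name made up for this example) would then be run once on the exported list before the object reads it.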

Péter Baksa
Software Engineer, Library
Graphisoft SE, Budapest

Anonymous · ‎2021-06-18

Thank you all for your replies!

Well, I have done a lot of testing, scripting and googling 😉 in the meantime...

What I found out:

I do not need to change e.g. "ü" to "ue": everything works fine when I use the Notepad++ menu command "Convert to ANSI". All the special characters are then displayed correctly...
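
For reference, the same conversion can be sketched outside Notepad++. This is a hedged Python example, assuming "ANSI" means Windows-1252 as stated earlier in the thread; the function name `utf8_to_ansi` and the file names are hypothetical.

```python
from pathlib import Path


def utf8_to_ansi(src, dst) -> None:
    """Re-encode a UTF-8 text file (with or without BOM) as Windows-1252."""
    # Decode from raw bytes so that line endings are kept exactly as they are.
    text = Path(src).read_bytes().decode("utf-8-sig")
    # Characters that do not exist in Windows-1252 are written as "?".
    Path(dst).write_bytes(text.encode("cp1252", errors="replace"))
```

Usage would look like utf8_to_ansi("pens_export.txt", "pens_ansi.txt"). Writing to a second file keeps the original export untouched in case the conversion has to be repeated.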

When I convert the file to "UTF-8 BOM", it does not work: I get the "funny chars" in the GDL object...

I have not succeeded in creating a batch/cmd file to convert the txt file, although I did a lot of googling and testing and am not a newbie at scripting cmd/bat files.

...so at the moment my only choices are the extra conversion work in Notepad++ or getting the "funny chars" displayed in the object...

Perhaps some of you have a better idea than losing this war against the character encoding possibilities.

I am really glad for every hint you can give me...

Best regards from (really hot) Vienna,
Yours, Klaus

Dominic Wyss · ‎2021-06-18

Hi Klaus

For me "UTF-8 with BOM" works just fine.
It is important that the text file is saved with the correct encoding. In my case I write to an existing, well-formatted file.

VS Code: create a new file and click "UTF-8" in the lower right corner. A dropdown appears; choose "Save with Encoding" and select "UTF-8 with BOM". Save. Then refer to this file in Archicad or load it into the embedded library.

HTH

AC27 CHE - macOS Ventura M1

runxel · ‎2021-06-18

catch17 wrote:
well, I have done a lot of testing, scripting and googling 😉 in the meantime...

First result on Google.

You can't do it with a BAT file. Use PowerShell instead.
This snippet does set the BOM when converting from cp1252 to UTF-8.

Get-Content .\test.txt | Set-Content -Encoding utf8 test-utf8.txt

With a loop you can even convert a whole folder.
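
The folder loop could look roughly like this. It is a Python sketch rather than PowerShell, and it rests on assumptions: the source files are Windows-1252, the function name `convert_folder` and the *.txt pattern are made up for this example, and files that already start with a UTF-8 BOM are skipped.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def convert_folder(folder, pattern="*.txt"):
    """Convert cp1252 text files in a folder to UTF-8 with BOM, in place."""
    converted = []
    for path in sorted(Path(folder).glob(pattern)):
        raw = path.read_bytes()
        if raw.startswith(UTF8_BOM):
            continue  # already UTF-8 with BOM, leave it alone
        text = raw.decode("cp1252")
        # The "utf-8-sig" codec writes the BOM in front of the encoded text.
        path.write_bytes(text.encode("utf-8-sig"))
        converted.append(path.name)
    return converted
```

Note that this must not be run on files that are already UTF-8 without a BOM: they would be decoded as cp1252 and the umlauts would end up double-encoded.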

Greetings from hot Berlin to Vienna, Klaus 😉
