Archicad C++ API
About Archicad add-on development using the C++ API.

[newbie] opening a textfile and reading it line by line

Anonymous
Not applicable
I tried to use the example from the documentation:
#include "DG.h"    // brings file selection dialog (and also includes Location)

IO::Location loc;

if (!DGGetOpenFile (&loc)) {    // returns the selected location in loc
    // no file (folder, link) location was selected
}
but I just couldn't get it to work. I even searched through all the documentation without finding any other reference to DGGetOpenFile, so I gave up on it.

Then I tried:
    char buffer[128] = {};    // zero-filled, so the text read below stays null-terminated
    GSErrCode errorCode = NoError;

    DG::FileDialog dlg (DG::FileDialog::OpenMultiFile);

    if (!dlg.Invoke ())
        return false;    // the user cancelled the dialog

    int count = dlg.GetSelectionCount ();

    for (int n = 0; n < count; n++) {
        IO::File file (dlg.GetSelectedFile (n));
        errorCode = file.Open (IO::File::ReadMode);    // opening the file in read-only mode
        if (errorCode != NoError)
            continue;    // skip files that could not be opened
        errorCode = file.ReadBin (buffer, sizeof (buffer) - 1);    // leave room for the terminating 0

        DGAlert (DG_INFORMATION, "Inside while", buffer, 0, "OK", "Cancel", 0);
    }

This lets me open my file from a file dialog. One problem, though: I can't open the file under its real name, myFile.sos; I have to rename it to myFile.txt first. Why is that, and what can I do about it?

The other thing is that I can't read one line at a time. How do I do that?
And if I can't read one line at a time, how do I read the whole file at once? The String Manager also doesn't seem well equipped with functions to parse the text and split it up into suitable chunks. Can I use standard C++ functions and includes as well as the ones that come with the API?
--
Regards,
Tor Jørgen
15 REPLIES
Oleg
Expert
vannassen wrote:
I don't know what kind of encoding it has, but one of the first lines in the file says character set DOSN8, which I'm guessing is some sort of DOS text file with Norwegian characters in it. In other words, it should include Æ Ø Å, but I'm getting stuff like ù è when I view it in TextEdit here on my Mac.
Hmm, yes, I think DOSN8 is an 8-bit encoding. I was misled by the strange output example:
H??†
O??†

Perhaps you need to convert some characters from DOSN8 to something else, like ISO8859-10 ( http://www.statkart.no/standard/sosi/html_32/del1_2/del1_2.htm )
My question then is what kind of math do I have to perform to be able to copy just the first line into another char buffer?
It seems I have not quite understood your issue.
Maybe:
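A conversion along these lines could be sketched in standard C++. This is a minimal sketch under loudly stated assumptions: it assumes DOSN8 stores the Norwegian letters at the code page 865 byte values (which is consistent with Ø and Å showing up as ù and è when the bytes 0x9D and 0x8F are viewed as Mac OS Roman), it targets UTF-8 rather than ISO8859-10, and the function name `NorwegianDosToUtf8` is hypothetical. Only the six letters are translated; any other byte above 0x7F is copied unchanged and would need its own mapping.

```cpp
#include <string>

// Converts 8-bit text to UTF-8, translating only the six Norwegian letters.
// ASSUMPTION: DOSN8 uses the code page 865 byte values listed below.
std::string NorwegianDosToUtf8(const std::string& input)
{
    std::string output;
    output.reserve(input.size());
    for (unsigned char byte : input) {
        switch (byte) {
            case 0x92: output += "\xC3\x86"; break;  // Æ (U+00C6)
            case 0x91: output += "\xC3\xA6"; break;  // æ (U+00E6)
            case 0x9D: output += "\xC3\x98"; break;  // Ø (U+00D8)
            case 0x9B: output += "\xC3\xB8"; break;  // ø (U+00F8)
            case 0x8F: output += "\xC3\x85"; break;  // Å (U+00C5)
            case 0x86: output += "\xC3\xA5"; break;  // å (U+00E5)
            default:                                 // ASCII and unmapped bytes pass through unchanged
                output += static_cast<char>(byte);
                break;
        }
    }
    return output;
}
```

Calling `NorwegianDosToUtf8(text)` on the raw file contents leaves the SOSI keywords untouched, since they are plain ASCII.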

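long lineLength=linePtr-buff;               // number of bytes before the CR
char* lineBuffer=new char[lineLength+1];    // +1 for the terminating 0
BNCopyMemory(lineBuffer,buff,lineLength);
lineBuffer[lineLength]=0;
// using the lineBuffer
delete[] lineBuffer;                        // allocated with new[], so release with delete[]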

But much better, instead:
std::string line(buff,linePtr);
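Put together with a check for the "not found" case, the idea might look like the sketch below. It uses only the standard library; `std::memchr` stands in for `CHSearchSubstring` here, and the helper name `FirstLine` is hypothetical. It assumes an 8-bit encoding in which the byte `\r` always means a carriage return.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Copies the first line of an 8-bit text buffer, i.e. everything before the first CR.
// If the buffer contains no CR, the whole buffer is returned.
std::string FirstLine(const char* buff, std::size_t length)
{
    const char* cr = static_cast<const char*>(std::memchr(buff, '\r', length));
    if (cr == nullptr)                 // nothing found: do not do arithmetic on a null pointer
        return std::string(buff, length);
    return std::string(buff, cr);      // copies the cr - buff bytes in front of the CR
}
```

The `std::string` owns its copy of the bytes, so there is no buffer to size, terminate or delete by hand.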
Anonymous
Not applicable
Thanks Oleg, you made my day

And I must say I'm impressed with you finding your way around Norwegian documents
--
Regards,
Tor Jørgen
Ralph Wessel
Mentor
Tor wrote:
Then I'm searching for the first occurrence of "\r" like this
		GSPtr linePtr = CHSearchSubstring("\r", buff, fLength);	// pointing to the first cr 

Then I'm left with a ptr to the beginning of the text in buff and a ptr to the first CR in linePtr.
The length of buff is 22261 and the length of linePtr is 0, which is not quite what I wanted, but I guess it makes sense since it just points to the position of the CR.
My question then is what kind of math do I have to perform to be able to copy just the first line into another char buffer?
Are you saying that after calling CHSearchSubstring, the result returned in linePtr is 0? If so, it is indicating that the string was not found, i.e. it did not find a CR in the data buffer. You should certainly check for a null result anyway, because any attempt to use a null value would be invalid.

However, if ArchiCAD has a problem with your string encoding (which it certainly seems to), the String Manager functions won't work. I wouldn't be surprised if it isn't even attempting to find a CR beyond the first character. You could try setting different default encodings for the String Manager to see if it makes a difference, e.g.:
CHSetDefaultCharCode(CC_WestEuropean);
Perhaps we could speed this up if I could take a look at this file. Can it be downloaded from somewhere? Otherwise you could send it to the address on Encina's contact page.
Ralph Wessel BArch
Software Engineer Speckle Systems
Anonymous
Not applicable
Ralph wrote:
Are you saying that after calling CHSearchSubstring, the return result in linePtr is 0?

What I was trying to say is: I guess the result in linePtr is the address of the first occurrence of the search string. When I write linePtr to the alert window, it shows what's left of the text file starting from the position after the search string. But when I try to figure out how long this piece of information is by using
BMGetPtrSize(linePtr)

I get the result 0.

The second line in the file says ..TEGNSETT DOSN8, which means character set MS-DOS Norwegian 8-bit. If I'm to speculate, I would guess the reason for getting the strange .??† result when trying to display the first character, which is a ., is that the character itself only occupies the first 8 bits of the 32 bit byte.

I would be happy to send you the file - coming soon to a mailbox near you...

--
Regards,
Tor Jørgen
Ralph Wessel
Mentor
Tor wrote:
When I write linePtr to the alert window, it shows what's left of the text file starting from the position after the search string. But when I try to figure out how long this piece of information is by using
BMGetPtrSize(linePtr)

I get the result 0.
OK, I understand. This won't work because BMGetPtrSize will only tell you the size of a block of memory allocated as a Ptr, i.e. you can't ask it the size of an arbitrary address. A Ptr is not a string - it is an allocated block of memory which could contain anything.

As Oleg said, you can get the number of bytes between two addresses by using simple subtraction.
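The same subtraction extends to walking the whole buffer: each line is the range between the previous line break and the next one. The following is a rough sketch in standard C++, not an Archicad API facility; the function name `SplitLines` is hypothetical, and it assumes an 8-bit encoding where CR, LF, or CR followed by LF ends a line.

```cpp
#include <string>
#include <vector>

// Splits the bytes in [begin, end) into lines.
// CR, LF and CR+LF each count as a single line break.
std::vector<std::string> SplitLines(const char* begin, const char* end)
{
    std::vector<std::string> lines;
    const char* lineStart = begin;
    for (const char* p = begin; p != end; ++p) {
        if (*p != '\r' && *p != '\n')
            continue;
        lines.emplace_back(lineStart, p);      // line length is p - lineStart
        if (*p == '\r' && p + 1 != end && p[1] == '\n')
            ++p;                               // treat CR+LF as one break
        lineStart = p + 1;
    }
    if (lineStart != end)
        lines.emplace_back(lineStart, end);    // last line without a trailing break
    return lines;
}
```

With the buffer from the thread, the call would be `SplitLines(buff, buff + fLength)`, where `fLength` is the number of bytes actually read.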
Tor wrote:
The second line in the file says ..TEGNSETT DOSN8, which means character set MS-DOS Norwegian 8-bit. If I'm to speculate, I would guess the reason for getting the strange .??† result when trying to display the first character, which is a ., is that the character itself only occupies the first 8 bits of the 32 bit byte.
Your assessment of the encoding type name sounds logical - hopefully it is an 8-bit encoding, because you can then use standard C/C++ string handling. BTW, a byte is an 8-bit value; a 32 bit value is often referred to as a long integer.
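As an illustration of the standard C++ route for an 8-bit file, the whole file can be pulled into a `std::string` in one go. This is only a sketch that bypasses `IO::File` entirely: the helper name `ReadWholeFile` is hypothetical, and it assumes the path is available as a plain string, which the thread does not show how to obtain from an `IO::Location`.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Reads the entire file at `path` into `contents`, byte for byte.
// Returns false (leaving `contents` untouched) if the file could not be opened.
bool ReadWholeFile(const std::string& path, std::string& contents)
{
    std::ifstream in(path, std::ios::binary);  // binary mode: no line-ending translation
    if (!in)
        return false;
    contents.assign(std::istreambuf_iterator<char>(in),
                    std::istreambuf_iterator<char>());
    return true;
}
```

Binary mode matters here because the line breaks in the file are searched for as raw `\r` bytes afterwards.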
Tor wrote:
I would be happy to send you the file - coming soon to a mailbox near you...
Got it - I'll let you know once I've had a chance to look into it.
Ralph Wessel BArch
Software Engineer Speckle Systems
Ralph Wessel
Mentor
Tor wrote:
I would be happy to send you the file - coming soon to a mailbox near you...
I took a look at the file, and it does indeed seem to be an 8-bit encoding. I was able to read and parse the file using the String Manager functions with the default character code set to 'CC_WestEuropean'. Use 'CHSetDefaultCharCode' to change the default if this is not your system default. Let me know if this improves the situation.
Ralph Wessel BArch
Software Engineer Speckle Systems