
Thread: fopen's "r" vs "rb".

  1. #1

    Thread Starter
    Lively Member Sebouh's Avatar
    Join Date
    Jan 2005
    Posts
    73

    fopen's "r" vs "rb".

    Well, I didn't get this. It has something to do with \r\n and \n in different OSs. But if I have a txt or dat file that has \r\n or \n or \r and I am reading token by token or char by char... what is the effect?
    thanks.

  2. #2
    Smitten by reality Harsh Gupta's Avatar
    Join Date
    Feb 2005
    Posts
    2,938

    Re: fopen's "r" vs "rb".

    let me try to explain it from scratch.

    1) when you use "r" - opening in normal text format

    there is a difference in the way C and the operating system (DOS in my example) represent the End-of-LINE (EOL). In C, the EOL is signalled by a single character, \n, the newline (line feed). but in DOS, the EOL is stored on disk as 2 characters: \r (carriage return) followed by \n.

    so suppose a file has 10 lines and DOS reports its size as 110 bytes. if you write a C program that opens it with "r" and counts the characters, the result will be 100, because each \r\n on disk is converted to a single \n while reading (10 \r dropped for 10 lines). when writing in text mode the reverse happens: every \n is converted to \r\n.

    2) when using "rb" - opening file in Binary format

    but this is not the case with binary mode. when reading from disk, it will not convert the \r\n pairs to \n. so if you run the same counting program with "rb", the result will be 110, the same size that DOS shows, and not 100, because the additional \r characters are counted too.
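
A minimal sketch of the counting experiment described above, assuming a file with DOS-style line endings. The helper names `count_chars` and `write_sample` and the sample contents are made up for illustration; they are not from any tutorial quoted in this thread.

```c
#include <assert.h>
#include <stdio.h>

/* Count the characters fgetc() delivers when `path` is opened with `mode`
   ("r" for text mode, "rb" for binary mode). Returns -1 if fopen fails. */
long count_chars(const char *path, const char *mode)
{
    FILE *fp;
    long n = 0;

    assert(path != NULL && mode != NULL);
    fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    while (fgetc(fp) != EOF)
        ++n;
    fclose(fp);
    return n;
}

/* Write two DOS-style lines, "ab\r\ncd\r\n" (8 bytes), in binary mode so
   that no translation happens on any platform. Returns 0 on success. */
int write_sample(const char *path)
{
    static const char data[] = "ab\r\ncd\r\n";
    FILE *fp = fopen(path, "wb");
    size_t written;

    if (fp == NULL)
        return -1;
    written = fwrite(data, 1, sizeof data - 1, fp);
    fclose(fp);
    return written == sizeof data - 1 ? 0 : -1;
}
```

Calling `count_chars(path, "rb")` on the sample always gives 8. With "r" the result depends on the platform: a DOS/Windows C library should report 6 because each \r\n is delivered as one \n, while on Unix-like systems text and binary mode behave the same and both report 8.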

    hope it helps you.

    Harsh
    Last edited by Harsh Gupta; Mar 26th, 2006 at 11:39 AM.
    Show Appreciation. Rate Posts.

  3. #3

    Thread Starter
    Lively Member Sebouh's Avatar
    Join Date
    Jan 2005
    Posts
    73

    Re: fopen's "r" vs "rb".

    ok, as i understand "rb" is used to make running in C similar to running in DOS. But, this is the text in a tutorial that tries to explain ..... does it say the same thing?
    Code:
    Still, it would be interesting to verify that we are getting the right count for a given file. Well thats easy. We count the characters with our program, and then we use the DIR directive of windows to verify that we get the right count.
    H:\lcc\examples>countchars countchars.c
    466
    H:\lcc\examples>dir countchars.c
    07/01/00 11:31p 492 countchars.c
    1 File(s) 492 bytes
    Wow, we are missing 492-466 = 26 chars!
    Why?
    We read again the specifications of the fopen function. It says that we should use it in read mode with "r" or in binary mode with "rb". This means that when we open a file in read mode, it will translate the sequences of characters \r (return) and \n (new line) into ONE character. When we open a file to count all characters in it, we should count the return characters too.
    This has historical reasons. The C language originated in a system called UNIX, actually, the whole language was developed to be able to write the UNIX system in a convenient way. In that system, lines are separated by only ONE character, the new line character.
    When the MSDOS system was developed, dozens of years later than UNIX, people decided to separate the text lines with two characters, the carriage return, and the new line character. This provoked many problems with software that expected only ONE char as line separator. To avoid this problem the MSDOS people decided to provide a compatibility option for that case: fopen would by default open text files in text mode, i.e. would translate sequences of \r\n into \n, skipping the \r.
    Conclusion:
    Instead of opening the file with fopen(argv[1], "r"); we use fopen(argv[1], "rb");, i.e. we force NO translation. We recompile, relink and we obtain:
    H:\lcc\examples> countchars countchars.c
    493
    H:\lcc\examples> dir countchars.c
    07/01/00 11:50p 493 countchars.c
    1 File(s) 493 bytes
    Yes, 493 bytes instead of 492 before, since we have added a b to the arguments of fopen!

  4. #4
    Smitten by reality Harsh Gupta's Avatar
    Join Date
    Feb 2005
    Posts
    2,938

    Re: fopen's "r" vs "rb".

    well, i could not understand your question properly, but yes, it says the same thing what i posted before. also take a look at this:
    Code:
    Streams
    
    Even though different devices are involved (terminals, disk drives, etc), the buffered file system transforms each into a logical device called a stream. Because streams are device-independent, the same function can write to a disk file or to another device, such as a console. There are two types of streams, ie:
    
        * Text Streams. A text stream is a sequence of characters. In a text stream, certain character translations may occur (eg, a newline may be converted to a carriage return/line-feed pair). This means that there may not be a one-to-one relationship between the characters written and those in the external device.
        * Binary Streams. A binary stream is a sequence of bytes with a one-to-one correspondence to those on the external device (ie, no translations occur). The number of bytes written or read is the same as the number on the external device. (However, an implementation-defined number of bytes may be appended to a binary stream (eg, to pad the information so that it fills a sector on a disk).)
    so i really dont know if binary mode was included with an intention to make it compatible with OSs, but i know one more thing about Binary mode that:

    --> when you write numbers as text (e.g. with fprintf()), every digit is stored as one character per byte, so an int or long does not occupy its in-memory size (e.g. 4 bytes) on disk. for example, if you have the number 123456 in a text file then it will occupy 6 bytes (one byte for one character).

    but this is not the case when you write the raw bytes with fwrite() to a file opened in binary mode. each number is stored in the same number of bytes on disk as it occupies in memory, so 123456 held in a 4-byte int will occupy 4 bytes. note that this comes from the function you use to write; the "b" flag only guarantees that those bytes are not translated.
    The number of bytes written or read is the same as the number on the external device.
    so, the bottom line is: if you write a file in binary mode, then read that file in binary mode only.
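
To make the 6-bytes-versus-4-bytes point concrete, here is a small sketch that stores the same int once as text with fprintf() and once as raw bytes with fwrite(). The function names (`file_size`, `save_as_text`, `save_as_binary`, `load_binary`) are invented for this example, and the binary size is whatever sizeof(int) is on your compiler (commonly 4).

```c
#include <assert.h>
#include <stdio.h>

/* Size of a file in bytes, read without any translation; -1 on error. */
long file_size(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long n = 0;

    if (fp == NULL)
        return -1;
    while (fgetc(fp) != EOF)
        ++n;
    fclose(fp);
    return n;
}

/* Store `value` as decimal text: one byte per digit. */
int save_as_text(const char *path, int value)
{
    FILE *fp = fopen(path, "w");
    int ok;

    if (fp == NULL)
        return -1;
    ok = fprintf(fp, "%d", value) > 0;
    return (fclose(fp) == 0 && ok) ? 0 : -1;
}

/* Store `value` as its in-memory bytes: sizeof(int) bytes on disk. */
int save_as_binary(const char *path, int value)
{
    FILE *fp = fopen(path, "wb");
    size_t n;

    if (fp == NULL)
        return -1;
    n = fwrite(&value, sizeof value, 1, fp);
    return (fclose(fp) == 0 && n == 1) ? 0 : -1;
}

/* Read back an int written by save_as_binary(); the caller must already
   know the stored type, because the file does not record it. */
int load_binary(const char *path, int *value)
{
    FILE *fp = fopen(path, "rb");
    size_t n;

    assert(value != NULL);
    if (fp == NULL)
        return -1;
    n = fread(value, sizeof *value, 1, fp);
    fclose(fp);
    return n == 1 ? 0 : -1;
}
```

After `save_as_text(path, 123456)` the file holds the six characters 123456, while `save_as_binary(path, 123456)` produces sizeof(int) bytes that only make sense when read back with `load_binary()` on a machine with the same int layout.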

    --> also, in text mode on DOS, a special character 26 (Ctrl-Z) may mark the end of the file. when the library encounters this character while reading, it assumes EOF and stops looking ahead. there is no such thing in binary mode, where character 26 is just another byte. this is another reason why the read and write modes should match.

    so binary mode was not included only to make files compatible between devices and systems; the two points above are 2 more reasons.

    Harsh
    Show Appreciation. Rate Posts.

  5. #5

    Thread Starter
    Lively Member Sebouh's Avatar
    Join Date
    Jan 2005
    Posts
    73

    Re: fopen's "r" vs "rb".

    ok, just to recap...when i am reading character by character 123456 will tell me 6 characters but in the disk they will occupy 4 / 6 bytes? Meaning the binary mode will know when a token is a string, a float, a double, an int, a long...?

    And for the \r\n (which is normally the case in windows for NEWLINEs)....in text mode this is translated to \n, so if i only have a NEWLINE in my txt file, opening in "r" mode will count as 1 character and in "rb" mode two?
    Sorry if i am being a pain but i just want to clear this!
    Last edited by Sebouh; Mar 26th, 2006 at 04:45 AM.

  6. #6
    Frenzied Member aewarnick's Avatar
    Join Date
    Dec 2002
    Posts
    1,037

    Re: fopen's "r" vs "rb".

    Harsh, you made this statement:
    "the same is the case with Binary mode. it will convert the \ns to \r\ns"

    Which is wrong. I don't think you really meant to say that.

  7. #7
    Smitten by reality Harsh Gupta's Avatar
    Join Date
    Feb 2005
    Posts
    2,938

    Re: fopen's "r" vs "rb".

    Quote Originally Posted by aewarnick
    Harsh, you made this statement:
    "the same is the case with Binary mode. it will convert the \ns to \r\ns"

    Which is wrong. I don't think you really meant to say that.
    sorry, yes you are right. got bit sleepy at that point.

    i mean, when reading file from disk in binary mode, the \r\n combination is not converted to \n. that's why it shows some additional characters.

    Thanks for pointing it out.

    PS: edited post #2
    Show Appreciation. Rate Posts.

  8. #8
    Smitten by reality Harsh Gupta's Avatar
    Join Date
    Feb 2005
    Posts
    2,938

    Re: fopen's "r" vs "rb".

    Meaning the binary mode will know when a token is a string, a float, a double, an int, a long...?
    not exactly. the file itself does not record any types; this only works if you have written the file in binary mode and you read the values back with the same types in the same order. i am not sure so please wait for others to comment, but you cannot expect a file written in text mode and read in binary mode to determine the token type. please read any good tutorial on the fread() and fwrite() functions to make it more clear. you may start from here.
    if i only have a NEWLINE in my txt file
    this sounds confusing to me. it is about the representation of newlines. when you press enter in a text file, your C program sees it as \n, but the same thing, when written to disk, is stored as \r\n.

    all right, let me try explaining one more time.

    when you WRITE a file in text mode, all the \n (newlines, meaning where you pressed ENTER after each line) get converted to \r\n. when you READ a file in text mode from disk, all the \r\n pairs, i.e. the representation of newlines in DOS on disk, get converted to the C-style representation of newlines, i.e. \n.

    this is not the case in binary mode. when reading a file in binary mode, all the \r\n combinations remain as they are.

    Harsh
    Show Appreciation. Rate Posts.

  9. #9
    Frenzied Member aewarnick's Avatar
    Join Date
    Dec 2002
    Posts
    1,037

    Re: fopen's "r" vs "rb".

    To make life simple, I NEVER use ascii mode, I always use binary both for reading and writing. Today's text editors read \r\n and \n the same, so there is no need to write the useless \r character. Windows Text editor still has a problem with it, but it's trash anyway.

  10. #10

    Thread Starter
    Lively Member Sebouh's Avatar
    Join Date
    Jan 2005
    Posts
    73

    Re: fopen's "r" vs "rb".

    OK, i finally got it... but just one more thing: suppose my program reads character by character and prints a new line whenever the character is \n or \r. At the end of a line, reading in binary mode will cause two new lines to be printed instead of 1, right, since there is \r\n, 2 characters?

  11. #11
    Frenzied Member aewarnick's Avatar
    Join Date
    Dec 2002
    Posts
    1,037

    Re: fopen's "r" vs "rb".

    When you read a file, you will want to read into a buffer of at least about 1024 bytes. When you find a \r, check whether the next character is a \n; if it is, put a new line down and skip over that character. If you don't find a \n, just input a line. If the last character in the buffer is a \r, you will need to remember that for your next buffer read and check the first character of the next buffer for a \n. Or dispose of the buffer and read from the \r to start the next buffer read.

    It's really all very simple.
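
One way the buffered approach described above might look in plain C. This is only a sketch: the function name `normalize_newlines` and the `pending_cr` flag are invented for illustration, and both streams are assumed to be opened in binary mode so that the C library does no translation of its own.

```c
#include <assert.h>
#include <stdio.h>

/* Copy `in` to `out`, writing a single '\n' for every "\r\n", lone '\r'
   or lone '\n'. Returns 0 on success, -1 on a read or write error. */
int normalize_newlines(FILE *in, FILE *out)
{
    char buf[1024];
    size_t n, i;
    int pending_cr = 0;  /* the previous character was '\r' */

    assert(in != NULL && out != NULL);
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        for (i = 0; i < n; ++i) {
            char c = buf[i];
            if (pending_cr) {
                pending_cr = 0;
                if (c == '\n')
                    continue;  /* second half of "\r\n", already handled */
            }
            if (c == '\r') {
                fputc('\n', out);  /* new line for "\r\n" or a lone '\r' */
                pending_cr = 1;    /* survives into the next buffer read */
            } else {
                fputc(c, out);     /* ordinary character or a lone '\n' */
            }
        }
    }
    return (ferror(in) || ferror(out)) ? -1 : 0;
}
```

Because `pending_cr` lives outside the read loop, a \r that happens to be the last byte of one buffer and a \n that starts the next are still treated as one line break, which is the "remember that for your next buffer read" case.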

  12. #12

    Thread Starter
    Lively Member Sebouh's Avatar
    Join Date
    Jan 2005
    Posts
    73

    Re: fopen's "r" vs "rb".

    Quote Originally Posted by aewarnick
    If you don't find a \n, just input a line.
    What do you mean just input a line? Or can there be a situation where there only is a \r but no \n following it...i am not talking end of buffer, but in the middle of a sentence. What would \r alone mean?

  13. #13
    Frenzied Member aewarnick's Avatar
    Join Date
    Dec 2002
    Posts
    1,037

    Re: fopen's "r" vs "rb".

    Mac uses \r. Windows uses \r\n. Linux uses \n. I use \n. A code snippet from my aStr class:
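    Code:
    // Member of aStr: astr is the character data and size is its length.
    // Copies the string into newStr, replacing every line break
    // ("\r\n", "\r" or "\n") with repl.
    void ReplaceLineBreaks(aStr &newStr, cchar* repl, uint len=0)
    {
        if(!len) len= strlen(repl);
        newStr= "";
        char* pc= (char*)astr;
        for(uint i=0; i<size; ++i, ++pc)
        {
            if(*pc == '\r')
            {
                // Treat "\r\n" as a single line break: skip the '\n'.
                if(i+1<size && *(pc+1) == '\n') {++i; ++pc; }
                newStr.App(repl, len);
            }
            // "else if", so that a handled '\r' is not processed a second time.
            else if(*pc == '\n')
            {
                newStr.App(repl, len);
            }
            else newStr.App(*pc);
        }
    }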


  14. #14

    Thread Starter
    Lively Member Sebouh's Avatar
    Join Date
    Jan 2005
    Posts
    73

    Re: fopen's "r" vs "rb".

    Aha....
    Thanks for the explanation. You've been a great help!
