EOF Is and Is Not a Character

What the HECK is EOF?

It stands for End of File, but why is the following statement inaccurate:

This C program uses the fgetc function to read from standard input until the EOF character is read.

but the following statement correct:

This C program uses the fgetc function to read from standard input until the user enters the EOF character into terminal input.

What gives?

Inside You There Are Three EOFs

In the context of a C program running on POSIX-compliant systems, there are actually three concepts represented by the term EOF:

  1. Generally, EOF is the end-of-file condition, reached when there is no more input to be read from a source.

  2. In the C standard library, EOF is a macro representing a sentinel returned by the function fgetc and friends.

  3. In the POSIX specification for terminal interfaces, EOF is a special character that, depending on context, causes a special function to occur.

It is in the latter two concepts that EOF can be thought of as not a character and as a character, respectively.

EOF Is Not a Character

I asked the friendly resident slop generator the following:

Open "hello.txt" and print out its contents character by character using fgetc in C.

And it graciously replied:

At first glance, this looks correct (it always does). If we compile and run the program, everything seems fine!

$ printf "Hello world!" > hello.txt && clang -O0 -Wall -Wextra -o eof eof.c && ./eof
Hello world!

However, tweak the example a bit:

$ printf "Hello \377 world!" > hello.txt && clang -O0 -Wall -Wextra -o eof eof.c && ./eof
Hello

Where did world! go? What's the \377 doing there?

Yep! In fact, we can see why if we add the flag -Wconversion when compiling eof.c[1]:

  1. It's a great shame -Wconversion is not included by default in -Wall or even -Wextra. It is included in -Weverything, but that turns on all diagnostics. My recommendation is to use -Weverything and turn off diagnostics you don't need using -Wno-*.

$ clang -O0 -Wconversion -Wall -Wextra -o eof eof.c
eof.c:15:18: warning: implicit conversion loses integer precision: 'int' to 'char' [-Wimplicit-int-conversion]
   15 |     while ((ch = fgetc(file)) != EOF) {
      |                ~ ^~~~~~~~~~~
1 warning generated.
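The warning points straight at the bug: the int returned by fgetc is squeezed into a char. Here is a minimal sketch of the effect in isolation. The helper name byte_looks_like_eof is made up for illustration, and it uses signed char explicitly because the signedness of plain char is implementation-defined. The result of the narrowing conversion is also implementation-defined for values above 127, though on common two's-complement platforms 255 becomes -1.

```c
#include <stdio.h>

/* Hypothetical helper, for illustration only.
   Returns 1 if a byte value (0..255), once stored in a signed char,
   compares equal to EOF; returns 0 otherwise. */
int byte_looks_like_eof(int byte_value) {
    /* Narrowing conversion: on common platforms 255 wraps to -1. */
    signed char ch = (signed char)byte_value;

    /* ch is promoted back to int for the comparison, so -1 == EOF
       wherever EOF is defined as -1 (as in glibc). */
    return ch == EOF;
}
```

Called with 0xFF (the \377 byte from the example above), this is expected to return 1 on such platforms, which is exactly why the loop stops early. Called with 'H', it returns 0.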

Looking at the manual for fgetc:

fgetc(3)                                                Library Functions Manual                                                fgetc(3)

NAME
       fgetc, fgets, getc, getchar, ungetc - input of characters and strings

LIBRARY
       Standard C library (libc, -lc)

SYNOPSIS
       #include <stdio.h>

       int fgetc(FILE *stream);
       int getc(FILE *stream);
       int getchar(void);

       char *fgets(char s[restrict .size], int size, FILE *restrict stream);

       int ungetc(int c, FILE *stream);

DESCRIPTION
       fgetc() reads the next character from stream and returns it as an unsigned char cast to an int, or EOF on end of file or error.

       getc() is equivalent to fgetc() except that it may be implemented as a macro which evaluates stream more than once.

       getchar() is equivalent to getc(stdin).

       fgets()  reads  in  at  most  one less than size characters from stream and stores them into the buffer pointed to by s.  Reading
       stops after an EOF or a newline.  If a newline is read, it is stored into the buffer.  A terminating null byte ('\0')  is  stored
       after the last character in the buffer.

       ungetc() pushes c back to stream, cast to unsigned char, where it is available for subsequent read operations.  Pushed-back char‐
       acters will be returned in reverse order; only one pushback is guaranteed.

       Calls to the functions described here can be mixed with each other and with calls to other input functions from the stdio library
       for the same input stream.

       For nonlocking counterparts, see unlocked_stdio(3).

RETURN VALUE
       fgetc(), getc(), and getchar() return the character read as an unsigned char cast to an int or EOF on end of file or error.

       fgets() returns s on success, and NULL on error or when end of file occurs while no characters have been read.

       ungetc() returns c on success, or EOF on error.

ATTRIBUTES
       For an explanation of the terms used in this section, see attributes(7).
       ┌─────────────────────────────────────────────────────────────────────────────────────────────────────┬───────────────┬─────────┐
       │ Interface                                                                                           │ Attribute     │ Value   │
       ├─────────────────────────────────────────────────────────────────────────────────────────────────────┼───────────────┼─────────┤
       │ fgetc(), fgets(), getc(), getchar(), ungetc()                                                       │ Thread safety │ MT-Safe │
       └─────────────────────────────────────────────────────────────────────────────────────────────────────┴───────────────┴─────────┘

STANDARDS
       C11, POSIX.1-2008.

HISTORY
       POSIX.1-2001, C89.

NOTES
       It  is not advisable to mix calls to input functions from the stdio library with low-level calls to read(2) for the file descrip‐
       tor associated with the input stream; the results will be undefined and very probably not what you want.

SEE ALSO
       read(2), write(2), ferror(3), fgetwc(3), fgetws(3), fopen(3), fread(3),  fseek(3),  getline(3),  gets(3),  getwchar(3),  puts(3),
       scanf(3), ungetwc(3), unlocked_stdio(3), feature_test_macros(7)

Linux man-pages 6.10                                           2024-07-23                                                       fgetc(3)

We can see the following:

  1. The fgetc function returns an integer int, not a character char.

  2. Specifically, it returns either the unsigned char read from the stream cast into an int, or the value EOF.

  3. It returns EOF on end-of-file and on error.
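Putting those three observations together, a reading loop might look like the sketch below. The function name copy_stream and its return convention are my own assumptions, not anything from the standard library. The two points that matter are that the result of fgetc is kept in an int, and that ferror is consulted afterwards because EOF alone does not say whether the stream ended or failed.

```c
#include <stdio.h>

/* Hypothetical helper: copies `in` to `out` one byte at a time.
   Returns the number of bytes copied, or -1 if reading failed. */
long copy_stream(FILE *in, FILE *out) {
    long copied = 0;
    int c; /* int, so EOF stays distinct from the byte 0xFF */

    while ((c = fgetc(in)) != EOF) {
        fputc(c, out);
        copied++;
    }

    /* EOF covers both end-of-file and error; ferror() tells them apart. */
    if (ferror(in)) {
        return -1;
    }
    return copied;
}
```

With the 14-byte "Hello \377 world!" input from earlier, this version should copy all 14 bytes instead of stopping at the \377.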

What is EOF then? Looking at the glibc source code, we can see the location in stdio.h in which the macro is defined:

/* The value returned by fgetc and similar functions to indicate the
   end of the file.  */
#define EOF (-1)

EOF is a special value returned at the end of file or on error. It is not a character read from the stream, which makes sense because otherwise it would have to be in the contents of the stream.

Bingpot. Therefore, EOF is not a character.

An Aside on C APIs

So fgetc in the success case returns an unsigned char converted to an int, with 256 possible values. However, to accommodate the error case (which needs just 1 extra value), the function returns an int, with 4,294,967,296 possible values (assuming a 32-bit int). This problem is also incredibly easy for beginners to overlook, as the chance of the byte 0xFF (ÿ in Latin-1, and never part of valid UTF-8) appearing in their sample text is low.

It is rather unfortunate that fgetc does not look like this:

int fgetc(FILE *stream, unsigned char *out);

returning 0 on success, -1 on failure and EOF, and placing the read character into the out-parameter out like some other C APIs.

It does not solve the fundamental problem of using an int with 4,294,967,296 possible values to store nowhere near that much information, but it at the very least more strongly hints at the fact that there are special cases other than the case of a successfully read character.
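Nothing stops us from wrapping fgetc to get that shape ourselves. The sketch below is purely hypothetical: read_byte is a name I made up, and like the imagined signature above it lumps end-of-file and error together under -1, so callers would still need feof or ferror to tell them apart.

```c
#include <stdio.h>

/* Hypothetical wrapper, not part of any standard library.
   Returns 0 and stores the byte in *out on success.
   Returns -1 on end-of-file or error, leaving *out untouched. */
int read_byte(FILE *stream, unsigned char *out) {
    int c = fgetc(stream);
    if (c == EOF) {
        return -1;
    }
    *out = (unsigned char)c; /* safe: c is in 0..255 here */
    return 0;
}
```

A caller would then write something like while (read_byte(fp, &b) == 0) { ... }, and there is no variable in which a 0xFF byte and the sentinel could be confused.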

This bug is so common that it's the first entry in the Stdio section on the comp.lang.c Frequently Asked Questions website.

EOF Is a Character

If we run the following code:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE* fp = stdin;

    int c;
    while ((c = fgetc(fp)) != EOF) {
        putchar(c);
    }

    if (ferror(fp)) {
        fprintf(stderr, "An error occurred while reading the file\n");
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}

We observe the following:

$ clang -O0 -Wconversion -Wall -Wextra -o eof eof.c

$ ./eof
hellooooooooooo????????

Notably, nothing happens! We have to enter a newline before the program receives what we just typed (and then echoes it out):

$ ./eof
hello world!!
hello world!!
hi!
hi!

That's because the terminal by default is operating in canonical mode or cooked mode. The mode is described in the POSIX specification for General Terminal Interface and in the manual for termios(3).

When the terminal is in this mode (determined by whether or not the ICANON flag in c_lflag is set), input is made available to the program line by line. We can observe the flag by running stty (look for icanon on the last line):

$ stty --all
speed 38400 baud; rows 57; columns 137; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = <undef>; eol2 = <undef>; swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z;
rprnt = ^R; werase = ^W; lnext = ^V; discard = ^O; min = 1; time = 0;
-parenb -parodd -cmspar cs8 -hupcl -cstopb cread -clocal -crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl -ixon -ixoff -iuclc -ixany -imaxbel iutf8
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt echoctl echoke -flusho -extproc
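The same flag can be read from inside a program with tcgetattr from termios(3). This is a minimal sketch; the function name is_canonical and its three-way return value are assumptions of mine.

```c
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper.
   Returns 1 if fd is a terminal in canonical mode,
           0 if fd is a terminal in non-canonical mode,
          -1 if tcgetattr fails (for example, fd is not a terminal). */
int is_canonical(int fd) {
    struct termios settings;

    if (tcgetattr(fd, &settings) != 0) {
        return -1;
    }
    return (settings.c_lflag & ICANON) ? 1 : 0;
}
```

Calling is_canonical(STDIN_FILENO) from a program started in an ordinary interactive shell should report 1, matching the icanon entry in the stty output. If standard input is a pipe or a regular file, there is no terminal to ask and the call reports -1.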

If we run our eof program again with ICANON unset, we expect all input to be made available immediately to the program, so keypresses should be echoed immediately after they are entered:

$ stty -icanon && ./eof
hheelllloo  wwoorrlldd!!!!

Looking at the output of stty again, we see a familiar term, eof = ^D:

$ stty --all
speed 38400 baud; rows 57; columns 137; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = <undef>; eol2 = <undef>; swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z;
rprnt = ^R; werase = ^W; lnext = ^V; discard = ^O; min = 1; time = 0;
-parenb -parodd -cmspar cs8 -hupcl -cstopb cread -clocal -crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl -ixon -ixoff -iuclc -ixany -imaxbel iutf8
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt echoctl echoke -flusho -extproc

Let's try it out!

$ ./eof
hello world!hello world!oh??oh??

Yep! Reading the relevant docs:

  • POSIX:

    Special character on input, which is recognized if the ICANON flag is set. When received, all the bytes waiting to be read are immediately passed to the process without waiting for a <newline>, and the EOF is discarded. Thus, if there are no bytes waiting (that is, the EOF occurred at the beginning of a line), a byte count of zero shall be returned from the read(), representing an end-of-file indication. If ICANON is set, the EOF character shall be discarded when processed.

  • termios(3):

    ...this character causes the pending tty buffer to be sent to the waiting user program without waiting for end-of-line. If it is the first character of the line, the read(2) in the user program returns 0, which signifies end-of-file. Recognized when ICANON is set, and then not passed as input.

By default, ^D is bound to send a special character that informs the terminal of EOF. The terminal will then either flush any existing input, or return 0 to the read(2) call expecting input from the terminal.

You got it duck.

$ stty eof '^N' && ./eof
hello world!!hello world!!yayy!!!!yayy!!!!

The example rebinds the EOF character from CTRL+D to CTRL+N. In this scenario, EOF is a character.
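What stty eof changes is one slot in the terminal's array of control characters, c_cc[VEOF], and a program can read it back. The sketch below assumes a hypothetical helper name, terminal_eof_char.

```c
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper.
   Returns the byte currently bound to the EOF special character
   for the terminal behind fd, or -1 if fd is not a terminal. */
int terminal_eof_char(int fd) {
    struct termios settings;

    if (tcgetattr(fd, &settings) != 0) {
        return -1;
    }
    return settings.c_cc[VEOF]; /* cc_t is unsigned, so this is >= 0 */
}
```

With the default binding I would expect this to return 4 (the byte CTRL+D sends), and 14 after running stty eof '^N'. Either way it is an ordinary byte value, which is the sense in which this EOF really is a character.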

EOF Recap

To recap the previous sections, here is an example showing all three EOFs in a step-by-step demo:

Demo

  1. The program is blocked waiting for input from the terminal.

  2. The input being typed is buffered by the terminal due to it being in canonical/cooked mode.

  3. The terminal receives the EOF character when CTRL+D is pressed. There is buffered input, so the terminal passes it to the program, which prints it out in a loop.

  4. After printing out the received input, the program once again blocks at the fgetc call.

  5. The terminal receives the EOF character when CTRL+D is pressed again. The terminal has no buffered input, so it returns a byte count of 0 to the program. The fgetc function interprets this as the EOF condition being reached, and returns the EOF macro to the caller. The program breaks the loop, and the program ends.

Code

#include <stdio.h>
#include <unistd.h>

int main(void) {
    FILE* fp = stdin;

    while (1) {
        int c = fgetc(fp);

        if (c == EOF) {
            break;
        }

        // a. Print to standard error so there is no output buffering
        // b. Wait 400ms after printing each character for demo purposes
        putc(c, stderr);
        usleep(400 * 1000);
    }

    return 0;
}
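Step 5 of the demo hinges on read(2) returning a byte count of zero, which fgetc then turns into the EOF macro. A terminal is not needed to see this. The sketch below drains any file descriptor until read reports zero bytes; the helper name drain_until_eof is an assumption, as is the choice to treat every read failure (including an interrupted call) as an error.

```c
#include <unistd.h>

/* Hypothetical helper: reads fd until read(2) reports end-of-file.
   Returns the total number of bytes read, or -1 on a read error. */
long drain_until_eof(int fd) {
    char buf[64];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n < 0) {
            return -1;    /* read failed */
        }
        if (n == 0) {
            return total; /* byte count of zero: end-of-file */
        }
        total += n;
    }
}
```

On a pipe, the zero arrives once the write end is closed and the remaining data has been read. On a terminal in canonical mode, it arrives when the EOF character is typed at the start of a line. In neither case is any "EOF byte" delivered to the program.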

Conclusion

Here are a few facts that hopefully make sense by now!

  1. EOF is not a character in the sense that there exists some byte(s) at the end of the content of regular files to mark their end.

  2. EOF is not a character in the sense that a function like fgetc returns the "EOF character" upon reaching the end of a file.

  3. EOF is a special character that can be used to tell a terminal to signal the end-of-file condition to any programs reading from it.