What the HECK is EOF?
It stands for End of File, but why is the following statement inaccurate:
This C program uses the fgetc function to read from standard input until the EOF character is read.
but the following statement is correct:
This C program uses the fgetc function to read from standard input until the user enters the EOF character into terminal input.
What gives?
Inside You There Are Three EOFs
In the context of a C program running on POSIX-compliant systems, there are actually three concepts represented by the term EOF:
- Generally, EOF is the end-of-file condition, reached when there is no more input to be read from a source.
- In the C standard library, EOF is a macro representing a sentinel value returned by the function fgetc and friends.
- In the POSIX specification for terminal interfaces, EOF is a special character that, when typed into a terminal in canonical mode, makes pending input available to the reading program immediately, or signals end-of-file if there is none.
It is the latter two concepts that explain how EOF can be thought of both as not a character and as a character.
EOF Is Not a Character
I asked the friendly resident slop generator the following:
Open "hello.txt" and print out its contents character by character using fgetc in C.
And it graciously replied:
At first glance, this looks correct (it always does). If we compile and run the program, everything seems fine!
$ printf "Hello world!" > hello.txt && clang -O0 -Wall -Wextra -o eof eof.c && ./eof
Hello world!
However, tweak the example a bit:
$ printf "Hello \377 world!" > hello.txt && clang -O0 -Wall -Wextra -o eof eof.c && ./eof
Hello
Where did world! go? What's the \377 doing there?
In fact, we can see why if we add the -Wconversion flag when compiling eof.c[1]:
[1] It's a great shame -Wconversion is not included by default in -Wall or even -Wextra. It is included in Clang's -Weverything, but that turns on all diagnostics. My recommendation is to use -Weverything and turn off diagnostics you don't need using -Wno-*.
$ clang -O0 -Wconversion -Wall -Wextra -o eof eof.c
eof.c:15:18: warning: implicit conversion loses integer precision: 'int' to 'char' [-Wimplicit-int-conversion]
15 | while ((ch = fgetc(file)) != EOF) {
| ~ ^~~~~~~~~~~
1 warning generated.
Looking at the manual for fgetc:
fgetc(3) Library Functions Manual fgetc(3)
NAME
fgetc, fgets, getc, getchar, ungetc - input of characters and strings
LIBRARY
Standard C library (libc, -lc)
SYNOPSIS
#include <stdio.h>
int fgetc(FILE *stream);
int getc(FILE *stream);
int getchar(void);
char *fgets(char s[restrict .size], int size, FILE *restrict stream);
int ungetc(int c, FILE *stream);
DESCRIPTION
fgetc() reads the next character from stream and returns it as an unsigned char cast to an int, or EOF on end of file or error.
getc() is equivalent to fgetc() except that it may be implemented as a macro which evaluates stream more than once.
getchar() is equivalent to getc(stdin).
fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading
stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A terminating null byte ('\0') is stored
after the last character in the buffer.
ungetc() pushes c back to stream, cast to unsigned char, where it is available for subsequent read operations. Pushed-back characters will be returned in reverse order; only one pushback is guaranteed.
Calls to the functions described here can be mixed with each other and with calls to other input functions from the stdio library
for the same input stream.
For nonlocking counterparts, see unlocked_stdio(3).
RETURN VALUE
fgetc(), getc(), and getchar() return the character read as an unsigned char cast to an int or EOF on end of file or error.
fgets() returns s on success, and NULL on error or when end of file occurs while no characters have been read.
ungetc() returns c on success, or EOF on error.
ATTRIBUTES
For an explanation of the terms used in this section, see attributes(7).
┌─────────────────────────────────────────────────────────────────────────────────────────────────────┬───────────────┬─────────┐
│ Interface │ Attribute │ Value │
├─────────────────────────────────────────────────────────────────────────────────────────────────────┼───────────────┼─────────┤
│ fgetc(), fgets(), getc(), getchar(), ungetc() │ Thread safety │ MT-Safe │
└─────────────────────────────────────────────────────────────────────────────────────────────────────┴───────────────┴─────────┘
STANDARDS
C11, POSIX.1-2008.
HISTORY
POSIX.1-2001, C89.
NOTES
It is not advisable to mix calls to input functions from the stdio library with low-level calls to read(2) for the file descriptor associated with the input stream; the results will be undefined and very probably not what you want.
SEE ALSO
read(2), write(2), ferror(3), fgetwc(3), fgetws(3), fopen(3), fread(3), fseek(3), getline(3), gets(3), getwchar(3), puts(3),
scanf(3), ungetwc(3), unlocked_stdio(3), feature_test_macros(7)
Linux man-pages 6.10 2024-07-23 fgetc(3)
We can see the following:
- The fgetc function returns an integer (int), not a character (char).
- Specifically, it returns either the unsigned char read from the stream cast to an int, or the value EOF.
- It returns EOF on end-of-file and on error.
What is EOF then? Looking at glibc's stdio.h, we can see where the macro is defined:
/* The value returned by fgetc and similar functions to indicate the
end of the file. */
#define EOF (-1)
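EOF is a special value returned at the end of file or on error. It is not a character read from the stream: every byte fgetc successfully reads comes back as a value from 0 to 255 (an unsigned char cast to an int), so the negative value -1 can never be mistaken for data, and nothing inside the stream has to mark its end.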
Bingpot. Therefore, EOF is not a character.
An Aside on C APIs
So in the success case fgetc returns an unsigned char cast to an int, which has 256 possible values (0 to 255). However, to accommodate end-of-file and errors, which need just one more value, the function returns an int, which with the common 32-bit int has 4,294,967,296 possible values. This problem is also incredibly easy for beginners to overlook, as the chances of the byte 0xFF (\377, value 255) appearing in their sample text are low: it never occurs in valid UTF-8.
It is rather unfortunate that fgetc does not look like this:
int fgetc(FILE *stream, unsigned char *out);
returning 0 on success and -1 on end-of-file or error, and placing the read character into the out-parameter out, as some other C APIs do.
It would not solve the fundamental problem of using an int with 4,294,967,296 possible values to store nowhere near that much information, but it would at least hint more strongly that there are cases other than a successfully read character.
This bug is so common that it's the first entry in the Stdio section on the comp.lang.c Frequently Asked Questions website.
EOF Is a Character
If we run the following code:
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE* fp = stdin;
    int c;  // int, not char, so EOF stays distinguishable from byte 255
    while ((c = fgetc(fp)) != EOF) {
        putchar(c);
    }
    // fgetc returned EOF: was it end-of-file or a read error?
    if (ferror(fp)) {
        fprintf(stderr, "An error occurred while reading the file\n");
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
We observe the following:
$ clang -O0 -Wconversion -Wall -Wextra -o eof eof.c
$ ./eof
hellooooooooooo????????
Notably, nothing happens! We have to enter a newline before the program receives what we just typed (and then echoes it out):
$ ./eof
hello world!!
hello world!!
hi!
hi!
That's because by default the terminal operates in canonical mode, also known as cooked mode. The mode is described in the General Terminal Interface chapter of the POSIX specification and in the manual for termios(3).
When the terminal is in this mode (determined by whether the ICANON flag in c_lflag is set), input is made available to the program line by line. We can observe the flag by running stty; note icanon on the last line of the output:
$ stty --all
speed 38400 baud; rows 57; columns 137; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = <undef>; eol2 = <undef>; swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z;
rprnt = ^R; werase = ^W; lnext = ^V; discard = ^O; min = 1; time = 0;
-parenb -parodd -cmspar cs8 -hupcl -cstopb cread -clocal -crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl -ixon -ixoff -iuclc -ixany -imaxbel iutf8
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt echoctl echoke -flusho -extproc
If we run our eof program again with ICANON unset, we expect all input to be made available to the program immediately, so each keypress should be echoed right after it is entered. Each character shows up twice: once from the terminal's own echo, and once from the program:
$ stty -icanon && ./eof
hheelllloo wwoorrlldd!!!!
Looking at the output of stty again, we see a familiar term on the second line: eof = ^D.
Let's try it out, pressing CTRL+D instead of Enter after each phrase:
$ ./eof
hello world!hello world!oh??oh??
Yep! Reading the relevant docs:
- POSIX General Terminal Interface, on the EOF special character: Special character on input, which is recognized if the ICANON flag is set. When received, all the bytes waiting to be read are immediately passed to the process without waiting for a <newline>, and the EOF is discarded. Thus, if there are no bytes waiting (that is, the EOF occurred at the beginning of a line), a byte count of zero shall be returned from the read(), representing an end-of-file indication. If ICANON is set, the EOF character shall be discarded when processed.
- termios(3), on VEOF: ...this character causes the pending tty buffer to be sent to the waiting user program without waiting for end-of-line. If it is the first character of the line, the read(2) in the user program returns 0, which signifies end-of-file. Recognized when ICANON is set, and then not passed as input.
By default, ^D (CTRL+D) is bound to the EOF special character. When the terminal receives it in canonical mode, it either passes any pending input to the program immediately, or, if there is none, makes the program's read(2) call return 0.
You got it duck.
$ stty eof '^N' && ./eof
hello world!!hello world!!yayy!!!!yayy!!!!
The example rebinds the EOF character from CTRL+D to CTRL+N. In this scenario, EOF is a character.
EOF Recap
To recap the previous section, here is a step-by-step demo showing all three EOFs:
Demo
1. The program is blocked waiting for input from the terminal.
2. The input being typed is buffered by the terminal because it is in canonical/cooked mode.
3. The terminal receives the EOF character when CTRL+D is pressed. There is buffered input, so the terminal passes it to the program, which prints it out in a loop.
4. After printing out the received input, the program once again blocks at the fgetc call.
5. The terminal receives the EOF character when CTRL+D is pressed again. The terminal has no buffered input, so it returns a byte count of 0 to the program.
6. The fgetc function interprets this as the EOF condition being reached, and returns the EOF macro to the caller. The program breaks out of the loop and ends.
Code
#include <stdio.h>
#include <unistd.h>

int main(void) {
    FILE* fp = stdin;
    while (1) {
        int c = fgetc(fp);
        if (c == EOF) {
            break;
        }
        // a. Print to standard error so there is no output buffering
        // b. Wait 400ms after printing each character for demo purposes
        putc(c, stderr);
        usleep(400 * 1000);
    }
    return 0;
}
Conclusion
Here are a few facts that hopefully make sense by now!
- EOF is not a character in the sense of some byte(s) stored at the end of a regular file to mark where it ends.
- EOF is not a character in the sense that a function like fgetc returns an "EOF character" upon reaching the end of a file; it returns EOF, a negative int sentinel that no byte can equal.
- EOF is a special character that can be used to tell a terminal to signal the end-of-file condition to any programs reading from it.