Understanding stdlib FileIO Caching
This post explains how stdlib FIO (file I/O) caching is implemented.
FIO in stdlib uses its own caching technique to achieve performance optimization. It supports 3 buffering modes:
Line buffering (__SLBF) - characters are transmitted to the system as a block when a new-line character is encountered. Line buffering is meaningful only for text streams.
Full buffering (__SFBF) - characters are transmitted to the system as a block when a buffer is filled.
No buffering (__SNBF) - characters are transmitted to the system as they are written.
How buffering is achieved:-
FIO in stdlib provides buffered IO through in-process caching: each
FILE object maintains its own local buffer. The FILE object uses several data
members to track the caching state, for example fp->_w (space left for
writing), fp->_bf._base (start of the buffer), fp->_bf._size (buffer size) and
fp->_p (current position in the buffer), to name a few.
Based on the type of caching, the necessary data members are
initialized during the first read of the file. At that point the required
file data is cached from disk into the buffer. All buffered reads and writes
made afterwards operate on this local buffer.
The buffer is written out to the file only when it fills up or when the user
explicitly flushes the data (closing the stream with fclose() also flushes it).
Every call to fread and fwrite first checks whether the stream is buffered.
Depending on the result, it either updates the local buffer or reads/writes
directly using the system read/write functions.
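The flush behavior described above can be observed from user code. The following is a minimal sketch, assuming a POSIX system (for stat()) and a typical stdio implementation. The helper names file_size_on_disk and buffered_write_demo are hypothetical and not part of stdlib.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: size of `path` as seen by the OS, or -1 on error. */
long file_size_on_disk(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1;
}

/*
 * Hypothetical demo: writes a short message through a fully buffered stream
 * and records the file size before and after fflush().
 * Returns 0 on success, -1 on failure.
 */
int buffered_write_demo(const char *path, long *before, long *after)
{
    static const char msg[] = "hello, buffer";   /* 13 bytes without the NUL */
    const size_t len = sizeof msg - 1;
    int rc = -1;

    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* Ask for full buffering explicitly so the demo does not rely on defaults. */
    if (setvbuf(fp, NULL, _IOFBF, BUFSIZ) == 0 &&
        fwrite(msg, 1, len, fp) == len) {
        /* The message is far smaller than the buffer, so it is still cached. */
        *before = file_size_on_disk(path);
        if (fflush(fp) == 0) {
            /* fflush() hands the cached bytes to the OS via write(). */
            *after = file_size_on_disk(path);
            rc = 0;
        }
    }

    fclose(fp);
    return rc;
}
```

Calling buffered_write_demo("demo.tmp", &before, &after) on a typical implementation is expected to leave before at 0 and after at 13, since the message only leaves the in-process buffer when fflush() runs.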
Other Related Information:-
1. By default, Full buffering (__SFBF) is used for FIO. We
mostly deal with the Full buffering mode.
2. We can use the setvbuf()
and setbuf() library functions to control buffering. After opening a stream
(but before any other operations have been performed on it), we can explicitly
specify what kind of buffering we need by passing one of the following modes to setvbuf():
_IOFBF (for full buffering),
_IOLBF (for line buffering), or
_IONBF (for unbuffered input/output).
3. In case of unbuffered IO (__SNBF), we make direct
system-level FIO calls, somewhat similar to what we were doing in fileio.c
build 9110.
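The setvbuf() usage from item 2 can be sketched as follows. This is a minimal illustration, assuming a POSIX system (for stat()). The helper name size_after_unflushed_write is hypothetical. It reports how many bytes have reached the file after a single write and before any flush, which makes the difference between the three modes visible.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: opens `path`, selects the buffering `mode`
 * (_IOFBF, _IOLBF or _IONBF), writes `text` once, and returns the file size
 * seen by the OS before any flush or close. Returns -1 on error.
 */
long size_after_unflushed_write(const char *path, int mode, const char *text)
{
    struct stat st;
    long size = -1;
    const size_t len = strlen(text);

    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* setvbuf() must be the first operation performed on the stream. */
    if (setvbuf(fp, NULL, mode, BUFSIZ) == 0 &&
        fwrite(text, 1, len, fp) == len &&
        stat(path, &st) == 0)
        size = (long)st.st_size;

    fclose(fp);   /* flushes whatever is still cached */
    return size;
}
```

On a typical implementation, _IONBF reports the full length of the text immediately, _IOFBF reports 0 for a short text because the data is still cached, and _IOLBF reports the data once the text ends with a new-line character.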
Deductions:-
- With stdlib FIO, the performance improvement we get
comes from its local caching, but how much we gain depends
entirely on our usage.
- We could have used stdlib unbuffered IO for our purpose, but fflush() only calls the write() system call (and not fsync()), which does not guarantee that the data is written to disk. One option is to write our own fflush() wrapper that calls fsync() after calling the stdlib fflush(). This way we can guarantee that file writes reach the disk.
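The wrapper suggested in the last deduction could look roughly like this. It is a sketch under stated assumptions: a POSIX system providing fileno() and fsync(), and a hypothetical function name flush_to_disk that is not part of stdlib.

```c
#define _POSIX_C_SOURCE 200809L   /* for fileno() and fsync() */
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: first drains the stdio buffer to the OS with
 * fflush(), then asks the OS to commit the file to storage with fsync().
 * Returns 0 on success, -1 on failure.
 */
int flush_to_disk(FILE *fp)
{
    int fd;

    if (fp == NULL)
        return -1;

    /* Step 1: move data from the in-process buffer to the kernel. */
    if (fflush(fp) != 0)
        return -1;

    /* Step 2: obtain the underlying descriptor and sync it. */
    fd = fileno(fp);
    if (fd < 0)
        return -1;

    return fsync(fd) == 0 ? 0 : -1;
}
```

The order matters: calling fsync() without the preceding fflush() would sync only what the kernel already has and leave any bytes still held in the FILE buffer behind.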