Thursday, 18 October 2018

Linux Useful Commands


  1. Check whether a command can be executed by a particular user:

    > su -s /bin/bash <your_user_to_test>

    This gives you a shell with the same privileges as the user you want to test (run it as root, or be ready to enter that user's password).
    You can then type the command you want to test and check whether it runs.
  2. Read a file inside a tarball that is nested inside another tarball:
    First find the path of the embedded archive that we wish to list or read:
    tar -tzf outer.tar.gz | grep tar

    Assume the output is `path/to/inner.tar`.
    Then the command to
    1. list the contents of the inner tar would be
            tar -xOf outer.tar.gz path/to/inner.tar | tar -tf -
    2. see the content of a file inside the inner tar would be
            tar -xOf outer.tar.gz path/to/inner.tar | tar -xOf - inner/some_file.txt | less

  3. Print, in random order, all the dictionary words that match a given pattern:
        grep -E 'rust|rs$' /usr/share/dict/words | perl -MList::Util=shuffle -e 'print shuffle(<STDIN>);' | less

    Note: the pattern above matches words that contain "rust" or end in "rs"; change it to whatever you want to search for.


DS points for reference:-

Wednesday, 28 March 2018

Understanding stdlib FileIO Caching


This post explains how stdlib FIO (file IO) caching is implemented.

FIO in stdlib uses its own caching technique to achieve performance optimization. It supports 3 buffering modes:


Line buffering (__SLBF) - characters are transmitted to the system as a block when a newline character is encountered. Line buffering is meaningful only for text streams.
Full buffering (__SFBF) - characters are transmitted to the system as a block when the buffer is filled.
No buffering (__SNBF) - characters are transmitted to the system as they are written.

How buffering is achieved:-
FIO in stdlib provides buffered IO through in-process caching: each FILE object maintains a local buffer. The FILE object uses several data members to track caching information, e.g. fp->_w, fp->_bf._base, fp->_bf._size and fp->_p, to name a few.
Based on the type of caching, the necessary data members are initialized during the first read of the file, at which point the required file data is cached from disk into the buffer. All buffered reads and writes made afterwards operate on this local buffer.
The buffer is written out to the file only when it is full or when the user explicitly flushes it.

Every call to fread and fwrite first checks whether the stream is buffered or not and, based on that, either updates the local buffer or reads/writes directly using the system read/write functions.
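The effect of the local buffer can be observed from outside the process. The following is a minimal sketch, assuming a POSIX system and a stream opened on a regular file with the library's default buffering; `size_on_disk` and `buffered_write_demo` are hypothetical helper names, not part of stdlib.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Size of the file as the kernel currently sees it, or -1 on error. */
static long size_on_disk(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1;
}

/* Hypothetical demo: write 5 bytes through stdio and record the file size
 * before and after fflush(). Returns 0 on success, -1 on error. */
int buffered_write_demo(const char *path, long *before, long *after)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* The bytes land in the FILE object's local buffer first. */
    if (fputs("hello", fp) == EOF) {
        fclose(fp);
        return -1;
    }
    *before = size_on_disk(path);

    /* fflush() hands the buffered bytes to the system. */
    if (fflush(fp) != 0) {
        fclose(fp);
        return -1;
    }
    *after = size_on_disk(path);

    return fclose(fp) == 0 ? 0 : -1;
}
```

On common implementations the size recorded before the flush is 0 and the size after it is 5, because the small write never fills the buffer on its own.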

Other Related Information:-
1. By default, full buffering (__SFBF) is used for FIO, so this is the mode we mostly deal with.

2. We can use the setvbuf() and setbuf() library functions to control buffering. After opening a stream (but before any other operation has been performed on it), we can explicitly specify the kind of buffering we need by passing one of the following modes:
_IOFBF (for full buffering),
_IOLBF (for line buffering), or
_IONBF (for unbuffered input/output).

3. In case of unbuffered IO (__SNBF), we make direct system-level FIO calls, somewhat similar to what we were doing in fileio.c build 9110.
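The setvbuf() usage described in point 2 can be sketched roughly as follows. `open_with_mode` is a hypothetical helper name introduced only for illustration; the sketch assumes the stream is opened for writing and lets the library allocate the buffer.

```c
#include <stdio.h>

/* Hypothetical helper: open `path` for writing and select a buffering mode
 * (_IOFBF, _IOLBF or _IONBF) before any other operation on the stream.
 * Returns the stream, or NULL on failure. */
FILE *open_with_mode(const char *path, int mode, size_t size)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return NULL;

    /* Passing NULL as the buffer lets the library allocate it;
     * setvbuf() returns 0 on success. */
    if (setvbuf(fp, NULL, mode, size) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

For example, open_with_mode("out.log", _IOLBF, BUFSIZ) would request a line-buffered stream, while passing _IONBF with a size of 0 would request an unbuffered one.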

Deductions:-
  1. With stdlib FIO, the performance improvement we get comes from its local caching, but the size of the gain depends entirely on our usage pattern.
  2. We could have used stdlib unbuffered IO for our purpose, but fflush() only calls the write() system call (not fsync()), which does not guarantee that the data reaches the disk. One option is to write our own fflush() that calls fsync() after calling the stdlib fflush(). This way we can guarantee that file writes reach the disk.
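The wrapper suggested in deduction 2 could look roughly like this. It is a minimal sketch, assuming a POSIX system where fileno() and fsync() are available; `flush_to_disk` is a hypothetical name, not an existing stdlib function.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose fileno() under strict C modes */
#endif
#include <stdio.h>
#include <unistd.h>

/* Hypothetical wrapper: first push the stdio buffer to the kernel with
 * fflush(), then ask the kernel to commit the data to the storage device
 * with fsync(). Returns 0 on success, -1 on failure. */
int flush_to_disk(FILE *fp)
{
    if (fflush(fp) != 0)
        return -1;

    int fd = fileno(fp);
    if (fd < 0)
        return -1;

    return fsync(fd);
}
```

Calling flush_to_disk(fp) in place of fflush(fp) trades some performance for durability, since fsync() blocks until the device reports that the data has been transferred.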

Please share your thoughts and doubts in comments.

*Please excuse the poor formatting.