Thursday, 10 August 2006

Simplifying data extraction using Linux text utilities

I remember the first time I was introduced to Unix (yes, my first experience with a POSIX OS was with Unix, specifically SCO Unix, and not Linux). The instructor told us that the real power of Unix lies in how it accomplishes complex tasks: each task is split into smaller tasks, which are in turn split into even smaller ones and assigned to different utilities. The output from these utilities is then combined to get the desired solution. In management speak, this is efficient delegation of duties, and it is what makes Unix/Linux a winner. In Windows, by comparison, you typically have one monolithic piece of software doing all the tasks by itself, which leads to unnecessary duplication and waste of resources.
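To make the idea of combining small utilities concrete, here is a minimal sketch of a pipeline that finds the most frequent words in a file. The scratch directory, the file name sample.txt and its contents are made up purely for illustration; the utilities themselves (tr, sort, uniq, head) are standard.

```shell
# Hypothetical scratch directory and sample file, created only for this demo.
workdir=$(mktemp -d)
printf 'linux unix linux\nunix linux posix\n' > "$workdir/sample.txt"

# Each utility does one small job and the pipe (|) hands its output to the next:
#   tr      puts every word on its own line
#   sort    brings identical words together
#   uniq -c collapses duplicates and prefixes each word with its count
#   sort -rn orders the lines by count, highest first
#   head    keeps only the top two lines
top_words=$(tr -s ' ' '\n' < "$workdir/sample.txt" | sort | uniq -c | sort -rn | head -n 2)
echo "$top_words"

# Clean up the scratch directory.
rm -rf "$workdir"
```

None of these tools knows anything about word frequencies on its own; the result comes entirely from chaining them together.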

Take spell checking as an example. In Linux you have a utility called aspell which does the spell checking. Regardless of which application you use, be it Abiword, Vim or OpenOffice.org, when you select the menu entry for spell checking (assuming there is one), the application can hand the task to aspell, and aspell in turn passes the result back to the application. In Windows, on the other hand, each application has its own spell-checking code built in.

I find combining different utilities to achieve complex tasks in Linux/Unix really fascinating. My favorite one is using a combination of 'find' and 'grep' to search for a particular string in files on my hard disk and list the files which contain the string. This I achieve as follows:
$ find . -iname \*.txt -exec grep -s -l "Linux" {} \;
The above command searches the current directory and its subdirectories for files with a .txt extension (in any letter case) and lists those that contain the word "Linux". Try out the command on your Linux machine and see the output. For more information on using the find utility, read this article.
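If you want to try the find and grep combination without touching your own files, the following is a small sketch that runs it inside a throwaway directory. The directory and file names are invented for the demo. It also uses "+" instead of "\;" at the end of -exec, a variation on the command above, so that find passes many file names to a single grep run rather than starting one grep per file.

```shell
# Hypothetical scratch directory so the example is safe to run anywhere.
demo=$(mktemp -d)
mkdir -p "$demo/docs"
echo "Linux is fun"   > "$demo/a.txt"
echo "nothing here"   > "$demo/b.txt"
echo "I use Linux"    > "$demo/docs/c.TXT"
echo "Linux in a log" > "$demo/d.log"

# -iname matches *.txt in any letter case, so c.TXT is included.
# grep -l prints only the names of files that contain the string;
# -s suppresses error messages about unreadable files.
matches=$(cd "$demo" && find . -iname '*.txt' -exec grep -s -l "Linux" {} + | sort)
echo "$matches"

# Clean up the scratch directory.
rm -rf "$demo"
```

Here b.txt is skipped because it does not contain the string, and d.log is skipped because find never hands it to grep in the first place.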

Now try accomplishing the same task in Windows and you will understand what I mean. If you ask me, these little utilities, which are bundled with all *nixes, are the workhorses that give a POSIX operating system its sheer power in the first place.

Today I came across a very informative introductory-level article written by Harsha S. Adiga which explains how to use some of the most common utilities found in Linux to accomplish numerous day-to-day tasks. The author covers nine tools, explaining each one with the aid of examples. Reading this article made me really nostalgic because nowadays, with the beautiful desktops we have in Linux, even I feel a bit spoilt and do not use the command line as frequently as I used to.
