Tuesday, November 29, 2011

Python: iterate (and read) all files in a directory (folder)

To iterate through all the files within a specified directory (folder), with the ability to use wildcards (*, ?, and [ ]-style ranges), use the following code snippet:
PYTHON:
import os
import glob

path = 'sequences/'
for infile in glob.glob( os.path.join(path, '*.fasta') ):
    print "current file is: " + infile
If you do not need wildcards, then there is a simpler way to list all items in a directory:
PYTHON:
import os

path = 'sequences/'
listing = os.listdir(path)
for infile in listing:
    print "current file is: " + infile
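Note that os.listdir() returns bare names without the directory part, so use os.path.join(path, infile) when you actually want to open (read) a file, whereas glob.glob() returns paths that already include the directory.
print was promoted from a statement to a function in Python 3, so there the snippets above need print("current file is: " + infile) instead of print "current file is: " + infile.
Use os.path.join() to build paths so that the script stays portable across platforms: different operating systems use different path separators, and a hard-coded separator can break the script on another OS.
The Python docs also mention glob.iglob(), which returns an iterator instead of a list. On directories with a very large number of files it saves memory by yielding one result per iteration rather than building the whole list of files, as glob() does.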
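As a rough illustration of the two approaches in Python 3 syntax, here is a minimal sketch. The helper names iter_matching and list_full_paths and the demo file names are made up for this example; only glob.iglob(), os.listdir() and os.path.join() come from the text above.

```python
import glob
import os
import tempfile


def iter_matching(path, pattern):
    """Yield matching paths one at a time instead of building a full list."""
    for infile in glob.iglob(os.path.join(path, pattern)):
        yield infile


def list_full_paths(path):
    """os.listdir() returns bare names, so join each one with the directory."""
    return [os.path.join(path, name) for name in sorted(os.listdir(path))]


# Demo on a throwaway directory (hypothetical file names) so it runs anywhere.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("a.fasta", "b.fasta", "notes.txt"):
        open(os.path.join(demo_dir, name), "w").close()

    for infile in iter_matching(demo_dir, "*.fasta"):
        print("current file is: " + infile)

    for infile in list_full_paths(demo_dir):
        print("listed: " + infile)
```

The order in which iglob() yields results is not guaranteed, so sort the results if the order matters.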

Monday, November 28, 2011

How to count number of files in a directory

$ ls -1 targetdir | wc -l
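Note that the command above counts every visible entry, subdirectories included, and skips hidden files (names starting with a dot). If the same count is needed inside a script, something like the following Python 3 sketch might do. The function names count_entries and count_regular_files are invented for this example.

```python
import os
import tempfile


def count_entries(path, include_hidden=False):
    """Count directory entries, roughly like `ls -1 path | wc -l`."""
    names = os.listdir(path)
    if not include_hidden:
        # plain ls does not show names that start with a dot
        names = [n for n in names if not n.startswith(".")]
    return len(names)


def count_regular_files(path):
    """Count only regular files, ignoring subdirectories."""
    with os.scandir(path) as entries:
        return sum(1 for entry in entries if entry.is_file())


# Demo on a throwaway directory with one file, one dotfile and one subdirectory.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "a.txt"), "w").close()
    open(os.path.join(demo_dir, ".hidden"), "w").close()
    os.mkdir(os.path.join(demo_dir, "sub"))
    print(count_entries(demo_dir))
    print(count_regular_files(demo_dir))
```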

How to find - Size of a directory & Free disk space


This article explains 2 simple commands that most people want to know when they start using Linux. They are finding the size of a directory and finding the amount of free disk space that exists on your machine. The command you would use to find the directory size is 'du'. And to find the free disk space you could use 'df'.

All the information present in this article is available in the man pages for du and df. In case you get bored reading the man pages and you want to get your work done quickly, then this article is for you.

-

'du' - Finding the size of a directory

$ du
Typing the above at the prompt gives you a list of directories that exist in the current directory along with their sizes. The last line of the output gives you the total size of the current directory including its subdirectories. The size given includes the sizes of the files and the directories that exist in the current directory as well as all of its subdirectories. Note that by default the sizes given are in kilobytes.

$ du /home/david
The above command would give you the directory size of the directory /home/david


$ du -h
This command gives you a better output than the default one. The option '-h' stands for human readable format. So the sizes of the files / directories are this time suffixed with a 'K' if it's kilobytes, 'M' if it's megabytes and 'G' if it's gigabytes.


$ du -ah

This command would display in its output, not only the directories but also all the files that are present in the current directory. Note that 'du' always counts all files and directories while giving the final size in the last line. But the '-a' displays the filenames along with the directory names in the output. '-h' is once again human readable format.

$ du -c
This gives you a grand total as the last line of the output. So if your directory occupies 30MB, the last 2 lines of the output (shown here in human readable form, as 'du -ch' would print them) would be

30M .
30M total

The first line would be the default last line of the 'du' output indicating the total size of the directory, and the second line displays the same size followed by the string 'total'. This is helpful in case you use this command along with the grep command to only display the final total size of a directory as shown below.


$ du -ch | grep total
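This would normally have only one line in its output, which displays the total size of the current directory including all the subdirectories. Keep in mind that grep also matches any file or directory whose name contains the string 'total', so such names would show up as extra lines.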

Note : In case you are not familiar with pipes (which make the above command possible) refer to Article No. 24 . Also, grep is one of the most important commands in Unix. Refer to Article No. 25 to know more about grep.


$ du -s
This displays a summary of the directory size. It is the simplest way to know the total size of the current directory.

$ du -S
For each directory it lists, this would display the size of that directory excluding the size of the subdirectories that exist within it. So the line for the current directory basically shows you the total size of all the files that sit directly in the current directory.

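$ du --exclude='*.mp3'
The above command would display the size of the current directory along with all its subdirectories, but it would exclude all the files whose names match the given shell pattern. Thus in the above case if there happen to be any files ending in .mp3 within the current directory or any of its subdirectories, their size would not be included while calculating the total directory size. Note that the option is spelled '--exclude' and that the pattern needs the '*' wildcard (quoted so the shell does not expand it), since a bare 'mp3' only matches a file or directory named exactly 'mp3'.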
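If a script needs a similar total, a rough Python 3 sketch could look like the one below. It is only an approximation of 'du': it adds up apparent file sizes in bytes with os.path.getsize() rather than the disk blocks that 'du' reports, and it ignores the space used by the directories themselves. The function name dir_size and its exclude parameter are invented for this example.

```python
import fnmatch
import os
import tempfile


def dir_size(path, exclude=None):
    """Sum the apparent sizes (in bytes) of all files below path.

    exclude is an optional shell-style pattern such as '*.mp3'.
    """
    total = 0
    for root, dirs, files in os.walk(path):
        for name in files:
            if exclude and fnmatch.fnmatch(name, exclude):
                continue  # skip files matching the pattern
            full = os.path.join(root, name)
            if not os.path.islink(full):  # do not follow symbolic links
                total += os.path.getsize(full)
    return total


# Demo on a throwaway directory with hypothetical files.
with tempfile.TemporaryDirectory() as demo_dir:
    os.mkdir(os.path.join(demo_dir, "sub"))
    with open(os.path.join(demo_dir, "notes.txt"), "wb") as f:
        f.write(b"x" * 100)
    with open(os.path.join(demo_dir, "sub", "song.mp3"), "wb") as f:
        f.write(b"x" * 50)
    print(dir_size(demo_dir))
    print(dir_size(demo_dir, exclude="*.mp3"))
```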

-

'df' - finding the disk free space / disk usage

$ df
Typing the above outputs a table consisting of 6 columns. All the columns are very easy to understand. Remember that the size, used and available columns use kilobytes (1K blocks) as the unit by default. The 'Use%' column shows the usage as a percentage, which is also very useful.


$ df -h
Displays the same output as the previous command but the '-h' indicates human readable format. Hence instead of kilobytes as the unit the output would have 'M' for Megabytes and 'G' for Gigabytes.

Most of the users don't use the other parameters that can be passed to 'df'. So I shall not be discussing them.

I shall in turn show you an example that I use on my machine. I have actually stored this as a script named 'usage' since I use it often.

Example :
I have my Linux installed on /dev/hda1 and I have mounted my Windows partitions as well (by default every time Linux boots). So 'df' by default shows me the disk usage of my Linux as well as Windows partitions. And I am only interested in the disk usage of the Linux partitions. This is what I use :

$ df -h | grep /dev/hda1 | cut -c 41-43

This command displays the following on my machine

45%

Basically this command makes 'df' display the disk usages of all the partitions and then extracts the lines with /dev/hda1 since I am only interested in that. Then it cuts the characters from the 41st to the 43rd column since they are the columns that display the usage in % , which is what I want.

Note : In case you are not familiar with pipes (which are used in the above command) then refer to Article No. 24 . 'cut' is another tool available in Unix. The above usage of cut gets the characters that are present in the specified columns. If you are interested in knowing how to mount your Windows partitions under Linux, please refer to Article No. 3 .
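Cutting fixed character columns out of the 'df' output depends on the exact column layout, which can shift with longer device names. As an alternative, here is a small Python 3 sketch that asks for the numbers directly through shutil.disk_usage(). The function name percent_used is made up for this example, and the result can differ slightly from the 'Use%' column because 'df' rounds up and leaves out space reserved for root.

```python
import shutil


def percent_used(path="."):
    """Return the used share of the filesystem containing path, in percent."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0  # guard against pseudo filesystems that report no size
    return 100.0 * usage.used / usage.total


print("%.0f%%" % percent_used("."))
```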

There are a few more options that can be used with 'du' and 'df' . You could find them in the man pages.

Saturday, November 26, 2011

directory traverse in C using dirent.h

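#include <dirent.h>  /* opendir, readdir, closedir */
#include <stdio.h>   /* printf, perror */
#include <stdlib.h>  /* EXIT_SUCCESS, EXIT_FAILURE */

int main (void) {
  DIR *dir;
  struct dirent *ent;

  dir = opendir ("c:\\src\\");
  if (dir != NULL) {
    /* print all the files and directories within directory */
    while ((ent = readdir (dir)) != NULL) {
      printf ("%s\n", ent->d_name);
    }
    closedir (dir);
  } else {
    /* could not open directory */
    perror ("opendir");
    return EXIT_FAILURE;
  }
  return EXIT_SUCCESS;
}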

linux command with directory


Removing directories >

There are two commands you can use for removing directories. If the directory is empty, you can use rmdir:
rmdir dir1
You can use rmdir only if the directory is empty. If you want to remove a directory with all its contents, you can use rm with the -r option. The -r option tells rm to remove a directory recursively:
rm -r dir1
It goes without saying that you can cause a lot of trouble with rm -r if you're not careful! In some cases it might be a good thing to use the -i option when deleting a directory with its contents so that you'd be prompted before each file in the directory gets deleted:
rm -ir dir1
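The same distinction exists when removing directories from a script. The Python 3 sketch below is only an illustration: os.rmdir() refuses to remove a non-empty directory, much like rmdir, while shutil.rmtree() removes a directory with all its contents, much like rm -r, and without any confirmation prompt. The wrapper name remove_dir is invented for this example.

```python
import os
import shutil
import tempfile


def remove_dir(path, recursive=False):
    """Remove a directory: like `rmdir` by default, like `rm -r` if recursive."""
    if recursive:
        shutil.rmtree(path)  # deletes the directory and everything inside it
    else:
        os.rmdir(path)       # raises OSError unless the directory is empty


# Demo inside a throwaway directory so nothing real gets deleted.
with tempfile.TemporaryDirectory() as base:
    dir1 = os.path.join(base, "dir1")
    os.mkdir(dir1)
    open(os.path.join(dir1, "file.txt"), "w").close()

    try:
        remove_dir(dir1)
    except OSError as err:
        print("rmdir refused:", err)

    remove_dir(dir1, recursive=True)
    print("dir1 still exists:", os.path.exists(dir1))
```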

Copying and moving directories >

For copying and moving directories you can use the cp and mv commands just like you use them with files. Yeah, I know. If you've already tried to copy a directory with cp, you've probably noticed that cp just complains at you. Probably it says something like cp: omitting directory yadda yadda. You see, the cp command wants you to use the -r option if you want to copy a directory with its contents. The -r means "copy recursively":
cp -r dir1 dir2
The above creates a directory named dir2 whose contents will be identical to dir1. However, if dir2 already exists, nothing will be overwritten: the directory dir1 will be copied into the dir2 directory under the name dir2/dir1.
When renaming directories, you use the mv command exactly the same way as with files:
mv dir1 dir2
When dealing with directories, mv works a bit like cp does. If dir2 doesn't exist, the above will rename dir1 to dir2, but if dir2 exists, the directory dir1 will be moved into the dir2 directory under the name dir2/dir1.
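For scripts, a rough Python 3 counterpart might look like the sketch below. The helper copy_dir is made up for this example and only mimics the "copy into dir2 if it exists" behaviour described above. Unlike cp -r, shutil.copytree() raises an error if the final target already exists. shutil.move() already behaves like mv here: it renames when the target does not exist and moves into the target when it is an existing directory.

```python
import os
import shutil
import tempfile


def copy_dir(src, dst):
    """Mimic `cp -r src dst`: copy into dst if it exists, else create dst."""
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(os.path.normpath(src)))
    return shutil.copytree(src, dst)


# Demo inside a throwaway directory.
with tempfile.TemporaryDirectory() as base:
    dir1 = os.path.join(base, "dir1")
    dir2 = os.path.join(base, "dir2")
    dir3 = os.path.join(base, "dir3")
    os.mkdir(dir1)
    open(os.path.join(dir1, "file.txt"), "w").close()

    print(copy_dir(dir1, dir2))     # dir2 did not exist: dir2 is a copy of dir1
    print(copy_dir(dir1, dir2))     # dir2 exists now: copy lands in dir2/dir1
    print(shutil.move(dir1, dir3))  # dir3 did not exist: dir1 is renamed
    print(shutil.move(dir3, dir2))  # dir2 exists: dir3 is moved to dir2/dir3
```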

Saturday, November 12, 2011

python code/unicode/utf-8

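import urllib

s = urllib.urlopen('http://example.com').read()  # raw bytes as sent by the server
u = s.decode('utf-8')  # decode() returns a new unicode string, so keep the result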
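The snippet above is Python 2. In Python 3 the same idea applies to any bytes object, for instance what urllib.request.urlopen(...).read() returns. The sketch below avoids the network and uses a hard-coded byte string instead. The function name decode_body and its parameters are invented for this example, and it assumes the data really is UTF-8, which in practice should be checked against the Content-Type header.

```python
def decode_body(raw, encoding="utf-8", errors="strict"):
    """Turn raw bytes into text; decode() returns a new str object."""
    return raw.decode(encoding, errors)


raw = "caf\u00e9".encode("utf-8")  # stand-in for bytes read from a socket
text = decode_body(raw)
print(len(raw), len(text))         # byte count versus character count

# With errors="replace", invalid bytes become U+FFFD instead of raising.
print(repr(decode_body(b"\xff", errors="replace")))
```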

Friday, November 4, 2011

set, clear and toggle a single bit in C

Setting a bit
Use the bitwise OR operator (|) to set a bit.
 number |= 1 << x;
That will set bit x.
Clearing a bit
Use the bitwise AND operator (&) to clear a bit.
 number &= ~(1 << x);
That will clear bit x. You must invert the bit string with the bitwise NOT operator (~), then AND it.
Toggling a bit
The XOR operator (^) can be used to toggle a bit.
 number ^= 1 << x;
That will toggle bit x.
Checking a bit
You didn't ask for this but I might as well add it.
To check a bit, AND it with the bit you want to check:
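 bit = (number >> x) & 1;
That will put the value of bit x (0 or 1) into the variable bit. The shorter form number & (1 << x) also works as a test, but its result is either 0 or 1 << x rather than 0 or 1.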
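The same four operations can be tried out quickly in Python, where the bitwise operators are spelled the same way as in C. This is just a sketch for experimenting: the function names are invented here, Python integers have no fixed width, and check_bit shifts first so that the result is exactly 0 or 1.

```python
def set_bit(number, x):
    """Return number with bit x set (OR with a one-bit mask)."""
    return number | (1 << x)


def clear_bit(number, x):
    """Return number with bit x cleared (AND with the inverted mask)."""
    return number & ~(1 << x)


def toggle_bit(number, x):
    """Return number with bit x flipped (XOR with the mask)."""
    return number ^ (1 << x)


def check_bit(number, x):
    """Return the value of bit x as 0 or 1."""
    return (number >> x) & 1


number = 0b1010
print(bin(set_bit(number, 0)))
print(bin(clear_bit(number, 3)))
print(bin(toggle_bit(number, 1)))
print(check_bit(number, 3), check_bit(number, 2))
```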

Two dimensional array in Python

[[0]*n for x in xrange(m)]
This creates an m-by-n list of lists (m rows of n columns each), initialized to 0. In Python 3, use range instead of xrange.
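The list comprehension matters here. A tempting shortcut, [[0]*n]*m, repeats a reference to one and the same row m times, so changing one cell appears to change a whole column. The Python 3 sketch below shows the difference. The function name make_grid is made up for this example.

```python
def make_grid(m, n, fill=0):
    """Build an m-by-n list of lists in which every row is a separate list."""
    return [[fill] * n for _ in range(m)]


grid = make_grid(2, 3)
grid[0][0] = 1
print(grid)        # only the first row is changed

aliased = [[0] * 3] * 2   # both entries refer to the same row object
aliased[0][0] = 1
print(aliased)     # the change shows up in every row
```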