Bin Counter

To get Bin Counter, click on the Bin Counter download link. Version 2.3 was written and tested for Python 2.5.2.

Sequencing genomic DNA, ChIP genomic DNA, or cDNA on an Illumina GAII produces sequence reads that can be mapped to a reference genome. Several programs are available for mapping reads, including Eland, a program provided by Illumina. A line of Eland output looks like this:

>SOLEXA2:5:1:0:1551#0/1 NAAAATTCAAGATCTAAAAAATTCACTAATTCACAATTTT U0 1 0 0 chrC.fas 123252 F DD

Eland reports the chromosome and position on the reference sequence where the read maps. For example, the read above maps to the chloroplast chromosome at position 123,252. Read positions and frequencies can be analyzed to determine the composition of the input DNA. To do this, one needs to count the reads per unit length of genome, or per “bin”. Depending on the coverage of the genome, you may want to choose a bin size anywhere between 25 nucleotides and several kb. The program outputs a file for each chromosome listing the chromosome, the bin, and the number of reads in that bin. The file can be examined in a text editor or in a statistical program such as JMP (made by SAS). It is best visualized in a browser, but JMP can be very useful for preliminary analysis. Below is an example of the output file format:

chr bin count
chr2 4096000 44
chr2 8192000 74
chr2 2261000 20
chr2 8369000 36
chr2 18403000 47
chr2 10760000 66
chr2 17505000 50
chr2 16607000 84
chr2 426000 55
chr2 4167000 26
chr2 13913000 76
chr2 13015000 93
chr2 12117000 95
chr2 11219000 86
chr2 2687000 3
chr2 10321000 60
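
To make the counting step concrete, here is a minimal sketch of the idea in Python. It is not the Bin Counter source, and it is written for Python 3 rather than 2.5.2. The function names (parse_eland_line, count_bins) are made up for illustration. It assumes that uniquely mapped reads have a match code starting with "U", that the reference name and position sit in the 7th and 8th whitespace-separated columns as in the example line above, and that a bin is labeled by its starting coordinate. The real program may label bins or filter reads differently.

```python
from collections import Counter

def parse_eland_line(line):
    """Return (chromosome, position) for a uniquely mapped read, else None."""
    fields = line.split()
    # Assumption: unique hits have a match code starting with "U" and carry
    # the reference name and position in columns 7 and 8, as in the example.
    if len(fields) < 8 or not fields[2].startswith("U"):
        return None
    chrom = fields[6]
    if chrom.endswith(".fas"):
        chrom = chrom[:-4]  # drop the FASTA file extension
    return chrom, int(fields[7])

def count_bins(lines, bin_size):
    """Count reads per (chromosome, bin start) for an iterable of Eland lines."""
    counts = Counter()
    for line in lines:
        hit = parse_eland_line(line)
        if hit is None:
            continue  # skip reads without a unique mapped position
        chrom, pos = hit
        # Assumption: a bin is labeled by the coordinate where it starts.
        bin_start = (pos // bin_size) * bin_size
        counts[(chrom, bin_start)] += 1
    return counts

example = (">SOLEXA2:5:1:0:1551#0/1 NAAAATTCAAGATCTAAAAAATTCACTAATTCACAATTTT "
           "U0 1 0 0 chrC.fas 123252 F DD")
counts = count_bins([example], bin_size=1000)
for (chrom, bin_start), n in counts.items():
    print(chrom, bin_start, n)
```

With a bin size of 1000, the example read at position 123,252 is counted in the bin starting at 123000.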

The file has a number of lines equal to the length of the chromosome divided by the bin size. Note that the file is not sorted. Sorting is pretty easy to do in Python or in JMP.
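
As a rough illustration of the sorting step, the sketch below orders the rows by chromosome and then by bin position. It assumes the file has a header line followed by whitespace-separated chr, bin, and count columns, as in the example above. The function name sort_bin_rows is hypothetical, and the code targets Python 3.

```python
def sort_bin_rows(text):
    """Return the header plus rows sorted by chromosome, then numeric bin."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0]
    rows = [ln.split() for ln in lines[1:]]
    # Sort bins as integers so that 426000 comes before 2261000.
    rows.sort(key=lambda r: (r[0], int(r[1])))
    return [header] + [" ".join(r) for r in rows]

sample = """chr bin count
chr2 4096000 44
chr2 8192000 74
chr2 2261000 20
chr2 426000 55"""

for line in sort_bin_rows(sample):
    print(line)
```

Converting the bin to an integer matters here: sorted as text, 2261000 would come before 426000.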

Once you have the Bin Counter text from the download link, copy and paste it into a Python editor such as IDLE.
