Bioinformatic Tools

Scripts for manipulating genomic sequence:

fasta2ms.pl takes an aligned fasta formatted sequence file as input and transforms it into an ms formatted file, with or without monomorphic sites. It is designed to process long sequences (full chromosomes) from a limited number of individuals. Note that the fasta input must have the sequence for each header on a single line.
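
The conversion described above can be illustrated with a short Python sketch. This is not the code in fasta2ms.pl; it is a minimal illustration under several assumptions: the first sequence defines the "0" allele at each site, sites with gaps, ambiguous bases, or more than two alleles are skipped, and positions are written as fractions of the alignment length. The function names (read_fasta, fasta_to_ms) and the keep_monomorphic parameter are hypothetical.

```python
def read_fasta(lines):
    """Parse single-line fasta: each header line is followed by one sequence line."""
    names, seqs = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            names.append(line[1:])
        else:
            seqs.append(line.upper())
    return names, seqs


def fasta_to_ms(seqs, keep_monomorphic=False):
    """Return an ms-style block (//, segsites, positions, haplotypes) as a string."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    positions, columns = [], []
    for i in range(length):
        column = [s[i] for s in seqs]
        alleles = set(column)
        # Assumption: skip gaps, ambiguous bases, and sites with >2 alleles.
        if not alleles <= set("ACGT") or len(alleles) > 2:
            continue
        if len(alleles) == 1 and not keep_monomorphic:
            continue
        # Assumption: the first sequence carries the "0" allele.
        ref = column[0]
        columns.append(["0" if base == ref else "1" for base in column])
        positions.append((i + 1) / length)

    # Transpose site columns into one 0/1 string per individual.
    haplotypes = ["".join(col[k] for col in columns) for k in range(len(seqs))]
    header = [
        "//",
        f"segsites: {len(positions)}",
        "positions: " + " ".join(f"{p:.6f}" for p in positions),
    ]
    return "\n".join(header + haplotypes)


if __name__ == "__main__":
    example = [">ind1", "ACGT", ">ind2", "ACGA", ">ind3", "TCGT"]
    _, sequences = read_fasta(example)
    print(fasta_to_ms(sequences))
```

Processing the alignment one column at a time keeps the logic simple; for full chromosomes a real implementation would likely stream the data rather than hold every column in memory.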

demultiplex.pl is designed to demultiplex indexed Illumina reads. As input it takes a file containing a list of custom indices and one or more fastq files with indices in the header. Each input fastq file is split into one output file per index. Indices can be of any length. One mismatch is allowed.
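
The one-mismatch matching described above might look like the following Python sketch. It is not the code in demultiplex.pl. It assumes that the index is the text after the last ':' in the fastq header (as in headers ending in "1:N:0:ATCACG"), that an exact match takes priority, and that a read matching more than one index within one mismatch is left unassigned. The function names and the "unassigned" bin are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_index(observed, indices, max_mismatches=1):
    """Return the matching index, or None if there is no unique match."""
    if observed in indices:
        return observed
    hits = [
        idx for idx in indices
        if len(idx) == len(observed) and hamming(idx, observed) <= max_mismatches
    ]
    # Assumption: ambiguous reads (several candidate indices) are not assigned.
    return hits[0] if len(hits) == 1 else None


def demultiplex(fastq_lines, indices):
    """Group 4-line fastq records by index; returns {index: [records]}."""
    bins = {idx: [] for idx in indices}
    bins["unassigned"] = []
    for i in range(0, len(fastq_lines), 4):
        record = fastq_lines[i:i + 4]
        # Assumption: the index is the last ':'-separated field of the header.
        observed = record[0].rstrip().rsplit(":", 1)[-1]
        idx = assign_index(observed, indices)
        bins[idx if idx is not None else "unassigned"].append(record)
    return bins


if __name__ == "__main__":
    reads = [
        "@r1 1:N:0:ATCACG", "ACGT", "+", "IIII",
        "@r2 1:N:0:ATCACC", "TTGA", "+", "IIII",
        "@r3 1:N:0:GGGGGG", "CCCC", "+", "IIII",
    ]
    for index, records in demultiplex(reads, ["ATCACG", "CGATGT"]).items():
        print(index, len(records))
```

Requiring a unique match is a conservative choice: when two indices differ by only two bases, a single sequencing error can sit one mismatch away from both.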

pileup2fastaCNS.pl reformats a pileup file generated with samtools (input) into fasta formatted consensus sequence(s) (output). This script ignores insertions and deletions and only considers single nucleotide changes. Where thresholds are not met, an 'N' is recorded in the consensus sequence.
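
As a rough illustration of the consensus step, the Python sketch below parses the read-base column of samtools pileup lines, skips insertions and deletions, and calls the majority base at each site. It is not the code in pileup2fastaCNS.pl. The thresholds are not specified above, so min_depth and min_fraction are hypothetical parameters with arbitrary defaults. Writing 'N' for positions absent from the pileup is also an assumption.

```python
import re
from collections import Counter


def count_bases(ref, bases):
    """Count A/C/G/T calls in a pileup read-base string, ignoring indels."""
    counts = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            i += 2  # '^' is followed by one mapping-quality character
            continue
        if c in "+-":
            # Indel: sign, length digits, then that many inserted/deleted bases.
            digits = re.match(r"\d+", bases[i + 1:]).group()
            i += 1 + len(digits) + int(digits)
            continue
        if c in ".,":
            counts[ref.upper()] += 1  # match to the reference base
        elif c.upper() in "ACGT":
            counts[c.upper()] += 1    # single nucleotide mismatch
        i += 1  # '$', '*', and 'N' fall through uncounted
    return counts


def call_base(ref, bases, min_depth=3, min_fraction=0.8):
    """Return the majority base, or 'N' when the hypothetical thresholds fail."""
    counts = count_bases(ref, bases)
    depth = sum(counts.values())
    if depth == 0 or depth < min_depth:
        return "N"
    base, n = counts.most_common(1)[0]
    return base if n / depth >= min_fraction else "N"


def pileup_to_consensus(lines, min_depth=3, min_fraction=0.8):
    """Build one fasta record per reference sequence from pileup lines."""
    calls = {}
    for line in lines:
        chrom, pos, ref, _depth, bases = line.rstrip("\n").split("\t")[:5]
        calls.setdefault(chrom, {})[int(pos)] = call_base(
            ref, bases, min_depth, min_fraction
        )
    records = []
    for chrom, sites in calls.items():
        # Assumption: positions missing from the pileup are written as 'N'.
        seq = "".join(sites.get(p, "N") for p in range(1, max(sites) + 1))
        records.append(f">{chrom}\n{seq}")
    return "\n".join(records)
```

For example, a site with read bases ".+2AGt," yields two reference calls and one 'T'; with the defaults used here the majority fraction (2/3) falls below 0.8, so the site is written as 'N'.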