PeakRanger is a multi-purpose, ultrafast ChIP-Seq peak caller. It identifies genomic regions enriched in reads and, at the same time, discovers summits within these regions.
PeakRanger can be downloaded from the modENCODE website. You can extract the files using the unzip command in Linux.
ranger requires at least 4 GB of RAM and at least a dual-core CPU. Running ranger on a 32-bit system is not recommended because of the memory limit.
Minimally, PeakRanger requires two input files, one for the data/treatment and another for input/control.
$ ./ranger -d treatment.file -c control.file --format=bowtie
The --format=bowtie option specifies that the input files are in Bowtie format.
$ ./ranger -d treatment.file -c control.file --format=bowtie -p 1e-7
The -p 1e-7 option changes the FDR cutoff from the default (1e-4) to 1e-7.
$ ./ranger -d treatment.file -c control.file --format=bowtie --mode=region
The --mode=region option tells ranger to run in region mode, in which only the summit with the most reads within each enriched region is returned.
Several files will be generated. All filenames start with the output location.
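When comparing several cutoffs, it can help to print the command lines first and review them before running anything. The sketch below only prints commands and uses only the options shown above; the `ranger_cmd` helper and the file names are placeholders, not part of PeakRanger.

```shell
# Sketch: print (do not run) one ranger command per cutoff.
# ranger_cmd is a hypothetical helper; file names are placeholders.
ranger_cmd() {
    # $1 = treatment file, $2 = control file, $3 = cutoff passed to -p
    printf './ranger -d %s -c %s --format=bowtie -p %s\n' "$1" "$2" "$3"
}

# Default cutoff (1e-4) and the stricter cutoff from the example above.
for cutoff in 1e-4 1e-7; do
    ranger_cmd treatment.file control.file "$cutoff"
done
```

Output filenames start with the output location, so point each run at a different location before executing the printed commands; otherwise later runs may overwrite earlier results.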
peaks.bed: This file contains all summits identified. Each summit is only 1 bp long.
peaks_with_region.bed: This file contains all enriched regions, with their summits printed alongside. If more than one summit is found, the summits are separated by commas.
valley.bed: This file contains all valleys identified. Each valley is only 1 bp long.
valleys_with_region.bed: This file contains all enriched regions, with their valleys printed alongside. If more than one valley is found, the valleys are separated by commas.
.wig: Wiggle files that can be visualized in common genome browsers such as GBrowse, UCSC Genome Browser and IGB. Two wiggle files are generated, one for the treatment and one for the input.
regions: This file contains detailed information about the enriched regions identified.
raw: This file contains all enriched regions before FDR filtering.
PeakRanger supports an external configuration file. The sample_config.cfg file shows all available options and gives instructions on how to set each one.
delta and the smoothing bandwidth are the two parameters that change how PeakRanger runs. You can vary these two values to see how they affect the results. In general, a smaller delta and bandwidth give more detail about the structure of enriched regions, at the cost of more false positives.
Most errors are self-explanatory except the following:
If you see this, an exception occurred that ranger could not handle. This error is usually caused by abnormal data files.
These errors are thrown by the GSL library. Common causes include memory shortage and abnormal data files. If you see malloc errors, you may need more memory.
This usually indicates that the sample and control datasets contain different sets of chromosomes. For example, the sample contains chr1 and chr2, but the control contains chr1, chr2 and chr3, with chr3 being the additional chromosome. In this case, you should remove all reads mapped to chr3 and run ranger again.
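The chromosome check and the cleanup described above can be sketched with standard Unix tools. This is only an illustration: it assumes Bowtie's default tab-separated output, in which column 3 holds the reference (chromosome) name, and the helper names `chroms`, `chrom_diff` and `drop_chrom` are made up for this example.

```shell
# Sketch, assuming Bowtie default output (tab-separated, chromosome in column 3).
# All function names here are hypothetical helpers, not part of PeakRanger.

# Print the sorted, unique chromosome names found in a file.
chroms() {
    cut -f3 "$1" | sort -u
}

# Print chromosome names that occur in exactly one of the two files.
chrom_diff() {
    { chroms "$1"; chroms "$2"; } | sort | uniq -u
}

# Print every read of file $1 that is NOT mapped to chromosome $2.
# (c "") forces a string comparison even for numeric-looking names.
drop_chrom() {
    awk -F'\t' -v c="$2" '$3 != (c "")' "$1"
}
```

For the example above, `chrom_diff sample.file control.file` would list chr3, and `drop_chrom control.file chr3 > control.filtered` would write a control file without the extra chromosome (the file names are placeholders).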
If you only want wiggle files, you can use the wig program. The basic usage is similar to that of ranger:
A Hadoop system must be set up before running cloud-PeakRanger.
The control dataset must first be preprocessed using the script preprocessing/modify_controldata.
The cloud version relies on the Hadoop Streaming system. Once the Hadoop system is set up, supply it with the mapper and reducer, which can be found in the folders:
mapper and reducer
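As a rough illustration of how a mapper and reducer are handed to Hadoop Streaming, the sketch below prints a generic streaming command. The `-input`, `-output`, `-mapper` and `-reducer` options are standard Hadoop Streaming options; everything else (the `streaming_cmd` helper, the jar location, the HDFS paths and the program names inside the mapper and reducer folders) is a placeholder that depends on your installation.

```shell
# Sketch: print (do not run) a generic Hadoop Streaming command line.
# streaming_cmd is a hypothetical helper; all paths below are placeholders.
streaming_cmd() {
    # $1 = streaming jar, $2 = HDFS input, $3 = HDFS output,
    # $4 = mapper program, $5 = reducer program
    printf 'hadoop jar %s -input %s -output %s -mapper %s -reducer %s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

streaming_cmd /path/to/hadoop-streaming.jar reads_in peaks_out \
    mapper/MAPPER_PROGRAM reducer/REDUCER_PROGRAM
```

Depending on the cluster, the mapper and reducer programs may also need to be shipped to the worker nodes (for example with Hadoop's `-files` generic option).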
After cloud-PeakRanger finishes, run postprocessing/postprocessing to process the raw output file. You must specify an FDR cutoff to get meaningful results.
We provide a pipeline bundled with Bowtie. The pipeline accepts raw FASTQ files and produces peak calls and wiggle files in one step. Currently, we only provide a basic configuration that suits routine analyses.
Before using the pipeline, you must first prepare the indexes for Bowtie. You can either download the indexes from the Bowtie homepage or build them yourself. To build the indexes, go to the folder
pipeline/bowtie*/indexes/
Then run the script that contains the name of the species of your data. All these scripts need network access to download genome files.
If you choose to download indexes, please place all downloaded *.ebwt files in the indexes folder.
To make sure the Bowtie indexes were built correctly, go to the folder
pipeline/bowtie*/
In the terminal, type:
$ ./bowtie NAME_OF_THE_INDEX -c aa
If you see output similar to the following, the indexes were built correctly:
# reads processed: 1
# reads with at least one reported alignment: 0 (0.00%)
# reads that failed to align: 1 (100.00%)
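If you test several indexes, the same check can be scripted. The sketch below only looks for the "# reads processed:" summary line shown above; the function name is made up for this example, and it reads the text to check from standard input.

```shell
# Sketch: succeed if the text on stdin contains a Bowtie summary line
# like the one shown above. looks_like_bowtie_summary is a hypothetical helper.
looks_like_bowtie_summary() {
    grep -q '^# reads processed: [0-9]'
}

# Possible usage (not executed here). 2>&1 merges both output streams so the
# check sees the summary regardless of which stream it is written to:
#   ./bowtie NAME_OF_THE_INDEX -c aa 2>&1 | looks_like_bowtie_summary && echo "index OK"
```

This only confirms that Bowtie loaded the index and ran; it says nothing about whether the index matches the species of your data.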
In the terminal, type:
$ ./bowtieranger sampledata inputdata ebwt_file multimatch threads
Change the parameters to fit your needs. multimatch is the number of matches allowed for a single read. threads is the number of worker threads allocated to both Bowtie and ranger. ebwt_file is the Bowtie index file found in the folder:
pipeline/bowtie*/indexes
Use only the base name of the index, without the numeric suffix and the .ebwt extension. For example, if you have an index:
c_elegans201.1.ebwt
then you should only use:
c_elegans201
$ ./bowtieranger sample.fastq input.fastq c_elegans201 1 4
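The index naming rule above can also be applied automatically. The sketch below derives the name to pass as ebwt_file from any one of the index files. It assumes the usual Bowtie index file naming (NAME.1.ebwt and similar, including NAME.rev.1.ebwt); `index_name` is a hypothetical helper, not part of the pipeline.

```shell
# Sketch: derive the index base name from one Bowtie index file.
# index_name is a hypothetical helper; it only manipulates the file name.
index_name() {
    name=$(basename "$1")   # drop any leading directories
    name=${name%.ebwt}      # drop the .ebwt extension
    name=${name%.[0-9]}     # drop the numeric suffix, e.g. ".1"
    name=${name%.rev}       # drop ".rev" for reverse index files
    printf '%s\n' "$name"
}

index_name c_elegans201.1.ebwt
```

The printed name can then be used as the third argument of bowtieranger, as in the example above.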