Introduction

PeakRanger is a multi-purpose, ultrafast ChIP-Seq peak caller. It identifies genomic regions enriched in reads and, at the same time, discovers summits within these regions.

Obtaining PeakRanger

PeakRanger can be downloaded from the modENCODE website. You can extract the files using the unzip command in Linux.

System requirements

ranger requires at least 4 GB of RAM and at least a dual-core CPU. Running ranger on a 32-bit system is not recommended because of the memory limit.

Getting started

Minimally, PeakRanger requires two input files, one for the data/treatment and another for input/control.

Example 1: basic usage

$ ./ranger -d treatment.file -c control.file --format=bowtie

The --format=bowtie option indicates that the input files are in Bowtie format.

Example 2: specify the FDR value

$ ./ranger -d treatment.file -c control.file --format=bowtie -p 1e-7

The -p 1e-7 option changes the FDR cut-off from the default (1e-4) to 1e-7.

Example 3: specify the running mode

$ ./ranger -d treatment.file -c control.file --format=bowtie --mode=region

The --mode=region option tells ranger to run in region mode, in which only the summit with the most reads within each enriched region is returned.

Output files

Several files are generated. All filenames start with the output location.

peaks.bed

This file contains all summits identified. Each summit is only 1bp long.
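
As a quick sanity check on the summit file, the summits can be tallied per chromosome. The sketch below assumes that the file follows the usual BED layout, with the chromosome name in the first column. The function name count_summits_per_chromosome is made up for this illustration and is not part of PeakRanger.

```shell
# Sketch: count the summits reported on each chromosome.
# Assumption: standard BED layout, chromosome name in column 1.
count_summits_per_chromosome() {
    # Skip blank lines and the "track", "browser" and "#" header lines
    # that a BED file may carry, then tally column 1.
    awk '$1 != "track" && $1 != "browser" && $1 !~ /^#/ && NF > 0 { n[$1]++ }
         END { for (c in n) print c "\t" n[c] }' "$1" | sort
}
```

Call the function with the path to your summit file, keeping in mind that the actual filename starts with your output location.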

peaks_with_region.bed

This file contains all enriched regions, with their summits printed alongside. If more than one summit is found, the summits are separated by commas.

valley.bed

This file contains all valleys identified. Each valley is only 1bp long.

valleys_with_region.bed

This file contains all enriched regions, with their valleys printed alongside. If more than one valley is found, the valleys are separated by commas.

.wig

Wiggle files can be visualized in common genome browsers such as GBrowse, the UCSC Genome Browser and IGB. Two wiggle files are generated, one for the treatment and one for the input.

regions

This file contains detailed info about the enriched regions identified.

raw

This file contains all enriched regions, before FDR filtering.

Advanced functions

Configuration file

PeakRanger supports an external configuration file. The sample_config.cfg file lists all available options and gives instructions on how to set each of them.

Tuning parameters

delta and the smoothing bandwidth are the two parameters that change the way PeakRanger runs. You can vary these two values to see how they affect the results. In general, a smaller delta and bandwidth reveal more detail about the structure of enriched regions, but also produce more false positives.

Other notes

Fatal errors

Most errors are self-explanatory, except the following:

  1. Segmentation fault

If you see this, ranger ran into an exception that it could not handle. This error is usually caused by abnormal data files.

  2. GSL errors

These errors are thrown by the GSL library. Common causes include memory shortage and abnormal data files. If you see malloc errors, you may need more memory.

  3. Not found chromosome indexes

This usually indicates that the sample and control datasets contain different sets of chromosomes. For example, the sample contains chr1 and chr2, but the control contains chr1, chr2 and chr3, with chr3 being the additional chromosome. In this case, you should remove all reads mapped to chr3 and run ranger again.
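
One way to remove such reads is sketched below. It assumes that both files are in the default Bowtie output format, where the third tab-separated column holds the reference (chromosome) name. The function name filter_shared_chromosomes and the file names in the usage note are illustrative only and are not part of PeakRanger.

```shell
# Sketch: keep only the reads of one file whose chromosome also occurs in
# a reference file.
# Assumption: default Bowtie output, column 3 (tab-separated) is the
# reference name.
filter_shared_chromosomes() {
    reference=$1   # file that defines the allowed chromosomes
    target=$2      # file to be filtered
    # First pass (NR == FNR): remember every chromosome in the reference file.
    # Second pass: print the target reads whose chromosome was remembered.
    awk -F '\t' 'NR == FNR { seen[$3] = 1; next } ($3 in seen)' \
        "$reference" "$target"
}
```

For the example above, filter_shared_chromosomes treatment.file control.file > control.filtered would drop the chr3 reads from the control. If the sample may also contain chromosomes that the control lacks, run the function a second time with the two arguments swapped.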

The wiggle file generator

If you only want wiggle files, you can use the wig program. Its basic usage is similar to that of ranger:

Example:

$ ./wig -d treatment.file --format=bowtie

The --format=bowtie option indicates that the input file is in Bowtie format.

Running the cloud version of PeakRanger

Setting up the Hadoop environment

A Hadoop system has to be set up before running cloud-PeakRanger.

Datasets preprocessing

The control dataset must first be preprocessed using the script preprocessing/modify_controldata.

Running PeakRanger

The cloud version relies on the Hadoop Streaming system. Once the Hadoop system is set up, supply it with the mapper and reducer, which can be found in the folders:

mapper and reducer

Post-processing

After cloud-PeakRanger finishes, run postprocessing/postprocessing to process the raw output file. You must specify an FDR cut-off to get meaningful results.

The Bowtie-PeakRanger pipeline

We provide a pipeline bundled with Bowtie. The pipeline accepts raw FASTQ files and produces peak calls and wiggle files in a single step. Currently, we only provide a basic configuration suited to routine analysis.

Preparing Bowtie indexes

Before using the pipeline, you first have to prepare the indexes for Bowtie. You can either download the indexes from the Bowtie homepage or build them yourself. To build the indexes, go to the folder

pipeline/bowtie*/indexes/

Then run the script whose name contains the species of your data. All of these scripts need network access to download the genome files.

If you choose to download indexes, please place all downloaded *.ebwt files in the indexes folder.

Checking Bowtie indexes

To make sure the Bowtie indexes were built correctly, go to the folder

pipeline/bowtie*/

In the terminal, type:

$ ./bowtie NAME_OF_THE_INDEX -c aa

If you see output similar to the following, the indexes are fine:

# reads processed: 1
# reads with at least one reported alignment: 0 (0.00%)
# reads that failed to align: 1 (100.00%)
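
Before running this check, it can also help to confirm that every index file is present. A Bowtie index normally consists of six files, with the suffixes .1.ebwt, .2.ebwt, .3.ebwt, .4.ebwt, .rev.1.ebwt and .rev.2.ebwt. The following is a minimal sketch; the function name check_index_files is hypothetical and is not part of the pipeline.

```shell
# Sketch: verify that all six files of a Bowtie index exist.
# Usage: check_index_files INDEX_FOLDER INDEX_NAME
check_index_files() {
    dir=$1
    name=$2
    missing=0
    for suffix in 1 2 3 4 rev.1 rev.2; do
        if [ ! -f "$dir/$name.$suffix.ebwt" ]; then
            # Report each missing file on standard error.
            echo "missing: $dir/$name.$suffix.ebwt" >&2
            missing=1
        fi
    done
    # Status 0 if the index is complete, 1 otherwise.
    return "$missing"
}
```

Run from the pipeline/bowtie*/ folder, check_index_files indexes NAME_OF_THE_INDEX prints any missing file and returns a non-zero status if the index is incomplete.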

Using the pipeline

In the terminal, type:

$ ./bowtieranger sampledata inputdata ebwt_file multimatch threads

Change the parameters to fit your needs. multimatch is the number of matches allowed for a single read. threads is the number of worker threads allocated for both Bowtie and ranger. ebwt_file is the Bowtie index file found in the folder:

pipeline/bowtie*/indexes

Please use only the name of the index, without the trailing number and the .ebwt extension. For example, if you have an index:

c_elegans201.1.ebwt

then you should only use:

c_elegans201
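
The same trimming can be done in the shell. This is only a sketch: the function name index_name is made up for this illustration, and it assumes the usual Bowtie file naming (NAME.1.ebwt to NAME.4.ebwt, plus NAME.rev.1.ebwt and NAME.rev.2.ebwt).

```shell
# Sketch: derive the Bowtie index name from one of its .ebwt files.
index_name() {
    name=$(basename "$1")
    name=${name%.ebwt}      # drop the .ebwt extension
    name=${name%.[0-9]}     # drop the numeric part, e.g. ".1"
    name=${name%.rev}       # drop ".rev" for the reverse-index files
    printf '%s\n' "$name"
}
```

With this, index_name c_elegans201.1.ebwt prints c_elegans201, which is the value to pass as ebwt_file.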

Example

$ ./bowtieranger sample.fastq input.fastq c_elegans201 1 4