Introduction

PeakRanger is a multi-purpose, ultrafast ChIP-Seq peak caller. It identifies enriched genomic regions and, at the same time, discovers summits within those regions.

Obtaining PeakRanger

PeakRanger can be downloaded from SourceForge.

System requirements

PeakRanger requires at least 750 MB of RAM. Memory consumption rises with larger datasets.

Compiling PeakRanger from source

Compiling on Ubuntu

Required libraries before compiling:

  1. The Boost library v1.47 or newer

  2. Pthread

  3. g++

Once all the libraries are installed, go to the root path of the unzipped package and type:

make

This generates the ranger and wig executables. Compilation on other, similar Linux distributions follows the same steps.

Compiling on Mac OS X

Required libraries before compiling:

  1. Xcode developer toolkit from Apple

  2. The Boost library v1.47 or newer

The Xcode kit can be installed from the OS X installation disk.

Once all the libraries are installed, go to the root path of the unzipped package and type:

make

Getting started

At a minimum, PeakRanger requires two input files: one for the data/treatment and another for the input/control.

$ ./ranger -d treatment.file -c control.file --format bam -o ./result.file

Here --format bam indicates that the input files are in BAM format, and -o specifies the name of the result file. This command produces result.file and two wig files.
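When ranger is called from a script, the minimal command above can be assembled programmatically. The following is only a sketch: the helper names build_ranger_command and run_ranger are hypothetical, and it uses only the options shown in the command above (-d, -c, --format, -o).

```python
import subprocess


def build_ranger_command(treatment, control, fmt, output, ranger="./ranger"):
    """Assemble the minimal ranger invocation as an argument list.

    Only options documented in this manual are used; the helper itself
    is hypothetical and not part of PeakRanger.
    """
    return [ranger, "-d", treatment, "-c", control, "--format", fmt, "-o", output]


def run_ranger(treatment, control, fmt, output, ranger="./ranger"):
    """Run ranger and raise CalledProcessError if it exits with a non-zero status."""
    command = build_ranger_command(treatment, control, fmt, output, ranger)
    return subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces. For example, build_ranger_command("treatment.file", "control.file", "bam", "./result.file") yields the same arguments as the command line above.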

How PeakRanger works

PeakRanger uses a staged algorithm to discover enriched regions and the summits within them. In the first step, PeakRanger applies an FDR-based adaptive thresholding algorithm, which was originally proposed by PeakSeq. PeakRanger uses this thresholder to find regions whose read enrichment exceeds expectation.

After that, PeakRanger searches for summits in these regions. The summit-search algorithm first looks for the location with the largest number of reads. It then searches for sub-summits with a sensitivity, the delta -r, specified by the user. A smaller -r generates more summits. --mode provides preset delta values for new users: the region mode mostly calls one summit per region, while the resolution mode tries to find all reasonable summits.

The coverage profiles are smoothed and padded before summits are called. The degree of smoothing varies with -b. A larger smoothing bandwidth yields fewer false summits but also reduces the accuracy of summit locations.

To measure the significance of the enriched regions, PeakRanger uses a binomial distribution to model the relative enrichment of the sample over the control, which yields a p value for each region. Users can therefore select highly significant peaks by using a smaller -p.
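To make the binomial significance test more concrete, here is a minimal sketch of an upper-tail binomial p value for one region. It is not PeakRanger's implementation. The function name, the use of the pooled read count as the number of trials, and the null probability p_null (0.5 for equally sized libraries) are all assumptions made for illustration.

```python
from math import exp, lgamma, log


def binomial_enrichment_pvalue(treat_reads, control_reads, p_null=0.5):
    """Upper-tail binomial p value for enrichment of treatment over control.

    Assumption (not taken from PeakRanger): under the null hypothesis each
    of the n = treat_reads + control_reads reads in the region comes from
    the treatment with probability p_null.
    """
    if not 0.0 < p_null < 1.0:
        raise ValueError("p_null must lie strictly between 0 and 1")
    n = treat_reads + control_reads
    log_p, log_q = log(p_null), log(1.0 - p_null)
    total = 0.0
    # Sum P(X = k) for k >= treat_reads, working in log space so that
    # large read counts do not overflow.
    for k in range(treat_reads, n + 1):
        log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        total += exp(log_choose + k * log_p + (n - k) * log_q)
    return min(total, 1.0)
```

Under these assumptions, a region with 30 treatment reads and 2 control reads has a p value far below the default cutoff of 1e-4, whereas a region with 8 treatment reads and 2 control reads does not. When the two libraries differ in sequencing depth, p_null would have to be adjusted accordingly.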

PeakRanger extends reads before calling peaks. The default read extension length is 200. Users can change this with -l if the dataset has a different fragment size. The extension length affects the read coverage computed from the raw reads, and therefore the heights of peaks.
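The effect of the extension length on coverage can be illustrated with a small sketch. This is not PeakRanger's code. It assumes that each read is given as a 0-based 5' position plus a strand, and that a read is extended downstream along its own strand; the function name extended_coverage is hypothetical.

```python
def extended_coverage(reads, chrom_length, ext_length=200):
    """Per-base coverage after extending each read to ext_length bases.

    reads is an iterable of (position, strand) pairs, where position is the
    0-based 5' end of the read and strand is "+" or "-". The extension rule
    is an assumption made for illustration.
    """
    # Difference array: +1 where an extended read starts, -1 just past its end.
    diff = [0] * (chrom_length + 1)
    for position, strand in reads:
        if strand == "+":
            start, end = position, position + ext_length
        else:
            start, end = position - ext_length + 1, position + 1
        # Clip the extended read to the chromosome boundaries.
        start, end = max(0, start), min(chrom_length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    coverage, depth = [], 0
    for i in range(chrom_length):
        depth += diff[i]
        coverage.append(depth)
    return coverage
```

With the reads (2, "+"), (4, "+") and (9, "-") on a chromosome of length 12, an extension length of 4 makes the reads overlap and gives a maximum coverage of 2, while an extension length of 2 keeps them apart and gives a maximum of 1. This is how a different -l changes peak heights.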

To help visualize the results, PeakRanger generates read coverage files in the wig format. These files can then be loaded into genome browsers to evaluate the authenticity of the called peaks. Since smaller wiggle files take less time and memory to load, --split can be set to generate one small wig file per chromosome.
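As an illustration of what a coverage track in wiggle format looks like, and of what splitting by chromosome means, the sketch below renders coverage as variableStep wiggle text. The exact layout of PeakRanger's own wig files may differ. The function names and the output file naming scheme are hypothetical.

```python
def wig_track(chrom, coverage):
    """Render one chromosome's coverage as a variableStep wiggle block.

    Wiggle positions are 1-based; positions with zero coverage are omitted.
    """
    lines = ["variableStep chrom={}".format(chrom)]
    for index, depth in enumerate(coverage):
        if depth:
            lines.append("{} {}".format(index + 1, depth))
    return "\n".join(lines) + "\n"


def render_wig_files(coverage_by_chrom, prefix, split=False):
    """Map output file names to wiggle text.

    With split=False all chromosomes go into one file; with split=True each
    chromosome gets its own file. The file names are hypothetical.
    """
    tracks = {}
    for chrom in sorted(coverage_by_chrom):
        tracks[chrom] = wig_track(chrom, coverage_by_chrom[chrom])
    if split:
        return {"{}_{}.wig".format(prefix, chrom): text for chrom, text in tracks.items()}
    return {prefix + ".wig": "".join(tracks.values())}
```

The returned dictionary can be written to disk with ordinary file I/O. Splitting keeps each file small, which is the motivation for --split.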

One of PeakRanger's advantages is its ease of use. --config lets the program read its configuration from a plain text file, and --chr_table tells ranger to process only the chromosomes listed in a text file. On computers or clusters with multiple CPUs, a larger -t speeds up the program by processing multiple chromosomes simultaneously.

Command-line options

Input

-d,--data

data file. (REQUIRED)

-c,--control

control (input) file. (REQUIRED)

--format

the format of the data file; one of: bowtie, sam, bam, bed. (REQUIRED)

--chr_table

process chromosomes contained in the specified chr table file.

--config

specify the location of the configuration file.

Qualities

-p,--pval

p value cutoff. (default: 1e-4)

-l,--ext_length

read extension length. (default: 200)

-r,--delta

delta (see paper); must be in the range (0, 1). (default: 0.8)

-b,--bandwidth

smoothing bandwidth. (default: 99)

--pad

pad read coverage to avoid false positive summits. (default: off)
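To give a feel for how -b and -r interact, the following sketch smooths a coverage profile and then picks summits. It is not PeakRanger's algorithm. A moving average stands in for the actual smoother, and the sub-summit rule (keep a local maximum whose height is at least delta times the height of the main summit) is an assumption, chosen so that a smaller delta yields more summits.

```python
def smooth(coverage, bandwidth):
    """Moving-average smoothing; a stand-in for PeakRanger's own smoother."""
    half = bandwidth // 2
    n = len(coverage)
    smoothed = []
    for i in range(n):
        # The window is truncated at the ends of the profile.
        window = coverage[max(0, i - half):min(n, i + half + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed


def find_summits(profile, delta):
    """Return positions of the main summit and of qualifying sub-summits.

    Assumed rule (for illustration only): a local maximum is reported when
    its height is at least delta times the height of the main summit.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    threshold = delta * max(profile)
    summits = []
    for i, height in enumerate(profile):
        left = profile[i - 1] if i > 0 else float("-inf")
        right = profile[i + 1] if i < len(profile) - 1 else float("-inf")
        # On a plateau only the leftmost position counts as the maximum.
        if height > left and height >= right and height >= threshold:
            summits.append(i)
    return summits
```

For the profile [0, 1, 5, 1, 0, 2, 4, 2, 0, 1, 10, 1, 0], a delta of 0.8 reports only the main summit at position 10, while a delta of 0.3 also reports the sub-summits at positions 2 and 6. A wider smoothing window flattens small bumps before this step, which removes spurious summits at the cost of blurring their positions.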

Running modes

--mode

specify the running mode; one of: region, resolution.

-t

number of threads. (default: 1)

Output

-o

specify the location of output files. (REQUIRED)

--nowig

do not generate wiggle (.wig) files. (default: off)

-s,--split

generate one wig file per chromosome. (default: off)

Other

--verbose

print application progress.

Output files

The result file contains detailed information about the called peaks. Wiggle files are also generated unless --nowig is set.

Other notes

Segmentation fault

Oops. This indicates a new bug.

Fatal: locale::facet::_S_create_c_locale name not valid

This is an (unfixed) bug in the boost::filesystem library. Recompiling the code on the affected machine may solve it.

The standalone wiggle file generator

If you only need wiggle files, you can use the wig program. Its basic usage is similar to that of ranger:

$ ./wig -d treatment.file --format=bowtie -o ./result.wig