PeakRanger is a multi-purpose, ultrafast ChIP-Seq peak caller. It identifies enriched genomic regions and, at the same time, discovers summits within these regions.
PeakRanger can be downloaded from SourceForge.
PeakRanger requires at least 750 MB of RAM. Memory consumption rises with larger datasets.
Required libraries before compiling on Linux:
The Boost library v1.47 or newer
Pthread
g++
Once all the libraries are installed, go to the root path of the unzipped package and type:
make
This generates the ranger and wig binaries. Compilation on other, similar distributions follows the same steps.
Required libraries before compiling on Mac OS X:
Xcode developer tool kit from Apple
The Boost library v1.47 or newer
The Xcode kit can be installed from the OS X installation disk. The Boost library can be installed by following the instructions on its website. If you do not have root access, change the BOOST_PATH variable in the makefile:
BOOST_PATH = -I/path/to/your/boost/header -L/path/to/your/boost/library
Once all the libraries are installed, go to the root path of the unzipped package and type:
make
If the compilation fails, double-check that the BOOST_PATH variable is set correctly. The resulting binaries require the dynamic Boost library files. To make sure ranger can find these files, type:
export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:/path/to/your/boost/library
Change the path accordingly.
Windows is not supported.
Minimally, PeakRanger requires two input files, one for the data/treatment and another for input/control.
$ ./ranger -d treatment.file -c control.file --format bam -o ./result.file
--format bam indicates that the input files are in BAM format, and -o specifies the name of the result file. The output of this command is result.file and two wig files.
PeakRanger uses a staged algorithm to discover enriched regions and the summits within them. In the first step, PeakRanger applies an FDR-based adaptive thresholding algorithm, originally proposed by PeakSeq, to find regions whose read enrichment exceeds expectation. It then searches for summits in these regions. The summit-search algorithm first looks for the location with the largest number of reads, then searches for sub-summits with the sensitivity (the delta, -r) specified by the user. A smaller -r generates more summits. mode provides preset delta values for new users: region mostly calls one summit per region, while resolution tries to find all reasonable summits.
The coverage profiles are smoothed and padded before summits are called. The degree of smoothing is set by -b. A higher smoothing bandwidth yields fewer false summits but also reduces the accuracy of summit locations.
To measure the significance of the enriched regions, PeakRanger uses a binomial distribution to model the relative enrichment of sample over control, which yields a p-value for each region. Users can select highly significant peaks by using a smaller -p. In addition, users can filter peaks with the -q option, which controls the FDR of peaks. The FDR is calculated from the p-values with the Benjamini-Hochberg procedure.
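The significance step described above can be illustrated with a minimal Python sketch. It is not PeakRanger's implementation: the function names, the read counts, and the null proportion `p0` are assumptions made for illustration, and the exact binomial sum is only practical for small counts. The sketch computes a one-sided binomial p-value per region, converts the p-values to Benjamini-Hochberg FDR estimates, and then filters with cutoffs equal to the documented defaults of -p and -q.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p); only suitable for small n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical (treatment, control) read counts for three regions.
regions = [(30, 5), (12, 10), (8, 9)]
p0 = 0.5  # assumed share of reads expected from treatment under the null

pvalues = [binomial_upper_tail(t, t + c, p0) for t, c in regions]
fdr = benjamini_hochberg(pvalues)

# Keep regions that pass both cutoffs (values match the -p and -q defaults).
kept = [r for r, p, q in zip(regions, pvalues, fdr) if p <= 1e-4 and q <= 5e-2]
print(kept)
```

Lowering either cutoff keeps fewer regions, which mirrors the effect of passing a smaller -p or -q to ranger.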
PeakRanger extends reads before calling peaks. The default read extension length is 200. Users can change it with -l if the dataset has a different fragment size. The extension length affects the read coverage computed from the raw reads and therefore the heights of peaks.
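To see why the extension length changes peak heights, here is a small Python sketch of read extension and coverage counting. It is a generic illustration, not PeakRanger's code: the function name, the `(five_prime, strand)` read representation, and the toy coordinates are assumptions.

```python
def extended_coverage(reads, chrom_length, extension=200):
    """Per-base coverage after extending each read to `extension` bases.

    `reads` holds (five_prime, strand) pairs with 0-based positions.
    Forward reads are extended to the right, reverse reads to the left.
    """
    diff = [0] * (chrom_length + 1)  # difference array over the chromosome
    for five_prime, strand in reads:
        if strand == "+":
            start, end = five_prime, five_prime + extension
        else:
            start, end = five_prime - extension + 1, five_prime + 1
        # Clip the extended fragment to the chromosome boundaries.
        start = max(start, 0)
        end = min(end, chrom_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    coverage = []
    depth = 0
    for step in diff[:chrom_length]:
        depth += step
        coverage.append(depth)
    return coverage


# Two toy reads on a 12-base chromosome, one per strand.
reads = [(2, "+"), (9, "-")]
short = extended_coverage(reads, 12, extension=4)
longer = extended_coverage(reads, 12, extension=6)
print(max(short), max(longer))
```

With the shorter extension the two fragments do not overlap, while with the longer one they do, so the maximum coverage increases. The same effect explains why -l should match the fragment size of the dataset.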
To help visualize the results, PeakRanger generates read coverage files in wig format. These files can be loaded into genome browsers to evaluate the authenticity of called peaks. Since smaller wiggle files take less time and memory to load, --split can be set to generate one small wig file per chromosome.
One of PeakRanger's advantages is its ease of use. --config lets the program read its configuration from a plain text file, and --chr_table restricts ranger to the chromosomes listed in a text file. On computers or clusters with multiple CPUs, a larger -t speeds up the program by processing multiple chromosomes simultaneously.
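When ranger is launched from a script, the options mentioned above can be combined programmatically. The following Python sketch only assembles a command line from flags that appear in this document. The helper name `build_ranger_argv` and the chosen option values are hypothetical, and the sketch does not run ranger itself.

```python
import shlex


def build_ranger_argv(data, control, fmt, output, extra=()):
    """Assemble a ranger command line as a list of arguments.

    `extra` holds (flag, value) pairs; a value of None marks an on/off switch.
    """
    argv = ["./ranger", "-d", data, "-c", control, "--format", fmt, "-o", output]
    for flag, value in extra:
        argv.append(flag)
        if value is not None:
            argv.append(str(value))
    return argv


argv = build_ranger_argv(
    "treatment.file",
    "control.file",
    "bam",
    "./result.file",
    # Example values only: four threads, a stricter p-value, per-chromosome wigs.
    extra=[("-t", 4), ("-p", "1e-5"), ("--split", None)],
)
print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv, check=True)` once the ranger binary has been built.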
| -d | data file (REQUIRED) |
| -c | control (input) file (REQUIRED) |
| --format | format of the data file, one of: bowtie, sam, bam, bed (REQUIRED) |
| --chr_table | process only the chromosomes contained in the specified chr table file |
| --config | location of the configuration file |
| -p | p-value cutoff (default: 1e-4) |
| -q | FDR cutoff (default: 5e-2) |
| -l | read extension length (default: 200) |
| -r | sensitivity of the summit detector, must be in the interval (0, 1) (default: 0.8) |
| -b | smoothing bandwidth (default: 99) |
| | pad read coverage to avoid false positive summits (default: off) |
| mode | running mode, one of: region, resolution |
| -t | number of threads (default: 1) |
| -o | location of output files (REQUIRED) |
| --nowig | do not generate wiggle (.wig) files (default: off) |
| --split | generate one wig file per chromosome (default: off) |
| | print application progress |
The result file contains detailed information about the called peaks. Usually three files are generated:
_summit.bed
_region.bed
_details
The first two bed files can be visualized in IGV. The _summit.bed file contains the locations of summits ranked by their FDR. The _region.bed file contains the locations of regions ranked by their FDR. Each summit or region is annotated in the 4th column.
The _details file contains both summits and regions, as well as the regions' FDR and p-values.
Wiggle files are also generated for both the treatment and control files unless --nowig is set.
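The bed outputs described above can also be inspected programmatically. The Python sketch below reads the first four columns of a tab-separated BED file. The record type, the function name, and the sample lines are made up for illustration, and any columns beyond the fourth are ignored because their layout is not described here.

```python
import io
from typing import NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str  # the 4th column, which carries the annotation


def parse_bed(lines):
    """Parse tab-separated BED lines, skipping headers and blank lines."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # not enough columns to carry an annotation
        records.append(BedRecord(fields[0], int(fields[1]), int(fields[2]), fields[3]))
    return records


# Made-up sample content standing in for a summit or region bed file.
sample = io.StringIO(
    "track name=example\n"
    "chr1\t1000\t1001\tsummit_a\n"
    "chr2\t5000\t5001\tsummit_b\n"
)
records = parse_bed(sample)
print(records[0])
```

Because the files are already ranked by FDR, taking the first few records is a quick way to list the most significant summits or regions.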
If you only need wiggle files, you can use the wig program. Its basic usage is similar to that of ranger:
$ ./wig -d treatment.file --format=bowtie -o ./result.wig