PeakRanger is a multi-purpose software suite for analyzing next-generation sequencing (NGS) data. The suite contains the following tools:
nr: noise rate estimator. Estimates the signal-to-noise ratio, which is an indicator of ChIP enrichment.
lc: library complexity calculator. Calculates the ratio of unique reads over total reads. Only accepts bam files.
wig: coverage file generator. Generates variable step format wiggle files.
wigpe: coverage file generator. Generates bedGraph format wiggle files and supports spliced alignments, and thus only supports bam files.
ranger: ChIP-Seq peak caller. Identifies enriched genomic regions and, at the same time, discovers summits within these regions.
ccat: ChIP-Seq peak caller. Tuned for the discovery of broad peaks.
Both ranger and ccat support generating HTML-based annotation reports.
wigpe can also generate coverage files for bam files containing spliced reads, such as those from RNA-Seq experiments.
If you use PeakRanger in your research, please cite:
If you use the ccat tool, please also cite:
--report and --gene_annot_file set.
PeakRanger can be downloaded from SourceForge.
It turns out that ranger serves better as a narrow-peak caller. It behaves in a conservative but sensitive way compared to similar algorithms.
ranger uses a staged algorithm to discover enriched regions and the summits within them. In the first step, PeakRanger applies an FDR-based adaptive thresholding algorithm, originally proposed by PeakSeq, to find regions whose read counts exceed expectation. PeakRanger then searches for summits in these regions. The summit-search algorithm first looks for the location with the largest number of reads. It then searches for sub-summits with the sensitivity (the delta, -r) specified by the user. A smaller -r generates more summits.
The coverage profiles are smoothed and padded before summits are called. The degree of smoothing is set by -b. A higher smoothing bandwidth results in fewer false summits at the cost of degraded summit accuracy.
To measure the significance of the enriched regions, PeakRanger uses a binomial distribution to model the relative enrichment of sample over control, which yields a p value. Users can thus select highly significant peaks by using a smaller -p.
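To make the significance step more concrete, here is a minimal Python sketch of a binomial enrichment test. It is not PeakRanger's implementation: the function names, the choice of a one-sided upper-tail test, and the use of the sequencing-depth fraction as the null probability are all assumptions made for illustration.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def region_p_value(sample_reads: int, control_reads: int,
                   sample_total: int, control_total: int) -> float:
    """One-sided p value for enrichment of sample over control in one region.

    Assumption (not taken from PeakRanger): under the null hypothesis each
    read in the region comes from the sample with probability equal to the
    sample's share of the total sequencing depth.
    """
    n = sample_reads + control_reads
    p_null = sample_total / (sample_total + control_total)
    return binomial_upper_tail(sample_reads, n, p_null)


# A region with 10 sample reads and no control reads, equal library sizes.
enriched = region_p_value(10, 0, 100, 100)
# A region with the same number of reads in sample and control.
balanced = region_p_value(5, 5, 100, 100)
print(enriched, balanced)
```

A region would pass a cut-off such as -p only if its p value falls below the chosen threshold. The direct summation above is only suitable for small read counts.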
ranger extends reads before calling peaks. The default read extension length is 200. Users can change this with -l if the dataset has a different fragment size. The extension length changes the read coverage computed from the raw reads, and therefore the heights of peaks.
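The effect of the extension length on peak height can be illustrated with a small Python sketch. This is not PeakRanger's code: it assumes that each read is represented by its 5' position and strand, and that it is extended in the direction of its strand, which is a common convention. The function and variable names are hypothetical.

```python
from typing import List, Tuple


def extended_coverage(reads: List[Tuple[int, str]], ext_length: int,
                      chrom_length: int) -> List[int]:
    """Per-base coverage after extending each read to ext_length bases.

    reads holds (five_prime_position, strand) pairs with 0-based positions.
    Assumption: '+' reads extend to the right, '-' reads extend to the left.
    """
    diff = [0] * (chrom_length + 1)  # difference array, one extra slot for the end
    for pos, strand in reads:
        if strand == "+":
            start, end = pos, pos + ext_length
        else:
            start, end = pos - ext_length + 1, pos + 1
        start, end = max(start, 0), min(end, chrom_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    coverage, running = [], 0
    for delta in diff[:chrom_length]:
        running += delta
        coverage.append(running)
    return coverage


reads = [(10, "+"), (15, "+"), (40, "-")]
short = extended_coverage(reads, ext_length=5, chrom_length=60)
long_ = extended_coverage(reads, ext_length=20, chrom_length=60)
print(max(short), max(long_))
```

With the short extension the three reads never overlap, while with the longer extension they pile up into a single taller peak.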
To help visualize the results, wig and wigpe generate read coverage files in wiggle format. These files can then be loaded into genome browsers to evaluate the authenticity of called peaks. Since smaller wiggle files take less time and memory to load, --split can be set to generate one small wig file per chromosome.
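For readers unfamiliar with the variable step wiggle layout, the following Python sketch turns a per-base coverage list into variableStep lines (a declaration line followed by 1-based "position value" pairs). It only illustrates the general format; the exact header and fields written by PeakRanger may differ, and the function names are hypothetical.

```python
from typing import Dict, List


def to_variable_step(chrom: str, coverage: List[int]) -> List[str]:
    """Render per-base coverage as variableStep wiggle lines.

    Positions in wiggle files are 1-based; bases with zero coverage are skipped.
    """
    lines = [f"variableStep chrom={chrom}"]
    for index, depth in enumerate(coverage):
        if depth > 0:
            lines.append(f"{index + 1} {depth}")
    return lines


def split_by_chromosome(coverage_by_chrom: Dict[str, List[int]]) -> Dict[str, List[str]]:
    """Build one block of wiggle lines per chromosome, in the spirit of --split."""
    return {chrom: to_variable_step(chrom, cov)
            for chrom, cov in coverage_by_chrom.items()}


for line in to_variable_step("chr1", [0, 2, 2, 0, 1]):
    print(line)
```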
Calling broad peaks remains an unsolved problem for the ChIP-Seq community. The CCAT algorithm is one of those designed for this problem, especially for calling histone modification marks.
For details of the algorithm, please refer to the original manuscript of CCAT:
nr is a module of the original CCAT algorithm that estimates the similarity of data and control. It indicates roughly how far the data departs from the control.
lc measures the percentage of unique reads. The result measures how diversified the reads are in the dataset. The idea is from:
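As an illustration of the ratio that lc reports, here is a minimal Python sketch. It assumes that two reads count as duplicates when they share chromosome, start position, and strand; how PeakRanger defines a unique read is not stated here, so this definition and the function name are assumptions.

```python
from typing import Iterable, Tuple

Read = Tuple[str, int, str]  # (chromosome, start, strand)


def library_complexity(reads: Iterable[Read]) -> float:
    """Return the number of distinct reads divided by the total number of reads."""
    reads = list(reads)
    if not reads:
        return 0.0
    return len(set(reads)) / len(reads)


reads = [
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),  # duplicate of the first read
    ("chr1", 100, "-"),
    ("chr2", 5, "+"),
]
print(library_complexity(reads))
```

A value close to 1 indicates a diverse library, while a low value suggests that many reads are duplicates.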
Required libraries before compiling:
The Boost library v1.47 or newer
Pthread
g++
Once all the libraries are installed, go to the root path of the unzipped package and type:
make
This will generate bin/peakranger. Compilation in other Linux distributions is similar.
Required libraries before compiling:
Xcode developer tool kit from Apple
The Boost library v1.47 or newer
The Xcode kit can be installed using the OS X installation disk. If you don't have the installation disk, you can also get it for free from Apple Developer. The tool kit installs essential command line tools such as make and C++ compilers. The Boost library can be installed by following the instructions on its website. If you do not have root access, add the BOOST_PATH variable to the makefile:
BOOST_PATH = -I/path/to/your/boost/header -L/path/to/your/boost/library
and add it to the variables of g++.
Once all the libraries are installed, go to the root path of the unzipped package and type:
make
If the compilation fails, double-check that the BOOST_PATH variable is set correctly. The resulting binaries require the dynamic Boost library files. To make sure peakranger can find these files, type:
export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:/path/to/your/boost/library
Please change the path accordingly.
Not supported but should be possible.
peakranger lc sample.bam
peakranger nr --format bam sample.bam control.bam
peakranger wig --format bam sample.bam sample.bam_coverage
peakranger wig --format bed sample.bed sample.bed_coverage
peakranger wigpe sample.bam sample.bam_coverage
peakranger wigpe sample.bam sample.bam_coverage_splitted -s
peakranger wigpe sample.bam sample.bam_coverage_splitted_by_strand -sx
peakranger wigpe sample.bam sample.bam_coverage_gzipped -z
peakranger wigpe sample.bam sample.bam_coverage_splitted_by_strand_gzip -sxz
peakranger wigpe sample.bam sample.bam_coverage_read_extend_to_200 -l 200
peakranger ranger --format bam sample.bam control.bam ranger_result
peakranger ranger --format bam sample.bam control.bam ranger_result_threaded_faster -t 3
peakranger ccat --format bam sample.bam control.bam ccat_result
peakranger ccat --format bam sample.bam control.bam ccat_result_with_HTML_report \
  --report --gene_annot_file hg19refGene.txt
peakranger ccat --format bam sample.bam control.bam ccat_result_with_HTML_report_5kb_region \
  --report --gene_annot_file hg19refGene.txt \
  --plot_region 10000
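When many sample/control pairs need to be processed, the command lines above can be assembled from a script. The following Python sketch builds argument lists using only the flags that appear in the examples above. The helper names are hypothetical, and restricting -t to ranger is an assumption based on the examples, where it is only shown for that tool.

```python
import subprocess
from typing import List, Optional


def peak_caller_command(tool: str, sample: str, control: str, output: str,
                        fmt: str = "bam", threads: Optional[int] = None,
                        gene_annot_file: Optional[str] = None,
                        plot_region: Optional[int] = None) -> List[str]:
    """Assemble a peakranger ranger/ccat command line as a list of arguments."""
    if tool not in ("ranger", "ccat"):
        raise ValueError("tool must be 'ranger' or 'ccat'")
    cmd = ["peakranger", tool, "--format", fmt, sample, control, output]
    if threads is not None:
        if tool != "ranger":
            raise ValueError("-t is only shown for ranger in the examples")
        cmd += ["-t", str(threads)]
    if gene_annot_file is not None:
        # HTML reports need both --report and --gene_annot_file.
        cmd += ["--report", "--gene_annot_file", gene_annot_file]
        if plot_region is not None:
            cmd += ["--plot_region", str(plot_region)]
    return cmd


def run(cmd: List[str]) -> None:
    """Run one command and raise if peakranger exits with an error."""
    subprocess.run(cmd, check=True)


print(" ".join(peak_caller_command("ranger", "sample.bam", "control.bam",
                                   "ranger_result", threads=3)))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.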
Options for nr:
data file
control file
the format of the data file; can be one of: bowtie, sam, bam and bed
read extension length
show the usage
show progress
output the version number

Options for lc:
data file
show the usage
show progress
output the version number

Options for wig:
data file
the format of the data file; can be one of: bowtie, sam, bam and bed
the output location
generate one wig file per chromosome
compress the output
generate one wig file per strand
read extension length
show the usage
show progress
output the version number

Options for wigpe:
data file
the output location
generate one wig file per chromosome
compress the output
generate one wig file per strand
read extension length
show the usage
show progress
output the version number

Options for ranger:
data file
control file
the format of the data file; can be one of: bowtie, sam, bam and bed
the output location
generate HTML reports
the length of the snapshot regions in the HTML report; it also controls the search span for nearby genes
the gene annotation file
p value cut off
FDR cut off
read extension length
sensitivity of the summit detector
smoothing bandwidth
pad read coverage to avoid false positive summits
number of threads (default: 1)
show the usage
show progress
output the version number

Options for ccat:
data file
control file
the format of the data file; can be one of: bowtie, sam, bam and bed
the output location
generate HTML reports
the length of the snapshot regions in the HTML report; it also controls the search span for nearby genes
the gene annotation file
FDR cut off
sliding window size
window moving step
minimum window reads count
minimum window reads fold change
read extension length
show the usage
show progress
output the version number
lc does not generate any files.
nr does not generate any files.
wig generates a single file by default. When -x or -s is specified, it generates multiple files depending on the datasets.
wigpe behaves similarly to wig.
ranger generates three files:
_summit.bed
_region.bed
_details
The first two bed files can be visualized in IGV. The _summit.bed file contains the locations of summits ranked by their FDR. The _region.bed file contains the locations of regions ranked by their FDR. Each summit or region is annotated in the 4th column.
The _details file contains both summits and regions as well as the regions' FDR and p values.
When --report is enabled, this file will also contain nearby genes of called peaks.
--report enables HTML reporting, which generates a folder named after the data file. The folder contains a single index.html that can be viewed in most browsers.
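The _summit.bed and _region.bed files can also be processed programmatically. The sketch below is a minimal Python reader for the first four BED columns (chromosome, start, end, and the annotation in the 4th column). It relies only on the general tab-separated BED layout; any further columns PeakRanger writes are ignored, and the sample line and the peak name in it are made up for illustration.

```python
from typing import Iterable, List, NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str  # annotation from the 4th column, empty if absent


def parse_bed(lines: Iterable[str]) -> List[BedRecord]:
    """Parse tab-separated BED lines, skipping blank, comment and track lines."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # not a valid BED line
        name = fields[3] if len(fields) > 3 else ""
        records.append(BedRecord(fields[0], int(fields[1]), int(fields[2]), name))
    return records


# Hypothetical content; real files would be read with open(path).
example = ["track name=example", "chr1\t100\t101\tpeak_1", ""]
print(parse_bed(example))
```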
ccat output is similar to that of ranger.