Getting started

At a minimum, PeakRanger requires two input files: one for the treatment (data) and one for the control (input).

Example 1: basic usage

$ ./ranger -d treatment.file -c control.file --format=bowtie

The --format=bowtie option tells ranger that the input files are in Bowtie format.

Example 2: specify the FDR value

$ ./ranger -d treatment.file -c control.file --format=bowtie -p 1e-7

The -p 1e-7 option changes the FDR cut-off from the default (1e-4) to 1e-7.

Example 3: specify the running mode

$ ./ranger -d treatment.file -c control.file --format=bowtie --mode=region

The --mode=region option tells ranger to run in region mode, in which only the summit with the most reads within each enriched region is returned.
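The selection rule of region mode can be pictured with a small sketch. The three-column table below (region id, summit position, read count) is a hypothetical layout made up for illustration, not a PeakRanger file format, and the awk filter only mimics the rule "keep the summit with the most reads per region"; it is not how ranger implements it.

```sh
# Hypothetical input: "region_id<TAB>summit_position<TAB>read_count".
# This layout is an assumption for illustration only.
summits=$(printf 'region1\t1200\t35\nregion1\t1450\t52\nregion2\t9800\t18\n')

# For every region, remember the line with the highest read count seen so far.
best_summits=$(printf '%s\n' "$summits" | awk -F'\t' '
    !($1 in best) || ($3 + 0) > best[$1] { best[$1] = $3 + 0; line[$1] = $0 }
    END { for (r in line) print line[r] }
' | sort)

printf '%s\n' "$best_summits"
```

Here region1 has two candidate summits and only the one with 52 reads is kept.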

Output files

Several output files are generated. All filenames start with the output location.

`peaks.bed`

This file contains all summits identified. Each summit is only 1bp long.
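A quick sanity check on the summit file could look like the sketch below. It assumes that `peaks.bed` follows the standard BED layout (chromosome, start, end in the first three tab-separated columns), so a 1 bp summit has end minus start equal to 1. The sample coordinates are made up and stand in for a real output file.

```sh
# Stand-in for a peaks.bed file; the coordinates are invented for illustration.
bed=$(mktemp)
printf 'chr1\t1000\t1001\nchr1\t5200\t5201\nchr2\t730\t731\n' > "$bed"

# Count lines whose interval (end - start) is not exactly 1 bp.
# Assumes standard BED columns: chrom, start, end.
bad=$(awk -F'\t' '($3 - $2) != 1 { n++ } END { print n + 0 }' "$bed")
echo "intervals that are not 1 bp long: $bad"
```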

`peaks_with_region.bed`

This file contains all enriched regions, with their summits printed alongside. If more than one summit is found, the summits are separated by commas.

`valley.bed`

This file contains all valleys identified. Each valley is only 1bp long.

`valleys_with_region.bed`

This file contains all enriched regions, with their valleys printed alongside. If more than one valley is found, the valleys are separated by commas.

`.wig`

Wiggle files that can be visualized in common genome browsers such as GBrowse, the UCSC Genome Browser and IGB. Two wiggle files are generated: one for the treatment and one for the input.

`regions`

This file contains detailed information about the enriched regions identified.

`raw`

This file contains all enriched regions, before FDR filtering.

Other notes

Fatal errors

Most errors are self-explanatory, except the following:

Segmentation fault

If you see this, ranger ran into an exception that it could not handle. This error is usually caused by abnormal data files.

GSL errors

These errors are thrown by the GSL library. Common causes include memory shortage and abnormal data files. If you see malloc errors, you may need more memory.

Not found chromosome indexes

This usually indicates that the sample and control datasets contain different sets of chromosomes. For example, the sample contains chr1 and chr2, but the control contains chr1, chr2 and chr3, so chr3 is the additional chromosome. In this case, remove all reads mapped to chr3 from the control and run ranger again.
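One way to find and remove the extra chromosomes is sketched below with standard Unix tools. It assumes tab-separated Bowtie output in which the reference (chromosome) name is the third column; the two stand-in files only mimic the first four Bowtie columns, and all file names under the temporary directory are hypothetical.

```sh
# Throwaway directory with two tiny stand-in read files
# (read name, strand, reference name, offset).
workdir=$(mktemp -d)
printf 'r1\t+\tchr1\t100\nr2\t-\tchr2\t250\n' > "$workdir/treatment.file"
printf 'r3\t+\tchr1\t120\nr4\t+\tchr2\t300\nr5\t-\tchr3\t40\n' > "$workdir/control.file"

# List the chromosomes seen in each file.
cut -f3 "$workdir/treatment.file" | sort -u > "$workdir/treatment.chroms"
cut -f3 "$workdir/control.file"   | sort -u > "$workdir/control.chroms"

# Chromosomes present in both files.
comm -12 "$workdir/treatment.chroms" "$workdir/control.chroms" > "$workdir/shared.chroms"

# Keep only the reads that map to a shared chromosome.
for f in treatment control; do
    awk -F'\t' 'NR == FNR { keep[$1]; next } $3 in keep' \
        "$workdir/shared.chroms" "$workdir/$f.file" > "$workdir/$f.filtered"
done

# Chromosomes left in the filtered control file.
cut -f3 "$workdir/control.filtered" | sort -u
```

The filtered files, rather than the originals, would then be passed to ranger with -d and -c.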

The wiggle file generator

If you only want wiggle files, you can use the wig program. Its basic usage is similar to that of ranger:

Example:

$ ./wig -d treatment.file --format=bowtie

The --format=bowtie option tells wig that the input file is in Bowtie format.

Running the cloud version of PeakRanger

Setting up the Hadoop environment

A Hadoop system has to be set up before running cloud-PeakRanger.

Datasets preprocessing

The control dataset must first be preprocessed using the script preprocessing/modify_controldata.

Running PeakRanger

The cloud version relies on the Hadoop Streaming system. Once the Hadoop system is set up, supply it with the mapper and reducer, which can be found in the mapper and reducer folders.
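For readers new to Hadoop Streaming, the contract between the two programs can be tried locally with a plain pipe: the mapper reads lines from standard input and writes key/value lines, the framework sorts them by key, and the reducer reads the sorted stream. The mapper and reducer below are simple stand-ins written for this sketch (they count reads per chromosome, assuming the chromosome is the third tab-separated column); they are not the PeakRanger mapper and reducer. On a cluster, the real programs would be handed to Hadoop Streaming through its -mapper and -reducer options instead.

```sh
# Stand-in mapper: emit "chromosome<TAB>1" for every read.
mapper() { awk -F'\t' '{ print $3 "\t" 1 }'; }

# Stand-in reducer: sum the counts per key; input arrives sorted by key.
reducer() {
    awk -F'\t' '{ sum[$1] += $2 } END { for (k in sum) print k "\t" sum[k] }' | sort
}

# Made-up reads: read name, strand, chromosome, offset.
reads=$(printf 'r1\t+\tchr1\t100\nr2\t-\tchr2\t50\nr3\t+\tchr1\t300\n')

# Local equivalent of a streaming job: map, sort by key, reduce.
result=$(printf '%s\n' "$reads" | mapper | sort | reducer)
printf '%s\n' "$result"
```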

Post-processing

After cloud-PeakRanger finishes, run postprocessing/postprocessing to process the raw output file. You must specify an FDR cut-off to get meaningful results.

The Bowtie-PeakRanger pipeline

We provide a pipeline bundled with Bowtie. The pipeline accepts raw FASTQ files and produces peak calls and wiggle files in a single step. Currently, we only provide a basic configuration that suits routine analyses.

Preparing Bowtie indexes

Before using the pipeline, you first have to prepare the indexes for Bowtie. You can either go to the Bowtie homepage to download the indexes or build them yourself. To build the indexes, go to the folder

pipeline/bowtie*/indexes/

Then run the script that contains the name of the species of your data. All these scripts need network access to download genome files.

If you choose to download indexes, please place all downloaded *.ebwt files in the indexes folder.
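A Bowtie 1 index normally consists of six files per basename (.1.ebwt, .2.ebwt, .3.ebwt, .4.ebwt, .rev.1.ebwt and .rev.2.ebwt), so a quick check that nothing was lost during download can be sketched as follows. The helper name missing_index_files is made up for this example, and the demonstration uses empty placeholder files in a temporary directory rather than a real index.

```sh
# Print the name of every expected index file that is missing.
# Arguments: index directory, index basename.
missing_index_files() {
    dir=$1
    base=$2
    for suffix in 1 2 3 4 rev.1 rev.2; do
        [ -f "$dir/$base.$suffix.ebwt" ] || printf '%s\n' "$base.$suffix.ebwt"
    done
}

# Demonstration with placeholder files; one file is deliberately left out.
indexes=$(mktemp -d)
for suffix in 1 2 3 4 rev.1; do
    : > "$indexes/c_elegans201.$suffix.ebwt"
done
missing=$(missing_index_files "$indexes" c_elegans201)
echo "missing: $missing"
```

An empty result means all six files are in place; the check says nothing about whether their contents are valid.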

Checking Bowtie indexes

To make sure the Bowtie indexes were built correctly, go to the folder

pipeline/bowtie*/

In the terminal, type:

./bowtie NAME_OF_THE_INDEX -c aa

If you see output similar to the following, the index works:

# reads processed: 1

# reads with at least one reported alignment: 0 (0.00%)

# reads that failed to align: 1 (100.00%)

Using the pipeline

In the terminal, type:

$ ./bowtieranger sampledata inputdata ebwt_file multimatch threads

Change the parameters to fit your needs. multimatch is the number of matches allowed for a single read. threads is the number of worker threads allocated for both Bowtie and ranger. ebwt_file is the Bowtie index file found in the folder:

pipeline/bowtie*/indexes

Please use only the file name without the numeric suffix and the .ebwt extension. For example, if you have an index:

c_elegans201.1.ebwt

then you should only use:

c_elegans201
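If the index name has to be derived in a script, the trimming can be done with plain shell parameter expansion. This is a minimal sketch; the function name index_basename is invented for this example, and it assumes the usual Bowtie 1 file naming (NAME.N.ebwt and NAME.rev.N.ebwt).

```sh
# Turn an index file name (optionally with a directory) into the index basename.
index_basename() {
    name=${1##*/}         # drop any leading directory
    name=${name%.ebwt}    # drop the .ebwt extension
    name=${name%.[0-9]}   # drop the numeric part, e.g. ".1"
    name=${name%.rev}     # drop ".rev" for the reverse index files
    printf '%s\n' "$name"
}

index_basename c_elegans201.1.ebwt
index_basename c_elegans201.rev.2.ebwt
```

Both calls print c_elegans201, which is the value to pass as ebwt_file.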

Example

$ ./bowtieranger sample.fastq input.fastq c_elegans201 1 4
