Commit 042294d9 authored by arq5x

updates to closest docs

parent df4e52be
*closest*
###############

Similar to :doc:`../tools/intersect`, `closest` searches for overlapping features in A and B. In the event that
no feature in B overlaps the current feature in A, `closest` will report the nearest (that is, least
genomic distance from the start or end of A) feature in B. For example, one might want to find the gene
closest to a significant GWAS polymorphism. Note that, by default, `closest` will report an
overlapping feature as the closest; that is, it does not restrict to the closest *non-overlapping* feature.
The following iconic "cheatsheet" summarizes the functionality available through the various options
provided by the `closest` tool.

|
|

.. note::

    ...then by start position (e.g., ``sort -k1,1 -k2,2n in.bed > in.sorted.bed``
    for BED files).
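
The same chromosome-then-start order can also be produced in a script. The following is a minimal Python sketch, not part of bedtools. It assumes the records have already been parsed into hypothetical ``(chrom, start, end)`` tuples. Chromosome names are compared as plain strings, as ``sort -k1,1`` does under ``LC_ALL=C`` for ASCII names, and starts are compared as integers, as ``-k2,2n`` does.

.. code-block:: python

    from typing import List, Tuple

    # Hypothetical minimal record: (chrom, start, end).
    Interval = Tuple[str, int, int]


    def sort_bed(records: List[Interval]) -> List[Interval]:
        """Order records by chromosome name, then numerically by start."""
        # String comparison on chrom matches `sort -k1,1` under LC_ALL=C for
        # ASCII names; the integer start matches `-k2,2n`.
        return sorted(records, key=lambda rec: (rec[0], rec[1]))


    unsorted = [("chr2", 5, 10), ("chr1", 100, 200), ("chr10", 1, 2), ("chr1", 20, 30)]
    print(sort_bed(unsorted))
    # [('chr1', 20, 30), ('chr1', 100, 200), ('chr10', 1, 2), ('chr2', 5, 10)]

Records with the same chromosome and start keep their input order here because Python's sort is stable. ``sort`` instead breaks such ties by comparing whole lines. Either result satisfies the chromosome-then-start requirement.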

.. note::

    `closest` reports "none" for the chromosome and "-1" for all other fields
    when no feature in B is found on the same chromosome as the feature in A,
    e.g., ``none -1 -1``.
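
When you post-process `closest` output, you may want to separate rows that carry this placeholder from rows with a real hit. The following is a minimal Python sketch, not part of bedtools. It assumes tab-delimited output in which the first ``n_a_cols`` fields come from A, where ``n_a_cols`` is a hypothetical parameter set to A's column count. The sample rows are illustrative and were not captured from bedtools.

.. code-block:: python

    from typing import Iterable, List, Tuple


    def split_no_hits(lines: Iterable[str], n_a_cols: int) -> Tuple[List[str], List[str]]:
        """Split closest-style output rows into (rows_with_hit, rows_without_hit).

        The field right after A's columns is B's chromosome; the placeholder
        "none" means no B feature was found on A's chromosome.
        """
        with_hit: List[str] = []
        without_hit: List[str] = []
        for line in lines:
            fields = line.rstrip("\n").split("\t")
            b_chrom = fields[n_a_cols]  # first B field
            if b_chrom == "none":
                without_hit.append(line)
            else:
                with_hit.append(line)
        return with_hit, without_hit


    # Illustrative rows for a 3-column A file (not real bedtools output).
    rows = [
        "chr1\t10\t20\tchr1\t15\t25\n",
        "chrX\t5\t9\tnone\t-1\t-1\n",
    ]
    hits, misses = split_no_hits(rows, n_a_cols=3)
    print(len(hits), len(misses))  # 1 1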

.. important::

    As of version 2.22.0, the `closest` tool can accept multiple files for...

==========================================================================
Usage and option summary
==========================================================================

::

    bedtools closest [OPTIONS] -a <FILE> \
                               -b <FILE1, FILE2, ..., FILEN>

**(or)**:

::

    closestBed [OPTIONS] -a <FILE> \
                         -b <FILE1, FILE2, ..., FILEN>

=========================== ===============================================================================================================================================================================================================
Option                      Description
=========================== ===============================================================================================================================================================================================================
**-s**                      Require same strandedness. That is, find the closest feature in B on the *same* strand as A. By default, closest features are reported without respect to strand.
**-S**                      Require opposite strandedness. That is, find the closest feature in B on the *opposite* strand from A. By default, closest features are reported without respect to strand.
**-d**                      In addition to the closest feature in B, report its distance to A as an extra column. The reported distance for overlapping features will be 0.
**-D**                      | Like `-d`, report the closest feature in B and its distance to A as an extra column. However, unlike `-d`, use negative distances to report upstream features.
                            | The options for defining which orientation is "upstream" are:
                            | - `ref` Report distance with respect to the reference genome. B features with a lower (start, stop) are upstream.
                            | - `a` Report distance with respect to A. When A is on the - strand, "upstream" means B has a higher (start, stop).
                            | - `b` Report distance with respect to B. When B is on the - strand, "upstream" means A has a higher (start, stop).
**-io**                     Ignore features in B that overlap A. That is, report only features that are close to, yet not touching, A.
**-iu**                     Ignore features in B that are upstream of features in A. This option requires `-D` and follows its orientation rules for determining what is "upstream".
**-id**                     Ignore features in B that are downstream of features in A. This option requires `-D` and follows its orientation rules for determining what is "downstream".
**-t**                      | Specify how ties for closest feature should be handled. A tie occurs when two features in B have exactly the same "closeness" with A. By default, all such features in B are reported.
                            | Here are all the options:
                            | - `all` Report all ties (default).
                            | - `first` Report the first tie that occurred in the B file.
                            | - `last` Report the last tie that occurred in the B file.
**-mdb**                    | Specify how multiple databases should be resolved.
                            | - `each` Report closest records for each database (default).
                            | - `all` Report closest records among all databases.
**-N**                      Require that the query and the closest hit have different names. For BED, the 4th column is compared.
**-header**                 Print the header from the A file prior to results.
=========================== ===============================================================================================================================================================================================================
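
The sign conventions of `-D`, and the `-iu`/`-id` filters built on them, can be summarized in a short piece of code. This is a hypothetical Python sketch that follows the wording of the table above rather than bedtools' source. ``Feature``, ``signed_distance`` and ``keep`` are illustrative names. The gap is measured between half-open BED intervals, so the magnitude may differ from the exact value bedtools prints (for example, for book-ended features). The sign logic is the point of the sketch.

.. code-block:: python

    from typing import NamedTuple, Tuple


    class Feature(NamedTuple):
        """Hypothetical minimal record with half-open [start, end) coordinates."""
        chrom: str
        start: int
        end: int
        strand: str = "+"


    def side_and_gap(a: Feature, b: Feature) -> Tuple[str, int]:
        """Where B lies relative to A, and the number of bases between them."""
        if b.end <= a.start:
            return "left", a.start - b.end
        if b.start >= a.end:
            return "right", b.start - a.end
        return "overlap", 0


    def signed_distance(a: Feature, b: Feature, mode: str = "ref") -> int:
        """Distance that is negative when the hit counts as "upstream" under -D mode."""
        side, gap = side_and_gap(a, b)
        if side == "overlap":
            return 0
        b_left = side == "left"
        if mode == "ref":
            # B with lower coordinates is upstream.
            upstream = b_left
        elif mode == "a":
            # Relative to A: if A is on -, B with higher coordinates is upstream.
            upstream = (not b_left) if a.strand == "-" else b_left
        elif mode == "b":
            # Relative to B: A lower than B is upstream on +; A higher is upstream on -.
            upstream = b_left if b.strand == "-" else (not b_left)
        else:
            raise ValueError(f"unknown -D mode: {mode!r}")
        return -gap if upstream else gap


    def keep(dist: int, ignore_upstream: bool = False, ignore_downstream: bool = False) -> bool:
        """Model of -iu / -id: drop hits by the sign of their signed distance."""
        if ignore_upstream and dist < 0:
            return False
        if ignore_downstream and dist > 0:
            return False
        return True


    a = Feature("chr1", 10, 20, "-")
    b1 = Feature("chr1", 7, 8, "+")
    for mode in ("ref", "a", "b"):
        print(mode, signed_distance(a, b1, mode))

Here B lies 2 bp to the left of A. The sketch therefore reports -2 under `ref`, but +2 under `a`, because A is on the - strand. It also reports +2 under `b`, because B is on the + strand and sits after A.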

==========================================================================
Default behavior
==========================================================================

The `closest` tool first searches for features in B that overlap a feature in A. If overlaps are found, the feature
in B that overlaps the highest fraction of A is reported. If no overlaps are found, `closest` looks for
the feature in B that is *closest* (that is, least genomic distance to the start or end of A) to A.

For example, consider the case where one of the intervals in B overlaps the interval in A, yet another does not:

.. code-block:: bash

    $ cat a.bed
    chr1  10  20  a1  1  -

    $ cat b.bed
    chr1  7   8   b1  1  -
    chr1  15  25  b2  2  +

    $ bedtools closest -a a.bed -b b.bed
    chr1  10  20  a1  1  -  chr1  15  25  b2  2  +
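
The overlap-first rule illustrated above can be modeled in a few lines. The following is a minimal Python sketch of the documented default behavior, not bedtools' implementation. The ``Bed`` record and the helper names are hypothetical. Strand options, multiple databases and bedtools' exact distance convention are all ignored.

.. code-block:: python

    from typing import List, NamedTuple


    class Bed(NamedTuple):
        """Hypothetical minimal BED record (half-open coordinates)."""
        chrom: str
        start: int
        end: int
        name: str


    def overlap_bp(a: Bed, b: Bed) -> int:
        """Bases shared by two intervals (0 if they do not overlap)."""
        return max(0, min(a.end, b.end) - max(a.start, b.start))


    def gap_bp(a: Bed, b: Bed) -> int:
        """Bases separating two intervals (0 if they overlap or are book-ended)."""
        return max(0, b.start - a.end, a.start - b.end)


    def closest(a: Bed, bs: List[Bed]) -> List[Bed]:
        """Overlap first, then least distance; every tied B record is returned."""
        candidates = [b for b in bs if b.chrom == a.chrom]
        if not candidates:
            return []  # the documented output here is the "none" placeholder
        overlapping = [b for b in candidates if overlap_bp(a, b) > 0]
        if overlapping:
            # Same A, so most overlapping bases == highest fraction of A.
            best = max(overlap_bp(a, b) for b in overlapping)
            return [b for b in overlapping if overlap_bp(a, b) == best]
        best = min(gap_bp(a, b) for b in candidates)
        return [b for b in candidates if gap_bp(a, b) == best]


    a1 = Bed("chr1", 10, 20, "a1")
    print([b.name for b in closest(a1, [Bed("chr1", 7, 8, "b1"), Bed("chr1", 15, 25, "b2")])])
    print([b.name for b in closest(a1, [Bed("chr1", 7, 8, "b1"), Bed("chr1", 30, 40, "b2")])])

The first call prints ``['b2']``: b2 overlaps a1 by 5 bp. The second call prints ``['b1']``: b1 is 2 bp away, whereas b2 is 10 bp away.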

Now compare what happens when neither interval in B overlaps the interval in A, yet one is closer than the other.

.. code-block:: bash

    $ cat a.bed
    chr1  10  20  a1  1  -

    $ cat b.bed
    chr1  7   8   b1  1  -
    chr1  30  40  b2  2  +

    $ bedtools closest -a a.bed -b b.bed
    chr1  10  20  a1  1  -  chr1  7   8   b1  1  -

But what if both intervals in B are equally close to the interval in A? In this case, the default behavior is to report all intervals in B that are tied for proximity. Use the `-t` option to adjust this behavior.

.. code-block:: bash

    $ cat a.bed
    chr1  10  20  a1  1  -

    $ cat b.bed
    chr1  7   8   b1  1  -
    chr1  22  23  b2  2  +

    $ bedtools closest -a a.bed -b b.bed
    chr1  10  20  a1  1  -  chr1  7   8   b1  1  -
    chr1  10  20  a1  1  -  chr1  22  23  b2  2  +
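
The tie-breaking choices offered by `-t` amount to a simple selection over the tied records. The following is a minimal Python sketch. It assumes the tied B records have already been collected in the order in which they appear in the B file. ``resolve_ties`` is a hypothetical name, not part of bedtools.

.. code-block:: python

    from typing import List, Sequence, TypeVar

    T = TypeVar("T")


    def resolve_ties(tied: Sequence[T], mode: str = "all") -> List[T]:
        """Model of -t all|first|last over tied B records kept in B-file order."""
        if mode == "all":
            return list(tied)
        if mode == "first":
            return list(tied[:1])  # slicing also handles an empty tie list
        if mode == "last":
            return list(tied[-1:])
        raise ValueError(f"unknown -t mode: {mode!r}")


    tied = ["b1", "b2"]  # the two equally close records above, in file order
    for mode in ("all", "first", "last"):
        print(mode, resolve_ties(tied, mode))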

==========================================================================
``-s`` Enforcing "strandedness"
==========================================================================