Skip to content
Snippets Groups Projects
RELEASE_HISTORY 20.2 KiB
Newer Older
Version 2.11.0 (January-21-2010)

Enhancements:
=============
1. Support for zero length features (i.e., start = end)
   - For example, this allows overlaps to be detected with insertions in the reference genome, as reported by dbSNP. 
2. Both 8 and 9 column GFF files are now supported.
3. slopBed can now extend the size of features by a percentage of it's size (-pct) instead of just a fixed number of bases.
4. Two improvements to shuffleBed:
   3a. A -f (overlapFraction) parameter that defines the maximum overlap that a randomized feature can have with an -excl feature. 
       That is, if a chosen locus has more than -f overlap with an -excl feature, a new locus is sought.
   3b. A new -incl option (thanks to Michael Hoffman and Davide Cittaro) that, defines intervals in which the randomized features should        be placed.  This is used instead of placing the features randomly in the genome.  Note that a genome file is still required so 
       that a randomized feature does not go beyond the end of a chromosome. 
5. bamToBed can now optionally report the CIGAR string as an additional field.
6. pairToPair can now report the entire paired feature from the B file when overlaps are found.
7. complementBed now reports all chromosomes, not just those with features in the BED file.
8. Improved randomization seeding in shuffleBed.  This prevents identical output for runs of shuffleBed that
   occur in the same second (often the case).


Bug Fixes:
==========
1. Fixed the "BamAlignmentSupportData is private" compilation issue.
2. Fixed a bug in windowBed that caused positions to run off the end of a chromosome.
 

Major Changes:
==============
1. The groupBy command is now part of the filo package (https://github.com/arq5x/filo) and will no longer be distributed with BEDTools.



Version 2.10.0 (September-21-2010)
==New tools==
1. annotateBed. Annotates one BED/VCF/GFF file with the coverage and number of overlaps observed
from multiple other BED/VCF/GFF files. In this way, it allows one to ask to what degree one feature coincides with multiple other feature types with a single command. For example, the following will annotate the fraction of the variants in variants.bed that are covered by genes, conservaed regions and know variation, respectively.
$ annotateBed -i variants.bed -files genes.bed conserv.bed known_var.bed

This tool was suggested by Can Alkan and was motivated by the example source code that he kindly provided.

==New features==
1. New frequency operations (freqasc and freqdesc) added to groupBy.  These operations report a histogram of the frequency that each value is observed in a given column.
2. Support for writing uncompressed bam with the -ubam option.
3. Shorthand arguments for groupBy (-g eq. -grp, -c eq. -opCols, -o eq. -opCols).
4. In addition, all BEDTools that require only one main input file (the -i file) will assume that input is
coming from standard input if the -i parameter is ignored. For example, the following are equivalent:
$ cat snps.bed | sortBed –i stdin
$ cat snps.bed | sortBed

As are these:
$ cat data.txt | groupBy -i stdin -g 1,2,3 -c 5 -o mean
$ cat data.txt | groupBy -g 1,2,3 -c 5 -o mean

==Bug fixes==
1. Increased the precision of the output from groupBy.



Version 2.9.0 (August-16-2010)
Aaron's avatar
Aaron committed
==New tools==
1. unionBedGraphs.  This is a very powerful new tool contributed by Assaf Gordon from  CSHL.  It will combine/merge multiple BEDGRAPH files into a single file, thus allowing comparisons of coverage (or any text-value) across multiple samples.

==New features==
1. New "distance feature" (-d) added to closestBed by Erik Arner.  In addition to finding the closest feature to each feature in A, the -d option will report the distance to the closest feature in B.  Overlapping features have a distance of 0.
2. New "per base depth feature" (-d) added to coverageBed.  This reports the per base coverage (1-based) of each feature in file B based on the coverage of features found in file A.  For example, this could report the per-base depth of sequencing reads (-a) across each capture target (-b).


==Bug Fixes==
1. Fixed bug in closestBed preventing closest features from being found for A features with start coordinates < 2048000.  Thanks to Erik Arner for pointing this out.
2. Fixed minor reporting annoyances in closestBed.  Thanks to Erik Arner.
3. Fixed typo/bug in genomeCoverageBed that reported negative coverage owing to numeric overflow.  Thanks to Alexander Dobin for the detailed bug report.
4. Fixed other minor parsing and reporting bugs/annoyances.


Version 2.8.3 (July-25-2010)
1. Fixed bug that caused some GFF files to be misinterpreted as VCF.  This prevented the detection of overlaps.
2. Added a new "-tag" option in bamToBed that allows one to choose the _numeric_ tag that will be used to populate the score field.  For example, one could populate the score field with the alignment score with "-tag AS".
3. Updated the BamTools API. 


Version 2.8.2 (July-18-2010)
1. Fixed a bug in bedFile.h preventing GFF strands from being read properly.
2. Fixed a bug in intersectBed that occasionally caused spurious overlaps between BAM alignments and BED features.
3. Fixed bug in intersectBed causing -r to not report the same result when files are swapped.
4. Added checks to groupBy to prevent the selection of improper opCols and groups.
5. Fixed various compilation issues, esp. for groupBy, bedToBam, and bedToIgv.
6. Updated the usage statements to reflect bed/gff/vcf support.
7. Added new fileType functions for auto-detecting gzipped or regular files.  Thanks to Assaf Gordon.


Version 2.8.1 (July-05-2010)
Aaron's avatar
Aaron committed
1.  Added bedToIgv.


Version 2.8.0 (July-04-2010)
Aaron's avatar
Aaron committed

1.  Proper support for "split" BAM alignments and "blocked" BED (aka BED12) features. By using the "-split" option, intersectBed, coverageBed, genomeCoverageBed, and bamToBed will now correctly compute overlaps/coverage solely for the "split" portions of BAM alignments or the "blocks" of BED12 features such as genes. 

2.  Added native support for the 1000 Genome Variant Calling Format (VCF) version 4.0.

3.  New bed12ToBed6 tool.  This tool will convert each block of a BED12 feature into discrete BED6 features.

4.  Useful new groupBy tool.  This is a very useful new tool that mimics the "groupBy" clause in SQL.  Given a file or stream that is sorted by the appropriate "grouping columns", groupBy will compute summary statistics on another column in the file or stream.  This will work with output from all BEDTools as well as any other tab-delimited file or stream.  Example summary operations include: sum, mean, stdev, min, max, etc.  Please see the help for the tools for examples.  The functionality in groupBy was motivated by helpful discussions with Erik Arner at Riken.

5.  Improvements to genomeCoverageBed.  Applied several code improvements provided by Gordon Assaf at CSHL.  Most notably, beyond the several efficiency and organizational changes he made, he include a "-strand" option which allows one to specify that coverage should only be computed on either the "+" or the "-" strand.

6.  Fixed a bug in closestBed found by Erik Arner (Riken) which incorrectly reported "null" overlaps for features that did not have a closest feature in the B file.

7.  Fixed a careless bug in slopBed also found by Erik Arner (Riken) that caused an infinite loop when the "-excl" option was used.
 
8.  Reduced memory consumption by ca. 15% and run time by ca. 10% for most tools.

9.  Several code-cleanliness updates such as templated functions and common tyedefs.

10.  Tweaked the genome binning approach such that 16kb bins are the most granular.


Version 2.7.1 (May-06-2010)
Fixed a typo that caused some compilers to fail on closestBed.

Version 2.7.0 (May-05-2010)

General:
1. "Gzipped" BED and GFF files are now supported as input by all BEDTools.  Such files must end in ".gz".
2. Tools that process BAM alignments now uniformly compute an ungapped alignment end position based on the BAM CIGAR string.  Specifically, "M", "D" and "N" operations are observed when computing the end position.
3. bamToBed requires the BAM file to be sorted/grouped by read id when creating BEDPE output.  This allows the alignments end coordinate  for each end of the pair to be properly computed based on its CIGAR string.  The same requirement applies to pairToBed.
4. Updated manual.
5. Many silent modifications to the code that improve clarity and sanity-checking and facilitate future additions/modifications.

	
New Tools:
1. bedToBam. This utility will convert BED files to BAM format.  Both "blocked" (aka BED12) and "unblocked" (e.g. BED6) formats are acceptable.  This allows one to, for example, compress large BED files such as dbSNP into BAM format for efficient visualization.


Changes to existing tools:
	intersectBed
		1. Added -wao option to report 0 overlap for features in A that do not intersect any features in B.  This is an extension of the -wo option. 
	
	bamToBed
		1. Requires that BAM input be sorted/grouped by read name.

	pairToBed
		1. Requires that BAM input be sorted/grouped by read name.
		2. Allows use of minimum mapping quality or total edit distance for score field.

	windowBed
		1. Now supports BAM input.

	genomeCoverageBed
		1. -bga option. Thanks to Gordon Assaf for the suggestion.
		2. Eliminated potential seg fault.

Acknowledgements:
	1. Gordon Assaf: for suggesting the -bga option in genomeCoverageBed and for testing the new bedToBam utility.
	2. Ivan Gregoretti: for helping to expedite the inclusion of gzip support.
	3. Can Alkan: for suggesting the addition of the -wao option to intersectBed.
	4. James Ward: for pointing out that bedToBam did not need to create "dummy" seq and qual entries.



Version 2.6.1 (Mar-29-2010)
1. Fixed a careless command line parsing bug in coverageBed.


Aaron's avatar
Aaron committed
Version 2.6.0 (Mar-23-2010)
***Specific improvements / additions to tools***
1. intersectBed
  * Added an option (-wo) that reports the number of overlapping bases for each intersection b/w A and B files.
    -- Not sure why this wasn't added sooner; it's obvious.

2. coverageBed
  * native BAM support
  * can now report a histogram (-hist) of coverage for each feature in B.  Useful for exome sequencing projects, for example.
    -- thanks for the excellent suggestion from Jose Bras
  * faster

3. genomeCoverageBed
  * native BAM support
  * can now report coverage in BEDGRAPH format (-bg)
    -- thanks for the code and great suggestion from Gordon Assaf, CSHL.

4. bamToBed
  * support for "blocked" BED (aka BED12) format.  This facilitates the creation of BED entries for "split" alignments (e.g. RNAseq or SV)
    -- thanks to Ann Loraine, UNCC for test data to support this addition.

5. fastaFromBed
  * added the ability to extract sequences from a FASTA file according to the strand in the BED file.  That is, when "-" the extracted sequence is reverse complemented.
    -- thanks to Thomas Doktor, U. of Southern Denmark for the code and suggestion.

6. ***NEW*** overlap
  * newly added tool for computing the overlap/distance between features on the same line.
  -- For example:
	$ cat test.out
	chr1	10	20	A	chr1	15	25	B
	chr1	10	20	C	chr1	25	35	D

	$ cat test.out | overlaps -i stdin -cols 2,3,6,7
	chr1	10	20	A	chr1	15	25	B	5
	chr1	10	20	C	chr1	25	35	D	-5

***Bug fixes***
1. Fixed a bug in pairToBed when comparing paired-end BAM alignments to BED annotations and using the "notboth" option.
2. Fixed an idiotic bug in intersectBed that occasionally caused segfaults when blank lines existed in BED files.
3. Fixed a minor bug in mergeBed when using the -nms option.

***General changes***
1. Added a proper class for genomeFiles.  The code is much cleaner and the tools are less sensitive to minor problems with the formatting of genome files.  Per Gordon Assaf's wise suggestion, the tools now support "chromInfo" files directly downloaded from UCSC.  Thanks Gordon---I disagreed at first, but you were right.
2. Cleaned up some of the code and made the API a bit more streamlined.  Will facilitate future tool development, etc.


Aaron's avatar
Aaron committed
1. Fixed an insidious bug that caused malform BAM output from intersectBed and pairToBed.  The previous BAM files worked fine with samtools as BAM input, but when piped in as SAM, there was an extra tab that thwarted conversion from SAM back to BAM.  Many thanks to Ivan Gregoretti for reporting this bug.  I had never used the BAM output in this way and thus never caught the bug!
Aaron's avatar
Aaron committed
Version 2.5.3 (Feb-19-2010)
1. Fixed bug to "re-allow" track and "browser" lines.
2. Fixed bug in reporting BEDPE overlaps.
3. Fixed bug when using type "notboth" with BAM files in pairToBed.
4. When comparing BAM files to BED/GFF annotations with intersectBed or pairToBed, the __aligned__ sequence is used, rather than the __original__ sequence.
5. Greatly increased the speed of pairToBed when using BAM alignments.
6. Fixed a bug in bamToBed when reporting edit distance from certain aligners.

Aaron's avatar
Aaron committed
Version 2.5.2 (Feb-2-2010)
1. The start and end coordinates for BED and BEDPE entries created by bamToBed are now based on the __aligned__ sequence, rather than the original sequence.  It's obvious, but I missed it originally...sorry.
2. Added an error message to mergeBed preventing one from using "-n" and "-nms" together.
3. Fixed a bug in pairToBed that caused neither -type "notispan" nor "notospan" to behave as described.

Aaron's avatar
Aaron committed
Version 2.5.1 (Jan-28-2010)
Aaron's avatar
Aaron committed
1. Fixed a bug in the new GFF/BED determinator that caused a segfault when start = 0.

Aaron's avatar
Aaron committed
Version 2.5.0 (Jan-27-2010)
Aaron's avatar
Aaron committed
1. Added support for custom BED fields after the 6th column.
2. Fixed a command line parsing bug in pairToBed.
3. Improved sanity checking.

Aaron's avatar
Aaron committed
Version 2.4.2 (Jan-23-2010)
Aaron's avatar
Aaron committed
1. Fixed a minor bug in mergeBed when -nms and -s were used together.
2. Improved the command line parsing to prevent the occasional segfault.

Aaron's avatar
Aaron committed
Version 2.4.1 (Jan-12-2010)
Aaron's avatar
Aaron committed
1. Updated BamTools libraries to remove some compilation issues on some systems/compilers.

Aaron's avatar
Aaron committed
Version 2.4.0 (Jan-11-2010)
Aaron's avatar
Aaron committed
1.  Added BAM support to intersectBed and pairToBed
2.  New bamToBed feature.
3.  Added support for GFF features
4.  Added support for "blocked" BED format (BED12)
6.  Wrote complete manual and included it in distribution.
7.  Fixed several minor bugs.
8.  Cleaned up code and improved documentation.


Aaron's avatar
Aaron committed
Version 2.3.3 (12/17/2009)
Rewrote complementBed to use a slower but much simpler approach.  This resolves several bugs with the previous logic.

Version 2.3.2 (11/25/2009)
Fixed a bug in subtractBed that prevent a file from subtracting itself when the following is used:
	$ subtractBed -a test.bed -b test.bed

Aaron's avatar
Aaron committed
Version 2.3.1 (11/19/2009)
Fixed a typo in closestBed that caused all nearby features to be returned instead of just the closest one.


Version 2.3.0 (11/18/2009)
1. Added four new tools:
	- shuffleBed. 			Randomly permutes the locations of a BED file among a genome.  Useful for testing for significant overlap enrichments.
	- slopBed.    			Adds a requested number of base pairs to each end of a BED feature.  Constrained by the size of each chromosome.
	- maskFastaFromBed. 	Masks a FASTA file based on BED coordinates.  Useful making custom genome files from targeted capture experiment, etc.
	- pairToPair.			Returns overlaps between two paired-end BED files.  This is great for finding structural variants that are private or shared among samples.
							
2. Increased the speed of intersectBed by nearly 50%.
Aaron's avatar
Aaron committed
3. Improved corrected some of the help messages.
4. Improved sanity checking for BED entries.


Aaron's avatar
Aaron committed
Version 2.2.4 (10/27/2009)
1. Updated the mergeBed documentation to describe the -names option which allows one to report the names of the
features that were merged (separated by semicolons).


Aaron's avatar
Aaron committed
Version 2.2.3 (10/23/2009)
1. Changed windowBed to optionally define "left" and "right" windows based on strand.  For example by default, -l 100 and -r 500 will
add 100 bases to the left (lower coordinates) of a feature in A when scanning for hits in B and 500 bases to the right (higher coordinates).

However if one chooses the -sw option (windows bases on strandedness), the behavior changes.  Assume the above example except that a feature in A
is on the negative strand ("-").  In this case, -l 100, -r 500 and -sw will add 100 bases to the right (higher coordinates) and 500 bases to the left (lower coordinates).

In addition, there is a separate option (-sm) that can optionally force hits in B to only be tracked if they are on the same strand as A.  

***NOTE: This replaces the previous -s option and may affect existing pipelines***.


Version 2.2.2 (10/20/2009)
1. Improved the speed of genomeCoverageBed by roughly 100 fold.  The memory usage is now less than 2.0 Gb.

Aaron's avatar
Aaron committed

Aaron's avatar
Aaron committed
Version 2.2.1
1. Fixed a very obvious bug in subtractBed that caused improper behavior when a feature in A was overlapped by more than one feature in B.
Many thanks to folks in the Hannon lab at CSHL for pointing this out.


Aaron's avatar
Aaron committed
Version 2.2.0
=== Notable changes in this release ===
1.  coverageBed will optionally only count features in BED file A (e.g. sequencing reads) that overlap with 
Aaron's avatar
Aaron committed
	the intervals/windows in BED file B on the same strand.  This has been requested several times recently 
	and facilitates CHiP-Seq and RNA-Seq experiments.
Aaron's avatar
Aaron committed

2.  intersectBed can now require a minimum __reciprocal__ overlap between intervals in BED A and BED B.  For example,
	previously, if one used -f 0.90, it required that a feature in B overlap 90% of the feature in A for the "hit"
	to be reported.  If one adds the -r (reciprocal) option, the hit must also cover 90% of the feature in B.  This helps
	to exclude overlaps between say small features in A and large features in B:

	A ==========
	B  **********************************************************
		
	-f 0.50 (Reported), whereas -f 0.50 -r (Not reported)

3.  The score field has been changed to be a string.  While this deviates from the UCSC definition, it allows one to track
	much more meaningful information about a feature/interval.  For example, score could now be:
	
	7.31E-05  (a p-value)
	0.334577  (mean enrichment)
	2:2.2:40:2 (several values encoded in a string)
	
4.  closestBed now, by default, reports __all__ intervals in B that overlap equally with an interval in A.  Previously, it
	merely reported the first such feature that appeared in B.  Here's a cartoon explaining the difference.
	
	**Prior behavior**
	
	A	 ==============
	B.1        				++++++++++++++
	B.2       				++++++++++++++
	B.3               				+++++++++

	-----------------------------------------
	Result = B.1 			++++++++++++++
	
	
	**Current behavior**
	
	A	 ==============
	B.1        				++++++++++++++
	B.2       				++++++++++++++
	B.3               				+++++++++

	-----------------------------------------
	Result = B.1 			++++++++++++++
			 B.2 			++++++++++++++

	Using the -t option, one can also choose to report either the first or the last entry in B in the event of a tie.

5.  Several other minor changes to the algorithms have been made to increase speed a bit.


Aaron's avatar
Aaron committed
VERSION 2.1.2
1. Fixed yet another bug in the parsing of "track" or "browser" lines.  Sigh...
2. Change the "score" column (i.e. column 5) to b stored as a string.  While this deviates
   from the UCSC convention, it allows significantly more information to be packed into the column.

Aaron's avatar
Aaron committed

VERSION 2.1.1
1. Added limits.h to bedFile.h to fix compilation issues on some systems.
Aaron's avatar
Aaron committed
2. Fixed bug in testing for "track" or "browser" lines.
Aaron's avatar
Aaron committed

Aaron's avatar
Aaron committed
VERSION 2.1.0
1. Fixed a bug in peIntersectBed that prevented -a from being correctly handled when passed via stdin.
2. Added new functionality to coverageBed that calculates the density of coverage.
3. Fixed bug in geneomCoverageBed.

Aaron's avatar
Aaron committed

VERSION 2.0.1
1. Added the ability to retain UCSC browser track/browser headers in BED files.

Aaron's avatar
Aaron committed

VERSION 2.0
1.  Sped up the file parsing.  ~10-20% increase in speed.
2.  Created reportBed() as a common method in the bedFile class.  Cleans up the code quite nicely.
3.  Added the ability to compare BED files accounting for strandedness.
4.  Paired-end intersect.
5.  Fixed bug that prevented overlaps from being reported when the overlap fraction requested is 1.0


Aaron's avatar
Aaron committed

VERSION 1.2, 04/27/2009. (1eb06115bdf3c49e75793f764a70c3501bb53f33)
1.  Added subtractBed.
	A. Fixed bug that prevented "split" overlaps from being reported.
	B. Prevented A from being reported if >=1 feature in B completely spans it.
2.  Added linksBed.
3.  Added the ability to define separate windows for upstream and downstream to windowBed.


VERSION 1.1, 04/23/2009. (b74eb1afddca9b70bfa90ba763d4f2981a56f432)
Initial release.