<p>This command will create a single file containing the pairwise Jaccard measurements from all 400 tests.</p>
<pre><code>find . \
| grep jaccard \
...
...
@@ -552,7 +552,8 @@ Note that PCA was used in this case as a toy example of what PCA does for the CS
NOTE: The following example assumes that you have the <code>gplots</code> package installed on your computer. If it is not installed, run <code>install.packages("gplots")</code> from the R prompt and respond to the prompts that follow.
<h1 id="puzzles-to-help-teach-you-more-bedtools.">Puzzles to help teach you more bedtools.</h1>
<ol style="list-style-type: decimal">
<li><p>Create a BED file representing all of the intervals in the genome that are NOT exonic and not Promoters (based on the promoters in the hESC file).</p></li>
<li><p>Create a BED file representing all of the intervals in the genome that are NOT exonic and are not Promoters (based on the promoters in the hESC file).</p></li>
<li><p>What is the average distance from GWAS SNPs to the closest exon? (Hint - have a look at the <a href="http://bedtools.readthedocs.org/en/latest/content/tools/closest.html">closest</a> tool.)</p></li>
<li><p>Count how many exons occur in each 500kb interval (“window”) in the human genome. (Hint - have a look at the <code>makewindows</code> tool.)</p></li>
<li><p>Are there any exons that are completely overlapped by an enhancer? If so, how many?</p></li>
<li><p>What fraction of the GWAS SNPs are exonic?</p></li>
<li><p>What fraction of the GWAS SNPs are exonic? Hint: should you worry about double counting?</p></li>
<li><p>What fraction of the GWAS SNPs lie in either enhancers or promoters in the hESC data we have?</p></li>
<li><p>Create intervals representing the canonical 2bp splice sites on either side of each exon (don’t worry about excluding splice sites at the first or last exon). (Hint - have a look at the <a href="http://bedtools.readthedocs.org/en/latest/content/tools/flank.html">flank</a> tool.)</p></li>
<li><p>What is the Jaccard statistic between CpG and hESC enhancers? Compare that to the Jaccard statistic between CpG and hESC promoters. Does the result make sense? (Hint - you will need <code>grep</code>).</p></li>
@@ -691,7 +692,7 @@ Puzzles to help teach you more bedtools.
1. Create a BED file representing all of the intervals in the genome
that are NOT exonic and not Promoters (based on the promoters in the hESC file).
that are NOT exonic and are not Promoters (based on the promoters in the hESC file).
2. What is the average distance from GWAS SNPs to the closest exon? (Hint - have a look at the [closest](http://bedtools.readthedocs.org/en/latest/content/tools/closest.html) tool.)
...
...
@@ -699,9 +700,9 @@ that are NOT exonic and not Promoters (based on the promoters in the hESC file).
4. Are there any exons that are completely overlapped by an enhancer? If so, how many?
5. What fraction of the GWAS SNPs are exonic?
5. What fraction of the GWAS SNPs are exonic? Hint: should you worry about double counting?
6. What fraction of the GWAS SNPs lie in either enhancers or promoters in the hESC data we have?
6. What fraction of the GWAS SNPs lie in either enhancers or promoters in the hESC data we have?
7. Create intervals representing the canonical 2bp splice sites on either side of each exon (don't worry about excluding splice sites at the first or last exon). (Hint - have a look at the [flank](http://bedtools.readthedocs.org/en/latest/content/tools/flank.html) tool.)