From: Charles Plessy
Date: Tue, 4 Jan 2022 08:14:25 +0000 (+0900)
Subject: Papa in the air, wohohowoooowooo
X-Git-Url: https://source.charles.plessy.org/?a=commitdiff_plain;h=5e61151eacccbc08b1d50d5ea631fd49b355577e;p=source.git

Papa in the air, wohohowoooowooo
---

diff --git a/biblio/14500911.mdwn b/biblio/14500911.mdwn
new file mode 100644
index 00000000..b0985a4d
--- /dev/null
+++ b/biblio/14500911.mdwn
@@ -0,0 +1,18 @@
+[[!meta title="Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes."]]
+[[!tag software method synteny genome alignment variants]]
+
+Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.
+
+Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. doi:10.1073/pnas.1932072100
+
+Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes.
+
+[[!pmid 14500911 desc="Primary paper for chains and nets, built with the BLASTZ and AXTCHAIN programs. Chains are one-to-many alignments and allow skipping over local inversions. In human/mouse comparisons, 2.0 inversion per Mbp, median length 814. Double gaps ≥ 100 per Mbp: 398.6, median length 411. Chains are called “short” when their span is <100,000 bases (span distribution of short chains apparently bimodal). 579 “long” chains (average length 983 kb) cover 32.9% of the bases in the human genome. Collectively all chains span 96.3% of the human genome and align to 34.6% of it. The authors note that the observed distribution of gap lengths violate the usual affine model of aligners."]]
+
+“A chained alignment [is] an ordered sequence of traditional pairwise nucleotide alignments (“blocks”) separated by larger gaps, some of which may be simultaneous gaps in both species. [...] intervening DNA in one species that does not align with the other because it is locally inverted or has been inserted in by lineage-specific translocation or duplication is skipped”
+
+“The chains are then put into a list sorted with the highest-scoring chain first. [...] each iteration taking the next chain off of the list, throwing out the parts of the chain that intersect with bases already covered by previously taken chains, and then marking the bases that are left in the chain as covered. [...] If a chain covers bases that are in a gap in a previously taken chain, it is marked as a child of the previous chain. In this way, a hierarchy of chains is formed that we call a net.”
+
+“To be considered syntenic, a chain has to either have a very high score itself or be embedded in a larger chain, on the same chromosome, and come from the same region as the larger chain. Thus, inversions and tandem duplications are considered syntenic.”
+
+“We define the (human) span of a chain to be the distance in bases in the human genome from the first to the last human base in the chain, including gaps, and we define the size of the chain as the number of aligning bases in it, not including gaps.”
diff --git a/tags/assembly.mdwn b/tags/assembly.mdwn
index 24b4f29f..bafbfec1 100644
--- a/tags/assembly.mdwn
+++ b/tags/assembly.mdwn
@@ -92,4 +92,7 @@ by [[Hoff and Stanke, 2018|biblio/30466165]].
 A reference assembly can be used to search for structural variants in a different individual, for instance with NanoSV ([[Cretu Stancu and coll., 2017|biblio/29109544]]).

+In [[2003, Kent and coll.|biblio/14500911]] aligned the human and mouse genome
+together using the BLASTZ and AXTCHAIN software.
+
 [[!inline pages="tagged(assembly)" actions="no" limit=0]]
diff --git a/tags/synteny.mdwn b/tags/synteny.mdwn
index bc987514..a44e3cd9 100644
--- a/tags/synteny.mdwn
+++ b/tags/synteny.mdwn
@@ -69,4 +69,7 @@ phenomenon “mesosynteny”.
 orthologue co-occurs close by in the other genome. It varies between 0 (no co-occurrence) and 1 (complete gene order conservation)”.

+ - “Chains” and “nets” of pairwise alignements between two genomes are described
+ in [[Kent and coll, 2003|biblio/14500911]].
+
 [[!inline pages="tagged(synteny)" limit=0]]
diff --git a/tags/variants.mdwn b/tags/variants.mdwn
index 75e312bd..5562a25e 100644
--- a/tags/variants.mdwn
+++ b/tags/variants.mdwn
@@ -38,6 +38,9 @@ supported by [[Steinberg and coll in 2012|biblio/22751100]].
 Ectopic recombination of a Galileo element may have caused a recent large-scale inversion in _D. buzzati_ ([[Delprat and coll, 2009|biblio/19936241]]).

+[[Kent and coll., 2003|biblio/14500911]] reported 2 inversions per Mbp in
+human/mouse comparisons, median length 814.
+
 ### Software

 - _NanoSV_ ([[Cretu Stancu and coll., 2017|biblio/29109544]]) uses nanopore long