CoV2D Browser™

Parikh vectors
8ORS_1 7NQG_1 6PRK_1 Letter Amino acid
20 14 18 E Glutamic acid
39 26 7 D Aspartic acid
18 12 8 Q Glutamine
44 11 9 I Isoleucine
28 13 3 F Phenylalanine
48 21 1 P Proline
43 16 8 S Serine
44 18 3 T Threonine
59 8 4 N Asparagine
13 14 2 H Histidine
29 31 9 K Lysine
15 8 4 M Methionine
11 2 1 C Cysteine
59 23 3 G Glycine
53 25 8 L Leucine
7 9 0 W Tryptophan
34 11 6 Y Tyrosine
57 24 8 V Valine
50 33 8 A Alanine
31 8 11 R Arginine
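The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch of how one column of the table above could be recomputed follows; the names `AMINO_ACIDS` and `parikh_vector` are hypothetical and not taken from the CoV2D code.

```python
from collections import Counter

# The 20 standard amino-acid letters, in the row order of the table above.
AMINO_ACIDS = "EDQIFPSTNHKMCGLWYVAR"


def parikh_vector(sequence: str) -> dict[str, int]:
    """Count how often each amino-acid letter occurs in the sequence."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}


if __name__ == "__main__":
    # First 12 residues of the 8ORS_1 sequence, used as a small demo input.
    vec = parikh_vector("MKNRECCKCYNP")
    print([vec[letter] for letter in AMINO_ACIDS])
```

The entries of a Parikh vector sum to the sequence length, which gives a quick sanity check against the length column further down the page.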

>8ORS_1|Chains A, B|Putative GMC-type oxidoreductase|Mimivirus reunion (2813486)
>7NQG_1|Chain A[auth AAA]|TrapT family, dctP subunit, C4-dicarboxylate periplasmic binding protein|Rhodopseudomonas palustris (strain ATCC BAA-98 / CGA009) (258594)
>6PRK_1|Chain A|RicF|Bacillus subtilis (strain 168) (224308)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8ORS , Knot 273 702 0.85 40 285 648
MKNRECCKCYNPCEKICVNYSTTDVAFERPNPCKPTPCKPTPIPCDPCHNTKDNLTGDIVIIGAGAAGSLLAHYLARFSNMKIILLEAGHSHFNDPVVTDPMGFFGKYNPPNENISMSQNPSYSWQGAQEPNTGAYGNRPIIAHGMGFGGSTMINRLNLVVGGRTVFDNDWPVGWKYDDVKNYFRRVLVDINPVRDNTKASITSVALDALRIIAEQQIASGEPVDFLLNKATGNVPNVEKTTPDAVPLNLNDYEGVNSVVAFSSFYMGVNQLSDGNYIRKYAGNTYLNRNYVDENGRGIGKFSGLRVVSDAVVDRIIFKGNRAVGVNYIDREGIMHYVKVNKEVVVTSGAFYTPTILQRSGIGDFTYLSSIGVKNLVYNNPLVGTGLKNHYSPVTITRVHGEPSEVSRFLSNMAANPTNMGFKGLAELGFHRLDPNKPANANTVTYRKYQLMMTAGVGIPAEQQYLSGLSPSSNNLFTLIADDIRFAPEGYIKIGTPNIPRDVPKIFFNTFVTYTPTSAPADQQWPIAQKTLAPLISALLGYDIIYQTLISMNQTARDSGFQVSLEMVYPLNDLIYKLHNGLATYGANWWHYFVPTLVGDDTPAGREFADTLSKLSYYPRVGAHLDSHQGCSCSIGRTVDSNLKVIGTQNVRVADLSAAAFPPGGNTWATASMIGARAVDLILGFPYLRDLPVNDVPILNVN
7NQG , Knot 143 327 0.84 40 198 312
MAMDQDKTVNWKVSLWVPPAHPLVPATKAWAEDIQKASGGSIRMTVFPSEQLGKAFDHYDMARDGIADVTYVNPGYQPGRFPIVSAGQLPFVFKDGKKGTLALNEWYHKYAPTEMKDTKLCFAFIHDPGALHGKKKVLLPSDLSGLKVRPAQSTIGEMVKLFGGTNVQASAPESRDALERGVADEITFPWGSVFLFGIDKVVKYHMDVPLYTTVFTYNIGLKAYNALSDAQKKIIDDHCTPEWASKVTDPWTDFEANGRVKMKALQDHEVYPLTDAQLAEWKKATKPLRDSWAEQVKKSGGDPAAVESDLQNALKKYDAGLHHHHHH
6PRK , Knot 59 121 0.78 38 90 113
MYATMESVRLQSEAQQLAEMILQSETAENYRNCYKRLQEDEEAGRIIRSFIKIKEQYEDVQRFGKYHPDYREISRKMREIKRELDLNDKVADFKRAENELQSILDEVSVEIGTAVSEHVKV
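The LZ-complexity and the normalized ratio in the table above can be sketched as follows. This assumes that \(\mathrm{LZ}(w)\) is the number of phrases in the Lempel-Ziv (1976) exhaustive-history parsing; the page does not state which variant it uses, so the phrase counts may differ from the tabulated ones. The function names are hypothetical.

```python
import math


def lz76_complexity(w: str) -> int:
    """Number of phrases in the LZ76 exhaustive-history parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text
        # (the copy is allowed to overlap the phrase itself, as in LZ76).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    # Ratio for 8ORS from the tabulated LZ-complexity 273 and length 702.
    print(normalized_lz(273, 702))
```

The second function only needs the tabulated complexity and length, so it can be checked against the ratio column without re-running the parsing.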

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
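As an illustration of these definitions, the set of distinct subwords of a given length and its cardinality can be computed with a sliding window. This is only a sketch; `subwords` and `subword_count` are hypothetical names.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_count(w: str, k: int) -> int:
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))


if __name__ == "__main__":
    print(sorted(subwords("ABAB", 2)))
    print(subword_count("ABAB", 2))
```

A word of length \(|w|\) has at most \(|w| - k + 1\) subwords of length \(k\), so the counts for \(k = 2, 3\) are bounded by roughly the sequence length.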

\(|P_{f(8ORS_1)}(2) \setminus P_{f(7NQG_1)}(2)|=130\), \(|P_{f(7NQG_1)}(2) \setminus P_{f(8ORS_1)}(2)|=43\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\); for this pair, \(Z_2 = 130 + 43 = 173\).

Hydrophobic-polar version of sequence 1 (8ORS_1):
100000000001000101000000111001010010100101110010000000101011111111110111001101001011110110001001110011111100011000101000100010110010011010011110111111001100101111100110001111100001000100111010110000010100111011011100011010110111001010110100001011110100001100111100101110010010010001100010000100010111010110110011100111010011110010001110010100011100111001011000111010010011100110001111011000001101001010100100110011101001110111011100101001101001000000111011111110000101101000011011100101110101011010110011011100110001001110001111000111110111100110001101000100011010101101100110010011100110110011101110001110011001001000101110100001000011001000101110001011010111111110011010111101101111110100111001111010
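The hydrophobic-polar string can be reproduced with a simple letter map. The page does not state the classification it uses; the set below (A, F, G, I, L, M, P, V, W written as 1, everything else as 0) is inferred by aligning the opening residues of the 8ORS_1 sequence with the binary string, so treat it as an assumption. The names are hypothetical.

```python
# Assumed hydrophobic residues, inferred from the displayed binary string.
HYDROPHOBIC = set("AFGILMPVW")


def hydrophobic_polar(sequence: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)


if __name__ == "__main__":
    # First 12 residues of 8ORS_1.
    print(hydrophobic_polar("MKNRECCKCYNP"))
```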
Pair \(Z_2\) Length of longest common subword (contiguous)
8ORS_1,7NQG_1 173 4
8ORS_1,6PRK_1 233 4
7NQG_1,6PRK_1 186 4
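Both columns of the table above can be sketched in a few lines. \(Z_k\) is the size of the symmetric difference of the two subword sets. The last column is read here as the length of the longest common contiguous subword, which is an assumption based on the small values shown. All function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|, i.e. the symmetric difference size."""
    return len(subwords(x, k) ^ subwords(y, k))


def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous word occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common suffix ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(z_distance("ABAB", "ABBA", 2))
    print(longest_common_subword("ABCDE", "XBCDY"))
```

The dynamic program keeps only two rows, so memory stays linear in the length of the second sequence.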

Newick tree

 
[
	6PRK_1:11.99,
	[
		8ORS_1:86.5,7NQG_1:86.5
	]:24.49
]

Let \(d\) be the Otu--Sayood distance of the same name.
Let \(d_1\) be the Otu--Sayood distance of the same name. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{1029}{\log_{20} 1029}-\frac{327}{\log_{20}327})\approx 187\), where \(1029 = 702 + 327\) is the combined length of the two sequences.
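The arithmetic behind this estimate can be checked directly. The sketch below takes the factors 0.85 and 0.8 and the lengths 702 and 327 from the page as given; `lz_scale` is a hypothetical helper name.

```python
import math


def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """n / log_{alphabet_size}(n), the normalizer used for LZ-complexity."""
    return n / math.log(n, alphabet_size)


# Lengths of 8ORS_1 and 7NQG_1 as listed on the page.
n1, n2 = 702, 327
estimate = 0.85 * 0.8 * (lz_scale(n1 + n2) - lz_scale(n2))

if __name__ == "__main__":
    print(round(estimate))
```

The result rounds to 187, in agreement with the value stated above.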
Status Protein1 Protein2 d d1/2
Query variables 8ORS_1 7NQG_1 238 172

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
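A sketch of how \(d\), \(d_1\) and \(\delta\) relate is given below. It assumes the usual Otu--Sayood definitions, with \(c\) the LZ-complexity: \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1(S,Q)=c(SQ)-c(S)+c(QS)-c(Q)\). Under that assumption, min and max in the formula above range over the two differences, which is consistent with the two cases shown. The LZ76 parsing and all function names are assumptions, not the site's implementation.

```python
def lz76_complexity(w: str) -> int:
    """Number of phrases in the LZ76 exhaustive-history parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def complexity_gains(s: str, q: str) -> tuple[int, int]:
    """c(SQ) - c(S) and c(QS) - c(Q): new phrases added by appending."""
    return (lz76_complexity(s + q) - lz76_complexity(s),
            lz76_complexity(q + s) - lz76_complexity(q))


def otu_sayood(s: str, q: str) -> tuple[int, int]:
    """Return (d, d1) under the assumed definitions: max and sum of the gains."""
    a, b = complexity_gains(s, q)
    return max(a, b), a + b


def delta(s: str, q: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two complexity gains."""
    a, b = complexity_gains(s, q)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


if __name__ == "__main__":
    d, d1 = otu_sayood("AAAA", "BBBB")
    print(d, d1, delta("AAAA", "BBBB", 0.5))
```

Setting `alpha=0` returns \(d\) and `alpha=0.5` returns \(d_1/2\), the two quantities tabulated under "d" and "d1/2".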