CoV2D Browser™

Parikh vectors
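| 3UQW_1 | 4SKN_1 | 7QTL_1 | Letter | Amino acid |
|---|---|---|---|---|
| 28 | 0 | 79 | E | Glutamic acid |
| 24 | 0 | 48 | I | Isoleucine |
| 9 | 0 | 24 | M | Methionine |
| 23 | 0 | 33 | P | Proline |
| 7 | 0 | 12 | W | Tryptophan |
| 6 | 1 | 17 | C | Cysteine |
| 13 | 0 | 13 | H | Histidine |
| 14 | 0 | 51 | K | Lysine |
| 27 | 3 | 39 | T | Threonine |
| 34 | 0 | 31 | V | Valine |
| 34 | 0 | 49 | S | Serine |
| 27 | 0 | 36 | A | Alanine |
| 16 | 0 | 32 | N | Asparagine |
| 23 | 0 | 37 | D | Aspartic acid |
| 18 | 0 | 18 | Q | Glutamine |
| 38 | 5 | 36 | G | Glycine |
| 20 | 0 | 44 | R | Arginine |
| 31 | 0 | 63 | L | Leucine |
| 21 | 0 | 36 | F | Phenylalanine |
| 20 | 0 | 19 | Y | Tyrosine |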
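
As a minimal sketch of how one column of this table can be recomputed, the snippet below counts letter occurrences in a sequence. The function name `parikh_vector` and the fixed 20-letter alphabet are illustrative assumptions, not taken from the CoV2D code. The non-standard symbol X in the 4SKN_1 sequence falls outside that alphabet, so it is not tabulated.

```python
from collections import Counter

# Assumption: the table covers the 20 standard amino-acid letters only.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(w, alphabet=ALPHABET):
    """Return a dict mapping each letter of `alphabet` to its count in `w`."""
    counts = Counter(w)
    return {letter: counts.get(letter, 0) for letter in alphabet}


# The 4SKN_1 sequence as listed on this page.
v = parikh_vector("TGGGXGGCTT")
print(v["G"], v["T"], v["C"])  # 5 3 1, matching the 4SKN_1 column
```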

>3UQW_1|Chain A|Beta-secretase 1|Homo sapiens (9606)
>4SKN_1|Chain A|DNA (5'-D(*TP*GP*GP*GP*(D1P)P*GP*GP*CP*TP*T)-3')|
>7QTL_1|Chain A|Polymerase acidic protein|Influenza A virus (A/Zhejiang/DTID-ZJU01/2013(H7N9)) (1318616)
| Protein code \(c\) | LZ-complexity \(\mathrm{LZ}(w)\) | Length \(n=\lvert w\rvert\) | \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) | \(p_w(1)\) | \(p_w(2)\) | \(p_w(3)\) |
|---|---|---|---|---|---|---|
| 3UQW, Knot | 182 | 433 | 0.85 | 40 | 238 | 410 |
| 4SKN, Knot | 5 | 10 | 0.38 | 8 | 7 | 8 |
| 7QTL, Knot | 281 | 717 | 0.86 | 40 | 298 | 660 |

Sequences \(w=f(c)\):

3UQW
MGSSHHHHHHSAGENLYFQGTLPRETDEEPEEPGRRGSFVEMVDNLRGKSGQGYYVEMTVGSPPQTLNILVDTGSSNFAVGAAPHPFLHRYYQRQLSSTYRDLRKGVYVPYTQGAWAGELGTDLVSIPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDDSLEPFFDSLVKQTHVPNLFSLQLCGAGFPLNQSEVLASVGGSMIIGGIDHSLYTGSLWYTPIRREWYYEVIIVRVEINGQDLKMDCKEYNYDKSIVDSGTTNLRLPKKVFEAAVKSIKAASSTEKFPDGFWLGEQLVCWQAGTTPWNIFPVISLYLMGEVTNQSFRITILPQQYLRPVEDVATSQDDCYKFAISQSSTGTVMGAVIMEGFYVVFDRARKRIGFAVSACHVHDEFRTAAVEGPFVTLDMEDCGYNIPQTDEST
4SKN
TGGGXGGCTT
7QTL
GMEDFVRQCFNPMIVELAEKAMKEYGEDPKIETNKFASICTHLEVCFMYSDFHFIDERGESTIIESGDPNVLLKHRFEIIEGRDRTMAWTVVNSICNTTGVEKPKFLPDLYDYKENRFIEIGVTRREVHIYYLEKANKIKSEKTHIHIFSFTGEEMATKADYTLDEESRARIKTRLFTIRQEMASRGLWDSFRQSERGEETIEERFEITGTMRRLADQSLPPNFSSLENFRAYVDGFEPNGCIEGKLSQMSKEVNARIEPFLRTTPRPLRLPDGPPCSQRSKFLLMDALKLSIEDPSHEGEGIPLYDAIKCMKTFFGWKEPNIIKPHEKGINPNYLLTWKQVLAELQDIENEEKIPRTKNMKKTSQLKWALGENMAPEKVDFEDCKDVNDLKQYDSDEPEPRSLACWIQSEFNKACELTDSSWVELDEIGEDVAPIEHIASMRRNYFTAEVSHCRATEYIMKGVYINTALLNASCAAMDDFQLIPMISKCRTKEGRRKTNLYGFIIKGRSHLRNDTDVVNFVSMEFSLTDPRLEPHKWEKYCVLEIGDMLLRTAVGQVSRPMFLYVRTNGTSKIKMKWGMEMRRCLLQSLQQIESMIEAESSVKEKDLTKEFFENKSETWPIGESPKGVEEGSIGKVCRTLLAKSVFNSLYASPQLEGFSAESRKLLLIVQALRDNLEPGTFDLEGLYEAIEECLINDPWVLLNASWFNSFLTHALR
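
The LZ-complexity and normalized-ratio columns can be checked with a short sketch. It assumes that \(\mathrm{LZ}(w)\) is the Lempel–Ziv (1976) exhaustive-history component count, which this page does not state explicitly; `lz76` and `normalized_lz` are hypothetical helper names. The naive substring search is slow on long inputs and is meant only to make the definition concrete.

```python
import math


def lz76(w):
    """Number of components in the LZ76 exhaustive-history parsing of w.

    Each component is the shortest prefix of the remaining text that cannot
    be copied from the text seen so far (overlapping copies are allowed).
    """
    n = len(w)
    i = 0
    components = 0
    while i < n:
        length = 1
        # Extend while w[i:i+length] already occurs in w[:i+length-1].
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components


def normalized_lz(w, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), with b the assumed alphabet size."""
    n = len(w)
    return lz76(w) / (n / math.log(n, alphabet_size))


w = "TGGGXGGCTT"  # the 4SKN sequence from the table
print(lz76(w), round(normalized_lz(w), 2))  # 5 0.38
```

On this input the parsing is T, G, GGX, GGC, TT, which gives the five components reported for 4SKN.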

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
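
A minimal sketch of these definitions, with `subwords` and `subword_complexity` as illustrative names (not from the CoV2D code): the set of subwords of a given length is collected by sliding a window over the word, and the complexity is the size of that set.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w, k):
    """Number of distinct contiguous subwords of length k in w."""
    return len(subwords(w, k))


w = "TGGGXGGCTT"  # the 4SKN sequence from the table
print(subword_complexity(w, 2), subword_complexity(w, 3))  # 7 8
```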

\(|P_{f(3UQW_1)}(2) \setminus P_{f(4SKN_1)}(2)|=235\), \(|P_{f(4SKN_1)}(2) \setminus P_{f(3UQW_1)}(2)|=4\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1100000000011001010101100000010011001011011001010010100101011011001011100100011111110111000000010000001001101100011111011001101101101010101111000001110100101111110101101000101110011000011011010101111110000111011101111110001001011001100010001111010101001010000000000110010001011001101110010110000011011111001101011001101111101011101000010101110001011001100000000111000001011111110110111001000111110100100010011101111010100010011000000
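
The hydrophobic-polar string can be sketched as a letter-by-letter recoding. The page does not say which residues it treats as hydrophobic; the set below (A, C, F, G, I, L, M, P, V, W mapped to 1, everything else to 0) is an assumption that happens to reproduce the opening of the string above, and `hp_encode` is a hypothetical name. C and W are not exercised by the prefix checked here.

```python
# Assumption: residues encoded as 1 (hydrophobic); all others become 0.
HYDROPHOBIC = set("ACFGILMPVW")


def hp_encode(w):
    """Recode an amino-acid sequence as a binary hydrophobic/polar string."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in w)


# First 23 residues of the 3UQW sequence listed on this page.
prefix = "MGSSHHHHHHSAGENLYFQGTLP"
print(hp_encode(prefix))  # 11000000000110010101011
```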
| Pair | \(Z_2\) | Length of longest common subsequence |
|---|---|---|
| 3UQW_1,4SKN_1 | 239 | 2 |
| 3UQW_1,7QTL_1 | 158 | 4 |
| 4SKN_1,7QTL_1 | 297 | 2 |
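
Both columns of this table can be sketched in a few lines. \(Z_k\) is the size of the symmetric difference of the two subword sets. For the last column, the sketch assumes that the reported value is the length of the longest common contiguous subword, which is a guess based on how small the values are; the function names and the second input word `GGCTA` are hypothetical and chosen only for illustration.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def longest_common_subword_length(x, y):
    """Length of the longest contiguous word occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common suffix ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


x = "TGGGXGGCTT"  # the 4SKN sequence from this page
y = "GGCTA"       # hypothetical toy word, not from the page
print(z(x, y, 2), longest_common_subword_length(x, y))  # 5 4
```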

Newick tree

(4SKN_1:14.79,(3UQW_1:79,7QTL_1:79):69.79);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{443 }{\log_{20} 443}-\frac{10}{\log_{20}10})\approx 139.\)
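
The arithmetic in this estimate can be reproduced directly. The sketch below takes the factors 0.85 and 0.8 and the lengths 443 and 10 exactly as they appear in the expression; `lz_scale` is a hypothetical helper name for the term \(n/\log_{20} n\).

```python
import math


def lz_scale(n, base=20):
    """The normalizing term n / log_base(n) used in the estimate."""
    return n / math.log(n, base)


# Factors and lengths copied from the expression in the text.
estimate = 0.85 * 0.8 * (lz_scale(443) - lz_scale(10))
print(round(estimate))  # 139
```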
| Status | Protein1 | Protein2 | \(d\) | \(d_1/2\) |
|---|---|---|---|---|
| Query variables | 3UQW_1 | 4SKN_1 | 181 | 92 |
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d, & \alpha=0,\\ d_1/2, & \alpha=1/2. \end{cases} \]
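
A minimal sketch of this interpolation follows. It assumes that min and max range over the two Otu–Sayood terms \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), so that \(d\) is their maximum and \(d_1\) is their sum, and that LZ is the LZ76 component count. Neither assumption is spelled out on this page, and all function names are hypothetical.

```python
def lz76(w):
    """Number of components in the LZ76 exhaustive-history parsing of w."""
    n, i, components = len(w), 0, 0
    while i < n:
        length = 1
        # Extend while w[i:i+length] already occurs in w[:i+length-1].
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components


def otu_sayood_terms(x, y):
    """Return (min, max) of LZ(xy) - LZ(x) and LZ(yx) - LZ(y)."""
    a = lz76(x + y) - lz76(x)
    b = lz76(y + x) - lz76(y)
    return min(a, b), max(a, b)


def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max, as in the displayed formula."""
    lo, hi = otu_sayood_terms(x, y)
    return alpha * lo + (1 - alpha) * hi


x, y = "TGGGXGGCTT", "AAAAAA"  # the 4SKN sequence and the AAAAAA example
d = delta(x, y, 0)             # the max term, assumed to be d
half_d1 = delta(x, y, 0.5)     # (min + max) / 2, assumed to be d_1 / 2
print(d, half_d1)
```

Under these assumptions, choosing \(\alpha=0\) recovers \(d\) and \(\alpha=1/2\) recovers \(d_1/2\), which is the relationship the displayed cases express.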