CoV2D Browser™

Parikh vectors
5JOW_1 6LVL_1 2DUH_1 Letter Amino acid
23 13 14 N Asparagine
5 6 2 C Cysteine
26 13 12 I Isoleucine
39 12 10 S Serine
30 16 16 T Threonine
30 13 11 Y Tyrosine
51 20 22 G Glycine
20 9 13 F Phenylalanine
31 17 10 P Proline
19 16 7 R Arginine
15 8 7 Q Glutamine
22 6 9 H Histidine
37 31 19 L Leucine
32 23 20 K Lysine
23 19 8 A Alanine
31 21 18 D Aspartic acid
28 26 16 E Glutamic acid
11 17 6 M Methionine
11 5 1 W Tryptophan
32 22 17 V Valine
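A Parikh vector counts how often each letter occurs, so each column of the table sums to the sequence length (516, 313 and 238 for 5JOW_1, 6LVL_1 and 2DUH_1). Below is a minimal Python sketch. The letter order is copied from the table. The names `AMINO_ACIDS` and `parikh_vector` are illustrative and not part of the CoV2D Browser.

```python
from collections import Counter

# Letter order copied from the Parikh-vector table (hypothetical constant name).
AMINO_ACIDS = "NCISTYGFPRQHLKADEMWV"


def parikh_vector(w: str, alphabet: str = AMINO_ACIDS) -> list[int]:
    """Return the number of occurrences of each letter of `alphabet` in `w`."""
    counts = Counter(w)  # missing letters count as 0
    return [counts[letter] for letter in alphabet]


if __name__ == "__main__":
    # First 20 residues of 2DUH_1 (Green fluorescent protein).
    prefix = "MSKGEELFTGVVPILVELDG"
    vector = parikh_vector(prefix)
    print(dict(zip(AMINO_ACIDS, vector)))
    # Over the full alphabet, a Parikh vector sums to the word length.
    assert sum(vector) == len(prefix)
```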

>5JOW_1|Chains A, B|Non-reducing end alpha-L-arabinofuranosidase BoGH43A|Bacteroides ovatus (411476)
>6LVL_1|Chains A, B|Fibroblast growth factor receptor 2|Homo sapiens (9606)
>2DUH_1|Chain A|Green fluorescent protein|Aequorea victoria (6100)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5JOW , Knot 209 516 0.84 40 263 491
MGSSHHHHHHQGYSNPVIPGFHPDPSVCKAGDDYYLVNSSFQYFPGVPLFHSKDLVHWEQIGNCLTRPSQLDLTNANSGSGIFAPTIRYNDGVFYMITTNVSGKGNFLVHTTDPRSEWSEPVWLEQGGIDPSLYFEDGKCFMVSNPDGYINLCEIDPMTGKQLSSSKRIWNGTGGRYAEGPHIYKKDGWYYLLISEGGTELGHKVTIARSRYIDGPYQGNPANPILTHANESGQSSPIQGTGHADLVEGTDGSWWMVCLAYRIMPGTHHTLGRETYLAPVRWDKDAWPVVNSNGTISLKMDVPTLPQQEMKGRPERIDFKEGKLSPEWIHLQNPEAKNYIFTKDGKLRLIATPVTLSDWKSPTFVALRQEHFDMEASAPVVLQKAGVNDEAGISVFMEFHSHYDLFVRQDKDRKRSVGLRYKLGEITHYAKEVSLPTDGEVELVVKSDINYYYFGYKVNGIYHDLGKMNTRYLSTETAGGFTGVVLGLYITSASKDSKAYADFEYFKYKGKPGENK
6LVL , Knot 136 313 0.83 40 196 295
GSHMLAGVSEYELPEDPKWEFPRDKLTLGKPLGEGCFGQVVMAEAVGIDKDKPKEAVTVAVKMLKDDATEKDLSDLVSEMEMMKMIGKHKNIINLLGACTQDGPLYVIVEYASKGNLREYLRARRPPGMEYSYDINRVPEEQMTFKDLVSCTYQLARGMEYLASQKCIHRDLAARNVLVTENNVMKIADFGLARDINNIDYYKKTTNGRLPVKWMAPEALFDRVYTHQSDVWSFGVLMWEIFTLGGSPYPGIPVEELFKLLKEGHRMDKPANCTNELYMMMRDCWHAVPSQRPTFKQLVEDLDRILTLTTNEE
2DUH , Knot 110 238 0.84 40 161 225
MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFTYGVQCFSRYPDHMKRHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSNNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK
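The normalized column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). This sketch assumes that \(\mathrm{LZ}(w)\) is the Lempel–Ziv (1976) phrase count, computed with the Kaspar-Schuster scanning algorithm. The page does not say which variant it uses, so phrase counts from this sketch may differ from the table. The normalization step alone reproduces the 0.84, 0.83 and 0.84 entries from the tabulated LZ values and lengths.

```python
import math


def lz76_complexity(s: str) -> int:
    """Number of phrases in the LZ76 exhaustive parsing of s (assumed variant).

    Kaspar-Schuster scan: u is the start of the current phrase, v the length
    of the match being extended, i the start of an earlier candidate copy.
    """
    n = len(s)
    if n == 0:
        return 0
    i, c, u, v, v_max = 0, 1, 1, 1, 1
    while u + v <= n:
        if s[i + v - 1] == s[u + v - 1]:
            v += 1                      # the current match can be extended
        else:
            v_max = max(v, v_max)
            i += 1                      # try the next earlier start position
            if i == u:                  # no earlier copy extends further: close phrase
                c += 1
                u += v_max
                i, v, v_max = 0, 1, 1
            else:
                v = 1
    if v != 1:                          # count the final phrase that reached the end
        c += 1
    return c


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_k n), with k = alphabet_size."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    print(lz76_complexity("0001101001000101"))  # 0|001|10|100|1000|101 -> 6
    for code, lz, n in [("5JOW", 209, 516), ("6LVL", 136, 313), ("2DUH", 110, 238)]:
        print(code, round(normalized_lz(lz, n), 2))
```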

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with four-character Protein Data Bank code \(c\).
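These definitions translate directly into set comprehensions. The sketch below collects every length-\(n\) window of \(w\) and counts the distinct ones. The function names are illustrative.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct subwords (contiguous intervals) of length n in w."""
    # For n > len(w) the range is empty, so P_w(n) is the empty set.
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def p(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))


if __name__ == "__main__":
    w = "MSKGEELFTG"                      # first 10 residues of 2DUH_1
    print(sorted(subwords(w, 2)))
    print([p(w, n) for n in (1, 2, 3)])   # [8, 9, 8]
```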

\(|P_{f(5JOW_1)}(2) \setminus P_{f(6LVL_1)}(2)|=119\) and \(|P_{f(6LVL_1)}(2) \setminus P_{f(5JOW_1)}(2)|=52\).

Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \), the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\). It is an LZ76-style (set-of-subwords) numerator for the Jaccard distance on \(P(k)\). For example, \(Z_2(f(5JOW_1),f(6LVL_1))=119+52=171\).

Hydrophobic-polar version of Sequence 1 (5JOW_1):
110000000001000111111010101001100001100010011111110000110100110010010010100100101111101000011101100010101011100001000100111100111010101001001110010101010010110100100000110101100101101000011001110011001100101100001011001011011100100010001101010101101001011110110011110000110000111101000111110001010101011011000101010010100101010110100101000110001010111011010010010111100001010101111100111000111011101000001110000000001110001101000100101100101011100010000110010110001101000010000111101111110100100000101010010001011000
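Below is a sketch of \(Z_k\) computed as the size of a symmetric difference, together with a hydrophobic-polar encoding. The hydrophobic set `AGILMFPVW` (the usual nonpolar residues) is an assumption inferred from the binary string: applied to the first 37 residues of 5JOW_1, it reproduces that string's first 37 characters. The page does not state which classification it uses. The names `HYDROPHOBIC`, `hp_encode` and `z` are hypothetical.

```python
HYDROPHOBIC = frozenset("AGILMFPVW")   # assumed nonpolar set, mapped to "1"


def hp_encode(w: str) -> str:
    """Hydrophobic-polar version of w: 1 for (assumed) hydrophobic residues, else 0."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in w)


def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)   # equals len(px ^ py)


if __name__ == "__main__":
    print(hp_encode("MGSSHHHHHHQGYSNPVIPGFHPDPSVCKAGDDYYLV"))
    print(z("ABAB", "ABBA", 2))   # {AB, BA} vs {AB, BB, BA}: only BB differs -> 1
```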
Pair \(Z_2\) Length of longest common subword
5JOW_1,6LVL_1 171 4
5JOW_1,2DUH_1 182 5
6LVL_1,2DUH_1 165 4
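The third column is the length of the longest subword (contiguous interval) that occurs in both sequences. The sketch below uses the standard longest-common-substring dynamic program. The function name is illustrative.

```python
def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest subword (contiguous interval) occurring in both x and y.

    cur[j] is the length of the longest common suffix of x[:i] and y[:j];
    O(|x| * |y|) time and O(|y|) memory.
    """
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1   # extend the common suffix by one letter
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(longest_common_subword("ABCDEF", "XBCDY"))   # "BCD" -> 3
```

Suppose the longest common subword of \(x\) and \(y\) has length \(L\). Then \(P_x(k)\cap P_y(k)=\emptyset\) for every \(k>L\), so \(Z_k(x,y)=p_x(k)+p_y(k)\) for those \(k\).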

Newick tree

(5JOW_1:90.14,(6LVL_1:82.5,2DUH_1:82.5):7.64);

Let \(d\) and \(d_1\) be the Otu–Sayood distances of those names. (Using \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{829}{\log_{20} 829}-\frac{313}{\log_{20}313}\right)\approx 140\), where \(829=516+313\) is the combined length of 5JOW_1 and 6LVL_1.
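The estimate is plain arithmetic, and the sketch below recomputes it. Here 829 is read as 516 + 313. The constants 0.85 and 0.8 are copied from the formula without further interpretation. The helper name `n_over_log` is illustrative.

```python
import math


def n_over_log(n: int, base: int = 20) -> float:
    """n / log_base(n), the growth rate used to normalize LZ-complexity."""
    return n / math.log(n, base)


# 829 = 516 + 313; the factors 0.85 and 0.8 are taken as given in the formula.
expected = 0.85 * 0.8 * (n_over_log(516 + 313) - n_over_log(313))
print(round(expected))   # 140
```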
Status Protein1 Protein2 d d1/2
Query variables 5JOW_1 6LVL_1 178 141
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
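Below is a sketch of \(d\), \(d_1\) and \(\delta\). It assumes Otu and Sayood's definitions: \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\), with \(d_1(x,y)\) the sum of the same two differences. It also assumes that min and max in \(\delta\) range over those two differences, which gives \(\delta=d\) at \(\alpha=0\) and \(\delta=d_1/2\) at \(\alpha=1/2\). The LZ76 phrase count is again the Kaspar-Schuster variant, which is an assumption, so this sketch is not claimed to reproduce the values 178 and 141 in the table.

```python
def lz76_complexity(s: str) -> int:
    """LZ76 phrase count via the Kaspar-Schuster scan (assumed LZ variant)."""
    n = len(s)
    if n == 0:
        return 0
    i, c, u, v, v_max = 0, 1, 1, 1, 1
    while u + v <= n:
        if s[i + v - 1] == s[u + v - 1]:
            v += 1
        else:
            v_max = max(v, v_max)
            i += 1
            if i == u:
                c += 1
                u += v_max
                i, v, v_max = 0, 1, 1
            else:
                v = 1
    if v != 1:
        c += 1
    return c


def differences(x: str, y: str) -> tuple[int, int]:
    """The two quantities LZ(xy) - LZ(x) and LZ(yx) - LZ(y)."""
    lz = lz76_complexity
    return lz(x + y) - lz(x), lz(y + x) - lz(y)


def d(x: str, y: str) -> int:
    return max(differences(x, y))


def d1(x: str, y: str) -> int:
    return sum(differences(x, y))


def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max over the two differences."""
    a, b = differences(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


if __name__ == "__main__":
    x = "MSKGEELFTGVVPILVELDG"   # first 20 residues of 2DUH_1
    y = "GSHMLAGVSEYELPEDPKWE"   # first 20 residues of 6LVL_1
    print(d(x, y), d1(x, y), delta(x, y, 0), delta(x, y, 0.5))
```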