CoV2D Browser™

Parikh vectors
8QME_1 6BJY_1 8FIU_1 Letter Amino acid
25 0 10 N Asparagine
20 0 15 Q Glutamine
58 0 18 L Leucine
29 0 11 K Lysine
16 0 11 M Methionine
17 0 4 W Tryptophan
33 0 18 P Proline
40 0 15 V Valine
54 0 20 A Alanine
39 0 7 D Aspartic acid
13 0 4 C Cysteine
56 0 18 G Glycine
24 0 6 H Histidine
39 0 15 I Isoleucine
43 0 16 T Threonine
40 0 11 R Arginine
62 0 16 E Glutamic acid
36 0 4 F Phenylalanine
38 0 9 S Serine
29 0 4 Y Tyrosine
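
The Parikh vector of a word records how often each letter occurs in it. The following is a minimal Python sketch of that count; the function name `parikh_vector` and the constant `AMINO_ACIDS` (the one-letter codes in the row order of the table above) are illustrative assumptions, not the browser's own code.

```python
from collections import Counter

# One-letter amino acid codes, in the row order of the table above (assumed ordering).
AMINO_ACIDS = "NQLKMWPVADCGHITREFSY"

def parikh_vector(word, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each alphabet letter in `word`."""
    counts = Counter(word)
    # Letters outside the alphabet (for example the RNA letter U) are ignored.
    return [counts[letter] for letter in alphabet]

if __name__ == "__main__":
    # An all-U RNA word has no amino acid letters, hence a zero vector.
    print(parikh_vector("U" * 45))
    # First ten residues of the 8FIU_1 sequence.
    print(parikh_vector("MPIVQNLQGQ"))
```

When every letter of a word belongs to the alphabet, the entries of its Parikh vector sum to the word length; the 8QME_1 and 8FIU_1 columns above sum to 711 and 232 respectively.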

>8QME_1|Chains A, B, C, D|Beta-xylosidase|Geobacillus stearothermophilus (1422)
>6BJY_1|Chain A[auth R]|RNA (45-MER)|Vesicular stomatitis Indiana virus (11277)
>8FIU_1|Chains A, B, C|HIV-1 capsid|Human immunodeficiency virus 1 (11676)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8QME , Knot 276 711 0.85 40 299 659
MPTNLFFNAHHSPVGAFASFTLGFPGKSGGLDLELARPPRQNVFIGVESPHEPGLYHILPFAETAGEDESKRCDIENPDPNPQKPNILIPFAKEEIKREFCVATDTWKAGDLTFTIYSPVKAVPDPETAAEEELKLALVPAVIVEMTIDNTNGTRTRRAFFGFEGTDPYTSMRRIDDTCPQLRGVGQGRILGIASKDEGVRSALHFSMEDILTATLEENWTFGLGKVGALIADVPAGEKKTYQFAVCFYRGGCVTAGMDASYFYTRFFHNIEEVGLYALEQAEVLKEQAFRSNELIEKEWLSDDQKFMMAHAIRSYYGNTQLLEHEGKPIWVVNEGEYRMMNTFDLTVDQLFFELKMNPWTVKNVLDFYVERYSYEDRVRFPGDETEYPGGISFTHDMGVANTFSRPHYSSYELYGISGCFSHMTHEQLVNWVLCAAVYIEQTKDWAWRDRRLTILEQCLESMVRRDHPDPEKRNGVMGLDSTRTMGGAEITTYDSLDVSLGQARNNLYLTGKCWAAYVALEKLFRDVGKEELAALAREQAEKCAATIVSHVTEDGYIPAVMGEGNDSKIIPAIEGLVFPYFTNCHEALREDGRFGDYIRALRQHLQYVLREGICLFPDGGWKISSTSNNSWLSKIYLCQFIARRILGWEWDEQGKRADAAHVAWLTHPTLSIWSWSDQIIAGEISGSKYYPRGVTSILWLEEGEHHHHHH
6BJY , Knot 2 45 0.05 2 1 1
UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU
8FIU , Knot 107 232 0.83 40 159 224
MPIVQNLQGQMVHQCISPRTLNAWVKVVEEKAFSPEVIPMFSALSCGATPQDLNTMLNTVGGHQAAMQMLKETINEEAAEWDRLHPVHAGPIAPGQMREPRGSDIAGTTSTLQEQIGWMTHNPPIPVGEIYKRWIILGLNKIVRMYSPTSILDIRQGPKEPFRDYVDRFYKTLRAEQASQEVKNAATETLLVQNANPDCKTILKALGPGATLEEMMTACQGVGGPGHKARVL
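
The LZ-complexity column and its normalization by \(n/\log_{20} n\) can be sketched as follows. This assumes the exhaustive-history parsing of Lempel and Ziv (1976), in which each new component is the shortest word that has not yet occurred starting at an earlier position; the page may use a different variant, and the names `lz76` and `normalized_lz` are hypothetical.

```python
import math

def lz76(word):
    """Number of components in an LZ76-style exhaustive-history parsing of `word`."""
    i, components, n = 0, 0, len(word)
    while i < n:
        length = 1
        # Extend the candidate while it also occurs starting at some position < i.
        # The search window word[0 : i + length - 1] allows overlapping copies.
        while i + length <= n and word.find(word[i:i + length], 0, i + length - 1) != -1:
            length += 1
        components += 1
        i += length
    return components

def normalized_lz(complexity, n, alphabet_size=20):
    """Ratio LZ(w) / (n / log_b n) for alphabet size b; requires n > 1."""
    return complexity / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    print(lz76("U" * 45))                      # the all-U word parses as U | UUU...U
    print(round(normalized_lz(276, 711), 2))   # values taken from the 8QME row
```

Under this parsing the all-U word has two components, which agrees with the 6BJY row above.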

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
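
As an illustration of these definitions, the set of distinct subwords of a fixed length and its cardinality can be computed with a sliding window. The function names below are hypothetical, and this is a sketch rather than the browser's implementation.

```python
def subwords(word, k):
    """Set of distinct contiguous subwords (intervals) of length k in `word`."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}

def subword_complexity(word, k):
    """Number of distinct subwords of length k in `word`."""
    return len(subwords(word, k))

if __name__ == "__main__":
    # The all-U word has a single distinct subword of each length.
    print(subword_complexity("U" * 45, 2))
    # First ten residues of the 8FIU_1 sequence: all nine 2-letter windows differ.
    print(subword_complexity("MPIVQNLQGQ", 2))
```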

\(|P_{f(8QME_1)}(2) \setminus P_{f(6BJY_1)}(2)|=299\), \(|P_{f(6BJY_1)}(2) \setminus P_{f(8QME_1)}(2)|=1\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
110011101000111111010111110011101011011000111110010011100111110011000000001001010100101111110001000101100010110101010011011101001100010111111111010100001000001111101001000100100001010111010111110000110011010100110101000101111011111101111000000111010011010111010010001100100111011001011000110000110001100000111101100001000110001011111001000110010101001110101011010011010100000000101110000011110100011110010010000001011010100100001101110111010000011100001011000100110000101000011111000001111010000010101101000101010011101110011001100011111000100011011001000101111110100001111101111101000001100010110010110001001100110111011101000000011001010011100111101000100101101111001010110100011110101000010110011110010000000
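
The hydrophobic-polar string can be reproduced with a simple letter mapping. The hydrophobic set used below (A, F, G, I, L, M, P, V, W mapped to 1, everything else to 0) is an assumption inferred by aligning the start of the 0/1 string with the start of the 8QME sequence; it is not stated on the page, and the name `hydrophobic_polar` is hypothetical.

```python
# Residues mapped to 1; inferred from the first residues of 8QME (an assumption).
HYDROPHOBIC = set("AFGILMPVW")

def hydrophobic_polar(sequence):
    """Map each residue to '1' (hydrophobic) or '0' (polar or any other letter)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

if __name__ == "__main__":
    # First ten residues of the 8QME sequence.
    print(hydrophobic_polar("MPTNLFFNAH"))
```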
Pair \(Z_2\) Length of longest common substring
8QME_1,6BJY_1 300 0
8QME_1,8FIU_1 204 5
6BJY_1,8FIU_1 160 0
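
The two columns of this table can be sketched as follows. The sketch assumes that the last column measures contiguous common subwords, which is consistent with the small values 0 and 5 shown; the function names are hypothetical.

```python
def subwords(word, k):
    """Set of distinct contiguous subwords of length k in `word`."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}

def z_k(x, y, k):
    """Size of the symmetric difference of the length-k subword sets of x and y."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_substring_length(x, y):
    """Length of the longest contiguous word occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common run ending at the previous pair of positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

if __name__ == "__main__":
    protein_prefix = "MPIVQNLQGQ"  # first ten residues of 8FIU_1
    rna = "U" * 45
    print(z_k(protein_prefix, rna, 2))
    print(longest_common_substring_length(protein_prefix, rna))
```

Against an all-U word, \(Z_2\) is the number of distinct 2-letter subwords of the protein plus one (for UU), which matches 299 + 1 = 300 and 159 + 1 = 160 in the table.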

Newick tree

(8QME_1:14.72,(8FIU_1:80,6BJY_1:80):60.72);

Let \(d\) and \(d_1\) denote the Otu–Sayood distances of the same names. (Using \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{756}{\log_{20} 756}-\frac{45}{\log_{20} 45}\right)\approx 208.\)
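
The arithmetic in this estimate can be checked directly. In the sketch below the factors 0.85 and 0.8 and the lengths 756 and 45 are taken from the formula as given; note that 756 = 711 + 45, presumably the length of the concatenation of the two sequences. The helper name `lz_scale` is hypothetical.

```python
import math

def lz_scale(n, alphabet_size=20):
    """The scale n / log_b(n) against which LZ-complexity is compared (n > 1)."""
    return n / math.log(n, alphabet_size)

# Constants copied from the formula in the text.
expected_distance = 0.85 * 0.8 * (lz_scale(756) - lz_scale(45))

if __name__ == "__main__":
    print(round(expected_distance))
```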
Status Protein1 Protein2 d d1/2
Query variables 8QME_1 6BJY_1 275 138

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
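
The interpolation can be sketched in code under one stated assumption: that min and max are taken over the two complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\), following Otu and Sayood's definitions \(d=\max\) and \(d_1=\) sum of the two increments. That reading is consistent with the two cases displayed but is not spelled out on this page. The complexity \(c\) is taken to be an LZ76-style component count, and all function names are hypothetical.

```python
def lz76(word):
    """Number of components in an LZ76-style exhaustive-history parsing."""
    i, components, n = 0, 0, len(word)
    while i < n:
        length = 1
        # Extend while the candidate also occurs starting at an earlier position.
        while i + length <= n and word.find(word[i:i + length], 0, i + length - 1) != -1:
            length += 1
        components += 1
        i += length
    return components

def increments(x, y):
    """The pair c(xy) - c(x), c(yx) - c(y) (assumed ingredients of d and d1)."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)

def interpolate(alpha, a, b):
    """delta = alpha * min + (1 - alpha) * max for two increments a and b."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

def delta(alpha, x, y):
    """Interpolated distance between words x and y."""
    a, b = increments(x, y)
    return interpolate(alpha, a, b)

if __name__ == "__main__":
    print(delta(0, "aaaa", "bbbb"))      # alpha = 0 gives d
    print(delta(0.5, "aaaa", "bbbb"))    # alpha = 1/2 gives d1 / 2
    # Illustrative increments 1 and 275, chosen to be consistent with the table row.
    print(interpolate(0, 1, 275), interpolate(0.5, 1, 275))
```

With the illustrative increments 1 and 275, the two cases give 275 and 138, the values of \(d\) and \(d_1/2\) reported for the pair 8QME_1, 6BJY_1.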