CoV2D Browser™

Parikh vectors
7WSR_1 3BWZ_1 8XBQ_1 Letter Amino acid
15 6 11 H Histidine
52 14 25 I Isoleucine
38 10 13 K Lycine
44 6 16 F Phenylalanine
50 16 40 V Valine
66 15 75 A Alanine
28 4 15 M Methionine
33 7 40 P Proline
39 18 33 S Serine
17 1 5 W Tryptophan
20 3 25 R Arginine
11 1 6 C Cysteine
18 6 21 Q Glutamine
40 16 21 T Threonine
26 12 30 D Aspartic acid
18 8 24 E Glutamic acid
64 10 39 G Glycine
64 12 48 L Leucine
29 5 17 Y Tyrosine
18 11 24 N Asparagine

>7WSR_1|Chains A, B|Iron-phytosiderophore transporter|Hordeum vulgare (4513)
>3BWZ_1|Chain A|Cellulosomal scaffoldin adaptor protein B|Acetivibrio cellulolyticus (35830)
>8XBQ_1|Chain A|Benzoylformate decarboxylase-K2|Pseudomonas putida (303)
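
Each column of the Parikh vector table counts how often each amino-acid letter occurs in the corresponding sequence. A minimal sketch of that count follows; the function name `parikh_vector` and the constant `TABLE_ORDER` are illustrative and not taken from the site, and the sample input is only the last ten residues of 3BWZ_1.

```python
from collections import Counter

# Letter order used by the table above (H, I, K, F, ...).
TABLE_ORDER = "HIKFVAMPSWRCQTDEGLYN"

def parikh_vector(sequence, alphabet=TABLE_ORDER):
    """Return the number of occurrences of each letter of `alphabet` in `sequence`."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in alphabet}

# Tail of 3BWZ_1 (hypothetical sample input, not the full sequence).
tail = "ASLEHHHHHH"
vector = parikh_vector(tail)
print({letter: n for letter, n in vector.items() if n > 0})
```

Applied to a full FASTA sequence, the same function should reproduce one column of the table.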
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7WSR , Knot 266 690 0.84 40 276 638
MDIVAPDRTRIAPEIDRDEALEGDRESDPALASTREWQLEDMPRWQDELTVRGLVAALLIGFIYTVIVMKIALTTGLVPTLNVSAALLSFLALRGWTRLLERFGVVSRPFTRQENTIVQTCGVACYTIAFAGGFGSTLLGLNKKTYELAGDSPGNVPGSWKEPGIGWMTGFLLACSFGGLLTLIPLRQVLVVDYKLVYPSGTATAILINGFHTDQGDKNSRKQIRGFLKYFGGSFLWSFFQWFYTGGDACGFVQFPTFGLKAWKQTFYFDFSMTYVGAGMICPHIVNISTLLGAIISWGIMWPLISKNKGDWYPAKVPESSMKSLYGYKAFICIALIMGDGMYHFIKIVGITAMSMYRQFSHKQVNNKAKNADDTVSLEELHRQEIFKRGHIPSWMAYAGYALFSVLAVVTIPVMFKQVKWYYVVIAYVVAPMLGFANSYGTGLTDINMGYNYGKIALFVFAGWAGKENGVIAGLVAGTLVKQLVLISADLMQDFKTSYLTQTSPKSMMIAQVVGTAMGCIVSPLTFMLFYKAFDIGNPDGTWKAPYALIYRNMAILGVEGFSVLPKYCIVISGGFFAFAAILSITRDVMPHKYAKYVPLPMAMAVPFLVGGSFAIDMCLGSLIVFAWTKINKKEAGFMVPAVASALICGDGIWTFPASILALAKIKPPICMKFLPAATSAAHHHHHHHH
3BWZ , Knot 84 181 0.80 40 126 167
MAPTSSIEIVLDKTTASVGEIVTASINIKNITNFSGCQLNMKYDPAVLQPVTSSGVAYTKSTMPGAGTILNSDFNLRQVADNDLEKGILNFSKAYVSLDDYRTAAAPEQTGTVAVVKFKVLKEETSSISFEDTTSVPNAIDGTVLFDWNGDRIQSGYSVIQPAVINLDMIKASLEHHHHHH
8XBQ , Knot 211 528 0.83 40 240 483
MASVHGTTYELLRRQGIDTVFGNPGFNELPFLKDFPEDFRYILALQEACVVGIADGYAQASRKPAFINLHSAAGTGNAMGALSNARTSHSPLIVTAGQQTRAMIGVEAGETNVDAANLPRPLVKWSYEPASAAEVPHAMSRAIHMASMAPQGPVYLSVPYDDWDKDADPQSHHLFDRHVSSSVRLNDQDLDILVKALNSASNPAIVLGPDVDAANANADCVMLAERLKAPVWVAPSAPRCPFPTRHPCFRGLMPAGIAAISQLLEGHDVVLVIGAPVFRYYQYDPGQYLKPGTRLISVTCDPLEAARAPMGDAIVADIGAMASALANLVEESSRQLPTAAPEPAKVDQDAGRLHPETVFDTLNDMAPENAIYLNESTSTTAQMWQRLNMRNPGSYYFCAAGGLGFALPAAIGVQLAEPERQVIAVIGDGSANYSISALWTAAQYNIPTIFVIMNNGTYGMLRRFAGVLEAENVPGLDVPGIDFRALAKGYGVQALKADNLEQLKGSLQEALSAKGPVLIEVSTVSPVK
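
The fourth column normalizes the LZ-complexity by \(n/\log_{20} n\). The sketch below recomputes it from the LZ-complexity and length columns. The helper names are illustrative; truncating to two decimals rather than rounding is an assumption that happens to reproduce the three tabulated values.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))

def truncate2(x):
    """Truncate (not round) a positive number to two decimals."""
    return math.floor(x * 100) / 100

# (code, LZ-complexity, length) as listed in the table above.
rows = [("7WSR", 266, 690), ("3BWZ", 84, 181), ("8XBQ", 211, 528)]
for code, lz, n in rows:
    print(code, truncate2(normalized_lz(lz, n)))
```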

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
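
As a minimal sketch of these definitions, the set of distinct subwords of a fixed length and its cardinality can be computed with a sliding window. The function names `subwords` and `subword_count` are illustrative, not the site's.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_count(w, k):
    """Number of distinct subwords of length k in w."""
    return len(subwords(w, k))

# Toy example: "ABAB" has two distinct subwords of length 2.
print(sorted(subwords("ABAB", 2)), subword_count("ABAB", 2))
```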

\(|P_{f(7WSR_1)}(2) \setminus P_{f(3BWZ_1)}(2)|=174\), \(|P_{f(3BWZ_1)}(2) \setminus P_{f(7WSR_1)}(2)|=24\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of sequence 1 (7WSR_1):
101111000011101000011010000011110000101001101000101011111111111001111011100111101010111101111011001100111100110000001100011100011111111001111000000111001101110100111111011111001111101111001111000110101010111101100001000000010111001110111011011001101011101101110110001010101001111110101101001111110111111110000101011011000100101001110111111011001101111011010001000010001001000101001000011001011011101101110111110111110010100111101111111110001011001011000101111111111100011111111101100111101011001000010000100111101110111011011011110011011010101011011100011111101101110001110111111111101000111000100111111111111111011101011011111100100001111111110111010111011101111101011101011111001100000000

Pair \(Z_2\) Length of longest common subword
7WSR_1,3BWZ_1 198 6
7WSR_1,8XBQ_1 156 4
3BWZ_1,8XBQ_1 182 4
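
The two columns of this table can be sketched as follows. \(Z_k\) is the size of the symmetric difference of the two sets of length-\(k\) subwords, so for the first pair \(Z_2 = 174 + 24 = 198\). The last column is read here as the length of the longest common contiguous subword; that reading is an assumption, suggested by the small values (the His-tag `HHHHHH` is shared by 7WSR_1 and 3BWZ_1). Function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block shared by x and y (dynamic programming)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1  # extend the common block ending here
                best = max(best, cur[j])
        prev = cur
    return best

# Toy inputs (hypothetical): two sequences ending in His-tags of different lengths.
print(z_distance("AAB", "ABB", 2))
print(longest_common_subword_length("KFLPHHHHHHHH", "SLEHHHHHH"))
```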

Newick tree

 
[
	3BWZ_1:10.13,
	[
		7WSR_1:78,8XBQ_1:78
	]:22.13
]

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{871 }{\log_{20} 871}-\frac{181}{\log_{20}181})\approx 191\), where \(871=690+181\) is the combined length of the two sequences.
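
The arithmetic behind this estimate can be checked directly. In the sketch below the constants 0.85 and 0.8 are taken as-is from the formula; the helper name `lz_scale` is illustrative, and 871 is taken to be the sum of the two sequence lengths 690 and 181.

```python
import math

def lz_scale(n, alphabet_size=20):
    """The scale n / log_{alphabet_size}(n) used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

n1, n2 = 690, 181  # lengths of 7WSR_1 and 3BWZ_1
expected = 0.85 * 0.8 * (lz_scale(n1 + n2) - lz_scale(n2))
print(round(expected))  # 191
```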
Status Protein1 Protein2 d d1/2
Query variables 7WSR_1 3BWZ_1 242 150.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
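
A minimal sketch of how \(\delta\) interpolates between the two distances, assuming the usual Otu–Sayood definitions \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1(S,Q)=c(SQ)-c(S)+c(QS)-c(Q)\), with \(c\) an LZ76 phrase count. The `lz76` function below is one common exhaustive-history variant and may differ in detail from the one used to produce the tables; all function names are illustrative.

```python
def lz76(s):
    """Number of phrases in an LZ76 (exhaustive history) parsing of s."""
    n, i, phrases = len(s), 0, 0
    while i < n:
        length = 1
        # Grow the phrase while it still occurs starting at an earlier position.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def otu_sayood_terms(s, q):
    """The two one-sided terms c(SQ) - c(S) and c(QS) - c(Q)."""
    return lz76(s + q) - lz76(s), lz76(q + s) - lz76(q)

def delta(s, q, alpha):
    """alpha * min + (1 - alpha) * max of the two one-sided terms."""
    a, b = otu_sayood_terms(s, q)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

s, q = "AAAAAA", "ABAB"   # toy inputs, not database sequences
print(delta(s, q, 0))     # equals d under the assumed definition
print(delta(s, q, 0.5))   # equals d_1 / 2 under the assumed definition
```

With the query values above, \(d=242\) and \(d_1/2=150.5\) would correspond to one-sided terms 242 and 59 under these assumed definitions.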