CoV2D Browser™

Parikh vectors
2VYE_1 1WPS_1 4JHV_1 Letter Amino acid
40 17 7 E Glutamic acid
15 2 41 P Proline
32 10 36 S Serine
31 11 38 V Valine
34 9 16 R Arginine
21 3 29 N Asparagine
35 9 27 I Isoleucine
13 3 6 M Methionine
21 8 38 T Threonine
1 2 7 W Tryptophan
42 12 45 A Alanine
27 3 40 D Aspartic acid
24 2 20 Q Glutamine
24 17 40 G Glycine
12 4 15 Y Tyrosine
2 1 5 C Cysteine
4 7 18 H Histidine
43 18 36 L Leucine
20 7 7 K Lysine
13 2 25 F Phenylalanine
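Each row above is one coordinate of the Parikh vector (letter counts) of the three sequences; each column sums to the sequence length (454, 147 and 496). A minimal Python sketch of that count, assuming the 20-letter alphabet in the row order of the table (the function name is illustrative):

```python
from collections import Counter

# The 20 standard amino-acid letters, in the row order of the table above.
AMINO_ACIDS = "EPSVRNIMTWADQGYCHLKF"


def parikh_vector(w: str) -> list[int]:
    """Parikh vector of w: how many times each amino-acid letter occurs."""
    counts = Counter(w)
    return [counts[a] for a in AMINO_ACIDS]


if __name__ == "__main__":
    # Prefix of the 2VYE_1 sequence, used only as a small example.
    for letter, k in zip(AMINO_ACIDS, parikh_vector("MSELFSERIPP")):
        if k:
            print(letter, k)
```

For a full sequence over these 20 letters, sum(parikh_vector(w)) equals len(w).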

>2VYE_1|Chains A, B|REPLICATIVE DNA HELICASE|GEOBACILLUS KAUSTOPHILUS HTA426 (235909)
>1WPS_1|Chains A, B|Hut operon positive regulatory protein|Bacillus subtilis (1423)
>4JHV_1|Chain A|LACCASE|Coriolopsis caperata (195176)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2VYE , Knot 180 454 0.80 40 224 427
MSELFSERIPPQSIEAEQAVLGAVFLDPTALTLASERLIPEDFYRAAHQKIFHAMLRVADKGEPVDLVTVTAELAALEQLEEVGGVSYLSELADSVPTAANVEYYARIVEEKSLLRRLIRTATSIAQDGYTREDEIDVLLDEAERKIMEVSQRKHSGAFKNIKDVLVQTYDNIEMLHNRNGDITGIPTGFTELDRMTSGFQRSDLIIVAARPSVGKTAFALNIAQNVATKTNENVAIFSLEMSAQQLVMRMLCAEGNINAQNLRTGKLTPEDWGKLTMAMGSLSNAGIYIDDTPSIRVSDIRAKCRRLKQESGLGMVVIDYLQLIQGSGRNRENRQQEVSEISRSLKALARELEVPVIALSQLSRSVEQRQDKRPMMSDLRESGSIEQDADIVAFLYRDDYYNKDSENKNIIEIIIAKQRNGPVGTVQLAFIKEYNKFVNLERRFDEAQIPPGA
1WPS , Knot 71 147 0.80 40 108 143
TLHKERRIGRLSVLLLLNEAEESTQVEELERDGWKVCLGKVGSMDAHKVIAAIETASKKSGVIQSEGYRESHALYHATMEALHGVTRGEMLLGSLLRTVGLRFAVLRGNPYESEAEGDWIAVSLYGTIGAPIKGLEHETFGVGINHI
4JHV , Knot 198 496 0.82 40 224 459
GIGPVTDLTISDGPVSPDGFTRQAILVNNQFPSPLITGNKGDRFQLNVIDNMNNHTMLKSTSIHWHGFFQHGTNWADGPAFVNQCPISPGHSFLYDFQVPDQAGTFWYHSHLSTQYCDGLRGPIVVYDPQDPHKDLYDVDDDSTVITLADWYHLAAKVGPAVPTADATLINGLGRSISTLNADLAVISVTKGKRYRFRLVSLSCDPNHTFSIDGHTMTVIEADSVNLKPQVVDSIQIFAAQRYSFVLNADQDIGNYWIRAMPNSGTRNFDGGVNSAILRYDGADPVEPTTTQTPSSQPLVESALTTLEGTAAPGSPTAGGVDLAINMAFGFAGGRFTINGASFTPPTVPVLLQILSGAQNAQDLLPTGSVYSLPANADIEISLPATAAAPGFPHPFHLHGHTFAVVRSAGSSTYNYANPVYRDVVSTGSPGDNVTIRFRTDNPGPWFLHCHIDFHLEAGFAVVMAEDIPDVAAVNPVPQAWSDLCPTYNALDPNDQ
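The column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) compares the LZ-complexity with \(n/\log_{20} n\), the growth rate of LZ76 complexity for a random word over 20 letters. The Python sketch below uses one common LZ76 (exhaustive history) parsing; whether it reproduces the counts 180, 71 and 198 exactly is not checked here. The displayed ratios appear to be truncated rather than rounded to two decimals (e.g. \(180/(454/\log_{20}454)\approx 0.8097\) is shown as 0.80), so the sketch truncates as well. Function names are illustrative.

```python
import math


def lz76(w: str) -> int:
    """Number of phrases in an LZ76 (exhaustive history) parsing of w.

    Each phrase is the shortest prefix of the remaining text that does not
    occur as a subword of everything before its last letter.
    """
    n, i, c = len(w), 0, 0
    while i < n:
        j = i + 1
        # Extend w[i:j] while it can still be copied from earlier text
        # (the copy source may overlap the phrase itself).
        while j <= n and w[i:j] in w[: j - 1]:
            j += 1
        c += 1  # w[i:j] is a new phrase (the last one may be incomplete)
        i = j
    return c


def normalized_lz(lz: int, n: int) -> float:
    """LZ(w) / (n / log_20 n), truncated to two decimals."""
    return math.floor(100 * lz / (n / math.log(n, 20))) / 100


if __name__ == "__main__":
    print(lz76("0001101001000101"))  # phrases 0|001|10|100|1000|101
    print(normalized_lz(180, 454), normalized_lz(71, 147), normalized_lz(198, 496))
```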

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
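For example, \(P_{\mathtt{abab}}(2)=\{\mathtt{ab},\mathtt{ba}\}\), so \(p_{\mathtt{abab}}(2)=2\). A direct Python sketch of \(P_w(n)\) and \(p_w(n)\) (the function names are illustrative):

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the set of distinct length-n subwords (intervals) of w."""
    return {w[i : i + n] for i in range(len(w) - n + 1)}


def p(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))


if __name__ == "__main__":
    print(sorted(subwords("abab", 2)), p("abab", 2))
```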

\(|P_{f(2VYE_1)}(2) \setminus P_{f(1WPS_1)}(2)|=149\) and \(|P_{f(1WPS_1)}(2) \setminus P_{f(2VYE_1)}(2)|=33\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\); that is, \(Z_k(x,y)\) is the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\).

Hydrophobic-polar version of Sequence 1 (2VYE_1): 1001100011100101001111111101011011000111001001100011011101100101101101010111100100111100100110011011010001011000011001100100110010000001011100100011010000001110010011100000101100001010111011001001001100001111110101100111101100110000001111010101001110110101010100100101010011010111101001110100010101001010000100001111111001011010100000000010010001011100101111110010001000000011100100010100010111110000000000000011011110000111101011110000011010001001011111
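The two counts above are consistent with the table: both give \(|P_{f(2VYE_1)}(2)\cap P_{f(1WPS_1)}(2)|=224-149=108-33=75\), and \(Z_2=149+33=182\). Below is a Python sketch of \(Z_k\) as a symmetric-difference size, together with a hydrophobic-polar encoding. The hydrophobic set is an assumption. Coding A, F, G, I, L, M, P, V as 1 and D, E, H, K, Q, R, S, T, Y as 0 agrees with the start of the 0/1 string above, but treating C and W as hydrophobic and N as polar is a guess. Function names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k subwords of w."""
    return {w[i : i + k] for i in range(len(w) - k + 1)}


def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(subwords(x, k) ^ subwords(y, k))


# Assumed hydrophobic residues; C and W (hydrophobic) and N (polar) are guesses.
HYDROPHOBIC = frozenset("ACFGILMPVW")


def hp(w: str) -> str:
    """Hydrophobic-polar word: 1 for a hydrophobic residue, 0 otherwise."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in w)


if __name__ == "__main__":
    print(z("abab", "abc", 2))  # "ba" and "bc" are unshared, so 2
    print(hp("MSELFSERIPP"))  # matches the first 11 digits above
```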
Pair \(Z_2\) Length of longest common subword (contiguous)
2VYE_1,1WPS_1 182 4
2VYE_1,4JHV_1 154 4
1WPS_1,4JHV_1 180 3
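Values as small as 3 and 4 only make sense for contiguous subwords. By the Parikh table, 2VYE_1 and 1WPS_1 each contain at least 18 copies of L, so they share a scattered subsequence of length at least 18. A standard dynamic-programming sketch in Python for the longest common contiguous subword, in \(O(|x|\,|y|)\) time (the function name is illustrative):

```python
def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous subword occurring in both x and y."""
    best = 0
    # prev[j]: length of the longest common suffix of x[:i-1] and y[:j]
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the common suffix by one letter
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(longest_common_subword("ABCDXYZ", "XYZABCD"))  # ABCD, so 4
```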

Newick tree

 
(1WPS_1:94.57,(2VYE_1:77,4JHV_1:77):17.57);

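Let \(d\) and \(d_1\) be the Otu–Sayood distances: for sequences \(x\) and \(y\), \(d=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\) and \(d_1=(\mathrm{LZ}(xy)-\mathrm{LZ}(x))+(\mathrm{LZ}(yx)-\mathrm{LZ}(y))\).
(The distance \(d_1\) makes the 4TYN sequence AAAAAA a close match...)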
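A Python sketch of \(d\) and \(d_1/2\). It assumes that \(d\) is the larger and \(d_1\) the sum of the two complexity gains \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), as in Otu and Sayood's paper, and it uses one common LZ76 parsing for \(\mathrm{LZ}\). Neither choice is confirmed to match the browser's implementation exactly. Function names are illustrative.

```python
def lz76(w: str) -> int:
    """Number of phrases in an LZ76 (exhaustive history) parsing of w."""
    n, i, c = len(w), 0, 0
    while i < n:
        j = i + 1
        while j <= n and w[i:j] in w[: j - 1]:  # phrase is still copyable
            j += 1
        c += 1
        i = j
    return c


def otu_sayood(x: str, y: str) -> tuple[int, float]:
    """Return (d, d1/2) for sequences x and y (assumed definitions)."""
    gain_xy = lz76(x + y) - lz76(x)  # new phrases when y follows x
    gain_yx = lz76(y + x) - lz76(y)  # new phrases when x follows y
    d = max(gain_xy, gain_yx)
    d1 = gain_xy + gain_yx
    return d, d1 / 2


if __name__ == "__main__":
    print(otu_sayood("ab", "cd"))  # (2, 2.0)
```

Since \(d\) is a maximum and \(d_1/2\) an average of the same two gains, \(d\ge d_1/2\) always, as in the table row (165 and 108).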
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{601}{\log_{20} 601}-\frac{147}{\log_{20}147}\right)\approx 131\), where \(601=454+147\) is the length of the concatenation of 2VYE_1 and 1WPS_1.
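The estimate can be checked numerically. In this Python sketch, 0.8 is read as the typical normalized LZ ratio from the table above and 0.85 is kept as the given constant; the function names are illustrative.

```python
import math


def scaled_lz(n: int) -> float:
    """n / log_20 n, the normalizing term used for LZ(w)."""
    return n / math.log(n, 20)


def expected_distance(len_long: int, len_short: int) -> float:
    """(0.85)(0.8)(m/log_20 m - s/log_20 s) with m = len_long + len_short, s = len_short."""
    total = len_long + len_short  # 454 + 147 = 601
    return 0.85 * 0.8 * (scaled_lz(total) - scaled_lz(len_short))


if __name__ == "__main__":
    print(round(expected_distance(454, 147)))  # 131
```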
Status Protein1 Protein2 d d1/2
Query variables 2VYE_1 1WPS_1 165 108

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
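For the row 2VYE_1, 1WPS_1, \(d=165\) is the max and \(d_1=2\cdot 108=216\) is min + max, so min \(=216-165=51\). Below is a small Python check that \(\delta\) interpolates between these values. It reads min and max as the smaller and larger of the two one-sided complexity gains, an assumption consistent with the two cases shown. The function name is illustrative.

```python
def delta(alpha: float, lo: float, hi: float) -> float:
    """delta = alpha * min + (1 - alpha) * max."""
    return alpha * lo + (1 - alpha) * hi


if __name__ == "__main__":
    hi = 165           # d for 2VYE_1, 1WPS_1
    lo = 2 * 108 - hi  # d1 - d = 51
    print(delta(0, lo, hi), delta(0.5, lo, hi))  # 165 108.0
```

At \(\alpha=1\) the same formula gives the min, 51.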