CoV2D Browser™

Parikh vectors
3PZW_1 8WGJ_1 2VJU_1 Letter Amino acid
52 0 14 V Valine
53 1 6 G Glycine
61 2 11 S Serine
46 0 7 T Threonine
14 2 3 W Tryptophan
40 2 10 N Asparagine
4 6 3 C Cysteine
27 0 8 Q Glutamine
56 2 9 E Glutamic acid
86 3 12 L Leucine
30 0 4 F Phenylalanine
51 3 5 P Proline
37 1 8 R Arginine
46 0 5 D Aspartic acid
25 0 5 H Histidine
54 0 12 I Isoleucine
56 1 9 A Alanine
49 0 18 K Lysine
15 0 4 M Methionine
37 1 6 Y Tyrosine
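
The counts in this table can be reproduced with a few lines of Python. The following is a minimal sketch, not the site's own code: the names `ALPHABET` and `parikh_vector` are hypothetical, and the letter order is simply the row order of the table above.

```python
from collections import Counter

# Row order of the table above (an assumption made for illustration).
ALPHABET = "VGSTWNCQELFPRDHIAKMY"

def parikh_vector(sequence, alphabet=ALPHABET):
    """Return how often each letter of `alphabet` occurs in `sequence`."""
    counts = Counter(sequence)
    # Counter returns 0 for letters that never occur.
    return [counts[letter] for letter in alphabet]

# 8WGJ_1, copied from the sequence table on this page.
t2 = "GCCPPCARSPNNECSEWLLLWCYC"
for letter, count in zip(ALPHABET, parikh_vector(t2)):
    print(letter, count)
```

The entries of a Parikh vector always sum to the length of the word, which gives a quick consistency check against the Length column.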

>3PZW_1|Chain A|Seed lipoxygenase-1|Glycine max (3847)
>8WGJ_1|Chain A|T2|Phage #D (77920)
>2VJU_1|Chains A, B|TRANSPOSASE ORFA|HELICOBACTER PYLORI (210)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3PZW , Knot 319 839 0.85 40 305 778
MFSAGHKIKGTVVLMPKNELEVNPDGSAVDNLNAFLGRSVSLQLISATKADAHGKGKVGKDTFLEGINTSLPTLGAGESAFNIHFEWDGSMGIPGAFYIKNYMQVEFFLKSLTLEAISNQGTIRFVCNSWVYNTKLYKSVRIFFANHTYVPSETPAPLVEYREEELKSLRGNGTGERKEYDRIYDYDVYNDLGNPDKSEKLARPVLGGSSTFPYPRRGRTGRGPTVTDPNTEKQGEVFYVPRDENLGHLKSKDALEIGTKSLSQIVQPAFESAFDLKSTPIEFHSFQDVHDLYEGGIKLPRDVISTIIPLPVIKELYRTDGQHILKFPQPHVVQVSQSAWMTDEEFAREMIAGVNPCVIRGLEEFPPKSNLDPAIYGDQSSKITADSLDLDGYTMDEALGSRRLFMLDYHDIFMPYVRQINQLNSAKTYATRTILFLREDGTLKPVAIELSLPHSAGDLSAAVSQVVLPAKEGVESTIWLLAKAYVIVNDSCYHQLMSHWLNTHAAMEPFVIATHRHLSVLHPIYKLLTPHYRNNMNINALARQSLINANGIIETTFLPSKYSVEMSSAVYKNWVFTDQALPADLIKRGVAIKDPSTPHGVRLLIEDYPYAADGLEIWAAIKTWVQEYVPLYYARDDDVKNDSELQHWWKEAVEKGHGDLKDKPWWPKLQTLEDLVEVCLIIIWIASALHAAVNFGQYPYGGLIMNRPTASRRLLPEKGTPEYEEMINNHEKAYLRTITSKLPTLISLSVIEILSTHASDEVYLGQRDNPHWTSDSKALQAFQKFGNKLKEIEEKLVRRNNDPSLQGNRLGPVQLPYTLLYPSSEEGLTFRGIPNSISI
8WGJ , Knot 15 24 0.66 22 22 22
GCCPPCARSPNNECSEWLLLWCYC
2VJU , Knot 77 159 0.81 40 125 155
GSAMASNAVLYKSNHNVVYSCKYHIVWCPKYRRKVLVGAVEMRLKEIIQEVAKELRVEIIEMQTDKDHIHILADIDPSFGVMKFIKTAKGRSSRILRQEFNHLKTKLPTLWTNSCFISTVGGAPLNVVKQYIENQQNSNRPKQKEKWKSYVDNLQTKAL
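
The page does not say which Lempel-Ziv variant lies behind the LZ-complexity column. The sketch below assumes the 1976 exhaustive-history parse, in which each new phrase is the shortest prefix of the remaining text that has not occurred earlier, and a final incomplete phrase is counted. Under that assumption it yields 15 for the 24-letter sequence 8WGJ_1, in agreement with the table. The function names `lz76` and `normalized_lz` are hypothetical.

```python
import math

def lz76(w):
    """Number of phrases in the LZ76 exhaustive-history parse of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it already occurs in the text that
        # precedes its last letter (overlapping occurrences are allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(w, alphabet_size=20):
    """LZ(w) divided by n / log_20(n), as in the table header."""
    n = len(w)
    return lz76(w) / (n / math.log(n, alphabet_size))

t2 = "GCCPPCARSPNNECSEWLLLWCYC"  # 8WGJ_1, copied from this page
print(lz76(t2), round(normalized_lz(t2), 3))
```

This direct implementation uses substring search and is quadratic or worse in the sequence length; that is adequate for sequences of a few hundred residues.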

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).

\(|P_{f(3PZW_1)}(2) \setminus P_{f(8WGJ_1)}(2)|=293\), \(|P_{f(8WGJ_1)}(2) \setminus P_{f(3PZW_1)}(2)|=10\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
11011001010111110001010101011001011110010101101001010101011000110110001101111001101010101011111110100010101110010101100010101100011000010001011110000110001111100000010010101010000000100001000110100000110111110001101001001011010010000010110110000110100001101100010011011100110100011010010010010011101100110011111110010000100110110101101000111000011001111101011011001110001011101000001010010101001001110001111000011110100100100100010001111000101011110101100110101110011111001100011111010111000000011001100011101111100001011011001101000001010111000110101110001110000101001100011100011110110011110010010110111000101101101111100110001110010000100000100110011001010100011110100100110101111111011011101100101111100101000111001010000110000010100100011011010110110001000101100001010000011011001100100100011000001010100111101100110100001101011100101
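
The page does not state which residues it treats as hydrophobic. Aligning the binary string above with the start of the 3PZW_1 sequence suggests that A, F, G, I, L, M, P, V and W map to 1 and all other residues to 0. The sketch below encodes that inferred mapping; the set `HYDROPHOBIC` and the function `hydrophobic_polar` are hypothetical names, and the mapping is an assumption read off from the data rather than a documented rule.

```python
# Residues mapped to 1; inferred by aligning the binary string with the
# first 140 residues of 3PZW_1 (an assumption, not a documented rule).
HYDROPHOBIC = set("AFGILMPVW")

def hydrophobic_polar(sequence):
    """Encode a protein sequence as a 0/1 string (1 = hydrophobic)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First 40 residues of 3PZW_1, copied from the sequence table.
prefix = "MFSAGHKIKGTVVLMPKNELEVNPDGSAVDNLNAFLGRSV"
print(hydrophobic_polar(prefix))
```

Note that under this inferred mapping cysteine and tyrosine count as polar, which differs from some other hydrophobic-polar conventions.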
Pair \(Z_2\) Length of longest common subword (contiguous)
3PZW_1,8WGJ_1 303 3
3PZW_1,2VJU_1 224 5
8WGJ_1,2VJU_1 139 2
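
The quantities \(p_w(k)\) and \(Z_k(x,y)\) follow directly from the set definitions above. The following is a minimal sketch with hypothetical function names; the last column of the table is interpreted here as the longest common contiguous subword, which is an assumption based on the small values shown.

```python
def subwords(w, k):
    """P_w(k): the set of distinct length-k subwords (intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """p_w(k): the cardinality of P_w(k)."""
    return len(subwords(w, k))

def z_distance(x, y, k):
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x, y):
    """Length of the longest contiguous subword shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common suffix ending at the previous letters.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

t2 = "GCCPPCARSPNNECSEWLLLWCYC"  # 8WGJ_1, copied from this page
print(subword_complexity(t2, 2), subword_complexity(t2, 3))
print(z_distance("AAB", "ABB", 2))
```

For 8WGJ_1 this gives \(p_w(2)=p_w(3)=22\), matching the sequence table. The symmetric difference is the numerator of the Jaccard distance; dividing by the size of the union of the two sets would give the distance itself.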

Newick tree

 
(
	3PZW_1:14.50,
	(
		2VJU_1:69.5,8WGJ_1:69.5
	):79.00
);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{863}{\log_{20} 863}-\frac{24}{\log_{20}24})=244.\)
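
The arithmetic in this estimate can be checked numerically. The sketch below takes the factors 0.85 and 0.8 and the lengths 863 and 24 exactly as printed in the formula; the function name `expected_distance` and its parameter names are hypothetical.

```python
import math

def expected_distance(n1, n2, c1=0.85, c2=0.8, alphabet_size=20):
    """Evaluate c1 * c2 * (n1 / log_b(n1) - n2 / log_b(n2)) with b = alphabet_size."""
    def scale(n):
        return n / math.log(n, alphabet_size)
    return c1 * c2 * (scale(n1) - scale(n2))

# Lengths as printed in the formula above.
print(expected_distance(863, 24))
```

The result lies between 244 and 245, so the printed value 244 corresponds to truncating rather than rounding.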
Status Protein1 Protein2 d d1/2
Query variables 3PZW_1 8WGJ_1 312 160.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
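
The interpolation \(\delta\) can be sketched in code under two assumptions that this page does not spell out. The first is that min and max range over the two Otu–Sayood increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), so that \(d\) is their maximum and \(d_1\) their sum. The second is that LZ is the LZ76 exhaustive-history phrase count. The names `lz76` and `otu_sayood_delta` are hypothetical.

```python
def lz76(w):
    """Number of phrases in the LZ76 exhaustive-history parse of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it already occurs earlier in the text.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def otu_sayood_delta(x, y, alpha=0.0):
    """alpha * min + (1 - alpha) * max of the two LZ increments.

    alpha = 0 gives d (the maximum); alpha = 1/2 gives d_1 / 2 (the mean).
    """
    a = lz76(x + y) - lz76(x)  # new phrases needed for y after seeing x
    b = lz76(y + x) - lz76(y)  # new phrases needed for x after seeing y
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

x = "GCCPPCARSPNNECSEWLLLWCYC"  # 8WGJ_1, copied from this page
y = "STIWLLAKAYV"               # a short fragment of 3PZW_1
print(otu_sayood_delta(x, y, 0.0), otu_sayood_delta(x, y, 0.5))
```

Whether this reproduces the values 312 and 160.5 reported for the query pair depends on the site using the same LZ variant, which has not been verified here.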