CoV2D BrowserTM

Parikh vectors
2PQI_1 4TNS_1 5LSY_1 Letter Amino acid
19 8 15 A Alanine
9 10 13 Q Glutamine
8 21 18 S Serine
9 2 16 N Asparagine
2 2 17 C Cysteine
12 12 24 E Glutamic acid
18 17 22 G Glycine
11 7 9 I Isoleucine
10 7 13 F Phenylalanine
20 5 11 T Threonine
15 5 14 D Aspartic acid
20 11 18 L Leucine
11 6 10 P Proline
5 1 11 Y Tyrosine
21 5 13 V Valine
12 11 23 R Arginine
5 10 11 H Histidine
23 6 26 K Lysine
7 4 8 M Methionine
6 1 3 W Tryptophan
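
Each column of the table above counts how often every letter occurs in one sequence. A minimal Python sketch of that count follows; the name `parikh_vector` and the `ALPHABET` ordering (copied from the row order of the table) are illustrative assumptions, not part of the CoV2D code.

```python
from collections import Counter

# Row order of the table above (assumed to be the site's display order).
ALPHABET = "AQSNCEGIFTDLPYVRHKMW"

def parikh_vector(seq: str) -> dict[str, int]:
    """Return the number of occurrences of each amino acid letter in seq."""
    counts = Counter(seq)
    return {letter: counts.get(letter, 0) for letter in ALPHABET}

# First 20 residues of the 4TNS_1 sequence shown on this page.
prefix = "MGSSHHHHHHSSGLVPRGSH"
vec = parikh_vector(prefix)
print(vec["H"], vec["S"], vec["G"])
```

The entries of a Parikh vector always sum to the length of the sequence, which gives a quick sanity check on any column.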

>2PQI_1|Chains A, B, C|Ribosome-inactivating protein 3|Zea mays (4577)
>4TNS_1|Chains A, B|Peptidyl-prolyl cis-trans isomerase NIMA-interacting 1|Homo sapiens (9606)
>5LSY_1|Chain A|Histone-lysine N-methyltransferase SETD2|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2PQI , Knot 114 243 0.86 40 180 237
MKFTEIFPVEDANYPYSAFIASVRKDVIKHCTDHKGIFQPVLPPEKKVPELWLYTELKTRTSSITLAIRMDNLYLVGFRTPGGVWWEFGKDGDTHLLGDNPRWLGFGGRYQDLIGNKGLETVTMGRAEMTRAVNDLAKKKKMLEPQADTKSKLVKLVVMVCEGLRFNTVSRTVDAGFNSQHGVTLTVTQGKQVQKWDRISKAAFEWADHPTAVIPDMQKLGIKDKNEAARIVALVKNQTTAAA
4TNS , Knot 71 151 0.78 40 109 145
MGSSHHHHHHSSGLVPRGSHMLEVLFQGPGSGGKNGQGEPARVRCSHLLVKHSQSRRPSSWRQEQITRTQEEALELINGYIQKIKSGEEDFESLASQFSDCSSAKARGDLGAFSRGQMQKPFEDASFALRTGEMSGPVFTDSGIHIILRTE
5LSY , Knot 130 295 0.83 40 195 282
MHHHHHHSSGRENLYFQGETSVPPGSALVGPSCVMDDFRDPQRWKECAKQGKMPCYFDLIEENVYLTERKKNKSHRDIKRMQCECTPLSKDERAQGEIACGEDCLNRLLMIECSSRCPNGDYCSNRRFQRKQHADVEVILTEKKGWGLRAAKDLPSNTFVLEYCGEVLDHKEFKARVKEYARNKNIHYYFMALKNDEIIDATQKGNCSRFMNHSCEPNCETQKWTVNGQLRVGFFTTKLVPSGSELTFDYQFQRYGKEAQKCFCGSANCRGYLGGENRVSIRAAGGKMKKERSRK
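
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below is a check of that arithmetic only; the helper names are hypothetical, and truncating (rather than rounding) to two decimals is an assumption that happens to reproduce the three displayed values.

```python
import math

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_k(n), where k is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

def truncate2(x: float) -> float:
    """Truncate (not round) to two decimals."""
    return math.floor(x * 100) / 100

# (LZ-complexity, length) pairs copied from the table above.
rows = {"2PQI": (114, 243), "4TNS": (71, 151), "5LSY": (130, 295)}
ratios = {code: truncate2(normalized_lz(lz, n)) for code, (lz, n) in rows.items()}
print(ratios)
```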

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
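
As an illustration of these definitions, here is a minimal Python sketch that reads \(P_w(n)\) as the set of contiguous length-\(n\) factors of \(w\). The function names are hypothetical, and the toy word `abab` is not taken from the page.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the set of distinct length-n subwords (contiguous factors) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w: str, n: int) -> int:
    """p_w(n): the number of distinct length-n subwords of w."""
    return len(subwords(w, n))

word = "abab"
print(sorted(subwords(word, 2)), subword_complexity(word, 2))
```

For a word of length \(|w|\) there are at most \(|w|-n+1\) distinct subwords of length \(n\), so \(p_w(n)\) can never exceed that bound.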

\(|P_{f(2PQI_1)}(2) \setminus P_{f(4TNS_1)}(2)|=125\), \(|P_{f(4TNS_1)}(2) \setminus P_{f(2PQI_1)}(2)|=54\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
101001111001001001111010001100000001110111110001101110001000000101110100101111001111110110010001110010111111000011100110010110101001100110000110101000001101111100110100100010111000011010100100100100100111011001011110100111000001101111100000111
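
The page does not say which residues it treats as hydrophobic. The sketch below assumes the class A, F, G, I, L, M, P, V, W, which was inferred by comparing the start of the binary string with the start of the 2PQI sequence; both the set `HYDROPHOBIC` and the function name are assumptions, not the site's documented mapping.

```python
# Assumed hydrophobic class, inferred from the displayed string; not confirmed by the page.
HYDROPHOBIC = set("AFGILMPVW")

def hydrophobic_polar(seq: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

# First ten residues of the 2PQI sequence.
print(hydrophobic_polar("MKFTEIFPVE"))
```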
Pair \(Z_2\) Length of longest common substring
2PQI_1,4TNS_1 179 3
2PQI_1,5LSY_1 179 3
4TNS_1,5LSY_1 182 9
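
Both columns of this table can be sketched in a few lines of Python. \(Z_k\) is the size of the symmetric difference of the two subword sets, so \(Z_2\) for 2PQI_1 and 4TNS_1 is \(125+54=179\). The last column is read here as the length of the longest contiguous common subword, which is an assumption; it is consistent with the two His-tagged sequences 4TNS_1 and 5LSY_1 sharing `HHHHHHSSG`. All function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """Distinct contiguous length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of the subword sets."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest contiguous block shared by x and y (dynamic programming)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block that ended just before both positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# N-terminal fragments of 4TNS_1 and 5LSY_1 from this page.
print(longest_common_substring("MGSSHHHHHHSSGLVPR", "MHHHHHHSSGRENL"))
```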

Newick tree

 
[
	5LSY_1:90.50,
	[
		2PQI_1:89.5,4TNS_1:89.5
	]:1.00
]

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
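
For orientation, the Otu--Sayood distances are usually stated in terms of a Lempel-Ziv complexity \(c\) of concatenations: \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1(S,Q)=[c(SQ)-c(S)]+[c(QS)-c(Q)]\). The Python sketch below assumes an exhaustive-history LZ76 parse for \(c\). The page does not state which LZ variant it uses, so this code is not guaranteed to reproduce the values reported here, and the function names are hypothetical.

```python
def lz76(s: str) -> int:
    """Number of components in the LZ76 exhaustive-history parse of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the component while it can still be copied from earlier text
        # (the copied source may overlap the component itself).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def otu_sayood(s: str, q: str) -> tuple[int, int]:
    """Return (d, d1) for sequences s and q under the assumed LZ76 complexity."""
    a = lz76(s + q) - lz76(s)  # new components needed for q after seeing s
    b = lz76(q + s) - lz76(q)  # new components needed for s after seeing q
    return max(a, b), a + b

print(otu_sayood("AAAA", "CCCC"))
```

Because \(d_1\) adds both one-sided terms while \(d\) keeps only the larger one, \(d_1/2 \le d \le d_1\) always holds.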
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{394}{\log_{20} 394}-\frac{151}{\log_{20}151}\right)\approx 72.9\)
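
The arithmetic in this estimate can be checked directly. In the sketch below, 394 is taken to be the combined length \(243+151\) of the two query sequences, and the factors 0.85 and 0.8 are copied from the formula without further interpretation; the helper name `ideal_complexity` is hypothetical.

```python
import math

def ideal_complexity(n: int, k: int = 20) -> float:
    """n / log_k(n), the normalizing quantity used on this page."""
    return n / math.log(n, k)

n1, n2 = 243, 151  # lengths of 2PQI_1 and 4TNS_1
estimate = 0.85 * 0.8 * (ideal_complexity(n1 + n2) - ideal_complexity(n2))
print(f"{estimate:.2f}")
```

The result is just under 73, which agrees with the displayed 72.9 if the page truncates instead of rounding.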
Status Protein1 Protein2 d d1/2
Query variables 2PQI_1 4TNS_1 96 76

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
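
A small Python sketch of this interpolation follows. It assumes that min and max refer to the two one-sided quantities whose maximum is \(d\) and whose sum is \(d_1\). The value 56 is not reported on the page; it is inferred from the table values \(d=96\) and \(d_1/2=76\) as \(2\cdot 76-96\).

```python
def delta(alpha: float, a: float, b: float) -> float:
    """Interpolate between min(a, b) (alpha = 1) and max(a, b) (alpha = 0)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

one_sided = (56, 96)  # 56 is inferred, 96 is the reported d
print(delta(0, *one_sided), delta(0.5, *one_sided))
```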