CoV2D Browser™

Parikh vectors
7QCL_1 3ETB_1 8PDE_1 Letter Amino acid
13 10 7 R Arginine
24 16 4 Q Glutamine
15 10 8 I Isoleucine
9 5 3 M Methionine
31 9 6 N Asparagine
61 4 2 C Cysteine
36 7 7 E Glutamic acid
18 0 2 H Histidine
24 14 4 Y Tyrosine
8 6 0 W Tryptophan
52 12 3 V Valine
24 11 4 A Alanine
38 40 4 G Glycine
28 12 11 K Lysine
50 9 1 P Proline
38 34 4 S Serine
36 17 7 T Threonine
30 13 5 D Aspartic acid
31 16 8 L Leucine
17 7 5 F Phenylalanine
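
The table above can be reproduced by counting letters. A minimal sketch, assuming the Parikh vector of a sequence is simply the count of each of the 20 standard amino-acid letters; the function name `parikh_vector` and the short test string (the first ten residues of 8PDE_1) are chosen here for illustration only:

```python
from collections import Counter

# The 20 standard one-letter amino-acid codes (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of the alphabet to its count."""
    counts = Counter(sequence)
    # Letters that never occur are reported with count 0.
    return {letter: counts.get(letter, 0) for letter in alphabet}


vec = parikh_vector("MGRKKIQIQR")
print(vec["R"], vec["K"], vec["W"])
```

The counts always sum to the sequence length, which gives a quick consistency check against the Length column further down.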

>7QCL_1|Chains A, B|Mucin-2|Homo sapiens (9606)
>3ETB_1|Chains A[auth F], B[auth G], C[auth H], D[auth I]|Antibody M18 light chain and antibody M18 heavy chain linked with a synthetic (GGGGS)4 linker|Mus musculus (10090)
>8PDE_1|Chains A, B, D, E|MEF2D protein|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7QCL 231 583 0.84 40 269 543
KPPECPDFDPPRQENETWWLCDCFMATCKYNNTVEIVKVECEPPPMPTCSNGLQPVRVEDPDGCCWHWECDCYCTGWGDPHYVTFDGLYYSYQGNCTYVLVEEISPSVDNFGVYIDNYHCDPNDKVSCPRTLIVRHETQEVLIKTVHMMPMQVQVQVNRQAVALPYKKYGLEVYQSGINYVVDIPELGVLVSYNGLSFSVRLPYHRFGNNTKGQCGTCTNTTSDDCILPSGEIVSNCEAAADQWLVNDPSKPHCPHSSSTTKRPAVTVPGGGKTTPHKDCTPSPLCQLIKDSLFAQCHALVPPQHYYDACVFDSCFMPGSSLECASLQAYAALCAQQNICLDWRNHTHGACLVECPSHREYQACGPAEEPTCKSSSSQQNNTVLVEGCFCPEGTMNYAPGFDVCVKTCGCVGPDNVPREFGEHFEFDCKNCVCLEGGSGIICQPKRCSQKPVTHCVEDGTYLATEVNPADTCCNITVCKCNTSLCKEKPSVCPLGFEVKSKMVPGRCCPFYWCESKGVCVHGNAEYQPGSPVYSSKCQDCVCTDKVDNNTLLNVIACTHVPCNTSCSPGFELMEAPGECCKKC
3ETB 107 252 0.78 38 158 225
MADYKDIQMTQTTSSLSASLGDRVTVSCRASQDIRNYLNWYQQKPDGTVKFLIYYTSRLQPGVPSRFSGSGSGTDYSLTINNLEQEDIGTYFCQQGNTPPWTFGGGTKLEIKRGGGGSGGGGSGGGGSGGGGSEVQLQQSGPELVKPGASVKISCKDSGYAFNSSWMNWVKQRPGQGLEWIGRIYPGDGDSNYNGKFEGKAILTADKSSSTAYMQLSSLTSVDSAVYFCARSGLLRYAMDYWGQGTSVTVSS
8PDE 50 95 0.80 38 85 92
MGRKKIQIQRITDERNRQVTFTKRKFGLMKKAYELSVLCDCEIALIIFNHSNKLFQYASTDMDKVLLKYTEYNEPHESRTNADIIETLRKKGFNG
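
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be checked directly from the LZ-complexity and length columns. The sketch below also includes one common exhaustive-history variant of the LZ76 phrase count; which variant this page uses is not stated, so `lz76_complexity` is an assumption and may not reproduce the LZ values in the table. Both function names are hypothetical.

```python
import math


def lz76_complexity(w):
    """Count phrases in an exhaustive-history (LZ76-style) parsing of w.

    Each phrase is the shortest prefix of the remaining text that has not
    occurred earlier (overlap with the phrase itself is allowed).
    """
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend while the candidate already occurs starting before position i.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def normalized_complexity(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n) for alphabet size b."""
    return lz / (n / math.log(n, alphabet_size))


# Values taken from the table rows for 7QCL, 3ETB and 8PDE.
for lz, n in [(231, 583), (107, 252), (50, 95)]:
    print(round(normalized_complexity(lz, n), 2))
```

Rounded to two decimals, the three ratios agree with the 0.84, 0.78 and 0.80 shown in the table.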

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
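
As an illustration of these definitions, here is a minimal sketch that collects the distinct length-\(k\) subwords of a word with a sliding window. The names `subwords` and `subword_complexity` are hypothetical, and the test word is a fragment of the (GGGGS)4 linker named in the 3ETB_1 header.

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    # An empty set results when k exceeds the length of w.
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w, k):
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))


linker = "GGGGSGGGGS"
print(sorted(subwords(linker, 2)))    # ['GG', 'GS', 'SG']
print(subword_complexity(linker, 1))  # 2
```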

\(|P_{f(7QCL_1)}(2) \setminus P_{f(3ETB_1)}(2)|=163\), \(|P_{f(3ETB_1)}(2) \setminus P_{f(7QCL_1)}(2)|=52\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the numerator of an LZ76-style (set-of-subwords) Jaccard distance for \(P(k)\); for example, \(Z_2(f(7QCL_1), f(3ETB_1)) = 163 + 52 = 215\).

Hydrophobic-polar version of Sequence 1:
0110010101100000011100011100000001011010001111100001101101001010010100000001110100101011000001000011100101010011101000000100010010011100000011100101111010101000111110000110100011001101101111100011010101100011000010010000000000111010110000111001110010010010000000011101111100010000010110011000111000111110000010110001111001001010101110100010101000001101100100000010111001000000000000111010101010100111101010001011100110011001010000010101101110010000001100010010011001011000001010000001000010101111010001111000110100001101010100011011000000001000010000110111000110000001110110111000000
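
The hydrophobic-polar string can be generated by mapping each residue to 1 or 0. The page does not say which residues count as hydrophobic; the set used below (A, F, G, I, L, M, P, V, W) is an assumption inferred by comparing the opening residues of the 7QCL_1 sequence with the binary string shown here. The name `hp_encode` is hypothetical.

```python
# Assumed hydrophobic residues, inferred from the displayed string (not stated).
HYDROPHOBIC = set("AFGILMPVW")


def hp_encode(sequence, hydrophobic=HYDROPHOBIC):
    """Map hydrophobic residues to '1' and all other residues to '0'."""
    return "".join("1" if residue in hydrophobic else "0" for residue in sequence)


# First 20 residues of the 7QCL_1 sequence.
print(hp_encode("KPPECPDFDPPRQENETWWL"))  # 01100101011000000111
```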
Pair \(Z_2\) Length of longest common substring
7QCL_1,3ETB_1 215 4
7QCL_1,8PDE_1 238 4
3ETB_1,8PDE_1 169 2
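
Both columns of this table can be computed from a pair of sequences. The sketch below is a minimal illustration: `z_k` is the size of the symmetric difference of the two length-\(k\) subword sets, and `longest_common_substring_length` assumes the last column counts the longest common contiguous subword (values as small as 2 to 4 would be implausible for a non-contiguous subsequence of proteins this long). All function names are hypothetical.

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x, y, k):
    """Return |P_x(k) - P_y(k)| + |P_y(k) - P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def longest_common_substring_length(x, y):
    """Length of the longest contiguous block shared by x and y (row-wise DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


print(z_k("AAB", "ABB", 2))                               # 2
print(longest_common_substring_length("ABCDE", "XBCDY"))  # 3
```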

Newick tree

 
(7QCL_1:12.51,(3ETB_1:84.5,8PDE_1:84.5):37.01);

Let \(d\) and \(d_1\) denote the Otu--Sayood distances \(d\) and \(d_1\), respectively. (Using \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{835}{\log_{20} 835}-\frac{252}{\log_{20} 252}\right)\approx 160\), where \(835 = 583 + 252\) is the combined length of the two query sequences.
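
The arithmetic in this estimate can be checked numerically. A minimal sketch: the constants 0.85 and 0.8 are copied verbatim from the formula (the page does not explain their meaning), and the function names are hypothetical.

```python
import math


def n_over_log(n, alphabet_size=20):
    """Return n / log_b(n), the scale used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)


def expected_distance(n_concat, n_single, c1=0.85, c2=0.8):
    """Evaluate c1 * c2 * (n_concat/log n_concat - n_single/log n_single)."""
    return c1 * c2 * (n_over_log(n_concat) - n_over_log(n_single))


# 835 = 583 + 252 (lengths of 7QCL_1 and 3ETB_1); 252 is the length of 3ETB_1.
print(round(expected_distance(835, 252)))  # 160
```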
Status Protein1 Protein2 d d1/2
Query variables 7QCL_1 3ETB_1 207 144.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
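
The interpolation can be illustrated with the query row above. In this sketch `delta` takes the two quantities whose minimum and maximum appear in the formula. The value 82 is not reported on the page: it is derived under the assumption that the formula applies to the query row, since \(\max = d = 207\) and \((\min+\max)/2 = d_1/2 = 144.5\) give \(\min = 82\).

```python
def delta(alpha, a, b):
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


# 207 is d from the query row; 82 is derived (see the note above), not reported.
print(delta(0, 82, 207))    # 207, i.e. d
print(delta(0.5, 82, 207))  # 144.5, i.e. d1/2
```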