CoV2D Browser™

Parikh vectors
7PDA_1 7SCI_1 1CEV_1 Letter Amino acid
30 29 24 E Glutamic acid
45 43 33 G Glycine
14 45 14 K Lysine
18 17 15 S Serine
30 18 18 R Arginine
9 27 6 N Asparagine
8 4 0 C Cysteine
7 16 6 Q Glutamine
41 29 23 V Valine
11 10 11 H Histidine
34 36 30 L Leucine
29 30 13 T Threonine
37 34 18 D Aspartic acid
13 15 12 M Methionine
12 23 5 F Phenylalanine
33 25 14 P Proline
50 29 26 A Alanine
34 23 22 I Isoleucine
12 7 1 W Tryptophan
12 27 8 Y Tyrosine
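
As a minimal sketch of how such a table can be produced, the Parikh vector of a sequence is just its vector of letter counts. The function name `parikh_vector` and the constant `ALPHABET` are illustrative names, not part of the CoV2D site; the letter order is copied from the table above, and the example uses only the first 20 residues of the 7PDA_1 sequence listed on this page.

```python
from collections import Counter

# Letter order copied from the table above (any fixed ordering would do).
ALPHABET = "EGKSRNCQVHLTDMFPAIWY"


def parikh_vector(sequence: str, alphabet: str = ALPHABET) -> list[int]:
    """Return the number of occurrences of each alphabet letter in `sequence`."""
    counts = Counter(sequence)
    # Counter returns 0 for letters that do not occur.
    return [counts[letter] for letter in alphabet]


# First 20 residues of the 7PDA_1 sequence (a short prefix, for illustration).
prefix = "MAVFRDLRHYIDTLTEKLGA"
vector = parikh_vector(prefix)
```

The entries of a Parikh vector always sum to the length of the word, which is a quick sanity check on the table: the 7PDA_1 column sums to 479.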

>7PDA_1|Chain A|UbiD family decarboxylase|Mycolicibacterium fortuitum (1766)
>7SCI_1|Chain A|Peptidase M60 domain-containing protein|Akkermansia muciniphila (strain ATCC BAA-835 / DSM 22959 / JCM 33894 / BCRC 81048 / CCUG 64013 / CIP 107961 / Muc) (349741)
>1CEV_1|Chains A, B, C, D, E, F|PROTEIN (ARGINASE)|Bacillus caldovelox (33931)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7PDA , Knot 197 479 0.84 40 233 448
MAVFRDLRHYIDTLTEKLGADEVQTIKGANWDLEIGCITELSAEKEGPALLFDDIPGYPSGHRVFTNFMGTVSRCAVALGLPADTSAMDIIRAWKDLGKRIEPIPPVEVSEGAILENVLEGDDVDLEMFPTPRWHDGDGGRYIGTACMVITRDPDTGWVNVGTYRGCVQGKDRLSLWMLGNRHALAIAKKYWDRGTACPIAVVVGCDPILTTAAAIAAPSGVCEYDVAGGLRGVGVEVISAPGTGLPIPANAEIVFEGEMPPVEEESVHEGPFGEWTGYFTHAGDETVVRVQRILHRDSPIILGAPPMIPTVPAGDQAVPLYSASVTWDHLEASGVQNIKGVWAYARQLMMVISIEQTGAGDAMHALLAAAGRKRTGGVDRYFVVVDEDIDITDINHVLWALFTRVDPAESIHVLRTPTTAIDPRLSPAKREAGDMSMGIVLIDACKPFAWKDSYPRANRFDEPYRAEIRDRWKATLPL
7SCI , Knot 202 487 0.85 40 249 462
MANTPEHIGNDLKLFKDSSCTSLKPDVKNTSAFQSDAMKELATKILAGHYKPDYLYAEYRALPSPRQTGKNLRIGDGFSKYDNMTGVYLEKGRHVVLVGKTEGQEISLLLPNLMRKPAEGVQPTKDPNGWGLHKKQIPLKEGINIIDVETPANAYISYFTEDAGKAPKIPVHFVTGKANGYFDTTRGDTNKDWVRLLDQAVSPIMDARGKYIQVAYPVEFLKKFTKDRGTELINAYDKLIGIQYQLMGLDKYGKIPENRVLARVNFNYYMFRDGDGVAYLGNDGTMRMVTDPENVLKGDACWGFSHEVGHVMQMRPMTWGGMTEVSNNIFSLQAAAKTGNESRLKRQGSYDKARKEIIEGEIAYLQSKDVFNKLVPLWQLHLYFTKNGHPDFYPDVMEYLRNNAGNYGGNDTVKYQFEFVKACCDVTKTDLTDFFEKWGFFKPGKFHIGDYAQYDFNVTPEMVEETKKWIAGKGYPKPETDITELSE
1CEV , Knot 132 299 0.84 38 182 280
MKPISIIGVPMDLGQTRRGVDMGPSAMRYAGVIERLERLHYDIEDLGDIPIGKAERLHEQGDSRLRNLKAVAEANEKLAAAVDQVVQRGRFPLVLGGDHSIAIGTLAGVAKHYERLGVIWYDAHGDVNTAETSPSGNIHGMPLAASLGFGHPALTQIGGYSPKIKPEHVVLIGVRSLDEGEKKFIREKGIKIYTMHEVDRLGMTRVMEETIAYLKERTDGVHLSLDLDGLDPSDAPGVGTPVIGGLTYRESHLAMEMLAEAQIITSAEFVEVNPILDERNKTASVAVALMGSLFGEKLM
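
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be rechecked from the other two columns. The sketch below takes the LZ-complexity values as given in the table (it does not recompute them, since the exact LZ variant used by the site is not stated here); `normalized_lz` is an illustrative name. It assumes the table truncates, rather than rounds, to two decimals.

```python
import math


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_b(n)) for alphabet size b."""
    return lz / (n / math.log(n, alphabet_size))


# (LZ(w), n) pairs copied from the table above.
rows = {"7PDA": (197, 479), "7SCI": (202, 487), "1CEV": (132, 299)}
ratios = {code: normalized_lz(lz, n) for code, (lz, n) in rows.items()}

# Truncate to two decimals (assumed to be the table's convention).
truncated = {code: math.floor(r * 100) / 100 for code, r in ratios.items()}
```

Under that truncation assumption the three values agree with the table entries 0.84, 0.85 and 0.84.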

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).

\(|P_{f(7PDA_1)}(2) \setminus P_{f(7SCI_1)}(2)|=73\), \(|P_{f(7SCI_1)}(2) \setminus P_{f(7PDA_1)}(2)|=89\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
11110010001001000111001001011010101101001010001111110011101010011001110100011111111000110110110011001011111010011110011010010101110101001011001101011100010011101100010101000101111100011111000100101011111110011100111111101100001111101111011011101111110101110101111000010011110101010011000110100110000111111111110111100111100101010010101100101111010011111010001110110111111100001110001111000101001001111110010110010110010011010101100011010111111010011110000101001001001010001010111
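
The hydrophobic-polar string above can be reproduced with a simple letter-to-bit mapping. The page does not state which residues it treats as hydrophobic; the set used below (A, F, G, I, L, M, P, V, W mapped to 1, everything else to 0) is an assumption inferred by comparing the start of the 7PDA_1 sequence with the start of the binary string. The names `HYDROPHOBIC` and `hp_string` are illustrative.

```python
# Assumed hydrophobic residues, inferred from the prefix of the binary string;
# this classification is not stated on the page itself.
HYDROPHOBIC = set("AFGILMPVW")


def hp_string(sequence: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)


# First 20 residues of the 7PDA_1 sequence.
prefix = "MAVFRDLRHYIDTLTEKLGA"
encoded = hp_string(prefix)
```

The encoded prefix agrees with the first 20 bits of the string above; other hydrophobicity scales (for example ones that count cysteine as hydrophobic) would give a different string.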
Pair \(Z_2\) Length of longest common subword
7PDA_1,7SCI_1 162 5
7PDA_1,1CEV_1 145 4
7SCI_1,1CEV_1 159 4
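
A minimal sketch of how \(Z_k\) follows from the subword sets: collect the distinct length-\(k\) windows of each word and count the symmetric difference. The function names `subwords` and `z_distance` are illustrative, and the toy words are not taken from the data.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return P_w(k), the set of distinct length-k subwords (intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Return Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


# Toy example: P_x(2) = {AB, BA}, P_y(2) = {AB, BB}.
x, y = "ABAB", "ABBB"
z2 = z_distance(x, y, 2)

# The two one-sided counts reported for 7PDA_1 and 7SCI_1 add up to the table entry.
z2_7pda_7sci = 73 + 89
```

Here `z2` is 2, since BA and BB each occur in only one of the two words, and the reported counts 73 and 89 sum to the 162 shown in the first row of the table.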

Newick tree

 
(
	7SCI_1:82.67,
	(
		7PDA_1:72.5,1CEV_1:72.5
	):10.17
);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{966}{\log_{20} 966}-\frac{479}{\log_{20}479}\right)\approx 128\), where \(966=479+487\) is the combined length of the two sequences.
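
The arithmetic in this estimate can be checked directly. The sketch below only evaluates the displayed expression; the factors 0.85 and 0.8 are taken from the text as given, and `lz_scale` is an illustrative name for \(n/\log_{20} n\). Note that 966 is the sum of the two sequence lengths, 479 and 487.

```python
import math


def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """Return n / log_b(n), the normalizing quantity used for LZ-complexity."""
    return n / math.log(n, alphabet_size)


n1, n2 = 479, 487  # lengths of 7PDA_1 and 7SCI_1
estimate = 0.85 * 0.8 * (lz_scale(n1 + n2) - lz_scale(n1))
```

The value of `estimate` is a little above 128, which rounds to the figure quoted in the text.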
Status Protein1 Protein2 d d1/2
Query variables 7PDA_1 7SCI_1 164 162

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
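
A small worked check of this interpolation against the query row above (\(d=164\), \(d_1/2=162\)). The sketch assumes that min and max denote the smaller and larger of two one-sided quantities whose maximum is \(d\) and whose sum is \(d_1\), which is what the case split implies; under that assumption the smaller quantity must be \(2\cdot 162-164=160\). The function name `delta` is illustrative.

```python
def delta(alpha: float, a: float, b: float) -> float:
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


d = 164        # reported d for 7PDA_1 vs 7SCI_1
half_d1 = 162  # reported d1/2 for the same pair

larger = d                   # alpha = 0 picks out the maximum
smaller = 2 * half_d1 - d    # inferred so that (smaller + larger) / 2 == d1 / 2

at_zero = delta(0, smaller, larger)
at_half = delta(0.5, smaller, larger)
```

With these inferred values, `at_zero` recovers \(d\) and `at_half` recovers \(d_1/2\), matching the two cases in the display.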