CoV2D Browser™

Parikh vectors
2GBY_1 6IRG_1 5GJF_1 Letter Amino acid
2 49 15 R Arginine
3 37 12 D Aspartic acid
0 11 10 C Cysteine
8 53 22 G Glycine
17 51 18 I Isoleucine
17 75 25 L Leucine
9 38 10 F Phenylalanine
13 26 12 Y Tyrosine
6 28 10 Q Glutamine
3 28 11 M Methionine
10 64 18 S Serine
11 48 10 T Threonine
9 63 23 A Alanine
20 54 24 E Glutamic acid
9 21 11 H Histidine
3 14 8 W Tryptophan
7 64 25 V Valine
23 41 10 N Asparagine
23 46 17 K Lysine
1 36 24 P Proline
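A Parikh vector counts how often each letter occurs in a sequence. The following is a minimal sketch of how one column of the table above could be computed; the function name `parikh_vector` and the constant `ALPHABET` are illustrative names, not part of the CoV2D Browser, and the letter order is simply copied from the "Letter" column.

```python
from collections import Counter

# Letter order copied from the "Letter" column of the table (assumption:
# any fixed order of the 20 amino acid letters would serve equally well).
ALPHABET = "RDCGILFYQMSTAEHWVNKP"


def parikh_vector(sequence: str, alphabet: str = ALPHABET) -> list[int]:
    """Return the number of occurrences of each letter of `alphabet` in `sequence`."""
    counts = Counter(sequence)
    # Counter returns 0 for letters that do not occur at all.
    return [counts[letter] for letter in alphabet]


if __name__ == "__main__":
    # First ten residues of the 2GBY_1 sequence, used only as a small example.
    for letter, count in zip(ALPHABET, parikh_vector("MNLKDKILGV")):
        print(letter, count)
```

Applied to a full FASTA sequence, the resulting list would correspond to one column of the table.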

>2GBY_1|Chains A, B, C[auth D], D[auth E]|HTH-type transcriptional regulator qacR|Staphylococcus aureus (1280)
>6IRG_1|Chains A, B[auth C]|Glutamate receptor ionotropic, NMDA 1|Homo sapiens (9606)
>5GJF_1|Chain A|TAK1 kinase - TAB1 chimera fusion protein|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2GBY , Knot 85 194 0.77 38 118 180
MNLKDKILGVAKELFIKNGYNATTTGEIVKLSESSKGNLYYHFKTKENLFLEILNIEESKWQEQWKKEQIKAKTNREKFYLYNELSLTTEYYYPLQNAIIEFYTEYYKTNSINEKMNKLENKYIDAYHVIFKEGNLNGEWSINDVNAVSKIAANAVNGIVTFTHEQNINERIKLMNKFSQIFLNGLSKHHHHHH
6IRG , Knot 323 847 0.85 40 322 785
MSTMRLLTLALLFSCSVARAACDPKIVNIGAVLSTRKHEQMFREAVNQANKRHGSWKIQLNATSVTHKPNAIQMALSVCEDLISSQVYAILVSHPPTPNDHFTPTPVSYTAGFYRIPVLGLTTRMSIYSDKSIHLSFLRTVPPYSHQSSVWFEMMRVYSWNHIILLVSDDHEGRAAQKRLETLLEERESKAEKVLQFDPGTKNVTALLMEAKELEARVIILSASEDDAATVYRAAAMLNMTGSGYVWLVGEREISGNALRYAPDGILGLQLINGKNESAHISDAVGVVAQAVHELLEKENITDPPRGCVGNTNIWKTGPLFKRVLMSSKYADGVTGRVEFNEDGDRKFANYSIMNLQNRKLVQVGIYNGTHVIPNDRKIIWPGGETEKPRGYQMSTRLKIVTIHQEPFVYVKPTLSDGTCKEEFTVNGDPVKKVICTGPNDTSPGSPRHTVPQCCYGFCIDLLIKLARTMNFTYEVHLVADGKFGTQERVNNSNKKEWNGMMGELLSGQADMIVAPLTINNERAQYIEFSKPFKYQGLTILVKKEIPRSTLDSFMQPFQSTLWLLVGLSVHVVAVMLYLLDRFSPFGRFKVNSEEEEEDALTLSSAMWFSWRVLLNSGIGEGAPRSFSARILGMVWAGFAMIIVASYTANLAAFLVLDRPEERITGINDPRLRNPSDKFIYATVKQSSVDIYFRRQVELSTMYRHMEKHNYESAAEAIQAVRDNKLHAFIWDSAVLEFEASQKCDLVTTGELFFRSGFGIGMRKDSPWKQNVSLSILKSHENGFMEDLDKTWVRYQECDSRSNAPATLTFENMAGVFMLVAGGIVAGIFLIFIEIAYKRHKDARRKQ
5GJF , Knot 143 315 0.87 40 205 309
GPLHMIDYKEIEVEEVVGRGAFGVVCKAKWRAKDVAIKQIESESERKAFIVELRQLSRVNHPNIVKLYGACLNPVCLVMEYAEGGSLYNVLHGAEPLPYYTAAHAMSWCLQCSQGVAYLHSMQPKALIHRDLKPPNLLLVAGGTVLKICDFGTACDIQTHMTNNKGSAAWMAPEVFEGSNYSEKCDVFSWGIILWEVITRRKPFDEIGGPAFRIMWAVHNGTRPPLIKNLPKPIESLMTRCWSKDPSQRPSMEEIVKIMTHLMRYFPGADEPLQYPCQHSLPPGEDGRVEPYVDFAEFYRLWSVDHGEQSVVTAP
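The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. This is a small sketch; the function name `normalized_lz` is hypothetical, and the alphabet size 20 is taken from the base of the logarithm in the column header.

```python
import math


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_b(n)), where b is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    # (LZ-complexity, length) pairs copied from the table rows.
    for code, lz, n in [("2GBY", 85, 194), ("6IRG", 323, 847), ("5GJF", 143, 315)]:
        print(code, normalized_lz(lz, n))
```

The computed values agree with the tabulated 0.77, 0.85 and 0.87 to within 0.01.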

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
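As a sketch of these definitions, assuming that \(P_w(n)\) collects the contiguous subwords of length \(n\), the set and its cardinality can be computed directly. The function names `subwords` and `subword_complexity` are illustrative.

```python
def subwords(w: str, n: int) -> set[str]:
    """Return P_w(n): the set of distinct contiguous subwords of length n in w."""
    # range() is empty when n > len(w), so the result is then the empty set.
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_complexity(w: str, n: int) -> int:
    """Return p_w(n), the cardinality of P_w(n)."""
    return len(subwords(w, n))


if __name__ == "__main__":
    # "abab" has the two distinct subwords "ab" and "ba" of length 2.
    print(sorted(subwords("abab", 2)), subword_complexity("abab", 2))
```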

\(|P_{f(2GBY_1)}(2) \setminus P_{f(6IRG_1)}(2)|=13\), \(|P_{f(6IRG_1)}(2) \setminus P_{f(2GBY_1)}(2)|=217\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), that is, an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
10100011111001110010010001011010000010100010000011101101000010001000010100000010100010100000011001110100000000010001001000010100111001010101010010110011101101110100000100010110010011101100000000
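A hydrophobic-polar encoding replaces each residue by 1 (hydrophobic) or 0 (polar). The sketch below is an assumption about the classification: the set `HYDROPHOBIC` was inferred from the opening residues of Sequence 1 and the binary string above, and cysteine (C), which does not occur in 2GBY_1, is placed in the hydrophobic class by guess. The page's actual classification may differ.

```python
# Assumed hydrophobic residues; inferred from the listed binary string,
# except for C, which is a guess because 2GBY_1 contains no cysteine.
HYDROPHOBIC = frozenset("ACFGILMPVW")


def hydrophobic_polar(sequence: str) -> str:
    """Map each residue to '1' if assumed hydrophobic, else '0'."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)


if __name__ == "__main__":
    # First ten residues of the 2GBY_1 sequence.
    print(hydrophobic_polar("MNLKDKILGV"))
```

For the first ten residues of 2GBY_1 this reproduces the first ten digits of the string above.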
Pair \(Z_2\) Length of longest common subword
2GBY_1,6IRG_1 230 4
2GBY_1,5GJF_1 189 4
6IRG_1,5GJF_1 177 5
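The two columns of this table can be sketched as follows. \(Z_k\) is computed as the size of the symmetric difference of the two subword sets. For the last column, the code assumes that the page reports the length of the longest common contiguous subword; that reading is an assumption based on the small values in the table. All function names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x: str, y: str, k: int) -> int:
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    # The ^ operator on sets is the symmetric difference.
    return len(subwords(x, k) ^ subwords(y, k))


def longest_common_subword_length(x: str, y: str) -> int:
    """Return the length of the longest contiguous subword shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length ending at y[j-1]
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(z_k("abab", "abba", 2), longest_common_subword_length("abab", "abba"))
```

With the values quoted earlier for 2GBY_1 and 6IRG_1, the two one-sided differences 13 and 217 add up to the tabulated \(Z_2 = 230\).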

Newick tree

 
(
	2GBY_1:11.26,
	(
		5GJF_1:88.5,6IRG_1:88.5
	):21.76
);

Let \(d\) and \(d_1\) denote the Otu--Sayood distances of the same names. (Using \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
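For orientation, here is a minimal sketch of the two distances, assuming the usual definitions \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1(x,y)=c(xy)-c(x)+c(yx)-c(y)\), where \(c\) is the LZ76 complexity (number of components in the exhaustive history). The function names are illustrative, and the page may use a different LZ variant, so the numbers it reports need not be reproduced exactly.

```python
def lz76(w: str) -> int:
    """Return the number of components in the LZ76 exhaustive history of w."""
    n, i, components = len(w), 0, 0
    while i < n:
        length = 1
        # Extend while w[i:i+length] already occurs starting before position i
        # (the earlier occurrence may overlap the current component).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components


def otu_sayood(x: str, y: str) -> tuple[int, int]:
    """Return (d, d1) for the assumed Otu--Sayood definitions."""
    gain_xy = lz76(x + y) - lz76(x)  # new components needed for y after x
    gain_yx = lz76(y + x) - lz76(y)  # new components needed for x after y
    return max(gain_xy, gain_yx), gain_xy + gain_yx


if __name__ == "__main__":
    print(lz76("0001101001000101"))    # parses as 0|001|10|100|1000|101
    print(otu_sayood("aaaa", "abab"))
```

A quadratic-time substring search is used for brevity; a suffix-based structure would be preferable for long sequences.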
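Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{1041}{\log_{20} 1041}-\frac{194}{\log_{20} 194}\right)\approx 230\), where \(1041=194+847\) is the length of the concatenation of the two sequences.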
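The arithmetic in this estimate can be checked with a few lines. The factors 0.85 and 0.8 are taken from the formula as given; their derivation is not stated here, and the function name `expected_distance` is hypothetical.

```python
import math


def expected_distance(n_short: int, n_long: int, factor: float = 0.85 * 0.8) -> float:
    """Evaluate factor * (N / log_20 N - n / log_20 n) with N = n_short + n_long."""
    total = n_short + n_long
    return factor * (total / math.log(total, 20) - n_short / math.log(n_short, 20))


if __name__ == "__main__":
    # Lengths of 2GBY_1 (194) and 6IRG_1 (847) from the table.
    print(expected_distance(194, 847))
```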
Status Protein1 Protein2 d d1/2
Query variables 2GBY_1 6IRG_1 298 179.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2. \end{cases} \]