CoV2D Browser™

Parikh vectors
7LQX_1 4UPM_1 4ZBP_1 Letter Amino acid
48 20 21 A Alanine
18 18 12 N Asparagine
34 36 22 L Leucine
10 27 22 K Lysine
5 11 4 C Cysteine
12 18 11 Q Glutamine
26 23 12 P Proline
10 16 9 Y Tyrosine
25 26 22 S Serine
15 13 7 W Tryptophan
30 24 23 V Valine
34 22 10 R Arginine
31 26 14 D Aspartic acid
27 25 30 E Glutamic acid
25 14 14 H Histidine
20 19 13 F Phenylalanine
35 25 19 G Glycine
11 23 18 I Isoleucine
15 11 9 M Methionine
20 25 18 T Threonine
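The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch in Python, assuming the sequence is given as a plain string of one-letter amino acid codes (the names `AMINO_ACIDS` and `parikh_vector` are illustrative, not part of the CoV2D code):

```python
from collections import Counter

# The 20 standard amino acid letters, in the row order of the table above.
AMINO_ACIDS = "ANLKCQPYSWVRDEHFGIMT"


def parikh_vector(sequence: str) -> dict[str, int]:
    """Return the number of occurrences of each amino acid letter."""
    counts = Counter(sequence)
    # Letters that never occur are reported with count 0.
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}


if __name__ == "__main__":
    # First 13 residues of 7LQX_1 (the expression tag), as a small input.
    vec = parikh_vector("MGSSHHHHHHSSG")
    print({letter: n for letter, n in vec.items() if n})
```

Applied to a full FASTA sequence, the entries sum to the sequence length.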

>7LQX_1|Chain A|Glycosyl hydrolase BlGH5_18|Bifidobacterium longum subsp. infantis (strain ATCC 15697 / DSM 20088 / JCM 1222 / NCTC 11817 / S12) (391904)
>4UPM_1|Chains A, B|NEURONAL NITRIC OXIDE SYNTHASE|RATTUS NORVEGICUS (10116)
>4ZBP_1|Chains A, B, C|Nudix hydrolase 7|Arabidopsis thaliana (3702)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7LQX , Knot 187 451 0.84 40 239 428
MGSSHHHHHHSSGLVPRGSHMMRMRFGVNYTPSHGWFHFWLDPDWPSVKEDMRRIRNLGMDHVRVFPVWPYLQPNRTWINRKAIADVRRMVHIAGEQGMDAYVDVFQGHLSSFDFLPSWLVTWHRGNMFEDADAVKAEKTLVAELYGELAQEPAFRGLTLGNELNQFSDRPHPAKMATSSRRIDAWLADLLAVVDRRKHVALHSENDGVWYLDHHPFTPVQAANLGDMTTIHSWVFNGTAQGYGAMSGECTAHALYLAELSRAFARNPDRPVWLQEVGAPQNVLEAEQTPEFCRDTIAKAAQCPNLWGVTWWCSHDVDSRMSDFPPFEHALGLFDEHGNIKPIGRAFAEMAQEYRDKPAAGGNDAAVVIEVDENGNPLNRGACGPGGSIFERWMRLHAEGARPTLVTSATARDGEALRRLGVTRLETDDEPHGAKYYTAVSDSSFAELDAR
4UPM , Knot 186 422 0.88 40 262 411
CPRFLKVKNWETDVVLTDTLHLKSTLETGCTEHICMGSIMLPSQHTRKPEDVRTKDQLFPLAKEFLDQYYSSIKRFGSKAHMDRLEEVNKEIESTSTYQLKDTELIYGAKHAWRNASRCVGRIQWSKLQVFDARDCTTAHGMFNYICNHVKYATNKGNLRSAITIFPQRTDGKHDFRVWNSQLIRYAGYKQPDGSTLGDPANVQFTEICIQQGWKAPRGRFDVLPLLLQANGNDPELFQIPPELVLEVPIRHPKFDWFKDLGLKWYGLPAVSNMLLEIGGLEFSACPFSGWYMGTEIGVRDYCDNSRYNILEEVAKKMDLDMRKTSSLWKDQALVEINIAVLYSFQSDKVTIVDHHSATESFIKHMENEYRCRGGCPADWVWIVPPMSGSITPVFHQEMLNYRLTPSFEYQPDPWNTHVWKG
4ZBP , Knot 138 310 0.85 40 195 293
MGSHHHHHHHHGSDYDIPTTENLYFQGSMGTRAQQIPLLEGETDNYDGVTVTMVEPMDSEVFTESLRASLSHWREEGKKGIWIKLPLGLANLVEAAVSEGFRYHHAEPEYLMLVSWISETPDTIPANASHVVGAGALVINKNTKEVLVVQERSGFFKDKNVWKLPTGVINEGEDIWTGVAREVEEETGIIADFVEVLAFRQSHKAILKKKTDMFFLCVLSPRSYDITEQKSEILQAKWMPIQEYVDQPWNKKNEMFKFMANICQKKCEEEYLGFAIVPTTTSSGKESFIYCNADHAKRLKVSRDQASASL
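The normalized column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below reproduces that normalization from the tabulated values and also includes a simple quadratic-time Lempel-Ziv (1976) phrase counter. Whether the page uses exactly this parsing variant is an assumption, so the counter is not claimed to reproduce the values 187, 186 and 138; the function names are illustrative.

```python
import math


def lz76_complexity(w: str) -> int:
    """Number of phrases in the LZ76 exhaustive-history parsing of w.

    Each phrase is the shortest prefix of the remaining suffix that does not
    occur earlier (overlap with the phrase itself allowed), except that the
    last phrase may be a repeat.
    """
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend while w[i:i+length] already occurs in w[0:i+length-1].
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_alphabet_size(n)), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    # Tabulated (LZ, n) pairs for 7LQX, 4UPM and 4ZBP.
    for lz, n in [(187, 451), (186, 422), (138, 310)]:
        print(lz, n, normalized_lz(lz, n))
```

The computed ratios agree with the table entries 0.84, 0.88 and 0.85 when the values are truncated to two decimals.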

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
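As an illustration of these definitions, the following sketch collects the distinct contiguous subwords of a given length and counts them. The length is called `k` here to avoid confusion with the sequence length; the function names are illustrative only.

```python
def subword_set(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    # range() is empty when k > len(w), so the result is the empty set.
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w: str, k: int) -> int:
    """p_w(k): the number of distinct subwords of length k in w."""
    return len(subword_set(w, k))


if __name__ == "__main__":
    w = "MGSSHHHHHHSSG"  # first 13 residues of 7LQX_1
    for k in (1, 2, 3):
        print(k, subword_complexity(w, k))
```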

\(|P_{f(7LQX_1)}(2) \setminus P_{f(4UPM_1)}(2)|=66\), \(|P_{f(4UPM_1)}(2) \setminus P_{f(7LQX_1)}(2)|=89\).

Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\), that is, the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\).

Hydrophobic-polar version of Sequence 1:
1100000000001111010011010111000100111011101011010001001001110010111111010100011000111010011011100110101011010100101110111010010110010110100011101010110011101101100100100010110110000010111101111100000111000001110100011011011011010010011101010101110100010110110100111001001111001111001101000101000011011001011110110000100010011110011111000101011101110110000001111100111110100010110011011110110011010101101011001010010110011100100000101100001100001101010

Pair \(Z_2\) Length of longest common subword (contiguous)
7LQX_1,4UPM_1 155 3
7LQX_1,4ZBP_1 202 7
4UPM_1,4ZBP_1 173 4
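The two columns of this table can be sketched as follows. \(Z_k\) is the size of the symmetric difference of the two subword sets. The last column is read here as the length of the longest common contiguous subword, an assumption suggested by the small values 3, 7 and 4. Function names are illustrative.

```python
def subword_set(w: str, k: int) -> set[str]:
    """Distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subword_set(x, k), subword_set(y, k)
    return len(px - py) + len(py - px)


def longest_common_subword_length(x: str, y: str) -> int:
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for cx in x:
        cur = [0] * (len(y) + 1)
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    # Leading expression tags of 7LQX_1 and 4ZBP_1; they share "SHHHHHH".
    a, b = "MGSSHHHHHHSSG", "MGSHHHHHHHHGS"
    print(z_distance(a, b, 2), longest_common_subword_length(a, b))
```

For the first pair, the two one-sided counts 66 and 89 given above add up to the tabulated \(Z_2=155\).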

Newick tree

 
[
	4ZBP_1:98.92,
	[
		7LQX_1:77.5,4UPM_1:77.5
	]:21.42
]

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{873}{\log_{20} 873}-\frac{422}{\log_{20}422}\right)\approx 120\), where \(873=451+422\) is the combined length of the two sequences.
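The arithmetic in this estimate can be checked with a few lines of Python. The constants 0.85 and 0.8 are taken from the text as given; their interpretation is not stated there, and the function name is illustrative.

```python
import math


def expected_distance(n_combined: int, n_single: int,
                      c1: float = 0.85, c2: float = 0.8,
                      alphabet_size: int = 20) -> float:
    """c1 * c2 * (N / log_A(N) - n / log_A(n)) for alphabet size A."""

    def term(n: int) -> float:
        return n / math.log(n, alphabet_size)

    return c1 * c2 * (term(n_combined) - term(n_single))


if __name__ == "__main__":
    # 873 = 451 + 422 (lengths of 7LQX_1 and 4UPM_1); 422 is 4UPM_1 alone.
    print(round(expected_distance(873, 422)))
```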
Status Protein1 Protein2 d d1/2
Query variables 7LQX_1 4UPM_1 154 151

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2 \end{cases} \]
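A small sketch of this interpolation, assuming that min and max range over the two one-sided complexity increments used in the Otu--Sayood distances (an assumption; the page does not spell this out). The value 148 below is not reported on the page: it is inferred from \(d=154\) and \(d_1/2=151\) as \(2\cdot 151-154\).

```python
def delta(a: float, b: float, alpha: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


if __name__ == "__main__":
    # Hypothetical one-sided increments consistent with d = 154, d1/2 = 151.
    a, b = 148, 154
    print(delta(a, b, 0))    # alpha = 0 gives the maximum, i.e. d
    print(delta(a, b, 0.5))  # alpha = 1/2 gives the average, i.e. d1/2
```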