CoV2D Browser™

Parikh vectors
9ITS_1 5ZUB_1 6NZT_1 Letter Amino acid
31 6 11 Q Glutamine
48 22 19 V Valine
2 3 4 C Cysteine
33 6 7 E Glutamic acid
48 17 13 L Leucine
19 13 8 K Lysine
8 2 4 M Methionine
30 6 19 T Threonine
34 3 14 R Arginine
29 7 7 D Aspartic acid
45 14 23 G Glycine
41 10 9 I Isoleucine
11 3 3 F Phenylalanine
26 6 11 P Proline
23 13 21 S Serine
1 0 2 W Tryptophan
56 18 15 A Alanine
16 7 4 N Asparagine
3 6 4 H Histidine
18 6 5 Y Tyrosine
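
Each column of the table above is a Parikh vector: the number of occurrences of each letter in one sequence. A minimal sketch of that count follows; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and not taken from the CoV2D code.

```python
from collections import Counter

# The 20 standard one-letter amino acid codes (ordering here is an arbitrary choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(w, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of the alphabet to its count in w."""
    counts = Counter(w)
    # Counter returns 0 for missing letters, so absent residues are reported as 0.
    return {a: counts[a] for a in alphabet}

# First eight residues of the 5ZUB_1 sequence listed on this page.
example = parikh_vector("GSHMPLSN")
print(example["S"], example["W"])
```

Applied to a full sequence, the result corresponds to one column of the table, with one row per letter.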

>9ITS_1|Chains A, B, C|ATP synthase subunit alpha|Chloroflexus aurantiacus J-10-fl (324602)
>5ZUB_1|Chain A|ORF1a|Middle East respiratory syndrome-related coronavirus (1335626)
>6NZT_1|Chain A|HCV NS3/4A protease|Hepatitis C virus (3052230)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
9ITS , Knot 208 522 0.83 40 224 481
MTTITEELIARLKQGITSGVDLQPRQVNVGTVIAVGDGVARLSGLDQVVASEIVEFPPKAGRNESIYGIALNLEQDSVAAIILGDDETIEEGDMVTSTGRVISVPVGQGLLGRVVNPLGQPIDGKGPIVYEKTRPIERIAPGVITRKSVDTPVQTGIIAIDALIPIGRGQRELIIGDRQTGKTAVAIDTILNQKGQGMVCIYVAIGQRRAQVAQVVGTLERFGAMEYTIVVSATASESAALQYIAPYAGCAMGEEIMENGVMLNGQLVKDALIVYDDLSKHAVAYRQVSLLLRRPPGREAYPGDVFYLHSRLLERAARLNEEYGGGSLTALPVIETQANDVSAYIPTNVISITDGQIYLESDLFNAGQRPALNVGISVSRVGGAAQTRAMRAVAGKLKGELAQFRDLAAFAQFASDLDATTKAQIERGQRLQELLKQPQYQPLPVEDQVAVLYAATNNYLDDVPVPLITKWRDDFLAFLRTAHPEVRKLIYDNRLDRKFPTPEVKEALEAAIKEFKATSNYS
5ZUB , Knot 79 168 0.80 38 116 158
GSHMPLSNFEHKVITECVTIVLGDAIQVAKCYGESVLVNAANTHLKHGGGIAGAINAASKGAVQKESDEYILAKGPLQVGDSVLLQGHSLAKNILHVVGPDARAKQDVSLLSKCYKAMNAYPLVVTPLVSAGIFGVKPAVSFDYLIREAKTRVLVVVNSQDVYKSLTI
6NZT , Knot 95 203 0.83 40 142 196
GSHMASMKKKGSVVIVGRINLSGDTAYAQQTRGEEGCQETSQTGRDKNQVEGEVQIVSTATQTFLATSINGVLWTVYHGAGTRTIASPKGPVTQMYTNVDKDLVGWQAPQGSRSLTPCTCGSSDLYLVTRHADVIPVRRRGDSRGSLLSPRPISYLKGSSGGPLLCPAGHAVGIFRAAVSTRGVAKAVQFIPVESLETTMRSP
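
The LZ-complexity column and its normalization by \(n/\log_{20} n\) can be sketched as follows. This assumes the Lempel-Ziv (1976) exhaustive-history parsing, in which each new component is the shortest word starting at the current position that does not already occur earlier in the sequence; the page does not state which variant it uses, so the function below is an illustration rather than the site's implementation.

```python
import math

def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive history of w (assumed variant)."""
    i, c, n = 0, 0, len(w)
    while i < n:
        j = i + 1
        # Extend the candidate w[i:j] while it already occurs inside w[:j-1]
        # (overlap with the candidate itself is allowed, as in LZ76).
        while j <= n and w[i:j] in w[:j - 1]:
            j += 1
        c += 1  # w[i:j] is a new component (the last one may be non-novel)
        i = j
    return c

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_alphabet_size(n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))

print(lz76_complexity("AACGTACCATTG"))   # parsing A.AC.G.T.ACC.AT.TG
print(round(normalized_lz(208, 522), 2))  # values from the 9ITS row
```

The second call only re-derives the ratio from the tabulated LZ(w) and n; it does not recompute LZ(w) from the sequence.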

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
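
As a small illustration of these definitions, the sketch below collects the distinct subwords of a fixed length and counts them. The function names are hypothetical, and the toy word is not one of the proteins on this page.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Cardinality of the set of length-k subwords of w."""
    return len(subwords(w, k))

w = "ABAAB"
print(sorted(subwords(w, 2)))
print([subword_complexity(w, k) for k in (1, 2, 3)])
```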

\(|P_{f(9ITS_1)}(2) \setminus P_{f(5ZUB_1)}(2)|=142\), \(|P_{f(5ZUB_1)}(2) \setminus P_{f(9ITS_1)}(2)|=34\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
100100011101001100110101001011011111011101011001110011011101100001011110100001111111000010010110001011011110111101101110110101111000001100111111000010011001111101111110100011110000100111100110001011101011110001011011101001111000111010100011100111011011100110011110101100111100010001110001011100111001011011010001100110100001110101111100010010101100110100101010001101100111011101001111100011011110101011010011111011001010001010010010011001000111100011110110000100111111001000111110010101001100001000110101001101110010100000
Pair \(Z_2\) Length of longest common substring
9ITS_1,5ZUB_1 176 4
9ITS_1,6NZT_1 158 3
5ZUB_1,6NZT_1 146 4
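
The two columns of this table can be sketched as below. \(Z_k\) is the size of the symmetric difference of the two length-\(k\) subword sets, so the 9ITS_1, 5ZUB_1 entry is 142 + 34 = 176. The last column is read here as the length of the longest common contiguous run, an interpretation suggested by the fact that 5ZUB_1 and 6NZT_1 both begin with GSHM; the function names are illustrative only.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_substring_length(x, y):
    """Length of the longest contiguous word occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common run ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_distance("AAB", "ABB", 2))
# First eight residues of 5ZUB_1 and 6NZT_1 from this page.
print(longest_common_substring_length("GSHMPLSN", "GSHMASMK"))
```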

Newick tree

 
[
	9ITS_1:86.87,
	[
		6NZT_1:73,5ZUB_1:73
	]:13.87
]
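
The bracketed tree above can be written in the usual parenthesized Newick syntax. The sketch below serializes a nested tuple representation (a hypothetical data layout chosen for this example) and checks that both leaf-to-root path lengths agree, as the listed branch lengths suggest.

```python
def to_newick(node):
    """Serialize (label_or_children, branch_length_or_None) to Newick text."""
    content, length = node
    if isinstance(content, str):
        text = content  # leaf: just its label
    else:
        text = "(" + ",".join(to_newick(child) for child in content) + ")"
    return text if length is None else f"{text}:{length}"

# Branch lengths copied from the tree shown on this page.
tree = ([("9ITS_1", 86.87), ([("6NZT_1", 73), ("5ZUB_1", 73)], 13.87)], None)
newick = to_newick(tree) + ";"
print(newick)

# Both root-to-leaf distances coincide: 73 + 13.87 = 86.87.
print(abs((73 + 13.87) - 86.87) < 1e-9)
```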

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{690 }{\log_{20} 690}-\frac{168}{\log_{20}168})\approx 148.\)
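
The arithmetic in this estimate can be reproduced directly. The constants 0.85 and 0.8 and the lengths 690 and 168 are copied verbatim from the formula above; the helper name `lz_scale` is an illustrative choice.

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_alphabet_size(n), the normalizing quantity used on this page."""
    return n / math.log(n, alphabet_size)

expected = 0.85 * 0.8 * (lz_scale(690) - lz_scale(168))
print(round(expected, 2), round(expected))
```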
Status Protein1 Protein2 d d1/2
Query variables 9ITS_1 5ZUB_1 184 120.5
Was not able to put for d
Was not able to put for d1
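
For orientation, the sketch below computes the unnormalized Otu-Sayood quantities from an LZ complexity \(c\): \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1(x,y)=c(xy)-c(x)+c(yx)-c(y)\). The LZ76 parsing used for \(c\) is an assumption about the variant, and this code is not the site's implementation, so it is not guaranteed to reproduce the tabulated values.

```python
def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive history of w (assumed variant)."""
    i, c, n = 0, 0, len(w)
    while i < n:
        j = i + 1
        # Grow the candidate while it already occurs in the preceding text.
        while j <= n and w[i:j] in w[:j - 1]:
            j += 1
        c += 1
        i = j
    return c

def otu_sayood(x, y):
    """Return (d, d1) for the pair x, y using concatenation complexities."""
    gain_xy = lz76_complexity(x + y) - lz76_complexity(x)  # new components y adds after x
    gain_yx = lz76_complexity(y + x) - lz76_complexity(y)  # new components x adds after y
    d = max(gain_xy, gain_yx)
    d1 = gain_xy + gain_yx
    return d, d1

print(otu_sayood("A", "C"))
print(otu_sayood("AAAA", "AAAA"))
```

The table reports \(d_1/2\), which under these definitions is the average of the two one-sided gains.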

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
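
As a check against the query row above, assume (this reading is not stated on the page) that min and max denote the smaller and larger of the two one-sided quantities, so that \(d=\max\) and \(d_1=\min+\max\). With \(d=184\) and \(d_1/2=120.5\) for 9ITS_1 and 5ZUB_1 this gives
\[ \max = d = 184,\qquad \min = d_1 - d = 2(120.5) - 184 = 57, \]
\[ \delta\big|_{\alpha=0} = \max = 184,\qquad \delta\big|_{\alpha=1/2} = \tfrac{1}{2}(57) + \tfrac{1}{2}(184) = 120.5. \]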