CoV2D Browser™

Parikh vectors
7VEX_1 8JJN_1 1BDB_1 Letter Amino acid
33 54 34 A Alanine
22 7 11 N Asparagine
11 21 3 Q Glutamine
28 8 14 I Isoleucine
35 5 11 K Lysine
18 5 8 Y Tyrosine
19 22 25 V Valine
26 17 16 D Aspartic acid
1 6 2 C Cysteine
41 42 30 L Leucine
20 19 12 P Proline
25 10 10 T Threonine
3 5 1 W Tryptophan
31 11 15 E Glutamic acid
19 27 36 G Glycine
10 5 4 M Methionine
28 20 16 S Serine
11 29 13 R Arginine
10 11 5 H Histidine
26 4 11 F Phenylalanine
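A Parikh vector simply counts how often each letter occurs in a word. A minimal Python sketch, listing the amino acids in the row order of the table above (the names `AMINO_ACIDS` and `parikh_vector` are illustrative, not taken from the page):

```python
from collections import Counter

# One-letter amino acid codes, in the row order of the Parikh-vector table.
AMINO_ACIDS = "ANQIKYVDCLPTWEGMSRHF"


def parikh_vector(w: str) -> list[int]:
    """Count how often each of the 20 standard amino acids occurs in w."""
    counts = Counter(w)
    return [counts[a] for a in AMINO_ACIDS]


if __name__ == "__main__":
    v = parikh_vector("GPMAWASSF")
    print(dict(zip(AMINO_ACIDS, v)))
```

Each column of the table sums to its sequence length (417, 328 and 277), as a Parikh vector must when every residue is one of the 20 standard letters.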

>7VEX_1|Chain A|T-cell-specific guanine nucleotide triphosphate-binding protein 2|Mus musculus (10090)
>8JJN_1|Chains A, B, C|TIGR04348 family glycosyltransferase|Variovorax paradoxus (34073)
>1BDB_1|Chain A|CIS-BIPHENYL-2,3-DIHYDRODIOL-2,3-DEHYDROGENASE|Pseudomonas sp. (306)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7VEX 176 417 0.84 40 214 394
GPMAWASSFDAFFKNFKRESKIISEYDITLIMTYIEENKLQKAVSVIEKVLRDIESAPLHIAVTGETGAGKSTFINTLRGVGHEEKGAAPTGAIETTMKRTPYPHPKLPNVTIWDLPGIGTTNFTPQNYLTEMKFGEYDFFIIISATRFKENDAQLAKAIAQMGMNFYFVRTKIDSDLDNEQKFKPKSFNKEEVLKNIKDYCSNHLQESLDSEPPVFLVSNVDISKYDFPKLETKLLQDLPAHKRHVFSLSLQSLTEATINYKRDSLKQKVFLEAMKAGALATIPLGGMISDILENLDETFNLYRSYFGLDDASLENIAQDLNMSVDDFKVHLRFPHLFAEHNDESLEDKLFKYIKHISSVTGGPVAAVTYYRMAYYLQNLFLDTAANDAIALLNSKALFEKKVGPYISEPPEYWEA
8JJN 138 328 0.81 40 174 298
MSNPSLVIVSPALPGANNGNWRTAQRWKALLSPVCSARVVQQWPDADASADTVMLALHARRSAESIAHWAHAHPGRGLGVVLTGTDLYQDIGSDPQAQRSLQLAQRLVVLQALGAEALPPECRAKARVVYQSTSARAELPKSARQLRAVMVGHLRQVKSPQTLFDAARLLCGREDIRIDHIGDAGDAGLGELARALASDCPGYRWLGALPHAQTRQRIQRAHVLVHTSALEGGAHVIMEAVRSGTPVLASRVPGNVGMLGNDYAGYFPHGDAAALAALLEACRAGQGSKDRAAGLLDSLRTQCALRAPLFDPRAEQAALFQLLNELQP
1BDB 118 277 0.79 40 165 264
MKLKGEAVLITGGASGLGRALVDRFVAEGAKVAVLDKSAERLAELETDHGDNVLGIVGDVRSLEDQKQAASRCVARFGKIDTLIPNAGIWDYSTALVDLPEESLDAAFDEVFHINVKGYIHAVKACLPALVASRGNVIFTISNAGFYPNGGGPLYTAAKHAIVGLVRELAFELAPYVRVNGVGSGGINSDLRGPSSLGMGSKAISTVPLADMLKSVLPIGRMPEVEEYTGAYVFFATRGDAAPATGALLNYDGGLGVRGFFSGAGGNDLLEQLNIHP
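The LZ-complexity column counts phrases in a Lempel–Ziv (1976) parsing, and the ratio column divides it by \(n/\log_{20} n\). The sketch below assumes the usual exhaustive-history parsing, where a phrase keeps growing while it can still be copied from an earlier start position. The page does not state its exact convention (for instance, how a final reproducible phrase is counted), so these hypothetical helpers `lz76` and `normalized_lz` may not reproduce the column exactly.

```python
import math


def lz76(s: str) -> int:
    """LZ76 complexity: number of phrases in the exhaustive-history parsing of s.

    A phrase starting at i is extended while s[i:i+length] still occurs in
    s[:i+length-1], i.e. while it can be copied from an earlier start (overlap allowed).
    """
    n, i, phrases = len(s), 0, 0
    while i < n:
        length = 1
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1  # phrase ends at its first new symbol, or at the end of s
        i += length
    return phrases


def normalized_lz(c: int, n: int, alphabet_size: int = 20) -> float:
    """Ratio LZ(w) / (n / log_{alphabet_size} n)."""
    return c / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    print(lz76("0001101001000101"))  # 6: 0.001.10.100.1000.101
    print(normalized_lz(176, 417))
```

With the page's values, `normalized_lz(176, 417)` is about 0.84999, so the 0.84 shown for 7VEX is consistent with truncating, rather than rounding, to two decimals.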

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with four-character Protein Data Bank code \(c\).
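A direct way to compute \(P_w(n)\) and \(p_w(n)\) is to slide a window of length \(n\) over \(w\) and collect the windows in a set. A minimal sketch (the helper names `subwords` and `p` are hypothetical; fetching \(f(c)\) from the PDB is not shown):

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the set of distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def p(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))


if __name__ == "__main__":
    w = "GPMAWASSFDAFF"  # first 13 residues of 7VEX_1
    print([p(w, n) for n in (1, 2, 3)])  # [8, 12, 11]
```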

\(|P_{f(\mathrm{7VEX\_1})}(2) \setminus P_{f(\mathrm{8JJN\_1})}(2)|=116\), \(|P_{f(\mathrm{8JJN\_1})}(2) \setminus P_{f(\mathrm{7VEX\_1})}(2)|=76\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\).
Hydrophobic-polar version of Sequence 1 (7VEX_1; 1 = hydrophobic, 0 = polar):
111111001011100100000110000101110010000100110110011001001110111010011100011001011100001111011100010001010101101011011111000101000100101100011111010010000101101110111010110001000100000101001000011001000000010001000111111001010000110100011001110000110101001001010000001000111011011111011111110011001000101000011100101001100101010010101011011100000010001100100100101111111000011001001110011001111100011100011101001100101
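The hydrophobic-polar string replaces each residue by one bit. The page does not state its classification. The set below is an assumption that reproduces the opening residues of 7VEX_1 (it treats G and P as hydrophobic and Y as polar); C was not checked and is assumed polar here. `HYDROPHOBIC` and `hp_string` are hypothetical names.

```python
# Assumed hydrophobic set, inferred from the start of the 7VEX_1 string;
# the page's own convention (notably for C) is not stated.
HYDROPHOBIC = frozenset("AFGILMPVW")


def hp_string(w: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in w)


if __name__ == "__main__":
    print(hp_string("GPMAWASSFDAFFKNF"))  # 1111110010111001
```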
Pair \(Z_2\) Length of longest common substring (subword)
7VEX_1,8JJN_1 192 3
7VEX_1,1BDB_1 145 4
8JJN_1,1BDB_1 159 4
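Both columns of the table can be sketched in a few lines: \(Z_k\) is the size of the symmetric difference of the two subword sets, and the last column is the length of the longest word occurring contiguously in both sequences, found here with the standard dynamic program. The function names are hypothetical and the code is an illustration, not the page's implementation.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y): subwords of length k found in exactly one of x, y."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k divided by the size of the union of the two subword sets."""
    union = subwords(x, k) | subwords(y, k)
    return z(x, y, k) / len(union) if union else 0.0


def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest word occurring contiguously in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: longest common suffix of x[:i-1] and y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best
```

With the page's numbers for 7VEX_1 and 8JJN_1 (\(p(2)=214\) and \(174\), \(Z_2=116+76=192\)), the shared 2-subwords number \(214-116=98\), the union has \(290\) elements, and the Jaccard distance is \(192/290\approx 0.66\).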

Newick tree

 
[
	8JJN_1:92.76,
	[
		7VEX_1:72.5,1BDB_1:72.5
	]:20.26
]

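Let \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\) be the Otu–Sayood distance \(d\).
Let \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)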
A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{745}{\log_{20} 745}-\frac{328}{\log_{20} 328}\right)\approx 114.\)
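The estimate can be checked numerically. Here 745 is taken to be \(417+328\), presumably the length of the concatenation of 7VEX_1 and 8JJN_1, and the factors 0.85 and 0.8 are used as given; `n_over_log` is a hypothetical helper name.

```python
import math


def n_over_log(n: int, base: int = 20) -> float:
    """n / log_base(n), the normalizing term used in the LZ ratio column."""
    return n / math.log(n, base)


# Assumption: 745 = 417 + 328; the factors 0.85 and 0.8 are copied from the text.
estimate = 0.85 * 0.8 * (n_over_log(745) - n_over_log(328))
print(round(estimate))  # 114
```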
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 7VEX_1 8JJN_1 150 132

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)], write \(A=\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(B=\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). Then
\[ \delta = \alpha \min\{A,B\} + (1-\alpha)\max\{A,B\} = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2. \end{cases} \]
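Because \(\min\{A,B\}+\max\{A,B\}=A+B\), the case \(\alpha=1/2\) gives half the sum of the two one-sided terms and \(\alpha=0\) gives their maximum. The sketch below computes both terms with a compact LZ76 phrase count. That this is the exact LZ variant the page uses is an assumption, and the names `one_sided`, `d`, `d1` and `delta` are hypothetical.

```python
def lz76(s: str) -> int:
    """LZ76 phrase count (exhaustive-history parsing); assumed LZ variant."""
    n, i, phrases = len(s), 0, 0
    while i < n:
        length = 1
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def one_sided(x: str, y: str) -> tuple[int, int]:
    """The terms A = LZ(xy) - LZ(x) and B = LZ(yx) - LZ(y)."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)


def d(x: str, y: str) -> int:
    """Otu-Sayood d: the larger one-sided term."""
    return max(one_sided(x, y))


def d1(x: str, y: str) -> int:
    """Otu-Sayood d_1: the sum of both one-sided terms."""
    return sum(one_sided(x, y))


def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min(A, B) + (1 - alpha) * max(A, B)."""
    a, b = one_sided(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


if __name__ == "__main__":
    print(d("ab", "ab"), d1("ab", "ab"))  # 1 2
```

Since an average never exceeds a maximum, \(d_1/2\le d\) always holds, as in the row for 7VEX_1 and 8JJN_1 (\(132\le 150\)).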