CoV2D BrowserTM

Parikh vectors
1XUA_1 3LBN_1 4KQH_1 Letter Amino acid
11 4 13 N Asparagine
13 14 15 D Aspartic acid
19 11 18 I Isoleucine
9 4 17 M Methionine
19 3 22 P Proline
18 3 8 H Histidine
13 11 14 T Threonine
33 11 57 A Alanine
19 11 17 R Arginine
9 11 11 Q Glutamine
27 11 36 L Leucine
15 5 6 F Phenylalanine
2 0 1 W Tryptophan
4 9 4 Y Tyrosine
1 3 7 C Cysteine
16 13 14 E Glutamic acid
29 11 38 G Glycine
3 8 8 K Lysine
20 8 15 S Serine
18 15 35 V Valine
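Each column of the table is the Parikh vector of one sequence, which gives the number of occurrences of each of the 20 standard amino acids. Below is a minimal Python sketch of that count. The function name `parikh_vector` and the alphabet ordering are illustrative choices and are not taken from the CoV2D Browser.

```python
from collections import Counter

# The 20 standard one-letter amino-acid codes (ordering is an arbitrary choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(seq: str) -> dict[str, int]:
    """Return the number of occurrences of each standard amino acid in seq."""
    counts = Counter(seq)
    # Counter returns 0 for letters that never occur (e.g. W in 3LBN_1).
    return {aa: counts[aa] for aa in AMINO_ACIDS}


if __name__ == "__main__":
    hras = ("MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTG"
            "EGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAARTVESRQAQDLARSYGIPYIETSAKTRQ"
            "GVEDAFYTLVREIRQH")
    pv = parikh_vector(hras)
    print(pv["C"], pv["W"])
```

For the 3LBN_1 sequence this prints `3 0`, in agreement with the C and W entries of the 3LBN_1 column.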

>1XUA_1|Chains A, B|Phenazine biosynthesis protein phzF|Pseudomonas fluorescens (294)
>3LBN_1|Chain A|GTPase HRas|Homo sapiens (9606)
>4KQH_1|Chain A|Nicotinate-nucleotide--dimethylbenzimidazole phosphoribosyltransferase|Salmonella enterica subsp. enterica serovar Typhimurium (99287)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1XUA 131 298 0.83 40 186 284
MGSSHHHHHHSSGLVPRGSHMHNYVIIDAFASVPLEGNPVAVFFDADDLPPAQMQRIAREMNLSESTFVLKPRNGGDALIRIFTPVNELPFAGHPLLGTAIALGAHTDNHRLYLETQMGTIAFELERQNGSVIAASMDQPIPTWTALGRDAELLKALGISDSTFPIEIYHNGPRHVFVGLPSIDALSALHPDHRALSNFHDMAINCFAGAGRRWRSRMFSPAYGVVEDAATGSAAGPLAIHLARHGQIEFGQPVEILQGVEIGRPSLMFAKAEGRAEQLTRVEVSGNGVTFGRGTIVL
3LBN 79 166 0.81 38 130 163
MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAARTVESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQH
4KQH 148 356 0.81 40 186 336
MQTLHALLRDIPAPDAEAMARAQQHIDGLLKPPGSLGRLETLAVQLAGMPGLNGTPQVGEKAVLVMCADHGVWDEGVAVSPKIVTAIQAANMTRGTTGVCVLAAQAGAKVHVIDVGIDAEPIPGVVNMRVARGCGNIAVGPAMSRLQAEALLLEVSRYTCDLAQRGVTLFGVGELGMANTTPAAAMVSVFTGSDAKEVVGIGANLPPSRIDNKVDVVRRAIAINQPNPRDGIDVLSKVGGFDLVGMTGVMLGAARCGLPVLLDGFLSYSAALAACQIAPAVRPYLIPSHFSAEKGARIALAHLSMEPYLHMAMRLGAGSGAALAMPIVEAACAMFHNMGELAASNIVLPEGNANAT
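The LZ-complexity column counts the phrases in a Lempel–Ziv (1976) parsing of \(w\), and the next column divides that count by \(n/\log_{20} n\). The page does not say which LZ76 implementation it uses. The sketch below assumes the widely used Kaspar–Schuster phrase-counting algorithm, so its counts could differ from the tabulated ones. The names `lz76_complexity` and `lz_ratio` are illustrative.

```python
import math


def lz76_complexity(s: str) -> int:
    """Number of phrases in the LZ76 parsing of s (Kaspar-Schuster algorithm)."""
    n = len(s)
    if n <= 1:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            # The current phrase still copies a substring starting at i: extend it.
            k += 1
            if l + k > n:
                # End of s reached during a copy: count the final phrase.
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:
                # No earlier start position extends further: close the phrase.
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def lz_ratio(lz: int, n: int, q: int = 20) -> float:
    """LZ(w) / (n / log_q n); requires n >= 2."""
    return lz / (n / math.log(n, q))


if __name__ == "__main__":
    # Standard test string: 0 . 001 . 10 . 100 . 1000 . 101 gives 6 phrases.
    print(lz76_complexity("0001101001000101"))  # 6
    print(round(lz_ratio(131, 298), 3))         # 0.836
```

The normalization alone can be checked against the table. `lz_ratio(131, 298)` is about 0.836, which matches the 0.83 listed for 1XUA when it is truncated to two decimals.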

Let \(P_w(k)\) be the set of distinct subwords (contiguous intervals) of length \(k\) in a word \(w\), and let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the FASTA sequence with four-character Protein Data Bank code \(c\).
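Here is a direct Python rendering of \(P_w\) and \(p_w\), with a protein sequence treated as a string. The function names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w: str, k: int) -> int:
    """p_w(k) = |P_w(k)|."""
    return len(subwords(w, k))


if __name__ == "__main__":
    w = "abab"
    print([subword_complexity(w, k) for k in (1, 2, 3)])  # [2, 2, 2]
```

A word of length \(n\) has only \(n-k+1\) windows of length \(k\), so \(p_w(k) \le n-k+1\). For example, 1XUA (\(n = 298\)) has 284 distinct subwords of length 3 among its 296 windows.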

\(|P_{f(1XUA_1)}(2) \setminus P_{f(3LBN_1)}(2)|=120\) and \(|P_{f(3LBN_1)}(2) \setminus P_{f(1XUA_1)}(2)|=64\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\), that is, the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\). For example, \(Z_2(f(1XUA_1),f(3LBN_1)) = 120 + 64 = 184\).

Hydrophobic-polar (HP) version of sequence 1 (1XUA_1), with 1 for a hydrophobic residue and 0 for a polar one:
1100000000001111010010001110111011101011111101001111010011001010000111010011011101101100111110111101111110000001010001101110100001011110100111010111001011011110000111010001100111111010110110100011001001110011111001000110110111001101011111110110010101101101101101101011110101010010010101011011010111
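The HP string replaces each residue with 1 (hydrophobic) or 0 (polar). The page does not list its hydrophobic set. The sketch below assumes the set {A, C, F, G, I, L, M, P, V, W}. This set reproduces the first 70 symbols of the 1XUA_1 HP string, but the choices for C and W are unverified guesses.

```python
# Assumed hydrophobic residues; C and W are guesses not checked against the page.
HYDROPHOBIC = set("ACFGILMPVW")


def hp_encode(seq: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)


if __name__ == "__main__":
    print(hp_encode("MGSSHHHHHHSSGLVPRGSH"))  # 11000000000011110100
```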
Pair \(Z_2\) Length of longest common substring (contiguous subword)
1XUA_1,3LBN_1 184 4
1XUA_1,4KQH_1 150 3
3LBN_1,4KQH_1 180 4
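Both columns can be computed directly from the sequences. The sketch reads the last column as the length of the longest common contiguous subword (substring). A non-contiguous common subsequence could not be as short as 3 or 4 here: 1XUA_1 has 29 glycines and 3LBN_1 has 11, so a run of 11 G's is already a common subsequence. The function names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """Distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) - P_y(k)| + |P_y(k) - P_x(k)| (symmetric difference size)."""
    return len(subwords(x, k) ^ subwords(y, k))


def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y (O(|x||y|) DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common run that ends at x[i-1] and y[j-1].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(z_k("abcab", "bcd", 2), longest_common_substring("abcab", "bcd"))  # 3 2
```

At these sequence lengths the quadratic table has at most 356 × 298 cells, so the simple dynamic program is fast enough.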

Newick tree

 
(
	3LBN_1:95.74,
	(
		1XUA_1:75,4KQH_1:75
	):20.74
);

Let \(d\) and \(d_1\) be the Otu–Sayood distances of the same names.
(The distance \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{464}{\log_{20} 464}-\frac{166}{\log_{20}166}\right)\approx 87.8\).
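The estimate can be re-evaluated numerically. The sketch takes the factors 0.85 and 0.8 as given, since the page does not explain them, and writes 464 as 298 + 166, the combined length of 1XUA_1 and 3LBN_1. The helper name `lz_scale` is illustrative.

```python
import math


def lz_scale(n: int, q: int = 20) -> float:
    """n / log_q n, the normalizing term applied to LZ-complexity above."""
    return n / math.log(n, q)


# The factors 0.85 and 0.8 are copied from the formula, not derived here.
estimate = 0.85 * 0.8 * (lz_scale(298 + 166) - lz_scale(166))
print(f"{estimate:.2f}")  # 87.80
```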
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 1XUA_1 3LBN_1 113 87.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d, & \alpha=0,\\ d_1/2, & \alpha=1/2. \end{cases} \]
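Below is a minimal sketch of this interpolation. It assumes, following Otu and Sayood's definitions, that \(d\) is the larger and \(d_1\) the sum of the two LZ increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\), where \(c\) is LZ-complexity and \(xy\) is concatenation. The complexities are passed in as numbers, so any LZ76 implementation can supply them. The function names are hypothetical.

```python
def lz_increments(c_x: int, c_y: int, c_xy: int, c_yx: int) -> tuple[int, int]:
    """Extra LZ phrases for y after x, and for x after y (assumed Otu-Sayood terms)."""
    return c_xy - c_x, c_yx - c_y


def delta(alpha: float, c_x: int, c_y: int, c_xy: int, c_yx: int) -> float:
    """delta = alpha * min + (1 - alpha) * max of the two increments."""
    a, b = lz_increments(c_x, c_y, c_xy, c_yx)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


if __name__ == "__main__":
    # Hypothetical complexities, for illustration only.
    print(delta(0.0, 10, 8, 15, 14), delta(0.5, 10, 8, 15, 14))  # 6.0 5.5
```

With \(\alpha = 0\) the function returns the larger increment (\(d\)), and with \(\alpha = 1/2\) it returns half their sum (\(d_1/2\)). These are the two cases of the formula.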