CoV2D Browser™

Parikh vectors
6ILW_1 2LDI_1 3SPG_1 Letter Amino acid
7 6 17 K Lysine
5 0 5 W Tryptophan
19 2 12 N Asparagine
4 2 9 C Cysteine
24 9 18 G Glycine
12 4 26 I Isoleucine
11 9 35 L Leucine
8 2 12 M Methionine
17 4 12 P Proline
8 2 8 Y Tyrosine
13 6 19 R Arginine
9 1 20 D Aspartic acid
38 9 18 S Serine
15 7 28 V Valine
30 11 21 A Alanine
9 6 12 Q Glutamine
9 1 24 F Phenylalanine
20 8 17 T Threonine
4 10 22 E Glutamic acid
8 7 8 H Histidine
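
A Parikh vector records how many times each letter occurs in a word. As a minimal sketch (the name `parikh_vector` is hypothetical and not taken from the CoV2D site), the 2LDI_1 column can be recomputed from the FASTA sequence listed further down:

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes

def parikh_vector(word):
    """Return a dict mapping each amino-acid letter to its count in `word`."""
    counts = Counter(word)
    return {letter: counts[letter] for letter in AMINO_ACIDS}

# Sequence of 2LDI_1 as displayed on this page.
seq_2ldi = ("PLKTQQMQVGGMRCAACASSIERALERLKGVAEASVTVATGRLTVTYDPKQVSEITIQER"
            "IAALGYTLAEPKSSVTLNGHKHPHSHREEGHSHSHGAGEFNLKQEL")

vec = parikh_vector(seq_2ldi)
print(vec["K"], vec["W"], vec["H"])  # counts for Lysine, Tryptophan, Histidine
```

The entries of the vector sum to the sequence length, which gives a quick consistency check against the length column of the sequence table.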

>6ILW_1|Chain A|Poly(ethylene terephthalate) hydrolase|Ideonella sakaiensis (strain 201-F6) (1547922)
>2LDI_1|Chain A|Zinc-transporting ATPase|Synechocystis sp. PCC 6803 substr. Kazusa (1111708)
>3SPG_1|Chain A|Inward-rectifier K+ channel Kir2.2|Gallus gallus (9031)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6ILW , Knot 117 270 0.80 40 161 254
MQTNPYARGPNPTAASLEASAGPFTVRSFTVSRPSGYGAGTVYYPTNAGGTVGAIAIVPGYTARQSSIKWWGPRLASHGFVVITIDTNSTLDQPSSRSSQQMAALRQVASLNGTSSSPIYGKVDTARMGVMGWSMGGGGSLISAANNPSLKAAAPQAPWDSSTNFSSVTVPTLIFACENDSIAPVNSSALPIYDSMSRNAKQFLEINGGSHSCANSGNSNQALIGKKGVAWMKRFMDNDTRYSTFACENPNSTRVSDFRTANCSHHHHHH
2LDI , Knot 55 106 0.80 38 79 101
PLKTQQMQVGGMRCAACASSIERALERLKGVAEASVTVATGRLTVTYDPKQVSEITIQERIAALGYTLAEPKSSVTLNGHKHPHSHREEGHSHSHGAGEFNLKQEL
3SPG , Knot 151 343 0.85 40 227 328
MARRKCRNRFVKKNGQCNVEFTNMDDKPQRYIADMFTTCVDIRWRYMLLLFSLAFLVSWLLFGLIFWLIALIHGDLENPGGDDTFKPCVLQVNGFVAAFLFSIETQTTIGYGFRCVTEECPLAVFMVVVQSIVGCIIDSFMIGAIMAKMAAPKKRAQTLLFSHNAVVAMRDGKLCLMWRVGNLRKSHIVEAHVRAQLIKPRITEEGEYIPLDQIDIDVGFDKGLDRIFLVSPITILHEINEDSPLFGISRQDLETDDFEIVVILEGMVEATAMTTQARSSYLASEILWGHRFEPVLFEEKNQYKVDYSHFHKTYEVPSTPRCSAKDLVENKFLLSNSLEVLFQ
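
The page does not say which Lempel-Ziv variant produces the LZ-complexity column. The sketch below assumes an LZ76-style exhaustive-history parsing (phrase counting in the sense of Lempel and Ziv, 1976), so its counts may differ from the tabulated values. The normalization \(\mathrm{LZ}(w)/(n/\log_{20} n)\) is taken from the column header. The function names are hypothetical.

```python
import math

def lz76_complexity(word):
    """Count phrases in an LZ76-style parsing of `word`.

    Each phrase is extended while it still occurs as a substring of the
    text preceding its last symbol; the final phrase may be a repeat.
    """
    i, phrases, n = 0, 0, len(word)
    while i < n:
        length = 1
        while i + length <= n and word[i:i + length] in word[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized(lz, n, alphabet_size=20):
    """Normalize an LZ count by n / log_A(n), with A the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# Classic binary test word, parsed as 0|001|10|100|1000|101.
print(lz76_complexity("0001101001000101"))
# Normalization applied to the tabulated values for 6ILW.
print(normalized(117, 270))
```

For the first row, `normalized(117, 270)` evaluates to roughly 0.81, which is consistent with the displayed 0.80 if the page truncates to two decimals.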

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
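
The set \(P_w\) and its cardinality \(p_w\) can be computed directly with a sliding window. This is a minimal sketch with hypothetical function names, where the parameter `k` is the subword length:

```python
def subwords(word, k):
    """Return the set of distinct length-k subwords (contiguous factors) of `word`."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}

def subword_complexity(word, k):
    """Return the number of distinct length-k subwords of `word`."""
    return len(subwords(word, k))

print(subwords("ABAB", 2))            # the two factors AB and BA
print(subword_complexity("ABAB", 2))
```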

\(|P_{f(6ILW_1)}(2) \setminus P_{f(2LDI_1)}(2)|=120\), \(|P_{f(2LDI_1)}(2) \setminus P_{f(6ILW_1)}(2)|=38\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), that is, the numerator of an LZ76-style (set of subwords) Jaccard distance for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
100010101101011010101111010010100101011101001001110111111111001000010111101100111110100000100100000001111001101010000110101001011111101111101101100101011110111000001001011011110000011110001111000100010011010110000100100001111001111100110000000011000100001001001000000000
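
The page does not state which residues are treated as hydrophobic. Aligning the binary string above with Sequence 1 suggests that A, F, G, I, L, M, P, V and W map to 1 and every other residue to 0. The sketch below uses that inferred set, which is an assumption rather than a documented rule; `hp_encode` is a hypothetical name.

```python
# Residues mapped to 1; inferred from the displayed string, not documented.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First 20 residues of Sequence 1 (6ILW_1).
print(hp_encode("MQTNPYARGPNPTAASLEAS"))  # prints 10001010110101101010
```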
Pair \(Z_2\) Length of longest common subsequence
6ILW_1,2LDI_1 158 4
6ILW_1,3SPG_1 194 3
2LDI_1,3SPG_1 210 3
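
The quantity \(Z_k\) is the size of the symmetric difference of two subword sets. For the first pair it equals \(120+38=158\), matching the first row of the table. A minimal sketch, with hypothetical function names:

```python
def subwords(word, k):
    """Return the set of distinct length-k subwords of `word`."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}

def z_distance(x, y, k):
    """Return |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: ABAB has factors {AB, BA}; ABBA has {AB, BB, BA}.
print(z_distance("ABAB", "ABBA", 2))  # prints 1
```

Dividing this value by the size of the union of the two sets would give a Jaccard distance, which is why the text calls \(Z_k\) a numerator.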

Newick tree

 
[
	3SPG_1:10.43,
	[
		6ILW_1:79,2LDI_1:79
	]:28.43
]

Let d and d1 denote the Otu--Sayood distances \(d\) and \(d_1\), respectively. (The latter makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{376 }{\log_{20} 376}-\frac{106}{\log_{20}106})\approx 82.8\)
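
The arithmetic in this estimate can be checked with a few lines. The constants 0.85 and 0.8 and the lengths 376 and 106 are copied from the formula; note that 376 = 270 + 106 is the combined length of 6ILW_1 and 2LDI_1. The helper `log20` is a hypothetical name.

```python
import math

def log20(x):
    """Logarithm to base 20, the size of the amino-acid alphabet."""
    return math.log(x) / math.log(20)

# Estimate exactly as written in the formula above.
estimate = 0.85 * 0.8 * (376 / log20(376) - 106 / log20(106))
print(round(estimate, 2))
```

The result is about 82.87, which agrees with the displayed 82.8 up to truncation.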
Status Protein1 Protein2 d d1/2
Query variables 6ILW_1 2LDI_1 103 70

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
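
The interpolation can be illustrated numerically. The sketch assumes, following Otu and Sayood's definitions, that min and max range over the two complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\), so that \(d\) is their maximum and \(d_1\) their sum. The increments 103 and 37 are not shown on the page; they are inferred from the query row (\(d=103\), \(d_1/2=70\)) under that assumption.

```python
def delta(a, b, alpha):
    """Interpolate between max(a, b) at alpha=0 and min(a, b) at alpha=1."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Hypothetical increments, inferred from d = 103 and d1/2 = 70.
a, b = 103, 37
print(delta(a, b, 0))    # the distance d (the maximum)
print(delta(a, b, 0.5))  # the distance d1/2 (the average)
```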