CoV2D Browser™

Parikh vectors
4OHZ_1 8RPE_1 3QNW_1 Letter Amino acid
18 6 5 N Asparagine
27 5 11 K Lysine
10 6 9 H Histidine
36 9 14 L Leucine
22 4 12 F Phenylalanine
42 12 10 V Valine
30 11 10 A Alanine
20 4 11 R Arginine
23 6 13 D Aspartic acid
36 6 10 E Glutamic acid
28 15 5 S Serine
28 6 6 T Threonine
11 7 4 Y Tyrosine
13 7 3 Q Glutamine
22 1 10 I Isoleucine
9 2 2 M Methionine
17 2 5 P Proline
9 2 6 C Cysteine
28 17 9 G Glycine
2 4 1 W Tryptophan
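Each column of the table above is the Parikh vector of one sequence: the number of occurrences of every letter, ignoring order. A minimal sketch of that count follows; the function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative choices, not names used by the page.

```python
from collections import Counter

# Row order of the table above (one-letter amino acid codes).
AMINO_ACIDS = "NKHLFVARDESTYQIMPCGW"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the letter counts of `sequence`, listed in `alphabet` order."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# First 13 residues of 8RPE_1, used here only as a short example.
prefix = "HHHHHHENLYFQG"
print(dict(zip(AMINO_ACIDS, parikh_vector(prefix))))
```

Summing a Parikh vector gives the sequence length, which is a quick sanity check against the Length column further down.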

>4OHZ_1|Chain A|Protein clpf-1|Caenorhabditis elegans (6239)
>8RPE_1|Chain A[auth B]|VH1|Lama (9839)
>3QNW_1|Chains A, C, E, G|Caspase-6|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4OHZ , Knot 180 431 0.84 40 241 413
GSHMSEENVQEFVLKEDCELRFAAGDDSDVCLELVKGYAEIFGTELLLNKKYTFPAKSRVAAFTWKGATIELVGTTESAYVAESTPMVIYLNIHAAMEEVRKKREEQAAGNSNKAKGPRLLLVGPTDVGKTTVSRILCNYSVRQGRTPIFVELDVGQNSVSVPGTVAAVLVQKTADVIDGFERNQPIVFNFGHTSPSANLSLYEALFKEMATTLNAQIQENDEAKIGGMIINTCGWVDGEGYKCIVKAASAFEVDVVIVLDHERLYSDLSKELPEFVRLTHVPKSGGVEQRTGQIRSKMRGENVHRYFYGTRANNLYPFTFDVSFDDVTLCKIGAEQLPDSCLPFGMEVENHETKLVIMEPSADIKHHLFAFSRSTKADENVLKSPVFGFCLVTEVDLEKRTMSILCPQRTIPSKVLVFSDITHLDDQIKR
8RPE , Knot 65 132 0.80 40 99 125
HHHHHHENLYFQGAEVQLVESGGGLVLPGGSLKLSCAASGFNFGSSDMNWVRQAAGKGPEWVASIERGAGGTDYADSVQGRFTVSRDNAKSTLWLQMNSLKAEDTAVYYCVVSDNSGYYKYWGQGTQVTVSS
3QNW , Knot 77 156 0.83 40 123 153
AFYKREMFDPAEKYKMDHRRRGIALIFNHERFFWHLTLPERRGTCADRDNLTRRFSDLGFEVKCFNDLKAEELLLKIHEVSTVSHADADCFVCVFLSHGEGNHIYAYDAKIEIQTLTGLFKGDKCHSLVGKPKIFIIQACRGNQHDVPVIPLDVVD
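The LZ-complexity column and its normalized ratio can be approximated with the sketch below. It assumes the classical Lempel-Ziv (1976) exhaustive-history parsing; the page does not say which variant it uses, so the values it produces may differ from the table. The function names are hypothetical.

```python
import math

def lz76(w):
    """Number of phrases in an LZ76-style parsing of w (assumed variant).

    Each phrase is the shortest block starting at position i that does not
    already occur in the text preceding the phrase's last symbol.
    """
    n, i, phrases = len(w), 0, 0
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(w, alphabet_size=20):
    """LZ(w) divided by n / log_a(n); requires len(w) >= 2."""
    n = len(w)
    return lz76(w) / (n / math.log(n, alphabet_size))

print(lz76("0001101001000101"))  # parses as 0|001|10|100|1000|101
```

The substring test makes this quadratic or worse in the sequence length, which is acceptable for sequences of a few hundred residues.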

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
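As a small illustration of these definitions, the sketch below collects the distinct subwords of a fixed length and counts them. The names `subwords` and `subword_complexity` are illustrative only.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous blocks) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

w = "banana"
print(sorted(subwords(w, 2)), subword_complexity(w, 2))
```

For a word of length \(n\) over an alphabet of size \(a\), the count is at most \(\min(a^k, n-k+1)\).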

\(|P_{f(4OHZ_1)}(2) \setminus P_{f(8RPE_1)}(2)|=166\) and \(|P_{f(8RPE_1)}(2) \setminus P_{f(4OHZ_1)}(2)|=24\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1 (4OHZ_1):
10010000100111000001011110000101011010101110011100000111000111101011010111000010110001111010101110010000000111000010110111111001100010011000010010011110101100010111011111100010110110000111101100010101010011100110010101000001011111100011101010001101101101011111000010001000110110100110011100001010001010010001010010010110101010010100111001100011111010000001111010101000111100000100011001111101100101000010110100011001111001001000100
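The page does not state which residues count as hydrophobic. Comparing the start of the binary string with the start of the 4OHZ_1 sequence suggests that A, F, G, I, L, M, P, V and W map to 1 and every other residue to 0. The sketch below uses that inferred mapping, which is an assumption and not a documented rule; `hp_encode` is a hypothetical name.

```python
# Residues treated as hydrophobic (1); inferred from the displayed string,
# not taken from any documented classification.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in hydrophobic else "0" for residue in sequence)

# First 15 residues of 4OHZ_1.
print(hp_encode("GSHMSEENVQEFVLK"))
```

Other hydrophobic-polar classifications place residues such as C, Y or T differently, so the set should be adjusted if a different convention is intended.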
Pair \(Z_2\) Length of longest common subword
4OHZ_1,8RPE_1 190 4
4OHZ_1,3QNW_1 186 4
8RPE_1,3QNW_1 148 4
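Both columns of this table can be reproduced along the following lines. The sketch reads the last column as the length of the longest common contiguous subword, which is an assumption: a value of 4 is far too small for a non-contiguous common subsequence of sequences this long. Function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous blocks) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest block occurring contiguously in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_distance("abab", "abba", 2))
print(longest_common_subword_length("banana", "ananas"))
```

Dividing `z_distance` by the size of the union of the two subword sets would give the Jaccard distance itself.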

Newick tree

 
[
	4OHZ_1:99.78,
	[
		3QNW_1:74,8RPE_1:74
	]:25.78
]
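The bracketed display above corresponds to a one-line Newick string, which conventionally uses parentheses and a terminating semicolon. The sketch below serializes a nested structure into that form; the tuple layout `(label_or_children, branch_length)` is a representation chosen for this example.

```python
def to_newick(node, root=True):
    """Serialize (label_or_children, branch_length) into Newick text."""
    label, length = node
    if isinstance(label, str):
        text = label  # leaf
    else:
        text = "(" + ",".join(to_newick(child, root=False) for child in label) + ")"
    if length is not None:
        text += f":{length:g}"
    return text + (";" if root else "")

# Tree shown above; the root carries no branch length.
tree = ([("4OHZ_1", 99.78), ([("3QNW_1", 74), ("8RPE_1", 74)], 25.78)], None)
print(to_newick(tree))
```

Note that 74 + 25.78 = 99.78, so all three leaves sit at the same distance from the root.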

Let \(d\) denote the Otu–Sayood distance \(d\).
Let \(d_1\) denote the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
A rough expected distance is \((0.85)(0.8)\left(\frac{563}{\log_{20} 563}-\frac{132}{\log_{20} 132}\right)\approx 126\).
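The arithmetic in this estimate can be checked directly. Here 563 = 431 + 132 is the combined length of 4OHZ_1 and 8RPE_1; the factors 0.85 and 0.8 are taken from the text as given, and the function name is illustrative.

```python
import math

def log20(n):
    """Logarithm of n to base 20."""
    return math.log(n) / math.log(20)

def rough_expected_distance(n_concat, n_single, c1=0.85, c2=0.8):
    """c1 * c2 * (n_concat / log20(n_concat) - n_single / log20(n_single))."""
    return c1 * c2 * (n_concat / log20(n_concat) - n_single / log20(n_single))

value = rough_expected_distance(563, 132)
print(round(value, 2), round(value))
```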
Status Protein1 Protein2 d d1/2
Query variables 4OHZ_1 8RPE_1 160 102
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2. \end{cases} \]
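To make the interpolation concrete, the sketch below assumes the usual Otu–Sayood definitions: with complexity \(c\), the two terms are \(c(xy)-c(x)\) and \(c(yx)-c(y)\), \(d\) is their maximum and \(d_1\) is their sum. It also assumes an LZ76-style phrase count for \(c\); both are assumptions about what the page computes, and all function names are hypothetical.

```python
def lz76(w):
    """Number of phrases in an LZ76-style parsing of w (assumed variant)."""
    n, i, phrases = len(w), 0, 0
    while i < n:
        length = 1
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def conditional_terms(x, y, c=lz76):
    """Return (min, max) of c(xy) - c(x) and c(yx) - c(y)."""
    a = c(x + y) - c(x)
    b = c(y + x) - c(y)
    return min(a, b), max(a, b)

def delta(x, y, alpha, c=lz76):
    """alpha * min + (1 - alpha) * max of the two conditional terms."""
    lo, hi = conditional_terms(x, y, c)
    return alpha * lo + (1 - alpha) * hi

x, y = "ACGTACGTAC", "TTTTGGGGCC"
lo, hi = conditional_terms(x, y)
print(delta(x, y, 0) == hi, delta(x, y, 0.5) == (lo + hi) / 2)
```

Setting \(\alpha=0\) recovers the maximum and \(\alpha=1/2\) recovers half the sum, matching the two cases in the displayed formula.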