CoV2D Browser™

Parikh vectors
6MXR_1 1WJV_1 4GBD_1 Letter Amino acid
6 5 26 E Glutamic acid
4 1 13 M Methionine
6 3 13 F Phenylalanine
12 1 23 P Proline
8 4 32 D Aspartic acid
5 8 5 C Cysteine
7 2 21 Q Glutamine
11 2 23 H Histidine
6 1 7 W Tryptophan
11 3 7 Y Tyrosine
7 1 32 R Arginine
8 3 12 N Asparagine
24 12 37 G Glycine
5 3 22 I Isoleucine
16 1 56 L Leucine
21 5 32 V Valine
16 2 55 A Alanine
30 12 19 S Serine
20 1 15 T Threonine
13 9 8 K Lysine
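
The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch follows, assuming the 20 standard one-letter amino acid codes as the alphabet; the function name `parikh_vector` and the alphabetical letter order are illustrative choices, not the browser's own code. The sequence used is the 1WJV_1 sequence listed on this page.

```python
from collections import Counter

# Assumption: the 20 standard amino acids, in alphabetical order of their codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of the alphabet to its count in sequence."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in alphabet}

# 1WJV_1 sequence as given on this page.
seq_1wjv = (
    "GSSGSSGMVFFTCNACGESVKKIQVEKHVSNCRNCECLSCIDCGKDFWGD"
    "DYKSHVKCISEGQKYGGKGYEAKSGPSSG"
)

vector = parikh_vector(seq_1wjv)
print(vector["C"], vector["G"], vector["K"])  # Cysteine, Glycine, Lysine counts
```

The entries of the vector sum to the sequence length, which gives a quick sanity check on any column of the table.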

>6MXR_1|Chains A, C[auth H]|anti-VEGF-A Fab fragment bH1 heavy chain|Homo sapiens (9606)
>1WJV_1|Chain A|Cell growth regulating nucleolar protein LYAR|Mus musculus (10090)
>4GBD_1|Chains A, B|Putative uncharacterized protein|Pseudomonas aeruginosa (208964)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6MXR , Knot 105 236 0.81 40 155 222
EVQLVESGGGLVQPGGSLRLSCAASGFNIKDTWIHWVRQAPGKGLEWVARIYPTNGYTRYADSVKGRFTISADTSKNTAYLQMNSLRAEDTAVYYCSRWGGMMFYAMDYWGQGTLVTVSSASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKVEPKSCDKTGHHHHHHHHG
1WJV , Knot 41 79 0.75 40 65 74
GSSGSSGMVFFTCNACGESVKKIQVEKHVSNCRNCECLSCIDCGKDFWGDDYKSHVKCISEGQKYGGKGYEAKSGPSSG
4GBD , Knot 187 458 0.83 40 224 426
MHHHHHHENLYFQGMPNVRNPFDLLLLPTWIVPVEPAGVVLRDHALGIRDGQIALVAPREQAMRHGATEIRELPGMLLAPGLVNAHGHSAMSLFRGLADDLPLMTWLQDHIWPAEGQWVSEDFIRDGTELAIAEQVKGGITCFSDMYFYPQAICGVVHDSGVRAQVAIPVLDFPIPGARDSAEAIRQGMALFDDLKHHPRIRIAFGPHAPYTVSDDKLEQILVLTEELDASIQMHVHETAFEVEQAMERNGERPLARLHRLGLLGPRFQAVHMTQVDNDDLAMLVETNSSVIHCPESNLKLASGFCPVEKLWQAGVNVAIGTDGAASNNDLDLLGETRTAALLAKAVYGQATALDAHRALRMATLNGARALGLERLIGSLEAGKAADLVAFDLSGLAQQPVYDPVSQLIYASGRDCVRHVWVGGRQLLDDGRLLRHDEQRLIARAREWGAKIAASDRS
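
The normalized column divides the LZ-complexity by \(n/\log_{20} n\). The sketch below recomputes that ratio from the tabulated values and also shows one common way to count LZ76 components (the exhaustive-history parsing). Which LZ variant the browser actually uses is not stated on this page, so `lz76_complexity` is an assumption and is not claimed to reproduce the tabulated counts. The helper `truncate2` reflects the observation that the listed ratios match the computed ones when cut to two decimals rather than rounded.

```python
import math

def lz76_complexity(s):
    """Count components in the exhaustive (LZ76-style) parsing of s."""
    n = len(s)
    i = 0
    components = 0
    while i < n:
        length = 1
        # Grow the candidate while it still occurs in the text that precedes
        # its last character (overlap with the candidate itself is allowed).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_alphabet_size(n))."""
    return lz / (n / math.log(n, alphabet_size))

def truncate2(x):
    """Cut a positive number to two decimals without rounding."""
    return math.floor(x * 100) / 100

# (LZ-complexity, length) pairs copied from the table above.
table = {"6MXR": (105, 236), "1WJV": (41, 79), "4GBD": (187, 458)}
for code, (lz, n) in table.items():
    print(code, truncate2(normalized_lz(lz, n)))
```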

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
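
As a small illustration of these definitions, the sketch below collects the contiguous subwords of a given length into a set and takes its size. The names `subwords` and `subword_complexity` are illustrative, and the code assumes that "subword" means a contiguous window of the stated length.

```python
def subwords(w, n):
    """Set of distinct contiguous subwords of length n in w (P_w(n))."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """Number of distinct subwords of length n in w (p_w(n))."""
    return len(subwords(w, n))

# Toy example: the length-2 windows of ABAB are AB, BA, AB.
print(sorted(subwords("ABAB", 2)))
print(subword_complexity("ABAB", 2))
```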

\(|P_{f(6MXR_1)}(2) \setminus P_{f(1WJV_1)}(2)|=126\), \(|P_{f(1WJV_1)}(2) \setminus P_{f(6MXR_1)}(2)|=36\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
01011001111101110101001101101000110110011101101110101001000010010101010100000010101001010001100000111111011001101011010010001101111110000001101111011000110110101001110011001111100011001001101100011000010010001000010001010000001000000001
Pair \(Z_2\) Length of longest common subword (contiguous)
6MXR_1,1WJV_1 162 3
6MXR_1,4GBD_1 185 6
1WJV_1,4GBD_1 217 3
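
Both columns of this table can be sketched in a few lines. \(Z_k\) is the size of the symmetric difference of the two length-\(k\) subword sets; for example, \(126+36=162\) for the first pair. The last column is read here as the length of the longest common contiguous subword, an assumption suggested by the small values (6MXR_1 and 4GBD_1 both contain a run of six H). The function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): size of the symmetric difference of the subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common block that ended at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_distance("ABC", "BCD", 2))
print(longest_common_subword_length("XHHHHHHY", "AHHHHHHHHB"))
```

The dynamic program uses time proportional to the product of the two lengths, which is fine for sequences of a few hundred residues.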

Newick tree

 
[
	4GBD_1:10.60,
	[
		6MXR_1:81,1WJV_1:81
	]:25.60
]

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{315}{\log_{20} 315}-\frac{79}{\log_{20} 79}\right)\approx 74.7\).
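
The arithmetic in this estimate can be checked directly. In the sketch below, the factors 0.85 and 0.8 are taken from the formula as given. The value 315 equals 236 + 79, the combined length of the two query sequences, which suggests it is the length of their concatenation (an observation, not something the page states). The helper name `lz_scale` is illustrative.

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_alphabet_size(n), the normalizing quantity used on this page."""
    return n / math.log(n, alphabet_size)

expected = 0.85 * 0.8 * (lz_scale(315) - lz_scale(79))
print(round(expected, 1))  # 74.7
```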
Status Protein1 Protein2 d d1/2
Query variables 6MXR_1 1WJV_1 92 61
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d, & \alpha=0,\\ d_1/2, & \alpha=1/2. \end{cases} \]
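
The interpolation can be sketched as follows. This assumes that min and max range over the two directed terms of the Otu–Sayood construction, with d their maximum and d1 their sum, so that \(\alpha=0\) gives d and \(\alpha=1/2\) gives d1/2. Under that assumption the query row above (d = 92, d1/2 = 61) corresponds to directed terms 92 and 2·61 − 92 = 30. The value 30 is inferred, not displayed by the page, and the function name `delta` is illustrative.

```python
def delta(forward, backward, alpha):
    """alpha * min + (1 - alpha) * max of the two directed terms."""
    low, high = min(forward, backward), max(forward, backward)
    return alpha * low + (1 - alpha) * high

# Directed terms inferred from the query row (assumption: d = max, d1 = sum).
forward, backward = 92, 30
print(delta(forward, backward, 0))    # equals d
print(delta(forward, backward, 0.5))  # equals d1 / 2
```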