CoV2D Browser™

Parikh vectors
1WBX_1 4CGW_1 1VBI_1 Letter Amino acid
26 10 43 L Leucine
11 11 11 K Lysine
15 2 26 P Proline
13 4 11 S Serine
17 6 9 T Threonine
16 5 6 Y Tyrosine
14 8 4 Q Glutamine
4 2 0 C Cysteine
28 13 29 E Glutamic acid
4 6 3 I Isoleucine
5 1 6 M Methionine
13 1 29 V Valine
12 5 15 D Aspartic acid
22 9 40 G Glycine
8 0 8 H Histidine
7 5 11 F Phenylalanine
9 5 3 N Asparagine
21 8 31 R Arginine
11 0 8 W Tryptophan
20 16 51 A Alanine
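
As a rough illustration of how a column of the table above could be produced, the sketch below counts each amino-acid letter in a sequence. The function name `parikh_vector` and the constant `AMINO_ACIDS` are hypothetical, and the input is only the first 12 residues of 4CGW_1, not the full sequence.

```python
from collections import Counter

# Letter order follows the rows of the table above (an illustrative choice).
AMINO_ACIDS = "LKPSTYQCEIMVDGHFNRWA"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each alphabet letter, in alphabet order."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# First 12 residues of 4CGW_1, used only as a short example input.
prefix = "STEGERKQIEAQ"
vector = parikh_vector(prefix)
print(dict(zip(AMINO_ACIDS, vector)))
```

The entries of a Parikh vector always sum to the length of the word, which gives a quick consistency check against the Length column further down the page.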

>1WBX_1|Chain A|H-2 CLASS I HISTOCOMPATIBILITY ANTIGEN, D-B ALPHA CHAIN|MUS MUSCULUS (10090)
>4CGW_1|Chains A, B|RNA POLYMERASE II-ASSOCIATED PROTEIN 3|HOMO SAPIENS (9606)
>1VBI_1|Chain A|Type 2 malate/lactate dehydrogenase|Thermus thermophilus (274)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1WBX , Knot 126 276 0.85 40 186 270
GPHSMRYFETAVSRPGLEEPRYISVGYVDNKEFVRFDSDAENPRYEPRAPWMEQEGPEYWERETQKAKGQEQWFRVSLRNLLGYYNQSAGGSHTLQQMSGCDLGSDWRLLRGYLQFAYEGRDYIALNEDLKTWTAADMAAQITRRKWEQSGAAEHYKAYLEGECVEWLHRYLKNGNATLLRTDSPKAHVTHHPRSKGEVTLRCWALGFYPADITLTWQLNGEELTQDMELVETRPAGDGTFQKWASVVVPLGKEQNYTCRVYHEGLPEPLTLRWEP
4CGW , Knot 57 117 0.77 36 91 112
STEGERKQIEAQQNKQQAISEKDRGNGFFKEGKYERAIECYTRGIAADGANALLPANRAMAYLKIQKYEEAEKDCTQAILLDGSYSKAFARRGTARTFLGKLNEAKQDFETVLLLEP
1VBI , Knot 136 344 0.77 38 158 309
MRWRADFLSAWAEALLRKAGADEPSAKAVAWALVEADLRGVGSHGLLRLPVYVRRLEAGLVNPSPTLPLEERGPVALLDGEHGFGPRVALKAVEAAQSLARRHGLGAVGVRRSTHFGMAGLYAEKLAREGFVAWVTTNAEPDVVPFGGREKALGTNPLAFAAPAPQGILVADLATSESAMGKVFLAREKGERIPPSWGVDREGSPTDDPHRVYALRPLGGPKGYALALLVEVLSGVLTGAGVAHGIGRMYDEWDRPQDVGHFLLALDPGRFVGKEAFLERMGALWQALKATPPAPGHEEVFLPGELEARRRERALAEGMALPERVVAELKALGERYGVPWRDDA
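
The LZ-complexity and normalized columns above might be computed along the following lines. This is a minimal sketch that assumes the page counts components of the Lempel-Ziv (1976) exhaustive history; the page's exact variant is not stated, and the function names are hypothetical.

```python
import math

def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive history of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the component while it can still be copied from an earlier
        # start position (the copy is allowed to overlap the component).
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_{alphabet_size}(n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))

# Classic binary example with history 0|001|10|100|1000|101.
print(lz76_complexity("0001101001000101"))
# Normalized value for 4CGW, using LZ = 57 and n = 117 from the table.
print(round(normalized_lz(57, 117), 2))
```

With the table's own values, `normalized_lz(126, 276)` is about 0.856, so the 0.85 shown for 1WBX looks truncated rather than rounded.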

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
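
A minimal sketch of these two definitions, reading \(P_w(n)\) as the set of contiguous length-\(n\) subwords; the names `subwords` and `subword_complexity` are hypothetical, and the input is a toy word rather than one of the proteins.

```python
def subwords(w, n):
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """p_w(n): the number of distinct length-n subwords of w."""
    return len(subwords(w, n))

toy = "ABABC"
complexities = [subword_complexity(toy, n) for n in (1, 2, 3)]
print(complexities)
```

For a word of length \(m\), \(p_w(n)\) can be at most \(m-n+1\), since that is the number of starting positions.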

Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). For example, \(|P_{f(1WBX_1)}(2) \setminus P_{f(4CGW_1)}(2)|=129\) and \(|P_{f(4CGW_1)}(2) \setminus P_{f(1WBX_1)}(2)|=34\), so \(Z_2(f(1WBX_1),f(4CGW_1))=129+34=163\).

Hydrophobic-polar version of Sequence 1:
110010010011001110010010110100001101000100100010111100011001000000101000110101001110000011100010010100110010110101011001000111000100101101110100001000111000010101001011000100101011000010101000100010101001111101101010101010010001011000111010100110111111000000001000111011010101

Pair \(Z_2\) Length of longest common subword
1WBX_1,4CGW_1 163 3
1WBX_1,1VBI_1 154 3
4CGW_1,1VBI_1 151 4
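
The two columns of the table above could be computed roughly as follows. This sketch assumes that the last column counts contiguous matches (values of 3 or 4 are too small for non-contiguous subsequences of proteins this long); all function names are hypothetical and the inputs are toy words.

```python
def subwords(w, k):
    """Set of distinct contiguous length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Largest k such that x and y share a contiguous subword of length k."""
    k = 0
    # A shared subword of length k + 1 implies one of length k, so a linear scan suffices.
    while k < min(len(x), len(y)) and subwords(x, k + 1) & subwords(y, k + 1):
        k += 1
    return k

x, y = "ABAB", "BABC"
print(z_distance(x, y, 2))                  # only "BC" is unshared
print(longest_common_subword_length(x, y))  # "BAB" is shared
```

Dividing \(Z_k\) by the size of the union of the two subword sets would give a Jaccard distance proper; the table reports only the numerator.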

Newick tree

 
[
	1WBX_1:80.50,
	[
		1VBI_1:75.5,4CGW_1:75.5
	]:5.00
]

Let \(d\) denote the Otu–Sayood distance \(d\), and let \(d_1\) denote the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
A rough expected distance is \((0.85)(0.8)\left(\frac{393}{\log_{20} 393}-\frac{117}{\log_{20}117}\right)=83.9\), where \(393=276+117\) is the combined length of the two query sequences.
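
The arithmetic in this estimate can be checked with a few lines. The factors 0.85 and 0.8 are taken from the formula as given; the helper name `lz_scale` is hypothetical.

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_{alphabet_size}(n), the normalizing quantity used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

n1, n2 = 276, 117  # lengths of 1WBX_1 and 4CGW_1
estimate = 0.85 * 0.8 * (lz_scale(n1 + n2) - lz_scale(n2))
print(round(estimate, 2))
```

The result is just under 84, in line with the value quoted above.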
Status Protein1 Protein2 d d1/2
Query variables 1WBX_1 4CGW_1 108 75

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
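
A small sketch of this interpolation. It assumes, following Otu and Sayood's definitions, that min and max are taken over the two complexity increments \(c(SQ)-c(S)\) and \(c(QS)-c(Q)\). The function name `delta` is hypothetical, and the value 42 below is not shown on the page: it is what the smaller increment must be if \(d=108\) and \(d_1/2=75\) are exact.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    lo, hi = min(a, b), max(a, b)
    return alpha * lo + (1 - alpha) * hi

# Increments inferred from the query row (d = 108, d1/2 = 75), assuming exact values.
smaller, larger = 2 * 75 - 108, 108

print(delta(0, smaller, larger))    # alpha = 0 recovers d (the max)
print(delta(0.5, smaller, larger))  # alpha = 1/2 recovers d1/2 (the average)
```

Intermediate values of \(\alpha\) slide between the max-based distance \(d\) and the average-based \(d_1/2\), and \(\alpha=1\) would give the minimum alone.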