CoV2D Browser™

Parikh vectors
6UBL_1 8GJG_1 8WMF_1 Letter Amino acid
11 33 51 E Glutamic acid
0 2 25 H Histidine
15 27 102 L Leucine
9 13 40 R Arginine
8 4 85 N Asparagine
22 5 63 P Proline
12 1 95 T Threonine
3 0 10 W Tryptophan
7 3 55 Y Tyrosine
13 8 59 D Aspartic acid
3 0 31 C Cysteine
5 1 60 Q Glutamine
10 5 11 M Methionine
6 4 72 F Phenylalanine
11 9 98 S Serine
19 9 90 V Valine
15 14 80 A Alanine
22 4 82 G Glycine
12 9 74 I Isoleucine
8 23 62 K Lysine
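A Parikh vector simply counts how often each letter occurs in a word. The following is a minimal sketch of how a column of the table above could be recomputed; the function name `parikh_vector` and the toy input are illustrative assumptions, not part of the site.

```python
from collections import Counter

# Letter order copied from the rows of the table above.
AMINO_ACIDS = "EHLRNPTWYDCQMFSVAGIK"

def parikh_vector(word, alphabet=AMINO_ACIDS):
    """Return the letter counts of `word`, in the order given by `alphabet`."""
    counts = Counter(word)
    return [counts[letter] for letter in alphabet]

# Toy input (the first six residues of 6UBL_1), for illustration only.
toy = "SMSTKS"
for letter, count in zip(AMINO_ACIDS, parikh_vector(toy)):
    if count:
        print(letter, count)
```

The entries of a Parikh vector always sum to the length of the word, which is a quick sanity check against the Length column further down.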

>6UBL_1|Chains A, B|DynF|Micromonospora chersina (47854)
>8GJG_1|Chain A|gluc_A04_0005 Binder|synthetic construct (32630)
>8WMF_1|Chain A|Spike glycoprotein|Severe acute respiratory syndrome coronavirus 2 (2697049)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6UBL , Knot 96 211 0.81 38 150 203
SMSTKSVLFGRPVQTEGVPNVYAGAPVVPWTPPEPGIDNLGINSIDTFAVPGVGEYTVAFDGWVRVVRSPSTSGEWADAEVYTNLIEMKMVGECEELGKITVTLNPDCLSAGQIRTPFDPYAGEGPSAKACRMAVGAIFDMPKLGLKLMNREPIILTIDDVRSIPPAGAPGKGQIYRMMPLLDVNDPDGQPVAYLTSLRFNMGGYLKPDQM
8GJG , Knot 74 174 0.73 36 92 152
MSGMLDELFSLLNKMFELSDKYRELRKELRKAIESGAPEEELRELLEKMLEIAKKLLELTKELKKLVEDVLKNNPDPVERAKAVLLYAVGVHILYSESSELEVIAERLGFKDIAEKAKEIADKARELKEEVKRKLREIREEVPDPEIRKAAEEAIEMLESNDKRLKEFRKLHSQ
8WMF , Knot 446 1245 0.85 40 329 1088
LLMGCVAETGSSQCVNLITRTQSYTNSFTRGVYYPDKVFRSSVLHSTHDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPALPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKVCEFQFCNDPFLDVYQKNNKSWMESEFRVYSSANNCTFEYVSQPFLMDLEGKEGNFKNLREFVFKNIDGYFKIYSKHTPINLERDLPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPVDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKCTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFHEVFNATTFASVYAWNRKRISNCVADYSVIYNFAPFFAFKCYGVSPTKLNDLCFTNVYADSFVIRGNEVSQIAPGQTGNIADYNYKLPDDFTGCVIAWNSNKLDSKPSGNYNYLYRLLRKSKLKPFERDISTEIYQAGNKPCNGVAGPNCYSPLQSYGFRPTYGVGHQPYRVVVLSFELLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTNTSNQVAVLYQGVNCTEVPVAIHADQLTPTWRVYSTGSNVFQTRAGCLIGAEYVNNSYECDIPIGAGICASYQTQTKSHGSAGSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLKRALTGIAVEQDKNTQEVFAQVKQIYKTPPIKYFGGFNFSQILPDPSKPSKRSPIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGPALQIPFPMQMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTPSALGKLQDVVNHNAQALNTLVKQLSSKFGAISSVLNDILSRLDPPEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYEQYIASSGYIPEAPRDGQAYVRKDGEWVLLSTFLEGTKHHHHHH
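The normalized column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below shows one common way to count Lempel-Ziv (1976) phrases and then apply that normalization. The parsing rule used here (each phrase is the shortest prefix of the remaining text that does not already occur earlier) is an assumption; the site may use a different variant, so the phrase counts need not match the table exactly. The function names are illustrative.

```python
import math

def lz76_complexity(w):
    """Count phrases in an LZ76-style exhaustive parsing of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it still occurs in the text seen so far
        # (the text up to, but not including, the phrase's last symbol).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """Divide an LZ phrase count by n / log_alphabet_size(n)."""
    return lz / (n / math.log(n, alphabet_size))

# Classic binary example; it parses as 0 | 001 | 10 | 100 | 1000 | 101.
print(lz76_complexity("0001101001000101"))

# Normalization applied to the table's own values for 6UBL (LZ = 96, n = 211).
print(round(normalized_lz(96, 211), 2))
```

Applied to the three rows of the table, the normalization alone reproduces the listed ratios 0.81, 0.73 and 0.85.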

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
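As a small illustration of these definitions, the set \(P_w(n)\) and its size \(p_w(n)\) can be computed by sliding a window of width \(n\) over \(w\). This is a sketch only; the names `subwords` and `subword_complexity` are hypothetical and the toy word is not one of the proteins above.

```python
def subwords(w, n):
    """Return the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """Return p_w(n), the number of distinct length-n subwords of w."""
    return len(subwords(w, n))

toy = "ABABB"
print(sorted(subwords(toy, 2)))
print(subword_complexity(toy, 2))
```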

\(|P_{f(6UBL_1)}(2) \setminus P_{f(8GJG_1)}(2)|=106\), \(|P_{f(8GJG_1)}(2) \setminus P_{f(6UBL_1)}(2)|=48\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
0100001111011000111010111111110110111001110010011111110001110111011001000101101010001101011100001101010101001011010011010110110101001111111011011101100011110100100111111110101001111101001010111010010101110101001

Pair \(Z_2\) Length of longest common subword
6UBL_1,8GJG_1 154 3
6UBL_1,8WMF_1 203 4
8GJG_1,8WMF_1 249 3
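Both columns of this table can be sketched in a few lines. \(Z_k\) is the size of the symmetric difference of the two length-\(k\) subword sets, and the last column is read here as the length of the longest common contiguous subword. That reading is an assumption, based on how small the listed values are. Function names are illustrative.

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x, y, k):
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|, i.e. the symmetric difference size."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous subword shared by x and y (row-by-row DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common run that ended at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy inputs, for illustration only.
print(z_k("ABAB", "ABBA", 2))
print(longest_common_subword_length("ABCDE", "XBCDY"))
```

For the first pair in the table, the two set differences quoted earlier (106 and 48) add up to the listed \(Z_2 = 154\).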

Newick tree

 
(8WMF_1:12.39,(6UBL_1:77,8GJG_1:77):46.39);

Let \(d\) denote the Otu–Sayood distance \(d\).
Let \(d_1\) denote the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{385}{\log_{20} 385}-\frac{174}{\log_{20}174})=63.0\)
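The arithmetic in this estimate can be checked directly. In the sketch below, the helper name `lz_scale` is hypothetical; note that 385 = 211 + 174 is the combined length of the two query sequences, while the meaning of the factors 0.85 and 0.8 is not stated on this page and they are simply copied from the formula.

```python
import math

def lz_scale(n, alphabet_size=20):
    """Return n / log_alphabet_size(n), the scale used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

# Factors and lengths copied from the formula above.
estimate = 0.85 * 0.8 * (lz_scale(385) - lz_scale(174))
print(round(estimate, 1))  # 63.0
```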
Status Protein1 Protein2 d d1/2
Query variables 6UBL_1 8GJG_1 82 70
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
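The two cases can be unpacked under the usual Otu–Sayood definitions, which this page does not restate and which are therefore an assumption here. Write \(c\) for the complexity measure and \(xy\) for the concatenation of \(x\) and \(y\), and set \(a=c(xy)-c(x)\) and \(b=c(yx)-c(y)\). Then
\[ d=\max(a,b), \qquad d_1=a+b=\min(a,b)+\max(a,b), \]
so that, reading \(\mathrm{min}\) and \(\mathrm{max}\) above as \(\min(a,b)\) and \(\max(a,b)\),
\[ \alpha=0:\ \delta=\max(a,b)=d, \qquad \alpha=\tfrac12:\ \delta=\tfrac12\min(a,b)+\tfrac12\max(a,b)=\tfrac{d_1}{2}. \]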