CoV2D BrowserTM

Parikh vectors
5GKD_1 1NVF_1 4KHT_1 Letter Amino acid
54 29 2 V Valine
65 30 5 G Glycine
25 18 0 P Proline
53 27 1 T Threonine
19 10 0 Y Tyrosine
21 7 1 H Histidine
48 35 10 I Isoleucine
5 4 1 W Tryptophan
40 37 6 A Alanine
71 13 2 N Asparagine
44 17 0 D Aspartic acid
1 5 2 C Cysteine
44 25 1 S Serine
25 6 9 Q Glutamine
40 29 8 E Glutamic acid
38 23 9 K Lysine
8 8 0 M Methionine
35 24 2 R Arginine
55 37 8 L Leucine
35 9 0 F Phenylalanine
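
A Parikh vector simply counts how often each letter occurs in a sequence. The following is a minimal sketch of that count, not the site's own code; the names `ALPHABET` and `parikh_vector` are illustrative, and the letter order is taken from the rows of the table above. The test sequence is the 4KHT_1 sequence listed on this page.

```python
from collections import Counter

# One-letter amino-acid codes in the row order of the table above
# (an assumption about presentation only; any fixed order would do).
ALPHABET = "VGPTYHIWANDCSQEKMRLF"


def parikh_vector(sequence: str, alphabet: str = ALPHABET) -> list[int]:
    """Return the number of occurrences of each alphabet letter in sequence."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]


# 4KHT_1 (Gp41 helix), copied from this page.
GP41 = "GCCGGIKKEIEAIKKEQEAIKKKIEAIEKELSGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARIL"

if __name__ == "__main__":
    for letter, count in zip(ALPHABET, parikh_vector(GP41)):
        print(letter, count)
```

The entries of a Parikh vector always sum to the sequence length, which gives a quick consistency check against the Length column further down.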

>5GKD_1|Chains A, B, C, D|AlyGC|Glaciecola chathamensis (368405)
>1NVF_1|Chains A, B, C|3-DEHYDROQUINATE SYNTHASE|Emericella nidulans (162425)
>4KHT_1|Chain A|Gp41 helix|Human Immunodeficiency Virus 1 (11676)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5GKD , Knot 271 726 0.82 40 264 643
ADLLVKTPEAYDQALKKAKPGDDIILANGTWRDFEVLFEAKGNENKPITLRGQTPGKVFLTGQSNLRLAGEHLIVSGLVFKDGYTPTGEVIAFRRNKDVLASHSRVTQVVIDNFSNPEKFEQDSWVMVYGRHNRFDHNHLVGKRNKGVTMAVRLTTESSQQNHHRIDHNYFGPRPILGSNGGETLRIGTSHHSLTDSFTLVENNYFDRCNGEVEIISNKSGKNSIRNNVFFESRGTLTLRHGNGNIVENNVFFGNGVDHTGGIRVINRDQIIRNNYLEGLTGYRFGSGLTVMNGVPNSKINRYHQVDNALIENNTLVNVEHIQFAAGSDKERSAAPINSNMNNNLIVNDQGTDGITAFDDISGIKFKDNLLNQDAKPSINKGFEQADITMQRHDNGLLYPEAKTQQKYGVSTQLEPIGKDEVGVSWYPKVEPDVAFGSGKHIAVSPGDNTLFDAIASAETGDVLVLQAGEYWVSKILSLDKTLTIRAQEKGSAVIFPQRSTLIEINNKGNLTLDGVYVDATNAPDAAGNTLIRTTRLPMQRNYRLAIKNSTFENLDINHSYHFFDAGNRSFADYIEVQDSQFKHITGDLFRLNKETDDLGIYNVEYLTIENSNVSDLQGAIAKVYRGGTDESTFGPHVVMNNNIFNEVGKGKRNKSAASLILHGTQVNKMTTNEFNNSAPIIFELTVGEPKTWVTGNVFEGTPEPVVRDLFPLSGATTTISGNTVL
1NVF , Knot 168 393 0.85 40 217 379
MSNPTKISILGRESIIADFGLWRNYVAKDLISDCSSTTYVLVTDTNIGSIYTPSFEEAFRKRAAEITPSPRLLIYNRPPGEVSKSRQTKADIEDWMLSQNPPCGRDTVVIALGGGVIGDLTGFVASTYMRGVRYVQVPTTLLAMVDSSIGGKTAIDTPLGKNLIGAIWQPTKIYIDLEFLETLPVREFINGMAEVIKTAAISSEEEFTALEENAETILKAVRREVTPGEHRFEGTEEILKARILASARHKAYVVSADEREGGLRNLLNWGHSIGHAIEAILTPQILHGECVAIGMVKEAELARHLGILKGVAVSRIVKCLAAYGLPTSLKDARIRKLTAGKHCSVDQLMFNMALDKKNDGPKKKIVLLSAIGTPYETRASVVANEDIRVVLAP
4KHT , Knot 27 67 0.56 30 42 55
GCCGGIKKEIEAIKKEQEAIKKKIEAIEKELSGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARIL
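
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns alone. The sketch below does only that arithmetic; it does not compute \(\mathrm{LZ}(w)\) itself, since the exact LZ variant used by the site is not stated here. The function name `normalized_lz` is illustrative.

```python
import math


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_b(n)) for a word of length n over b letters."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    # (code, LZ(w), n) as listed in the table above.
    for code, lz, n in [("5GKD", 271, 726), ("1NVF", 168, 393), ("4KHT", 27, 67)]:
        print(code, f"{normalized_lz(lz, n):.4f}")
```

The tabulated values 0.82, 0.85 and 0.56 appear to be these ratios cut to two decimals.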

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
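
As an illustration of these definitions, here is a minimal sketch that enumerates the contiguous subwords of a given length and counts them. The names `subwords` and `subword_complexity` are illustrative and not taken from the site.

```python
def subwords(w: str, n: int) -> set[str]:
    """Return P_w(n): the set of distinct contiguous subwords of length n in w."""
    # range() is empty when n > len(w), so the result is the empty set.
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_complexity(w: str, n: int) -> int:
    """Return p_w(n), the cardinality of P_w(n)."""
    return len(subwords(w, n))


if __name__ == "__main__":
    print(sorted(subwords("abab", 2)))
    print(subword_complexity("abab", 2))
```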

\(|P_{f(5GKD_1)}(2) \setminus P_{f(1NVF_1)}(2)|=92\), \(|P_{f(1NVF_1)}(2) \setminus P_{f(5GKD_1)}(2)|=45\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
101110010100011001011001111010100101110101000011010100110111010001011100111011110010010101111000001110000100111001001001000011110100001000011100001101110100000000000100001110111100110010110000010001011000010000101011000010001000111000101010010101100011110110001110110000110000101101001101101101110001000001001110000110100101111000000111100010001110001001101100101101000110001010100110010101000001110101000000110001011100011101010101011110100111011000110111010010111101100110011010001010100010111110000110100010101011010100110111001100001110000011100001001010000011011000110010100001001010110100000011100100101000010010111101001100000111011100011001101000001101110100100100001000111110101101001101011010101110011110110001010011
Pair \(Z_2\) Length of longest common subword (contiguous)
5GKD_1,1NVF_1 137 4
5GKD_1,4KHT_1 238 3
1NVF_1,4KHT_1 203 4
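
The quantity \(Z_k(x,y)\) is the size of the symmetric difference of the two subword sets; for the first pair, \(92+45=137\). The sketch below computes it directly. It also includes a helper for the last column, under the assumption that this column measures the longest common contiguous subword (values of 3 or 4 would be very small for a longest common subsequence of sequences this long). All function names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Return Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def longest_common_subword_length(x: str, y: str) -> int:
    """Return the largest k for which x and y share a subword of length k.

    If a common subword of length k exists, its prefixes are common too,
    so k can be increased step by step until the intersection is empty.
    """
    k = 0
    while subwords(x, k + 1) & subwords(y, k + 1):
        k += 1
    return k


if __name__ == "__main__":
    print(z_distance("abab", "abba", 2))
    print(longest_common_subword_length("abcde", "xbcdy"))
```

The step-by-step search is quadratic in the worst case, which is adequate for sequences of a few hundred residues; a suffix-automaton or dynamic-programming approach would scale better.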

Newick tree

 
[
	4KHT_1:12.42,
	[
		5GKD_1:68.5,1NVF_1:68.5
	]:52.92
]

Let \(d\) be the Otu-Sayood distance \(d\).
Let \(d_1\) be the Otu-Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{1119}{\log_{20} 1119}-\frac{393}{\log_{20}393}\right)\approx 190.\)
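
The estimate above can be checked numerically. This is only a sketch of the arithmetic: the factors 0.85 and 0.8 are taken from the formula as given, and 1119 is the combined length \(726+393\) of the two sequences. The names `capacity` and `expected_distance` are illustrative.

```python
import math


def capacity(n: int, alphabet_size: int = 20) -> float:
    """Return n / log_b(n), the scale used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)


def expected_distance(n_total: int, n_part: int,
                      factor_a: float = 0.85, factor_b: float = 0.8) -> float:
    """Evaluate factor_a * factor_b * (capacity(n_total) - capacity(n_part))."""
    return factor_a * factor_b * (capacity(n_total) - capacity(n_part))


if __name__ == "__main__":
    print(f"{expected_distance(1119, 393):.2f}")
```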
Status Protein1 Protein2 d d1/2
Query variables 5GKD_1 1NVF_1 239 183.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2. \end{cases} \]
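
Reading the two cases off this formula, \(d\) plays the role of \(\max\) and \(d_1\) the role of \(\min+\max\). Under that reading, the tabulated values \(d=239\) and \(d_1/2=183.5\) determine the \(\min\) term as well. The sketch below carries out this arithmetic; the recovered \(\min\) value is derived here and is not stated on the page, and the function name `delta` is illustrative.

```python
def delta(d_min: float, d_max: float, alpha: float) -> float:
    """Return the interpolation alpha * min + (1 - alpha) * max."""
    return alpha * d_min + (1 - alpha) * d_max


# Values for the pair 5GKD_1, 1NVF_1 as tabulated above.
D = 239          # delta at alpha = 0, i.e. the max term
HALF_D1 = 183.5  # delta at alpha = 1/2, i.e. (min + max) / 2

# Derived under the reading d = max and d_1 = min + max (an assumption).
D_MAX = D
D_MIN = 2 * HALF_D1 - D_MAX

if __name__ == "__main__":
    print(D_MIN)
    print(delta(D_MIN, D_MAX, 0), delta(D_MIN, D_MAX, 0.5))
```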