CoV2D Browser™

Parikh vectors
6WLC_1 3QVC_1 6AQQ_1 Letter Amino acid
39 35 14 V Valine
19 33 19 N Asparagine
24 31 18 E Glutamic acid
23 28 31 I Isoleucine
12 22 12 Y Tyrosine
7 7 4 M Methionine
23 36 29 S Serine
22 31 10 T Threonine
22 36 17 F Phenylalanine
16 13 16 A Alanine
24 22 24 D Aspartic acid
12 7 13 H Histidine
25 47 22 K Lysine
31 40 32 L Leucine
14 16 5 P Proline
3 4 5 W Tryptophan
9 8 15 R Arginine
5 4 3 C Cysteine
16 7 22 Q Glutamine
24 24 18 G Glycine
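
Each column of the table above is a letter count over one sequence. A minimal sketch of that count, assuming the sequence is a plain string of one-letter amino-acid codes; the name `parikh_vector` and the alphabetical ordering of the alphabet are illustrative choices, not taken from this page:

```python
from collections import Counter

# The 20 standard one-letter amino-acid codes, in alphabetical order
# (the ordering is an illustrative assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(w, alphabet=AMINO_ACIDS):
    """Return {letter: number of occurrences in w} for each letter of the alphabet."""
    counts = Counter(w)
    return {a: counts.get(a, 0) for a in alphabet}

# First 23 residues of the 6WLC_1 sequence listed on this page.
prefix = "MHHHHHHSSGVDLGTENLYFQSN"
pv = parikh_vector(prefix)
print(pv["H"], pv["S"], sum(pv.values()))
```

The entries of a Parikh vector always sum to the sequence length, which gives a quick consistency check on a table like the one above.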

>6WLC_1|Chains A, B|Uridylate-specific endoribonuclease|Severe acute respiratory syndrome coronavirus 2 (2697049)
>3QVC_1|Chain A|Histo-aspartic protease|Plasmodium falciparum (5833)
>6AQQ_1|Chain A|Bifunctional ligase/repressor BirA|Staphylococcus aureus (1280)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6WLC , Knot 153 370 0.81 40 207 346
MHHHHHHSSGVDLGTENLYFQSNMSLENVAFNVVNKGHFDGQQGEVPVSIINNTVYTKVDGVDVELFENKTTLPVNVAFELWAKRNIKPVPEVKILNNLGVDIAANTVIWDYKRDAPAHISTIGVCSMTDIAKKPTETICAPLTVFFDGRVDGQVDLFRNARNGVLITEGSVKGLQPSVGPKQASLNGVTLIGEAVKTQFNYYKKVDGVVQQLPETYFTQSRNLQEFKPRSQMEIDFLELAMDEFIERYKLEGYAFEHIVYGDFSHSQLGGLHLLIGLAKRFKESPFELEDFIPMDSTVKNYFITDAQTGSSKCVCSVIDLLLDDFVEIIKSQDLSVVSKVVKVTIDYTEISFMLWCKDGHVETFYPKLQ
3QVC , Knot 180 451 0.81 40 217 424
MNLTIKEEDFTNTFMKNEESFNTFRVTKVKRWNAKRLFKILFVTVFIVLAGGFSYYIFENFVFQKNRKINHIIKTSKYSTVGFNIENSYDRLMKTIKEHKLKNYIKESVKLFNKGLTKKSYLGSEFDNVELKDLANVLSFGEAKLGDNGQKFNFLFHTASSNVWVPSIKCTSESCESKNHYDSSKSKTYEKDDTPVKLTSKAGTISGIFSKDLVTIGKLSVPYKFIEMTEIVGFEPFYSESDVDGVFGLGWKDLSIGSIDPYIVELKTQNKIEQAVYSIYLPPENKNKGYLTIGGIEERFFDGPLNYEKLNHDLMWQVDLDVHFGNVSSKKANVILDSATSVITVPTEFFNQFVESASVFKVPFLSLYVTTCGNTKLPTLEYRSPNKVYTLEPKQYLEPLENIFSALCMLNIVPIDLEKNTFVLGDPFMRKYFTVYDYDNHTVGFALAKNL
6AQQ , Knot 142 329 0.83 40 195 314
MSKYSQDVLQLLYKNKPNYISGQSIAESLNISRTAVKKVIDQLKLEGCKIDSVNHKGHLLQQLPDIWYQGIIDQYTKSSALFDFSEVYDSIDSTQLAAKKSLVGNQSSFFILSDEQTKGRGRFNRHWSSSKGQGLWMSVVLRPNVAFSMISKFNLFIALGIRDAIQHFSQDEVKVKWPNDIYIDNGKVCGFLTEMVANNDGIEAIICGIGINLTQQLENFDESIRHRATSIQLHDKNKLDRYQFLERLLQEIEKRYNQFLTLPFSEIREEYIAASNIWNRTLLFTENDKQFKGQAIDLDYDGYLIVRDEAGESHRLISADIDFHHHHHH
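
The LZ-complexity and normalized columns can be sketched as follows. The page does not say which Lempel-Ziv variant it uses; this sketch assumes the 1976 exhaustive-history parsing (LZ76), and the function names are hypothetical.

```python
import math

def lz76_complexity(w):
    """Number of phrases in the Lempel-Ziv (1976) exhaustive-history parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it still occurs in the text that precedes
        # its last symbol, i.e. while it can be copied from a start position
        # before i (the copy is allowed to overlap the phrase itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_b n), with b the alphabet size (20 amino acids)."""
    return lz / (n / math.log(n, alphabet_size))

print(lz76_complexity("0001101001000101"))
print(normalized_lz(153, 370))
```

With the tabulated values LZ = 153 and n = 370, the ratio comes out slightly above 0.816, so the 0.81 shown in the table appears to be truncated rather than rounded.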

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
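
A minimal sketch of these definitions, reading \(P_w(n)\) as the set of distinct contiguous subwords of length \(n\); the helper names `subwords` and `p` are illustrative:

```python
def subwords(w, k):
    """P_w(k): the set of distinct contiguous subwords of w that have length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def p(w, k):
    """p_w(k): the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# First 10 residues of the 6WLC_1 sequence.
w = "MHHHHHHSSG"
print(sorted(subwords(w, 2)))
print(p(w, 1), p(w, 2), p(w, 3))
```

A word of length \(m\) has at most \(m-k+1\) distinct subwords of length \(k\), which bounds the values that can appear in the \(p_w(k)\) columns.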

\(|P_{f(6WLC_1)}(2) \setminus P_{f(3QVC_1)}(2)|=71\), \(|P_{f(3QVC_1)}(2) \setminus P_{f(6WLC_1)}(2)|=81\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). For the pair above, \(Z_2=71+81=152\).

Hydrophobic-polar version of Sequence 1:
1000000001101100010100010100111011001010100101110110001000101101011000001110111011100010111010110011101110011100000111010011100100110010001011101110101010101100100111100101011010111001010110111011000100000101110011000100000100101000101011011100110000101011001101010000111101111110010001101001111000100011001001000010011011100110110000101100110101000010111100010100101010
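
The hydrophobic-polar encoding can be sketched as a per-residue lookup. The page does not state its residue classes; the set below was inferred by aligning the start of the binary string with the start of the 6WLC_1 sequence, and the class of cysteine (C) was not checked and is purely an assumption. The name `hp_encode` is hypothetical.

```python
# Residues that map to 1 in the displayed string, as far as the alignment
# of its first ~90 symbols shows (an inference, not a stated definition).
HYDROPHOBIC = set("AFGILMPVW")
HYDROPHOBIC.add("C")  # assumption: C was not checked against the page's string

def hp_encode(w):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in w)

# First 31 residues of the 6WLC_1 sequence.
print(hp_encode("MHHHHHHSSGVDLGTENLYFQSNMSLENVAF"))
```

The output has the same length as the input, so the encoded string for 6WLC_1 should have 370 symbols.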
Pair \(Z_2\) Length of longest common substring
6WLC_1,3QVC_1 152 4
6WLC_1,6AQQ_1 152 6
3QVC_1,6AQQ_1 162 4
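
Both columns of this table can be sketched in a few lines. The last column is read here as the longest common contiguous subword (substring). That reading is an assumption, suggested by the value 6 for 6WLC_1 and 6AQQ_1, which both contain the run HHHHHH. All function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y): size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_substring_length(x, y):
    """Length of the longest contiguous block shared by x and y (row-by-row DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common block that ends at x[i-2], y[j-2].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z("abab", "abba", 2))
print(longest_common_substring_length("MHHHHHHSSG", "DIDFHHHHHH"))
```

The dynamic program uses time proportional to the product of the two lengths, which is ample for sequences of a few hundred residues.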

Newick tree

 
[
	6AQQ_1:79.36,
	[
		6WLC_1:76,3QVC_1:76
	]:3.36
]
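
Standard Newick notation groups subtrees with parentheses, ends the tree with a semicolon, and reserves square brackets for comments, so the bracketed tree above may need converting before other tools will read it. A minimal sketch, assuming the bracket layout shown here; the function name is hypothetical:

```python
import re

def brackets_to_newick(text):
    """Rewrite a [..]-nested tree as a one-line Newick string with (..) and ';'."""
    compact = re.sub(r"\s+", "", text)  # drop newlines, tabs and spaces
    return compact.replace("[", "(").replace("]", ")") + ";"

tree = """
[
	6AQQ_1:79.36,
	[
		6WLC_1:76,3QVC_1:76
	]:3.36
]
"""
newick = brackets_to_newick(tree)
print(newick)
```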

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{821}{\log_{20} 821}-\frac{370}{\log_{20} 370}\right)\approx 121\), where \(821=370+451\) is the combined length of the two sequences.
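
The arithmetic in this estimate can be checked directly. In the sketch below the factors 0.85 and 0.8 are copied from the formula as given, without assuming where they come from, and `lz_scale` is a hypothetical helper name.

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_b(n): the normalizing scale used for LZ-complexity on this page."""
    return n / math.log(n, alphabet_size)

n1, n2 = 370, 451  # lengths of 6WLC_1 and 3QVC_1
estimate = 0.85 * 0.8 * (lz_scale(n1 + n2) - lz_scale(n1))
print(round(estimate, 2))
```

The value falls between 121 and 122, so the figure 121 quoted in the text is the truncated result.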
Status Protein1 Protein2 d d1/2
Query variables 6WLC_1 3QVC_1 150 137
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2 \end{cases} \]
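
The interpolation can be sketched as follows, assuming the usual Otu–Sayood definitions \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1(S,Q)=c(SQ)-c(S)+c(QS)-c(Q)\), with \(c\) taken to be an LZ76 phrase count. The choice of \(c\) and all function names are assumptions, not confirmed by this page.

```python
def lz76_complexity(w):
    """Number of phrases in the Lempel-Ziv (1976) exhaustive-history parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def increments(s, q, c=lz76_complexity):
    """Extra phrases needed for q after s, and for s after q."""
    return c(s + q) - c(s), c(q + s) - c(q)

def delta(s, q, alpha, c=lz76_complexity):
    """alpha * min + (1 - alpha) * max of the two complexity increments."""
    a, b = increments(s, q, c)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

s, q = "MHHHHHHSSGVDLGTENLYFQ", "MSKYSQDVLQLLYKNKPNY"
a, b = increments(s, q)
d = delta(s, q, 0)          # alpha = 0 gives the max, i.e. d
half_d1 = delta(s, q, 0.5)  # alpha = 1/2 gives (min + max)/2, i.e. d1/2
print(d, half_d1)
```

Setting \(\alpha=0\) keeps only the larger increment, while \(\alpha=1/2\) averages the two, which is why the second case equals \(d_1/2\) rather than \(d_1\).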