CoV2D Browser™

Parikh vectors
7DQV_1 1BCY_1 2DYK_1 Letter Amino acid
3 1 0 C Cysteine
15 9 2 N Asparagine
37 20 12 R Arginine
11 3 3 H Histidine
65 38 20 L Leucine
33 12 4 F Phenylalanine
18 4 8 P Proline
29 21 6 T Threonine
68 27 11 A Alanine
32 27 15 E Glutamic acid
18 22 13 K Lysine
6 1 3 W Tryptophan
53 16 19 V Valine
29 27 12 D Aspartic acid
61 23 15 G Glycine
38 19 5 I Isoleucine
15 8 1 M Methionine
44 19 6 S Serine
19 12 4 Y Tyrosine
26 10 2 Q Glutamine
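
The counts in the table above can be reproduced with a short sketch. The names `AMINO_ACIDS` and `parikh_vector` are illustrative assumptions and not part of the CoV2D code; the example input is only the first 16 residues of 2DYK_1.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids (order chosen for illustration).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(seq):
    """Return the number of occurrences of each amino-acid letter in seq."""
    counts = Counter(seq)
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}

# Prefix of the 2DYK_1 sequence, used only as a small example.
example = parikh_vector("MHKVVIVGRPNVGKSS")
print(example["V"], example["K"], example["C"])
```

The entries of a Parikh vector sum to the sequence length, which gives a quick consistency check against the Length column further down.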

>7DQV_1|Chain A|Probable ATP-dependent transporter ycf16|Cyanidioschyzon merolae (strain 10D) (280699)
>1BCY_1|Chain A|ANNEXIN V|Rattus norvegicus (10116)
>2DYK_1|Chains A, B|GTP-binding protein|Thermus thermophilus (300852)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7DQV , Knot 243 620 0.84 40 249 569
ASGPESAYTTGVTARRIFALAWSSSATMIVIGFIASILEVATLPAFAIVFGRMFQVFTKSKSQIEGETWKYSVGFVGIGVFEFIVAGSRTALFGIASERLARDLRVAAFSNLVEQDVTYFDRRKAGELGGKLNNDVQVIQYSFSKLGAVLFNLAQCVVGIIVAFIFAPALTGVLIALSPLVVLAGAAQMIEMSGNTKRSSEAYASAGSVAAEVFSNIRTTKAFEAERYETQRYGSKLDPLYRLGRRRYISDGLFFGLSMLVIFCVYALALWWGGQLIARGSLNLGNLLTAFFSAILGFMGVGQAAQVWPDVTRGLGAGGELFAMIDRVPQYRRPDPGAEVVTQPLVLKQGIVFENVHFRYPTRMNVEVLRGISLTIPNGKTVAIVGGSGAGKSTIIQLLMRFYDIEPQGGGLLLFDGTPAWNYDFHALRSQIGLVSQEPVLFSGTIRDNILYGKRDATDEEVIQALREANAYSFVMALPDGLDTEVGERGLALSGGQKQRIAIARAILKHPTLLCLDESTSALDAESEALVQEALDRMMASDGVTSVVIAHRLSTVARADLILVMQDGVVVEQGNHSELMALGPSGFYYQLVEKQLASGDMSAAGRDYKDDDDKHHHHHH
1BCY , Knot 134 319 0.80 40 178 291
MALRGTVTDFSGFDGRADAEVLRKAMKGLGTDEDSILNLLTARSNAQRQQIAEEFKTLFGRDLVNDMKSELKGKFEKLIVALMKPSRLYDAYELKHALKGAGTDEKVLTEIIASRTPEELRAIKQAYEEEYGSNLEDDVVGDTSGYYQRMLVVLLQANRDPDTAIDDAQVELDAQALFQAGELKWGTDEEKFITILGTRSVSHLRRVFDKYMTISGFQIEETIDRETSGNLENLLLAVVKSIRSIPAYLAETLYYAMKGAGTDDHTLIRVIVSRSEIDLFNIRKEFRKNFATSLYSMIKGDTSGDYKKALLLLCGGEDD
2DYK , Knot 76 161 0.80 38 115 158
MHKVVIVGRPNVGKSSLFNRLLKKRSAVVADVPGVTRDLKEGVVETDRGRFLLVDTGGLWSGDKWEKKIQEKVDRALEDAEVVLFAVDGRAELTQADYEVAEYLRRKGKPVILVATKVDDPKHELYLGPLYGLGFGDPIPTSSEHARGLEELLEAIWERLP
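
A minimal sketch of the LZ-complexity column and its normalization, assuming the LZ76 phrase count (exhaustive history of Lempel and Ziv, 1976). The page does not state which variant it uses, so `lz76` and `normalized_lz` are hypothetical names and may differ in detail from the values shown.

```python
import math

def lz76(w):
    """Number of phrases in the LZ76 exhaustive parsing of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text
        # (the copy may overlap the phrase itself, as in LZ76).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz, n, alphabet=20):
    """LZ(w) divided by n / log_alphabet(n), as in the table header."""
    return lz / (n / math.log(n, alphabet))

# 7DQV row of the table: LZ(w) = 243, n = 620.
print(normalized_lz(243, 620))
```

The substring test makes this quadratic or worse in the sequence length, which is acceptable for proteins of a few hundred residues.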

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
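
As a sketch of these definitions, reading \(P_w(n)\) as the set of subwords of length \(n\); the function names `subword_set` and `subword_complexity` are assumptions made for illustration.

```python
def subword_set(w, n):
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    if n <= 0 or n > len(w):
        return set()
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """p_w(n): the cardinality of P_w(n)."""
    return len(subword_set(w, n))

# "ABAB" has the length-2 subwords AB and BA.
print(subword_complexity("ABAB", 2))  # 2
```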

\(|P_{f(7DQV_1)}(2) \setminus P_{f(1BCY_1)}(2)|=103\), \(|P_{f(1BCY_1)}(2) \setminus P_{f(7DQV_1)}(2)|=32\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \), the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1 (7DQV_1):
10110010001101001111110001011111111011011011111111101101100000010100100011111111101111100011111100011001011110011000100100001101110100010110001001111110110011111111111111011111101111111110110101000000010101101110110010000110100000000100101100110000100111111011111010111111110111010101101101110111111111011011101001111110111110011000010111011001111001111001010010010101101101011010011111101110001101110100101011111110101110001011000111100011110101000110100010000110110010100111111011000110011110110000111101110010110100000110100011100110011100110011110010011010111110011110010000111111011000110001101010111000000000000000
Pair \(Z_2\) Length of longest common substring
7DQV_1,1BCY_1 135 5
7DQV_1,2DYK_1 176 4
1BCY_1,2DYK_1 135 4
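
The two columns above can be sketched as follows. The last column is read here as the longest common contiguous subword, which is an assumption: values of 4 and 5 are far too small for a non-contiguous common subsequence of sequences this long. All function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_substring_length(x, y):
    """Length of the longest contiguous subword shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common run ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_distance("ABAB", "ABCA", 2))                      # 3
print(longest_common_substring_length("ABCDE", "XBCDY"))  # 3
```

For the first row of the table, the two one-sided counts 103 and 32 add up to \(Z_2 = 135\).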

Newick tree

 
[
	2DYK_1:81.73,
	[
		7DQV_1:67.5,1BCY_1:67.5
	]:14.23
]
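
The bracketed tree above can be serialized into a standard parenthesized Newick string. This is a small sketch; the tuple representation `(label_or_children, branch_length)` and the name `to_newick` are assumptions made for illustration, not the page's own data model.

```python
def to_newick(node):
    """Serialize a (label_or_children, branch_length) node to Newick text."""
    label, length = node
    if isinstance(label, list):
        # Internal node: children are wrapped in parentheses.
        text = "(" + ",".join(to_newick(child) for child in label) + ")"
    else:
        text = label
    return text if length is None else f"{text}:{length}"

# The tree shown above; the root has no branch length.
tree = ([("2DYK_1", 81.73),
         ([("7DQV_1", 67.5), ("1BCY_1", 67.5)], 14.23)], None)
newick = to_newick(tree) + ";"
print(newick)  # (2DYK_1:81.73,(7DQV_1:67.5,1BCY_1:67.5):14.23);
```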

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
A rough expected distance is \((0.85)(0.8)\left(\frac{939}{\log_{20} 939}-\frac{319}{\log_{20} 319}\right)\approx 166.\)
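
The arithmetic in this estimate can be checked with a few lines. Note that 939 = 620 + 319, the combined length of 7DQV_1 and 1BCY_1; the factors 0.85 and 0.8 are taken from the formula as given, and the helper name `lz_scale` is an assumption.

```python
import math

def lz_scale(n, alphabet=20):
    """n / log_alphabet(n), the normalizing quantity used for LZ-complexity."""
    return n / math.log(n, alphabet)

n1, n2 = 620, 319  # lengths of 7DQV_1 and 1BCY_1
expected = 0.85 * 0.8 * (lz_scale(n1 + n2) - lz_scale(n2))
print(expected)
```

The result lies between 166 and 167, so the displayed value 166 appears to be truncated rather than rounded.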
Status Protein1 Protein2 d d1/2
Query variables 7DQV_1 1BCY_1 203 150.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2. \end{cases} \]
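
A hedged sketch of how \(d\), \(d_1\) and \(\delta\) fit together. It assumes that min and max refer to the two LZ-complexity increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), so that \(d\) is their maximum and \(d_1\) their sum, which is consistent with the case distinction above. The LZ76 variant and all function names are assumptions.

```python
def lz76(w):
    """Number of phrases in the LZ76 exhaustive parsing of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def increments(x, y):
    """Return (min, max) of LZ(xy) - LZ(x) and LZ(yx) - LZ(y)."""
    a = lz76(x + y) - lz76(x)
    b = lz76(y + x) - lz76(y)
    return min(a, b), max(a, b)

def delta(lo, hi, alpha):
    """delta = alpha * min + (1 - alpha) * max."""
    return alpha * lo + (1 - alpha) * hi

# Under this reading, the table values d = 203 and d1/2 = 150.5
# correspond to increments 98 and 203.
print(delta(98, 203, 0), delta(98, 203, 0.5))  # 203 150.5
```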