CoV2D Browser™

Parikh vectors
3KQG_1 7PPQ_1 1JQS_1 Letter Amino acid
16 32 8 L Leucine
2 12 5 M Methionine
7 18 3 D Aspartic acid
4 3 1 C Cysteine
9 20 4 Q Glutamine
7 26 10 V Valine
11 23 10 G Glycine
9 22 8 T Threonine
7 4 0 W Tryptophan
9 24 5 F Phenylalanine
10 17 10 P Proline
17 27 6 S Serine
5 14 3 R Arginine
9 19 13 I Isoleucine
14 17 19 K Lysine
2 12 1 H Histidine
7 16 1 Y Tyrosine
13 25 18 A Alanine
13 11 4 N Asparagine
11 19 10 E Glutamic acid
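
The counts in the table above can be reproduced with a short sketch like the following. It assumes the Parikh vector of a sequence is simply the number of occurrences of each amino-acid letter, listed in the row order of the table; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and not part of the CoV2D site.

```python
from collections import Counter

# Row order of the table above (assumed to be the intended letter order).
AMINO_ACIDS = "LMDCQVGTWFPSRIKHYANE"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each letter of `alphabet` in `sequence`."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# First ten residues of 1JQS_1, used here only as a small example.
prefix = "AKKVAAQIKL"
vector = parikh_vector(prefix)
print(dict(zip(AMINO_ACIDS, vector)))
```

The entries of a Parikh vector always sum to the length of the sequence, which gives a quick consistency check against the Length column further down.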

>3KQG_1|Chains A, B, C, D, E, F|C-type lectin domain family 4 member K|Homo sapiens (9606)
>7PPQ_1|Chains A, B, C, D|Histone-arginine methyltransferase CARM1|Mus musculus (10090)
>1JQS_1|Chain A|50S Ribosomal protein L11|Escherichia coli (562)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3KQG , Knot 89 182 0.84 40 140 180
ASTLNAQIPELKSDLEKASALNTKIRALQGSLENMSKLLKRQNDILQVVSQGWKYFKGNFYYFSLIPKTWYSAEQFCVSRNSHLTSVTSESEQEFLYKTAGGLIYWIGLTKAGMEGDWSWVDDTPFNKVQSARFWIPGEPNNAGNNEHCGNIKAPSLQAWNDAPCDKTFLFICKRPYVPSEP
7PPQ , Knot 157 361 0.85 40 213 356
GHMGHTLERSVFSERTEESSAVQYFQFYGYLSQQQNMMQDYVRTGTYQRAILQNHTDFKDKIVLDVGCGSGILSFFAAQAGARKIYAVEASTMAQHAEVLVKSNNLTDRIVVIPGKVEEVSLPEQVDIIISEPMGYMLFNERMLESYLHAKKYLKPSGNMFPTIGDVHLAPFTDEQLYMEQFTKANFWYQPSFHGVDLSALRGAAVDEYFRQPVVDTFDIRILMAKSVKYTVNFLEAKEGDLHRIEIPFKFHMLHSGLVHGLAFWFDVAFIGSIMTVWLSTAPTEPLTHWYQVRCLFQSPLFAKAGDTLSGTCLLIANKRQSYDISIVAQVDQTGSKSSNLLDLKNPFFRYTGTTPSPPPG
1JQS , Knot 69 139 0.81 38 102 136
AKKVAAQIKLQLPAGKATPAPPVGPALGQHGVNIMEFCKRFNAETADKAGMILPVVITVYEDKSFTFIIKTPPASFLLKKAAGIEKGSSEPKRKIVGKVTRKQIEEIAKTKMPDLNANSLEAAMKIIEGTAKSMGIEVV
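
As a rough illustration of the first columns of the table, the sketch below counts the components of an LZ76 exhaustive history and forms the normalised ratio \(\mathrm{LZ}(w)/(n/\log_{20} n)\). It assumes that \(\mathrm{LZ}(w)\) is the Lempel-Ziv (1976) production complexity; the page does not say which variant it uses, so the function names and the parsing rule here are assumptions.

```python
import math

def lz76_complexity(word):
    """Number of components in the LZ76 exhaustive history of `word`."""
    n = len(word)
    start = 0
    components = 0
    while start < n:
        length = 1
        # Grow the component while it can still be copied from an occurrence
        # that begins before `start` (overlap with the component is allowed).
        while (start + length <= n
               and word[start:start + length] in word[:start + length - 1]):
            length += 1
        components += 1
        start += length
    return components

def normalised_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))

# Classic example from the LZ76 literature: 0.001.10.100.1000.101
print(lz76_complexity("0001101001000101"))
# Ratio for LZ(w) = 89 and n = 182, as in the 3KQG row
print(normalised_lz(89, 182))
```

For the 3KQG row the ratio evaluates to about 0.849.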

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
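
A minimal sketch of these definitions, assuming that a subword means a contiguous block of a fixed length; `subwords` and `subword_complexity` are illustrative names.

```python
def subwords(word, k):
    """Set of distinct contiguous subwords of length k in `word`."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}

def subword_complexity(word, k):
    """Cardinality of the set of distinct length-k subwords of `word`."""
    return len(subwords(word, k))

example = "ABAB"
print(subwords(example, 2))            # the two subwords AB and BA
print(subword_complexity(example, 2))
```

A word of length \(n\) has at most \(n-k+1\) distinct subwords of length \(k\), which bounds the last columns of the table above.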

\(|P_{f(3KQG_1)}(2) \setminus P_{f(7PPQ_1)}(2)|=57\), \(|P_{f(7PPQ_1)}(2) \setminus P_{f(3KQG_1)}(2)|=130\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), that is, the numerator of an LZ76-style (set of subwords) Jaccard distance for \(P(k)\). For example, \(Z_2(f(3KQG_1),f(7PPQ_1))=57+130=187\).

Hydrophobic-polar version of Sequence 1 (3KQG_1): 10010101101000100101100010110101001001100000110110011001010100101110010010010100000100100000001100011111011110011101010110001100100101111101001100000101011010110011000011110001011001
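
The binary string above can be reproduced from the 3KQG_1 sequence with a sketch like the one below. The hydrophobic set `AFGILMPVW` is an assumption inferred by comparing the listed string with the sequence letter by letter; the page itself does not state which classification it uses.

```python
# Letters mapped to 1; inferred from the listed string, not documented by the site.
HYDROPHOBIC = frozenset("AFGILMPVW")

def hp_encode(sequence):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First 50 residues of 3KQG_1.
prefix = "ASTLNAQIPELKSDLEKASALNTKIRALQGSLENMSKLLKRQNDILQVVS"
print(hp_encode(prefix))
```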
Pair \(Z_2\) Length of longest common subword
3KQG_1,7PPQ_1 187 3
3KQG_1,1JQS_1 144 4
7PPQ_1,1JQS_1 185 3
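
Both columns of the table above can be computed along the following lines. This is a sketch under two assumptions: \(Z_k\) is the size of the symmetric difference of the two length-\(k\) subword sets, and the last column counts the longest common contiguous block (the listed values of 3 and 4 are too small for a non-contiguous subsequence of proteins this long). All function names are illustrative.

```python
def subwords(word, k):
    """Set of distinct contiguous subwords of length k in `word`."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}

def z_distance(x, y, k):
    """|P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    previous = [0] * (len(y) + 1)
    for a in x:
        current = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending at the previous positions.
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best

print(z_distance("ABAB", "ABBA", 2))
print(longest_common_subword_length("ABCDE", "XBCDY"))
```

The dynamic-programming table keeps only two rows, so memory stays proportional to the length of the second sequence.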

Newick tree

 
(7PPQ_1:99.01,(3KQG_1:72,1JQS_1:72):27.01);

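Let d be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let d1 be the Otu–Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)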
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{543}{\log_{20} 543}-\frac{182}{\log_{20}182}\right)\approx 104\), where \(543=182+361\) is the length of the concatenation of 3KQG_1 and 7PPQ_1.
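
The arithmetic in this estimate can be checked with a few lines. The sketch takes the constants 0.85 and 0.8 and the lengths 182 and 361 from the text as given; the function name and parameter names are illustrative.

```python
import math

def expected_distance(n_concat, n_first, ratio=0.85, factor=0.8, alphabet_size=20):
    """ratio * factor * (n_concat / log(n_concat) - n_first / log(n_first)), logs to base alphabet_size."""
    def scale(n):
        return n / math.log(n, alphabet_size)
    return ratio * factor * (scale(n_concat) - scale(n_first))

# 543 = 182 + 361, the combined length of 3KQG_1 and 7PPQ_1.
value = expected_distance(182 + 361, 182)
print(round(value))
```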
Status Protein1 Protein2 d d1/2
Query variables 3KQG_1 7PPQ_1 135 100.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
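
The interpolation can be checked against the query row above with a short sketch. It assumes that min and max refer to the smaller and larger of the two quantities being combined; the value 66 is not listed on the page and is derived here as \(2\cdot 100.5-135\).

```python
def delta(a, b, alpha):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

larger = 135                   # d for the pair 3KQG_1, 7PPQ_1
smaller = 2 * 100.5 - larger   # derived from d1/2 = 100.5, not listed on the page

print(delta(smaller, larger, 0))    # recovers d
print(delta(smaller, larger, 0.5))  # recovers d1/2
```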