CoV2D Browser™

Parikh vectors
6TTT_1 9BCH_1 3URP_1 Letter Amino acid
27 8 2 I Isoleucine
11 3 0 M Methionine
33 9 4 P Proline
35 16 6 T Threonine
8 4 1 W Tryptophan
13 2 4 C Cysteine
58 16 3 L Leucine
46 15 15 S Serine
36 20 2 K Lysine
15 8 4 F Phenylalanine
32 13 8 V Valine
42 9 7 A Alanine
34 1 1 R Arginine
16 18 9 N Asparagine
38 13 6 E Glutamic acid
36 17 12 G Glycine
43 11 6 D Aspartic acid
29 2 2 Q Glutamine
19 6 3 H Histidine
9 7 9 Y Tyrosine
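
As a minimal sketch of how a table like the one above could be produced: the Parikh vector of a word simply counts how often each letter occurs. The function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative choices, not names used by the CoV2D site; the alphabet is listed in the row order of the table.

```python
from collections import Counter

# One-letter codes in the same order as the rows of the table above.
AMINO_ACIDS = "IMPTWCLSKFVARNEGDQHY"

def parikh_vector(w, alphabet=AMINO_ACIDS):
    """Return the list of letter counts of w, one entry per alphabet letter."""
    counts = Counter(w)
    return [counts[a] for a in alphabet]

# First 20 residues of the 3URP_1 sequence, used as a short example.
prefix = "ACDYTCGSNCYSSSDVSTAQ"
vec = parikh_vector(prefix)
print(dict(zip(AMINO_ACIDS, vec)))
```

The entries of a Parikh vector always sum to the length of the word, which gives a quick consistency check: each column of the table sums to the sequence length (580, 198 and 104).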

>6TTT_1|Chain A|N6-adenosine-methyltransferase catalytic subunit|Homo sapiens (9606)
>9BCH_1|Chain A|Membrane protein|Corynebacterium diphtheriae NCTC 13129 (257309)
>3URP_1|Chain A|Guanyl-specific ribonuclease T1|Aspergillus oryzae (5062)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6TTT , Knot 231 580 0.84 40 260 535
MSDTWSSIQAHKKQLDSLRERLQRRRKQDSGHLDLRNPEAALSPTFRSDSPVPTAPTSGGPKPSTASAVPELATDPELEKKLLHHLSDLALTLPTDAVSICLAISTPDAPATQDGVESLLQKFAAQELIEVKRGLLQDDAHPTLVTYADHSKLSAMMGAVAEKKGPGEVAGTVTGQKRRAEQDSTTVAAFASSLVSGLNSSASEPAKEPAKKSRKHAASDVDLEIESLLNQQSTKEQQSKKVSQEILELLNTTTAKEQSIVEKFRSRGRAQVQEFCDYGTKEECMKASDADRPCRKLHFRRIINKHTDESLGDCSFLNTCFHMDTCKYVHYEIDACMDSEAPGSKDHTPSQELALTQSVGGDSSADRLFPPQWICCDIRYLDVSILGKFAVVMADPPWDIHMELPYGTLTDDEMRRLNIPVLQDDGFLFLWVTGRAMELGRECLNLWGYERVDEIIWVKTNQLQRIIRTGRTGHWLNHGKEHCLVGVKGNPQGFNQGLDCDVIVAEVRSTSHKPDEIYGMIERLSPGTRKIELFGRPHNVQPNWITLGNQLDGIHLLDPDVVARFKQRYPDGIISKPKNL
9BCH , Knot 96 198 0.85 40 147 192
SEEVKNADLYWGFSGSSHHKYDHNGPKFEKAGKGAELTNIDAASAYAETFKKGVFPNNKREKSDILVFHNGEVKTETNHSSYQINWPGEVTMKLGYGDGLVIKDLNLMLKNGNMGELKATVGENSNITLFDVQEYSVSDNTITVTPKIPPCTTGTWKPWHNDLTSKLGSLKSVFFESYTCNNDDIAKKPLPLTVVLNG
3URP , Knot 56 104 0.83 38 86 101
ACDYTCGSNCYSSSDVSTAQAAGYKLHEDGETVGSNSYPHKYNNYEGFDFSVSSPYYEWPILSSGDVYSGGSPGADRVVFNENNQLAGVITHTGASGNNFVECT
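
The LZ-complexity and normalized columns above can be illustrated with a short sketch. It assumes the classical Lempel-Ziv (1976) parsing, in which each new phrase is the shortest prefix of the remaining text that has not occurred earlier (overlaps allowed); whether the site uses exactly this variant is not stated here. The names `lz76` and `normalized_lz` are hypothetical.

```python
import math

def lz76(w):
    """Number of phrases in an LZ76-style parsing of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the candidate while it already occurs starting before position i
        # (the earlier occurrence may overlap the candidate itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_20(n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))

# Textbook example: 0 | 001 | 10 | 100 | 1000 | 101 has 6 phrases.
print(lz76("0001101001000101"))
# Values taken from the 6TTT row: LZ(w) = 231, n = 580.
print(round(normalized_lz(231, 580), 3))
```

The substring search makes this sketch quadratic or worse in the length of the word, which is acceptable for sequences of a few hundred residues.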

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
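
A minimal sketch of these definitions, with \(P_w(n)\) read as the set of length-\(n\) subwords; the function names `subwords` and `p` are illustrative only.

```python
def subwords(w, n):
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def p(w, n):
    """p_w(n): the number of distinct subwords of length n in w."""
    return len(subwords(w, n))

w = "ABAB"
print(sorted(subwords(w, 2)))  # the length-2 subwords of ABAB
print(p(w, 1), p(w, 2), p(w, 3))
```

For a word of length \(m\), \(p_w(n)\) is at most \(m-n+1\), and it is also bounded by the number of possible words of length \(n\) over the alphabet.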

\(|P_{f(6TTT_1)}(2) \setminus P_{f(9BCH_1)}(2)|=151\), \(|P_{f(9BCH_1)}(2) \setminus P_{f(6TTT_1)}(2)|=38\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1000100101000010010001000000001010100101110101000011101100111010010111011001010001100100111011001101011100101110001100110011100110100111000101011001000010111111100011101110101000010000001111100110110001001100110000001100101010011000000000000100011011000010000110010001010100100010000010100100100010100110000000110001100010100000100010101000111000001000111000111000100111101100010010101110111111011101010110101000010010111100011111110101101100010111000100111100001001100100101100100001111010101100110001111010000001001011100101100010111010010101101100101101101011101000010111001001
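
The hydrophobic-polar string above appears to replace each residue by 1 (hydrophobic) or 0 (polar). The sketch below assumes the hydrophobic set {A, F, G, I, L, M, P, V, W}; this set was inferred by comparing the first residues of Sequence 1 with the first digits of the binary string, and it is an assumption rather than the site's documented classification. The names `HYDROPHOBIC` and `hp_encode` are hypothetical.

```python
# Assumed hydrophobic residues (inferred from the start of the sequence, not confirmed).
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(w, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' if it is in the hydrophobic set, else '0'."""
    return "".join("1" if a in hydrophobic else "0" for a in w)

# First 20 residues of 6TTT_1.
print(hp_encode("MSDTWSSIQAHKKQLDSLRE"))
```

Under this assumed set the output reproduces the first 20 digits of the binary string, `10001001010000100100`.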
Pair \(Z_2\) Length of longest common subword (contiguous)
6TTT_1,9BCH_1 189 4
6TTT_1,3URP_1 234 3
9BCH_1,3URP_1 145 3
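
The two columns of this table can be sketched as follows. \(Z_k\) is the size of the symmetric difference of the two subword sets, so for the first pair \(Z_2 = 151 + 38 = 189\). The last column is read here as the length of the longest common contiguous subword; that reading is an assumption, based on the observation that values as small as 3 or 4 would not fit a non-contiguous subsequence of sequences this long. The function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword(x, y):
    """Length of the longest contiguous word occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                # Extend the common suffix ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

x, y = "ABCAB", "BCD"
print(z(x, y, 2), longest_common_subword(x, y))
```

In the toy example, `ABCAB` has the 2-subwords AB, BC, CA and `BCD` has BC, CD, so two subwords are unique to the first word and one to the second.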

Newick tree

 
[
	6TTT_1:11.44,
	[
		9BCH_1:72.5,3URP_1:72.5
	]:42.94
]

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{778}{\log_{20} 778}-\frac{198}{\log_{20} 198}\right)\approx 161.8\), where \(778=580+198\) is the combined length of the two sequences.
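
The arithmetic in this estimate can be checked with a few lines. The helper name `lz_scale` is hypothetical; the factors 0.85 and 0.8 and the lengths 778 and 198 are taken directly from the formula above.

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_20(n), the scale used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

# 778 = 580 + 198 is the combined length of 6TTT_1 and 9BCH_1.
expected = 0.85 * 0.8 * (lz_scale(778) - lz_scale(198))
print(round(expected, 1))
```

The result is roughly 161.8.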
Status Protein1 Protein2 d d1/2
Query variables 6TTT_1 9BCH_1 204 136
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
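
As a small sketch of this interpolation: if \(a\) and \(b\) denote the two quantities whose minimum and maximum appear in the formula, then \(\alpha=0\) gives the maximum and \(\alpha=1/2\) gives their average. The values 204 and 68 below are inferred from the query row (\(d=204\), \(d_1/2=136\), so the minimum would be \(2\cdot 136-204=68\)); this inference assumes the case formula above, and the function name `delta` is illustrative.

```python
def delta(a, b, alpha):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Inferred from the table row for 6TTT_1 and 9BCH_1 (assumption, see above).
a, b = 204, 68
print(delta(a, b, 0))    # corresponds to d
print(delta(a, b, 0.5))  # corresponds to d1 / 2
```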