Help in Bioinformatics


Homework Assignment 3: Chapter 3 of St. Clair & Visick, Putting Your Skills into Practice, problem 5

Tuesday, October 28: Homework Assignment 3 will be due Tuesday, November 4.

What changes are needed to construct a semi-global alignment, as in the third homework assignment? The global alignment program works well on sequences of nearly the same length. Let’s try another example where the sequence lengths are more disparate.

$ ruby global.rb -d cgctatag cta

Dynamic programming table:

   |      |   c  |   g  |   c  |   t  |   a  |   t  |   a  |   g  |
---+------+------+------+------+------+------+------+------+------+
   |      |      |      |      |      |      |      |      |      |
   |    0 |<  -1 |<  -2 |<  -3 |<  -4 |<  -5 |<  -6 |<  -7 |<  -8 |
---+------+------+------+------+------+------+------+------+------+
   |    ^ |  \   |      |  \   |      |      |      |      |      |
c  |   -1 |    1 |<   0 |<  -1 |<  -2 |<  -3 |<  -4 |<  -5 |<  -6 |
---+------+------+------+------+------+------+------+------+------+
   |    ^ |    ^ |  \   |  \   |  \   |      |  \   |      |      |
t  |   -2 |    0 |    1 |<   0 |    0 |<  -1 |<  -2 |<  -3 |<  -4 |
---+------+------+------+------+------+------+------+------+------+
   |    ^ |    ^ |  \ ^ |  \   |  \   |  \   |      |  \   |      |
a  |   -3 |   -1 |    0 |    1 |<   0 |    1 |<   0 |<  -1 |<  -2 |
---+------+------+------+------+------+------+------+------+------+
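To make the table concrete, here is a minimal Ruby sketch of how its scores could be filled in. It is not the course's global.rb: the method name global_table and the constants are hypothetical, and the scores (match +1, mismatch 0, gap -1) are assumed from the numbers in the table above. It prints only the scores, not the arrows.

```ruby
MATCH = 1      # assumed match score
MISMATCH = 0   # assumed mismatch score
GAP = -1       # assumed gap penalty

# Fills a global alignment table: a runs across the columns, b down the rows.
# The bottom-right cell holds the global alignment score.
def global_table(a, b)
  last_i = b.length
  last_j = a.length
  t = Array.new(last_i + 1) { Array.new(last_j + 1, 0) }
  # Outside row and column: each leading gap costs GAP.
  (1..last_j).each { |j| t[0][j] = j * GAP }
  (1..last_i).each { |i| t[i][0] = i * GAP }
  (1..last_i).each do |i|
    (1..last_j).each do |j|
      s = a[j - 1] == b[i - 1] ? MATCH : MISMATCH
      t[i][j] = [t[i - 1][j - 1] + s, # diagonal: pair a[j-1] with b[i-1]
                 t[i][j - 1] + GAP,   # left: gap in b
                 t[i - 1][j] + GAP    # up: gap in a
                ].max
    end
  end
  t
end

table = global_table("cgctatag", "cta")
table.each { |row| puts row.map { |v| v.to_s.rjust(4) }.join }
puts "score: #{table[-1][-1]}"
```

The rows it prints match the score rows of the table, ending in -2.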

Alignment 1

cgctatag

__c__ta_

Alignment 2

cgctatag

c____ta_

Alignment 3

cgctatag

__ct__a_

Alignment 4

cgctatag

c__t__a_

Alignment 5

cgctatag

__cta___

Alignment 6

cgctatag

c__ta___

The 5th alignment clearly looks best here, even though all six score the same, -2. With the scoring used in the table (match +1, mismatch 0, gap -1), each of the six alignments has three matches and five gaps: 3 - 5 = -2. The problem is that terminal gaps are scored the same as internal gaps. If we want to see whether a short sequence lines up best with a similar-sized piece somewhere inside the longer sequence, internal gaps need a larger penalty than terminal gaps. If the terminal gap penalty were reduced to 0 while the rest of the scoring stayed the same, we should get the desired result: the 5th alignment, whose five gaps are all terminal, becomes clearly the best with a score of 3. Simply modifying how the global alignment program fills in the outside rows and columns of the dynamic programming table should be all that is required for a semi-global alignment: the first row and column start at 0, so leading gaps are free, and moves along the last row and the last column cost nothing, so trailing gaps are free.
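The arithmetic in the previous paragraph can be checked with a small scoring function. This is a sketch under the same assumed scores (match +1, mismatch 0, gap -1); alignment_score and its free_ends flag are hypothetical names, and a gap is treated as terminal when it lies before the first or after the last residue of the row that contains it.

```ruby
MATCH = 1
MISMATCH = 0
GAP = -1

# Scores two aligned rows of equal length, using "_" for gaps as in the listings above.
# With free_ends: true, a gap before the first or after the last residue of the
# gapped row costs nothing (terminal gap); other gaps still cost GAP.
def alignment_score(top, bottom, free_ends: false)
  raise ArgumentError, "rows differ in length" unless top.length == bottom.length

  # Column range from the first to the last residue of a row.
  residue_span = ->(row) { row.index(/[^_]/)..row.rindex(/[^_]/) }
  top_span = residue_span.call(top)
  bottom_span = residue_span.call(bottom)

  total = 0
  top.chars.zip(bottom.chars).each_with_index do |(x, y), k|
    if x == "_" || y == "_"
      internal = x == "_" ? top_span.cover?(k) : bottom_span.cover?(k)
      total += (free_ends && !internal) ? 0 : GAP
    else
      total += x == y ? MATCH : MISMATCH
    end
  end
  total
end

%w[__c__ta_ c____ta_ __ct__a_ c__t__a_ __cta___ c__ta___].each_with_index do |row, n|
  puts format("Alignment %d: global %d, free end gaps %d", n + 1,
              alignment_score("cgctatag", row),
              alignment_score("cgctatag", row, free_ends: true))
end
```

All six alignments score -2 globally; with free end gaps they score 1, -1, 1, -1, 3 and 1, so alignment 5 is the unique best.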

$ ruby semi-global.rb -d cgctatag cta

Dynamic programming table:

   |      |   C  |   G  |   C  |   T  |   A  |   T  |   A  |   G  |
---+------+------+------+------+------+------+------+------+------+
   |      |      |      |      |      |      |      |      |      |
   |    0 |<   0 |<   0 |<   0 |<   0 |<   0 |<   0 |<   0 |<   0 |
---+------+------+------+------+------+------+------+------+------+
   |    ^ |  \   |  \   |  \   |  \   |  \   |  \   |  \   |  \ ^ |
C  |    0 |    1 |<   0 |    1 |<   0 |    0 |    0 |    0 |    0 |
---+------+------+------+------+------+------+------+------+------+
   |    ^ |  \ ^ |  \   |  \ ^ |  \   |      |  \   |  \   |  \ ^ |
T  |    0 |    0 |    1 |<   0 |    2 |<   1 |    1 |<   0 |    0 |
---+------+------+------+------+------+------+------+------+------+
   |    ^ |  \   |  \ ^ |  \   |    ^ |  \   |      |      |      |
A  |    0 |<   0 |<   0 |    1 |<   1 |    3 |<   3 |<   3 |<   3 |
---+------+------+------+------+------+------+------+------+------+
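This table differs from the global one only at the edges. A minimal sketch, again assuming match +1, mismatch 0 and gap -1 (semi_global_table is a hypothetical name, not necessarily how semi-global.rb is written): the first row and column stay at 0, and a horizontal move in the last row or a vertical move in the last column costs nothing.

```ruby
MATCH = 1
MISMATCH = 0
GAP = -1

# Semi-global table for a (columns) against b (rows).
def semi_global_table(a, b)
  a = a.upcase
  b = b.upcase
  last_i = b.length
  last_j = a.length
  # First row and column stay 0: leading gaps are free.
  t = Array.new(last_i + 1) { Array.new(last_j + 1, 0) }
  (1..last_i).each do |i|
    (1..last_j).each do |j|
      s = a[j - 1] == b[i - 1] ? MATCH : MISMATCH
      left = t[i][j - 1] + (i == last_i ? 0 : GAP) # trailing gaps in b are free in the last row
      up = t[i - 1][j] + (j == last_j ? 0 : GAP)   # trailing gaps in a are free in the last column
      t[i][j] = [t[i - 1][j - 1] + s, left, up].max
    end
  end
  t
end

table = semi_global_table("cgctatag", "cta")
table.each { |row| puts row.map { |v| v.to_s.rjust(4) }.join }
puts "score: #{table[-1][-1]}"
```

The printed rows reproduce the scores in the table, with 3 in the bottom-right cell.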

Alignment 1

Sequence 1 0001 CGCTATAG

                  |||

Sequence 2 0001 __CTA___
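An alignment like this one can be recovered by tracing back from the bottom-right cell. The sketch below (semi_global_align and pair_score are hypothetical names; same assumed scores) re-fills the table so it runs on its own, then prefers a diagonal step, then a left step, then an up step. For CGCTATAG and CTA only one arrow path leads back from the bottom-right cell, so the tie-breaking order does not matter here.

```ruby
MATCH = 1
MISMATCH = 0
GAP = -1

def pair_score(x, y)
  x == y ? MATCH : MISMATCH
end

# Semi-global alignment of a (columns) against b (rows).
# Returns [score, aligned a, aligned b], using "_" for gaps.
def semi_global_align(a, b)
  a = a.upcase
  b = b.upcase
  last_i = b.length
  last_j = a.length
  # Moves along the last row or last column are free (trailing gaps).
  left_cost = ->(i) { i == last_i ? 0 : GAP }
  up_cost = ->(j) { j == last_j ? 0 : GAP }

  # First row and column stay 0 (leading gaps are free).
  t = Array.new(last_i + 1) { Array.new(last_j + 1, 0) }
  (1..last_i).each do |i|
    (1..last_j).each do |j|
      t[i][j] = [t[i - 1][j - 1] + pair_score(a[j - 1], b[i - 1]),
                 t[i][j - 1] + left_cost.call(i),
                 t[i - 1][j] + up_cost.call(j)].max
    end
  end

  # Trace back from the bottom-right cell, preferring diagonal, then left, then up.
  i = last_i
  j = last_j
  top = []
  bottom = []
  while i > 0 || j > 0
    if i > 0 && j > 0 && t[i][j] == t[i - 1][j - 1] + pair_score(a[j - 1], b[i - 1])
      top.unshift(a[j - 1])
      bottom.unshift(b[i - 1])
      i -= 1
      j -= 1
    elsif j > 0 && (i.zero? || t[i][j] == t[i][j - 1] + left_cost.call(i))
      top.unshift(a[j - 1])
      bottom.unshift("_")
      j -= 1
    else
      top.unshift("_")
      bottom.unshift(b[i - 1])
      i -= 1
    end
  end
  [t[last_i][last_j], top.join, bottom.join]
end

score, row1, row2 = semi_global_align("cgctatag", "cta")
bars = row1.chars.zip(row2.chars).map { |x, y| x == y ? "|" : " " }.join
puts "Score #{score}", row1, bars, row2
```

It prints a score of 3 and the CGCTATAG / __CTA___ alignment shown above.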

For the second part of Homework Assignment 3, add position numbers to make longer alignments more readable, something like this:

$ ruby semi-global.rb acalifornia2009.fasta acalifornia2009m1.fasta

Alignment 1

Sequence 1 0001 TAGATATTAAAGATGAGTCTTCTAACCGAGGTCGAAACGTACGTTCTTTCTATCATCCCG

                            ||||||||||||||||||||||||||||||||||||||||||||||||

Sequence 2 0001 ____________ATGAGTCTTCTAACCGAGGTCGAAACGTACGTTCTTTCTATCATCCCG

 

Sequence 1 0061 TCAGGCCCCCTCAAAGCCGAGATCGCGCAGAGACTGGAAAGTGTCTTTGCAGGAAAGAAC

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Sequence 2 0049 TCAGGCCCCCTCAAAGCCGAGATCGCGCAGAGACTGGAAAGTGTCTTTGCAGGAAAGAAC

 

Sequence 1 0121 ACAGATCTTGAGGCTCTCATGGAATGGCTAAAGACAAGACCAATCTTGTCACCTCTGACT

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Sequence 2 0109 ACAGATCTTGAGGCTCTCATGGAATGGCTAAAGACAAGACCAATCTTGTCACCTCTGACT

 

Sequence 1 0181 AAGGGAATTTTAGGATTTGTGTTCACGCTCACCGTGCCCAGTGAGCGAGGACTGCAGCGT

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Sequence 2 0169 AAGGGAATTTTAGGATTTGTGTTCACGCTCACCGTGCCCAGTGAGCGAGGACTGCAGCGT

 

Sequence 1 0241 AGACGCTTTGTCCAAAATGCCCTAAATGGGAATGGGGACCCGAACAACATGGATAGAGCA

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Sequence 2 0229 AGACGCTTTGTCCAAAATGCCCTAAATGGGAATGGGGACCCGAACAACATGGATAGAGCA

 

Sequence 1 0301 GTTAAACTATACAAGAAGCTCAAAAGAGAAATAACGTTCCATGGGGCCAAGGAGGTGTCA

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Sequence 2 0289 GTTAAACTATACAAGAAGCTCAAAAGAGAAATAACGTTCCATGGGGCCAAGGAGGTGTCA

 

Sequence 1 0361 CTAAGCTATTCAACTGGTGCACTTGCCAGTTGCATGGGCCTCATATACAACAGGATGGGA

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Sequence 2 0349 CTAAGCTATTCAACTGGTGCACTTGCCAGTTGCATGGGCCTCATATACAACAGGATGGGA

 

Sequence 1 0421 ACAGTGACCACAGAAGCTGCTTTTGGTCTAGTGTGTGCCACTTGTGAACAGATTGCTGAT

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Sequence 2 0409 ACAGTGACCACAGAAGCTGCTTTTGGTCTAGTGTGTGCCACTTGTGAACAGATTGCTGAT

 

Sequence 1 0481 TCACAGCATCGGTCTCACAGACAGATGGCTACTACCACCAATCCACTAATCAGGCATGAA

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Sequence 2 0469 TCACAGCATCGGTCTCACAGACAGATGGCTACTACCACCAATCCACTAATCAGGCATGAA

 

Sequence 1 0541 AACAGAATGGTGCTGGCTAGCACTACGGCAAAGGCTATGGAACAGATGGCTGGATCGAGT

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Sequence 2 0529 AACAGAATGGTGCTGGCTAGCACTACGGCAAAGGCTATGGAACAGATGGCTGGATCGAGT

 

Sequence 1 0601 GAACAGGCAGCGGAGGCCATGGAGGTTGCTAATCAGACTAGGCAGATGGTACATGCAATG

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Sequence 2 0589 GAACAGGCAGCGGAGGCCATGGAGGTTGCTAATCAGACTAGGCAGATGGTACATGCAATG

 

Sequence 1 0661 AGAACTATTGGGACTCATCCTAGCTCCAGTGCTGGTCTGAAAGATGACCTTCTTGAAAAT

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Sequence 2 0649 AGAACTATTGGGACTCATCCTAGCTCCAGTGCTGGTCTGAAAGATGACCTTCTTGAAAAT

 

Sequence 1 0721 TTGCAGGCCTACCAGAAGCGAATGGGAGTGCAGATGCAGCGATTCAAGTGATCCTCTCGT

|||||||||||||||||||||||||||||||||||||||||||||||||||

Sequence 2 0709 TTGCAGGCCTACCAGAAGCGAATGGGAGTGCAGATGCAGCGATTCAAGTGA_________

 

Sequence 1 0781 CATTGCAGCAAATATCATTGGGATCTTGCACCTGATATTGTGGATTACTGATCGTCTTTT

 

Sequence 2 0760 ____________________________________________________________

 

Sequence 1 0841 TTTCAAATGTATTTATCGTCGCTTTAAATACGGTTTGAAAAGAGGGCCTTCTACGGAAGG

 

Sequence 2 0760 ____________________________________________________________

 

Sequence 1 0901 AGTGCCTGAGTCCATGAGGGAAGAATATCAACAGGAACAGCAGAGTGCTGTGGATGTTGA

 

Sequence 2 0760 ____________________________________________________________

 

Sequence 1 0961 CGATGGTCATTTTGTCAACATAGAGCTAGAGTAAAAAACTAC

 

Sequence 2 0760 __________________________________________
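One way to produce the numbered blocks, sketched under assumptions: format_alignment and its width keyword are hypothetical; the 60-column blocks, the "_" gap character and the 4-digit positions are copied from the sample output above. Each number is the position of the next residue of that sequence, which is why blocks made entirely of gaps keep repeating the same number (0760 above).

```ruby
# Splits an aligned pair into blocks of `width` columns. Each row is labeled
# with the 1-based position (4 digits) of the next residue of that sequence,
# and a bar marks each column where the two residues are identical.
def format_alignment(top, bottom, width: 60)
  raise ArgumentError, "aligned rows must have equal length" unless top.length == bottom.length

  lines = []
  next_top = 1     # position of the next residue in sequence 1
  next_bottom = 1  # position of the next residue in sequence 2
  (0...top.length).step(width) do |start|
    t = top[start, width]
    b = bottom[start, width]
    bars = t.chars.zip(b.chars).map { |x, y| x == y && x != "_" ? "|" : " " }.join
    lines << format("Sequence 1 %04d %s", next_top, t)
    lines << (" " * 16 + bars).rstrip # 16 = width of the "Sequence 1 0001 " prefix
    lines << format("Sequence 2 %04d %s", next_bottom, b)
    lines << ""
    # Advance each counter by the residues (non-gap characters) in this block.
    next_top += t.delete("_").length
    next_bottom += b.delete("_").length
  end
  lines
end

puts format_alignment("CGCTATAG", "__CTA___", width: 4)
```

For the short example, the second block starts at position 0005 in sequence 1 and 0003 in sequence 2.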

From global to local by comparing the recursion functions:
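The recursions themselves did not survive in this copy. As a sketch only, here are the usual textbook forms, not necessarily the notation used in the course. Let x_1..x_m be the sequence down the rows, y_1..y_n the sequence across the columns, s(x_i, y_j) the substitution score and g the gap penalty (g = -1 in the tables above). The semi-global form is the one the tables above appear to follow.

```latex
% Global (Needleman-Wunsch); score = F(m,n)
\[
F(i,j) = \max\bigl\{F(i-1,j-1) + s(x_i, y_j),\ F(i-1,j) + g,\ F(i,j-1) + g\bigr\},
\qquad F(i,0) = i\,g,\quad F(0,j) = j\,g
\]

% Semi-global: free leading gaps, and free moves along the last row and column; score = F(m,n)
\[
F(i,j) = \max\bigl\{F(i-1,j-1) + s(x_i, y_j),\ F(i-1,j) + v(j),\ F(i,j-1) + h(i)\bigr\},
\qquad F(i,0) = F(0,j) = 0
\]
\[
v(j) = \begin{cases} 0 & j = n \\ g & j < n \end{cases}, \qquad
h(i) = \begin{cases} 0 & i = m \\ g & i < m \end{cases}
\]

% Local (Smith-Waterman); score = max over all cells
\[
H(i,j) = \max\bigl\{0,\ H(i-1,j-1) + s(x_i, y_j),\ H(i-1,j) + g,\ H(i,j-1) + g\bigr\},
\qquad H(i,0) = H(0,j) = 0,\qquad \text{score} = \max_{i,j} H(i,j)
\]
```

Going from global to local changes two things: the boundary row and column become 0, and the extra 0 inside the max lets an alignment restart anywhere, so the best score can sit in any cell rather than only the bottom-right one.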

Comments on early submissions:

score = (0..last_row).inject([]) {|s,e| s << [(e == 0  ? e:e = 0) * Sigma]}
# This might work by accident, but it is very unclear.
# If what you want to say with the ternary operator is that the score is 0
# in the first column, then you don’t need the ternary operator at all:
# s << [0]
# I suppose you could use
# s << [(e == 0 ? e : 0)*Sigma]
# but it is redundant.
# Putting an assignment inside a ternary operator is a bad idea.
# If you really need to allow a side effect like that, then perhaps
# you should use a short-circuit logical operator like &&. From
# the standpoint of clear, functional programming, even this is a kludge.
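The clearer alternative the comments point to might look like the sketch below. Sigma is assumed to be the gap penalty, and last_row is given a hypothetical value; map and Array.new replace inject because no accumulator is needed.

```ruby
Sigma = -1    # assumed: the gap penalty called Sigma in the submission
last_row = 3  # hypothetical: length of the row sequence ("cta")

# Global alignment: leading gaps cost Sigma each, so row i starts at i * Sigma.
global_column = (0..last_row).map { |i| [i * Sigma] }   # => [[0], [-1], [-2], [-3]]

# Semi-global alignment: leading gaps are free, so every row starts at 0.
# The block form gives each row its own array; Array.new(n, [0]) would share one.
semi_global_column = Array.new(last_row + 1) { [0] }    # => [[0], [0], [0], [0]]
```

Using the block form of Array.new matters here because each row array is later extended with more cells; with a shared array, appending to one row would change them all.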
