Sample Project #3

Project Title

Plasmid Finder, by Susan Chen, December 2012

Project Objective

Create a graphical user interface that facilitates searches for plasmids and primers in our lab.

Problem: A graduate student had created spreadsheets of plasmids and primers, as well as plasmid sequences in fasta files, but the fasta files were not linked to any of the spreadsheets, so finding the needed data required tedious searches by hand. There was no apparent way to quickly search for a primer or plasmid with all the attributes one needs for experimental work.

Solution: I created a Python program to read and pull up fasta files. I also wrote a script that searches for the fasta files, integrates various search requirements, and returns the information that matches these requirements. Lastly, I developed a graphical user interface to package these searches in a user-friendly way.

Vision: I had originally envisioned a GUI where a user can choose to search all primers or plasmids. If the primers button is clicked, the user should be able to enter information in any one field or in any combination of fields. The information is then used to search the primer list, and any hits are then displayed. On the plasmid side, the user should be able to specify any combination of attributes, such as promoter, fluorophore, or gene present in the plasmid, and the identity and sequence of the plasmid hits should then be displayed.

Reality: Code always takes longer to write than you think it will. For the final project I have been able to extract information from the data source and store it in two files. Code has been implemented to search the file containing primer information and obtain the desired results. These actions have been implemented in a GUI that users can interact with. Missing is the plasmid side of things. I planned on doing something similar to what I did for the primer side (i.e., doing a combination of searches). Also, the plasmid sequence display should have highlighted features that make it easier to distinguish what the sequences actually are. Error handling and accounting for the different possible user inputs are also lacking. The GUI is also limited in that users cannot click on the outputs of their search for further information.

Lessons Learned: Being able to write your own programs to further your personal research efforts is really useful. But I also have to think about what I should do myself. Is it worth spending time and effort to write my own programs? There's lots of code out there, and I don't want to reinvent the wheel! I'd like to do more programming, and I especially want to be able to control the laboratory instruments I work with.

Program Details

There are three source files: JSO_MasterList_121212, JSO_Plasmids.fasta, and fluorescent_sequence.fasta.

plasmid_list.py and primer_list.py read in the source data, reorganize it, and write the output to a text file. These two files take no inputs since they are just scripts. After running them, there should be two text files: compiled_info_plasmid.txt and compiled_info_primer.txt.
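
The read-and-compile step described above can be sketched roughly as follows. This is only an illustration, not the code in plasmid_list.py: the function names parse_fasta, compiled_lines, and write_compiled are hypothetical, and the tab-separated "id, sequence" layout of the output file is an assumption, since the actual layout of the compiled_info files is not described here.

```python
def parse_fasta(lines):
    """Return {record_id: sequence} from an iterable of fasta lines."""
    records = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: use the first whitespace-delimited token as the id.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            records[current_id] = []
        elif current_id is not None:
            # Sequence line: a record may span several lines.
            records[current_id].append(line)
    return {rid: "".join(parts) for rid, parts in records.items()}


def compiled_lines(records):
    """Format each record as 'id<TAB>sequence' (assumed layout)."""
    return [f"{rid}\t{seq}" for rid, seq in records.items()]


def write_compiled(records, path):
    """Write the formatted records to a text file, one per line."""
    with open(path, "w") as handle:
        for line in compiled_lines(records):
            handle.write(line + "\n")
```

Under these assumptions, opening JSO_Plasmids.fasta, passing the file handle to parse_fasta, and handing the result to write_compiled with the path compiled_info_plasmid.txt would produce a searchable text file.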
plasmid_process.py and primer_process.py are the search scripts that are hooked up to the back end of the GUI. For plasmid_process.py, the "match_id" function takes the user input "input" and the dictionaries ("description", "selection", "sequence", etc.) generated by the "make_dictionaries" function. It then returns the values of the dictionaries that match the user input.
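
A lookup of this kind might look like the sketch below. The signature is a guess: only three of the dictionaries are shown, the parameter is called user_input here rather than "input" to avoid shadowing the Python builtin, and returning None for a bad or unknown id is an assumption. The check for the JSO# form follows the entry format given in the Instructions section.

```python
import re

# Plasmid ids are expected to look like JSO665 (JSO followed by digits).
PLASMID_ID = re.compile(r"JSO\d+")


def match_id(user_input, description, selection, sequence):
    """Return the entries for one plasmid id, or None if there is no match.

    description, selection, and sequence are dictionaries keyed by
    plasmid id (hypothetical shape; the real ones come from
    make_dictionaries).
    """
    key = user_input.strip()
    if not PLASMID_ID.fullmatch(key):
        return None  # empty or malformed entry
    if key not in sequence:
        return None  # well-formed id that is not in the data
    return {
        "description": description.get(key, ""),
        "selection": selection.get(key, ""),
        "sequence": sequence[key],
    }
```

The GUI could then show an error message whenever the function returns None.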
For primer_process.py, the search is broken into three different functions. "id_search" finds the primer that matches the id and returns the description and sequence of the match. "substr_search" takes a partial sequence from user input and finds all sequences that contain that subsequence. "desc_search" matches a fragment of a description against the primer descriptions and returns the primers that match. Lastly, "combo" is used when the user enters two fragments of information, one about the sequence and one about the description of the primers; it searches for primers that match both requirements and returns the result.
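
The four primer searches could be sketched along these lines. The function names come from the description above, but the signatures, the return values, and the data structure (a dictionary mapping each primer id to a (description, sequence) pair) are assumptions made for illustration.

```python
def id_search(primers, primer_id):
    """Return (description, sequence) for an exact id match, else None."""
    return primers.get(primer_id)


def substr_search(primers, fragment):
    """Return ids of primers whose sequence contains the fragment."""
    fragment = fragment.lower()
    return [pid for pid, (_, seq) in primers.items()
            if fragment in seq.lower()]


def desc_search(primers, fragment):
    """Return ids of primers whose description contains the fragment."""
    fragment = fragment.lower()
    return [pid for pid, (desc, _) in primers.items()
            if fragment in desc.lower()]


def combo(primers, seq_fragment, desc_fragment):
    """Return ids of primers that match both fragments."""
    seq_hits = set(substr_search(primers, seq_fragment))
    return [pid for pid in desc_search(primers, desc_fragment)
            if pid in seq_hits]
```

Building "combo" on top of the two fragment searches keeps the matching rules in one place, so a change to how substrings are compared applies to the combined search as well.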

Instructions

Run primer_list.py and plasmid_list.py to generate the files to search on
Run HESdnaReader.py to start the GUI
On the plasmid side:
-pressing update with no entry results in an error message
-entries should be in the form JSO#, such as JSO665
-if an entry is not present, such as JSO10000, then an error message results
On the primer side:
-pressing update with no entry results in an error message
-search on id will only occur if the other two fields are empty. Ids should be of the form: 27.0
-search on a description will only occur if the other two fields are empty. Descriptions should be lower case, and any substring of the description will work, such as yegfp. If the description does not exist, an error message will result.
-search on sequences will only occur if the other two fields are empty. A sequence entry should be lower case. Any substring of the sequence will work, such as gggccc. If the sequence does not exist, an error message will result.
-if the id is not known and the user enters information about both the sequence and the description, the intersection of these two inputs is found. If only the sequence information matches, the primers with that sequence information will be displayed. Likewise, if only the description information matches, the primers with that description information will be displayed. If neither input matches, an error message will result.
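
The primer-side rules above amount to a small decision procedure, sketched below. The function names choose_search and resolve_combo are hypothetical, and two behaviors are assumptions because the rules do not cover them: an id entered together with another field is treated as an error, and an empty intersection of two non-empty hit lists is reported as no match.

```python
def choose_search(primer_id="", description="", sequence=""):
    """Decide which search to run from the fields the user filled in."""
    fields = (("id", primer_id),
              ("description", description),
              ("sequence", sequence))
    filled = {name for name, value in fields if value.strip()}
    if not filled:
        raise ValueError("enter at least one search field")
    if len(filled) == 1:
        return filled.pop()
    if filled == {"description", "sequence"}:
        return "combo"
    # Assumption: an id plus any other field is rejected.
    raise ValueError("an id search requires the other two fields to be empty")


def resolve_combo(seq_hits, desc_hits):
    """Combine sequence hits and description hits for a combined search."""
    if seq_hits and desc_hits:
        desc_set = set(desc_hits)
        both = [pid for pid in seq_hits if pid in desc_set]
        if not both:
            # Assumption: an empty intersection counts as no match.
            raise LookupError("no primer matches both inputs")
        return both
    if seq_hits:
        return list(seq_hits)   # only the sequence input matched
    if desc_hits:
        return list(desc_hits)  # only the description input matched
    raise LookupError("neither input matched any primer")
```

The GUI would catch ValueError and LookupError and turn them into the error messages described in the list.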

Program Source Code

Data Files Used