Below we address the most common questions we receive about our technology and services. Don't hesitate to contact us if your question isn't answered here.
My expression system is codon-optimized; it can’t get any better than that. Why do I need your service to make enhanced promoters?
Codon optimization is meant to optimize protein translation efficiency, ideally translating one protein or peptide molecule from one mRNA transcript. If your intent is to produce more protein, then your host cell needs more mRNA transcripts. Our enhanced promoters help your host cells make more mRNA transcripts so that your codon-optimized expression system can make even more protein.
I already have a 1 kb promoter cloned into my expression plasmid. Why do I need your promoter(s)?
Large promoter “regions” cloned upstream of a transcription start site are not optimized for spacing between the transcription factor binding site (TFBS) and the TATA box. As a result, these “regions” often show limited benefit in optimizing protein expression. TFBSs are short sequences of DNA, usually 6-25 nt, but promoters are typically thought of as a “region” because the specific binding sequence is unknown.
The Circularis method measures the activity of specific sequences that function as regulatory elements. By comparing each sequence back to a reference genome, we can define exactly the binding sequence, its spatial relationship to the TATA box, and the relative activity generated from that sequence. Moreover, promoters arranged in large, 1 kb regions are consensus sequences and are not optimized for transcriptional activity.
If I already have a known promoter, can I perform error-prone PCR and test it myself?
With no knowledge of a transcription factor binding site, random mutagenesis analysis is limited by exponential explosion. For example, if we randomized a suspected promoter sequence of 1000 nucleotides, the total number of possible sequences is 4^1000 ≈ 1.15 × 10^602, an astronomically large number by any standard. The longer the sequence to be mutagenized, the bigger the exponential problem. To make this more manageable, researchers select strategic regions to mutagenize and reduce the total number of sequences to evaluate experimentally. This reduction forces a trade-off between a manageable number of sequences to evaluate and the statistical likelihood that a novel promoter with specific activity will actually be found.
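The arithmetic behind that figure can be checked with a minimal Python sketch. The function names `sequence_space` and `scientific` are illustrative only and are not part of any Circularis tool; the sketch assumes a four-letter alphabet (A, C, G, T) with every position fully randomized.

```python
def sequence_space(length: int, alphabet_size: int = 4) -> int:
    """Exact number of distinct sequences of the given length."""
    return alphabet_size ** length


def scientific(n: int, digits: int = 2) -> str:
    """Format a large positive integer as 'mantissa x 10^exponent'.

    Works on the decimal string so it does not overflow a float.
    """
    text = str(n)
    exponent = len(text) - 1
    # Keep one extra digit so the mantissa is rounded rather than truncated.
    mantissa = int(text[: digits + 2]) / 10 ** (digits + 1)
    return f"{mantissa:.{digits}f}x10^{exponent}"


# A fully randomized 1000-nucleotide region
print(scientific(sequence_space(1000)))  # 1.15x10^602
```

Each additional randomized position multiplies the search space by four, which is why restricting mutagenesis to a short, well-defined region matters so much.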
Let’s say you have a known TFBS of 10 nt that you want to optimize. The total number of possible sequences is 4^10 = 1,048,576. The workflow to evaluate these potential sequences includes reporter gene (e.g., GFP) construction and cloning, followed by flow sorting to narrow the analysis to the high expressers. Then you re-culture to stabilize, if your cells are still viable, and clone your gene of interest into the expression plasmid. You may want to sequence the plasmids after sorting, in which case you would clone the promoters in with your gene of interest. Depending on your transformation/transfection efficiency, you’ll lose anywhere between 10% and 50% of your transcripts. Transform/transfect into your host, culture (1-3 days for bacteria, 4-6 weeks for mammalian cell culture), spread onto plates, pick colonies based on the detection scheme you have in your expression construct, then send the picked colonies out for sequencing. This process has been published many times; it involves grants of no less than $500K and takes multiple years to complete. The process is capital-equipment intensive, exceptionally tedious, extremely slow, and very inefficient, and it requires multiple highly skilled scientists.
Once an expression construct has been made (see question below), the Circularis method takes approximately 4-6 weeks and requires no flow cytometry, HPLC, mass spectrometry, automated cloning, or colony picking. Our method requires a single reaction in a single population of cells. Culture times range from a few hours for bacteria to a few days for mammalian cells.
Can I combine RNA-seq with WGS and look at mutation sites in non-coding regions?
While the accepted consensus is that 90% of disease-associated SNPs occur in non-coding regions, identifying a SNP does not confirm that the SNP is involved in transcription. This analysis gives no insight into causal variation, and provides even less information about the regulatory elements involved in a disease pathway. RNA-seq (or DNA expression arrays) will provide insight into differential gene expression levels, but not into the cause of the changes in expression level.
The Circularis method is able to define differential promoter activity when comparing disease/healthy samples. We are able to point to exact sequences that are at the source of the changes in gene expression.
How does the Circularis method compare to ChIP-seq and similar methods?
Chromatin immunoprecipitation sequencing (ChIP-seq) combines immunoprecipitation with DNA sequencing to identify binding sites of DNA-associated proteins. ChIP-seq is limited in a few different ways. First, the immunoprecipitation step requires an antibody with specificity for protein-DNA interactions. These antibodies are very rare: the ENCODE project approved only about 20% (44/227) of commercially tested ChIP antibodies. The second limitation is that most of these antibodies have specificity for DNA-protein interactions that involve chromosome unraveling (e.g., histone acetylation and methylation). While this is valuable information about the first steps of gene regulation, it provides no insight into the TFBSs involved in the actual transcription step. Another limitation of ChIP-seq is that the resolution of the binding site is limited to the binding epitope of the antibody, which can include a sequence as long as 100 bp. While that narrows the search, it still creates an exponential explosion problem when trying to enhance a promoter. See the question above on error-prone PCR.
How do we tag the different promoter sequences ahead of time so we can match the high promoter strength results with the right sequence?
We actually tag the transcript with the promoter's own sequence, so we get the promoter sequence and its strength from the same set of sequence data. We don’t use any fluorescent molecules; all in vivo activity proceeds naturally.
How do we find all the promoters in the genome without prior knowledge? Do we have to start with a specific protein/gene of interest and then throw a bunch of promoters at it to see which one works?
We can build a library of genomic (or synthetic) DNA in one of our core constructs by fragmentation, linker addition, and conventional cloning. We use the average size of promoter (or enhancer) elements in the organism of interest as a guideline for deciding on the DNA fragment size to focus on. When the library is introduced into the cells, fragments that contain regulatory elements make our special transcript, which can be recovered. Fragments that do not contain promoter sequences do not make transcripts and are not represented in the recovered library. This allows us to find the promoter sequences and strengths without foreknowledge of the genomic sequence.
How do we handle new strains or species? Are there any special requirements?
If we do not have an expression system for a particular species, we build a new construct by placing our ribozyme core into your expression plasmid. To proceed with that part of a service, we require your plasmid map, transformation protocol, and any other special handling instructions. Once we’ve reviewed your map, we’ll discuss with you any changes required for the new expression system to incorporate our ribozyme. Once we’ve built the new construct, we test it in your host before moving forward with the project (i.e., discovery or optimization).
If we’re working with a species that has a low-efficiency transformation protocol, how does that affect the results? And how does that affect a project?
Efficiency of delivering a library into the host (e.g., transformation or transfection) is an important part of our workflow. In simple terms, the more fragments we can get inside the host, the more unique transcripts we’ll get out of the host. The more unique transcripts we get out of your host, the greater the likelihood we’ll find the promoter(s) you need for your protein. After we’ve built your custom construct, we perform a test transformation to determine the efficiency. At this point, you can decide to add transformations to the project, which will add time and cost (specifics vary for every project).
An alternative is to generate transcriptionally active cell lysates from your host to perform in vitro transcription.
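To illustrate why delivery efficiency matters, here is a rough Python sketch of how library coverage grows with the number of transformants. It is an assumption for illustration, not a description of the Circularis workflow: it supposes each transformant carries one library member drawn uniformly at random, so the expected fraction of the library seen at least once follows the standard Poisson approximation 1 - exp(-N/L). The function name `expected_coverage` and the example numbers are hypothetical.

```python
from math import exp


def expected_coverage(library_size: int, transformants: int) -> float:
    """Expected fraction of library members represented at least once.

    Assumes uniform, independent sampling of one library member per
    transformant (Poisson approximation); real libraries are rarely
    perfectly uniform, so treat the result as an upper-end estimate.
    """
    return 1.0 - exp(-transformants / library_size)


# Hypothetical library of one million distinct fragments
for n in (100_000, 1_000_000, 3_000_000):
    print(f"{n:>9,} transformants -> {expected_coverage(1_000_000, n):.1%} coverage")
```

Under these assumptions, roughly as many transformants as library members gives about 63% coverage, and about three times as many gives about 95%, which is why additional transformations can be worth the extra time and cost for low-efficiency hosts.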
How do we measure the protein levels to validate the strength of each discovered/optimized promoter?
After recovering and measuring the promoter sequences, we use GFP to measure protein expression at levels below 200% and ELISAs (either maltose binding protein or beta-gal) for expression levels above 200%. If protein validation is included as part of a service, we will use our customer’s protocol to measure their specific protein of interest.
Can you help me understand more about the plasmid construct and what allows it to circularize after transcription to protect itself? What are its limitations?
The plasmid used depends on the application. We have used a small (~2 kb), high-copy plasmid from E. coli for most of our experiments so far. This can be used directly in E. coli and also allows us to produce enough plasmid for mammalian experiments. For our ongoing yeast proof-of-concept experiments, we have used a 2 micron circle episome. We have also built a low-copy plasmid for use in E. coli and other related prokaryotes. All of these plasmids contain our proprietary ribozyme construct, which is what allows the RNA to circularize. This circularization is an essential part of our technology and is what allows us to selectively recover promoter sequences. As to limitations, a relative of our first-generation construct has previously been used to generate circles with an insert of approximately 1000 base pairs. Due to its design, this construct is probably best for promoters below 1000 bp. Our second-generation construct isolates the circularization motif from the inserts and should allow inserts larger than 1 kb to be used without problems. We have a third-generation construct that uses a "variation on the theme" of our core technology and should work with promoters that are substantially larger (>5 to 10 kb). This third-generation construct is currently in development; we expect it to be available for service use by the end of 2017.
In terms of multiple promoters, how can we do pathway discovery and optimization?
Once we have identified a set of promoters, pathway discovery can be done by knocking out specific promoters using silencing technology and looking for differences in the desired pathway. Alternatively, over-expression or silencing of transcription factors can be used to derive the relationship between these factors and the promoter's activity. This helps us understand how the promoter functions and how we can control it.
Pathway optimization can be done by adjusting pathway protein levels to match enzyme characteristics, for example by increasing the concentrations of the enzymes that represent the rate-limiting step(s). Alternatively, the promoters for these enzymes can be replaced with a "tuned" set of promoters that produces a range of protein concentrations, followed by high-throughput analysis to find the best set of promoters.
Isn’t that what researchers do today for metabolic engineering and optimization?
The short answer is that current methods for pathway optimization and metabolic engineering involve a lot of trial-and-error guesswork because researchers don’t have the tools to control the expression of each component in a pathway. Our tuned library of optimized promoters is that tool set.
Unfortunately, evolutionary forces were not focused on your protein production efficiency; they were focused on allowing the organism to adapt to its environment and survive. To optimize a pathway composed of n enzymatic steps, the expression level of each pathway component has to be varied concomitantly to identify the optimal metabolic output for your protein. The number of different expression levels (x) per pathway component determines the resolution of the search for the optimal metabolic output. The total number of possible expression level combinations is x^n. For a three-step enzymatic pathway evaluated at 5 different expression levels, that’s 5^3 = 125 cloning and expression experiments. With automation, that is a very manageable experiment that can be performed in 1-2 weeks, but only if you have the tools to control expression. That’s how we speed up the process.
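The x^n count above can be made concrete with a short Python sketch that enumerates every combination of expression levels for a pathway. The level labels and the function name `expression_designs` are hypothetical placeholders for illustration; they do not correspond to specific promoters in a Circularis library.

```python
from itertools import product


def expression_designs(n_steps: int, levels: list[str]) -> list[tuple[str, ...]]:
    """All ways to assign one expression level to each of n_steps enzymes."""
    return list(product(levels, repeat=n_steps))


# Hypothetical labels standing in for five tuned promoter strengths
levels = ["very low", "low", "medium", "high", "very high"]

designs = expression_designs(3, levels)
print(len(designs))   # 125, i.e. 5^3
print(designs[0])     # ('very low', 'very low', 'very low')
```

Each tuple is one construct to build and test. Because the count grows as x^n, adding a fourth step at the same resolution would raise it to 625, so the choice of how many levels to test per step is a practical trade-off.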
What are the licensing terms for Circularis promoters?
Novel promoters purchased as part of our commercially available libraries (e.g., T7, CMV) include a one-time, global license to use the promoters for research purposes only. Commercial use of promoters requires a license from Circularis. Contact us for more details.
Novel promoters generated as part of a custom service also include a one-time, global license to use the promoters for research purposes only. Commercial use of promoters requires a license from Circularis.