Note 1: Experimental design: power calculations
At the beginning of the planning phase of an experiment, consider the resolving power the experiment needs to answer your biological question. This will allow you to design cost- and labour-efficient experiments. In the simplest case, you might only be interested in the binary classification of growth/no growth, e.g. when testing the essentiality of genes in specific contexts. In another setting, the goal might be to characterise growth differences down to a couple of percentage points, e.g. when identifying genetic variants of wild strains which have subtle effects on growth. The number of replicates required to address these two questions differs drastically. You can design your experiment appropriately by calculating the number of required replicates from three inputs:
- the minimal effect size you are trying to detect;
- the noise of the method (5-10% is a good, conservative estimate in general);
- the desired statistical power, i.e. the probability of correctly rejecting the null hypothesis when a true effect of that size exists.
This can be done using the stats.power module of the Python statsmodels package, the power.t.test function in R, or various online tools. For example, to have an 80% chance of detecting a 5% difference in growth at 5% noise (standardised effect size = 1) with a p-value cutoff of 0.05, you would require 17 samples in each group (two-sided Student's t-test). You also need to consider that correcting for multiple testing will ultimately decrease the statistical power of your test. This effect is stronger the larger your strain library is. In the case of the Bonferroni correction (which is not recommended), you can easily calculate this loss of power: divide your alpha by the number of tests and repeat the power calculation with this adjusted cutoff. For the preferred FDR-controlling methods, this is not straightforward and requires you to make an educated guess.
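The replicate calculation described above can be sketched with `TTestIndPower` from `statsmodels.stats.power`. The helper name `replicates_needed` and the library size of 1000 tests used for the Bonferroni example are hypothetical choices for illustration. Effect size and noise are given as fractions (0.05 = 5%).

```python
import math

import numpy as np
from statsmodels.stats.power import TTestIndPower


def replicates_needed(min_effect, noise, alpha=0.05, power=0.8):
    """Replicates per group for a two-sided, two-sample Student's t-test.

    min_effect and noise are fractions (e.g. 0.05 for 5%); their ratio is
    the standardised effect size (Cohen's d).
    """
    effect_size = min_effect / noise
    n = TTestIndPower().solve_power(
        effect_size=effect_size,
        nobs1=None,          # the unknown we solve for
        alpha=alpha,
        power=power,
        ratio=1.0,           # equal group sizes
        alternative="two-sided",
    )
    # Some statsmodels versions return a 1-element array, so normalise to a float.
    return math.ceil(np.asarray(n).item())


# Example from the text: 5% difference, 5% noise, alpha 0.05, 80% power.
print(replicates_needed(0.05, 0.05))  # should print 17

# Bonferroni: divide alpha by the number of tests (hypothetical: 1000 strains).
n_tests = 1000
print(replicates_needed(0.05, 0.05, alpha=0.05 / n_tests))
```

Because the function rounds up, the result is the smallest whole number of replicates that reaches the requested power.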
Note 2: Experimental design: plate layouts
T-tests generally require the samples to be independent. What this means in practice is often not entirely clear, and it will be up to you to decide the details of your experimental design. In general, we would not consider replicates located next to each other on the same plate as independent, as they are subject to the same pinning errors and the same local nutrient, moisture and temperature conditions. It is therefore not recommended to generate replicates using 1-to-4 or 1-to-16 multiplexing pinning programmes, where a single 96-format plate is replicated four or sixteen times into 384 or 1536 format, respectively. Instead, replicates should be obtained by recombining library plates in different combinations into the assay plate, varying both the exact location and the neighbours of each strain.
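One way to generate such recombined layouts is sketched below. It makes several hypothetical assumptions: 15 library plates, one of the 16 pinning positions of a 96-to-1536 transfer reserved for a grid plate, and random reshuffling with rejection so that no plate occupies the same position twice. The function and plate names are hypothetical. Different positions do not guarantee entirely new neighbours, so the resulting layouts should still be inspected.

```python
import random


def replicate_layouts(plates, positions, n_replicates, seed=0, max_tries=10_000):
    """Return one {plate: (row_pos, col_pos)} dict per replicate assay layout.

    Each replicate places every library plate at a pinning position it has
    not occupied in any earlier replicate.
    """
    if len(plates) > len(positions):
        raise ValueError("more library plates than free pinning positions")
    if n_replicates > len(positions):
        raise ValueError("cannot give each plate a new position in every replicate")
    rng = random.Random(seed)
    used = {plate: set() for plate in plates}
    layouts = []
    for _ in range(n_replicates):
        for _ in range(max_tries):
            chosen = rng.sample(positions, len(plates))  # random assignment
            if all(pos not in used[p] for p, pos in zip(plates, chosen)):
                break  # no plate reuses an earlier position
        else:
            raise RuntimeError("no valid layout found; try another seed")
        layout = dict(zip(plates, chosen))
        for plate, pos in layout.items():
            used[plate].add(pos)
        layouts.append(layout)
    return layouts


# Hypothetical setup: position (1, 1) is reserved for the grid plate.
FREE_POSITIONS = [(r, c) for r in range(1, 5) for c in range(1, 5) if (r, c) != (1, 1)]
LIBRARY = [f"lib{i:02d}" for i in range(1, 16)]
layouts = replicate_layouts(LIBRARY, FREE_POSITIONS, n_replicates=4, seed=1)
for i, layout in enumerate(layouts, start=1):
    print(f"replicate {i}:", layout["lib01"])
```

Rejection sampling is simple but becomes slow as the number of replicates approaches the number of positions. For many replicates, a Latin-square-style design would be a more systematic alternative.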
Colonies compete with their neighbours, and a lack of neighbours usually results in increased growth. This is commonly observed as an edge effect, where colonies on the borders of the plate grow bigger. For the same reason, colony screens are sensitive to empty areas of the plate, which effectively create internal edges. If library plates are only partly filled, we recommend filling empty spots with control strains or random strains. This is less important if empty positions are scattered, and more important if entire corners, areas, rows or columns are empty.
Despite generally discouraging empty positions, we strongly recommend leaving one individual, unique position per library plate empty. Such footprints serve two important purposes. Firstly, they are negative control positions which should always be empty; if they are not, this indicates a source of contamination. Secondly, they aid identification of specific plates when multiple assay plate layouts are used. Images will not contain the labelling information written on the side of the plates, so in case of a mix-up, the footprints can provide crucial clues to the identity of the plate.
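Before pinning, the footprint rule can be checked automatically on the long-format library layouts. This is a minimal sketch. It assumes that layouts are lists of (Row, Column, Strain) tuples, that empty wells are labelled with the hypothetical string "empty", and that all other gaps have already been filled as recommended above. It verifies that each plate has exactly one footprint and that no two plates share one.

```python
def check_footprints(layouts, empty_label="empty"):
    """layouts: {plate_id: [(row, col, strain), ...]} in 96 format.

    Returns {plate_id: (row, col)} of each plate's single footprint.
    """
    footprints = {}
    for plate, records in layouts.items():
        empties = [(r, c) for r, c, strain in records if strain == empty_label]
        if len(empties) != 1:
            raise ValueError(f"{plate}: expected one footprint, found {len(empties)}")
        footprints[plate] = empties[0]
    owner = {}
    for plate, pos in footprints.items():
        if pos in owner:  # footprints must be unique to identify plates
            raise ValueError(f"{plate} and {owner[pos]} share footprint {pos}")
        owner[pos] = plate
    return footprints


def full_layout(empty_at):
    # Hypothetical 96 plate: every well filled except the footprint.
    return [
        (r, c, "empty" if (r, c) == empty_at else f"strain_{r}_{c}")
        for r in range(1, 9)
        for c in range(1, 13)
    ]


footprints = check_footprints({"lib01": full_layout((1, 1)), "lib02": full_layout((2, 5))})
print(footprints)
```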
Note 3: Preparation of layout files
Preparing the layout of your assay plates is a key task that will require some programming/data processing, at least for larger experiments. This is best done with layouts in long format, i.e. using a table with at least three columns (Row, Column, Strain) where every line of the table describes a single colony position. For every library plate in 96 format, note the position at which it is pinned onto the combined assay plate. When pinning from 96 to 1536 format, there are 16 of these positions (rows 1-4 and columns 1-4). Let’s call the row position pr and the column position pc. The row and column position of a colony on the assay plate (1≤ar≤32 and 1≤ac≤48) is then related to its position on the source plate (1≤sr≤8 and 1≤sc≤12) by:
ar = 4*(sr-1) + pr
ac = 4*(sc-1) + pc
Using this formula, transform the row and column values of each 96 library plate according to its target position. Create a layout for the grid plate in the same way and include it too. Then, concatenate the tables for all plates and sort by row and column.
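The transformation and concatenation steps above can be sketched with pandas. This assumes numeric Row (1-8) and Column (1-12) values in each 96 layout; letter rows would first need converting. It also assumes a hypothetical grid plate at position (1, 1) and 15 library plates with made-up strain names. The duplicate check catches two plates accidentally assigned to the same pinning position.

```python
import pandas as pd


def to_assay_coordinates(layout96: pd.DataFrame, pr: int, pc: int) -> pd.DataFrame:
    """Map a long-format 96 layout (Row 1-8, Column 1-12) to 1536 coordinates."""
    if not (1 <= pr <= 4 and 1 <= pc <= 4):
        raise ValueError("pinning position must lie in rows 1-4 and columns 1-4")
    out = layout96.copy()
    out["Row"] = 4 * (layout96["Row"] - 1) + pr        # ar = 4*(sr-1) + pr
    out["Column"] = 4 * (layout96["Column"] - 1) + pc  # ac = 4*(sc-1) + pc
    return out


def combine_layouts(plates) -> pd.DataFrame:
    """plates: iterable of (layout96, pr, pc); returns the sorted 1536 layout."""
    combined = pd.concat(
        [to_assay_coordinates(layout, pr, pc) for layout, pr, pc in plates],
        ignore_index=True,
    )
    if combined.duplicated(["Row", "Column"]).any():
        raise ValueError("two source plates were assigned the same pinning position")
    return combined.sort_values(["Row", "Column"]).reset_index(drop=True)


def make_96(prefix):
    # Hypothetical helper: a fully filled 96 plate with made-up strain names.
    return pd.DataFrame(
        [(r, c, f"{prefix}_{r}_{c}") for r in range(1, 9) for c in range(1, 13)],
        columns=["Row", "Column", "Strain"],
    )


positions = [(pr, pc) for pr in range(1, 5) for pc in range(1, 5)]
grid = make_96("grid").assign(Strain="grid_strain")  # hypothetical grid plate
plates = [(grid, 1, 1)] + [
    (make_96(f"lib{i:02d}"), pr, pc) for i, (pr, pc) in enumerate(positions[1:], start=1)
]
assay = combine_layouts(plates)
print(assay.head())
```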
Note 4: Preparation of suitable agar plates
To achieve high data completeness and low technical noise, it is crucial that assay plates are flat, without bubbles and of suitable dryness. For this, ensure the following:
- Always let the medium cool down to approximately 60°C before pouring plates. Otherwise, the contraction of the agar during cooling will result in unwanted ripples on the surface. When potentially temperature-sensitive drugs are to be included in the assay plates, add them once the medium has cooled down and right before pouring, to minimise their exposure to high temperatures.
- Always pour plates on a level and even surface. In our experience, this can be done on a standard lab bench without a sterile environment, as long as the plate lid is immediately replaced after pouring.
- Always add a consistent amount of agar medium to each plate (we use 40ml). This results in plates of consistent height and also avoids other artefacts: thicker plates mean more nutrients are available to each colony, which will change the colony size. Use a 50ml serological pipette, take up 5ml more than required and do not dispense this excess; this prevents bubbles. If you spot any bubbles, suck them back up with the pipette.
- Plates should be dried for a consistent time before use (we use 45 minutes without lids). Alternatively, closed plates can be left unwrapped on the bench overnight, which in our experience results in plates of suitable dryness for immediate use. Plates can be stored in the fridge/cold room if not used immediately, but will require extra drying before use.
Note 5: Preparation of the EDT table
Preparing a correct and complete experimental design table (EDT) is a key prerequisite for obtaining results with pyphe-analyse. All plates in your experiment should have a unique ID, which should be clearly written on the edge of the plate (not on the bottom, where it would show up in the scanned images). For example, plate IDs could follow the format date_layout_condition_replicate. While scanning, record in a table which plate ID corresponds to which image in the scan sequence. This table can later easily be turned into the EDT required by pyphe by adding extra columns with additional meta-data. The final table must be in csv format and the first column must contain the unique plate IDs. There must also be a column called ‘Data_path’ which points to the data file produced by pyphe-quantify for that plate's image. Any additional meta-data, such as date, condition, library versions, comments or batch information, can be included and will be carried through to the colony report. Please see the Documentation folder in the pyphe GitHub repository for an example file. Note that the data files do not all need to be located in the same folder, which is convenient for large experiments containing several batches. Generally, file and folder names should not contain spaces, non-standard characters or characters with special meanings in the terminal (%,>,?,/,”,*,&, etc.). Use _ or . or - to separate words.
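Turning the scan log into an EDT can be scripted. The sketch below uses only the standard library. Several names are hypothetical: the first column header `Plate_ID` (the text only requires that the first column holds the plate IDs), the meta-data column names, and the example data paths. It assumes IDs follow date_layout_condition_replicate with no underscores inside a field, splits them into meta-data columns, and rejects paths containing spaces or special characters.

```python
import csv
import re

META_FIELDS = ["Date", "Layout", "Condition", "Replicate"]  # hypothetical names
# Letters, digits, _ . - within names, and / only as the folder separator.
SAFE_PATH = re.compile(r"[A-Za-z0-9_.\-/]+")


def build_edt_rows(scan_log):
    """scan_log: list of (plate_id, data_path) pairs recorded while scanning."""
    plate_ids = [plate_id for plate_id, _ in scan_log]
    if len(set(plate_ids)) != len(plate_ids):
        raise ValueError("plate IDs must be unique")
    rows = [["Plate_ID", "Data_path"] + META_FIELDS]  # plate IDs first
    for plate_id, data_path in scan_log:
        if not SAFE_PATH.fullmatch(data_path):
            raise ValueError(f"avoid spaces and special characters: {data_path!r}")
        meta = plate_id.split("_")
        if len(meta) != len(META_FIELDS):
            raise ValueError(f"{plate_id!r} does not follow date_layout_condition_replicate")
        rows.append([plate_id, data_path] + meta)
    return rows


def write_edt(rows, out_path):
    with open(out_path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)


# Hypothetical scan log; the paths stand in for pyphe-quantify output files.
scan_log = [
    ("20200101_L1_YES_r1", "batch1/quantified/scan001.csv"),
    ("20200101_L1_caffeine_r1", "batch1/quantified/scan002.csv"),
]
rows = build_edt_rows(scan_log)
# write_edt(rows, "edt.csv")
```

Extra meta-data such as library versions or comments can be appended as further columns in the same way.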
Note 6: Quality control: pinning
Even with precise robots, the transfer of colonies by pinning can be prone to errors which, if unnoticed, will result in missing or wrong data. A common problem when target plates are uneven is that entire areas or corners of the plate receive no inoculum. Such pinning errors are dangerous, as colonies that are absent for technical reasons could be misinterpreted as inviable phenotypes. Pyphe helps spot these errors by detecting missing control colonies and setting all neighbouring colonies to NA, but this only happens after the damage is done. We therefore strongly advise checking every plate by eye for missing corners and correct pinning. Have one or two spare plates at hand to repeat transfers that have failed. If the problem persists, it can help to increase the target plate pinning pressure. With the Singer RoToR, pinning errors can be reduced by consistently placing plates in their holder and pushing them into the same corner.
Note 7: Quality control: image analysis
Pyphe-quantify produces a qc image for every image analysed. It is crucial to look at these qc images, even for huge datasets, to make sure colonies have been correctly identified and correctly matched to their grid positions. Issues with colony detection can usually be remedied by adjusting the threshold (using the --t parameter), setting a hard threshold (using the --hardImageThreshold parameter) or using local thresholding (activated with the --localThresh parameter). The last is especially recommended for images with uneven brightness, e.g. those obtained with the Singer Phenobox. By default, very small objects are excluded. If your colonies are very small, please adjust this exclusion using the --s or --hardSizeThreshold parameters. For gridding issues, it can be worth switching from automatic to manual grid placement. Please see the pyphe-quantify documentation.
Note 8: Normalisation strategies
Pyphe-analyse gives the user several normalisation options. If neither the --rcmedian nor the --gridnorm option is set, no normalisation is performed and the raw data from the plates are simply aggregated and summarised. For screens without a reference grid, pyphe can still be used (with row/column normalisation only, or with no normalisation). However, the lowest noise values are obtained if both options are set. In that case, grid normalisation is performed, followed by row/column median normalisation. This second normalisation can correct an artefact of the grid normalisation method, which slightly over-corrects colonies next to the edge. This is essentially because colonies just off the edge are compared to colonies on the edge and therefore appear relatively smaller (see Figure 1—figure supplement 2B and Appendix 2 of (7)).
However, row/column median normalisation can and should only be used if the majority of strains in each row and column show no effect (i.e. the null effect can be reliably estimated by taking the median). This is usually the case for library screens, where most mutants behave like wild type and there are only a few outlying ‘hits’. For wild strain libraries, this assumption is harder to justify, but it could still hold if your strains are arranged randomly. It certainly will not hold if your grid strain grows differently from the rest of your library, because only some rows/columns contain many replicates of the grid strain. In those cases, performing row/column normalisation after grid normalisation will do more harm than good. Pyphe-analyse produces qc plots for every plate analysed, and you should inspect these carefully to check that the normalisation is working as expected.
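To illustrate why the median assumption matters, here is one simple way to implement row/column median normalisation with NumPy: divide each colony by its row median, then by the column median of the row-normalised values. This is a generic sketch of the idea, not pyphe-analyse's own implementation, and it covers only the row/column step, not grid normalisation. The plate, its positional gradient and the single 'hit' are hypothetical.

```python
import numpy as np


def rc_median_normalise(plate):
    """Row then column median normalisation of a 2-D array of colony sizes.

    Missing colonies may be NaN; nanmedian ignores them.
    """
    plate = np.asarray(plate, dtype=float)
    row_normalised = plate / np.nanmedian(plate, axis=1, keepdims=True)
    return row_normalised / np.nanmedian(row_normalised, axis=0, keepdims=True)


# Hypothetical 1536 plate: a multiplicative positional artefact and no biology...
row_effect = np.linspace(0.8, 1.2, 32)
col_effect = np.linspace(1.1, 0.9, 48)
plate = np.outer(row_effect, col_effect)
# ...plus one slow-growing 'hit'.
plate[10, 20] *= 0.3

normalised = rc_median_normalise(plate)
print(round(normalised[10, 20], 3))       # the hit stays well below 1
print(round(np.median(normalised), 3))    # the rest of the plate is close to 1
```

Because one hit barely moves a row or column median, the artefact is removed while the hit survives. If most colonies in a row (for example, many grid-strain replicates) shared a real effect, the median would absorb that effect instead.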
Note 9: Quality control: biological and technical noise
Biological noise, technical noise and experimental errors (incorrect plate preparation, mislabelling, pinning errors) will impact your data and can result in wrong conclusions if they go undetected. We therefore highly recommend performing extensive quality control to quantify the unexplained variation and spot experimental errors. The use of negative-control positions (footprints) is key, and plates with contaminated footprints should be discarded from the analysis. Furthermore, we include a number of control strains on every assay plate. One easy way to achieve this is to include an additional 96-colony grid of control strains on each assay plate. These colonies are not used during reference grid correction but are expected to have fitness values of 1.
Based on these internal controls, it is possible to calculate two key noise indicators:
- The coefficient of variation (CV), i.e. the ratio of the standard deviation of the corrected fitness of the control colonies to their mean. It is an excellent indicator of the level of noise present in the assay. We usually exclude all plates that exceed a certain CV threshold (e.g. 10%).
- The fraction of unexplained variance (FUV), i.e. the ratio of the variance of the control strains to the variance of all strains. An FUV of 1 indicates that the spread of values is as broad in the controls as it is across the library, which can indicate that the observed variation across the library strains is purely technical. A suitable cut-off for excluding individual plates and conditions will depend strongly on your library. Certainly, an FUV greater than 1 would be highly unusual and deserves further investigation.
It can also be worthwhile to exclude plates and conditions which show very small uncorrected colony sizes, as this indicates that the stressor was too strong or the nutrients did not support any growth. In these cases, grid correction can introduce artefacts, as small colony sizes are extremely noisy.
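Both indicators can be computed per plate from the long colony report. This is a minimal pandas sketch. The column names `Plate`, `Strain` and `Fitness`, the control label "control" and the tiny example report are hypothetical, so adapt them to your own pyphe-analyse output. Sample standard deviation and variance (ddof=1) are used throughout, and "all strains" includes the controls.

```python
import pandas as pd


def plate_noise_metrics(report, plate_col="Plate", strain_col="Strain",
                        fitness_col="Fitness", control_label="control"):
    """Per-plate CV of the controls and FUV (control variance / total variance)."""
    records = []
    for plate, df in report.groupby(plate_col):
        fitness = df[fitness_col].dropna()
        controls = df.loc[df[strain_col] == control_label, fitness_col].dropna()
        records.append({
            plate_col: plate,
            "CV": controls.std() / controls.mean(),  # sample SD / mean of controls
            "FUV": controls.var() / fitness.var(),   # controls vs all colonies
            "n_controls": len(controls),
        })
    return pd.DataFrame(records)


# Hypothetical report with one plate: three control colonies and three mutants.
report = pd.DataFrame({
    "Plate": ["p1"] * 6,
    "Strain": ["control"] * 3 + ["mutA", "mutB", "mutC"],
    "Fitness": [1.0, 1.05, 0.95, 0.5, 1.5, 1.0],
})
qc = plate_noise_metrics(report)
print(qc)
```

The resulting table can then be compared against your chosen CV and FUV cut-offs.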
These QC steps differ greatly between experiments, so they need to be performed manually on the long data report from pyphe-analyse, removing spurious lines or setting them to NA. Once this is complete, proceed with hit calling using pyphe-interpret.
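Setting the values of failed plates to NA in the long report can be done in a few lines of pandas. The sketch assumes hypothetical column names (`Plate`, `Fitness`) and a hypothetical per-plate QC table with a CV value and a footprint flag. It uses the 10% CV cut-off mentioned above as an example. It returns a copy, so the original report stays untouched.

```python
import numpy as np
import pandas as pd


def mask_plates(report, bad_plates, plate_col="Plate", value_cols=("Fitness",)):
    """Return a copy of the report with values from failed plates set to NA."""
    masked = report.copy()
    is_bad = masked[plate_col].isin(bad_plates)
    masked.loc[is_bad, list(value_cols)] = np.nan
    return masked


# Hypothetical long report and per-plate QC results.
report = pd.DataFrame({
    "Plate": ["p1", "p1", "p2", "p2"],
    "Strain": ["mutA", "mutB", "mutA", "mutB"],
    "Fitness": [0.95, 1.20, 0.40, 1.80],
})
qc = pd.DataFrame({"Plate": ["p1", "p2"], "CV": [0.06, 0.17],
                   "footprint_clean": [True, True]})

# Fail plates with CV above 10% or with a contaminated footprint.
bad = qc.loc[(qc["CV"] > 0.10) | ~qc["footprint_clean"], "Plate"]
cleaned = mask_plates(report, bad)
print(cleaned)
```

To remove the lines entirely instead, filter with `report[~report["Plate"].isin(bad)]`.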