Our approach to the collection of metadata was driven by the following principles: convenience for the experimentalist and seamless integration into the experiment workflow; benefit to the experimentalist by reducing the need for making notes on paper or other media and providing automatic performance of common calculations; reducing errors and fatigue in data entry by providing controlled lists from which the experimentalist can select variable values; and linking sample metadata in the spreadsheets with experimental data in a directory structure through an automated naming convention. For this kind of study, it is also important that the experimentalist has a place for descriptive notes in an unstructured text format for information that is not easily captured as machine-readable variables.
3.1 Data entry worksheets.
The experimentalist enters values for variables into two different interconnected worksheets as shown in Fig. 2, one that is templated as a calendar (the Calendar worksheet, shown in Fig. 4) and is used to capture daily activities performed on cell populations that result from a transfection, and another for input of transfection variables (the Metadata Template worksheet, shown in Fig. 5).
From the Calendar worksheet, when the user chooses the Activity Transfect, a Metadata Template is presented and data corresponding to the transfection are entered by the user. Every transfection results in a new worksheet. As specific data for a transfection are entered into the Metadata Template, a unique designator for that transfection is created by concatenation of key data entered on the worksheet according to the established naming convention, and the resulting transfection metadata worksheet tab is updated with the unique transfection designator. This new sample name automatically populates another worksheet, the Cell Samples list. For each subsequent new transfection, a new transfection metadata worksheet is created, resulting in another tab named with the new transfection designator, and the addition of a new cell sample name to the Cell Samples worksheet. The updated Cell Samples worksheet list populates the drop-down list of Cell Sample Name on the Calendar worksheet. Figure S1 shows a view of the Cell Samples worksheet containing cell sample names, and a view of the Calendar worksheet showing the drop-down list of cell sample names that can be chosen. If a new transfection is initiated that is similar to a previous one, the user can select an existing transfection metadata worksheet to serve as a template. Only new information needs to be entered into the new transfection metadata worksheet, and the result is an additional and uniquely named transfection metadata worksheet tab. Previously transfected cell populations can be selected from the Cell Sample Name drop-down list on the Calendar worksheet as activities are performed on them; some of those activities (Designate a New Clone, and Sort) result in appending the cell sample name to indicate the activity and assign a new cell sample; the new cell sample name is automatically added to the drop-down list. Thus, the drop-down list for Cell Sample Name includes all unique cell samples derived from every transfection. The most recently named cell sample appears at the top of the drop-down list.
The Calendar worksheet (Fig. 4) guides the entry of information about daily activities carried out with each cell population and serves as a planning and organizational tool for the experimentalist. The user can choose any day, current, past, or future in which to make entries. The Metadata Template (Fig. 5) captures information about the editing reagents and the transfection protocol and provides space for unstructured notes and images. In addition, functionality was embedded in the Metadata Template to provide some calculations that guide the experimentalist in setting up the transfection reaction conditions.
We found that interacting with the Calendar and Metadata Template worksheets required no more time or effort than entering written information into a notebook. Many features of the program simplified the recording of data and the organizing of the workflow including the use of drop-down lists for selecting options, the use of previously filled Metadata Template worksheets for similar transfection experiments, the inclusion of automated calculations in the Metadata Template worksheet, and the automated compilation of information from multiple worksheets in the form of report worksheets using built-in software functions.
Both the Calendar worksheet and the Metadata Template (including the metadata worksheets created for specific transfections) have space for the experimentalist to add free form text, so all information about the activities can be captured even if there is not a specific worksheet cell for it. In that way, all information deemed important by the experimentalist can be included in the worksheet for that experiment. In addition, data and information in the form of images, graphs, and screenshots from instruments or other sources, can be pasted into the Metadata Template worksheet. Free form text also can be saved as Notes on the Calendar worksheet and these notes then appear on the Data worksheet associated with the Calendar entry.
3.2 Restricted data entry
Both the Calendar worksheet and the transfection Metadata Template worksheet make use of drop-down lists which control and facilitate data input by limiting the values the experimentalist can select. The drop-down lists are pre-configured and reside in other worksheets. The purpose of the drop-down lists is to prevent the entry of unintended and erroneous values into the worksheet and to maintain consistent vocabulary. Drop-down lists can also simplify the experimentalist’s job of data entry. For example, the activities that can be chosen on the Calendar are limited to the entries on the ActivityList worksheet which are shown in Table 3.
Table 3
The activities that can be selected for any cell sample on any day from the drop-down list in the Activity column in the Calendar sheet.
Activity |
Transfect |
Feed w Ri |
Image |
Passage |
Sort |
Freeze |
Thaw |
Extract DNA |
Feed w mTeSR + + Ri |
Feed w mTeSr+ |
Discontinue |
Send out for analysis |
Drop-down lists can be modified in a controlled manner in several different ways. The drop-down list for Activity on the Calendar worksheet can be altered from the ActivityList worksheet simply by removing or adding activities to the existing list; the new activities will show up in the drop-down list in the Calendar.
The Cell Samples worksheet, which provides a drop-down list from which to choose the Cell Sample Name on the Calendar worksheet as described above, is modified and controlled through the program in response to user input as described in detail in Section 3.1 above. When cell samples are acted on in ways that modify them, their names are appended with information about the activity Transfect or Sort, or when the Designate New Clone button is activated in the Calendar worksheet. The continuously updated list of cell sample names is available in the Cell Samples worksheet and appears as a drop-down list in the Calendar worksheet (figure S1).
The entries for drop-down lists for the Metadata Template worksheet are contained in the Data validation criteria worksheet (figure S2). Figure S2 shows some of the lists of terms from which the experimentalist selects entries. We designed the Data validation criteria worksheet to contain a number of variables that we anticipated would be important to record, and for each variable we developed a list of likely choices. Transfection variables such as which guide RNA was used, or which electroporation program was used could be controlled by highlighting the appropriate cell on the Metadata Template and choosing from the toolbar Data / Data tools / Data Validation to select the list of options to appear in a drop-down list for that cell. Additions can be made to those lists when necessary for inclusion in the drop-down lists in the Metadata Template worksheet in a controlled fashion using the Protect and Unprotect functions.
Worksheets and cells within worksheets can be protected with a password to prevent unintended changes, and unprotected to allow intentional changes to be made. Protection can be managed from the Excel toolbar at the top of the page by choosing the toolbar tab Review / Protect Sheet. One can select which cells in a worksheet are unlocked, allowing the user to input text or values into those cells, select inputs from drop-down lists, or add unstructured text or images. In the example Metadata Template worksheet, the unlocked cells are colored orange as shown in Fig. 5. Indicated with blue arrows in Fig. 5 are input fields for free text notes and for pasting images of data, for example from a cell counter or flow cytometer. Some cells in the worksheet that are locked contain the results of calculations using numbers that are entered into other cells, allowing, for example, the calculation of solution volumes based on input values for concentrations, and concentrations of stock solutions required to achieve appropriate reaction volumes. The entries in these cells are in red font. Other locked cells contain formulas or logic tests that are used to provide warnings if, for example, the volumes of reactants are greater than can be accommodated in the transfection vessel. Other cells are locked for the convenience of the experimentalist because they contain values that are important variables associated with the protocol but aren’t expected to change frequently. Controlling the locked or unlocked state of individual worksheets cells is achieved by unprotecting the workbook with a password, and highlighting the cells and accessing the Protection tab in the Format Cells function with a right mouse click. The worksheet can then be protected again to avoid unintentional changes.
More details about controlling data entry are provided in README at https://github.com/usnistgov/TransfectionTracker.
3.3 Worksheets for organizing and comparing data.
Data for each transfection are saved on separate worksheets. All cell processing data for all transfections are compiled on the Data worksheet. Additional worksheets can be created to organize and report compiled data with filter, sorting, and query tools from the toolbar. Worksheets can be hidden using the Hide and Unhide function accessed with a right-click on the worksheet tab. This feature hides worksheets from view, allowing the user to focus on the current worksheets while still being able to easily access the data by unhiding worksheets. The Data worksheet (figure S3) organizes all data collected for all samples and activities entered in the Calendar worksheet in a table format. This feature facilitates converting the spreadsheet data into XML, JSON or CSV format for query in a dedicated database. It contains dates, cell sample names, activities, numbers of wells and well-plate size when appropriate, notes, and indication of the user and the user computer address as provided by the operating system if desired (see the Documentation worksheet in the TransfectionTracker1221.xlsm Excel file at https://github.com/usnistgov/TransfectionTracker for details of how to activate this feature).
Queries can be easily performed, and reports can be generated, by selecting the entries in the Data worksheet and choosing the toolbar function Data. A Query Editor dialog box appears and simple queries based on a selection from the Activity column can return the appropriate records on a new worksheets such as TransfectionsReport, FreezeReport, DiscontinueReport as shown in the example workbook, TransfectionTracker1221.xlsm, at https://github.com/usnistgov/TransfectionTracker. These worksheets contain all entries for all cell samples that have undergone these activities. By selecting from the Cell Sample Name column, worksheets can be generated that include all entries for all cell samples resulting from a particular transfection, as in the 20200113mChOCT4sg2Report worksheet. These worksheets provide a list of all activities performed over time on cell samples derived from a particular transfection. These report worksheets can be updated as new data are added to the Calendar (and therefor to the Data worksheet) using the toolbar option Data / Refresh. Two of these worksheets are shown in figure S4.
Alternatively, the data can also be organized by copying and pasting them into a new worksheet and then sorting and filtering the entries from the toolbar alphabetically, by date, or by other contents of the columns. A useful manipulation is to calculate the numbers of passages a cell sample has undergone. Five worksheets with names beginning with Passage#s are in the example workbook, each containing data from a different clone from one of the 3 transfections. From the …Report worksheet for each transfection, the rows of data that follow the Activity Transfect for a particular transfection date were copied into the worksheet. The data rows were then filtered based on entries in the Activity column that were relevant to passaging, namely Passage, Freeze, and Thaw. These entries provided a count of how many passages had occurred for this sample when it was frozen, thawed, or sent out for analysis. An example of this is shown in table S1. The passage number of a frozen bank and of samples used for genomic analysis is unambiguous as a result, or if ambiguities are discovered, they can be tracked down or noted. The numbers of passages that different cell lines have undergone can be easily compared to one another.
A worksheet entitled ClonesSummaries is part of the example workbook and the contents are presented as table S2. This worksheet collates many of the results associated with the clones created. Some of the data in the table were added through queries, other data were manually entered. The worksheet allows identification of which clones were analyzed and which were discontinued, the date that genomic analysis was initiated and the passage number of the cell sample at that time, and results such as copy number of the fluorescent protein sequence, copy number variants, results of PCR analysis in the insertion site, qualitative appearance of the cell line, and other analyses such as for mycoplasma and for STR markers. By adding the URL where the data for each clone is saved, it is easy to identify and access the primary data for each clone unambiguously. As the worksheet table shows, of 15 clones that were isolated, 5 were successfully carried through enrichment and expansion. Two of those were confirmed to have a large number of extra bases inserted into the genome downstream of the intended edit. Those two cell lines and a third had an observed copy number variant. One clone that appeared to have a normal genome required a greater amount of sorting and passaging to achieve a purified population profile, and had a tendency to differentiate. Since the example presented here contains data for just 3 replicate transfections, it is a too small a dataset to reliably indicate relationships between experimental details and cell line characteristics. However, it is clear that the level of detail of information that can be tracked facilitates comparison of clones with respect to characteristics such as numbers of passages, numbers of days between passaging, phenotypic properties, genomic abnormalities, etc. With a larger dataset, one could evaluate the predictability of the number of clones derived from a transfection protocol, and compare the effect of protocol changes, such as guide RNA, or parent cell line, on transfection efficiency. One could also assess, for example, whether larger numbers of passages or a higher rate of genomic editing in the population is associated with a greater frequency of genomic abnormalities.