Part of being a FileMaker developer includes safeguarding data against deletion. In addition to disabling the dreaded "Delete All" command, one of the first things we do as developers is to find safe methods of deleting data. In a previous article, we discussed what it means to delete data in a system as well as the various ways to account for and recover that data. This article will discuss a few more ways to account for that data, and look into streamlining the data recovery process.
Before we get into deletion techniques, we need to first take a look into how we can account for data. This is not just limited to when a user deletes data, but also when they change it, who changes it, when they changed it, how often the data was changed, etc. There are myriad solutions for auditing, and the one we've found most effective is using a Changelog. This constitutes an auto-enter field using the custom function UltraLog().
In order for this custom function to audit the data, we need to specify what data it should audit. This means we need a dynamic list of fields that are being edited. The easiest solution to this is to use the FieldNames() function. Specifying the current layout will grab all the fields on the layout available for editing, and you can even further filter to get only the local fields. Another solution would be to use the Get(ModifiedFields) function introduced in FileMaker 13. This would only grab the fields that were actually edited, speeding up the process. Further, this also accounts for related records in their own tables. Unfortunately, neither of these methods are sufficient. The FieldNames() function only gets the fields from the current layout, so any fields that were set via the Set Field script step that are not on the layout won't be audited. The Get(ModifiedFields) improves on this, but misses auditing records in a found set when using the Replace All command, and also fails to audit the initial record commit (after all, if a record doesn't exist yet, it isn't being "modified", so the function returns nothing). Instead, we're going to go right to the source: the schema.
In a wonderful article by Andrew Duncan, he explains how to use the ExecuteSQL() function to query the Filemaker schema directly. This method doesn't require fields to be on a layout, evaluates on the initial commit, and doesn't require updating except for very specific use cases. Let's take a look at the function we're currently using in some of our solutions:
Let ( [
_dataFields = ExecuteSQL ( "
WHERE Lower(TableName) = Lower(?)
AND FieldClass = 'Normal'
AND FieldType NOT LIKE 'global%'
AND FieldName NOT LIKE 'zz_%'
ORDER BY FieldName ASC
" ; "" ; "" ; cf_GetTableName ( Self )
cf_UltraLog ( Self ; zz_ModifyTimestamp ; _dataFields )
In this SQL query, we take the simple Field Name from the list of all fields, filtered by the current table, Normal class only, excluding globals, and fields starting with "zz_". The current table is obtained from the custom function GetTableName(), which simply uses the Substitute() and GetFieldName() functions to obtain the simple Table name. The FieldClass column designates if a field is a normal field (holds data), a summary field, or a calculated field. While calculated fields can be stored, they derive from Normal fields, so we only need to include Normal for auditing. FieldType is the SQL equivalent of the data type (text = varchar, number = decimal, container = binary, etc). The FileMaker schema prefixes this column with the word "global" if it's global, so we remove all fields starting with that. For FieldName, we exclude any extraneous fields that we don't want to include for auditing. These are things like Modification Account/Timestamp, the Changelog itself, or possibly other data that will change every time someone commits a record. In our case, we have a convention to prefix these fields with "zz_".
All in all, this leaves us with only the fields that actually contain data. This list of fields changes with the schema, does not rely on layouts, and best of all works for the record's own context. This means it will work for related records as well as inactive records (such as when performing a Replace All).
Now that we have an auditing process, we can look at how to handle archiving data. There are two main methods of safeguarding data on deletion: log the data before deleting, or perform a soft-delete (hide from the user).
A soft-delete is implemented by creating a boolean (or flag) field, and adjusting privilege sets so that they cannot view records where the field = 1. This is the simplest method because the data remains right where it is, and to "recover" deleted data only requires the field be set back to 0. The complication comes in when dealing with scripts and relationships when Full Access privileges are enabled (either for a Full Access user or scripts which have elevated privileges). When building a system, it is necessary to develop with the soft-delete field in mind. All relationships for the table must include this field as a match, and every script (including non-FullAccess scripts which might be called as subscripts) will also need to account for it, such as when performing a Find. Users will also see that the Total Record count differs from the Found Count, even when attempting to view all records. This might result in confusion for the users.
Data logging is implemented by gathering relevant data and storing them somewhere else prior to deleting the record. This method is somewhat more complicated, but is substantially cleaner, can be developed into a dynamic design to ease the burden on the developer, and also leaves open the potential for automated data recovery. Unlike the soft-delete method which leaves records in the source table, no additional development considerations are necessary when creating relationships and scripts. Instead, the heavy lifting is done in the deletion script. This is the safeguarding method that we will be using here.
What data should be logged? This solution actually uses the same function as the Changelog with some slight modifications. First, the "FieldName NOT LIKE 'zz_%'" condition is removed, since modification data and Changelog should be included. Second, we'll return the "_dataFields" directly, instead of the UltraLog() function. This list of fields then can be passed into a recursive custom function to produce a list of the fields and their data:
Let ( [
thisDataField = GetValue ( dataFields ; 1 )
; restDataFields = cf_FullTrim ( RightValues ( dataFields ; ValueCount ( dataFields ) - 1 ) )
cf_PassParameter ( thisDataField ; Substitute ( GetField ( thisDataField ) ; "¶" ; "\¶" ) )
& If ( ValueCount ( dataFields ) > 1 ; ¶ & cf_GetDataDictionary ( restDataFields ) )
This is the definition for the recursive function called GetDataDictionary(). Based on the parameter "dataFields", it processed the first field in the list. The process creates a dictionary entry using the PassParameter() custom function using the field name itself as the entry and the field value as the entry value. It then passes the remaining values to the recursive function, restarting the process. When the recursion finishes the last field, the function passes everything back as a completed dictionary:
<:zz_CreateTimestamp:=5/5/2016 2/:51/:26 PM:>
<:zz_ModifyTimestamp:=5/6/2016 1/:27/:41 PM:>
This dictionary represents the entire record's data in text form. From here, the safeguarded data can be maintained in a variety of ways. The simplest is to add it to a system log, a table in the file that represents a transactional log. The deletion script simply creates a new record in the SystemLog table and stores the DataDictionary in one of the fields. After logging the data, script goes back and deletes the record.
The above process can be streamlined into dynamic scripts so that they can be reused for each table without having to create a new set of scripts each time. In order to do so, the data to be logged must be able to be generated dynamically from any context. We do this by convention. Each table has a field called "zz_DataDictionary" which is an unstored calculation of the recursive function above. Thus, to get the Data Dictionary for the current record, evaluate the following calculation:
If you go this route and have a lot of "data revenue" where records are routinely deleted, it might not be a bad idea to offload the SystemLog table to a separate file to keep your file size in check.
GetField ( Get ( LayoutTableName ) & "::" & cf_GetFieldNameShort ( _Template::zz_DataDictionary ) )
As above with the GetTableName() custom function, the GetFieldNameShort() custom function uses the Substitute() and GetFieldName() functions to obtain the simple field name. Then the Data Dictionary for the current record is extrapolated using the combination of the GetField() function, the current context's Table name, and the Field name from convention. For deleting portal records, follow the same process, but instead of using the Get(LayoutTableName) function, pass in the related table's name and evaluate using that instead.
One final aspect to consider is cascade deletion. While this solution does account for that, it must be manually maintained. The deletion script would need a section checking for specific table occurrences and grab the related Data Dictionary for each related record of each cascade delete relationship from it. Getting the related dictionary from each record is as simple as a List() function over the Data Dictionary field. For the relationships themselves, it's something that must be maintained by the developer, since relationships and their properties can't be queried or extrapolated.
Having a backup of data is good and all, but what's it all for? Why should we back up deleted data? Isn't that what my scheduled database backups are for? All are good questions, and whether implementing this is beneficial will really depend on your company or client's business structure and how they use their custom software application.
Full database backups are good, but there's a limit to how much can be kept on hand at once. Most companies will only keep backups for a certain time period. Depending on how far removed that data is, it could be long gone by the time anyone realizes they needed it. Having that data archived in some capacity will help in such a scenario. Furthermore, backups only happen at set intervals. A user could create and delete important data very easily within the span of an hour, and if the database hasn't been backed up in the time between, the data is indeed lost.
The most dynamic benefit of this method is that it opens the door for automated recovery. Requesting a developer to help get back some accidentally deleted data can be a difficult process, and while this implementation makes it very easy to find that data, automated processes can make it even easier. Such an implementation is beyond the scope of this article, but it is something we can delve into in future articles. The Data Dictionary implementation here incorporates the record data into a dictionary format, rather than simply as viewable text. This means that you could write a script to loop through the values, extrapolate the table and fields, and recreate the record, all from the click of a button on the system log. The benefits and complications of this will depend highly on the type of system and internal processes the company uses, but could potentially save hours of downtime over a database rollback.