Data recovery from a failing Aggregate server

Hi, @Mark_Schormann1!

I've written a basic post with some of the processes and tools I've used to help you with your Aggregate deployment. I'll be expanding on that topic there.

Regardless, I wanted to share some specifics here about the problem you were having and how I used some of those tools.

The main problem you were reporting was that you wanted to extract data from incomplete submissions. Normally, one would have used Aggregate to mark the submissions as complete and used Briefcase to pull/extract them, but you were experiencing issues with Aggregate that were preventing you from achieving that, and these issues were related to some structural flaws in the DataStore.

This is a rough list of the steps we took to resolve the situation:

  1. Make a complete backup of the DataStore data to a Google Storage bucket

  2. Use DSIO to extract all data from incomplete submissions:

    1. Get a list of all kinds in the DataStore with SELECT * FROM __kind__
    2. For each of the 4 forms present in Aggregate, download to a CSV file all rows from the main _CORE kind whose _IS_COMPLETE field is false.
      GQL: SELECT * FROM %some_kind_prefix%_CORE WHERE _IS_COMPLETE = false
    3. Open those CSV files in LibreOffice and extract all distinct values of the _URI field
    4. Get all the rows of the non-main kinds for each of the distinct URIs that we got in the previous step.
      GQL: SELECT * FROM %some_kind% WHERE _TOP_PARENT_URI = '%some_URI%'

    At this point, we have a complete backup and CSV files of the submissions we wanted to safeguard, so we can move on to riskier actions. From here on, we focus on artificially marking all those submissions as complete.

  3. Use DSIO to extract all incomplete submissions (main kinds only) to YAML files

  4. Add a _MARKED_AS_COMPLETE_DATE: datetime field to the scheme in the YAML file's header block, in a new line before the _IS_COMPLETE: boolean field definition.

  5. Add a _MARKED_AS_COMPLETE_DATE: 2018-07-23T00:00:00.000Z field and value to all the entities in the file.

  6. Change the value of the _IS_COMPLETE field to true in all the entities in the file.

  7. Use DSIO to upsert the changed YAML file.
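
For anyone repeating steps 5 and 6 on a large file, editing by hand gets error-prone. Here is a minimal sketch of the same change in Python, assuming the entities have already been loaded into plain dictionaries. The loading and saving of the DSIO YAML file is deliberately left out, and the `mark_as_complete` helper is a hypothetical name, not part of DSIO or Aggregate. The scheme change from step 4 still has to be made in the file header.

```python
# Timestamp used in this post for every artificially completed submission.
MARKED_DATE = "2018-07-23T00:00:00.000Z"


def mark_as_complete(entities, marked_date=MARKED_DATE):
    """Return copies of the entities flagged as complete (hypothetical helper).

    `entities` is assumed to be a list of dicts, one per row of a main kind.
    The originals are left untouched so they can be compared afterwards.
    """
    updated = []
    for entity in entities:
        changed = dict(entity)  # shallow copy, the input stays as it was
        changed["_MARKED_AS_COMPLETE_DATE"] = marked_date  # step 5
        changed["_IS_COMPLETE"] = True  # step 6
        updated.append(changed)
    return updated


# Usage with a made-up entity:
sample = [{"_URI": "uuid:1", "_IS_COMPLETE": False}]
result = mark_as_complete(sample)
```

Working on copies keeps the extracted data available for a before/after comparison prior to the upsert.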

Upserting these changes updated the DataStore in a way that tricked Aggregate into thinking that those submissions were complete. It didn't stabilize Aggregate enough to be used normally, but it opened the option of using Briefcase to pull and export/push all these submissions, which was a huge win at this point.
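
Going back to step 2, the LibreOffice part (collecting the distinct _URI values and building one query per URI) could also be scripted. This is only a rough sketch: it assumes the CSV export has a header row with a `_URI` column and that the URIs contain no single quotes. `distinct_uris`, `child_queries` and the `SOME_KIND` name are hypothetical placeholders.

```python
import csv
import io


def distinct_uris(csv_file, column="_URI"):
    """Collect the distinct values of one column, in first-seen order."""
    seen = {}
    for row in csv.DictReader(csv_file):
        uri = row.get(column)
        if uri:
            seen.setdefault(uri, None)  # dict keys keep insertion order
    return list(seen)


def child_queries(kind, uris):
    """Build one GQL query per URI for a non-main kind."""
    return [
        f"SELECT * FROM {kind} WHERE _TOP_PARENT_URI = '{uri}'"
        for uri in uris
    ]


# Usage with made-up data standing in for one exported _CORE CSV file:
export = io.StringIO("_URI,_IS_COMPLETE\nuuid:a,false\nuuid:b,false\nuuid:a,false\n")
for query in child_queries("SOME_KIND", distinct_uris(export)):
    print(query)
```

With a real export you would pass `open("some_file.csv", newline="")` instead of the in-memory sample.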
