Google Sheet Publisher Issue - Somehow DB entities for publisher got into problem state

Hi @Jeff_Davids. I'm sorry that you are having problems! Can you please add yanokwa@nafundi.com as an owner on the Google App Engine account. Once I have access, I can take a quick look and see what is going wrong.

Hello @yanokwa. Thanks so much for being willing to look at this! I just added you as an owner on the Google App Engine account. I look forward to seeing what the issue is. Best.

A post was split to a new topic: Aggregate publisher problem

@Jeff_Davids, I'd recommend to back up your data using Briefcase as soon as possible. That way, you could even remove the form and submit it again, and then push the submissions back to the server

@yanokwa could you search for a full stacktrace on the logs? Also, if @Jeff_Davids can point you to a submission that isn't on the google sheets, could you locate it on the Data Store and see if there's any structural problem with it? (comparing to a submission that got to the google sheets)

So, I've traced the exception in the code to these lines in FormServiceCursor.constructExternalService(FormServiceCursor fsc, IForm form, CallingContext cc):

try {
  // ... some stuff
  return new GoogleSpreadsheet(fsc, form, cc);
  // ... more stuff
} catch (Exception e) {
  throw new ODKEntityNotFoundException("Somehow DB entities for publisher got into problem state", e);
}

Unfortunately, we can't pinpoint the source of this exception at this moment because upstream try blocks lose the cause of this exception.

Digging a little deeper, I can identify two broad possible causes for the exception:

  1. Problems with Google Sheet API
    This could be due to many things like exceeding quotas, network problems, expiration of credentials... you name it. It would be helpful to know if you have experienced any problem at all while working with Google tools lately.

    Worst case scenario, it could be due to irregularities in Google's service. I don't have specific prior experience with Google Cloud services but I've dealt with this kind of situations on Amazon and other cloud services and normally this is solved within 24h...

  2. Problems with data in the Data Store
    This could be due to exceeding quotas as well, but you mentioned that you've activated the billing plan. Worst case scenario, we can detect anomalies in the data itself that could justify an error.

    A quick test for this could be to pull submissions using Briefcase. If this works it's a sign that there's no problem in the Data Store

Hi @ggalmazor thanks so much for working on this! I'm pulling the intances right now to my computer using Briefcase so at least we will have a backup. Right now I'm at 40 or 9000+ so it might be a while... So far so good, but I'll let you know how it finishes.

I have several other forms that are properly publishing to both Fusion Tables and Google Sheets, so that makes me think the problem isn't with Google Sheets API, but I definitely will defer to you guys on this one! In fact, I just looked, and the Google Fusion Table publisher for the same form that we are getting the error on (S4W_Nepal_v1.01) is still being updated successfully.

Sounds like worse comes to worse, I'll have to (1) finish the Briefcase backup, (2) delete the form, (3) reload the form, (4) push the submissions from Briefcase, and (5) restart new publishers? Does that sound about right?

@ggalmazor and @yanokwa I'll defer to your judgement with regards to when it makes sense to try those steps vs. doing additional troubleshooting.

@Jeff_Davids Briefcase has an experimental pull option in the settings that can drastically speed up pulls. I'd encourage you to use that setting. If you don't see in it settings, update to the latest version of Briefcase: ODK Briefcase v1.9.

I agree with @ggalmazor that this problem is either an issue with the data store or the API. I've looked at the data store and it looks fine to me, so it's very likely the API (or how we handle intermittent errors in the API).

I don't think you'll have to do anything as intensive as delete all the data and re-upload. The back up is just so we know the data is safe on multiple servers. I'm optimistic that this is an issue that can be fixed without touching the data. We'll know more as we keep troubleshooting...

1 Like

Hi @Jeff_Davids, @yanokwa!

I wanted to write my insights after taking care of this problem for some days.

  • We found that a duplicated image was causing a crash on the service that makes the uploads for the publishers
  • There was a glitch on GAE/DataStore that would prevent me to find the duplicated rows. @yanokwa reports that this has happened to him as well with large datasets ¯\_(ツ)_/¯
  • We created a pair or publishers so that:
    1. one would publish only new submissions
    2. the other would publish old and new submissions.
  • We verified that the second publisher wouldn't go further than 700ish rows. We checked that we were getting crash reports on the logs whenever the publisher would retry uploading submissions.
  • We removed the oldest duplicated row
  • We verified that the second publisher now would continue its work until all submissions (8000ish) were uploaded to the Google Spreadsheet.
  • We also detected some TaskLock error messages but checked that those wouldn't prevent the publishers to complete their tasks

Insights:

@Jeff_Davids, could you confirm that everything goes well now?

Also, we can transfer you the ownership of the Google Spreadsheets we have created. That way you will save yourself the effort of repeating the process (it took more than 24hours).

1 Like

Hi @ggalmazor!

Thanks for all your work on this. Indeed, your summary and insights looks spot on.

I just took a look at the publishers and everything seems to still be working. If you could transfer ownership of the Google Sheet that has the "BOTH Upload Existing & Stream New Submission Data" action to s4w.nepal@gmail.com that would be great. Let me know if you need me to do anything for this process.

Hi @Jeff_Davids!

I transferred the ownership of the spreadsheet some days ago, could you confirm?

Hi @ggalmazor ! Yes, the spreadsheet is now available in our s4w.nepal@gmail.com account, but in ODK Aggregate it is still showing you as the owner. This isn't a big deal, but I wanted to see if this is how it will remain:

image

Hmmmm, I'll take a look. Maybe we can simply edit that information on DataStore to update the email address that's shown. It feels like that email there is just for show once the publisher has created the Google Spreadsheet...

OK, I have updated the field and now it shows your email.

1 Like

Sounds good. If it is just a cosmetic thing then it isn’t a big deal really.

1 Like

We are running into the same issue which explains the high charges in our other ticket.

The crash is happening so often that I can't even open the published forms pane for the particular form on the ODK Aggregate server.

I believe the part of your solution that fixes this is to 'remove the oldest duplicated row'. Is that all that is needed for the fix? This will need to be done in the GCP DataStore.

Also, when we publish data in the future, what can we do to prevent the issue from happening again?

Thanks!
Erik

Hi, @esurface! Yes, we can only provide this workaround so far. Unfortunately, we still haven't been able to determine how and why these duplicates are created, although it looks like AppEngine deployments using big forms with many repeat groups tend to be the ones that suffer this the most.

Hello there. Sorry to hear esurface that you are having this same issue. Yes, my recollection was that you delete the oldest duplicate record and that took care of it.

Ultimately, we ended up moving to a JSON publisher and built our own custom webapp with Python and MongoDB to avoid these issues. I'm not sure what the long term solution is for you.

Good luck!

jeff

Thanks @ggalmazor, @Jeff_Davids,

The Aggregate server cannot load the fusion table entries in the publish webpage so it is popping up the error message, making it impossible to remove the broken publisher.

Is it safe to delete the entries in the Datastore? We have both _fusion_table_2 and _fusion_table_2_repeat items. What do I need to delete there?

Is there another way you suggest deleting the publisher?

Thanks,
Erik

I answered my question by trying it out in a test environment and found that it is not safe to delete the item from the fusion table datastore. I deleted a different entity in the datastore and have replicated (instead of fixed) the issue

Moral: If you delete the fusion table in the datastore, ODK Aggregate will report that the datastore is corrupt (because it is!).

This is a separate issue it seems than the one for this thread. I created a new thread with the problem.

1 Like