Proposal: better large attachments handling via presigned upload urls for S3

Thanks for taking the time to write this up in detail, @mlazowik!

For context, I want to crosslink to an earlier post where you described this media you're collecting:

@spwoodcock what do you see as the biggest benefit of this approach for your needs? I don't think you have very large files, right?

I realize this is a bit of an unusual architecture! Here are some of the things that led us down that path:

  • Typical binaries that we are aware of are quite small. We've seen video up to several dozen megabytes, but attachments larger than that have not been a target use case. We'll make sure to document this assumption more clearly, and now that you've raised the issue, I'm sure others who have this need will chime in here.
  • Desire to keep the changes needed isolated to the server. We currently maintain several clients with a lot on their roadmaps and don't want to take on additional complexity there.
  • Desire to reduce the impact of issues with the connection to the storage service. Because everything is submitted to Central first, client users only ever see error messages from Central, never from the storage service; if something goes wrong with that connection, only the Central administrator is affected. And because binaries continue to be ingested into the database, there is minimal time pressure to address storage service downtime (not really an issue with S3, but it could be with other systems). There's a rough sketch of this flow after the list.
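
To make that last point concrete, here's a rough sketch of a "database first, storage service later" flow. It is not the actual Central code; the table, column, and status names are made up, and it assumes the `pg` and `@aws-sdk/client-s3` packages:

```typescript
import { Pool } from "pg";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const db = new Pool();
const s3 = new S3Client({});

// Binaries are written to Postgres at submission time, so a storage-service
// outage never surfaces as an error to the submitting client. A background
// job like this one then offloads pending blobs whenever S3 is reachable.
async function uploadPendingBlobs(bucket: string): Promise<void> {
  const { rows } = await db.query(
    "SELECT id, sha, content FROM blobs WHERE s3_status = 'pending' LIMIT 100"
  );

  for (const blob of rows) {
    try {
      await s3.send(new PutObjectCommand({
        Bucket: bucket,
        Key: blob.sha,
        Body: blob.content,
      }));
      await db.query(
        "UPDATE blobs SET s3_status = 'uploaded', content = NULL WHERE id = $1",
        [blob.id]
      );
    } catch (err) {
      // If the storage service is down, the blob simply stays 'pending' and is
      // retried on the next run; only the administrator needs to act.
      console.error(`upload of blob ${blob.id} failed, will retry`, err);
    }
  }
}
```

The key property is that the database stays the source of truth until the upload succeeds, so storage downtime only delays offloading rather than blocking submissions.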

Absolutely. We haven't considered this a major issue because, as I mentioned above, very large files have not so far been a target use case. That said, are you expecting to support point-in-time recovery for the entire duration of your project? I'd generally expect a limited window for storing WALs, maybe augmented with periodic snapshots. You may also want to look into the UNLOGGED directive, which was introduced a few Postgres versions ago; it could be appropriate for the blobs table when using S3.
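
In case it's useful, here's a minimal sketch of what that could look like, again with a hypothetical `blobs` table name and using the `pg` package (ALTER TABLE ... SET UNLOGGED needs PostgreSQL 9.5 or later):

```typescript
import { Pool } from "pg";

const db = new Pool();

async function stopLoggingBlobContents(): Promise<void> {
  // UNLOGGED tables skip the write-ahead log entirely, so they generate far
  // less WAL to retain. The trade-off is that they are truncated after a
  // crash and excluded from point-in-time recovery, which is only acceptable
  // once the blob contents also live in S3.
  await db.query("ALTER TABLE blobs SET UNLOGGED");
}
```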

That sounds right to me. We'll discuss whether there might be a straightforward path to avoiding keeping entire files in memory. Like you said, this has implications for projects where many small media files are submitted concurrently (much more typical usage).
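
For what it's worth, here's a hedged sketch of one way to bound memory: hand a readable stream (wherever the bytes come from) to the AWS SDK's multipart `Upload` helper, so memory usage is bounded by the part size rather than the file size. The function and parameter names here are hypothetical:

```typescript
import { S3Client } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";
import type { Readable } from "stream";

const s3 = new S3Client({});

// Stream a blob's contents to S3 as a multipart upload instead of holding
// the whole file in memory first.
async function streamAttachmentToS3(
  body: Readable,   // e.g. the incoming request stream or a stream from storage
  bucket: string,
  key: string
): Promise<void> {
  const upload = new Upload({
    client: s3,
    params: { Bucket: bucket, Key: key, Body: body },
    partSize: 8 * 1024 * 1024, // 8 MiB parts; anything >= 5 MiB works
  });
  await upload.done();
}
```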

Yes, agreed that this would be really nice to have.

While I completely see the appeal of this approach, it's not something that we are open to considering for the core at this time. It doesn't fit thematically with our current priorities (see the roadmap) and implies future maintenance we don't want to take on.

That said, those priorities may change over time, especially if we learn of other projects with large file needs.

You could consider doing this work in forks. If it ends up looking clean and maintainable with limited risk, it could shift our thinking on taking it on (but no guarantees).

Interesting! Software is hard!