An interjection here based on my use of Collect and camera apps - not aimed at being negative, but a word of caution...
In the past, by default I included geotags with images that I took when I was out in the field, so that if I needed to use them later I could import them to QGIS. I found that sometimes the location data was inaccurate: several images would be clustered at one point when they had actually been taken hundreds of metres apart. I eventually decided that I had probably taken the image before the camera app had a 'good' fix on my location (assuming it took the last known location?).
I stopped doing that and usually ran a GPS track while I was in the field, then used Geosetter to geolocate the images when I got back. The downside of that was battery life - running a track for 8 hours is a big ask, especially when it is cold. Keeping a battery bank going in the wet becomes challenging...
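For anyone who hasn't done the track-matching step, this is roughly the idea - a minimal sketch only, not how Geosetter itself is implemented. It assumes the track is already loaded as a time-sorted list of (time, lat, lon) tuples and that the camera clock agrees with the GPS clock; `position_at` and `max_gap` are names I've made up for illustration.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def position_at(track, when, max_gap=timedelta(minutes=5)):
    """Estimate (lat, lon) at time `when` from a time-sorted track.

    track: list of (datetime, lat, lon) tuples, oldest first.
    Returns None if the photo falls outside the track, or in a gap
    longer than max_gap (e.g. the logger lost its fix or its battery).
    """
    times = [t for t, _, _ in track]
    i = bisect_left(times, when)
    # Exact match on a recorded track point
    if i < len(times) and times[i] == when:
        return track[i][1], track[i][2]
    # Photo taken before the first or after the last track point
    if i == 0 or i == len(track):
        return None
    (t0, lat0, lon0), (t1, lat1, lon1) = track[i - 1], track[i]
    if t1 - t0 > max_gap:
        return None
    # Linear interpolation between the two neighbouring track points
    f = (when - t0) / (t1 - t0)
    return lat0 + f * (lat1 - lat0), lon0 + f * (lon1 - lon0)
```

The `max_gap` check is the important bit: without it, an image taken during a gap in the track gets a confident-looking position that is really just a guess.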
So now I use Collect for all fieldwork and, in addition to my context-specific forms, have a very simple form that just has Location, Image and Notes. It works as a 'backstop' for any situation where I need to record something that I wasn't expecting to.
This is a long way round (as usual for me) of exploring the potential downside of relying on geotags within the image, rather than using a geopoint from Collect. I am all in favour of 'fewer steps to get your data', but the advantage of using a geopoint widget as well as the Image is that the geopoint records its accuracy, so you know how far to trust each location - and you could even relocate the image to the correct place if your GPS is dodgy (e.g. in dense tree cover, inside a building). It is also simple to import the data into your GIS. In QGIS I just need to add a delimited text layer and hey presto, my images can be viewed in situ [for QGIS: in the layer properties, set the form attributes for the image field to 'attachment' and give it the path to the images' folder - then you can use the identify tool to open the record form and view the image and the associated record]. Obviously I'd need to add the relevant EXIF tags if I wanted the image itself to be genuinely geotagged.
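On that last point about EXIF tags: EXIF stores latitude and longitude as degrees/minutes/seconds rationals plus an N/S or E/W reference, not as signed decimal degrees, so there is a conversion step before anything can be written. A minimal sketch of just that conversion (the function name `to_exif_dms` is mine, and actually writing the values into the file would need an EXIF library or tool, which I've left out):

```python
def to_exif_dms(value, is_latitude):
    """Convert signed decimal degrees to an EXIF-style (ref, DMS) pair.

    Returns e.g. ('N', ((51, 1), (30, 1), (0, 100))), where each inner
    pair is a (numerator, denominator) rational. Seconds are kept to
    1/100 of an arc-second.
    """
    if is_latitude:
        ref = "N" if value >= 0 else "S"
    else:
        ref = "E" if value >= 0 else "W"
    # Work in whole hundredths of an arc-second so that rounding can
    # never produce an invalid value such as 60 seconds.
    total = round(abs(value) * 3600 * 100)
    degrees, rest = divmod(total, 3600 * 100)
    minutes, seconds_x100 = divmod(rest, 60 * 100)
    return ref, ((degrees, 1), (minutes, 1), (seconds_x100, 100))
```

So a latitude of 51.5 becomes 51 degrees, 30 minutes, 0 seconds North. Note that the sign is carried only by the reference letter, which is an easy thing to get wrong and ends up putting images in the wrong hemisphere.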
Is there a situation where a camera app could provide an accurate location but a geopoint can't? If not, relying on the image alone risks giving you rogue data: you assume that the EXIF data is correct when the phone hasn't actually got an accurate fix.
Apologies if this sounds like I'm trying to discourage innovation, but please bear in mind that some enumerators may not behave in the required manner to gather quality data (speaking from painful experience). If the enumerator assumes that the image will be geotagged but doesn't appreciate that a location fix is rarely 'instant', you might end up with poor data, without being able to mitigate that. If you had a way of preventing an image being recorded until the location was fixed, that might be good (probably way beyond Collect's pay-grade?). Granted, new phones are much quicker at getting a GPS fix, but again, don't assume that all users of ODK are using the latest equipment (me for one).
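To make the 'don't record until the location is fixed' idea concrete, the logic I have in mind is something like the sketch below. This is purely illustrative: it is not Collect's or Android's API, and the function name, the 10 m threshold and the 60 s timeout are all assumptions on my part.

```python
def first_good_fix(fixes, max_accuracy_m=10.0, max_wait_s=60.0):
    """Return the first fix that is accurate enough, or None.

    fixes: iterable of (elapsed_s, lat, lon, accuracy_m) in the order
    the device reports them. A smaller accuracy_m means a better fix.
    Gives up once a fix arrives later than max_wait_s.
    """
    for elapsed_s, lat, lon, accuracy_m in fixes:
        if elapsed_s > max_wait_s:
            break  # waited long enough; let the caller decide what to do
        if accuracy_m <= max_accuracy_m:
            return lat, lon, accuracy_m
    return None
```

With fixes reported at 250 m, then 40 m, then 6 m accuracy, only the third would release the shutter. If it returns None the app could still allow the image but flag it as unlocated, which is better than silently attaching a stale position.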
This is what I would imagine is a 'safer' way to solve the feature request: add an option to process images in Collect to add geopoint data (if the form includes it) when the image is saved or the geopoint is collected. I think there are a few case-specific scenarios / pitfalls that would need to be thought through, but this may be a different way of getting a similar result while always having an accuracy component (i.e. quality assurance) in your dataset.
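As a rough illustration of what that processing step would have to work with: a geopoint value is stored as four space-separated numbers (latitude, longitude, altitude, accuracy), so the accuracy travels with the location. The sketch below is hypothetical and is not existing Collect behaviour; the function names and the 10 m threshold are my own assumptions.

```python
def parse_geopoint(value):
    """Split a 'lat lon alt accuracy' geopoint string into named floats."""
    parts = value.split()
    if len(parts) != 4:
        raise ValueError(f"expected 4 values, got {len(parts)}: {value!r}")
    lat, lon, alt, acc = (float(p) for p in parts)
    return {"lat": lat, "lon": lon, "alt_m": alt, "accuracy_m": acc}


def tags_for_image(geopoint, max_accuracy_m=10.0):
    """Build the location fields to attach to an image.

    The accuracy is always kept alongside the position, and 'trusted'
    records whether it met the (assumed) threshold, so a poor fix is
    flagged rather than silently written.
    """
    point = parse_geopoint(geopoint)
    point["trusted"] = point["accuracy_m"] <= max_accuracy_m
    return point
```

For example, `tags_for_image("51.5 -0.25 30.0 4.5")` gives a trusted point, while the same position with an accuracy of 50 comes back with `trusted` set to False, so whoever uses the images later can filter on it.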