Documentation Plans

A new documentation project has recently been started that will eventually replace the docs and information that are currently in several different places (ODK site, wiki, mailing list archives, people's brains, etc.).

I've been hired by Nafundi to manage this new documentation project.

Here's a little bit of the current plan...

  • We will follow the Docs as Code philosophy

  • Docs will be written in reStructuredText, a lightweight markup language similar to Markdown (Why aren't we using Markdown?)

  • Docs output will be built using Sphinx, a static site generator specifically designed for complex documentation projects like this

  • Doc source files will live in a GitHub repo

  • One unified set of docs for all the components of ODK

  • Over the next couple weeks I will have a more formal plan along with a "how to contribute" doc, which will be the first new docs published with the new doc platform.

  • We'll start moving over old docs that are still relevant, useful, and accurate. As we do this we'll find plenty of things to fix, rewrite, or remove.

  • Contributions and GH issues welcome.



I'd like to volunteer to help with a translation workflow if that is helpful.


That would be great!

What's your experience with this?
Do you have expertise (and bandwidth) to take the lead on it?

(My two cents on this:
Sphinx has a weird approach to dealing with translations,
and I can imagine a better way,
but I'd love someone who knows what they're doing...)

So I've been involved in this for a couple different projects (InaSAFE and LearnOSM). I wouldn't say the past ones have gone amazingly (as in I'm 100% satisfied), but LearnOSM has translations in 19 languages now. I actually need to set this up for Cadasta in the next week or two. I'm doing a different workflow for the actual doc writing using Gitbooks. My plan was to use Crowdin for the actual translation so I can report back on how that goes. I've worked on projects that used Transifex before and the translators were generally not all that happy with it.

Not to throw a wrench in things, is there a reason not to use something like Gitbooks for the process? I appreciate the idea of "Docs as Code" and think the ideas behind it are reasonable. I'm just wondering what the best process is to ensure people that aren't highly technical could still possibly get involved writing docs.

I can take the lead as far as setting up a translation workflow since I'm exploring this for another project now.


is there a reason not to use something like Gitbooks for the process?

Primarily because I think Sphinx is the best option available for complex, semantic, highly interrelated documentation.


Since it's still early on, I'm open to looking at other options. Gitbook looks attractive, but I'm personally wary of "we make it easy and do everything for you" solutions. Maybe it's the pedantic nerd in me, but I feel like that always ends with frustration and "it's not really able to do that."

If there's a strong story around translation with gitbook, that would be worth looking at. I'd also like to know how things like glossaries, internal references, semantic markup, prose linting and so forth are (or could be) handled.

I've been involved in this for a couple different projects

What tools and toolchains are being used? You mentioned gitbooks, but are you then pulling things out into something else for translators? How automated is the process?

(amw) Sphinx has a weird approach to dealing with translations,

Sphinx does this insane thing where you treat paragraphs/blocks as localizable strings -- as if a giant collection of prose is an app UI.

Here is an example of how this looks in a real repo and here is a video from the maintainer of that repo, talking about the process.
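To make the paragraph-as-string model concrete, here is a toy sketch in Python. It is not Sphinx's actual extractor (the real catalogs come from Sphinx's gettext builder and carry source locations and other metadata); the function name `paragraphs_to_po` is made up for illustration. It only shows the shape of what a translator ends up with: one isolated `msgid`/`msgstr` pair per paragraph, with no surrounding document.

```python
def paragraphs_to_po(text: str) -> str:
    """Turn blank-line-separated paragraphs into gettext-style entries.

    Hypothetical helper for illustration only; it mimics the shape of a
    message catalog, not Sphinx's real extraction logic.
    """
    entries = []
    for block in text.split("\n\n"):
        # Collapse hard line wraps so each paragraph becomes one string.
        paragraph = " ".join(block.split())
        if not paragraph:
            continue
        # Escape backslashes and double quotes, as PO string syntax requires.
        escaped = paragraph.replace("\\", "\\\\").replace('"', '\\"')
        # An empty msgstr is where the translator would type the translation.
        entries.append(f'msgid "{escaped}"\nmsgstr ""')
    return "\n\n".join(entries)
```

Each entry is translated on its own, which is why a long page of prose ends up looking like a list of UI strings.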

(amw) ...and I can imagine a better way

I think a sane approach would be to have a directory for each document, with files: en.rst, es.rst, ru.rst, etc, and then some scripting that would build these into separate Sphinx (or whatever) sources as part of a build process.

This "ideal" process is unideal in that it requires, I imagine, more development than I imagine. (It could quickly turn into its own project. A project which should probably exist, but which I certainly don't have time to build, manage, and maintain.)
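For what it's worth, the simplest version of the "directory per document, file per language" idea can be sketched in a few lines of Python. This is only a sketch under assumptions: the layout `docs/<document>/<lang>.rst`, the output layout `build/<lang>/<document>.rst`, and the function name `split_by_language` are all hypothetical, and the hard parts (cross-references, toctrees, a conf.py per language, tracking stale translations) are not handled at all.

```python
import shutil
from pathlib import Path
from typing import Dict, List


def split_by_language(docs_root: Path, build_root: Path) -> Dict[str, List[Path]]:
    """Copy docs_root/<doc>/<lang>.rst to build_root/<lang>/<doc>.rst.

    Returns a mapping of language code to the files written for it.
    The directory layout is an assumption made for this sketch.
    """
    written: Dict[str, List[Path]] = {}
    for source in sorted(docs_root.glob("*/*.rst")):
        lang = source.stem              # "en" from install/en.rst
        doc_name = source.parent.name   # "install" from install/en.rst
        target = build_root / lang / f"{doc_name}.rst"
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, target)
        written.setdefault(lang, []).append(target)
    return written
```

Each `build/<lang>` directory could then be handed to a separate Sphinx build. The translator works on a whole file with full context, which is the point of the approach.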

So we used the Sphinx workflow with InaSAFE. The translators hated it, because they were missing the full context of what they were translating. The Gitbooks/Crowdin workflow is used by DjangoGirls and they have been successful with their translations which was why I was looking into it.

Who is the audience for the documentation? Users or Developers? I can be a pedantic nerd myself, but I think it depends on the docs and if the main audience is devs or users.

I'm not convinced the user/developer distinction is terribly meaningful, but to the extent it is, the answer is: both.

The docs need to serve:

  • implementers (people who install Aggregate on a server, set up forms, and manage the process within an org that uses ODK)
  • users (people who use Collect in the field, and people who use the analytics and visualization functions of Aggregate)
  • developers (people writing code for our repos, people writing code that connects with our apps, people writing alternative implementations for our standards, people writing extensions and customizations)

Each individual doc artifact will be more or less useful to each of these groups, but we need a unified set of documentation that all of them can refer to.

Perhaps I should have asked "who is expected to contribute to this documentation?". The people reading it probably don't care too much about the magic happening in the background.

The people reading it probably don't care too much about the magic happening in the background.

Yes. But it is a question a lot of people always want to ask about docs. So I thought I'd just go ahead and say: I think that's not a useful question because the answer is, "everybody."

Perhaps I should have asked "who is expected to contribute to this documentation?"

Aha. Yes. This is a good question.

I honestly don't know.

I suspect developers and implementers will be the most likely to contribute, along with some (power) users. And I expect that no matter what we do to make things easy and encourage contribution, the bulk of the docs will be written by me. (Though I would, of course, be happy to be surprised on this point.)

I don't see Sphinx as a hindrance to contributors, but I could be wrong about that. I know there are markdown partisans who claim that anything else is too hard or onerous. But the structure provided by rst and Sphinx is (I think) needed on a project this large and complicated.

I wouldn't necessarily see Sphinx itself as a hindrance to contributors. The two possible pain points I see:

  • People managing to make pull requests in GitHub (though I think this can be mitigated by simply having people use the web interface for small changes and having instructions on this). I think most contributions of this nature might just be simple corrections.
  • Translations: As mentioned before, translators in my experience don't like the way Sphinx does it because they lose the context.

I think the translation thing is something to think about. In my experience, translations are more crucial in the ICT4D space than in the general tech space, and ODK is used all over the world in a variety of these contexts. I don't know much about the current translation community around ODK though. @LN would you be able to enlighten us on that?


@danbjoseph mentioned here that a rough outline/structure would help with addressing the repo count question and I think it would be helpful for all these great questions @wonderchook is asking, too.

@adammichaelwood what do you think of doing a little bit of outline/structure iteration as an input to some of these discussions? I think the challenge with the current documentation is the lack of "glue" between the different parts. There are a number of different ways those connections could be established and those might affect the desired tooling. For example, I can imagine a structure in which the Collect widgets are documented alongside the XLSForm syntax that produces them. Alternately (or additionally?) I can imagine a Collect user guide that is targeted at someone who trains enumerators and doesn't write forms. That narrative could be entirely separate from the form building guide. In the latter case, maybe something more approachable like gitbooks is the right tool for the purely user-facing component.

@wonderchook, to answer your question about current translation efforts, there are about 140 people involved in ongoing translations of Collect. Some languages are more active than others but I'm sure many would be interested in translating documentation.

That said, I'm not entirely sure what docs would be high-value to translate. My experience is that many organizations want training guides that relate to their specific forms so there may not be a ton of value to translating user-facing documentation. It may be the case that, for example, translating the XLSForm spec would be highest value.


Whatever the current state of docs translations is, I definitely want to make translation as easy as possible. I'm open to gitbooks, if the other things I mentioned above can be handled.

I'm working on a getting started guide, and then will try to get a sketch of the larger "outline" out. Soon.


I'm also not entirely opposed to figuring out how to do translations the right way in Sphinx. It's definitely a need in the documentation community.

@wonderchook Can you point to a public repo (source, not built output) of a gitbook driven doc set w/ translations, as an example?


I think it would be useful to find a right way to do the translations in Sphinx. I personally don't see myself having a ton of time to help with that.

Here is the Django Girls Repo.


I think investing some time in exploring possibilities for translation with Sphinx and seeing what's possible with gitbooks would both be great.


I've spent time in the last week or so playing with Sphinx translation and also gitbook.

I have a hard time liking gitbook. Markdown is lighter weight and a little easier to write than rst, but markdown and gitbook:

I basically knew that going in, but I wanted to spend a little time playing in gitbook to see how annoying that is. I think it will end up being a big problem, one that increases over time.

So -- What about translation in Sphinx?

  • The conventional option is well tested and debugged, but has some annoyances for the translators.
  • I am working on an option that I think will be better for translators, but will (as any software project would) require a little bit of work to iron out. Early experiments are promising, though. (I need to put my experiment up on GH so you can see what I'm doing. Shortly...)

My suggestion, therefore:

  • I finally forge ahead with writing a few pages of real docs in the new repo, in Sphinx.
  • Once I have a handful of interlinked and interdependent pages, let's fork the repo and try to get the simplest possible translation toolchain up and running for each option.
  • Then we can assess what the problems, inconveniences, etc. are and see which ones we can mitigate, live with, etc.



I like this plan, Adam. Translation is important, but simply having good cohesive docs will be a huge win and I don't want to prevent that work from going forward. I vote that we keep going and we'll try your translation toolchain with real translators and adjust accordingly.

Sounds reasonable to me, as long as there is a plan at some point to make the translation process easier for people. Otherwise we'll likely end up with weird translation forks of the documentation. Perhaps not the end of the world, but I've worked on projects where it got a bit unmanageable.
