Datasets in Central are now Entity Lists, please help translate!

We have removed the word "Dataset" in Central and replaced it with "Entity List." This means much of the text related to Entities will need to be translated again and we would greatly appreciate your help doing this before the next release in two weeks.

When updating text for French, I copied the previous translation and then only had small edits to make.

I found it generally easier to express the concepts using "Entity List" rather than "Dataset" and I hope you will too! (@mathieubossaert and other French speakers, feel free to make edits or discuss in this thread)

You should not feel like you have to literally translate "Entity List". In particular, if your language does not have a convention of capitalizing words to make concepts stand out, please don't match capitalization. In French, the most comfortable translation I found is closer to "list of entities."

We decided to make this change after many demos and conversations about this functionality. We initially believed that the very generic word "Dataset" would be an asset, but the feedback we've received is that it was difficult to connect meaningfully to "Entities." We previously had a broader view of what "Datasets" might do in the ODK world, and we also thought we would differentiate more between Entities that are the subjects of forms (e.g. a tree you're collecting data about) and lists of values that act like metadata (e.g. a list of counties that trees could be located in). We now believe that focusing on lists of Entities, no matter how they're used, will make the concepts more approachable.

We will make a companion change to XLSForm so that list_name in the entities sheet is accepted as an alias for dataset. Forms that use dataset will continue to work without any changes.
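To illustrate what the alias means for a form author: either header in the entities sheet names the same entity list. This is a hypothetical sketch, not pyxform's actual code; the function name entity_list_name and the choice to reject conflicting values are assumptions.

```python
def entity_list_name(entities_row: dict) -> str:
    """Return the entity list name from one row of an XLSForm entities sheet.

    Hypothetical illustration: the new list_name header and the older
    dataset header are treated as aliases for the same setting.
    """
    list_name = entities_row.get("list_name")
    dataset = entities_row.get("dataset")
    if list_name and dataset and list_name != dataset:
        # Assumption: conflicting values are treated as an error
        raise ValueError("list_name and dataset name different entity lists")
    name = list_name or dataset
    if not name:
        raise ValueError("the entities sheet needs a list_name (or dataset) value")
    return name


# Both spellings resolve to the same entity list
print(entity_list_name({"list_name": "trees", "label": "${species}"}))  # → trees
print(entity_list_name({"dataset": "trees", "label": "${species}"}))  # → trees
```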

These text updates are only end-user-facing. We will continue using dataset in the form specification, the Central API, and the internal Central implementation. We will update corresponding documentation to make it clear that end-user-facing systems in the ODK world use "Entity List" for this concept. Any other software that implements the specs could choose to continue using the more generic "dataset" or introduce other specialized language like "register", "task list", etc.
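Because the API keeps the old name, scripts that talk to Central still build paths with datasets even when the UI says "Entity List". A minimal sketch of that mapping, using the path shape from the pyodk scripts later in this thread (entities_path is a hypothetical helper name):

```python
def entities_path(project_id: int, entity_list: str) -> str:
    """Build the Central API path for the entities in an entity list.

    End users see "Entity List", but the API path segment stays "datasets".
    """
    return f"projects/{project_id}/datasets/{entity_list}/entities"


print(entities_path(1, "participants"))  # → projects/1/datasets/participants/entities
```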

Thank you!

I do, @LN. As you said, it is easier: there is only one new concept (the "entity") to consider, either individually or in a list (instead of a "dataset", which might be understood as a separate concept).
Maybe other French speakers (@thalie , @dickoah, @tgachet , @GuilhemD ...) could express their points of view :wink:

Hi!
I also approve of the term "Liste d'entités". As @mathieubossaert said, "Dataset" ("Jeu de données" in French) can be confusing!
I haven't tested this feature yet because I would like to be able to update the entity list only via an import as an admin (not via forms), but I'm going to try it anyway :slightly_smiling_face:

If you haven't already, make sure to give feedback in our poll about updating entity lists via import, @tgachet! (Poll: "Central Entity uploads from file")

If you're feeling really impatient, I have bulk added entities with a Python script like this one using pyodk:

from pyodk.client import Client
import csv

client = Client()

# The filename/path of a CSV. It must contain an __id column with version 4 UUIDs and a label column with the desired entity labels
# Other column headers must exactly match the names of properties in the entity list specified below
ENTITIES_CSV = "participants.csv"

# The ID of a project on your server that contains an entity list with a name matching the one below
PROJECT_ID = 1

# The name of an existing entity list that you want to populate
ENTITY_LIST = "participants"

# utf-8-sig strips the byte-order mark that spreadsheet exports often add, so the __id header still matches
with open(ENTITIES_CSV, newline="", encoding="utf-8-sig") as entities_csv:
    csv_reader = csv.reader(entities_csv)

    header = next(csv_reader)

    for row in csv_reader:
        # The UUID and label go at the top level; every other column becomes an entity property under "data"
        body = {"data": {}}
        for column, value in zip(header, row):
            if column == "__id":
                body["uuid"] = value
            elif column == "label":
                body["label"] = value
            else:
                body["data"][column] = value

        # The API path still uses "datasets" even though the UI says "Entity List"
        r = client.post(f"projects/{PROJECT_ID}/datasets/{ENTITY_LIST}/entities", json=body)
        if r.status_code != 200:
            print(r.text)
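The script above expects the CSV to already have an __id column of version 4 UUIDs. If your spreadsheet doesn't have one, a sketch like the following could fill it in before uploading. The helper names with_entity_ids and add_entity_ids are hypothetical, and the sketch assumes a header row and UTF-8 input (with or without a byte-order mark).

```python
import csv
import uuid


def with_entity_ids(rows):
    """Return copies of the row dicts, filling an empty or missing __id with a version 4 UUID."""
    result = []
    for row in rows:
        row = dict(row)
        if not row.get("__id"):
            row["__id"] = str(uuid.uuid4())
        result.append(row)
    return result


def add_entity_ids(in_path, out_path):
    """Copy a CSV to out_path, adding an __id column (first) if it isn't there yet."""
    with open(in_path, newline="", encoding="utf-8-sig") as src:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames or [])
        rows = with_entity_ids(reader)
    if "__id" not in fieldnames:
        fieldnames.insert(0, "__id")
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

For example, add_entity_ids("participants_raw.csv", "participants.csv") would write a copy with ids filled in (both filenames are placeholders). Keep the generated file: re-running the helper on the original creates different UUIDs, and you'll need the same ids later to update those entities.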
Quick script with explicit property names

import csv
import uuid
from pyodk.client import Client

client = Client()

with open('entities.csv', newline='', encoding='utf-8-sig') as f:
    reader = csv.DictReader(f)
    for row in reader:
        first_name = row['First']
        last_name = row['Last']

        # Generate a new version 4 UUID for each entity; the label combines both names
        entity = {'uuid': str(uuid.uuid4()), 'label': first_name + " " + last_name,
                  'data': {'first_name': first_name, 'last_name': last_name}}
        print(entity)
        # Replace <projectid> with your project's numeric ID; "users" is the entity list name
        r = client.post('projects/<projectid>/datasets/users/entities', json=entity)
        print(r.text)

You could also make it use the column header names dynamically (as the first script does) and update entities in a similar way using the entity update endpoint. Note that the API will continue using "dataset"!
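A rough sketch of the update variant, reusing the CSV layout from the first script. It assumes the update endpoint is PATCH projects/{project}/datasets/{name}/entities/{uuid} and that it needs either a force or a baseVersion query parameter (my reading of the Central API docs; check them for your Central version). It also assumes pyodk's client.session behaves like a requests.Session that resolves relative paths against your server URL. update_request and update_entities are hypothetical helper names.

```python
import csv


def update_request(row, entity_list, project_id):
    """Turn one CSV row into (path, json_body) for an entity update.

    Assumes the first script's layout: an __id column with the entity's
    UUID, an optional label column, and one column per property.
    """
    entity_uuid = row["__id"]
    # Every column other than __id and label is sent as a property, including empty cells
    body = {"data": {k: v for k, v in row.items() if k not in ("__id", "label")}}
    if row.get("label"):
        body["label"] = row["label"]
    # As with creation, the API path uses "datasets" for entity lists
    path = f"projects/{project_id}/datasets/{entity_list}/entities/{entity_uuid}"
    return path, body


def update_entities(client, csv_path, project_id, entity_list):
    """Send one PATCH per CSV row through the client's requests-style session."""
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        for row in csv.DictReader(f):
            path, body = update_request(row, entity_list, project_id)
            # force=true updates regardless of the entity's current version;
            # baseVersion=<n> is the safer choice if others may be editing too
            r = client.session.patch(path, params={"force": "true"}, json=body)
            if r.status_code != 200:
                print(r.text)
```

With pyodk installed, something like update_entities(Client(), "participants.csv", 1, "participants") would send one update per row. Keeping the request-building step separate makes it easy to print and check the paths and bodies before touching the server.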

Thanks @LN for all these resources! I'm going to test all of this :grinning:

Is this script still relevant?

Yes! We are still a few days away from releasing bulk entity upload from the frontend. Even when doing that is possible, there will still be cases in which it's convenient to automate entity creation in this way. Note that you can do updates in a very similar way using the update endpoint.

We're always interested in learning more about how projects are making use of ODK broadly but particularly entities since they are in very active development. Consider taking a moment to introduce yourself and describe what you're working on!

@LN, thank you for sharing the two options! The second one worked for me. I will try to make one for updating entities.
