Words are very necessary – Notes on building a dictionary

This is almost a transcript of the talk Words are very necessary I gave at the JAMstack_Berlin meetup. Almost? Yes.

There is no recording from which I could transcribe, and I was sick in the week before I gave the talk, which is why it was a bit confusing and rough around the edges. So this is more like the better version of the talk, but it is not a talk anymore. Let’s get started.


This article is about a dictionary. A new one. A special one.

I will explain why I feel that it is kinda special in a second.

First things first, and in the case of my talks that’s often a disclaimer: I don’t want to take any credit for the content or the idea of the dictionary in question; or whatever, really. While I am working on this project, it is not a project that is made for a privileged white dude.

To start, I will give a brief overview of the idea behind Self-Defined. Afterwards, I will talk a bit about the technical challenges that the data structure of a dictionary poses and how they can be mitigated. I will talk about Markdown and why it’s limited but great, about YAML, and about compile steps that link definitions to each other.

I will finish with an outlook on future features of the project, going beyond a static dictionary.

What is Self-Defined?

Self-Defined seeks to provide more inclusive, holistic, and fluid definitions to reflect the modern world.

There are plenty of dictionaries. What makes Self-Defined so special? Most are describing the status quo with the words of those in positions of power.

Self-Defined seeks to change this. Created by Tatiana Mac, Self-Defined defines itself (sorry) as «a modern dictionary about us». The idea here is to reclaim language, to take words and describe them based on your own lived experience. But also to make explicit if a word is only used to discriminate, to spread hatred.

It’s clear that no single human can, or should, do this work. It has to be the effort of a community of writers. And the definitions have to leave room for consideration and nuance. It is one of the core philosophies of the project that the «dictionary includes nuances and expresses that not everyone ascribes to terms exactly as they are».

This is a grand task. A grand task that requires active contributions from people, usage and refinement.

In the beginning there was HTML

Have you ever had a project where you spent so much time on trying to figure out the tech stack in minute detail only to never build anything and have the domain you bought lying around for years and years? I sure have. Tatiana, luckily, was a bit more pragmatic.

Pragmatism, though, has its limits. When I started working on a build process the index.html was clocking in at about 930 lines of hand-written and/or -copied HTML. It worked, but it needed work.

Inclusionary Practices

It needed work because it suffered from two major problems: Creating new definitions involved quite a lot of manual work. You would need to find an existing definition similar to your own, copy its markup, and paste it at the correct alphabetical position somewhere in the file. Oh, and don’t forget to link your entry in the table of contents, or to link other definitions if they exist. Also, tabs or spaces? Both.

These steps are not only error-prone. In and of themselves they do not need to be done by humans. Computers are very good at sorting words alphabetically, or linking data.

We were forcing contributors to have technical understanding. And with it we made the self-confidence needed to edit HTML and so forth a barrier to entry. This runs counter to the ideal version of Self-Defined.

Reducing these barriers and working towards an inclusive way of proposing definitions is one of the main goals we are currently trying to achieve.

To be honest: The current state I am going to present here is a step. A first step at that. Working with Markdown and – especially – YAML still requires domain knowledge. The complete submission process and creation of definitions happens in a Git repository, hosted on GitHub. Contributors need to know how Git works, have a GitHub account, fork the repo, and create a PR.

Still a lot of manual, technical work. Expect things to evolve here in the coming weeks.

But now let’s take a look at the steps we took already:

Building Blocks

I’ll start by showing our current tech stack. If you are accustomed to reading front-end job postings you might feel that it is a little too small. It isn’t.

Now, I am talking at a JAMStack meet-up and not a linguistics seminar. I have talked a bit about Markup already, and I will do so a bit more. But this time in the context of a static-site generator called Eleventy. Actually, Eleventy is our stack currently.

There is no API just yet, making this more of a JM stack. But that doesn’t sound nice, really. Hence I will talk about the API side of things in the feature outlook.

Eleventy

Eleventy is a static site generator created by Zach Leatherman. It bills itself as «a simpler static site generator». That’s one reason why we chose it. The second one is that Eleventy makes no decisions for you about how to build your site: you can use basically every templating language there is, or add any client-side framework, if you need it. Heck, you can even use any folder structure as long as you tell it where to look. Besides all that, Zach is a super-friendly person and using projects of super-friendly persons is a good fit for a project like Self-Defined.

In short: to keep a minimal code surface and keep the project maintainable while also making it extensible, Eleventy seemed like a reasonable choice. I say «seemed» as if there were a catch. Cliffhanger.

Before we dive into our own implementation, I want to take a quick look at some of the core concepts that are used heavily in Self-Defined. This will not be an exhaustive tutorial; that would need to be a talk in its own right. The official documentation is rather handy if you want to dig deeper. Another resource I can recommend is Jérôme Coupé’s course Introduction to Eleventy.

Collections

The most core of core concepts is probably the collection. A collection bundles together a group of, say, blog posts that share a tag. To quote the docs, «a collection allows you to group content in interesting ways».

A very basic example that returns all posts might look like this:

eleventyConfig.addCollection("myCollectionName", function(collection) {
  // get unsorted items
  return collection.getAll();
});

Well, okay. That’s nice and all, but doesn’t do much. In Self-Defined we have two main collections. One for the table of contents and the other one for, you might have guessed it, definitions. I will talk about the table of contents later. But let’s have a look at the definitions collection:

config.addCollection("definedWords", collection => {
  return collection
    .getFilteredByGlob("./11ty/definitions/*.md")
    .filter(word => word.data.defined)
    .sort((a, b) => {
      return a.data.title
        .toLowerCase()
        .localeCompare(b.data.title.toLowerCase());
    });
});

Under the hood collections are arrays. They don’t have to be, but that’s their default state of existence, so to speak. This makes them pretty familiar to work with if you have worked with arrays in JavaScript before, as you can use all the array methods you came to like.

For the definitions collection I first get all Markdown files that are located in the definitions folder. Once done, I call .filter() to keep only the definitions that are marked as defined, dropping those that are, for example, still in a draft state. As this is work for a dictionary, I finally sort the items alphabetically using the localeCompare String method.
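Because the collection is an array, the filter and sort steps can be tried outside of Eleventy. The following is a minimal sketch, assuming made-up items that carry only the two fields used here (`data.title` and `data.defined`); real collection items contain much more:

```javascript
// Hypothetical stand-ins for Eleventy collection items.
const items = [
  { data: { title: "transgender", defined: true } },
  { data: { title: "Cisgender", defined: true } },
  { data: { title: "ally", defined: false } }, // not defined yet, so it is dropped
];

const definedWords = items
  // keep only the entries that are flagged as defined
  .filter((word) => word.data.defined)
  // sort alphabetically, ignoring case
  .sort((a, b) =>
    a.data.title.toLowerCase().localeCompare(b.data.title.toLowerCase())
  );

const titles = definedWords.map((word) => word.data.title);
console.log(titles.join(", ")); // Cisgender, transgender
```

Lowercasing both titles before comparing keeps «Cisgender» from being sorted by its capital letter.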

Once defined, a collection can be used in templates. In the case of Self-Defined, we use Nunjucks. But, as mentioned above, Eleventy is incredibly flexible regarding templating languages.

<div class="auto-grid">
  {% for definition in collections.definedWords %}
    {% include 'components/definition.njk' %}
  {% endfor %}
</div>

This makes up the main body of the dictionary. But there are some cases where we need to manipulate data before we display it. This is where shortcodes and filters come into play.

Shortcodes & Filters

Both are very similar in usage and outcome. Think of them as a friendly pair of siblings for your data processing needs in templates. Or something like this.

Both are defined, as collections are, in the Eleventy config file:

eleventyConfig.addFilter("makeUppercase", function(value) {
 return value.toUpperCase();
});

In Nunjucks you can use a filter by using the pipe operator (|):

<h1>{{ name | makeUppercase }}</h1>

The only difference from a syntax perspective is that you pipe the value into a filter, whereas you append the value to a shortcode:

{% user firstName, lastName %}
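To make the difference in calling convention concrete, here is a minimal sketch that runs without Eleventy. The `registry` object and the `eleventyConfig` stand-in are assumptions made for this illustration, and so is the body of the `user` shortcode; only the `addFilter` and `addShortcode` method names come from Eleventy itself:

```javascript
// Hypothetical stand-in for the Eleventy config object, so the sketch is runnable on its own.
const registry = { filters: new Map(), shortcodes: new Map() };
const eleventyConfig = {
  addFilter: (name, fn) => registry.filters.set(name, fn),
  addShortcode: (name, fn) => registry.shortcodes.set(name, fn),
};

// A filter receives the piped value as its first argument.
eleventyConfig.addFilter("makeUppercase", (value) => value.toUpperCase());

// A shortcode receives the values that are written after its name.
// The returned string is an assumption for this example.
eleventyConfig.addShortcode(
  "user",
  (firstName, lastName) => `${firstName} ${lastName}`
);

// Roughly what happens for {{ name | makeUppercase }} and {% user firstName, lastName %}:
const heading = registry.filters.get("makeUppercase")("words");
const user = registry.shortcodes.get("user")("Tatiana", "Mac");

console.log(heading); // WORDS
console.log(user); // Tatiana Mac
```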

I will show some more involved examples in the Advanced Definition Content section below.

But before we dive into advanced topics, let’s take a moment to look at the basics.

Basic Definition Content

As it turns out, the data structure of a dictionary is complicated. Especially since Self-Defined tries to provide nuance and fine-grained information.

The definition content itself was straight-forward to add via Markdown. After all, that’s plain text. Markdown is very good at handling plain text.

We have decided on a basic structure that makes up the markdown part: From one sentence up to multiple paragraphs for the basic definition. Let’s look at a simple example. This is the current definition of cisgender:

of, relating to, or characterised by being a gender that matches the gender they were assigned at birth.

The opposite of [transgender](/#transgender).

Rendered, it looks like this:

Definitions might also include sections which provide more information. This is the current definition of crazy:

mentally deranged; demented; insane.

#### Issues

Crazy is very commonly used as an adjective to embody a vast array of ideas, often not specifically. It is used so frequently that it sometimes is a filler. Crazy can also be used in a derogatory manner for someone with mental or psychiatric disabilities.

#### Impact

By using ableist language, we are perpetuating violence against people who experience mental or psychological disabilities. Using this language perpetuates those systems and language of harm, regardless of our intent.

#### Usage Tip

Be more specific. Typically we can find an alternate definition by simply reflecting on what emotion we're really feeling.

It might be here that the true value of Self-Defined shines: encouraging positive change. If a word is used in a discriminating way or context it isn’t helpful to just say «Stop». You’ll need to explain the impact, if possible provide alternatives, that people can use instead.

We want to provide a resource that is useful to educate, to provide alternatives where needed. And make it good-looking, too.

Advanced Definition Content

Some of this usefulness cannot be expressed with Markdown itself. Or, it probably can. But one objective I tried to achieve with the discussed infrastructure was to have fewer hacks, not more. So, Markdown in and of itself was out of the picture.

But there is a solution in Markdown. It is called Front Matter: a block of YAML data prepended to the file.
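As a rough sketch of what «prepended» means, the snippet below separates the data layer from the content layer of a file. The file content and the `splitFrontMatter` helper are assumptions made for this illustration. Eleventy handles this step itself and also parses the YAML, which this sketch deliberately does not do:

```javascript
// Hypothetical file content: YAML Front Matter between two `---` lines, Markdown after it.
const file = [
  "---",
  "title: crazy",
  "defined: true",
  "---",
  "mentally deranged; demented; insane.",
].join("\n");

// Split a source string into the raw Front Matter block and the Markdown body.
// The YAML is returned as plain text; no parsing happens here.
function splitFrontMatter(source) {
  const match = /^---\n([\s\S]*?)\n---\n?([\s\S]*)$/.exec(source);
  if (!match) {
    // No Front Matter found: everything is content.
    return { frontMatter: "", body: source };
  }
  return { frontMatter: match[1], body: match[2] };
}

const { frontMatter, body } = splitFrontMatter(file);

console.log(frontMatter.split("\n").length); // 2
console.log(body); // mentally deranged; demented; insane.
```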

From a usability perspective, working with Front Matter is certainly not the holy grail. But, it allows us to have a layer that is mostly pure data and build the presentation layer upon this.

To illustrate this I have chosen two key features. They are not the only features we solve with Front Matter. To get the complete picture, take a look at our Front Matter documentation page.

Getting back to the rendered version of crazy above, there are two elements I have omitted in the Markdown snippet. The flag at the top, and the list of alternative words at the bottom.

Flags

Flags are prominent visual clues that help to mark some words. Currently there are three levels.

The first one is the one we already saw. A big red «avoid».

I think this flag is pretty self-explanatory. Do not use this word. It causes harm.

The next one marks a word as a tool in one or maybe many oppressive tactics. Here, as an example, is the definition of White Fragility, a behaviour linked to white supremacy. These tools are not dangerous as words; you shouldn’t stop using them, and they are even helpful in an analysis. But you should certainly stop acting that way.

And lastly, as Self-Defined tries to be a helpful resource, we also provide a hint if something is an alternative for a word that you should avoid, labelling it as a better alternative.

The flag for crazy, in Front Matter looks like this:

flag:
  level: avoid
  text: 'Ableist Slur'

Splitting the text and the level gives us greater freedom in working with the levels and separates content from design.

Having the flags as data points rather than text makes rendering them a whole lot easier. We use a shortcode for this, which maps the level to a CSS class and concatenates the text of the flag with a defined text of the level.

config.addShortcode("definitionFlag", flag => {
  const cleanText = new Map([
    [
      "avoid",
      {
        class: "avoid",
        text: "Avoid"
      }
    ],
    [
      "better-alternative",
      {
        class: "better",
        text: "Better alternate"
      }
    ],
    [
      "tool",
      {
        class: "tool",
        text: ""
      }
    ]
  ]);

  if (flag) {
    const info = cleanText.get(flag.level.toLowerCase());

    // use an em dash to separate both texts 
    const sep = flag.text && info.text ? "—" : "";
    const text = flag.text ? [info.text, flag.text].join(sep) : info.text;

    return `<p class="word__signal word__signal--${info.class}">${text}</p>`;
  }

  return '<p class="word__signal"></p>';
});

Let’s break this down a bit. First I create a Map in which I store some information about the individual flags.

Afterwards I get the information corresponding to the current flag (cleanText.get(flag.level.toLowerCase());) and construct the inner text of the flag element, e.g. «Avoid—Ableist Slur», as we have seen above.
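To see which text each level produces, the same logic can be run as a plain function. This is a sketch, not the project’s code: `renderFlag` is a name chosen for this example, the fallback for unknown levels is an added assumption, and the flag text passed for the tool level is made up:

```javascript
const levels = new Map([
  ["avoid", { class: "avoid", text: "Avoid" }],
  ["better-alternative", { class: "better", text: "Better alternate" }],
  ["tool", { class: "tool", text: "" }],
]);

function renderFlag(flag) {
  const info =
    flag && flag.level ? levels.get(flag.level.toLowerCase()) : undefined;

  if (!info) {
    // Assumption: a missing or unknown level falls back to the empty flag element.
    return '<p class="word__signal"></p>';
  }

  // "\u2014" is the em dash used as separator; it is only added if both texts exist.
  const sep = flag.text && info.text ? "\u2014" : "";
  const text = flag.text ? [info.text, flag.text].join(sep) : info.text;

  return `<p class="word__signal word__signal--${info.class}">${text}</p>`;
}

const avoid = renderFlag({ level: "avoid", text: "Ableist Slur" });
const tool = renderFlag({ level: "tool", text: "Example text" }); // hypothetical text
const better = renderFlag({ level: "better-alternative" });
const none = renderFlag(undefined);
```

The tool level has no text of its own, so only the flag’s text is shown and no separator is added.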

Alternative Words

Now for alternative words. On the surface, it’s simply a list. But it has one special feature: auto-linking! If a word in this list exists in our definitions, a link should be added to the list item. Again, a thing you could do by hand. If you want to get mad.

In the end the list will look like this:

From a data perspective it’s a list. As we don’t want to maintain the links manually we omit them in the definition:

alt_words:
  - abundant
  - bizarre
  - enormous
  - ludicrous
  - outlandish
  - ridiculous
  - unbelievable
  - unexpected
  - unfamiliar
  - unreal
  - scary
  - shocking
  - strange
  - wicked

As I’ve shown in the intro to Eleventy above, we have defined a collection that contains only defined words. These collections are arrays and arrays can be searched. To achieve this we use a function that looks like this:

const findExistingDefinition = (word, collection) =>
  collection.find((item) => item.data.title === word);

But where does the collection come from? And the word?

The template part for this list looks like this:

<h4>Alt Words</h4>
<ul class="list-semicolon">
  {% for word in definition.data.alt_words %}
    <li>{{ word | linkIfExistsInCollection(collections.definedWords) | safe }}</li>
  {% endfor %}
</ul>

linkIfExistsInCollection is a custom filter we’ve added to our Eleventy config.

There’s nothing terribly complicated happening in it:

config.addFilter('linkIfExistsInCollection', (word, collection) => {
  const existingDefinition = findExistingDefinition(word, collection);

  if (existingDefinition) {
    return `<a href="${makeItemLink(
      existingDefinition.data.slug
    )}">${word}</a>`;
  }

  return word;
});

After passing in a word and a collection (in this case definedWords), the filter searches the collection. If the word in question is defined, the returned markup is a link to the item; otherwise the filter returns the word unchanged.
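Put together, the lookup and the filter can be tried on a small list. This is a sketch under assumptions: `makeItemLink` is not shown in this article, so the version here simply builds an anchor link in the style of the Markdown link `/#transgender` seen earlier, and the single defined word is made up:

```javascript
// Assumption: definitions are anchors on one page, as in the Markdown link `/#transgender`.
const makeItemLink = (slug) => `/#${slug}`;

const findExistingDefinition = (word, collection) =>
  collection.find((item) => item.data.title === word);

const linkIfExistsInCollection = (word, collection) => {
  const existingDefinition = findExistingDefinition(word, collection);

  return existingDefinition
    ? `<a href="${makeItemLink(existingDefinition.data.slug)}">${word}</a>`
    : word;
};

// Hypothetical collection with a single defined word.
const definedWords = [{ data: { title: "bizarre", slug: "bizarre" } }];
const altWords = ["abundant", "bizarre"];

const listItems = altWords.map(
  (word) => `<li>${linkIfExistsInCollection(word, definedWords)}</li>`
);

console.log(listItems.join("\n"));
// <li>abundant</li>
// <li><a href="/#bizarre">bizarre</a></li>
```

Note that the comparison is an exact match on the title, so «Bizarre» with a capital letter would not be linked in this sketch.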

The same technique is applied for the Table of Contents, which currently includes many undefined words.

To wrap up the technical part, I will take a closer look at it.

Table of Contents

This was probably the most complicated part, because this is the place where all moving parts connect. We have a list of words, some of which are not defined. We show flags for words that should be avoided. The index has to be sorted alphabetically, and also split into sub-sections. And then there are words which have sub-terms, which are a list in the list. Fasten your data-seatbelts. This is going to be fun. Or I have become mad. I don’t know.

Let’s start by taking a look at sorting and splitting items.

The code making this work has this comment above it:

// NOTE (ovlb): this will not be remembered as the best code i’ve written. if anyone seeing this has a better solution then the following to achieve sub groups of the definitions: i am happy to get rid of it

Luckily, while it might be horribly inefficient, no user will ever suffer from it: the good thing about static sites is that this code runs at build time.
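The project’s own grouping code is not reproduced here. As an illustration of the general idea only, the following sketch sorts a few made-up titles and splits them into sub-sections by their first letter; `groupByInitial` is a name invented for this example:

```javascript
// Group already-sorted titles into sub-sections keyed by their uppercased first letter.
// A Map keeps the sections in insertion order, which is alphabetical for sorted input.
function groupByInitial(titles) {
  const groups = new Map();

  for (const title of titles) {
    const initial = title.charAt(0).toUpperCase();
    if (!groups.has(initial)) {
      groups.set(initial, []);
    }
    groups.get(initial).push(title);
  }

  return groups;
}

// Hypothetical list of titles, sorted the same way as the definitions collection.
const sorted = ["crazy", "bisexual", "ally", "cisgender"].sort((a, b) =>
  a.toLowerCase().localeCompare(b.toLowerCase())
);

const toc = groupByInitial(sorted);

console.log([...toc.keys()].join(" ")); // A B C
console.log(toc.get("C").join(", ")); // cisgender, crazy
```

Sub-terms and flags are left out of this sketch; they would add another level of nesting per entry.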

Documentation

As always, speaking about documentation is speaking about things that are not quite there yet. Especially examples and a concise guide for contributions are currently missing.

At least that makes for a good bridge to the last part, in which I will talk about things that aren’t there at all.

Feature Outlook

Self-Definitions

What I just showed you is a dictionary. It is nice, but it is missing its key feature: definitions of self. Our goal is that users are able to collect the words that describe them and get a link they can share or add to Twitter or Instagram bios and About Me pages.

The design and UX of this feature are still in progress. You will, most likely, be able to search for and collect words that define your identity, and to share them via a URL like https://www.selfdefined.app/defined?by=bisexual&depression&anti-fascist.
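Since the feature does not exist yet, the following is purely a sketch of how the words could be read back from such a URL, assuming the format stays as shown: the first word is the value of `by`, and every further word is a parameter name without a value. `wordsFromUrl` is a name made up for this example; `URL` is the standard API available in browsers and Node.js:

```javascript
// Collect the words from a share URL of the (assumed) form ?by=first&second&third.
function wordsFromUrl(href) {
  const params = new URL(href).searchParams;
  const words = [];

  for (const [key, value] of params) {
    // `by=bisexual` carries the word as value, `depression` carries it as key.
    words.push(key === "by" ? value : key);
  }

  return words;
}

const words = wordsFromUrl(
  "https://www.selfdefined.app/defined?by=bisexual&depression&anti-fascist"
);

console.log(words.join(", ")); // bisexual, depression, anti-fascist
```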

API

How do we protect this? This is not only a question we’ll have to ask ourselves while working on the API, but for the project in general. There is an ever more hateful, loud and organised racist, anti-feminist movement on the Internet, and in our society at large. They will not be pleased once they find out about this project.

Twitter Bot

I have, briefly, discussed the importance of providing information and context for words above. The bot idea aims in this direction: In heated (or normal, if there is such a thing) online discussions there often is no time to explain basics or have a shared definition of words in the first place.

Tweeting «@SelfDefinedBot define racism» will return a link to the definition making shared communication easier. Hopefully.

The same pattern might also be useful as a Slack bot, which listens for slurs and provides better alternatives.

… and beyond

We as a team are constantly learning and can be incredibly happy that there is a steady influx of people contributing – many among them first time contributors to open source projects. We are quite certain that they will propose not only definitions, but also features that we can’t even think of.

Outro

With that I am almost done. Just one, or two, last things:

If you want a word defined, please help out. If you know someone for whom this might be a useful resource, please share.

And if you encounter discriminatory behaviour in your social context, please speak up. Changing humans is a daunting task, but one in which we as the privileged people in this society will have to take our part. It’s too late to stand on the sidelines and hope things will work out. They won’t. At least not for the better.

Thank you.
