December 28, 2020

Parsing text to render custom React components

As I was building out my tips page, I ran across the need to parse some text (specifically shortcut sequences) so that I could display them with Chakra's Kbd component.

The goal really was simple. Given some text like cmd+option+c, I want to render it inline like this:

[Image: cmd+option+c rendered inline as individual keys joined by + signs]

As mentioned, the keys themselves are just rendered with Chakra's Kbd component, like <Kbd>⌘</Kbd>. The + sign needs to be rendered manually between each pair of keys.

Setting out to do this, I really needed to do a few things:

  1. Locate any shortcut patterns in the given text
  2. If shortcuts exist, parse out the shortcuts from the normal text
  3. Render the regular text and shortcut patterns with Kbd interlaced with + signs

Locating the shortcut

Note: If you consider yourself pretty experienced with regex, you probably won't get much out of this section. Feel free to skip.

Locating instances of shortcuts seems like a good job for regex. To start with, given that I'm not trying to do anything overly complex with the regex, I don't care about having capture groups that grab each of the individual keys. Splitting on the + after we find the shortcut is easier and will keep the overall regex simpler. To constrain the problem down a bit further I'll define a shortcut as one or more modifier keys (things like shift, command, option, etc) joined by + to a postfix key.

Let's just start with a matcher for the modifiers

/(command|cmd|alt|ctrl|shift|option|opt)/

That should match against our modifiers. It'll need a + after it, so we'll add that and update it so that we can capture multiple potential modifiers.

/((command|cmd|alt|ctrl|shift|option|opt)\+)+/

Adding \+ matches against the literal character +, and surrounding that whole group in another set of parentheses with a trailing + means it'll match against one or more modifiers.
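To see what this intermediate pattern picks up, here's a small sketch you can run in Node. The matchModifiers helper is just a name I made up for illustration; it isn't part of the final code.

```javascript
// Modifier prefix only: one or more "<modifier>+" groups.
const modifierPrefix = /((command|cmd|alt|ctrl|shift|option|opt)\+)+/;

// Hypothetical helper: returns the matched prefix, or null when there is none.
function matchModifiers(text) {
  const match = text.match(modifierPrefix);
  return match ? match[0] : null;
}

console.log(matchModifiers("cmd+shift+p"));  // "cmd+shift+"
console.log(matchModifiers("press ctrl+c")); // "ctrl+"
console.log(matchModifiers("shift"));        // null (no trailing +)
```

Note that the prefix always ends in a +, which is why the postfix key still has to be matched separately.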

The last part is adding the postfix key. There's a wide range of potential matches here, so again I'll avoid trying to be completely exhaustive and mostly focus on what I think will be most common. An important note is that this is representative of the key being pressed, not necessarily the character. So for example, the + character itself would never be represented because shift+= is what's actually pressed to generate +.

Like before we'll create a simple capture group with a list of patterns to match for the keys.

/(del|space|enter|esc|`|=|-|f1[0-2]|f[0-9]|[A-Z]|[a-z]|[0-9])/

Hopefully this group is relatively straightforward. I'll note a few special things here:

  1. I try to match against f10-f12 first so there's not a partial match against f1
  2. I've included a-z as well as A-Z just so a simple typo one way or the other wouldn't mess up the match
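The ordering point is easy to check. JavaScript alternation tries each branch left to right and takes the first one that matches, so this sketch compares the order used above against a reversed order (the variable names are mine, purely for illustration):

```javascript
// Same two branches, different order.
const twoDigitFirst = /(f1[0-2]|f[0-9])/; // order used in the post
const oneDigitFirst = /(f[0-9]|f1[0-2])/; // reversed, for comparison

console.log("f12".match(twoDigitFirst)[0]); // "f12"
console.log("f12".match(oneDigitFirst)[0]); // "f1" (partial match)
console.log("f5".match(twoDigitFirst)[0]);  // "f5"
```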

Putting these groups all together we get the final regex

/((command|cmd|alt|ctrl|shift|option|opt)\+)+(del|space|enter|esc|`|=|-|f1[0-2]|f[0-9]|[A-Z]|[a-z]|[0-9])/g
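As a quick sanity check, here's a sketch that runs the final regex over a sentence with a few shortcuts in it. findShortcuts is a hypothetical helper and the sample sentence is made up; the only thing taken from above is the regex itself.

```javascript
const ShortcutRegex =
  /((command|cmd|alt|ctrl|shift|option|opt)\+)+(del|space|enter|esc|`|=|-|f1[0-2]|f[0-9]|[A-Z]|[a-z]|[0-9])/g;

// Hypothetical helper: collects the full text of every shortcut match.
function findShortcuts(text) {
  return Array.from(text.matchAll(ShortcutRegex), (match) => match[0]);
}

console.log(findShortcuts("Use cmd+option+c, then ctrl+shift+f12 or alt+space"));
// [ "cmd+option+c", "ctrl+shift+f12", "alt+space" ]
```

One thing to be aware of: there are no word boundaries in the pattern, so it will happily match a shortcut embedded in a longer word. That was fine for my purposes.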

Here's a visualization of the final regex if you're into that kind of thing:

Regex railroad diagram generated from https://regexper.com

Parsing text with shortcuts

For this next part, it's useful to have some text to test with. Let's use this as an example:

Pressing cmd+D in vscode will select the next occurrence of the highlighted text

Just visually parsing this there are three sections:

  • The text "Pressing "
  • The shortcut "cmd+D"
  • The text " in vscode will select the next occurrence of the highlighted text"

What I'd like to get is a highly simplistic AST with each of these sections separated out. The end result should be something like

[
	{"type": "text", "text": "Pressing "},
	{"type": "shortcut", "text": "cmd+D"},
	{"type": "text", "text": " in vscode will select the next occurrence of the highlighted text"}
]

I could write a custom parser to generate this AST (which is a fun exercise if you've never tried), but I happened to stumble upon a library called simple-text-parser which does this exact thing!
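For the curious, a hand-rolled version of that exercise might look roughly like the sketch below. This is not what I ended up using and it has nothing to do with how simple-text-parser works internally; parseShortcuts is a name I made up, and it only assumes the regex from the previous section.

```javascript
const ShortcutRegex =
  /((command|cmd|alt|ctrl|shift|option|opt)\+)+(del|space|enter|esc|`|=|-|f1[0-2]|f[0-9]|[A-Z]|[a-z]|[0-9])/g;

// Hypothetical hand-rolled parser: walks the matches in order and emits
// a text node for everything between them.
function parseShortcuts(text) {
  const nodes = [];
  let lastIndex = 0;
  for (const match of text.matchAll(ShortcutRegex)) {
    // Plain text that sits before this shortcut
    if (match.index > lastIndex) {
      nodes.push({ type: "text", text: text.slice(lastIndex, match.index) });
    }
    nodes.push({ type: "shortcut", text: match[0] });
    lastIndex = match.index + match[0].length;
  }
  // Any trailing text after the last shortcut
  if (lastIndex < text.length) {
    nodes.push({ type: "text", text: text.slice(lastIndex) });
  }
  return nodes;
}

console.log(parseShortcuts("Pressing cmd+D in vscode"));
// [
//   { type: "text", text: "Pressing " },
//   { type: "shortcut", text: "cmd+D" },
//   { type: "text", text: " in vscode" }
// ]
```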

First, we need to instantiate the parser

import { Parser } from "simple-text-parser";

const parser = new Parser();

Afterwards we need to add a custom rule for how to handle our shortcut. Rules in this library are (as the name suggests) pretty simple. We call parser.addRule with the pattern we want to parse (our regex in this case) and a callback for how to handle the parsed text. The callback can either return a string that's been transformed by the parsing rule or a custom AST node that can be used for more processing later. The latter is what I want to use. AST nodes defined by this library are incredibly simple, having only two required properties: type and text. The default type is text and the text property just contains a stringified version of the node. Other properties can optionally be added to the node if desired.

Let's put this all together and define the shortcut rule.

// Assume ShortcutRegex is the regex discussed above
parser.addRule(ShortcutRegex, shortcut => {
	const shortcutDisplay = shortcut
		.replace(/command|cmd/, '⌘')
		.replace(/option|opt/, "⌥")
	return {
		type: "shortcut",
		text: shortcutDisplay,
		keys: shortcutDisplay.split('+')
	}
})

The callback for addRule is only doing two things:

  1. Doing some stylistic text updates to swap cmd/option for their prettier text counterparts
  2. Returning the new shortcut node
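If you want to poke at those two steps without pulling in the parser, the callback body can be lifted into a standalone function. toShortcutNode is a hypothetical name; the logic is the same as the callback above.

```javascript
// Hypothetical standalone version of the addRule callback.
function toShortcutNode(shortcut) {
  // Step 1: swap cmd/option for their symbol counterparts
  const shortcutDisplay = shortcut
    .replace(/command|cmd/, "⌘")
    .replace(/option|opt/, "⌥");
  // Step 2: build the node, splitting out the individual keys
  return {
    type: "shortcut",
    text: shortcutDisplay,
    keys: shortcutDisplay.split("+"),
  };
}

console.log(toShortcutNode("cmd+option+c"));
// { type: "shortcut", text: "⌘+⌥+c", keys: [ "⌘", "⌥", "c" ] }
```

Modifiers without a symbol mapping, such as ctrl or shift, pass through unchanged.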

The last thing to be done is to parse the text. We'll do so using the toTree method from the library (so that we get the raw AST to handle later with React).

const result = parser.toTree("Pressing cmd+D in vscode will select the next occurrence of the highlighted text")
console.log(result) // produces...
/*
[
 { type: "text", text: "Pressing "},
 { type: "shortcut", text: "⌘+D", keys: ["⌘", "D"]},
 { type: "text", text: " in vscode will select..." }
]
*/

Rendering the results

The rest should be pretty easy to wrap up! We want to render the provided AST via React into something that looks decent for the end user.

import { Fragment } from 'react'
import { Kbd } from '@chakra-ui/react'

export const TextWithShortcuts = ({ parsedResults }) => (
	<span>
		{parsedResults.map(node => {
			switch (node.type) {
				// If the node is text, just return it
				case "text":
					return node.text
				// For shortcuts, map over the keys and build up their text.
				// Each item gets a keyed Fragment so React can track the list.
				case "shortcut":
					return node.keys.map((key, index) => (
						<Fragment key={index}>
							<Kbd>{key}</Kbd>
							{/* Only include the + if it's not the last key of the list */}
							{index < node.keys.length - 1 && "+"}
						</Fragment>
					))
				default:
					throw new Error("Unknown node type")
			}
		})}
	</span>
)
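The only slightly fiddly part of the component is putting a + between keys but not after the last one. Here's that logic pulled out into plain JavaScript so it can be checked without React. interleaveKeys and the key/separator shape are made up for this sketch and are not used in the component.

```javascript
// Hypothetical helper: flattens a shortcut's keys into the pieces to render,
// placing a "+" separator between keys but not after the last one.
function interleaveKeys(keys) {
  return keys.flatMap((key, index) =>
    index < keys.length - 1
      ? [{ kind: "key", value: key }, { kind: "separator", value: "+" }]
      : [{ kind: "key", value: key }]
  );
}

const parts = interleaveKeys(["⌘", "⌥", "c"]);
console.log(parts.map((part) => part.value).join(" ")); // "⌘ + ⌥ + c"
```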
 

That about sums it up! If you want to see how I used this technique, check out the implementation of my tips page on GitHub.
