Case Study: Creating Data Generator

A custom tool built from multiple Python scripts was created for an in-house UX team to make data-heavy prototypes easier to create.

Github location: Github link

NOTE: Updated 6 October 2022.

Scenario

When creating prototypes for data-intensive products and services, the responsibility for creating dummy data falls on the UX architect. This makes sense because the UX architect knows the context for the prototype, such as:

  • Who are the users?
  • What are their needs?
  • What problem(s) are we solving for them?

Issue

Creating dummy data can be very time-consuming, depending on the types and number of data points needed. A small set is fairly quick to create by hand, but the effort grows with volume: 10 names or six-character alphanumeric IDs are easier to produce than 30. More complex data points, such as VINs and vehicle descriptions, are difficult to create even in small amounts. Creating these more challenging data points may require:

  • Using client data from the system, which can be a legal concern if that data falls under the category of personally identifiable information (PII).
  • Using multiple tools across the internet to create realistic-seeming data points.
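To illustrate why a VIN is harder to fake than a name, the sketch below builds a 17-character string whose ninth character is a valid check digit, using the standard North American transliteration and weight tables. It is only an illustration, not Data Generator's actual code: the function names are hypothetical, and the other 16 characters are purely random, so manufacturer and model-year codes are not modeled.

```python
import random

# Characters allowed in a VIN (I, O, and Q are never used).
VIN_CHARS = "ABCDEFGHJKLMNPRSTUVWXYZ0123456789"

# Letter-to-number transliteration used when computing the check digit.
TRANSLIT = dict(zip("ABCDEFGH", range(1, 9)))
TRANSLIT.update(zip("JKLMN", range(1, 6)))
TRANSLIT.update({"P": 7, "R": 9})
TRANSLIT.update(zip("STUVWXYZ", range(2, 10)))

# Positional weights; position 9 (index 8) is the check digit, so its weight is 0.
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]


def check_digit(vin):
    """Return the check digit ('0'-'9' or 'X') for a 17-character VIN."""
    total = sum(
        (int(ch) if ch.isdigit() else TRANSLIT[ch]) * weight
        for ch, weight in zip(vin, WEIGHTS)
    )
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)


def random_vin(rng=random):
    """Build a random VIN-shaped string with a valid check digit (hypothetical helper)."""
    chars = [rng.choice(VIN_CHARS) for _ in range(17)]
    vin = "".join(chars)
    # Overwrite position 9 with the computed check digit.
    return vin[:8] + check_digit(vin) + vin[9:]


print(random_vin())
```

Even this simplified version needs two lookup tables and a checksum, which is why generating such values by hand does not scale.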

Solution

To solve the issues above, I created a tool for my team that generates sets of dummy data for their prototypes. The solution offers:

  1. A one-stop shop for generating a variety of data:
    • Names
    • Contact information
    • Dates
    • IDs. Most of our prototypes use only 3 types of IDs, but some products have 5.
  2. A low chance of exactly replicating PII sets that appear in the company’s systems.
  3. No more copying and pasting blocks of data, where repetition could go unnoticed and break the immersion for those interacting with the prototype.

This single solution enables our UX architects to create prototypes faster and with fewer criticisms and cycles of correction than previously.
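As a rough illustration of the ID and repetition points above, the following sketch generates six-character alphanumeric IDs and guarantees that no value repeats within a batch. The function name, alphabet, and default length are assumptions for this example and are not taken from the tool's source.

```python
import random
import string

# Assumed alphabet: uppercase letters and digits.
ALPHABET = string.ascii_uppercase + string.digits


def make_ids(count, length=6, rng=random):
    """Return `count` unique random IDs of the given length (hypothetical helper)."""
    seen = set()
    ids = []
    while len(ids) < count:
        candidate = "".join(rng.choices(ALPHABET, k=length))
        if candidate not in seen:  # skip duplicates so no ID repeats
            seen.add(candidate)
            ids.append(candidate)
    return ids


print(make_ids(5))
```

Tracking already-issued values in a set is what removes the repetition risk that comes with copying and pasting blocks of data.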

Tool Anatomy

Data Generator is a tool built on multiple Python scripts that runs from the command-line interface (CLI). It produces a CSV file containing the following data points:

  • Company Name
  • Vehicle ID
  • Make
  • Model
  • Year
  • VIN
  • License Plate
  • Fleet ID
  • Division ID
  • First Name
  • Last Name
  • Email Address
  • Phone Number
  • Driver ID
  • Street Address
  • City
  • State/Province
  • Zipcode/Postal Code
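The column layout above could be written out along these lines. This is a minimal sketch, not the tool's implementation: Data Generator itself uses pandas, while this example sticks to the standard-library `csv` module so it runs without extra installs. The function name and the sample row values are hypothetical.

```python
import csv
import io

# Column headers, in the order listed above.
COLUMNS = [
    "Company Name", "Vehicle ID", "Make", "Model", "Year", "VIN",
    "License Plate", "Fleet ID", "Division ID", "First Name", "Last Name",
    "Email Address", "Phone Number", "Driver ID", "Street Address", "City",
    "State/Province", "Zipcode/Postal Code",
]


def write_csv(rows, stream):
    """Write a header plus one line per row dict; missing fields are left blank."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical sample row; a real run would fill every column.
buffer = io.StringIO()
write_csv([{"Company Name": "Example Co", "Make": "Ford", "Year": 2020}], buffer)
print(buffer.getvalue())
```

To write to disk instead of memory, pass a file opened with `open("output.csv", "w", newline="")` as `stream`.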

Components

Data Generator relies on two Python scripts and six text files.

Python Scripts

| Script | Description | Dependencies |
|---|---|---|
| data_generator.py | Controls the creation of dummy data and its compilation into a CSV file. | pandas, random, string, random_names |
| random_names.py | Generates names and locations. | pandas, random, string, datetime |

Text Files

| File Name | Description |
|---|---|
| can_provinces.txt | 30 major cities paired with their provinces and postal codes. |
| first_name.txt | 2000 first names, 1000 per gender. |
| last_name.txt | 1000 common last names in North America. |
| street_names.txt | 52 common North American street names. |
| us_states.txt | 150 major cities paired with their states and zip codes. |
| vehicles.txt | 131 combinations of vehicle models with their respective manufacturers. |
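A lookup file such as vehicles.txt might be consumed roughly as follows. The file layout is not documented here, so this sketch assumes one comma-separated `make,model` pair per line and uses an in-memory sample instead of the real file. The helper names and the year range are also assumptions.

```python
import random

# Hypothetical sample standing in for the contents of vehicles.txt.
SAMPLE_VEHICLES = "Ford,F-150\nToyota,Camry\nHonda,Civic\n"


def parse_records(text, sep=","):
    """Split each non-empty line into a tuple of stripped fields."""
    return [
        tuple(part.strip() for part in line.split(sep))
        for line in text.splitlines()
        if line.strip()
    ]


def pick_vehicle(records, rng=random):
    """Pick a random make/model pair and attach an assumed model year."""
    make, model = rng.choice(records)
    year = rng.randint(2010, 2022)  # assumed range, inclusive on both ends
    return {"Make": make, "Model": model, "Year": year}


records = parse_records(SAMPLE_VEHICLES)
print(pick_vehicle(records))
```

Keeping make and model on the same line means a random pick can never pair a model with the wrong manufacturer.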

Remaining Issues

Depending on the service, some of the individual data points may need to be concatenated. However, our team prototypes in Axure, so for some widgets (such as repeaters) it’s easy enough to do this.

Example 1: First and last names. In a repeater, ensure first and last name have their own columns in the underlying table. If they are named FirstName and LastName, respectively, you can concatenate these fields with an expression such as [[Item.FirstName]] [[Item.LastName]].

Example 2: Year, make, and model. These can be joined in the same way as first and last names.
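For widgets that cannot concatenate fields themselves, one alternative is to join the columns in Python before the data reaches the prototype. The sketch below assumes each row is a dictionary keyed by the CSV headers. The function name and the output column names `Full Name` and `Vehicle` are hypothetical.

```python
def add_joined_columns(row):
    """Return a copy of the row with pre-joined name and vehicle fields."""
    joined = dict(row)  # copy so the original row is left untouched
    joined["Full Name"] = f'{row["First Name"]} {row["Last Name"]}'
    joined["Vehicle"] = f'{row["Year"]} {row["Make"]} {row["Model"]}'
    return joined


sample = {
    "First Name": "Ada", "Last Name": "Lopez",
    "Year": 2020, "Make": "Ford", "Model": "F-150",
}
print(add_joined_columns(sample))
```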

Some products also use more detailed vehicle descriptions beyond the combination of year, make, and model. These typically include information like number of doors, trim level, or engine type. Data Generator does not currently support these details.

Some products may require additional ID numbers and descriptions not covered by Data Generator:

  • Request numbers
  • Request categories and subcategories
  • Order numbers
  • File attachment names and types (e.g., XLS, PDF, CSV, DOC)