Case Study: Creating Data Generator
Published:
A custom tool built off of multiple python scripts was created for an in-house UX team to make it easier to create data-heavy prototypes.
Github location: Github link
NOTE: Updated 6 October 2022.
Scenario
When creating prototypes for data intensive products and services, the responsibility for the creation of dummy data falls on the UX architect. This makes sense because the UX architect knows the context for the prototype, such as:
- Who are the users?
- What are their needs?
- What problem(s) are we solving for them?
Issue
Creating dummy data can be very time-consuming depending on the types of data needed and the number of data points needed. Small amounts of data points are fairly quick and easy to create versus larger amounts of data points. For example, 10 names or six-character alphanumeric IDs are easier to create than 30 names or six-character alphanumeric IDs. More complex data points such as VINs and vehicle descriptions are also difficult to create even in smaller amounts. Creating these more challenging data points may require:
- Using client data from the system, which can be a legal concern if that data falls under the category of personally identifiable information (PII).
- Using multiple tools across the internet to create realistic-seeming data points.
Solution
To solve the issues above, I created a tool for my team that generates sets of dummy data for their prototypes. The solution offers:
- One stop shop for generating a variety of data:
- Names
- Contact information
- Dates
- IDs, more often than not there are only 3 types of IDs that appear in our prototypes. However, some products will have 5.
- Low chancs of exactly replicating PII sets that ppear in the company’s systems
- Eliminates copying and pasting of blocks of data where repetition could be missed and break the immersion for those interacting with the prototype.
This single solution enables our UX architects to create prototypes faster and with fewer criticisms and cycles of correction than previously.
Tool Anatomy
Data Generator is a tool based on multiple python scripts that runs on the command line interface (CLI). It produces a CSV file containing the following data points:
- Company Name
- Vehicle ID
- Make
- Model
- Year
- VIN
- License Plate
- Fleet ID
- Division ID
- First Name
- Last Name
- Email Address
- Phone Number
- Driver ID
- Street Address
- City
- State/Province
- Zipcode/Postal Code
Components
Data Generator relies on 2 python scripts and 6 text files.
Python Scripts
Script | Description | Dependencies |
---|---|---|
data_generator.py | controls the creation of dummy data and its compilation into a CSV file. | pandas, random, string, random_names |
random_names.py | generates names and locations. | pandas, random, string, datetime |
Text Files
File Name | Description |
---|---|
can_provinces.txt | combinations of 30 major cities in their provinces and with postal codes. |
first_name.txt | 2000 names, 1000 per gender. |
last_name.txt | 1000 common last names in North America. |
street_names.txt | 52 common North American street names. |
us_states.txt | combinations of 150 major cities in their states and with zipcodes. |
vehicles.txt | 131 combinations of vehicle models with their respective manufacturers. |
Remaining Issues
Depending on the service, some of the individual data points may need to be concatenated. However, our team prototypes in Axure, so for some widgets (such as repeaters) it’s easy enough to do this.
Example 1: First and last names. In a repeater, ensure first and last name have their own columns in the underlying table. If they are named FirstName and LastName, respectively, you can concatenate these fields like [[Item.FirstName]] [[Item.LastName]]
__ Example 2:__ Year, make, and model. These can be joined together in a similar manner as described for first and last names.
Some products also use more detailed vehicle descriptions beyond the combination of year, make, and model. These typically include information like number of doors, trim level, or engine type. Data Generator does not currently support that.
Some products may require additional ID numbers and descriptions not covered by Data Generator:
- Request numbers
- Request categories and subcategories
- Order numbers
- File attachment names and types (ex.: XLS, PDF, CSV, DOC)