2020. március 19., csütörtök

Exploring Monster Taming Mechanics In Final Fantasy XIII-2: Data Validation And Database Import

Continuing on with this miniseries of exploring the monster taming mechanics of Final Fantasy XIII-2, it's time to start building the database and populating it with the data that we collected from the short script that we wrote in the last article. The database will be part of a Ruby on Rails project, so we'll use the default SQLite3 development database. Before we can populate the database and start building the website around it, we need to make sure the data we parsed out of the FAQ is all okay with no typos or other corruption, meaning we need to validate our data. Once we do that, we can export it to a .csv file, start a new Rails project, and import the data into the database.


Validating a Collection of Data

Considering that we parsed out 164 monsters with dozens of properties each from the FAQ, we don't want to manually check all of that data to make sure every property that should be a number is a number and all property names are correctly spelled. That exercise would be way too tedious and error prone. This problem of validating the data sounds like it needs an extension to our script. Since we have the data in a list of hash tables, it should be fairly straightforward to create another hash table that can be used to validate each table in the list. The idea with this hash table is to have a set of valid properties as the keys in the table, and the values are regexes that should match each property value that they represent. These regexes will be more specific to each property, since those properties have already matched on the more general regexes that were used to collect the data in the first place. Additionally, every key in each monster hash should be in this template hash, and every template hash key should be in each monster hash. We could get even more detailed with our checks, but this validation should be enough to give us confidence in the data.

To get started, we'll build up the first couple entries in the template hash and write the validation loop. Once it's working, we can fill out the rest of the entries more easily. Here are the name and minimum base HP entries along with the validation loop:
PROPER_NAME_REGEX = /^\w[\w\s]*\w$/
NUMBER_REGEX = /^\d+(:?,\d{3})?$/

VALID_MONSTER = {
"Name" => PROPER_NAME_REGEX,
"Minimum Base HP" => NUMBER_REGEX
}

data.each do |monster|
VALID_MONSTER.each do |key, regex|
if monster.key?(key)
unless monster[key] =~ regex
puts "Monster #{monster["Name"]} has invalid property #{key}: #{monster[key]}."
end
else
puts "Monster #{monster["Name"]} has missing property #{key}."
end
end

monster.each do |key, value|
unless VALID_MONSTER.key?(key)
puts "Monster #{monster["Name"]} has extra property #{key}: #{value}."
end
end
end
This is a fair amount of code, so let's take it in parts. First, we define two regexes for a proper name and a number. The proper name regex is the same as part of our previous property value regex in that it matches on multiple words separated by whitespace, but it has two extra symbols at the beginning and end. The '^' at the beginning means that the next character in the pattern has to appear at the start of the string, and the '$' at the end means that the last character that matches has to be at the end of the string. Together, these symbols mean that the entire string needs to match the regex pattern.

The number regex is similar to the proper name regex, except that it matches on numbers instead of words. The (:?,\d{3}) group matches on a comma followed by three digits because the {3} pattern means that the previous character type, in this case a digit, must be repeated three times. This group is optional, so the regex will match on 1234 as well as 1,234. The number regex is also wrapped in a '^' and a '$' so that the entire string must match the pattern.

The next constant is simply the start of our monster template hash with "Name" and "Minimum Base HP" entries. What follows is the validation loop, and it is laid out about how it was described. First, we iterate through each monster in the data list that we have already populated with the monsters from the FAQ. Within each monster we iterate through every entry of the valid monster template. If the monster has the property we're looking at, we check if the property value matches the regex for that property. If it doesn't, we print out an error. If the property doesn't exist, we print out a different error. Then we iterate through every property of the monster, and if a property doesn't exist in the template, we print out another error.

If we run this script now, we end up with a ton of errors for extra properties because we haven't added those properties to the template, yet. However, from looking at the first few monster's outputs, it appears that the other checks are working, so we can start filling out the rest of our template. We can quickly add in the obvious properties, checking the script periodically to make sure we haven't gone astray. The mostly finished template looks like this:
PROPER_NAME_REGEX = /^\w.*[\w)!%]$/
NUMBER_REGEX = /^\d+(:?,\d{3})?$/
SMALL_NUMBER_REGEX = /^(\d\d?\d?|N\/A)$/
PERCENTAGE_REGEX = /^(\d\d?\d?%|N\/A)$/
LIST_REGEX = /^((:?All )?\w+(:?, (:?All )?\w+)*|N\/A)$/
FREE_TEXT_REGEX = /^\S+(?:\s\S+)*$/
TIME_REGEX = /^\d\d?:\d\d$/

VALID_MONSTER = {
"Name" => PROPER_NAME_REGEX,
"Role" => PROPER_NAME_REGEX,
"Location" => PROPER_NAME_REGEX,
"Location2" => PROPER_NAME_REGEX,
"Location3" => PROPER_NAME_REGEX,
"Max Level" => SMALL_NUMBER_REGEX,
"Speed" => SMALL_NUMBER_REGEX,
"Tame Rate" => PERCENTAGE_REGEX,
"Minimum Base HP" => NUMBER_REGEX,
"Maximum Base HP" => NUMBER_REGEX,
"Minimum Base Strength" => SMALL_NUMBER_REGEX,
"Maximum Base Strength" => SMALL_NUMBER_REGEX,
"Minimum Base Magic" => SMALL_NUMBER_REGEX,
"Maximum Base Magic" => SMALL_NUMBER_REGEX,
"Growth" => PROPER_NAME_REGEX,
"Immune" => LIST_REGEX,
"Resistant" => LIST_REGEX,
"Halved" => LIST_REGEX,
"Weak" => LIST_REGEX,
"Constellation" => PROPER_NAME_REGEX,
"Feral Link" => PROPER_NAME_REGEX,
"Description" => FREE_TEXT_REGEX,
"Type" => PROPER_NAME_REGEX,
"Effect" => FREE_TEXT_REGEX,
"Damage Modifier" => FREE_TEXT_REGEX,
"Charge Time" => TIME_REGEX,
"PS3 Combo" => FREE_TEXT_REGEX,
"Xbox 360 Combo" => FREE_TEXT_REGEX,
"Default Passive" => PROPER_NAME_REGEX,
"Default Skill" => PROPER_NAME_REGEX,
"Special Notes" => FREE_TEXT_REGEX,
}
Notice that the PROPER_NAME_REGEX pattern had to be relaxed to match on almost anything, as long as it starts with a letter and ends with a letter, ')', '!', or '%'. This compromise had to be made for skill names like "Strength +10%" or constellation names like "Flan (L)" or feral link names like "Items Please!" While these idiosyncrasies are annoying, the alternative is to make much more specific and complicated regexes. In most cases going to that extreme isn't worth it because the names that are being checked will be compared against names in other tables that we don't have, yet. Those data validation checks can be done later during data import when we have the other tables to check against. Waiting and comparing against other data reduces the risk that we introduce more errors from making the more complicated regexes, and we save time and effort as well.

The location property has an odd feature that makes it a bit difficult to handle. Some monsters appear in up to three different areas in the game, but it's only a handful of monsters that do this. Having multiple locations combined in the same property is less than ideal because we'll likely want to look up monsters by location in the database, and we'll want to index that field so each location value should be a unique name, not a list. Additionally, the FAQ puts each location on a separate line, but not prefixed with the "Location-----:" property name. This format causes problems for our script. To solve both problems at once, we can add "Location2" and "Location3" properties anywhere that a monster has a second or third location by directly editing the FAQ.

This template covers nearly all of the monster properties, except for the level skill and passive properties. We'll get to those properties in a second, but first we have another problem to fix. It turns out that the two location properties we added and the last three properties in the template don't always occur, so we have to modify our check on those properties slightly:
# ...
elsif !["Location2", "Location3", "Default Passive", "Default Skill", "Special Notes"].include? key
puts "Monster #{monster["Name"]} has missing property #{key}."
end
# ...
We simply change the else branch of the loop that checks that all properties in the template are in the monster data so that it's an elsif branch that only executes if the key is not one of those optional keys.

Now we're ready to tackle the level properties. What we don't want to do here is list every single level from 1 to 99 for both skill and passive properties. There has to be a better way! The easiest thing to do is add a check for if the key matches the pattern of "Lv. XX (Skill|Passive)" in the loop that checks if each monster property exists in the template, and accept it if the key matches and the value matches the PROPER_NAME_REGEX. This fix is shown in the following code:
LEVEL_PROP_REGEX = /^Lv\. \d\d (Skill|Passive)$/
# ...
monster.each do |key, value|
unless VALID_MONSTER.key?(key)
if key =~ LEVEL_PROP_REGEX
unless value =~ PROPER_NAME_REGEX
puts "Monster #{monster["Name"]} has invalid level property #{key}: #{value}."
end
else
puts "Monster #{monster["Name"]} has extra property #{key}: #{value}."
end
end
end
# ...
I tried to make the conditional logic as simple and self-explanatory as possible. I find that simpler is better when it comes to logic because it's easy to make mistakes and let erroneous edge cases through. If this logic was any more complicated, I would break it out into named functions to make the intent clearer still.

With this addition to the data validation checks, we've significantly reduced the list of errors from the script output, and we can actually see some real typos that were in the FAQ. The most common typo was using "Lvl." instead of "Lv." and there are other assorted typos to deal with. We don't want to change the regexes to accept these typos because then they'll appear in the database, and we don't want to add code to the script to fix various random typos because that's just tedious nonsense. It's best to fix the typos in the FAQ and rerun the script. It's not too bad a task for these few mistakes.

Exporting Monsters to a CSV File

Now that we have this nice data set of all of the monster properties we could ever want, we need to write it out to a .csv file so that we can then import it into the database. This is going to be some super complicated code. Are you ready? Here it goes:
require 'csv'
opts = {headers: data.reduce(&:merge).keys, write_headers: true}
CSV.open("monsters.csv", "wb", opts) do |csv|
data.each { |hash| csv << hash }
end
Honestly, Ruby is one of my favorite languages. Things that you would think are complicated can be accomplished with ease. Because we already structured our data in a csv-friendly way as an array of hashes, all we have to do is run through each hash and write it out through the CSV::Writer with the '<<' operator.

We need to take care to enumerate all of the header names that we want in the .csv file, and that happens in the options that are passed to CSV.open. Specifically, headers: data.reduce(&:merge).keys tells the CSV::Writer what the list of header names is, and the writer is smart enough to put blank entries in wherever a particular header name is missing in the hash that it is currently writing out to the file. The way that code works to generate a list of header names is pretty slick, too. We simply tell the data array to use the Hash#merge function to combine all of the hashes into one hash that contains all of the keys. Since we don't care about the values that got merged in the process, we simply grab the keys from this merged hash, and voila, we have our headers.

The .csv file that's generated from this script is a real beast, with 204 unique columns for our 164 monsters. Most of those columns are the sparsely populated level-specific skills and passive abilities. We'll have to find ways to deal with this sparsely populated matrix when using the database, but it should be much better than dealing with one or two fields of long lists of abilities. At least, that's what I've read in books on database design. I'm learning here, so we'll see how this goes in practice.

Importing Monsters Into a Database

This part isn't going to be quite as easy as exporting because we'll need to write a database schema, but it shouldn't be too bad. Before we get to that, we need to create a new Ruby on Rails project. I'll assume Ruby 2.5.0 or higher and Rails 6.0 are installed. If not, see the start of this Rails Getting Started guide to get that set up. We start a new Rails project by going to the directory where we want to create it and using this Rails command:
$ rails new ffxiii2_monster_taming
Rails generates the new project and a bunch of directories and files. Next, we descend into the new project and create a new model for monsters:
$ cd ffxiii2_monster_taming
$ rails generate model Monster name:string
In Rails model names are singular, hence "Monster" instead of "Monsters." We also include the first database attribute that will be a part of the migration that is generated with this command. We could list out all 204 attributes in the command along with their data types, but that would be terribly tedious. There's an easier way to get them into the migration, which starts out with this code to create the Monster table:
class CreateMonsters < ActiveRecord::Migration[6.0]
def change
create_table :monsters do |t|
t.string :name

t.timestamps
end
end
end
All we have to do is add the other 203 attributes along with their data types and we'll have a complete table ready to generate, but how do we do this efficiently? Conveniently, we already have a list of the attribute names as the header line in the monsters.csv file. We just have to copy that line into another file and do some search-and-replace operations on it to get the list into a form that can be used as the code in this migration file.

First, we'll want to make a couple changes in place so that the .csv header has the same names as the database attributes. This will make life easier when we import. All spaces should be replaced with underscores, and the periods in the "Lv." names should be removed. Finally, the whole line should be converted to lowercase to adhere to Rails conventions for attribute names. Once that's done, we can copy the header line to a new file, replace every comma with a newline character, and replace each beginning of a line with "      t.string " to add in the attribute types. They are almost all going to be strings, and it's simple to go back and change the few that are not to integers, floats, and times. I did this all in Vim, but any decent text editor should be up to the task. Now we have a complete migration file:
class CreateMonsters < ActiveRecord::Migration[6.0]
def change
create_table :monsters do |t|
t.string :name
t.string :role
t.string :location
t.string :location2
t.string :location3
t.integer :max_level
t.integer :speed
t.string :tame_rate
t.string :growth
t.string :immune
t.string :resistant
t.string :halved
t.string :weak
t.string :constellation
t.integer :minimum_base_hp
t.integer :maximum_base_hp
t.integer :minimum_base_strength
t.integer :maximum_base_strength
t.integer :minimum_base_magic
t.integer :maximum_base_magic
t.string :feral_link
t.string :description
t.string :monster_type
t.string :effect
t.float :damage_modifier
t.time :charge_time
t.string :ps3_combo
t.string :xbox_360_combo
t.string :default_passive
t.string :default_skill
t.string :special_notes
t.string :lv_02_passive
t.string :lv_02_skill
#...
# over a hundred more lv_xx attributes
#...
t.string :lv_99_passive
t.string :lv_99_skill

t.timestamps
end
end
end
Now, we can run this migration with the command:
$ rails db:migrate
And we have the beginnings of a monster table. We just need to populate it with our monsters. Rails 6.0 makes this task quite simple using a database seed file, and since we have the same names for the database attributes as the .csv file column headers, it's dead simple. In the lib/tasks/ directory, we can make a file called seed_monsters.rake with the following code:
require 'csv'

namespace :csv do

desc "Import Monster CSV Data"
task :import_monsters => :environment do

csv_file_path = 'db/monsters.csv'

CSV.foreach(csv_file_path, {headers: true}) do |row|
Model.create!(row.to_hash)
puts "#{row['name']} added!"
end
end
end
When we run this task, the code is going to loop through each line of the .csv file (that we make sure to put in db/monsters.csv), and create a monster in the database for each row in the file. We also print out the monster names so we can see it working. Then it's a simple matter of running this command:
$ rails db:seed
And we see all of the monster names printed out to the terminal, and the database is seeded with our 164 monsters.

We've accomplished a lot in this post with running some validation checks on the monster data, exporting it to a .csv file, creating a database table, and importing the monsters.csv file into that table. We still have plenty to do, creating and importing the other tables and relating the data between tables. That will be the goal for next time.