Generating Sample Data for Testing

Throughout the process of developing a solution, testing each aspect of your file is vital. The problem I always used to run into was generating enough sample data to properly test my scripts, calculations and performance. That was until I found Fake Name Generator.

What Fake Name Generator Does

Fake Name Generator lets you generate from 1 to 50,000 sample records…with just about every option you could possibly think of. Choose your name set, country, gender ratio (how many male vs. female names in the set), even the age range. Select which fields you want to include in the records (currently, 18 different fields are available), and choose between 11 different output formats, including variations of .txt, .sql, .csv and .htm.

Why Sample Data is So Valuable

While this article is not intended to be the “be all and end all” on the subject, I often receive sample files from people asking for help with a performance problem…and those files have no data in them, or only a few records. Looking at the file, I can see that as soon as the person puts this file up on FileMaker Server through their hosting provider and/or adds a few thousand records, there is going to be a painful slowdown in performance.

Yes, testing with a few records is better than not testing at all. However, if you test with a realistic volume of data, you can identify the aspects of your database solution that are causing bottlenecks right away and fix them…before you get too far into the project. Changes near the end of a development cycle can mean a ton of extra work reworking or rewriting a large portion of the solution. All of this is possibly avoidable if you test the functionality earlier in the development process.

An example:

After setting up your tables and fields, based on the structure you decided to go with after some brainstorming with paper and pencil, you add a portal to your main table. That portal is based on a related table that will eventually hold a large number of related records for each record in the main table. For this example, let’s say the main table is Employees and the related table is Expenses. You have your relationship set up to sort based on the Expenses transaction date.

Depending on the industry or type of business you work in, that Expenses table may hold thousands of reimbursed expenses for each employee. OK, you throw in 10 Employee records and, maybe…5 Expense records for each Employee to test the portal. Awesome, everything works perfectly…at least locally.

Fast forward a few months: development is just about complete, and you convert the existing data from the old solution and import it into your newly developed solution file(s). You open the file to demo it to your boss, and almost every record you look at in the Employee table takes 10-15 seconds to load. You can already feel the temperature rising as your boss begins to tap his/her feet and asks, “Why is it taking so long?”

Back to the drawing board! You dig through the fields and scripts looking for the possible hang-up, and you notice that the relationship for the portal on the main page is sorted and a number of scripts have the Refresh Window step flushing the cache. The short of it: all of the expense records for that employee need to be downloaded every time FileMaker has to evaluate the relationship. Download, sort, display…download, sort, display. At the local level (especially with very few records), you probably wouldn’t even notice it. But with a decent number of records, across a WAN…you may ‘notice’.
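
To get a feel for why the record count matters so much here, a rough back-of-the-envelope estimate can help. The sketch below is in Python rather than FileMaker, and every figure in it (record size, link speeds) is a made-up assumption, not a measurement. It also ignores latency, sort time and whatever caching FileMaker actually does, so treat it as an illustration only.

```python
def transfer_seconds(record_count, bytes_per_record, megabits_per_second):
    """Rough time to pull a set of related records across a link.

    Ignores latency, protocol overhead, sorting and caching.
    """
    total_bits = record_count * bytes_per_record * 8
    return total_bits / (megabits_per_second * 1_000_000)


# Hypothetical figures for illustration only.
BYTES_PER_RECORD = 1_000   # assumed average size of one Expenses record
LAN_MBPS = 1_000           # assumed local network speed
WAN_MBPS = 2               # assumed speed of a slow remote connection

if __name__ == "__main__":
    for count in (5, 5_000):
        lan = transfer_seconds(count, BYTES_PER_RECORD, LAN_MBPS)
        wan = transfer_seconds(count, BYTES_PER_RECORD, WAN_MBPS)
        print(f"{count:>5} records: LAN {lan:.5f}s, WAN {wan:.2f}s")
```

With these assumed numbers, 5 records are invisible on either connection, while 5,000 records work out to about 20 seconds on the slow link. And that cost is paid again each time the relationship has to be re-evaluated.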

The Conclusion

Here is the short of it. Get yourself a hefty number of sample records to test with. You don’t need to put all of them in every solution you develop…but at least have them available to use. Go ahead and create yourself a sample data file in FileMaker. Then it becomes as simple as importing the records into your file and hammering away at your design to test for potential problems.

That alone can save you development time and make the development process a better overall experience for you and your client or boss.
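
If you also want related records shaped like the Employees and Expenses example, one option is to script them yourself. Here is a minimal sketch in Python, not a finished tool. The field names (EmployeeID, ExpenseID, TransactionDate, Amount), the file names and the US-style date format are all assumptions for illustration, so change them to match your own tables and your file’s date settings.

```python
import csv
import random
from datetime import date, timedelta


def build_sample_data(employee_count, expenses_per_employee, seed=42):
    """Return (employees, expenses) as lists of dicts with matching keys."""
    rng = random.Random(seed)  # fixed seed so every run gives the same data
    start = date(2010, 1, 1)   # arbitrary start of the one-year date range
    employees = []
    expenses = []
    for emp_id in range(1, employee_count + 1):
        employees.append({"EmployeeID": emp_id, "Name": f"Employee {emp_id}"})
        for _ in range(expenses_per_employee):
            day = start + timedelta(days=rng.randrange(365))
            expenses.append({
                "ExpenseID": len(expenses) + 1,
                "EmployeeID": emp_id,  # key that links back to Employees
                "TransactionDate": day.strftime("%m/%d/%Y"),  # assumed format
                "Amount": f"{rng.uniform(5, 500):.2f}",
            })
    return employees, expenses


def write_csv(path, rows):
    """Write a non-empty list of dicts to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    # 10 employees with 2,000 expenses each, instead of 5 each.
    employees, expenses = build_sample_data(10, 2_000)
    write_csv("employees.csv", employees)
    write_csv("expenses.csv", expenses)
```

Import the two CSV files into the matching tables, then raise the counts until the portal starts to hurt. Real names and addresses from a generator like the one above can still fill in the Employees side.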


2 responses

  1. You should give http://www.generatedata.com/ a try; I think it is more flexible.

    April 29, 2011 at 10:47 AM

  2. Sounds like it could be useful. Thanks for the tip.

    The reason I use Fake Name Generator: sheer volume of data. You can generate up to 50,000 records…and have 3 orders at a time (150,000 at once).

    Generatedata.com seems to limit you to 200 records…unless you contribute, then it is 5,000. It does look like it will help with longer text fields, though…which is good if you are testing a CMS.

    I really like to test a solution the way it may run 3 years down the road, when it has 500,000 records and a ton of related data. FakeNameGenerator.com gets me there pretty quickly.

    Someone on another site also mentioned Brian Dunning’s sample data sets. He has files with up to 350,000 records for free…and more if you make a donation. The advantage of Fake Name Generator is that the data is generated on the fly. I have downloaded about 4-6 million records (haven’t really kept track), and I don’t really have a large number of duplicate records.

    April 29, 2011 at 11:26 AM
