Lian (Harvard GSD M.Arch.I)

I graduated in 2013, but still blog here once in a while.

  • anchor

    Live Blog: Bethany Lang: Ugly Data Duckling: Grooming Messy Data With Policies and Procedures

    By Lian Chikako Chang
    May 1, '15 2:38 PM EST

    Hi Archinect,

    Another session at Do Good Data 2015. Bethany Lang is presenting. (Warning--no architecture here!) "Ugly data" here isn't talking about the nuances of metrics or computation, but very simply: wrong data. People change jobs often in the nonprofit world, and it's easy for inconsistencies and mistakes to creep in.

    Bethany's outline includes:

    • Why is clean data important?
    • The Three "I"s of bad data
    • Data cleanup
    • Excel tips and tricks
    • Writing, sharing, and enforcing your policies and procedures
    • Ongoing maintenance

    Why is clean data important? 

    1. Bad data makes you look bad. People won't engage or donate if you get their name wrong on mailings, for example. While this is superficial, there are serious cases, for example when someone has died and their family can't get them off the mailing list.
    2. Garbage in, garbage out. If your data is bad, your reports will be bad, [and you are a bad person].
    3. Data analysis and segmentation. You can't do this with bad data.

    The Three "I"s of bad data: What data issues should I be looking for? We've all seen messy data in our databases.

    BL puts a sample spreadsheet on the screen and asks us to talk about what's wrong. Inconsistent spellings, capitalizations, and punctuations; data in the wrong columns; multiple people in the same row; and lack of consistently in parsing last names versus middle names. For example, Michelle Adams McCoy probably has a last name of Adams-McCoy, instead of Adams as a middle name.

    1. Inconsistent: Different standards, no standards, and mistakes in how a piece of information is parsed.
    2. Incomplete: Not everyone provides every piece of information. Some can be extracted, like zip code, if you have the rest of the address.
    3. Incorrect: This is the most difficult to find, and requires on institutional memory, such as when someone joined a board. You don't have time to deal with this everywhere, but if you have critical accounts or fields, sit down with someone who knows your constituents really well to review the information. But be realistic about how much time and commitment there is to do this.

    Data cleanup - how do I clean my data?

    1. Backup your database before you do anything! 
    2. For internal CRM functions, inside your database, most databases have a system to help you clean basic things, such as entries with duplicate phone numbers, names, or addresses. 
    3. You can also pay to have your addresses checked against U.S. Postal Service's addresses. This also allows you to identify people who are deceased.
    4. External services, outside of your database. For example, there are services that include house deeds, annual giving based on other charitable organizations' reports, and other publicly known data.
    5. Batch update. In Excel or in a spreadsheet view in your database, you can see all your rows at once, and can fix things all at once.
    6. Consolidate your data. Nonprofits tend to have spreadsheets all over the place, containing different data.

    Excel tips and tricks (commands)

    1. Text to column separates text in one column into multiple columns, based you your own criteria (such as the presence of a comma or space).
    2. =Proper puts everything into title case capitalization. 
    3. Concatenate combining text from multiple columns.

    Writing, sharing, and enforcing your policies and procedures

    When there is high turnover, things can get crazy in a hurry. A policy and procedures manual does not sound sexy but is extremely useful. Some of these are defined by your CRM database provider. Others are up to you--so make rules for the following situations:

    1. When to enter a new account: Some organizations require that someone has to have purchased something in order to have an account. Others add everyone they come across.
    2. Rules for managing changed addresses: Do you overwrite an old address with a new address? Or do you hoard the old addresses, since they can be useful sometimes.
    3. Rules for managing spouses/relationships: If spouses are consistently engaging with your organization as a couple, it may be OK to give them one account. But what happens if Mary is on the board and John is volunteering, or if they're donating separately? Make it separate. 
    4. Rules for salutations: Are you using formal and informal salutations? There's no point maintaining separate fields if the data is almost always the same in both.
    5. Rules for dates, such as donation dates: Does your organization consider the donation date to be the date on the check, the date the money hit your bank account, or the date you entered the information in the database?
    6. Guidelines for running/requesting reports: Fill out a form? Ask 72 hours in advance? Making too many reports, based on every whim, can be a waste of time.
    7. How to get help: Contact information for your CRM provider.

    What do I need to put in my manual to make sure that I can report on this later? Think about this and make sure to track the data you need, in the format you want it in.

    What is important data that I will always need?

    1. relationships (e.g. in family services)
    2. gender (how do you identify this, given LGBT?)
    3. birthdate (don't track age, dummy!)
    4. program-specific data (whatever you need to run your reports)

    Policies and Procedures Manuals

    1. Search the database (using partial and approximate terms) before adding a new record.
    2. Add examples
    3. Enter data in the correct field. If there's a special field to indicate that someone doesn't want emails, that can't be randomly mentioned in a note.

    Sharing and Enforcing

    1. Share your manual on a shared drive, and/or provide printed copies.
    2. Make the manual and basic database training part of staff onboarding.
    3. Create and save reports to check for manual rules that aren't being followed. For example, if gender is a required field, most CRM databases can run automatic reports to check if there are accounts missing data in this field. The database can also tell you who last touched that account, so you can talk with them. It may be that the data is difficult to enter and you need to fix something.
    4. Add any new policies and procedures, and share the updated version with staff


    I know this was probably of interest to only a limited subset of Archinect readers. If you've read to this point, thanks for reading!


    • 1 Comment

    • Working in informatics, dealing with sales/customer accounts and internal/external data-sources, totally makes this my thing!

      Process documentation and systems/methods of knowledge sharing are not easy. There is a tension between standardization, collation and collaboration as well. Takes effort to get right...

      May 4, 15 7:40 pm  · 

      Block this user

      Are you sure you want to block this user and hide all related comments throughout the site?

    • Back to Entry List...
  • ×Search in:

About this Blog

This blog was most active from 2009-2013. Writing about my experiences and life at Harvard GSD started out as a way for me to process my experiences as an M.Arch.I student, and evolved into a record of the intellectual and cultural life of the Cambridge architecture (and to a lesser extent, design/technology) community, through live-blogs. These days, I work as a data storyteller (and blogger at in San Francisco, and still post here once in a while.

Affiliated with:

Authored by:

  • Lian Chikako Chang

Other blogs affiliated with Harvard University:

Recent Entries