I love spreadsheets, dear reader, g-d help me I do. Perhaps it’s an unseemly passion to indulge for a former English major and creative writing MFA, but I’m as likely to lose myself in an afternoon of VLOOKUPS and INDIRECT cell references as I am to surrender hours to the chaste hookups and indirect allusions of the 19th century romances I devour.

But though the coupling of a more literary sensibility with the calculating instruments of business may seem unconventional—a dreaming Heathcliff paired with a scheming Becky Sharp—for a content strategist concerned with harnessing for productivity the patterns and structures that underlie messages and information, it’s an advantageous match.

As a practice, content strategy applies structured, data-driven and analytical thinking to creative, narrative-based outputs. In my work at Cake & Arrow, I make strategic content recommendations in support of a redesigned or redefined customer experience. I analyze our clients’ existing content structures, identify gaps and opportunities, model new structures, clean up and map content, plan content workflows, and track performance.

Microsoft Excel, Apple Numbers, and my favorite application, Google Spreadsheets, were originally designed to simulate paper-based accounting worksheets, and can perform intricate arithmetic, mathematical, financial, and statistical operations on large sets of data. They are also perfect tools for identifying the underlying patterns in the content we create and optimizing their application.

While spreadsheets remain the best tools I’ve found to do this work, the built in functions and formulas are mostly geared toward working with numbers, not text. However, with some creative thinking, they can be adapted to serve a range of content strategy projects.

I use spreadsheets at every phase of my content strategy projects:

  • During the early research and discovery work, I use spreadsheets to inventory existing content and analyze it for opportunities and gaps. For large, content heavy projects, the automated analysis made possible through through functions and formulas yield more reliable results than manual operations. Clients often find insights based on these quantifiable findings more credible and persuasive when they are used to back up strategic recommendations. Claiming “your headlines are too long” is more convincing when you can add “your average headline is 280 characters–too long to tweet.”

  • In later implementation phases, spreadsheets are an ideal way to record and deliver taxonomies and tagging recommendations, model new CMS content types, and to clean up and map existing content to new structures.

  • For ongoing content creation and publishing, I use spreadsheets to create editorial calendars, manage workflows, and track performance. There are other, more specialized tools out there, but a spreadsheet can get the job done well.

Today, I’m sharing a few functions and techniques I’ve found helpful when auditing content in the beginning phases of a project. In later posts, I’ll share tips and templates related to implementation and ongoing content management.

NOTE: The following tips assume some familiarity with content strategy, spreadsheet programs and working with formulas and functions. The goal is not to offer a comprehensive tutorial in how to use the relevant functions, but to offer ways of thinking about content in relation to functions. Except where noted, these tips are specific to Google Spreadsheets, but similar techniques may be used within Excel and Numbers.

Content inventory building

Ideally, I begin content strategy projects with an inventory of all existing URLs, sourced straight from the CMS and complete with page titles, related tags, publish dates, copy, and other useful attributes. Often however, technical obstacles or project management hang-ups leave me with an incomplete or unreliable inventory, or nothing at all. When that happens, I look for shortcuts to compile my own.

In some cases, a site crawler tool such as Screaming Frog SEO spider provides a good head start. However, while crawlers show how machines see a site’s content in great detail, they offer less insight as to how the content is organized for human eyes. I find that by manually copying and pasting navigational menus, topic-based aggregate pages, and other lists of links from a live site, I am able to extract URLs and compile a usable list of pages, organized as a user would actually encounter them.

Here’s how I do it:

First, I highlight the list of links to include in the page inventory then paste the copied text in an empty sheet. Depending on the formatting of the text, the list of links should appear in a column, one link per cell.

Then, with the list of links in a spreadsheet column, I use one of the following methods to extract the URLs from the linked title copy. The goal is to end up with a column of link text and a column of corresponding URLs.

  • Option 1: IMPORTXML: Extracting URLs from linked text using importXML is a multi-step process. First, publish the list of links to the web. Then, after the sheet is published, use ImportXML to import the extracted link text and URLs back into a new sheet. It may take a try or two. I’ve had luck with these detailed instructions: IMPORTXML process. Note: this link also includes instructions for checking links for 404s, another useful operation for content strategists.

  • Option 2: Get URL (a VBA Function): If the ImportXML function does not work to extract the URLs, try using GetURL, a custom Excel VBA (Visual Basic Application) module. A search for “extract URL from linked text” will turn up a number of tips and tutorials to achieve this. I’ve had success following these instructions.

It can take some finagling, but with patience and experience, you can extract a usable page inventory, which can then be analyzed for content patterns, gaps, and opportunities.

Content analysis

Once I have a usable content inventory, I’m ready to start looking for trends, outliers, exemplars, and problem content. I use a number of functions and formulas to extract meaningful categorizations from titles, content counts, get basic metrics like character maximums, and identify pages that just need to go.

Here’s a few functions I find useful for content analysis:

COUNTA

The COUNTA function returns a count of the number of values in a dataset. Useful for getting and maintaining a tally of content by category. For example, if I have a column labeled “region” in my inventory, I might use COUNTA to keep track of the number of content items per region

COUNTIF

COUNTIF returns a conditional count across a range. Useful to tally content items that contain a specific word or phrase. For example, if I suspect your body of content is too promotional, I might use COUNTIF to get the number of titles that include the brandname.

COUNTUNIQUE

This function counts the number of unique values in a list of specified values and ranges. Useful to identify duplicate content items.

LEN

Finally, LEN returns the length of a string of characters. Useful for finding averages and extremes in title lengths and other content attributes.

Content clean-up and formatting

In the process of analyzing my content inventory, I generally find new attributes that I want to record or complexity I want to eliminate because it is not meaningful. For example, I may want to create a new column to categorize all content with a specific word in its title. Or I may want to generate a recommended URL in a consistent way. Going through line by line can be tedious, so I use functions to automate where possible.

CONCATENATE

This function can be used to consolidate content across fields and to append text or punctuation to the words. using this function can make quick work of reformatting content for a change in CMS or testing rules for dynamic content creation.

TRIM

TRIM removes unnecessary text from a batch of cells. It is useful for cleaning up URLs with messy extensions or cutting out redundant copy.

HTTPResponse (custom script function)

This function automatically check linked text for for broken links.

__

These spreadsheet hacks are only a few examples of the techniques I use in the early phases of content strategy–during research and discovery. There are countless ways spreadsheets can be used to help make every phase of content strategy more efficient and effective, and more data driven.

Below are a few best practices to remember as you begin to use spreadsheets for your content strategy needs:

  1. Formulas can slow down a spreadsheet. When I’m confident with my results, I copy the column or row and paste special (values) to replace the formula populated cells with just the resulting values. That way the sheet won’t have to recalculate every time I open it. If I want to preserve my formulas in case I need to revisit, I make a copy of my sheet to archive.

  2. It’s easy to get carried away once you get started with formulas. I’ve spent many an hour clicking through stackexchange threads in search of the perfect macro. If I’m headed down that rabbit hole, I make sure efficiencies I might gain from automation are worth the time spent. Often it can be quicker (if less satisfying) to do manually.

  3. Label everything. I never remember what my magic formula does, where I got it, or how it works. The only thing worse than going down the rabbit hole once is having to retrace my steps, so I note links to instructions and put descriptive headers on on my columns.

Finally, it is a truth universally acknowledged that a single content strategist, in possession of a good content inventory, must be in want of a formula–but please maintain your sense of decorum and try not to fall too hard for your spreadsheets.