Continuous Data, Schema Data and Identity Velocity
Continuous Data, Schema Data and Identity Velocity - Boy, there's a mouthful!
When you start datablogging you quickly realize that the things you track fall into one of two camps:
1) Continuous data
2) Schema data
An example of continuous data is a Body Weight Log. If you make a Body Weight Log entry every day they all track one thing, body weight, over time. You build a continuous history of that one thing. You can chart and graph it, relate it to other things, use it as a benchmark, etc.
An example of schema data is a Movie Review Log. If you make a Movie Review Log entry every day they all track separate things of the same type... they're all movies but never the same movie. Over time you build a collection of movie reviews.
The role of continuous data is to define your identity. It helps you track things that you do over time. How far you ran last week and how far you ran this week. How much you weighed in 1997 and how much you weigh today.
An interesting common-sense outcome of this is that identity has velocity. People change over time. You can't capture a person in a single static profile. You have to look at where they've been.
The role of schema data is to define your place in the group... the group being whatever social network, blogosphere or society you define yourself as being part of. When you review a movie and give it a 7 of 10, you then see that the rest of your group gave it a 3 of 10. Suddenly you see where you stand with the group.
This means that identity has common metrics. People share data on a common thing, like the same movie, and can then see where they stand in relation to one another. As you increase the number of data variables you can get to multi-dimensional spaces which define the group you're part of. And your data pegs you at a point.
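To make "your data pegs you at a point" a little more concrete, here's a rough sketch in Python. It assumes each person's ratings are a simple dictionary keyed by the thing being rated, and it measures your position as your rating minus the group's average on each shared item. The function name and the sample data are made up for illustration; a real system might use a very different distance measure.

```python
from statistics import mean


def position_in_group(my_ratings, group_ratings):
    """Return, for each item we both rated, my rating minus the group's mean.

    my_ratings:    {item: my rating}
    group_ratings: {item: [ratings from other people]}
    Each item is one dimension; the result is my offset from the group
    along every dimension we have in common.
    """
    return {
        item: my_ratings[item] - mean(group_ratings[item])
        for item in my_ratings
        if item in group_ratings and group_ratings[item]
    }


# Hypothetical sample data: two movies, three other reviewers each.
mine = {"Movie A": 7, "Movie B": 5}
group = {"Movie A": [3, 2, 4], "Movie B": [5, 6, 4]}

offsets = position_in_group(mine, group)
print(offsets)
```

Every additional shared item adds another dimension to the space, which is all "multi-dimensional" really means here.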
To summarize:
1) Continuous data
- Tracks one thing over time
- Creates a history
- Defines your identity
2) Schema data
- Tracks many things of the same type
- Creates a collection
- Defines your identity's place in the group
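If it helps to see the two kinds of data side by side, here's a minimal Python sketch. The class names, fields and sample entries are my own assumptions for illustration, not how any particular datablogging tool stores things.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ContinuousEntry:
    """One reading of a single tracked quantity, e.g. body weight."""
    day: date
    value: float


@dataclass(frozen=True)
class SchemaEntry:
    """One item in a collection of same-typed things, e.g. a movie review."""
    day: date
    subject: str  # which movie this entry is about
    rating: int   # score out of 10


def history(log):
    """Continuous data collapses into one time-ordered history."""
    return [(e.day, e.value) for e in sorted(log, key=lambda e: e.day)]


def collection(log):
    """Schema data collapses into a collection keyed by subject."""
    return {e.subject: e.rating for e in log}


# Hypothetical sample logs.
weight_log = [
    ContinuousEntry(date(2005, 1, 8), 198.5),
    ContinuousEntry(date(2005, 1, 1), 200.0),
]
movie_log = [
    SchemaEntry(date(2005, 1, 1), "Movie A", 7),
    SchemaEntry(date(2005, 1, 2), "Movie B", 4),
]

print(history(weight_log))
print(collection(movie_log))
```

The difference shows up in the shape of the result: the weight log is one series you can chart, while the movie log is a set of separate things you can compare with other people's.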
So, what technologies are at play with the various types of data?
Reger.com allows the collection of both kinds, but I'd have to argue that people find the most value from continuous data. They see trends in their own identity, get charts and graphs.
The challenge of finding value from schema data involves critical mass. You have to have enough people tracking the things you're tracking to get a network effect that allows you to properly define your place in the group.
The structured blogging effort and sites like microformats.org seem more focused on schema data. They're trying to incite a critical mass of data so that people can experience social network effects. Most of their formats are intended to allow many different people to provide data on the same thing... i.e. the same movie, the same restaurant, the same song.
This is important work and I applaud them for it.
I'm not one for predicting, but let's say that within two years you'll have millions and millions of blog entries with what I'm classifying as schema data. Mashup Web 2.0 companies will pop up attempting to connect that data. Search engines will provide quantifiable search capabilities. Aggregation sites will allow you to see reviews of a single thing from many people.
But this is still treating identity as static. By aggregating people as single blog entries we're allowing a person's identity to be defined by that one entry, at least in this one case... in this one search operation... or this one perspective of a restaurant.
Identity has velocity. So the next phase is that the social network effect will happen across concrete continuous data. Aggregators and search engines will allow people to query based on history. For example, "show me people who weigh more than 275lbs but have lost at least 50lbs over the last 2 months." This query incorporates a stream of continuous data about a person. A person is no longer defined simply as "weighing 275lbs"... which could imply "weighing 275lbs because they're lazy and have been gaining weight for 5 years now"... or could just as well imply "weighing 275lbs but eating healthily and on the up and up."
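Here's roughly what that weight query could look like in Python. This is only a sketch under my own assumptions: each person's history is a list of (date, weight in lbs) pairs, "2 months" is treated as 60 days, and the loss is measured between the first and last readings inside that window. The names and sample people are hypothetical.

```python
from datetime import date, timedelta


def readings_in_window(history, today, days):
    """Return the (day, weight) readings from the last `days` days, oldest first."""
    start = today - timedelta(days=days)
    return sorted((d, w) for d, w in history if start <= d <= today)


def heavy_but_losing(people, today, min_weight=275, min_loss=50, days=60):
    """Names of people who weigh more than min_weight now but have lost
    at least min_loss over the window. Needs two readings in the window."""
    matches = []
    for name, history in people.items():
        window = readings_in_window(history, today, days)
        if len(window) < 2:
            continue  # no trend without at least two points
        first, latest = window[0][1], window[-1][1]
        if latest > min_weight and first - latest >= min_loss:
            matches.append(name)
    return sorted(matches)


# Hypothetical sample data.
today = date(2006, 3, 1)
people = {
    "alice": [(date(2006, 1, 5), 340), (date(2006, 3, 1), 285)],  # heavy, losing fast
    "bob":   [(date(2006, 1, 5), 270), (date(2006, 3, 1), 285)],  # heavy, gaining
    "carol": [(date(2006, 1, 5), 300), (date(2006, 3, 1), 240)],  # losing, under 275
}

print(heavy_but_losing(people, today))
```

Notice that alice and bob weigh exactly the same today. A static profile can't tell them apart; the history can.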
The perspective of identity velocity allows us to much more clearly interpret data from people and translate it into our daily lives.
And there's also a crossover between continuous and schema data: review the same movie twice and you've set two data points... your schema data can be used to calculate identity velocity. How interesting would it be to gauge yourself on political issues like abortion over the course of your life to see how your views change?
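As a back-of-the-envelope illustration of that crossover, the Python sketch below takes schema entries as (date, subject, rating) tuples and, for any subject rated more than once, computes the change in rating per year between the first and last entry. The tuple layout and the per-year unit are assumptions I picked to keep the example small.

```python
from collections import defaultdict
from datetime import date


def rating_velocity(reviews):
    """Return {subject: rating change per year} for subjects rated more than once.

    reviews: iterable of (day, subject, rating) tuples.
    """
    by_subject = defaultdict(list)
    for day, subject, rating in reviews:
        by_subject[subject].append((day, rating))

    velocity = {}
    for subject, entries in by_subject.items():
        entries.sort()  # oldest first
        (first_day, first_rating), (last_day, last_rating) = entries[0], entries[-1]
        span_days = (last_day - first_day).days
        if span_days > 0:  # a single entry has no velocity
            velocity[subject] = (last_rating - first_rating) / (span_days / 365.0)
    return velocity


# Hypothetical sample data: Movie A is reviewed twice, Movie B once.
reviews = [
    (date(2000, 1, 1), "Movie A", 8),
    (date(2002, 1, 1), "Movie A", 6),
    (date(2001, 6, 1), "Movie B", 5),
]

velocities = rating_velocity(reviews)
print(velocities)
```

Movie B drops out because a single review is still a static snapshot; it takes at least two points on the same subject to see movement.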
The steps we're taking today are important, but they're just the start.