There are endless topics, and then there’s the topic of data. It is such a huge and complex concept, used in so many different contexts, that it seems foolish to try and encompass all of its significance in a single sentence or a single article even. Which is luckily not what I’m trying to do here.
Before I get into the reason why I’m writing about this in the first place however, I do have to push back my stereotypically taped up glasses for just one moment and pull out the book of word meanings. Data, says the Cambridge English Dictionary, is ‘information collected for use’. But that doesn’t say much really, because explaining something with a synonym doesn’t really get you anywhere in this case. So, we need to look up what information is, then. Well: ‘facts about a situation, person, event, etc.’ But that doesn’t really quite cut it either, because a) that’s yet another synonym, and b) another big topic which unfortunately for some reason has become quite contentious in its own right. So, going further, for facts we have: ‘Something that is known to have happened or to exist, especially something for which proof exists, or about which there is information’, and now we’re almost right back to where we started. Of course this is all just semantics, and you can go in circles with every big concept while trying to define it, so perhaps this time this isn’t the best method to try and introduce the topic I’m writing about.
Perhaps exploring the context in which data is used would serve as a better way of introducing the term. Running the word ‘data’ through three different word association programs is a fun exercise on its own, but it also helps to position words in their respective, most widely used contexts, and it yields fairly similar results, at least at first glance. For that I used the Word Associations Network, Related Words and Visuwords, the last of which is amusing in that it shows you words in the form of maps and bubbles that move around funny.
The main associations, which isn’t surprising at all, are related to IT, science and maths. It’s a bunch of terms related to statistics, engineering and computing. There are data files, which every PC user is familiar with, data encryption, which brings about visions of infamous ex-NSA employees stuck in Russia, and, related to that, privacy. There are some connotations with academic activities, questionnaires, estimations and adjectives related to things being experimental and empirical. I would throw in business there somewhere as well, for good measure. You know, data-driven this, big data that. Although personally, I associate data with marketing as well, since that is how I first came to learn about it in general, during PR and media classes, where we used data to gain consumer insights and such things.
Data is presented in various, sometimes contradictory contexts as well, at times being a scary weapon to condense experiences into dehumanizing graphs and numbers, sometimes the hero of medical or other scientific breakthroughs, and, perhaps most stereotypically, the bane of any student’s existence, at least of those who had to suffer through statistics courses in college.
But there’s so much more to it, too. Things perhaps not directly perceived as data are, in fact, exactly that; like a tachometer, which is just a dynamic, visual representation of certain processes, measured in numbers, or the pages in a book and the table of contents that maps onto them. It’s all data, visualized in one way or another. Following that train of thought, however, in the end everything that you can describe is data, since description itself is a form of data presentation.
So, the endless aspect of the topic certainly seems to become evident at this point: having written about 600 words already, which is one standard Google Docs page, I haven’t really said or explained anything. I’ve just thrown different reference points into the air and hoped it would make sense somehow.
So, to make things easy, and to move along to the actual point, I’ll say this: when I say data, and when I think about it, I think about the TED talk by David McCandless called ‘The beauty of data visualization’. His main point is centered around how the presentation of data is important, how we read data and how to make it engaging and beautiful. As much as I agree with his points and with how important information design is whenever we talk about data (information design being a field so fascinating I feel giddy just thinking about it), what I found more interesting in his talk were the actual results he was talking about: how he observed patterns, described them and painted pictures not only with words but also with graphs and numbers, from spikes in news articles about violence allegedly caused by video games to what time of year people most often break up. We were shown this video during some sort of design or research class, I can’t remember which, but it certainly sparked my interest in learning about data, data manipulation and presentation, and it cemented my understanding of what data can actually be: that it goes beyond the stereotypical associations related to it, that it can simply be anything.
Having written all that, I do need to specify what I’m going to cover in the next several paragraphs. Of course the main goal is to share something interesting, but more precisely I want to point out what exactly it is I like about data and the processes surrounding it. I’d also like to share a few more examples of where and how it can be used, and finish off with one particular example which is very personal to me, literally and functionally.
- What’s cool about data
I was never really good with numbers, more of an artsy/humanities-sort of person, though I did enjoy maths class and all the DNA stuff in biology, which was also basically maths. During my time at university, and being exposed to different meanings of data however, I did become interested in those numbers, and I saw that it was, first and foremost, incredible fun. The research processes I got to take part in at that time were nothing short of a treat. It starts even before data collection, because it is simply hilarious when you sit around with your friends and come up with ideas for research and specific survey questions. Then there’s the actual physical collection of data, where you go around feeling all sciency and important when you ask people to fill out your questionnaires. Next you need to gather all the data in one place, enter it into a file with bloodshot eyes, listening to weird videos in the background, pulling all-nighters. But then you see your spreadsheets flooding with actual data, which you then get to organize and categorize and then, at the end of a very arduous and painful process you get your results, summations, percentages and graphs based on your numbers. Finally you can interpret those results and see what you have accomplished by having an idea for a research project which, through the magic of hard work, turns into actual findings. Congratulations, you did a science. The fruits of your work don’t have to be mind-blowing or life-changing by any means but still, it’s a good exercise, a test of your abilities, a place to learn, everything. Which is, you know, fun.
Back then I thought that would be the only place where I would have fun manipulating data. At university, during crazy classes, where we thought we were the kings and queens of knowledge and progressive ideas. The real world isn’t like that, I thought. Numbers and tables aren’t fun when you sit at your desk in a stuffy office, bound by deadlines and meetings that could’ve been emails.
Well, turns out – they are. I wanted to work in a corporation for many reasons. I applied for a job where my German skills would be required, so it was partially that, to be able to schnacken as much as I wanted, but also to see what the fuss was all about. I had very romanticized ideas about working in an office, informed by years of consuming American pop culture, though of course I also needed to make some money. But despite all that, yes, the idea of being an office worker tasked with menial, repetitive, mind-numbing exercises seemed thrilling to me. I was really looking forward to playing around in Excel every day, and to learning how to use it beyond the few formulas I had taught myself.
And it was a very big part of why I liked my job. Even though I was dabbling in finances, tracking invoices and doing all sorts of things I have absolutely negative interest in, being able to sort through piles of files and organize numbers was awesome. In the end I didn’t last very long at that office, realizing I’d rather avoid the stressful conditions created by deadlines and office politics, but I do look back fondly on overtime spent optimizing invoice trackers.
So, fun is the first reason why I think data is cool. The second one is knowledge. Because that is what you use data for and what you get at the end. No matter if that is scientific knowledge, consumer insights, bases for making business decisions, or something as simple as creating a personal monthly budget – the goal of any data-related enterprise is gaining knowledge of some sort. One reason for wanting to gain knowledge, in turn, is curiosity – and as you may have already concluded from this very blog, I’m all about being curious.
What is so great about the specific kind of knowledge that is possible to be gained from data processing is how from a large quantity of seemingly unrelated or uninteresting input you can extrapolate a small amount of simple and suddenly interesting observations. It’s an opportunity to view complex processes in understandable ways and to gain insight into issues which are impossible to understand otherwise.
I’m a big fan of knowledge in general, and especially useless knowledge. Or what is considered to be useless anyway, because I don’t think there is such a thing. Of course there are situations where certain knowledge is more relevant in specific contexts, but to universally decide that particular types of information are meaningless is a bit rude, no?
And that’s also the beauty of data, that it can be employed in order to learn about all sorts of things, big and small, universal or niche, ‘significant’ or ‘insignificant’. It’s also a great tool for observing patterns, and patterns are another reason why I like data, namely because they’re a manifestation of order.
That might sound a little odd, or unreasonable, considering that data generally means large collections of bits and pieces that are specifically not in order, but fortunately that’s not the whole story. Data, and especially evaluating data, creates order. Every cell in a spreadsheet has its place, every input has its category and at the end everything is packaged into simple visualizations that provide clarity and understanding. Ideally, anyway.
That is also perhaps why I like design, because as messy as it may get sometimes, solving problems in general is also an exercise in creating order. I like things to be symmetrical, and to be numbered and organized, and manipulating data lets me control a part of reality in a way where I achieve exactly that.
Perhaps I’m a weirdo for finding a sense of peace and relief in appreciating data this way, but if that’s what it takes to be peaceful and relieved, well, then I guess there are worse things. It’s also good practice for ordering other things in one’s life, recognizing patterns in seemingly unrelated things, examining details and being able to identify individual features within a bigger picture. All good and useful things, I imagine.
Then, of course, the epitome of data order – information design. The practice of how to present information in a way that promotes understanding. This particular trade is related to graphic design and data visualization, though its core aim is to make information efficient, not just visually pleasing. Even though succinct, clear data presentation in itself is already very beautiful, I think.
- Examples of cool data
There are tons of interesting examples of data projects from all around the world, related to almost any subject. Choosing just a few to share here is truly a difficult task, which is why I’ve decided to pick only three which are very different, with different purposes, areas and scales, though all with a somewhat personal background.
First up is actually a website, or a computing engine, actually, which serves to answer questions using all kinds of knowledge and algorithms. I don’t know how any of that works but boy, is it fun to play around with. I remember spending hours when I was a child just typing in queries and reading through the results, though it all started when my brother showed it to me a long time ago and we just sat there and looked for stuff together. The platform in question is Wolfram Alpha, a computational knowledge tool, basically Google for sciency calculations and such.
I’m sure the engine provides a great service to people who know how to use all of its features, and I can imagine it works well to help with research and analyses of issues requiring large data sets. But again, I am no expert, I’m just a kid at heart, curious to find out what the oldest dog in the world was called (Bluey, at 29 years old) and what day of the week it was on May 17th, 1678 (it was a Tuesday). Those, and many more things, can be explored using Wolfram Alpha, though I remember the piece of information that stuck with me the most was when, while dabbling with some search results, my brother and I found out that the smallest country by population is the Pitcairn Islands at 56 inhabitants, which was a fact I mercilessly showed off whenever I could for several years. A quick check now revealed that in fact the smallest inhabited territory at the moment is South Georgia and the South Sandwich Islands, at a staggering population of 30, which is somehow even better.
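For the curious, the day-of-the-week answer is easy to double-check at home. Here’s a minimal sketch in Python, which is certainly not how Wolfram Alpha does it; the function name is my own invention, and it assumes the Gregorian calendar extended backwards (which is what Python’s `datetime` module uses), so places that still followed the Julian calendar in 1678 would have called that day something else.

```python
from datetime import date

# Weekday names in the order used by date.weekday(): Monday is 0, Sunday is 6.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")

def weekday_name(year: int, month: int, day: int) -> str:
    """Return the weekday name for a date in the proleptic Gregorian calendar."""
    return WEEKDAYS[date(year, month, day).weekday()]

print(weekday_name(1678, 5, 17))  # prints "Tuesday"
```
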
What I like about Wolfram Alpha is that it’s such a great tool for gaining knowledge but in a very fun and accessible way. You need to get used to formulating inputs properly, but that’s just a matter of trying out a few for a couple of minutes. Again, perhaps I was just a weird child, but finding out random facts about the world, calculated by an extremely sophisticated machine to me seemed like a great way to pass the time. Looking back on it now I was possibly always somewhat interested in data, I just didn’t realize it, and only when I got to university to actually experiment with it myself did the spark alight anew.
Speaking of which, as I’ve mentioned, we dealt with a lot of different data at university, and it was all pretty fascinating. One research project stood out to me in particular, so much so that I remember it to this day. It was nothing terribly earthshaking or complicated but it did one very important thing – to me at least it showed that you can find data sets wherever, that anything can become a research project and that you’re only bound by the limits of your own imagination when it comes to data research. Which is nice to know, given that data is usually associated with dry knowledge and maths.
Of course, the results of any data project will only be as impressive as the scope and area of research you choose but that’s okay, every kind of insight counts. The research in question was an unpublished pilot study shown to us during class concerning the naming of residential investments in our town. Quite a niche topic, if I’m honest, but it intrigued me nonetheless. The project involved collecting all available names of new investments in accordance with specific temporal considerations and yielded a curious insight into something I have absolutely no idea about and into what the town we live in chooses to name its neighborhoods.
I don’t know if it’s vital to know the range of name types investments get, all the way from street names through nature names to names in foreign languages, or that someone thought it was wise to call a residential settlement ‘Silence House’ (original name, no translation) but it doesn’t matter really. More so than the content I appreciated this study for exemplifying the world of research possibilities available in a chaos of atomized and seemingly insignificant information.
Lastly, I wanted to showcase some sort of big study or report, concerning some important global matter. The Pew Research Center is a good source of those, though it mainly contains American-related data. For generating various international statistics I can recommend Statista, which is also a good resource for academic or business purposes, or for clicking around to find some trivia to boast about. I found many other statistics websites and academic papers but nothing I actually wanted to settle on; I wanted to find something really special. And I say when in doubt, go back to what you like – and in my case that’s almost always the Lord of the Rings.
And lo, there actually is a very detailed and complex online enterprise showing the world of Middle-Earth from an entirely new perspective, through interactive maps, timelines, and what is most important in the context of this article, statistics. The undertaking is called LOTR Project and it is the missing piece in my mosaic of LOTR nerdom that I didn’t know I so desperately needed. I mean, I have all the books and the merch and that one time I cosplayed as Gandalf at school, so adding another layer of data-driven interpretations of Tolkien’s universe seemed just about right.
Apart from graphs visualizing data related to the actual characters in the novels, including life expectancy by race and which character had the most children (and it was Samwise, the old slut), there is also an entire section dedicated to a thorough analysis of the books themselves. And we’re talking an engine for keyword frequency in the Silmarillion, Hobbit and each part of the trilogy, specific word count and density, as well as things like tag clouds with common words and visualizations of chapter lengths. Not everyone will find this particularly interesting but I have to say I’m pretty much freaking out right now.
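To give a rough idea of what keyword frequency and word density mean in practice, here’s a small sketch in Python. It is not how LOTR Project does it (I have no idea what runs behind their site); the function name, the simple letters-and-apostrophes definition of a word and the sample sentence are all my own assumptions.

```python
import re
from collections import Counter

def word_stats(text: str, top_n: int = 3):
    """Return (word, count, density) for the top_n most frequent words.

    Density here means the word's share of all words in the text.
    """
    # Assumption: a "word" is any run of letters or apostrophes, lowercased.
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    counts = Counter(words)
    return [(word, count, count / total)
            for word, count in counts.most_common(top_n)]

# Made-up sample sentence, not a quote from the books.
sample = "The ring went to the mountain and the ring was gone"

for word, count, density in word_stats(sample, top_n=2):
    print(f"{word}: {count} times, {density:.0%} of all words")
```

Run on an entire book instead of one sentence, the same counting would give you the raw material for a tag cloud.
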
What is so wonderful about this project is not only the dedication and detail put into every graph and statistic, giving the wondrous lore of Middle-Earth entirely new dimensions, but it also shows yet another option for producing vast data collections and compelling insights resulting from the analysis of said collections. The site also offers a blog with articles related to Tolkien’s work and I certainly learned a new thing or two there, so I can only recommend it. I am a big fan of fan labor in general, though especially so when it results in something so practical and at the same time lovingly magical.
- One specific example of very cool data
Now, here we are, at the end of all things – or just this article. The last, and yet extremely significant aspect of the topic at hand that I’d like to touch on is data-fueled personal records. I know that sounds like another weird thing, but bear with me for a second.
For as long as I can remember I have been an avid diary-keeper, though at some point I wanted something more specific, more regular, less wordy and more quantifiable. I was also basically sad all the time and I wanted to check whether I was depressed, which I thought I could do by conducting pseudo-scientific self-experiments with Excel. So, a couple of years ago, I created a spreadsheet with space for tracking all sorts of variables concerning my life, including mood, exercise, travel and other things, which has since developed into a pretty massive file with a couple of columns of information to fill out every night.
I thought I was alone in this practice, and that view was reflected in the somewhat astonished faces of the people I shared my project with, but then it turned out that there actually is such a thing, that there’s a name for the variety of practices related to quantifying one’s life. The general term is self-tracking, though many other names are also used, such as life-logging, the quantified self, personal analytics, personal informatics, self-surveillance and self-optimisation. Here’s also the definition of it as written in the Oxford English Dictionary, so that we’re all on the same page: “The practice of systematically recording information about one’s diet, health, or activities, typically by means of a smartphone, so as to discover behavioural patterns that may be adjusted to help improve one’s physical or mental well-being.”
Quick sidebar: I was, and still am, so into the practice that I actually wanted to make it the subject of my PhD dissertation. I applied with a research proposal outlining the history and ramifications of self-tracking, which probably would’ve turned into a cool thesis in the end, but I failed miserably at the interview. Amidst several mistakes on my part, which I’m not ashamed to admit, the boomer professors of the jury didn’t appreciate it when I suggested that some Polish prophet or whatever, when writing diaries, was actually engaging in the practice of self-tracking. The professors literally scoffed at that, which I get, maybe I shouldn’t insinuate such things, even though I was right (qualitative, or long-form, self-tracking is still self-tracking).
The topic of these kinds of data projects is incredibly fascinating. Now what most people associate with it are Fitbits and counting steps, or freaks measuring every aspect of their existence, but actually it’s a varied and tremendously compelling field of research, as well as a measure of achieving some sort of personal benefit. People have probably been self-surveilling various aspects of their lives for centuries, though the practice has definitely taken a more specific shape with the technological advancements popping up in the last couple of decades.
And so the story of technologically mediated self-tracking begins at the end of the 19th century, with the introduction of the medical scale, used for framing health in terms of numbers for the first time. Later those turned into penny scales available on street corners in many US cities. Those public scales would tell you your weight after you put in one penny, of course, and they would advertise the service with claims like ‘Reduce your weight this new scientific way!’. Later the scale became smaller and moved into many residential homes, where it continued to foster the movement of quantifying one’s health, with claims such as: ‘The scale that tells the truth about your figure smartness/about your family’s health’. And yes, there are vintage ads displaying housewives weighing themselves with shocked faces.
Now we’re in a place where there are gadgets for monitoring just about anything, including but not limited to your location, health, physical activity, sleep, diet, mood, social interactions, social media behaviors, computer usage, driving habits, spending, work productivity, environmental factors, the works. A truly wild world, and one worth looking into in my view. Especially because there is also a bunch of scientific research about the subject, and even a Quantified Self Institute in San Francisco, so there’s plenty of knowledge to be gained in this area.
But scientific inquiry isn’t why I started self-tracking in the first place or why I keep up the practice to this day. At this point I don’t actually think much about why I do it at all, it’s simply turned into a habit. And I like collecting and analyzing the entire year’s input at the end of every year, and writing a small report for myself, and tracking my development over time. So the fact that I enjoy the exercise is one reason why I do it.
But have I actually learned anything useful? I didn’t find out whether I was depressed, as well-intentioned as it was to try, that’s something a spreadsheet can’t tell ya, you need to go to a specialist for that. And I don’t think my life has changed much since I started doing it either, other than I need to spend 2 minutes a day filling out a couple of cells (which sometimes I don’t do and then I end up back-tracking and filling out rows from weeks ago, but whatever).
But it still provides some added value, apart from being just a fun thing to do. It’s a way to hold myself accountable when I set goals, and to see at one glance how much I have to be grateful for. Even when it feels like a shitty year I can see how much of it I’ve actually been happy, or how much I got to travel, or whatever else I keep track of. It’s a very specific form of self-care I think, especially because I have a space to jot down fond memories or events that I might otherwise forget. I have a few lines of reference for every day I get to spend on this Earth, and so every day counts somehow. Not to mention that I have a resource to check when stuff actually happened, which I’ve totally used before to prove I was right when talking with people about the past.
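If you’d like to try a year-end tally like that without Excel, here’s a minimal sketch in Python. The column names (`date`, `mood`, `travelled`) and the four sample rows are made up for illustration; my actual spreadsheet looks different, so treat this as one possible layout and not a recipe.

```python
import csv
import io

# Made-up sample log; the column names are assumptions, not a standard.
LOG = """date,mood,travelled
2021-01-01,happy,no
2021-01-02,sad,no
2021-01-03,happy,yes
2021-01-04,happy,no
"""

def summarize(log_text: str) -> dict:
    """Count logged days, the share of happy days and the number of travel days."""
    rows = list(csv.DictReader(io.StringIO(log_text)))
    days = len(rows)
    happy = sum(1 for row in rows if row["mood"] == "happy")
    travel = sum(1 for row in rows if row["travelled"] == "yes")
    return {
        "days": days,
        # Guard against an empty log to avoid dividing by zero.
        "happy_share": happy / days if days else 0.0,
        "travel_days": travel,
    }

print(summarize(LOG))  # {'days': 4, 'happy_share': 0.75, 'travel_days': 1}
```

With a real log you would read the text from an exported CSV file instead of the inline sample.
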
Alright. This was a very short overview of what self-tracking is and how and why I use it, so there’s lots more to find out, for me as well, but I totally get the fascination with it. Especially when someone is fond of data from the get-go it’s easy to become a little obsessed with tracking all the different numbers that signify different aspects of one’s life. I wouldn’t recommend that anyone develop a compulsive passion for quantifying their life, but it is something useful and interesting to try out for sure.
So, now we really are at the end. I’ve only barely scratched the surface of the topic of data with this article but at least I got to share some new interesting things and delve into some personal record-keeping that goes beyond writing a journal.
There’s lots more to say about all of it and this is surely not the last time I intend to incorporate data-related stuff into my writing; but it’s been a great pleasure to point out some positive things about it all the same. I’m not the most scientifically knowledgeable person, and I struggle with having catastrophic visions about AI taking over the world, but like with anything, things have good and bad sides, even data. And I’m happy to gush about those good ones for way too long every once in a while.
Either way, I can only recommend that everyone get involved with data in one way or another, not only because it’s become a ubiquitous part of Digital Age life, but also because it’s a great pastime. Whether it’s exploring statistics about your favorite fictional universe or keeping a systematic account of your existence – it’s one hell of an intellectual ride.
Introduction:
- Cambridge English Dictionary, Data, https://dictionary.cambridge.org/dictionary/english/data
- Cambridge English Dictionary, Information, https://dictionary.cambridge.org/dictionary/english/information
- Cambridge English Dictionary, Fact, https://dictionary.cambridge.org/dictionary/english/fact
- Word Associations Network, https://wordassociations.net/en
- Related Words, https://relatedwords.org/
- Visuwords, https://visuwords.com/
- TED-Ed, The beauty of data visualization – David McCandless, https://www.youtube.com/watch?v=5Zg-C8AAIGg&ab_channel=TED-Ed
- Information is Beautiful on Instagram, https://www.instagram.com/infobeautiful/
What’s cool about data:
- Wikipedia, Information design, https://en.wikipedia.org/wiki/Information_design
Examples of cool data:
- Wolfram Alpha, https://www.wolframalpha.com/
- Olender K., Semantyka inwestycji mieszkaniowych we Wrocławiu, pilot study, not published
- Pew Research Center, https://www.pewresearch.org/
- Statista, https://www.statista.com/
- LOTR Project, http://lotrproject.com/
One specific example of very cool data:
- Oxford English Dictionary, Self-tracking, https://www.lexico.com/definition/self-tracking
- Crawford K. et al., 2015, Our metrics, ourselves: A hundred years of self-tracking from the weight scale to the wrist wearable device, in: European Journal of Cultural Studies, Vol. 18(4-5)
- Lupton D., 2014, Self-tracking cultures: towards a sociology of personal informatics, in: University of Canberra Research Publication Collection, Faculty of Arts & Design
- Quantified Self, https://quantifiedself.com/