Data make things smarter

April 15th, 2010

We found these recently published ads from IBM, which offer an interesting point of view on how to use data to build things that are “smarter”. There are also two “Why Data Matters” videos that explain concepts such as “predictive” and “prescriptive” analytics as well as “data insights” towards a data-driven world.

 

How will we remember the past in the future?

March 30th, 2010

Over a year ago, Lynne Brindley, chief executive of the British Library, wrote an urgent article in The Guardian warning us of an emerging and growing black hole in our collective memory. With a wide array of human interaction increasingly mediated by bits and bytes over the internet, how to preserve our digital cultural heritage for future generations becomes a pressing matter. How should, say, a historian of a far-off future study what life was about today? What kind of records and artefacts are we leaving behind that will serve as evidence for that historian? While it is getting easier and more popular to publish and to communicate, to share pictures and to stay in touch with friends in the cloud, very few see the other side: the ephemeral nature of digitally mediated interaction, bereft of the incremental stability and persistence we are used to finding in physical objects. Of course, if certain Facebook or Flickr pictures of me end up in oblivion, I would not really mind. But what if Oscar Wilde had been a blogger? What if Virginia Woolf had published her work as a wiki? We have already lost the first email ever sent. We may lose a lot more.

The problem is clear, a solution non-existent. According to a report from 2008 by the RAND Corporation, the best we can do so far is to record digital artefacts as static snapshots bereft of their inherent dynamics and functionalities. Or as the report puts it, archiving these objects is like chasing a moving train. The Internet Archive is probably the most renowned project addressing this problem. With a database of about 150 billion snapshots of web pages preserved over a period of 16 years, it has recorded only a very small proportion of the whole WWW. The rest is gone. So one part of the problem is the amount of data.

Another problem is the fluid and ephemeral nature of online communication in particular, and of the artefacts it brings forth. In these cases, the question arises whether it is even helpful to talk about objects. Digital objects are performative rather than static, operations rather than things. They only function if processed by ICT and actually only exist as objects in a virtual sense – as emulations of objects. Snapshots of web pages are a way of objectification – a way of freezing the ephemeral into stability. However, by doing so, digital objects are preserved the way a natural history museum preserves living beings.

A Video about Information

November 13th, 2009

Information from MAYAnMAYA on Vimeo.

Getting Students To Think At Internet Scale

October 13th, 2009

The NY Times reports that researchers and workers in fields as diverse as biotechnology, astronomy, and computer science will soon find themselves overwhelmed with information, so the next generation of computer scientists will have to learn to think at Internet scale, in terms of petabytes of data. For the most part, university students have used rather modest computing systems to support their studies, but these machines fail to churn through enough data to really challenge and train young minds to ponder the mega-scale problems of tomorrow. ‘If they imprint on these small systems, that becomes their frame of reference and what they’re always thinking about,’ said Jim Spohrer, a director at IBM’s Almaden Research Center. This year, the National Science Foundation funded 14 universities that want to teach their students how to grapple with big data questions. Students are beginning to work with data sets like that of the Large Synoptic Survey Telescope, the largest public data set in the world. The telescope takes detailed images of large chunks of the sky and produces about 30 terabytes of data each night. ‘Science these days has basically turned into a data-management problem,’ says Jimmy Lin, an associate professor at the University of Maryland. (posted originally by Hugh Pickens)

IBM joins the race against Information Growth with a new information infrastructure portfolio

September 11th, 2008

Big Blue is moving to seize the business opportunities offered by the challenges of Information Growth. According to IBM’s press release of 8th Sept. 2008, it is launching the largest information infrastructure portfolio ever. It is goodbye to the good old client/server infrastructure, since IBM sees “the cloud” as the solution to all our problems. Well, that sounds nice. But which of our problems is IBM going to take care of?

Our problem is that we would like to take our information with us wherever we go, and we can’t. In an increasingly connected world, we end up creating digital information wherever we go, whether in the “real” or the “virtual” world, leading to a 16-fold growth of each individual’s “information footprint” by 2020. Again: each of us will have 16 terabytes (!) of information stored somewhere each year (!) from 2020 on. And it seems that this amount will increase even more after 2020.

So here comes IBM’s solution for our problems: new storage technologies, smarter applications, virtualization of services, cloudy scalability, and so forth. But let’s stop here for a moment and consider what we are actually talking about.

Those 16 terabytes of me, collected in 2020, is that information? No, it isn’t. It is only stupid data. I am wondering where those 16 terabytes will come from (in 2008, it will be only (!) 1 terabyte). It is hard to imagine that that amount of data will be produced by me when taking pictures, shooting videos, writing emails, and so forth. In this sense, the term “footprint” actually hits the spot: the majority of those 16 terabytes p.a. will not be produced by me but collected about me. So here comes the solution from IBM, and it is more of the same. More space for even more data. But how informative will that data be? Why should I want those 16 terabytes p.a. to follow me around the world?

Now, the term “information” is used without a lot of care. Those 16 terabytes are NOT information; they are mere data. The challenge of the future will be to keep the data informative – a challenge digital libraries are struggling with, for instance. One way would be to stop polluting the internet with data. In other words, one solution would be LESS data that is MORE informative, rather than MORE data. Developing new technologies for storing more and more data will increase the velocity of “data growth” without doing anything to make that data informative.

Competition on new uses of data by the UK Government

July 8th, 2008

The UK government, through the Power of Information Taskforce, has launched a competition asking the public to propose better ways to publish and mash up the non-personal information that the government collects and creates.

The UK government produces masses of data on what is happening around the UK: information on crime, on health, on education. However, this information is often hidden away in obscure publications and repositories.

The competition website offers information about the call and also points to many sources of the public data available.

McKinsey Quarterly: Meeting the demand for data storage

June 16th, 2008

I am attaching a recent study/analysis by The McKinsey Quarterly on data storage which has some interesting points worth taking into account in relation to the exponential growth of data warehouses in organizations:

- Large enterprises must manage storage more efficiently if they are to exploit opportunities created by new forms of information (such as detailed financial data, digital images in life sciences or video in media companies). IT managers have to cope not only with the high price of computer storage but also with the complexity of managing it; this issue has become more acute today with the use of non-textual media objects (images, video).

- New applications, such as more complex business analytics or the combination of diverse resources into a new one, are major contributors to the demand for additional storage capacity. For example, pharma companies are considering digital storage to archive all their imaging data. This could speed up getting new compounds to market, but it also doubles the storage needs and the complexity of managing them.

The study focuses mainly on data storage, but it is easy to see that any strategy of information ordering faces a double bind (Kallinikos 2006). As more data becomes available, more work is needed to organize the world of information; resolved in one place, the problem is pushed back to another.

BBC to create a web page for every episode of every television programme it has ever produced

June 16th, 2008

A recent news item in The Guardian reports that the BBC will digitize all its historic programs and make them available online. It seems pretty amazing to think that all of the BBC’s programs of the last 80 years will be available on demand. The initiative not only gives access to thousands of hours of content (around 160,000 web pages will be created as part of it), but also resurrects old content, putting it online and making it metadata-rich. However, this poses some questions regarding the way content is defined today in comparison to how it was defined decades ago. Categorizing something that was created 30 years ago is very different from categorizing something created today; the ways to find content should take this into consideration.

Video as a single, packaged product does not speak for itself; for video to be indexed and managed, a set of data collected about the content becomes a priority. The new media object is in itself not only the content but also what is said about it. As more video is in digital format, it has become critical to decide when, what and how to define the parameters of its metadata. The more information about it is embedded in it, the more usable the new media object potentially becomes in the future. However, we will never know how this content will be used in the future, so categorization always remains a bet rather than a certainty.

SSIT 8 videos online

May 11th, 2008

All talks from SSIT8 are now available online. Take a look and enjoy!

Photos from SSIT8 workshop

April 29th, 2008

Photos from a very successful 8th Social Study of ICT (SSIT) workshop “The Habitat of Information: Social and Organizational Consequences of Information Growth” are now available here. The event was organized on 25 April at LSE by Prof. Jannis Kallinikos and José-Carlos Mariatégui with the generous support of the Information Systems and Innovation Group (ISIG) at LSE (with special thanks to Frances White).