Blue Skies Somewhere for EduGeekers to discuss those far away ideals of major (or maybe minor) changes that will have a large impact on schools, the jobs of technical staff and learning.

26-06-2008, 02:00 AM   #1
SYNACK
Join Date: Oct 2007
Location: Auckland, New Zealand
Posts: 1,825

Google Jumpstarts Science - Changes the Scientific Method

The End of Theory: The Data Deluge Makes the Scientific Method Obsolete

I just read this article on how the sheer volume of data now available to organizations like Google will drive a new wave of science. Instead of running countless experiments and attempting to formulate a model that explains them, this new method simply analyzes the vast amount of data available and looks for correlations.

This effectively turns the whole world into the lab, with society itself running some of the experiments as people simply go about their day-to-day lives.

I don't think it will replace the traditional science lab anytime soon but eventually every science class may need access to a supercomputing cluster just to keep up.
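
As a rough illustration of "analyze the data and look for correlation", here is a minimal sketch that scans every pair of columns in a small table and reports the strongly correlated ones. The column names, the numbers, and the 0.9 threshold are all invented for the example; nothing here comes from the article.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / sqrt(var_x * var_y)


def strongest_correlations(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    hits = []
    for (name_a, xs), (name_b, ys) in combinations(columns.items(), 2):
        r = pearson(xs, ys)
        if abs(r) >= threshold:
            hits.append((name_a, name_b, r))
    # Strongest relationships first.
    return sorted(hits, key=lambda hit: -abs(hit[2]))


# Hypothetical toy data, invented purely for illustration.
observations = {
    "hours_of_daylight": [8, 10, 12, 14, 16],
    "ice_cream_sales": [20, 31, 39, 52, 60],
    "umbrella_sales": [30, 12, 25, 8, 27],
}
hits = strongest_correlations(observations)
for name_a, name_b, r in hits:
    print(f"{name_a} vs {name_b}: r = {r:.3f}")
```

The scan only says which columns move together. It says nothing about why, and with enough columns some pairs will correlate by chance alone.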

26-06-2008, 08:01 AM   #2
srochford
Join Date: Aug 2005
Location: London
Posts: 1,180

It's a fascinating idea and I'm sure there's value in it, but it does need care. So much of the "data" on the internet is garbage that any kind of aggregation and analysis has to weight its sources in a sensible way (just because a million web sites say drug X is not safe does not mean that we should go with the majority view, for example).
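
A minimal sketch of the difference between counting sources and weighting them, assuming each source comes with a reliability weight between 0 and 1. The weights and counts below are made up, and choosing real weights sensibly is the hard part this example does not solve.

```python
from collections import defaultdict


def majority_view(claims):
    """Pick the answer asserted by the largest number of sources."""
    counts = defaultdict(int)
    for answer, _weight in claims:
        counts[answer] += 1
    return max(counts, key=counts.get)


def weighted_view(claims):
    """Pick the answer with the largest total reliability weight."""
    totals = defaultdict(float)
    for answer, weight in claims:
        totals[answer] += weight
    return max(totals, key=totals.get)


# Hypothetical sources as (claimed answer, reliability weight) pairs.
claims = (
    [("unsafe", 0.01)] * 1000  # many low-quality pages repeating a rumour
    + [("safe", 0.9)] * 20     # a few well-controlled studies
)

print(majority_view(claims))  # decided by head count alone
print(weighted_view(claims))  # decided by total reliability
```

With these invented numbers the two functions disagree, which is exactly the situation the drug X example describes.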

26-06-2008, 08:10 AM   #3
contink
Join Date: Jul 2006
Location: South Yorkshire
Posts: 2,807

Eh?

Surely the whole point of proper scientific research is determining a relevant set of criteria, applying it to the data set (which you keep to a specific level) and ensuring the data set isn't polluted with erroneous rubbish that will skew it.

By definition there's already a huge amount of data available by "just watching people go about their everyday lives" that doesn't involve the net at all, but it still isn't done that way because the data collection is a nightmare of contradictory rubbish and skewed cr*p.

Any scientist with half a brain is going to see a search algorithm as skewing the results before you've even started, or am I just being a layperson with attitude?

26-06-2008, 09:10 AM   #4
dhicks
Join Date: Aug 2005
Location: Alton, Hampshire
Posts: 1,513

Quote:
Originally Posted by SYNACK
Instead of running countless experiments and attempting to formulate a model that explains them, this new method simply analyzes the vast amount of data available and looks for correlations.
The article does point out that said data mostly relates to studies of human behaviour (linguistics, sociology, psychology, etc), not science as a whole.

Quote:
I don't think it will replace the traditional science lab anytime soon but eventually every science class may need access to a supercomputing cluster just to keep up.
Ah ha! This is doable enough - all those overpowered desktop machines floating around the place, put them to use as a computing cluster. Now all we have to do is get children to write MapReduce optimised algorithms...
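
For anyone who has not met MapReduce, here is a minimal single-machine sketch of its shape, using word counting as the task. It is plain Python rather than any real cluster framework, and the function names and sample documents are invented for the example.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # On a real cluster each document (or chunk) would be mapped on a
    # different machine; here the map calls simply run one after another.
    emitted = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())


documents = ["the data deluge", "the end of theory", "data data everywhere"]
counts = word_count(documents)
print(counts)
```

The map and reduce functions never share state, which is what lets the work be spread across a room full of desktop machines.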

--
David Hicks

26-06-2008, 09:44 AM   #5
SYNACK
Join Date: Oct 2007
Location: Auckland, New Zealand
Posts: 1,825

Quote:
Originally Posted by contink
Eh?

Surely the whole point of proper scientific research is determining a relevant set of criteria, applying it to the data set (which you keep to a specific level) and ensuring the data set isn't polluted with erroneous rubbish that will skew it.

By definition there's already a huge amount of data available by "just watching people go about their everyday lives" that doesn't involve the net at all, but it still isn't done that way because the data collection is a nightmare of contradictory rubbish and skewed cr*p.

Any scientist with half a brain is going to see a search algorithm as skewing the results before you've even started, or am I just being a layperson with attitude?
My original posting may have been a bit vague on this aspect. The data to be analyzed does not have to come from Google; it simply has to be available in Google-type amounts. That kind of data volume will be created in less than a year by the LHC particle accelerator at CERN, and probably at similar levels by the prototype fusion reactor when it comes online. You also have the datasets made available to government research, such as national health records and police databases, let alone the data generated by the satellites that are tasked to observe Earth for climate prediction and other more nefarious goals.

The point is that running the right types of algorithms on this kind of data will turn up significant things that we weren't specifically looking for. Effectively, it shows us patterns in the data that can point to mechanisms we are not aware of.

You are right about the amount of rubbish data produced on the internet, but there is rubbish data (interference) in any data source, and more effective algorithms can be used to compensate for it. There may even be useful patterns in the interference that we are currently unable to see. An algorithm looking at a certain type of language may show up some unexpected results, but these in themselves could be useful patterns for different problems, such as SQL vulnerabilities. The other thing to take into consideration is that the absence of data in such a large data set can also be a strong indication of a pattern.
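
One common way of compensating for interference is a robust outlier filter. The sketch below uses the median absolute deviation (MAD), which is a swapped-in textbook technique rather than anything named in the article; the readings and the cut-off of 3 are invented. The rejected values are returned rather than thrown away, in case the interference itself holds a pattern.

```python
from statistics import median


def split_outliers(values, max_deviations=3.0):
    """Split values into (kept, rejected) using the median absolute deviation."""
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    if mad == 0:
        # More than half the values are identical, so spread cannot be estimated.
        return list(values), []
    # 1.4826 scales the MAD to match a standard deviation for normal data.
    limit = max_deviations * 1.4826 * mad
    kept = [v for v in values if abs(v - centre) <= limit]
    rejected = [v for v in values if abs(v - centre) > limit]
    return kept, rejected


# Hypothetical sensor readings with one obviously bad value.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 57.0, 10.2]
kept, rejected = split_outliers(readings)
print("kept:", kept)
print("rejected:", rejected)
```

Unlike a mean and standard deviation, the median and MAD are barely moved by a handful of wild values, so the filter does not get dragged towards the rubbish it is trying to remove.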

Quote:
Originally Posted by dhicks
The article does point out that said data mostly relates to studies of human behaviour (linguistics, sociology, psychology, etc), not science as a whole.
You are right, the article did favor the human behavior aspects, but the same kind of methods can be applied to different datasets just the same. The web is just the easiest at the moment. I was also sensationalizing a bit to get people to read it.

Quote:
Originally Posted by dhicks
Ah ha! This is doable enough - all those overpowered desktop machines floating around the place, put them to use as a computing cluster. Now all we have to do is get children to write MapReduce optimised algorithms...
Yes, I had similar thoughts. Depending on how many computers you had there may even be the possibility of using them overnight as a form of revenue generation for the school. If your system was set up to boot into a fault-tolerant cluster overnight, you could theoretically sell supercomputer time to rendering studios, research companies or universities. I don't know how much of a market there would be or how cost-effective it would end up being, but it is an interesting scenario.

26-06-2008, 10:48 AM   #6
dhicks
Join Date: Aug 2005
Location: Alton, Hampshire
Posts: 1,513

Quote:
Originally Posted by SYNACK
Depending on how many computers you had there may even be the possibility of using them overnight as a form of revenue generation for the school.
Yes, I had a look at a couple of potential systems. Most distributed computing platforms seem to be voluntary, though, or at least organised so you get paid in computing time rather than cash. Also, I figure any amount of cash you do get back is probably only just going to cover power costs, really.
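
The power-cost worry can be checked with simple arithmetic. The sketch below is only a calculator: the machine count, wattage, hours and electricity price are hypothetical placeholders, not measured figures, and should be replaced with a school's own numbers.

```python
def overnight_power_cost(machines, watts_per_machine, hours, price_per_kwh):
    """Electricity cost of running the whole cluster for one night."""
    kilowatt_hours = machines * watts_per_machine * hours / 1000
    return kilowatt_hours * price_per_kwh


def break_even_rate(watts_per_machine, price_per_kwh):
    """Income needed per machine-hour just to cover electricity."""
    return watts_per_machine / 1000 * price_per_kwh


# Hypothetical inputs: 100 desktops drawing 150 W each for 12 hours,
# with electricity priced at 0.10 currency units per kWh.
cost = overnight_power_cost(100, 150, 12, 0.10)
rate = break_even_rate(150, 0.10)
print(f"nightly electricity cost: {cost:.2f}")
print(f"break-even income per machine-hour: {rate:.4f}")
```

Any income per machine-hour below the break-even rate means the cluster loses money on electricity alone, before hardware wear or staff time is counted.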

--
David Hicks