  1. #1

    SYNACK's Avatar
    Join Date
    Oct 2007
    Posts
    11,143
    Thank Post
    863
    Thanked 2,695 Times in 2,285 Posts
    Blog Entries
    9
    Rep Power
    772

    Google Jumpstarts Science - Changes the Scientific Method

    The End of Theory: The Data Deluge Makes the Scientific Method Obsolete

    I just read this article on how the sheer volume of data now available to organizations like Google will drive a new wave of science. Instead of running countless experiments and attempting to formulate a model that explains them, this new method simply analyzes the vast amount of data available and looks for correlations.

    This effectively turns the whole world into the lab, with society itself running some of the experiments as people simply go about their day-to-day lives.

    I don't think it will replace the traditional science lab anytime soon but eventually every science class may need access to a supercomputing cluster just to keep up.
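
    To make "analyze the data and look for correlation" concrete, here is a minimal sketch in Python. It is not the method from the article: the function names, the 0.8 threshold and the toy columns are all made up for illustration. It simply computes the Pearson correlation for every pair of columns and reports the strong ones, without any model of why they might be related.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / sqrt(var_x * var_y)


def strongest_correlations(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every column pair with |r| >= threshold."""
    hits = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            hits.append((name_a, name_b, r))
    return sorted(hits, key=lambda hit: abs(hit[2]), reverse=True)


# Hypothetical toy data, invented purely for this example.
toy_columns = {
    "ice_cream_sales": [10, 20, 30, 40, 50],
    "sunburn_cases": [1, 2, 3, 4, 5],
    "library_visits": [7, 3, 9, 1, 5],
}

hits = strongest_correlations(toy_columns)
for name_a, name_b, r in hits:
    print(f"{name_a} ~ {name_b}: r = {r:.2f}")
```

    A scan like this only says which columns move together. It does not say why, which is exactly the step a theory or a controlled experiment would normally supply.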

  2. #2

    Join Date
    Aug 2005
    Location
    London
    Posts
    3,154
    Thank Post
    114
    Thanked 527 Times in 450 Posts
    Blog Entries
    2
    Rep Power
    123
    It's a fascinating idea and I'm sure there's value in it, but it does need care. So much of the "data" on the internet is garbage that any kind of aggregation and analysis has to weight its sources in a sensible way (just because a million web sites say drug X is not safe does not mean that we should go with the majority view, for example).
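
    As a rough illustration of the weighting point, the sketch below compares a plain majority vote with a vote weighted by source reliability. Everything in it is an assumption made for the example: the `reliability` scores, the field names and the toy claims are invented, and choosing real reliability scores is the hard part that this sketch does not solve.

```python
def majority_view(claims):
    """True if more than half of the sources say the drug is safe."""
    votes_for = sum(1 for claim in claims if claim["says_safe"])
    return votes_for > len(claims) / 2


def weighted_view(claims):
    """True if sources saying 'safe' hold more than half of the total reliability."""
    total = sum(claim["reliability"] for claim in claims)
    support = sum(claim["reliability"] for claim in claims if claim["says_safe"])
    return support / total > 0.5


# Hypothetical sources and reliability scores, invented for illustration only.
claims = [
    {"source": "blog A", "says_safe": False, "reliability": 0.1},
    {"source": "blog B", "says_safe": False, "reliability": 0.1},
    {"source": "blog C", "says_safe": False, "reliability": 0.1},
    {"source": "clinical trial", "says_safe": True, "reliability": 0.9},
]

print("majority says safe:", majority_view(claims))
print("weighted says safe:", weighted_view(claims))
```

    With these made-up numbers the two approaches disagree: three low-reliability sources outvote the trial by head count, but not once the weights are applied.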

  3. #3
    contink's Avatar
    Join Date
    Jul 2006
    Location
    South Yorkshire
    Posts
    3,791
    Thank Post
    303
    Thanked 327 Times in 233 Posts
    Rep Power
    118
    Eh?

    Surely the whole point of proper scientific research is determining a relevant set of criteria, applying it to the data set (which you keep to a specific level) and ensuring the data set isn't polluted with erroneous rubbish that will skew it.

    By definition there's already a huge amount of data available from "just watching people go about their everyday lives" that doesn't involve the net at all, but it still isn't done that way because the data collection is a nightmare of contradictory rubbish and skewed cr*p.

    Any scientist with half a brain is going to see a search algorithm as skewing the results before you've even started, or am I just being a layperson with attitude?

  4. #4

    dhicks's Avatar
    Join Date
    Aug 2005
    Location
    Knightsbridge
    Posts
    5,624
    Thank Post
    1,240
    Thanked 778 Times in 675 Posts
    Rep Power
    235
    Quote Originally Posted by SYNACK View Post
    Instead of running countless experiments and attempting to formulate a model that explains them this new method simply analyzes the vast amount of data available and looks for correlation.
    The article does point out that said data mostly relates to studies of human behaviour (linguistics, sociology, psychology, etc), not science as a whole.

    I don't think it will replace the traditional science lab anytime soon but eventually every science class may need access to a supercomputing cluster just to keep up.
    Ah ha! This is doable enough - all those overpowered desktop machines floating around the place, put them to use as a computing cluster. Now all we have to do is get children to write MapReduce optimised algorithms...
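
    For anyone who has not met it, the MapReduce idea can be sketched in a few lines of Python. This is a single-machine toy, not Google's implementation or any real cluster framework, and the function names are made up for the example. A map step emits key/value pairs, a shuffle step groups them by key, and a reduce step combines each group. On a real cluster the map and reduce calls would be spread across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on one machine."""
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["the lab", "The world is the lab"])
print(counts)
```

    Because each `map_phase` call only looks at its own document and each `reduce_phase` call only looks at its own key, the work splits naturally across a room full of desktop machines.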

    --
    David Hicks

  5. #5

    SYNACK's Avatar
    Join Date
    Oct 2007
    Posts
    11,143
    Thank Post
    863
    Thanked 2,695 Times in 2,285 Posts
    Blog Entries
    9
    Rep Power
    772
    Quote Originally Posted by contink
    Eh?

    Surely the whole point of proper scientific research is determining a relevant set of criteria, applying it to the data set (which you keep to a specific level) and ensuring the data set isn't polluted with erroneous rubbish that will skew it.

    By definition there's already a huge amount of data available from "just watching people go about their everyday lives" that doesn't involve the net at all, but it still isn't done that way because the data collection is a nightmare of contradictory rubbish and skewed cr*p.

    Any scientist with half a brain is going to see a search algorithm as skewing the results before you've even started, or am I just being a layperson with attitude?
    My original posting may have been a bit vague on this aspect. The data to be analyzed does not have to come from Google; it simply has to be Google-scale amounts of data. This kind of data volume will be created in less than a year at CERN's LHC particle accelerator, and probably similar levels will come from the prototype fusion reactor when it comes online. You also have the datasets made available to government research: national health records and police databases, let alone the data generated by the satellites that are tasked to observe Earth for climate prediction and other more nefarious goals.

    The point is that running the right types of algorithms on this kind of data will turn up things that are significant that we weren't specifically looking for. Effectively it shows us patterns in the data that can point out mechanisms that we are not aware of.

    You are right about the amount of rubbish data produced on the internet, but there is rubbish data (interference) in any data source, and more effective algorithms can be used to compensate for it. There may even be useful patterns in the interference that we are currently unable to see. An algorithm looking at a certain type of language may show up some unexpected results, but these in themselves could be useful patterns for different problems, SQL vulnerabilities etc. The other thing to take into consideration is that the absence of data in such a large data set can also be a strong indication of a pattern.

    Quote Originally Posted by dhicks
    The article does point out that said data mostly relates to studies of human behaviour (linguistics, sociology, psychology, etc), not science as a whole.
    You are right, the article did favor the human behavior aspects, but the same kind of methods can be applied to different datasets just the same. The web is just the easiest at the moment. I was also sensationalizing a bit to get people to read it.

    Quote Originally Posted by dhicks
    Ah ha! This is doable enough - all those overpowered desktop machines floating around the place, put them to use as a computing cluster. Now all we have to do is get children to write MapReduce optimised algorithms...
    Yes, I had similar thoughts. Depending on how many computers you had, there may even be the possibility of using them overnight as a form of revenue generation for the school. If your system was set up to boot into a fault-tolerant cluster overnight, you could theoretically sell supercomputer time to rendering studios, research companies or universities. I don't know how much of a market there would be or how cost-effective it would end up, but it is an interesting scenario.

  6. #6

    dhicks's Avatar
    Join Date
    Aug 2005
    Location
    Knightsbridge
    Posts
    5,624
    Thank Post
    1,240
    Thanked 778 Times in 675 Posts
    Rep Power
    235
    Quote Originally Posted by SYNACK View Post
    Depending on how many computers you had there may even be the possibility of using them overnight as a form of revenue generation for the school.
    Yes, I had a look at a couple of potential systems. Most distributed computing platforms seem to be voluntary, though, or at least organised so you get paid in computing time rather than cash. Also, I figure any amount of cash you do get back is probably only just going to cover power costs, really.

    --
    David Hicks
