Data Science

What is Big Data?

Big Data is the next big thing in computing, and generates value from very large data sets that cannot be analyzed with traditional computing techniques.

The quantity of computer data generated on planet Earth is growing exponentially for many reasons. For a start, retailers are building vast databases of recorded customer activity. Organizations working in logistics, financial services, healthcare and many other sectors are also capturing more data, and public social media is creating vast quantities of digital material. As vision recognition improves, it is additionally starting to become possible for computers to extract meaningful information from still images and video. As more smart objects go online, big data is also being generated by an expanding Internet of Things. And finally, several areas of scientific advancement are starting to generate and rely on vast quantities of data that were until recently almost unimaginable.

Big data is often characterized using the three V’s of volume, velocity and variety. Here volume poses both the greatest challenge and the greatest opportunity, as big data could help many organizations to understand people better and to allocate resources more effectively. However, traditional computing solutions like relational databases do not scale to handle data of this magnitude.

Big data velocity also raises a number of issues, with the rate at which data is flowing into many organizations now exceeding the capacity of their IT systems. In addition, users increasingly demand data that is streamed to them in real time, and delivering this can prove quite a challenge.

Finally, the variety of data types to be processed is becoming increasingly diverse. Gone are the days when data centers only had to deal with documents, financial transactions, stock records and personnel files. Today photographs, audio, video, 3D models, complex simulations and location data are being piled into many a corporate data silo. Many such big data sources are also unstructured, and hence not easy to categorize, let alone process with traditional computing techniques.

Due to the challenges of volume, velocity and variety, many organizations at present have little choice but to ignore or rapidly excrete large quantities of potentially valuable information. Indeed, if we think of organizations as creatures that process data, then most are rather primitive forms of life. Their sensors and IT systems are simply not up to the job of scanning and interpreting the vast oceans of data in which they swim. As a consequence, most of the data that surrounds organizations today is ignored. A large proportion of the data that they do gather is then not processed, with a significant quantity of useful information passing straight through them as data exhaust. For example, until recently the majority of the data captured by retailer loyalty cards was not processed in any way, and almost all video data captured by hospitals during surgery is still deleted within weeks.

Today, the leading Big Data technology is Hadoop. This is an open-source software library for reliable, scalable, distributed computing, and provides the first viable platform for big data analytics. Hadoop is already used by most big data pioneers. For example, LinkedIn currently uses it to generate over 100 billion personalized recommendations every week.

Hadoop distributes the storage and processing of large datasets across groups, or clusters, of server computers. Whereas traditional large-scale computing solutions rely on expensive server hardware with a high fault tolerance, Hadoop detects and compensates for hardware failures or other system problems at the application level. This allows a high level of service continuity to be delivered by clusters of individual computers, each of which may be prone to failure.
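To make the idea of compensating for failures in software a little more concrete, here is a minimal toy sketch in Python. It is not Hadoop's API: the function names, node names and the round-robin placement rule are all assumptions made for illustration. It only shows the general principle of keeping several copies of each block of data on different machines, so that a read can fall back to a healthy copy when one machine fails.

```python
# Toy illustration only; this is NOT Hadoop's API. All names are hypothetical.

def place_replicas(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(block_ids):
        # Consecutive nodes (wrapping around) hold the copies of this block.
        placement[block] = [nodes[(i + k) % len(nodes)] for k in range(replication)]
    return placement


def read_block(block, placement, failed_nodes):
    """Return the first healthy node holding `block`; raise if none is left."""
    for node in placement[block]:
        if node not in failed_nodes:
            return node
    raise RuntimeError(f"all replicas of {block} are unavailable")


nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_replicas(["block-0", "block-1"], nodes, replication=3)

# node-a has failed, so the read is served by the next replica instead.
print(read_block("block-0", placement, failed_nodes={"node-a"}))
```

In this sketch a single failed machine never makes a block unreadable, because two other copies remain. The cluster as a whole stays available even though each individual computer may be unreliable.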


Technically, Hadoop consists of two key components. The first is the Hadoop Distributed File System, which permits high-bandwidth, cluster-based storage. The second is a data processing framework called MapReduce. Based on Google search technology, MapReduce distributes, or maps, large data sets across multiple servers. Each server then creates a summary of the data it has been allocated. All of this summary information is then aggregated in a so-termed reduce stage. MapReduce subsequently allows extremely large raw data sets to be rapidly distilled before more traditional data analysis tools are applied.
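The map and reduce stages described above can be sketched in a few lines of plain Python. This is a single-machine illustration under stated assumptions, not Hadoop code: the "servers" are just list slices, the summary is a word count, and the function names are hypothetical. A real MapReduce job also works with key/value pairs and runs the stages on separate machines.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(records, num_servers):
    """Deal the records out to `num_servers` pretend servers."""
    return [records[i::num_servers] for i in range(num_servers)]


def map_chunk(lines):
    """Map stage: one server summarizes its own chunk as word counts."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_summaries(summaries):
    """Reduce stage: aggregate the per-server summaries into one result."""
    return reduce(lambda a, b: a + b, summaries, Counter())


lines = ["big data is big", "data flows fast", "big clusters process data"]
chunks = split_into_chunks(lines, num_servers=2)
summaries = [map_chunk(chunk) for chunk in chunks]  # one summary per server
totals = reduce_summaries(summaries)

print(totals.most_common(2))
```

Each per-server summary is much smaller than the raw chunk it was built from, which is why the final aggregation is cheap compared with scanning all of the raw data in one place.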

For organizations that cannot afford an internal big data infrastructure, cloud-based big data solutions are already available. Where public big datasets need to be utilized, running everything in the cloud also makes a lot of sense, as the data does not have to be downloaded. For example, Amazon Web Services already hosts many public data sets containing government and medical information.

Looking further ahead, quantum computing may greatly improve big data processing. Quantum computers store and process data using quantum mechanical states, and will in theory excel at the massively parallel processing of unstructured data.

As IBM explained, big data provides an opportunity to find insight in new and emerging types of data; or, as Oracle put it, big data holds the promise of giving enterprises deeper insight into their customers, partners and business. In time, big data may also help farmers to accurately forecast bad weather and crop failures. Meanwhile, governments may use big data to predict and plan for civil unrest or pandemics.

In a recent report, the McKinsey Global Institute estimated that the US healthcare sector alone could achieve 300 billion dollars in efficiency and quality savings every year by leveraging big data. Across Europe, they also estimate that using big data could save at least 149 billion in government administration costs. In March 2012, the US government announced a 200 million dollar investment in big data projects.

By the end of 2015, Cisco estimates that global Internet traffic will reach 4.8 zettabytes a year. That's 4.8 billion terabytes, and indicates both the big data challenge and the big data opportunity ahead. More information on big data can be found on explainingcomputers.com, but for now that's it for another article.
