BigData MachineLearning

From New Media Business Blog

"Big Data" is actively discussed across technology companies. Some have given up on it, while others have built much of their business on it. The term is a relatively new concept, with only about a 5-year history, and is used in many fields and industries: video production and analysis, real-time bidding, sociology, medicine, aerospace, finance and many more. Big Data is commonly described by the three Vs: volume, velocity and variety. Volume comes first: the benefit gained from the ability to process large amounts of information is the main attraction of big data analytics. Velocity is the rate at which data flows into an organization; if this rate is high, you are probably dealing with Big Data. Finally, there is variety. Data rarely presents itself in a form that is perfectly ordered and ready for processing. In big data systems the source data is often diverse and has no strictly defined structure. It could be tweets from social networks, video streams, or pairs of related goods from an online store. A common use of big data processing is to take a large volume of unstructured data arriving at a high rate and extract ordered meaning from it, for consumption either by humans or as structured input to an application.

Big Data

At the moment, Big Data is one of the key drivers of IT development. In the era of information technology, and especially since the boom of social networks, a significant amount of information has been accumulating for each Internet user, which ultimately set the direction for the development of Big Data. The term "Big Data" is the subject of much debate: many believe it simply means the amount of accumulated information, but the technical side should not be forgotten, since the concept also covers storage technology, computing and services. Big Data is associated with processing amounts of information so large that they are difficult to handle by conventional methods. The scope of Big Data is characterized by the following features:

Volume - the accumulated data is so large that storing and processing it by traditional methods is time-consuming, so a new approach and advanced tools are required.

Velocity - this trait covers both the increasing speed at which data accumulates (90% of the information has been collected over the past 2 years) and the speed of processing. Real-time big data processing has also recently become a popular technology.

Variety - the simultaneous processing of structured and unstructured information in multiple formats. The main feature of structured information is that it can be categorized; an example is information about customer transactions. Unstructured information includes videos, audio files, free text and information from social networks. To date, 80% of information can be classified as unstructured, and it needs to undergo complex analysis to become suitable for further processing.

Veracity - the accuracy of information; the reliability of available data is becoming more and more important.

Value - the usefulness of the accumulated information. Big Data should be useful and possess a certain value; for example, it helps to improve business processes, reporting and cost optimization.

Accumulated information that meets the 5 conditions above can be classified as Big Data.

Application Scope

The application scope of Big Data technologies is very extensive. With the help of Big Data, a company can study its customers' preferences, measure the effectiveness of marketing campaigns and analyze risks. Below are the results of an IBM Institute for Business Value survey[1] on the use of Big Data by companies. As the chart shows, most companies use Big Data in customer service; the next most popular areas are operational efficiency and risk management.

Big Data Applications. Source: IBM Institute for Business Value

It should also be noted that Big Data is one of the fastest growing areas of information technology, as the total amount of data received and stored doubles every 1.2 years. Over the period from 2012 to 2014, the amount of data transmitted monthly by mobile networks increased by 81%. According to Cisco[2], mobile data traffic amounted to 2.5 exabytes per month in 2014 (an exabyte is a unit of information equal to 10^18 bytes) and is projected to reach 24.3 exabytes per month in 2019. Thus, despite its relatively young age, Big Data is an established field of technology that has become widespread in many areas of business and plays an important part in the development of companies.
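The growth implied by the two Cisco figures can be worked out with a short calculation. The sketch below is illustrative only: it assumes smooth compound growth between 2014 and 2019, which the source does not state, and the function names are hypothetical.

```python
import math


def annual_growth_rate(start, end, years):
    """Compound annual growth rate implied by two values `years` apart."""
    return (end / start) ** (1 / years) - 1


def doubling_time(rate):
    """Years needed to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + rate)


# Cisco figures quoted in the text, in exabytes per month.
# Assumption: traffic grows at a constant compound rate in between.
rate = annual_growth_rate(start=2.5, end=24.3, years=2019 - 2014)

print(f"implied annual growth: {rate:.1%}")
print(f"implied doubling time: {doubling_time(rate):.2f} years")
```

Under that assumption, mobile traffic grows by a little under 58% a year and doubles roughly every 1.5 years. This covers mobile traffic only, so it does not have to match the 1.2-year doubling time quoted for all stored data.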

Big Data Technologies

The technologies used to collect and process Big Data can be divided into 3 groups: software, equipment (hardware) and services.


The most common software approaches to data processing include:

SQL - Structured Query Language, used to work with relational databases. With SQL you can create and modify data and manage data sets in the corresponding database management system.

NoSQL - the term stands for Not Only SQL. It covers a number of approaches to implementing a database that differ from the models used in traditional relational databases. NoSQL databases are mostly used to store and process unstructured data and are convenient when the data structure changes constantly, for example when collecting and storing information from social networks.

MapReduce - a model of distributed computing used for parallel processing of very large data sets (terabytes and more). In this model it is not the data that is passed to the processing program; instead, the processing program is passed to the data. The data is processed in two sequential steps, Map and Reduce: Map preprocesses the input data into intermediate records, and Reduce aggregates them.
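The Map and Reduce steps can be sketched in plain Python. This is a minimal single-process illustration of the idea, not how Hadoop or any other framework implements it; the word-count task and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are assumptions chosen for the example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: turn one input record into intermediate (key, value) pairs."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key (a framework does this between steps)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate all values collected for one key."""
    return key, sum(values)


def word_count(documents):
    # In a real cluster each document (or block) would be mapped on the
    # node that stores it; here everything runs in one process.
    mapped = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "data flows fast"])
print(counts)
```

Because each call to `map_phase` depends only on its own document, and each call to `reduce_phase` only on its own key, both steps can be spread across many machines, which is what makes the model suitable for very large data sets.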

What is Hadoop

SAP HANA - a high-performance NewSQL platform for storing and processing data. It provides high-speed query processing. Another distinctive feature is that SAP HANA simplifies the system landscape, reducing the cost of supporting analytical systems.


Technological equipment (hardware) for Big Data storage and processing includes servers, data storage systems and infrastructure equipment. Infrastructure equipment includes various hardware accelerators (solid-state storage arrays, hardware computational accelerators, etc.), uninterruptible power supplies, server console kits and so on.


Services include the design of the database system architecture, the arrangement and optimization of infrastructure, and the security of data storage. Software, equipment and maintenance services together form a comprehensive platform for data storage and analysis. Companies such as Microsoft, HP and EMC offer services to develop, deploy and manage Big Data solutions.

Business Uses

Big Data is widely used in many types of business, including health care, telecommunications, trade, logistics and financial companies, as well as in public administration. Here are a few examples of Big Data use in some of these industries.


Retail databases accumulate information about customer transactions, stocks and supplies. Using the stored information, a company can better control the supply, storage and sale of goods. Based on the accumulated data, the company can predict the demand for and supply of goods. A data processing and analysis system can also solve other problems for a retailer, such as cost optimization or report preparation.

Financial institutes

Financial institutes use Big Data to analyze the creditworthiness of borrowers. Big Data is also used in credit scoring and underwriting. The introduction of Big Data technologies has reduced the time needed to consider loan applications. With more data, a bank can analyze a particular customer's transactions and offer banking services suited to that customer.


In the telecommunications industry, Big Data is widely used by mobile operators and Internet service providers. Cellular operators, like financial institutions, store and process massive amounts of information, even by Big Data standards. That allows them to conduct deeper analysis of the accumulated information. The main purpose of data analysis is to keep existing customers and attract new ones. To do this, companies perform customer segmentation, analyze customers' traffic and determine the social class of each subscriber. In addition to marketing purposes, the technology is used to prevent fraudulent financial operations.

Mining and petroleum industries

Mining and petroleum industries use Big Data in both operations and marketing. Based on the information received, enterprises can draw conclusions about the effectiveness of oil field development, track overhaul schedules and equipment status, and predict product demand and prices.

According to a Tech Pro Research survey, Big Data is most widespread in the telecommunications industry, as well as in engineering, IT, financial and state-owned enterprises. The survey also found that Big Data is less popular in education and health care.

Big Data By Industry

Examples of the use of Big Data for companies

Companies such as Nasdaq, Facebook, Google, IBM, VISA, MasterCard, Bank of America, HSBC, AT&T, Coca-Cola, Starbucks and Netflix pioneered the use of Big Data. Big Data applications are diverse and vary according to the industry sector and a company's objectives. Below are a few examples of Big Data applications in real companies.


HSBC uses Big Data technologies to detect fraudulent transactions with plastic cards. With Big Data, the company made its security services 3 times more efficient and increased the detection of fraudulent incidents 10 times. The economic effect of introducing these technologies has exceeded $10 million. [1]


VISA's antifraud system allows vendors to automatically detect fraudulent operations. The system has already prevented fraudulent transactions amounting to $2 billion. [2]


The IBM Watson supercomputer analyzes the flow of data on cash transactions in real time. According to IBM, Watson increased the number of detected fraudulent transactions by 15%, reduced false positives by 50% and increased the amount of funds protected from such transactions by 60%. [3]

Procter & Gamble

Procter & Gamble uses Big Data to design new products and run global marketing campaigns. P&G has created special Business Sphere offices, where managers have access to real-time information on the company's operations. The company's management now has the opportunity to instantly test hypotheses and conduct experiments. [4]


Office supplies retailer OfficeMax uses Big Data technologies to analyze customer behavior. Big Data analysis allowed the company to increase its revenues from B2B transactions by 13% and to reduce costs by $400,000 per year. [5]


According to Caterpillar, its distributors miss out on 9 to 18 billion dollars a year simply because they have not adopted Big Data. Big Data would allow the company's partners to manage their fleets more effectively by analyzing information coming from sensors installed on the machines. It is already possible to analyze the status of key components and their degree of wear, and to manage fuel and maintenance costs. [6]


Luxottica Group is a manufacturer of sports eyewear that holds such brands as Ray-Ban, Persol and Oakley. The company uses Big Data technologies to analyze the behavior of potential customers and for "smart" SMS marketing. As a result of using Big Data, Luxottica Group has assigned more than 100 million people to its group of most valuable customers and improved the effectiveness of its marketing campaigns by 10%. [7]

World of Tanks

The developers of World of Tanks analyze the behavior of players. Big Data technologies make it possible to analyze the behavior of 100 thousand World of Tanks players using more than 100 parameters (information about purchases, games, experience, etc.). The analysis is used to predict customer churn and improve the retention rate. The resulting model was 20-30% more efficient than the standard analysis tools of the gaming industry. [8]

German Ministry of Labour

The Ministry of Labour in Germany uses Big Data to analyze incoming applications for unemployment benefits. The analysis showed that 20% of benefits were being paid out improperly. With the help of Big Data, the ministry reduced state costs by 10 billion euros. [9]

Children's Hospital of Toronto

The Children's Hospital of Toronto has implemented Project Artemis, an information system that collects and analyzes data on newborn babies in real time. Every second, the system tracks 1260 different indicators for each child. Project Artemis is able to predict an unstable or life-threatening condition and immediately notify medical personnel. [10]

Future Developments

New Roles and Responsibilities in Data Science

The amount of data is growing every day in huge spurts. Every day, the network is filled with 2.3 trillion gigabytes of data. By 2017 the amount of data is expected to grow by 800%. The more data there is, the higher the demand for specialists who can process it. Data science is developing so dynamically that each specialist has a narrow zone of responsibility. Martyn Jones, CEO and co-founder of Cambriano Energy, proposes 7 major roles in working with big data. [11]

Data Trader - a specialist who works with alternative data sources. The Data Trader shapes the market and demand, supports the data market and constantly replenishes it with new values. Traders look for potentially valuable data, explore new data streams and bring them to the market. The Data Trader also seeks out and evaluates data processing tools for customers, assesses and predicts trends, and purchases data that may be in demand in the future.

Data Hound - the trader's right hand. Once the trader has made a prediction, the Data Hound takes over. The task is to find the best, cheapest and most reliable source of big data and to contact the owners and providers of that data. The Data Hound is the one who can infect everyone with enthusiasm and inspiration for working with the new data, so the role calls for being nice and patient and having tremendous powers of persuasion. The Data Hound is also the one who can dispel doubts about working with a new data portal.

Data Plumber - a specialist who designs and supports the entire infrastructure. The Data Plumber provides data delivery and ensures that the data passes through all stages: preparation, purification, analysis and reporting. The Data Plumber needs to make sure that all the data has been processed and has made its way from the supplier to the consumer.

Data Butcher - works in tandem with the Data Chef. The Data Butcher selects and prepares the necessary parts of the supplied data, which are then passed to the chef for data mining, predictive analysis and visualization. The Data Butcher separates the interesting data from the unnecessary. The output is high-quality structured data that is then analyzed. The Data Butcher can be seen as a special case of a data architect.

Data Miner - without a doubt the most difficult and stressful role. The miner is always busy with logical and physical research, detecting and extracting the hardest-to-reach data with the highest informational value. Most likely this data is very deeply buried, and the miner's task is to take the risk and bring it to the surface. Such data is very effective and will be used for a long time, which is why the work of data miners will always be in demand in the world of big data.

Data Cleaner - the main task is to identify and dispose of toxic and viral values that can distort the nature of the data. Data Cleaners make sure that the data is clean, representative and suitable for processing.

Data Chef - organizes and coordinates the work of all departments. Ideally, the chef has expertise in analytics, a solid background in statistics and a solid understanding of data architecture, along with a wide range of other skills that could be listed forever. Together with the Data Trader and the Data Butcher, the Data Chef finds and selects the primary raw data. On the basis of this data, the Data Chef plans the processing and selects the method of analysis, even if the data changes dynamically over time.

Business Implications

Market trends

According to IDG Enterprise, in 2015 companies' spending on Big Data will average US$7.4 million. Large companies will spend approximately US$13.8 million, and small and medium-sized companies US$1.6 million. Most of it will be invested in data analysis, visualization and data collection. [12]

According to current trends and market demand, investment in 2015 will be used to improve data quality, to improve planning and forecasting, and to increase the speed of data processing. According to Bain & Company's Insights Analysis, financial companies will also make significant investments in Big Data: the finance industry plans to spend US$6.4 billion in 2015, and the average growth rate of investment will be 22% until 2020. Internet companies are planning to spend US$2.8 billion, and their Big Data spending will grow at an average rate of 26%.
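To see what these growth rates imply, the 2015 figures can be compounded forward. This is a rough sketch under stated assumptions: the quoted rates are treated as constant annual rates starting from the 2015 base, and the 26% rate for Internet companies is assumed to run to 2020 as well, which the source does not say. The helper `project_spending` is hypothetical.

```python
def project_spending(base, rate, base_year, target_year):
    """Compound `base` forward at a constant annual `rate`."""
    return base * (1 + rate) ** (target_year - base_year)


# 2015 spending in billions of US dollars and average annual growth,
# as quoted in the text. The 2020 horizon for "internet" is an assumption.
sectors = {"finance": (6.4, 0.22), "internet": (2.8, 0.26)}

projection_2020 = {
    name: project_spending(base, rate, 2015, 2020)
    for name, (base, rate) in sectors.items()
}

for name, value in projection_2020.items():
    print(f"{name}: about US${value:.1f} billion in 2020")
```

Under these assumptions, spending in both sectors would roughly triple over five years.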

IDC predicts the following market trends:

  • In the next 5 years, spending on cloud solutions in Big Data technologies will grow 3 times faster than spending on local solutions. Hybrid data processing platforms will be in demand.
  • In 2015, the growth of applications that use sophisticated and predictive analytics, including machine learning, will accelerate. The market for these applications will grow 65% faster than for applications that do not use predictive analytics.
  • Media analytics will triple in 2015 and will be a key driver of growth in the Big Data technology market.
  • The trend of implementing solutions for analyzing constant flows of information, applicable to the Internet of Things, will accelerate.
  • By 2018, 50% of users will interact with services based on cognitive computing.


The main barriers to Big Data adoption are:

  • High cost of Big Data technology adoption;
  • The need to ensure data protection and privacy;
  • The shortage of qualified personnel;
  • Companies' distrust of these technologies;
  • Insufficient volume of stored information;
  • Database support requires constant funding, which creates an additional barrier to the introduction of Big Data;
  • The complexity of integration with existing systems;
  • A limited number of data providers.

According to an Accenture survey, data security is now a major barrier to technology adoption. More than 51% of respondents confirmed that they are worried about data protection and privacy. 47% of companies reported that they could not implement Big Data because of budget constraints, and 41% named the lack of qualified personnel as a problem. [13]


  1. HSBC turns to technology to protect against 'bad actors'.
  2. Visa’s new fraud protection software will track customers’ smartphones. The Globe and Mail.
  3. Using machine learning and stream computing to detect financial fraud: How IBM Research can help companies save billions annually. IBM Research.
  4. Proctor & Gamble – Business Sphere and Decision Cockpits.
  5. IBM Case Studies. OfficeMax.
  6. Caterpillar: Our Dealers Are Missing Up To $18 Billion In Easy Sales. James B. Kelleher, Reuters.
  7. Luxottica Eyes Predictive Analytics for Customer Decision Engine. Ellis Booker.
  8. Big data and battle tanks: Inside World of Tanks’ powerful infrastructure.
  9. Studie über die Gewinnung und Bindung von Fachkräften. Bundesministerium für Arbeit und Soziales.
  10. Project Artemis. University of Ontario.
  11. 7 New Big Data Roles for 2015. Martyn Jones.
  12. Data Analytics Dominates Enterprises' Spending Plans For 2015. Louis Columbus. Forbes.
  13. New Survey from GE and Accenture Finds Growing Urgency for Organizations to Embrace Big Data Analytics to Advance their Industrial Internet Strategy.