What is Big Data Analytics?
Companies use big data analytics to uncover new and exciting insights in large and varied datasets. It helps them forecast market trends, identify hidden correlations between data flows, and understand their customers’ preferences in fine detail. These analytical insights help teams make quick, informed decisions and build satisfying products. But big data requires special infrastructure and a respect for the data science process.
Big data analytics isn’t a synonym for data science, though the two are often confused. It’s just one step in the data science process: the step where teams actually collect and analyze the big data, coffee in hand, run queries, and view reports. Data science, like physics, is an entire field of study, whereas big data analytics is a tool and an activity, like a lab experiment, albeit one with stunning versatility.
Benefits of big data analytics
Big data analysis helps companies manage unusually large, complex, and fast-changing datasets that conventional business intelligence (BI) tools can’t handle. For example, most BI tools only analyze structured data, which is data that fits a fixed set of fields and can be organized into tables like a spreadsheet. But big data includes structured data as well as unstructured data (like the contents of users’ comments) and semi-structured data (like user flows). Big data analytics tools are able to process all of these data types.
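As a minimal sketch of the three data types described above, the Python below handles one small sample of each. All records, field names, and helper functions are hypothetical and chosen only for illustration; real analytics tools do far more than this.

```python
import json

# Structured: every row has the same fixed fields (hypothetical sample data).
structured_rows = [
    {"user_id": 1, "plan": "free", "monthly_spend": 0.0},
    {"user_id": 2, "plan": "pro", "monthly_spend": 49.0},
]

# Semi-structured: JSON events share some keys, but fields vary per record.
semi_structured = [
    '{"user_id": 1, "event": "signup", "referrer": "ad"}',
    '{"user_id": 2, "event": "purchase", "items": ["sku-1", "sku-2"]}',
]

# Unstructured: free text with no predefined fields.
unstructured = ["Love the new dashboard, but export is slow."]


def total_spend(rows):
    """Structured data can be aggregated directly by column name."""
    return sum(row["monthly_spend"] for row in rows)


def event_counts(raw_events):
    """Semi-structured data must be parsed before shared keys can be used."""
    counts = {}
    for raw in raw_events:
        event = json.loads(raw)
        counts[event["event"]] = counts.get(event["event"], 0) + 1
    return counts


def mentions(comments, keyword):
    """Unstructured text has no fields, so this falls back to a text search."""
    return sum(keyword.lower() in comment.lower() for comment in comments)
```

The point of the sketch is that each data type needs a different access pattern: column lookups, parsing, or text search.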
Big data also moves fast. The data must be updated constantly and queried frequently, and the entire system has to be backed up and secured. Teams use big data analytics to process those fast and varied data volumes.
For teams that invest in big data analytics software, the benefits are numerous. They’re able to quantify new areas of their business’ performance, compare formerly siloed data sets, and improve their marketing, customer service, product development, and internal analysis.
Benefits of using big data analytics:
- Uncover the need for new features or products
- Understand the full customer journey
- Market more effectively
- Support customers more effectively
- Respond faster to market trends
A brief history of big data analytics
Data didn’t get big. It was always big. The presence of lightning-fast processors and the internet didn’t imbue humans with a greater need to seek help or entertainment. It simply quantified their actions and made the world visible in a way that would have made Pierre-Simon Laplace, the French mathematician and a founder of the field of statistics, drool.
Laplace is best remembered for his theory that a person who could somehow know the movement and position of every atom in the universe could effectively predict the future. “If this intellect were vast enough to submit these data to analysis,” Laplace wrote in 1814, “it would embrace in a single formula the movements of the greatest bodies of the universe and those of the tiniest atom; for such an intellect nothing would be uncertain and the future—just like the past—would be present before its eyes.”
After his death, the postulate was named “Laplace’s demon” because it’s a tantalizing idea that’s entirely unprovable insofar as no one has ever acquired a perfect and unlimited quantity of information. But with big data analytics, humans are creeping closer.
Weather forecasting, for example, has improved dramatically since the 1960s. The earth’s atmosphere is among the most complex and unpredictable systems in existence, but meteorologists using big data analytics tools have increased the granularity of their weather maps by 50x and now make forecasts that are almost twice as accurate. Scientists have made similar strides in genetics. The first genome project took 13 years to complete and cost $3 billion. Now, with advances in big data computing, the process takes days and costs less than $1,000.
Advances in the field of supercomputing have trickled into the business world. Now, even small startups can access big data analytics tools that were once the purview of states and governments.
“The degree to which small businesses can capture and analyze their own data is astonishing, compared to where companies were even ten years ago,” says Joe Corey, a Data Scientist and founder of the software development platform Subspace. “It’s like Google Maps—in the 1980s, that was something only spy agencies had. Now, it’s in everyone’s pockets. Previously, only giant corporations had product data. Now a small business owner has them on their Macbook.”
Tools of big data analytics
Teams need unique infrastructure to conduct big data analytics because the volume of data they can access is growing far faster than their ability to store it, and gains in storage density are slowing. Humans generated more data in 2017 than in all of prior recorded history, but chips reached “a plateau of transistor chip density” because chip components are approaching the size of atoms and can’t get much smaller, reports the MIT Technology Review.
Many news publications have pronounced Moore’s Law, the observation that the number of transistors on a chip doubles roughly every two years, dead. But data flows continue to grow. Researchers are scrambling to tap alternative non-silicon materials and methods, such as quantum computing. But for the foreseeable future, any team with an unwieldy volume of data needs specialized technology to handle it. For example:
Hadoop is a set of open source programs and procedures for storing massive amounts of data across a network of many computers and processing that data in parallel. It was released in 2006 as a project of the non-profit Apache Software Foundation to allow teams to read and write data faster, and it is still widely used.
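The idea of splitting work across many machines and then combining the results can be sketched in a few lines of Python. This is not Hadoop’s API. It is a simplified, single-machine illustration in which threads stand in for networked computers, and the function names and sample log lines are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_chunk(lines):
    """Map step: count the words in one chunk of the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-chunk counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(lines, workers=3):
    # Split the input into one chunk per worker (the stand-in for a node).
    chunks = [lines[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)


# Hypothetical event log, one line per session.
logs = ["login click click", "click purchase", "login logout"]
totals = word_count(logs)
```

Because each chunk is counted independently, adding more workers lets the same job cover more data, which is the property that makes this pattern attractive for very large datasets.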
NoSQL is a term for non-relational databases, which store and retrieve data without the fixed tables of a traditional SQL database. SQL stands for “Structured Query Language,” and “NoSQL” is usually read as “non-SQL” or “not only SQL.” As in, it’s not your typical query method.
Relational (SQL) databases, which organize data into spreadsheet-like tables of columns and rows, work just fine for most applications but can struggle with big data, because every record has to fit a predefined schema and very large tables are slow to sort and join. Many NoSQL databases move more quickly on this kind of workload because they store records in something close to their native format and retrieve them by key or tag.
Teams may use databases running NoSQL to create what’s known as a data lake, or a repository for a vast amount of raw data in its native format. Data lakes are a sort of holding area, where data is only identified by its tags and metadata. A lake may be connected to what are known as data warehouses, which are repositories for structured, processed data and which often use SQL.
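To make the contrast concrete, here is a small Python sketch that puts a relational table next to a tag-based store. The table uses the standard library’s sqlite3 module. The `lake`, `store`, and `find` names are hypothetical stand-ins for a data lake, not any real product’s interface, and all sample records are invented.

```python
import sqlite3

# Relational: the schema is declared up front and every row must fit it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (user_id INTEGER, item TEXT, price REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [(1, "book", 12.5), (2, "lamp", 30.0), (1, "pen", 2.5)],
)
spend_by_user = dict(
    conn.execute("SELECT user_id, SUM(price) FROM purchases GROUP BY user_id")
)

# Lake-style: records keep their native shape and are found by tag.
lake = []


def store(record, tags):
    """Save a raw record of any shape, identified only by its tags."""
    lake.append({"tags": set(tags), "raw": record})


def find(tag):
    """Return every raw record that carries the given tag."""
    return [entry["raw"] for entry in lake if tag in entry["tags"]]


store({"user_id": 1, "comment": "Great app"}, tags=["feedback"])
store("2024-01-01 12:00:01 GET /home 200", tags=["weblog"])
store({"user_id": 2, "clicks": ["home", "cart"]}, tags=["clickstream", "feedback"])
```

The table can answer aggregate questions immediately but only accepts rows that match its columns, while the tag-based store accepts anything and defers the work of interpreting it.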
What’s a data ecosystem?
A data preparation and storage process
Data presents a threat as well as an opportunity. Businesses can use it to glean insights, but storing data also exposes them to hacks and leaks. For example, the supermarket chain BJ’s Wholesale Club was censured by the US Federal Trade Commission for storing customers’ credit card information for up to 30 days, far longer than it was needed, which made it available to hackers.
To keep data safe:
- Assign an individual or an entire team to manage the data.
- Build systems to cleanse and expunge old data.
- Publish rules on data governance.
- Invest in data security.
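A system to expunge old data can start as something very simple. The Python below is a minimal sketch of a retention rule, assuming each record carries a `stored_at` timestamp. The seven-day window, the field names, and the sample records are hypothetical, and a real policy should come from the team’s own governance rules.

```python
from datetime import datetime, timedelta

# Hypothetical retention window; not a legal or industry standard.
RETENTION = timedelta(days=7)


def purge_expired(records, now, retention=RETENTION):
    """Return only the records still inside the retention window."""
    cutoff = now - retention
    return [record for record in records if record["stored_at"] >= cutoff]


now = datetime(2024, 6, 30)
records = [
    {"id": "a", "stored_at": datetime(2024, 6, 25)},  # 5 days old: kept
    {"id": "b", "stored_at": datetime(2024, 4, 1)},   # about 3 months old: dropped
]
kept = purge_expired(records, now)
```

Running a job like this on a schedule means data that is no longer needed is not sitting around to be exposed.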
Advanced big data analysis tools
Teams need quick access to all their stored data, preferably from one interface. That’s why most teams build their big data and analytics technology stack in a hub-and-spoke model: one primary analytics platform provides most of the needed functionality, and bolt-on solutions cover the rest. The hub-and-spoke model keeps the working data in one central place and prevents errors and inconsistencies between multiple copies of the data.
The vast quantity and velocity of big data can make it difficult for humans to keep up, and many teams automate parts of their big data analysis with machine learning. Algorithms can identify anomalies or relationships humans don’t have the patience or acumen to find, helping teams focus their limited time and attention.
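As a rough illustration of automated anomaly detection, the Python sketch below flags values that sit unusually far from the average, using a simple z-score rule. This is a basic statistical stand-in for the machine learning methods mentioned above, not a description of any particular product. The threshold and the sample signup figures are hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # every value is identical, so nothing stands out
    return [
        (index, value)
        for index, value in enumerate(values)
        if abs(value - mu) / sigma > threshold
    ]


# Hypothetical daily signup counts with one unusual spike.
daily_signups = [102, 98, 105, 97, 101, 99, 103, 100, 310, 96]
anomalies = find_anomalies(daily_signups)
```

On this sample, only the spike of 310 is flagged, which is the kind of pointer that lets a team spend its attention on the one day worth investigating.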
Big data applications and examples
Shopping app Grabr increases merchandise volume 117x
The peer-to-peer shopping app Grabr provides a way for travelers visiting a foreign country to “grab” purchases for shoppers back home. With over one million users on the platform moving between hundreds of countries, the team wanted to eliminate the guesswork in figuring out what prices both parties in a transaction were willing to pay, and where they planned to travel. The team deployed Mixpanel and were able to view where users flowed through the app, to model their behavior, and to identify correlations that helped the team fix bugs and increase gross merchandise volume 117x.
Read the Grabr case study.
Media giant STARZ PLAY fights fraud and saves more than 8x marketing spend
The subscription-based video brand STARZ PLAY needed to understand the millions of viewers who access its content every day, across a variety of devices. The team launched Mixpanel to ingest as many as 1.5 million user sessions per day and analyzed user flows to see where they came from, what they clicked on, and where they exited. The team discovered that some users were exploiting a loophole in its free trial offer and enjoying multiple trials without paying. STARZ PLAY closed the loophole, deactivated the fraudulent accounts, and saved more than 8x on its marketing spend.
Read the STARZ PLAY case study.
Messaging app Viber increases chat volume 15 percent
The messaging app Viber needed to analyze the behavior of its more than one billion active users as they traded trillions of messages between hundreds of countries. The team wanted to understand user data to help friends, family, and communities around the world more clearly express themselves, so they deployed Mixpanel to analyze billions of events and user properties, and to A/B test variations in the app interface. When the team landed upon a keyboard variant that increased the number of chats users traded, they deployed it to the entire user base and increased overall messaging 15 percent.
Read the Viber case study.
Challenges with big data
Big data is so called because of its quantity, velocity, and variety. But it’s also a description of the potential uses and abuses. Adding more data sources and infrastructure can raise the potential for software bugs and data loss. More team members with access means more opportunity for human error and security breaches. Over time, these issues can compound and erode the data quality and create new risks.
Large organizations with many data sources suffer their own set of problems. They’re unusually prone to develop data silos, where new sources aren’t connected to or reconciled with older ones. As different teams work off diverging data sets, they can get into disagreements and struggle to agree on what’s really true.
To keep data clean, secure, and useful, teams must fight the process of entropy. Many commission a data team to publish strict guidelines for safeguarding and incorporating new data into the data ecosystem, and are careful in selecting big data and analytics vendors who help them consolidate their tools into as few interfaces as possible. It’s only through active effort that teams can make harnessing big data a small feat, and uncover massive insights on a global scale.