# How Big Data Analytics is shaping Astronomy

## With better telescopes and imaging techniques, new-age astronomy is neck-deep in Big Data

Who has not spent an evening stargazing through grandpa’s telescope? At any age, we love exploring the unknown, wanting to know what lies beyond and where we came from. The skies have always provided a canvas for the imagination. From Galileo to the present day, astronomy has kept evolving, and in the past decade it has grown at an unprecedented rate. In this new era, astronomy is making massive strides in our investigation of the Universe, probing the secrets of dark energy and dark matter, the formation and evolution of galaxies, and the structure of our own Milky Way.

**With the advent of bigger and better telescopes, improved imaging techniques, and ground-based and space-borne sky surveys, new-age astronomy is neck-deep in huge volumes of data**. The Two Micron All-Sky Survey (an extensive survey of the whole sky) has already yielded data running into petabytes, and exabyte-scale volumes are now in sight (one exabyte is equal to one quintillion bytes). Astronomy, like several other disciplines and industries, is facing a data tsunami that necessitates radical changes in the methods used for scientific research. The tools for capturing data exist, but the methods to decipher it are still evolving, and one of the major challenges is to analyze and interpret the data. There has been a paradigm shift in research: **astronomy has moved from being primarily hypothesis-driven to being data-driven and now data-intensive**. Today, astronomy finds itself enmeshed in enormous amounts of data, which keep growing in volume, rate, and complexity.

## What is Big Data Analytics?

Big data is a term for data sets that are so large or complex that traditional data-processing methods are insufficient. **Big Data Analytics** is the process of probing those large data sets to uncover hidden patterns and unknown correlations, and of interpreting and communicating the meaningful trends that lead to discoveries. In astronomy, analytics is applied to data from across the electromagnetic spectrum, ranging from **gamma rays and X-rays, ultraviolet, optical, and infrared to radio bands**.

Alberto Conti, Innovation Scientist for the James Webb Space Telescope, the successor to the Hubble Space Telescope, said in an interview: “There are two reasons that astronomy is experiencing this accelerating explosion of data. First, we are getting very good at building telescopes that can image enormous portions of the sky. Second, the sensitivity of our detectors has increased enormously. That means that these enormous images are increasingly dense with pixels, and they’re growing fast—the Large Synoptic Survey Telescope has a three-billion-pixel digital camera. So far, our data storage capabilities have kept pace with the massive output of these electronic stargazers. The real struggle has been figuring out how to search and synthesize that output.”

## Big Data stretched to astronomical proportions

The vastness of the Universe is well known. For the uninitiated, the observable universe alone may contain between 170 and 200 billion galaxies, each holding anywhere from a few thousand to several trillion stars. It’s mind-boggling. And in keeping with the size of the Universe, the data being captured is also astronomical.

This is where Big Data Analytics comes into the picture. It helps scientists efficiently extract useful information from huge amounts of data through a number of tasks: **summarization, classification, regression, clustering, association, time-series analysis, and outlier/anomaly detection**. The data is run through specialized algorithms and mathematical models, because traditional ones cannot cope at this scale. To turn raw data into knowledge, it has to pass through **generation, collection, transformation, storage, management, pre-processing, mining, visualization, understanding, evaluation, and explanation**, all of which depend on the tools available for analysis. Larger, deeper, multi-wavelength sky surveys increase the number of dimensions in astronomical data, and high-dimensional data brings the so-called curse of dimensionality (the problems of analyzing and organizing data in high-dimensional spaces), where conventional mathematical methods break down. Consequently, highly specialized analytical models and algorithms are required to capture and analyze the data.
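One symptom of the curse of dimensionality mentioned above can be seen in a few lines of Python. The sketch below is only an illustration on uniformly random points, not real survey data; the function name `distance_contrast` and its parameters are made up for this example. It measures how much farther the farthest point is than the nearest one from a random query point. As the number of dimensions grows, that contrast shrinks, which is one reason nearest-neighbour style methods lose their grip.

```python
import math
import random


def distance_contrast(n_dims, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for random points around a random query.

    Points are drawn uniformly from the unit hypercube, which is an
    assumption made purely for illustration.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    query = [rng.random() for _ in range(n_dims)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        distances.append(math.dist(query, point))  # Euclidean distance
    return (max(distances) - min(distances)) / min(distances)


for n_dims in (2, 10, 100, 1000):
    print(f"{n_dims:>5} dimensions: contrast = {distance_contrast(n_dims):.2f}")
```

In low dimensions the nearest point is many times closer than the farthest one; in a thousand dimensions all points sit at nearly the same distance, so "nearest" carries little information.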

To solve such problems, a number of innovative methods have been devised. Multivariate data analysis, for example, provides useful algorithms for astronomy. **AstroML is a Python module for machine learning and data mining, built on numpy, scipy, scikit-learn, matplotlib, and astropy** (libraries used by scientists, analysts, and engineers for scientific and technical computing, with modules for **optimization, linear algebra, integration, interpolation, special functions, and multi-dimensional arrays and matrices**, along with a large collection of high-level mathematical functions that operate on these arrays). To support the investigation of astronomical data, it also includes a library of statistical and machine learning routines, several open astronomical datasets, and a large suite of examples of analyzing and visualizing astronomical datasets. Similarly, **AstroWeka** is an open-source, user-friendly data mining tool that focuses on astronomical data and implements machine learning algorithms for tasks such as data pre-processing, classification, regression, clustering, association rules, and visualization.
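To give a feel for the kind of work these libraries do, here is a minimal sketch that uses plain numpy, one of the building blocks named above. It does not use AstroML's or AstroWeka's own interfaces. The "catalogue" is synthetic, and every name and number in it (`n_sources`, the linear relation, the injected anomalies, the 5-sigma cut) is an assumption chosen for illustration. It performs two of the tasks listed earlier: regression, via a least-squares fit, and outlier detection, via a robust cut on the residuals.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the sketch is reproducible

# Hypothetical catalogue: 500 sources with a made-up linear relation
# between two measured quantities, plus Gaussian noise.
n_sources = 500
x = rng.uniform(0.0, 10.0, n_sources)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, n_sources)

# Inject a handful of anomalous sources to have something to find.
outlier_idx = np.arange(5)
y[outlier_idx] += 20.0

# Regression: least-squares fit of y = slope * x + intercept.
design = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(design, y, rcond=None)

# Outlier detection: flag sources whose residual is more than 5 robust
# standard deviations from the median residual. The scatter is estimated
# from the median absolute deviation (MAD), which the outliers barely affect.
residuals = y - (slope * x + intercept)
centred = residuals - np.median(residuals)
robust_sigma = 1.4826 * np.median(np.abs(centred))
flagged = np.flatnonzero(np.abs(centred) > 5.0 * robust_sigma)

print(f"fitted slope = {slope:.2f}, intercept = {intercept:.2f}")
print("flagged sources:", flagged)
```

Real pipelines would swap the synthetic arrays for catalogue columns and would usually reach for the ready-made routines in scikit-learn or AstroML rather than hand-rolled cuts, but the array-in, model-out pattern is the same.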

Apart from analytics, there are problems of storage as well. “Cloud is a very good option for solving the problem of storage for such huge amounts of data. The technology to handle such copious amounts of data are already evolving fast,” said Somesh Misra, VP, Operations, Deskera, a market leader in Cloud-based Analytics.

## Age of collaboration for astronomy

The challenge of Big Data Analytics for astronomy cannot be handled by the discipline alone; it requires an **interdisciplinary approach**. Consequently, there has been wide-ranging collaboration among astronomers, statisticians, computer scientists, data scientists, and information scientists. Several organizations have been established to facilitate such exchanges, including the International Astrostatistics Association (IAA), the American Astronomical Society Working Group in Astroinformatics and Astrostatistics (AAS/WGAA), and the Information and Statistical Sciences Consortium of the planned Large Synoptic Survey Telescope (LSST/ISSC). Moreover, computers with exascale capabilities will have to be built: machines that can carry out 1 quintillion (1 million trillion) floating-point operations per second, a thousand times faster than the most powerful supercomputers of this age.
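The exascale figures above are easy to check with a little arithmetic. The sketch below assumes that the "thousand times" slower baseline is a machine running at 10^15 operations per second, and it uses a hypothetical workload of 10^21 operations; neither number comes from a real survey.

```python
EXASCALE_OPS_PER_SEC = 10**18   # 1 quintillion operations per second
BASELINE_OPS_PER_SEC = 10**15   # assumed baseline, a thousand times slower

# "1 quintillion" and "1 million trillion" are the same number.
assert EXASCALE_OPS_PER_SEC == 10**6 * 10**12


def seconds_to_run(total_operations, ops_per_sec):
    """Idealized run time: ignores memory, I/O, and parallel overheads."""
    return total_operations / ops_per_sec


workload = 10**21  # hypothetical analysis job, chosen only for illustration

baseline_days = seconds_to_run(workload, BASELINE_OPS_PER_SEC) / 86_400
exascale_minutes = seconds_to_run(workload, EXASCALE_OPS_PER_SEC) / 60

print(f"baseline machine: about {baseline_days:.1f} days")
print(f"exascale machine: about {exascale_minutes:.1f} minutes")
```

Under these idealized assumptions, a job that would occupy the baseline machine for well over a week finishes on an exascale machine in a matter of minutes.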

Instruments of the future will generate ever more massive volumes of data. How much we come to know of the Universe will depend on the effectiveness of our analytical tools and on our ability to meet that flood of data head-on. And therein lies the challenge.