Papers

Paper October 2018

TensorFit a tool to analyse spectral cubes in a tensor mode

Author(s): H. Farias, C. Nuñez, M. Solar

Modern observatories such as the Atacama Large Millimeter/submillimeter Array (ALMA) and the Very Long Baseline Array (VLBA) generate large-scale data, and the volume will grow further with the incorporation of new observatories such as the Square Kilometre Array (SKA). Archived astronomical data are projected to reach the petabyte scale (60 PB) by 2020. The Chilean Virtual Observatory (ChiVO) has stored the ALMA spectral cubes and seeks to offer these data openly to the community, but the data should be downloaded and processed in its own facilities. To this end, our proposal treats the cubes as high-order tensors, specifically 3-way tensors with two spatial dimensions (galactic latitude and longitude) and a velocity dimension. This opens a new approach to the massive analysis of these cubes, which is otherwise computationally prohibitive. Based on this premise, we propose TensorFit, a natural and scalable library to handle spectral cubes in a tensor mode. The implementation is built on parallel-oriented frameworks and on distributed processing of n-dimensional arrays with PyTorch (GPU and CPU). To verify the impact of this proposal, we focus on showing the benefits of tensor compression, in particular with Tucker implementations, which have demonstrated outstanding dimensionality reduction of multidimensional data in other scientific domains.
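To make the idea of Tucker compression of a 3-way cube concrete, here is a minimal sketch of a truncated higher-order SVD (HOSVD), one common way to compute a Tucker model. It is not TensorFit's API, which the abstract does not show: the function names are hypothetical, NumPy is used instead of PyTorch for brevity, and the test cube is random data built to have a known low multilinear rank.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize a tensor: rows index `mode`, columns index all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` (shape: new_size x old_size) along `mode`."""
    moved = np.moveaxis(tensor, mode, -1)
    return np.moveaxis(moved @ matrix.T, -1, mode)

def tucker_hosvd(cube, ranks):
    """Truncated HOSVD: one orthonormal factor per mode plus a small core."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(cube, mode), full_matrices=False)
        factors.append(u[:, :rank])  # leading left singular vectors
    core = cube
    for mode, factor in enumerate(factors):
        core = mode_product(core, factor.T, mode)  # project onto each subspace
    return core, factors

def tucker_reconstruct(core, factors):
    """Expand the core back to the full cube."""
    cube = core
    for mode, factor in enumerate(factors):
        cube = mode_product(cube, factor, mode)
    return cube

# Synthetic stand-in for a spectral cube (two spatial axes, one velocity axis)
# with multilinear rank (2, 3, 2); real cubes are only approximately low rank.
rng = np.random.default_rng(0)
true_core = rng.normal(size=(2, 3, 2))
true_factors = [rng.normal(size=(n, r)) for n, r in zip((20, 20, 30), (2, 3, 2))]
cube = tucker_reconstruct(true_core, true_factors)

core, factors = tucker_hosvd(cube, ranks=(2, 3, 2))
approx = tucker_reconstruct(core, factors)
relative_error = np.linalg.norm(approx - cube) / np.linalg.norm(cube)
```

For real data the ranks trade reconstruction error against storage, so they would be chosen per cube rather than fixed in advance.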

Paper September 2018

JOVIAL: Notebook-based astronomical data analysis in the cloud

Author(s): M. Araya, M. Osorio, M. Díaz, C. Ponce, M. Villanueva, C. Valenzuela, M. Solar

Performing astronomical data analysis using only personal computers is becoming impractical for the very large data sets produced nowadays. As analysis is not a task that can be fully automated, the idea of moving the processing to where the data are located also means moving the whole scientific process towards the archives and data centers. Using Jupyter Notebooks as a remote service is a recent trend in data analysis that aims to deal with this problem, but harnessing the infrastructure to serve the astronomer without increasing the complexity of the service is a challenge. In this paper we present the architecture and features of JOVIAL, a Cloud service where astronomers can safely use Jupyter notebooks over a personal space designed for high-performance processing under the high-availability principle. We show that features existing only in specific packages can be adapted to run in the notebooks, and that algorithms can be adapted to run across the data center without necessarily redesigning them.

Paper July 2018

ChiVOLabs: cloud service that offer interactive environment for reprocessing astronomical data

Author(s): Humberto A. Farias; Daniel Ortiz; Camilo Núñez; Mauricio Solar; Margarita Bugueno

Advances in telescope and instrument technology have made more and more data available at higher resolution and quality, but the way to obtain these data remains the same: astronomers depend on being granted observation time. Once the data become public, in most cases they fall into disuse; apart from initiatives such as virtual observatories, which seek to offer these data through standards and protocols for storage and access, the data are not used again. Our proposal seeks to recover these large volumes of existing observations and offer them as a cloud service from a High Performance Computing environment, so that the scientific community can reprocess them and generate new science. For this purpose, at the Chilean Virtual Observatory we have created ChiVOLabs, a system with a microservice architecture on Docker that, based on searches in our virtual observatories, offers the possibility of reprocessing ALMA data through a Jupyter Notebook interface. The process starts with the IVOA Simple Cone Search protocol to find the astronomical data, after which the user selects the option to reprocess them. Our architecture creates a Docker environment in our datacenter with the entire ChiVO analysis stack and Python, and transforms the reduction script used to generate the original product into a Jupyter notebook automatically connected to our CASA cluster (the ALMA reduction pipeline). Finally, we search our data lake for the raw ALMA data, known as ASDM, from which the selected product was generated, and create a link to these raw data in the container. In this way we offer the scientific community the possibility of reprocessing these data according to their scientific interests and generating new data products, which can then be downloaded or processed further with our libraries using the full power of our datacenter.

© (2018) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.

Paper July 2018

Tensor representation, constrain (storage) and processing of multidimensional astronomical data over intense computing support

Author(s): Humberto Farias; Camilo Nuñez; Mauricio Solar

The big data problem in astronomy is a well-known issue, but in the majority of cases it is restricted to the volume of the data. Our proposal addresses another aspect of the problem: dimensionality, in the scope of multidimensional data and especially astronomical data cubes. We use tensor decompositions for two goals. First, using Tucker we achieve very high compression rates that save up to 91% of disk space and network traffic. Second, using CANDECOMP/PARAFAC or Canonical Decomposition (CP), we build a system to find the multilinear manifold in these astronomical cubes. Because this is a big data problem, we test three implementations of our library, among them one that makes intensive use of GPUs through PyTorch and one that follows the traditional HPC approach using MPI. Our proposal starts from a simple but powerful idea: if we are dealing with multidimensional data (astronomical cubes), why limit ourselves to two-dimensional techniques? For example, PCA is typically used for dimensionality reduction in spectral cubes instead of a multidimensional approach that preserves the multilinear manifold within the data. We propose to move from a linear algebra approach to a multilinear algebra approach using tensor theory.
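The storage saving of a Tucker model follows from simple counting: instead of every voxel, one stores a small core tensor plus one factor matrix per mode. The sketch below works through that arithmetic. The cube shape and ranks are hypothetical values chosen for illustration and are not the settings behind the 91% figure quoted in the abstract; the function names are also assumptions.

```python
from math import prod

def tucker_storage(shape, ranks):
    """Values stored by a Tucker model: the core plus one factor per mode."""
    return prod(ranks) + sum(n * r for n, r in zip(shape, ranks))

def cp_storage(shape, rank):
    """Values stored by a rank-R CP model: one (n x R) factor per mode."""
    return rank * sum(shape)

def saved_fraction(shape, stored_values):
    """Fraction of the original storage that is no longer needed."""
    return 1.0 - stored_values / prod(shape)

# Hypothetical cube and ranks, for illustration only.
shape = (512, 512, 256)
ranks = (128, 128, 64)

tucker_saving = saved_fraction(shape, tucker_storage(shape, ranks))
cp_saving = saved_fraction(shape, cp_storage(shape, rank=64))
```

The same counting shows why the saving depends strongly on the ranks: the core grows with the product of the ranks, so keeping them small relative to the cube dimensions matters more than the size of the factor matrices.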

Paper August 2016

Cloud services on an astronomy data center

Author(s): Mauricio Solar, Mauricio Araya, Humberto Farias, Diego Mardones, and Zhong Wang.

The research on computational methods for astronomy performed by the first phase of the Chilean Virtual Observatory (ChiVO) led to the development of functional prototypes, implementing state-of-the-art computational methods and proposing new algorithms and techniques. The ChiVO software architecture is based on the use of the IVOA protocols and standards. These protocols and standards are grouped in layers, with emphasis on the application and data layers, because their basic standards define the minimum operation that a VO should conduct. As a preliminary verification, the current implementation works with a 1 TB data set that comes from the reduction of ALMA Cycle 0. This research was mainly focused on spectroscopic data cubes from the ALMA Cycle 0 public data. As the data set keeps growing, with the ALMA Cycle 1 public data also increasing every month, data processing is becoming a major bottleneck for scientific research in astronomy. When designing the ChiVO, we focused on improving both computation and I/O costs, and this led us to configure a data center with 424 high-speed cores at 2.6 GHz, 1 PB of storage (distributed across hard disk drives (HDD) and solid-state drives (SSD)) and high-speed InfiniBand communication. We are developing a cloud-based e-infrastructure for ChiVO services, in order to have a coherent framework for developing novel web services for on-line data processing in the ChiVO. We are currently parallelizing these new algorithms and techniques using HPC tools to speed up big data processing, and we will report our results in terms of data size, data distribution, number of cores and response time, in order to compare different processing and storage configurations.

© (2016) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.

Paper February 2016

Indexing data cubes for content-based searches in radio astronomy

Author(s): Mauricio Araya, Gabriel Candia, Rodrigo Gregorio, Marcelo Mendoza, and Mauricio Solar

Methods for observing space have changed profoundly in the past few decades. The methods needed to detect and record astronomical objects have shifted from conventional observations in the optical range to more sophisticated methods which permit the detection of not only the shape of an object but also the velocity and frequency of emissions in the millimeter-scale wavelength range and the chemical substances from which they originate. The consolidation of radio astronomy through a range of global-scale projects such as the Very Long Baseline Array (VLBA) and the Atacama Large Millimeter/submillimeter Array (ALMA) reinforces the need to develop better methods of data processing that can automatically detect regions of interest (ROIs) within data cubes (position–position–velocity), index them and facilitate subsequent searches via methods based on queries using spatial coordinates and/or velocity ranges. In this article, we present the development of an automatic system for indexing ROIs in data cubes that is capable of automatically detecting and recording ROIs while reducing the necessary storage space. The system is able to process data cubes containing megabytes of data in fractions of a second without human supervision, thus allowing it to be incorporated into a production line for displaying objects in a virtual observatory. We conducted a set of comprehensive experiments to illustrate how our system works. As a result, an index of 3% of the input size was stored in a spatial database, representing a compression ratio equal to 33:1 over an input of 20.875 GB, achieving an index of approximately 773 MB. Moreover, a single query can be evaluated over our system in a fraction of a second, showing that the indexing step works as a shock absorber for the computational time involved in data cube processing.
The system forms part of the Chilean Virtual Observatory (ChiVO), an initiative which belongs to the International Virtual Observatory Alliance (IVOA) that seeks to provide the capability of content-based searches on data cubes to the astronomical community.
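As a rough illustration of querying an ROI index by spatial coordinates and velocity range, the sketch below stores each ROI as a bounding box in position–position–velocity space and filters by interval overlap. It is an in-memory stand-in, not the spatial database described in the article: the `Roi` record, its field names and the sample values are all hypothetical, and RA wrap-around at 0/360 degrees is deliberately ignored.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Roi:
    """Hypothetical index entry: a bounding box in RA, Dec (deg) and velocity (km/s)."""
    cube_id: str
    ra_min: float
    ra_max: float
    dec_min: float
    dec_max: float
    vel_min: float
    vel_max: float

def overlaps(lo_a, hi_a, lo_b, hi_b):
    """True when the closed intervals [lo_a, hi_a] and [lo_b, hi_b] intersect."""
    return lo_a <= hi_b and lo_b <= hi_a

def query(index, ra_range, dec_range, vel_range=None):
    """Return the ROIs whose box intersects the requested ranges."""
    hits = []
    for roi in index:
        if not overlaps(roi.ra_min, roi.ra_max, *ra_range):
            continue
        if not overlaps(roi.dec_min, roi.dec_max, *dec_range):
            continue
        if vel_range is not None and not overlaps(roi.vel_min, roi.vel_max, *vel_range):
            continue
        hits.append(roi)
    return hits

# Illustrative entries only; these are not values from the article.
index = [
    Roi("cube-1", 83.70, 83.90, -5.50, -5.30, 5.0, 12.0),
    Roi("cube-1", 83.75, 83.80, -5.45, -5.40, 20.0, 25.0),
    Roi("cube-2", 201.30, 201.40, -43.10, -43.00, 500.0, 560.0),
]

spatial_hits = query(index, (83.6, 84.0), (-5.6, -5.2))
velocity_hits = query(index, (83.6, 84.0), (-5.6, -5.2), vel_range=(0.0, 15.0))
```

A linear scan is enough to show the query semantics; a production index would rely on a spatial database, as the article describes, to keep lookups fast as the number of ROIs grows.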

Paper November 2015

Open skies with cloud computing

Author(s): Mauricio Solar and Mauricio Araya

This article shows that the high cost of obtaining data in astronomy justifies the use of cloud computing. In this field, as in others in science, there are several benefits when the data are freely available to the whole scientific community. The availability of these data will foster innovative applications to exploit and explore them. These applications could be offered as a service in the cloud. A National Data Center for e-Science could take advantage of these costly data and offer open data to the community to maximize data accessibility.

Paper October 2015

Chilean virtual observatory

Author(s): Mauricio Solar, Mauricio Araya, Luis Arevalo, Ricardo Contreras, Victor Parada, and Diego Mardones

This paper presents the challenges, architecture and current status of the Chilean Virtual Observatory (ChiVO), which is a software infrastructure for accessing and processing astronomical data generated in Chile. As ChiVO is part of the International Virtual Observatory Alliance (IVOA), we strictly follow the protocols and standards that this organization produces. However, there are always open challenges due to new observational technologies and local requirements that motivate research at every new virtual observatory, such as the complex data models and Big Data problems that the ALMA Observatory is confronting. The current ChiVO prototype includes IVOA-compliant services as well as new solutions designed for ALMA data, all of them using modern software technologies.

Paper March 2015

A brief survey on the Virtual Observatory

Author(s): Mauricio Araya, Mauricio Solar, Jonathan Antognini

This paper presents a short survey of the astronomical Virtual Observatories (VOs) that are part of the International Virtual Observatory Alliance (IVOA). From how they are distributed worldwide to their specialization range on the electromagnetic spectrum, we summarize key aspects of the current 21 VO members. Through the registry service, we explore the resources that the VO offers as a whole, and find that even though the VO is already a mature initiative, there are important challenges to address in the near future.

Paper October 2014

Exorcising the Ghost in the Machine: Synthetic Spectral Data Cubes for Assessing Big Data Algorithms

Author(s): Mauricio Araya, Mauricio Solar, Diego Mardones, Teodoro Hochfärber

The size and quantity of the data being generated by large astronomical projects like ALMA require a paradigm change in astronomical data analysis. Complex data, such as highly sensitive spectroscopic data in the form of large data cubes, are not only difficult to manage, transfer and visualize, but they also make the use of traditional data analysis techniques and algorithms unfeasible. Consequently, attention has been placed on machine learning and artificial intelligence techniques, to develop approximate and adaptive methods for astronomical data analysis within a reasonable computational time. Unfortunately, these techniques are usually suboptimal, stochastic and strongly dependent on their parameters, which could easily turn into "a ghost in the machine" for astronomers and practitioners. Therefore, a proper assessment of these methods is not only desirable but mandatory for trusting them in large-scale usage. The problem is that positively verifiable results are scarce in astronomy, and moreover, science using bleeding-edge instrumentation naturally lacks reference values. We propose an Astronomical SYnthetic Data Observatory (ASYDO), a virtual service that generates synthetic spectroscopic data in the form of data cubes. The objective of the tool is not to produce accurate astrophysical simulations, but to generate a large number of labelled synthetic data, to assess advanced computing algorithms for astronomy and to develop novel Big Data algorithms. The synthetic data is generated using a set of spectral lines, template functions for spatial and spectral distributions, and simple models that produce reasonable synthetic observations. Emission lines are obtained automatically using IVOA's SLAP protocol (or from a relational database) and their spectral profiles correspond to distributions in the exponential family. The spatial distributions correspond to simple functions (e.g., 2D Gaussian), or to scalable template objects.
The intensity, broadening and radial velocity of each line are given by very simple and naive physical models, yet ASYDO's generic implementation supports new user-made models, which potentially allows adding more realistic simulations. The resulting data cube is saved as a FITS file, also including all the tables and images used for generating the cube. We expect to implement ASYDO as a virtual observatory service in the near future.
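The recipe of combining a spatial template with a spectral profile can be sketched in a few lines. The example below is not ASYDO's code or API: it assumes a single source with a 2D Gaussian spatial distribution and a Gaussian line profile (one member of the exponential family), uses pixel and channel units instead of sky coordinates and frequencies, and all function and parameter names are hypothetical.

```python
import numpy as np

def gaussian(x, center, sigma):
    """Unnormalized Gaussian profile with peak value 1."""
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

def synthetic_cube(shape, source_xy, spatial_sigma, line_channel, line_sigma,
                   peak=1.0, noise_rms=0.0, seed=0):
    """Build a (channel, y, x) cube holding one Gaussian source with one Gaussian line."""
    n_chan, n_y, n_x = shape
    y, x = np.mgrid[0:n_y, 0:n_x]
    # Separable 2D Gaussian as the spatial template.
    spatial = gaussian(x, source_xy[0], spatial_sigma) * gaussian(y, source_xy[1], spatial_sigma)
    # Gaussian line profile along the spectral axis.
    spectral = gaussian(np.arange(n_chan), line_channel, line_sigma)
    # Outer product: every pixel carries the same line, scaled by the template.
    cube = peak * spectral[:, None, None] * spatial[None, :, :]
    if noise_rms > 0:
        cube = cube + np.random.default_rng(seed).normal(0.0, noise_rms, size=shape)
    return cube

cube = synthetic_cube(shape=(32, 16, 16), source_xy=(8, 5), spatial_sigma=2.0,
                      line_channel=10, line_sigma=1.5)
peak_index = np.unravel_index(np.argmax(cube), cube.shape)
```

Because the generating parameters are known, each cube comes with its own ground truth (here, the position and channel of the peak), which is what makes such data usable as labelled test cases.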

Paper October 2014

Evaluating a NoSQL alternative for Chilean Virtual Observatory Services

Author(s): Jonathan Antognini, Mauricio Araya, Mauricio Solar, Camilo Valenzuela, Francisco Lira

Currently, the standards and protocols for data access in the Virtual Observatory architecture (DAL) are generally implemented with relational databases based on SQL. In particular, the Astronomical Data Query Language (ADQL), the language used by IVOA to represent queries to VO services, was created to satisfy the different data access protocols, such as Simple Cone Search. ADQL is based on SQL92, and has extra functionality implemented using PgSphere. An emerging alternative to SQL is the family of so-called NoSQL databases, which can be classified into several categories such as Column, Document, Key-Value, Graph, Object, etc., each one recommended for different scenarios. Among their notable characteristics are schema-free design, easy replication support, simple APIs and Big Data support. The Chilean Virtual Observatory (ChiVO) is developing a functional prototype based on the IVOA architecture, with the following relevant factors: Performance, Scalability, Flexibility, Complexity, and Functionality. Currently, it is very difficult to compare these factors, due to a lack of alternatives. The objective of this paper is to compare NoSQL alternatives with SQL through the implementation of a REST web API that satisfies ChiVO's needs: a SESAME-style name resolver for the data from ALMA. Therefore, we propose a test scenario by configuring a NoSQL database with data from different sources and evaluating the feasibility of creating a Simple Cone Search service and its performance. This comparison will help pave the way for the application of Big Data databases in the Virtual Observatory.
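Whatever database sits behind it, a Simple Cone Search reduces to selecting the records whose angular distance from a sky position (RA, Dec) is within a search radius, all in decimal degrees. The sketch below shows that core test using the haversine formula over a plain list of dictionaries. It is a database-agnostic illustration, not the service evaluated in the paper: the record layout, names and sample coordinates are hypothetical.

```python
from math import asin, cos, degrees, radians, sin, sqrt

def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle distance in degrees between two sky positions (haversine)."""
    ra1, dec1, ra2, dec2 = map(radians, (ra1, dec1, ra2, dec2))
    h = sin((dec2 - dec1) / 2) ** 2 + cos(dec1) * cos(dec2) * sin((ra2 - ra1) / 2) ** 2
    return degrees(2 * asin(min(1.0, sqrt(h))))

def cone_search(catalog, ra, dec, radius):
    """Return the records lying within `radius` degrees of (ra, dec)."""
    return [row for row in catalog
            if angular_separation(ra, dec, row["ra"], row["dec"]) <= radius]

# Hypothetical records; a document store would hold similar key-value documents.
catalog = [
    {"name": "A", "ra": 10.0, "dec": -30.0},
    {"name": "B", "ra": 10.5, "dec": -30.0},
    {"name": "C", "ra": 200.0, "dec": 45.0},
]

narrow = [row["name"] for row in cone_search(catalog, 10.0, -30.0, 0.2)]
wide = [row["name"] for row in cone_search(catalog, 10.0, -30.0, 0.5)]
```

Scanning every record is only workable for small catalogs, which is why the feasibility question in the abstract matters: a usable service needs the database to support some form of spatial indexing or pre-filtering.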

Paper February 2014

Automatic detection and automatic classification of structures in astronomical images

Author(s): Rodrigo Gregorio, Mauricio Solar, Diego Mardones, Karim Pichara, Ricardo Contreras, Victor Parada

The study of astronomical structures is important to the astronomical community because it can help to identify objects, which can be classified based on their internal structure or their relation to other objects. For this reason, an automated tool is being developed to analyze astronomical images and separate them into their components. First, a 2D image is decomposed into different spatial scales using a wavelet transform. Then, a detection algorithm, such as Clumpfind, Gaussclump, or Dendrogram techniques, is applied to each spatial scale. The goal is to build a new algorithm and tool that is available to the community and satisfies the requirements of the upcoming Chilean Virtual Observatory (ChiVO).
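The multiscale step can be illustrated with the à trous (starlet) wavelet transform, which is widely used on astronomical images. The abstract does not say which wavelet transform the tool uses, so the choice of transform, the B3-spline kernel and the function names below are assumptions. The detection algorithms themselves (Clumpfind, Gaussclump, dendrograms) are not implemented here.

```python
import numpy as np

# B3-spline smoothing kernel used by the starlet transform.
B3_KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def smooth(image, step):
    """Separable B3-spline smoothing with taps spaced `step` pixels apart."""
    offsets = step * np.arange(-2, 3)
    pad = 2 * step
    out = np.asarray(image, dtype=float)
    for axis in (0, 1):
        pad_width = [(0, 0), (0, 0)]
        pad_width[axis] = (pad, pad)
        padded = np.pad(out, pad_width, mode="reflect")
        acc = np.zeros_like(out)
        for weight, offset in zip(B3_KERNEL, offsets):
            index = [slice(None), slice(None)]
            index[axis] = slice(pad + offset, pad + offset + out.shape[axis])
            acc += weight * padded[tuple(index)]
        out = acc
    return out

def atrous_decompose(image, n_scales):
    """Split an image into `n_scales` detail planes plus a smooth residual."""
    current = np.asarray(image, dtype=float)
    planes = []
    for j in range(n_scales):
        smoothed = smooth(current, 2 ** j)   # kernel spacing doubles per scale
        planes.append(current - smoothed)    # detail at scale j
        current = smoothed
    return planes, current

rng = np.random.default_rng(0)
image = rng.normal(size=(32, 32))
planes, residual = atrous_decompose(image, n_scales=3)
reconstructed = sum(planes) + residual
```

In this transform each detail plane is the difference between two successive smoothings, so adding the planes to the residual returns the original image. A detection algorithm could then run on each plane separately to find structures at that scale.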

Paper February 2014

Chilean Virtual Observatory and Integration with ALMA

Author(s): Mauricio Solar, Walter Fariña, Diego Mardones, Jonathan Antognini, Karim Pichara, Neil Nagar, Victor Parada, Jorge Ibsen, Lars Nyman, José Marroquin

The Virtual Observatories strive to interoperate, exchange data and share services as if they were a single large VO. In this work, the state of the art of VOs is presented and summarized in a schematic diagram with the frequency range of the observed data that every VO publishes. Chile, currently a member of the IVOA, collaborates with the Atacama Large Millimeter/submillimeter Array (ALMA) to study and propose ways to adapt the data generated by ALMA to the different data models proposed by the IVOA.

Paper February 2014

Chilean Virtual Observatory services implementation for the ALMA public data

Author(s): Jonathan Antognini, Mauricio Solar, Jorge Ibsen, Mauricio Araya, Lars Nyman, Diego Mardones, Camilo Valenzuela, Patricio Ramirez, Christopher Fernandez, Mario Garces

The success of an observatory is usually measured by its impact on the scientific community, so a common objective is to provide transparent ways to access the generated data. The Chilean Virtual Observatory (ChiVO) started working on the implementation of a prototype, in collaboration with ALMA, considering the current needs of the Chilean astronomical community, the protocols and standards of the IVOA, and a comparison of different existing data access toolkit services. Based on these efforts, a VO prototype was designed and implemented for the large-scale ALMA data.

Thesis

Thesis October 2016

Visualización Volumétrica de Cubos de Datos Espectroscópicos

Germán Ortega

Civil Engineering in Informatics (Ingeniería Civil en Informática)

UTFSM

Santiago - Chile, September 2016
Advisor: Mauricio Solar
Co-referee: Mauricio Araya

Spectroscopic data cubes are generated by capturing electromagnetic waves and recording them in files. Visualizing the content of these files is a major problem in modern computer science. Focusing on the field of astronomy, several visualization libraries were analyzed with respect to criteria such as graphics performance, rendering speed and user interface, among others. An application was then created based on the results of this analysis. The application uses the Mayavi and Matplotlib libraries (both for the Python language) to create all the plots and visualizations. The graphics performance and rendering speed obtained were good enough to consider future development for scientific fields.

Thesis August 2016

Automatic Identification of Spectral Lines Through Spectrum Reconstruction

Andrés Riveros

Master of Science in Engineering

PUC

Santiago - Chile, July 2016
Advisor: Karim Pichara

Astronomy is facing new challenges on how to analyze big data and, therefore, how to search for or predict events and patterns of interest. New observations in previously unexplored wavelength regions will be available from instruments such as the Atacama Large Millimeter Array (ALMA). Given this growing amount of high spectral resolution data, any non-automatic analysis would be an effort beyond human capacity. Currently, classifying emission lines means deciding whether a particular emission line belongs to a specific isotope. This classification is mainly done by comparing the lines with known isotope emission lines. An automatic line-classification algorithm would dramatically reduce the human effort needed to analyze spectral data, allowing astronomers to focus on deeper analysis. In this work, we propose an algorithm that uses a sparse model to represent the spectra and automatically classify emission lines. We use spectral line databases to determine a set of basis vectors that represent the presence of theoretical emission lines. Then, to classify lines in a given spectrum, the difference between the spectrum and a linear combination of the determined basis vectors is minimized. The model's output corresponds to a probability vector representing the distribution of the prediction over a set of possible isotopes. We test our algorithm with experimental data from Splatalogue and simulated data from the ASYDO project. The results of the analysis show that the algorithm is able to identify emission lines with 90% accuracy when no blending or hyperfine cases are present. As the separation between lines decreases (1 MHz or less), accuracy goes down to 82%. Algorithm source code, synthetic data and the list of suggested identifications are publicly available.
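The core fitting step, expressing a spectrum as a linear combination of basis vectors built from catalogued lines, can be sketched as follows. This is a simplified stand-in and not the thesis algorithm: it uses ordinary least squares with clipping instead of a sparse model, assumes Gaussian line profiles of a fixed width, and the candidate labels and rest frequencies are invented placeholders rather than Splatalogue entries.

```python
import numpy as np

def line_profile(freq_axis, center, width):
    """Gaussian profile centered on a candidate rest frequency."""
    return np.exp(-0.5 * ((freq_axis - center) / width) ** 2)

def identify(spectrum, freq_axis, candidates, width):
    """Fit the spectrum with one basis vector per candidate; return normalized weights."""
    names = list(candidates)
    basis = np.column_stack(
        [line_profile(freq_axis, candidates[name], width) for name in names]
    )
    # Minimize || spectrum - basis @ coeffs ||; plain least squares stands in
    # for the sparse solver described in the abstract.
    coeffs, *_ = np.linalg.lstsq(basis, spectrum, rcond=None)
    coeffs = np.clip(coeffs, 0.0, None)   # emission lines: no negative weights
    total = coeffs.sum()
    weights = coeffs / total if total > 0 else coeffs
    return dict(zip(names, weights))

# Placeholder candidates (GHz); labels and frequencies are illustrative only.
freq_axis = np.linspace(100.0, 101.0, 501)
candidates = {"species_A": 100.2, "species_B": 100.5, "species_C": 100.8}
spectrum = 2.0 * line_profile(freq_axis, 100.5, 0.01)

scores = identify(spectrum, freq_axis, candidates, width=0.01)
best_match = max(scores, key=scores.get)
```

Normalizing the non-negative coefficients gives a vector that sums to one and can be read loosely as a distribution over candidates. Blended or hyperfine lines make the basis vectors overlap, which is the regime where the abstract reports lower accuracy.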

Thesis December 2015

Detección Automática y Clasificación de Estructuras Astronómicas

Rodrigo Gregorio

Master of Science in Informatics Engineering (Magíster en Ciencias de la Ingeniería Informática)

UTFSM

Santiago - Chile, November 2015
Advisor: Mauricio Solar
Co-referee: Mauricio Araya

The detection and characterization of objects spanning a variety of spatial scales and chemical and physical properties, and the relationships among them, offer unique opportunities for studying the universe, particularly with ALMA data. The development of computational tools that automate these tasks is therefore extremely useful and urgent. In addition, we expect to include these tools in the upcoming Chilean Virtual Observatory (ChiVO). The proposal is to develop a tool for the detection and classification of structures at different spatial scales in astronomical images. To do so, we first use filters derived from the wavelet transform to generate sub-images at different scales. The wavelet decomposition ensures that no information is lost when the sub-images are recombined and that all the structures found in each one are collected. The tool will initially be applicable to 2D images, or to a series of planes in 3D images. Detection algorithms are then applied to each sub-image, generating a catalog. The catalog includes an initial classification of the type of object or structure found, and makes it possible to study statistics and spatial relationships among the catalogued objects. The tool will be developed in Python, generating a version for CASA and/or ChiVO. Validation consists of making the tool available to the scientific community and collecting feedback, as well as generating test sets with real data for the implemented algorithms. The tool is new, combines joint work between astronomy and computer engineering, and will support scientific research and the development of new applications in the area of astroinformatics.

Thesis December 2014

Registro y apilado de imágenes astronómicas en un plano bidimensional

Rodrigo Anibal Jara Cartagena

Computing and Informatics Engineer (Ingeniero de Ejecución en Computación e Informática)

USACH

Santiago - Chile, 2014
Advisors: Víctor Parada Daza and Diego Mardones

Storing, distributing and processing the roughly 250 terabytes of data per year that the Atacama Large Millimeter/submillimeter Array (ALMA) radio telescope will generate made it necessary to have a technological infrastructure that allows the scientific community to query and extract relevant information. In this scenario, the Chilean Virtual Observatory (ChiVO) is a key milestone for the development of astroinformatics in the country.

This degree project is part of a collective project that seeks to enrich ChiVO by providing applications for analyzing data at large scale. The following six chapters present a series of algorithms developed using an incremental methodology. The methods presented make it possible to adjust images in FITS format with respect to their orientation, scale, rotation and deprojection, and to create catalogs from which information can be extracted through stacking, a technique that to date has not been used for this volume of information.

Among the results obtained, it is worth mentioning first that the registration algorithms fulfil the function for which they were designed, while image stacking yields results that enhance the information contained in the images, facilitating their subsequent study.
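The benefit of stacking can be shown with a small numerical sketch: averaging many aligned images lowers the noise roughly as the square root of the number of images, so a source too faint to see in any single frame emerges in the stack. The code assumes the images are already registered (the registration algorithms of the thesis are not reproduced), works on simulated noise rather than FITS data, and uses hypothetical names throughout.

```python
import numpy as np

def stack_images(images, method="mean"):
    """Combine aligned, equally sized images pixel by pixel, ignoring NaNs."""
    cube = np.stack([np.asarray(image, dtype=float) for image in images])
    if method == "mean":
        return np.nanmean(cube, axis=0)
    if method == "median":
        return np.nanmedian(cube, axis=0)
    raise ValueError(f"unknown stacking method: {method!r}")

# Simulated cutouts: a faint source (0.5) buried in unit-variance noise.
rng = np.random.default_rng(1)
source = np.zeros((33, 33))
source[16, 16] = 0.5
cutouts = [source + rng.normal(0.0, 1.0, size=source.shape) for _ in range(400)]

stacked = stack_images(cutouts)
peak_position = np.unravel_index(np.argmax(stacked), stacked.shape)
residual_rms = np.std(stacked - source)   # noise left after stacking
```

A median stack is a common alternative when some frames contain outliers such as artifacts or unrelated bright sources, at the cost of a slightly higher noise level than the mean.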

Thesis February 2014

Indexación de Objetos Astronómicos

Gabriel Candia

Civil Engineering in Informatics (Ingeniería Civil en Informática)

UTFSM

Santiago - Chile, January 2014
Advisor: Marcelo Mendoza
Co-referee: Mauricio Solar

In this paper, we present the steps needed to build an index of astronomical objects. To do this, image processing tools are required to detect the objects present in the images, extract some of their characteristics and then store them in a database. This database will let users perform spatial searches with RA-DEC coordinates. It will be a new tool for astronomers, allowing them to run spatial searches over objects rather than over images, as is done nowadays.

Thesis November 2013

Diseño conceptual de un Observatorio Virtual Astronómico para ALMA

Walter Rodrigo Fariña Pérez

Civil Engineering in Informatics (Ingeniería Civil en Informática)

UTFSM

Santiago - Chile, November 2013
Advisor: Mauricio Solar
Co-referee: Marcelo Mendoza

This study analyzes the needs of Chilean astronomers with respect to ALMA, defining the foundations of the design that will later become the Chilean Virtual Observatory. It carries out a needs study and derives requirements, use cases and finally a data model, the latter in accordance with the standards of the IVOA, the international organization of virtual observatories. In this way, the Chilean astronomical community will have global and homogenized access to astronomical data, later complemented by tools to explore and manipulate them.