Spark and the combination of different modules, or the present of Big Data
The BBVA Innovation Center hosted this technical event about the Spark framework, the natural heir to Hadoop.
Jorge López-Malla, Big Data Architect at Stratio, led the session. During an extended and highly technical presentation, attendees at the BBVA Innovation Center (Plaza Santa Bárbara, Madrid) saw a little bit of history unfold before their eyes.
The concept of Big Data was first used in a paper on distributed file processing that Google published in 2003. However, the concept became unstoppable in 2006 with the advent of Hadoop. Backed by Yahoo!, Hadoop was the foundation for the first Big Data operations.
Spark is more recent: its developers built it with a more advanced perspective on how to meet the market's demands.
For this reason, López-Malla believes that Spark is an “evolution of Hadoop and its paradigm,” yet one that is much simpler than its predecessor. When Hadoop's core was improved, not all of the modules or "legs" it supported benefited from the improvement. Spark changes this radically: programmers now benefit from a single API for everything. “This is no longer the future, this is the present of Big Data.”
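As a rough illustration of that single API (a sketch under assumed names, not code from the talk), the snippet below uses one SparkSession entry point to run the same aggregation twice: once as declarative Spark SQL and once through the functional DataFrame API. The dataset, column names and application name are all hypothetical.

```scala
// Sketch: one entry point (SparkSession) serves both SQL and DataFrame styles.
// All names here (app name, columns, data) are hypothetical examples.
import org.apache.spark.sql.SparkSession

object UnifiedApiSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("unified-api-sketch")
      .master("local[*]") // local mode, just for the example
      .getOrCreate()
    import spark.implicits._

    // A tiny in-memory dataset; in production this would come from HDFS, Kafka, etc.
    val events = Seq(("login", 3), ("purchase", 1), ("login", 5)).toDF("event", "times")

    // Declarative style: register a view and query it with Spark SQL.
    events.createOrReplaceTempView("events")
    spark.sql("SELECT event, SUM(times) AS total FROM events GROUP BY event").show()

    // Functional style: the same aggregation through the DataFrame API,
    // executed by the same engine.
    events.groupBy("event").sum("times").show()

    spark.stop()
  }
}
```

Both calls compile down to the same execution plan, which is the practical payoff of the unified API: an improvement to the core benefits every style of use at once.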
López-Malla walked through three of Spark's most popular modules: Spark SQL (for querying structured data), Spark Streaming (for real-time data processing) and MLlib (for machine learning).
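To give the real-time module some shape, here is a minimal, self-contained sketch (an assumption for illustration, not material shown at the event) using Spark's Structured Streaming API and its built-in rate test source; the window size and console output are arbitrary choices for the example.

```scala
// Sketch: real-time processing with Structured Streaming.
// The "rate" source is a built-in test generator (columns: timestamp, value);
// the 10-second window and console sink are arbitrary choices for this example.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("streaming-sketch")
      .master("local[*]")
      .getOrCreate()

    // Continuous test input: one row per second.
    val stream = spark.readStream
      .format("rate")
      .option("rowsPerSecond", 1)
      .load()

    // Sum the values in 10-second windows, using the same DataFrame API as batch jobs.
    val totals = stream
      .groupBy(window(col("timestamp"), "10 seconds"))
      .agg(sum("value").as("total"))

    // Print each updated result table to the console as data arrives.
    val query = totals.writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```

Spark SQL and MLlib hang off the same SparkSession in exactly the same way, which is the combination of modules the talk's title alludes to.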
For the full presentation and the Q&A round at the end, see the video below.