Apache Spark

Apache Spark is an open-source cluster computing framework originally developed in the AMPLab at UC Berkeley. In contrast to Hadoop’s two-stage, disk-based MapReduce paradigm, Spark’s in-memory primitives can run certain applications up to 100 times faster. Because user programs can load data into a cluster’s memory and query it repeatedly, Spark is well suited to machine learning algorithms, which typically make many passes over the same data.
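
The difference between re-reading input on every pass and loading it once can be illustrated with a small sketch. This is plain Python on a single machine, not Spark's API, and every name in it (`fit_mean_disk`, `fit_mean_cached`, `run_demo`) is hypothetical. It only shows the access pattern: an iterative fit that touches the disk on each pass versus one that keeps the data in memory.

```python
import os
import tempfile


def write_dataset(path, values):
    """Write one number per line to a text file."""
    with open(path, "w") as f:
        for v in values:
            f.write(f"{v}\n")


def read_dataset(path, counter):
    """Read the file back and record that the disk was touched."""
    counter["disk_reads"] += 1
    with open(path) as f:
        return [float(line) for line in f]


def fit_mean_disk(path, steps, lr, counter):
    """Gradient descent toward the mean, re-reading the input on every pass."""
    m = 0.0
    for _ in range(steps):
        data = read_dataset(path, counter)
        grad = sum(m - x for x in data) / len(data)
        m -= lr * grad
    return m


def fit_mean_cached(path, steps, lr, counter):
    """Same algorithm, but the input is loaded once and reused from memory."""
    data = read_dataset(path, counter)
    m = 0.0
    for _ in range(steps):
        grad = sum(m - x for x in data) / len(data)
        m -= lr * grad
    return m


def run_demo(steps=50, lr=0.5):
    """Run both variants on a tiny made-up dataset and report disk reads."""
    values = [1.0, 2.0, 3.0, 4.0]  # illustrative data, mean is 2.5
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "points.txt")
        write_dataset(path, values)
        disk_counter = {"disk_reads": 0}
        cached_counter = {"disk_reads": 0}
        m_disk = fit_mean_disk(path, steps, lr, disk_counter)
        m_cached = fit_mean_cached(path, steps, lr, cached_counter)
    return m_disk, m_cached, disk_counter["disk_reads"], cached_counter["disk_reads"]


if __name__ == "__main__":
    print(run_demo())
```

Both variants compute the same estimate; the cached one reads the file once instead of once per iteration. On a real cluster the saving depends on data size, available memory, and the workload, so this sketch should not be read as a benchmark.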

Additional resources

Apache Spark: 3 Real-World Use Cases

ClearStory was one of Databricks’ first customers, and today it relies on Spark as one of the core underpinnings of its interactive, real-time product. “Honestly if it weren’t for Spark we would have very likely built something like this ourselves,” ClearStory founder Vaibhav Nivargi says in an interview with Databricks co-founder Reynold Xin.


Apache Spark: 3 Promising Use-Cases

Spark is the shiny new thing in big data, but how will it stand out? Here’s a look at “fog computing,” cloud computing, and streaming data-analysis scenarios.
