Apache Spark is a distributed computing environment that lets data analytics tasks run on a variety of computing platforms and be written in a variety of languages. In this talk, Jonathan covers how to write a data processing program in Clojure and deploy it to a Spark cluster on Kubernetes.
One of the challenges of writing distributed programs is bridging the gap between the development environment and the production cluster. This talk focuses on these operational aspects of the developer experience, illustrating how Clojure's REPL-based, test-driven development approach can be applied to Spark programs.