The MongoDB Spark Connector is an integration library that lets Apache Spark applications use MongoDB as a data source and as a destination for results. It pairs MongoDB's scalable document storage with Spark's distributed computation, so you can process large volumes of MongoDB data in parallel across a cluster.

Key Features

  • MongoDB as Data Source: The connector loads data from MongoDB into Spark data structures such as RDDs, DataFrames, and Datasets, and can write them back to MongoDB.
  • Filter Pushdown: It optimizes performance by pushing supported filters down to MongoDB, so only the matching documents are sent to Spark.
  • Aggregation Pipeline: You can attach a MongoDB aggregation pipeline to a read, so that stages such as $match or $group run inside MongoDB before the data reaches Spark.
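As a rough illustration of filter pushdown and aggregation pipelines, here is a minimal sketch assuming connector 3.0.x, a MongoDB server on localhost, and a hypothetical test.inventory collection whose documents have qty and status fields. Exactly which filters get pushed down depends on the connector version, so treat the explain() output as the thing to check rather than this comment.

```scala
import com.mongodb.spark.MongoSpark
import org.apache.spark.sql.SparkSession
import org.bson.Document

// Sketch for mongo-spark-connector 3.0.x. The URI, the "qty"/"status" fields,
// and the object name are hypothetical placeholders.
object PushdownAndPipeline {
  // A $match stage as extended JSON; parsed into a BSON Document below.
  val highQtyMatch: String = """{ "$match": { "qty": { "$gte": 10 } } }"""

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("MongoDB pushdown and pipeline sketch")
      .config("spark.mongodb.input.uri", "mongodb://localhost/test.inventory")
      .getOrCreate()

    // DataFrame API: supported column predicates can be pushed down to
    // MongoDB instead of being applied after loading every document.
    val df = MongoSpark.load(spark)
    val filtered = df.filter(df("qty") >= 10).select("status", "qty")
    // The physical plan printed here should list pushed predicates under PushedFilters.
    filtered.explain()

    // RDD API: withPipeline runs the given stages inside MongoDB before Spark sees the data.
    val rdd = MongoSpark.load(spark.sparkContext)
    val aggregated = rdd.withPipeline(Seq(Document.parse(highQtyMatch)))
    println(s"Documents with qty >= 10: ${aggregated.count()}")

    spark.stop()
  }
}
```

The RDD route gives you full control over the pipeline stages, while the DataFrame route leaves it to the connector to translate Spark filters into MongoDB queries.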


To start using the Spark Connector for MongoDB, add the connector dependency to your build.sbt (SBT) or pom.xml (Maven) file:

For SBT:

libraryDependencies += "org.mongodb.spark" %% "mongo-spark-connector" % "3.0.1"

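For Maven:

<dependency>
  <groupId>org.mongodb.spark</groupId>
  <artifactId>mongo-spark-connector_2.12</artifactId>
  <version>3.0.1</version>
</dependency>

SBT's %% operator appends the Scala binary version to the artifact name automatically; Maven needs it spelled out in the artifactId, so use the suffix that matches your Scala version (3.0.1 targets Scala 2.12).
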
Here’s a basic example of how to work with the MongoDB Spark Connector:

import org.apache.spark.sql.SparkSession
import com.mongodb.spark.MongoSpark

object MongoDBwithSpark {
  def main(args: Array[String]): Unit = {
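    val spark = SparkSession.builder()
      .appName("MongoDB Integration")
      .config("spark.mongodb.input.uri", "mongodb://username:password@host/database.collection")
      // Write to a different collection than the input: "overwrite" drops the target first
      .config("spark.mongodb.output.uri", "mongodb://username:password@host/database.outputCollection")
      .getOrCreate()

    // Load data from MongoDB into a DataFrame (the schema is inferred by sampling the collection)
    val df = MongoSpark.load(spark)

    // Perform operations on DataFrame
    // ...

    // Write the DataFrame back to MongoDB, replacing the output collection's contents
    MongoSpark.save(df.write.mode("overwrite"))

    // Stop the Spark session
    spark.stop()
  }
}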
With the MongoDB Spark Connector, you can use Spark's distributed engine to analyze and transform data stored in MongoDB, which simplifies building analytics pipelines and handling large-scale processing jobs.

For more details, check the official documentation.