Andy Petrella
for instance, the embedded Spark integration makes things much easier than adding a kernel or choosing between different runtimes (Livy, ...)
also, the set of dedicated plotting widgets in SN is complete, and they work from bare Scala with Scala structures (like Seq or DF)
Henry Ashton-Martyn
Hi there, is the documentation here https://github.com/spark-notebook/spark-notebook/blob/master/docs/clusters_clouds.md missing some details about how to set the yarn conf? I'm trying to get spark-notebook connected to my existing yarn cluster but I can't see where to set properties like the resource manager location
Hi everyone
does somebody know about connecting JupyterHub + CDH Spark 2.2?
Hi everyone
can spark-notebook be set up on Windows 8 R2?
Patrick O'Rourke
My name is Patrick. I am relatively new to Spark.
I am having trouble using a left outer join with a UDF that should convert all NULL values to empty arrays.
case class CustomerData(
  customerId: String,
  forename: String,
  surname: String
)
case class AccountData(
  customerId: String,
  accountId: String,
  balance: Long
)
//Expected Output Format
case class CustomerAccountOutput(
  customerId: String,
  forename: String,
  surname: String,
  //Accounts for this customer
  accounts: Seq[AccountData],
  //Statistics of the accounts
  numberAccounts: Int,
  totalBalance: Long,
  averageBalance: Double
)
object AccountAssignment extends App {
//Create a spark context, using a local master so Spark runs on the local machine
val spark = SparkSession.builder().master("local[*]").appName("AccountAssignment").getOrCreate()
import spark.implicits._
// importing spark implicits allows functions such as dataframe.as[T]
//Set logger level to Warn
// Logger.getRootLogger.setLevel(Level.WARN)
//Get file paths from files stored in project resources
val customerCSV = getClass.getResource("/customer_data.csv").getPath
val accountCSV = getClass.getResource("/accountdata.csv").getPath
//Create DataFrames of sources
val customerDF = spark.read.option("header", "true").csv(customerCSV)
val accountDF = spark.read.option("header", "true").csv(accountCSV)
//Create Datasets of sources
val customerDS = customerDF.as[CustomerData]
val accountDS = accountDF.withColumn("balance", 'balance.cast("long")).as[AccountData]
// UDF to convert NULL values to empty Arrays
val array
= udf(() => Array.empty[Int])
val data = customerDS.join(accountDS, Seq("customerID"), "left_outer")
customerDS.join(accountDS, Seq("customerID"), "left_outer")
array(struct("customerId", "accountId", "balance")),
).map { r =>
Can someone help please?
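One possible way to get empty account lists without a UDF, sketched under assumptions rather than as a definitive fix: with `joinWith` and `"left_outer"`, a customer with no accounts should come back paired with a `null` account, and `Option(null)` is `None`, so those customers end up with an empty list. The case classes are the ones from the question; the `AccountJoinSketch` object, the `summarise` and `joinAccounts` helpers and the sample rows are made up for illustration.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

case class CustomerData(customerId: String, forename: String, surname: String)
case class AccountData(customerId: String, accountId: String, balance: Long)
case class CustomerAccountOutput(
  customerId: String,
  forename: String,
  surname: String,
  accounts: Seq[AccountData],
  numberAccounts: Int,
  totalBalance: Long,
  averageBalance: Double
)

object AccountJoinSketch {

  // Hypothetical helper: builds one output row from a customer and its (possibly empty) accounts.
  def summarise(customer: CustomerData, accounts: Seq[AccountData]): CustomerAccountOutput = {
    val total = accounts.map(_.balance).sum
    // Guard against division by zero for customers without accounts.
    val average = if (accounts.isEmpty) 0.0 else total.toDouble / accounts.size
    CustomerAccountOutput(customer.customerId, customer.forename, customer.surname,
      accounts, accounts.size, total, average)
  }

  // Hypothetical helper: typed left outer join, then one output row per customer.
  def joinAccounts(
      spark: SparkSession,
      customers: Dataset[CustomerData],
      accounts: Dataset[AccountData]): Dataset[CustomerAccountOutput] = {
    import spark.implicits._
    customers
      .joinWith(accounts, customers("customerId") === accounts("customerId"), "left_outer")
      .groupByKey(_._1)
      .mapGroups { (customer, rows) =>
        // Unmatched customers carry a null account; Option(null) is None, so it is dropped here.
        val matched = rows.flatMap(pair => Option(pair._2)).toList
        summarise(customer, matched)
      }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("AccountJoinSketch").getOrCreate()
    import spark.implicits._
    // Made-up sample data: c2 has no accounts.
    val customers = Seq(CustomerData("c1", "Ann", "Lee"), CustomerData("c2", "Bob", "Ray")).toDS()
    val accounts = Seq(AccountData("c1", "a1", 100L), AccountData("c1", "a2", 50L)).toDS()
    joinAccounts(spark, customers, accounts).show(truncate = false)
    spark.stop()
  }
}
```

Keeping the statistics in a plain function like `summarise` means the empty-accounts case can be checked without starting Spark.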
I have a question: how do I load a file from my local machine into the spark-notebook Docker container on a Mac?
Leo Benkel
Hello, I discovered your product today, I am having issues building the version I want:
notebook version : master
scala version : 2.11.8
spark version : 2.3.1
spark nightly : false
hadoop version : 2.7.2
with hive : false
with parquet : true
package type : docker
I tried with previous stable version but got:
Versions aren't compatible
notebook: 0.8.3
scala: 2.11.8
spark: 2.3.1
nightly: false
hadoop: 2.7.2
with hive: false
with parquet: true
package: docker
Compatibilities are
Andy Petrella
aw snap, we can fix the generator, otherwise you can give SBT a try maybe?

@/all I never did this, so I hope you don't mind, but @kensuio we're looking for a (Remote possible) Scala Software Engineer, below is the profile. If you are interested or you know someone/company that may be interested or help us, please DM me. Thanks a lot in advance everyone.

Detailed requirements for developer. language, framework, databases, cloud, and specific skills.

Scala backend engineer: Senior level in Scala, Play Framework 2. Medior level in Akka, Pac4J, Swagger/OpenAPI, OrientDB. Pluses: data processing skills, event sourcing practical knowledge, on-premises product requirements, familiar with machine learning lingua and requirements.

Job description. candidate should know his responsibilities on the project.

The candidate will join the R&D team and will be responsible for elaborating technical solution and producing enterprise-level quality features in governance and monitoring of data activities following business requirements gathered by product owner from customers.

Product or project description. general description of the product.

The candidate will work on the Data Activity Manager (DAM) core engine, a system able to collect, create and manage automatically the data lineage across data products, teams and companies. DAM enables compliance management for DPO-like (GDPR) teams and quality monitoring of data processes for CDO teams and Data Citizens (engineers and scientists) via Dashboard, Reporting and Smart-Alerting systems.

Eric K Richardson
Hi @andypetrella - there is also this place which could have more followers - https://gitter.im/scala/job-board
Kaushik Amar Das
what's the SQL context called?
the Spark context is sparkContext
what's the sqlContext?
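For reference, in plain Spark 2.x the SQL entry point is the `SparkSession`, and the older `SQLContext` is still reachable from it. A minimal sketch, assuming you build the session yourself; the variable names that spark-notebook predefines inside a notebook may differ, and the `ContextNames` object and the `people` view are made up for illustration.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.{SQLContext, SparkSession}

object ContextNames {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session created by hand, not the one a notebook provides.
    val spark: SparkSession =
      SparkSession.builder().master("local[*]").appName("ContextNames").getOrCreate()

    // Both legacy entry points hang off the session.
    val sc: SparkContext = spark.sparkContext
    val sqlContext: SQLContext = spark.sqlContext

    import spark.implicits._
    // Made-up data registered as a temporary view so it can be queried with SQL.
    Seq(("Ann", 31), ("Bob", 27)).toDF("name", "age").createOrReplaceTempView("people")

    // spark.sql(...) and sqlContext.sql(...) run against the same session.
    sqlContext.sql("SELECT name FROM people WHERE age > 30").show()

    println(s"Spark version: ${sc.version}")
    spark.stop()
  }
}
```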
Aman Tur

Hi, I'm trying to run the andypetrella/spark-notebook:0.8.3-scala-2.11.8-spark-2.2.2-hadoop-2.6.0 docker image but I'm getting the following error in the logs:

[error] n.s.e.Cache - Unable to set localhost. This prevents creation of a GUID. Cause was: 583140c279e6: 583140c279e6: Temporary failure in name resolution
java.net.UnknownHostException: 583140c279e6: 583140c279e6: Temporary failure in name resolution
at java.net.InetAddress.getLocalHost(InetAddress.java:1505) ~[na:1.8.0_171]
at net.sf.ehcache.Cache.<clinit>(Cache.java:214) ~[net.sf.ehcache.ehcache-core-2.6.8.jar:na]
at net.sf.ehcache.config.ConfigurationHelper.createCache(ConfigurationHelper.java:296) [net.sf.ehcache.ehcache-core-2.6.8.jar:na]
at net.sf.ehcache.config.ConfigurationHelper.createDefaultCache(ConfigurationHelper.java:219) [net.sf.ehcache.ehcache-core-2.6.8.jar:na]
at net.sf.ehcache.CacheManager.configure(CacheManager.java:722) [net.sf.ehcache.ehcache-core-2.6.8.jar:na]
Caused by: java.net.UnknownHostException: 583140c279e6: Temporary failure in name resolution
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) ~[na:1.8.0_171]
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928) ~[na:1.8.0_171]
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323) ~[na:1.8.0_171]
at java.net.InetAddress.getLocalHost(InetAddress.java:1500) ~[na:1.8.0_171]
at net.sf.ehcache.Cache.<clinit>(Cache.java:214) ~[net.sf.ehcache.ehcache-core-2.6.8.jar:na]
[info] play - Application started (Prod)
[info] play - Listening for HTTP on /

I'm running with command: docker run -p 9001:9001 andypetrella/spark-notebook:0.8.3-scala-2.11.8-spark-2.2.2-hadoop-2.6.0

sahil anand
Hi, is there a SQL “unify” equivalent in Spark?
Ajay Aakula
Gourab Alam
Hi, can I use Java and Nd4j with this notebook?
I have launched the notebook with Docker but it is not working
this is my log
docker run -p 9001:9001 andypetrella/spark-notebook:0.8.3-scala-2.11.8-spark-2.2.2-hadoop-2.6.0
Play server process ID is 1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/docker/lib/ch.qos.logback.logback-classic-1.1.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/docker/lib/org.slf4j.slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
[info] play - Application started (Prod)
[info] play - Listening for HTTP on /
Is there a way to install spark-notebook on the Cloudera VM to try it out?
I am looking for spark 2.3.0, Scala 2.11.8 and Hadoop 2.6.0
is anybody having trouble with some form of cell autoscrolling down slowly while working on their notebooks?
one more question: is the spark-notebook.io generator working for anyone? I'm trying to download a build that is not available yet, and it says that it 'launched' the build instead, but I never get a notification email saying it's ready or that some problem occurred
@longboardtard yes, that was the problem for me too. Not sure if it builds automatically
@piyushpatel2005_gitlab so there's definitely a problem with the generator. Maybe the builds are being generated for you, but there's a problem with the notification system, or god knows what
how can I generate a build myself, with Hive + Parquet support?
hey has anyone here tried Zeppelin 0.8.2 using docker? Not able to log in;(
I found this within conf, but these do not work
# List of users with their password allowed to access Zeppelin.
# To use a different strategy (LDAP / Database / ...) check the shiro doc at http://shiro.apache.org/configuration.html#Configuration-INISections
# To enable admin user, uncomment the following line and set an appropriate password.
#admin = password1, admin
user1 = password2, role1, role2
user2 = password3, role3
user3 = password4, role2
Anton Kulaga
@phrmoy I use zeppelin 0.8.2 with docker
I have my own container for it
Hey guys. Has anyone tried to use the %dep method to import dependencies for KafkaUtils?

Hi Spark Community.

I need help with the following issue. I have been researching it for the last 2 weeks, and as a last resort I want to ask the Spark community.


sanjay sharma

Hi, I am using the explode method on one column, and I am then using the same column in a left join with a column from another table. I am getting this error:

cannot resolve 'trim(a.col1)' due to data type mismatch: argument 1 requires string type, however, 'a.col1' is of array<string> type

Can anyone please let me know how to proceed? How can I change an array type column to string type? I need to use the explode method.
Any help?
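The error says `trim` received the whole array rather than a single string. A minimal sketch of two ways around it, assuming a hypothetical table with an `array<string>` column `col1` and a lookup table with a string column `key`: explode first and trim the resulting string in a separate step, or flatten the array into one string with `concat_ws`. The `ExplodeJoinSketch` object, the helper names and the sample rows are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat_ws, explode, trim}

object ExplodeJoinSketch {

  // Option 1: one row per array element, then trim the element, which is now a plain string.
  // The two steps are kept separate because a generator such as explode cannot be nested
  // inside another expression.
  def explodeAndTrim(df: DataFrame): DataFrame =
    df.withColumn("item", explode(col("col1")))
      .withColumn("item", trim(col("item")))

  // Option 2: collapse the array into a single comma-separated string column.
  def arrayToString(df: DataFrame): DataFrame =
    df.withColumn("col1_str", concat_ws(",", col("col1")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("ExplodeJoinSketch").getOrCreate()
    import spark.implicits._

    // Made-up data: table a holds arrays with stray whitespace, table b is the lookup side.
    val a = Seq((1, Seq(" x ", "y "))).toDF("id", "col1")
    val b = Seq(("x", "first"), ("z", "other")).toDF("key", "label")

    // Left join on the exploded and trimmed value.
    explodeAndTrim(a).join(b, col("item") === col("key"), "left_outer").show()

    arrayToString(a).show()
    spark.stop()
  }
}
```

Note that `explode` drops rows whose array is null or empty; `explode_outer` (Spark 2.2 and later) keeps them with a null element.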
Arjen P. de Vries
Hi, does anyone here know the status of spark-notebook.io? @andypetrella did you cease the service, or is it just temporarily offline?
Hi all, I have a question: if my job has a shuffle stage and one of the tasks after the shuffle fails, how does fault tolerance come into play? Since the data came out of a shuffle stage, will Spark re-evaluate the earlier stage and shuffle and sort again? Please advise.
Any ideas ? tks in advance
Locally it works but while executing it on hdfs it cries