How to run JavaWordCount in Spark

Created by Jerry Wang, last modified on Aug 17, 2015

The general steps can be found at this link: http://stackoverflow.com/questions/22252534/how-to-run-a-spark-java-program-from-command-line

  1. mkdir example-java-build/; cd example-java-build
  2. mvn archetype:generate \
    -DarchetypeGroupId=org.apache.maven.archetypes \
    -DgroupId=spark.examples \
    -DartifactId=JavaWordCount \
    -Dfilter=org.apache.maven.archetypes:maven-archetype-quickstart

     The -DartifactId value becomes the name of the generated project folder.

Below is my pom.xml:

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <!-- same as the groupId given on the command line -->
  <groupId>spark.examples</groupId>
  <!-- same as the artifactId given on the command line -->
  <artifactId>JavaWordCount</artifactId>
  <packaging>jar</packaging>
  <version>1</version>
  <name>JavaWordCount</name>
  <url>http://maven.apache.org</url>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-examples_2.10</artifactId>
      <version>1.1.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.4.1</version>
    </dependency>
  </dependencies>
</project>
```
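  3. cd example-java-build/JavaWordCount; mvn package

This creates the jar file JavaWordCount-1.jar inside the target directory. With this pom.xml it is a plain jar rather than a fat jar: no assembly or shade plugin is configured, so the Spark classes are supplied by spark-submit at run time.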
![clipboard2](https://user-images.githubusercontent.com/5669954/28005843-f8c64808-654c-11e7-8a72-bf61d78e15bd.png)

The classes folder contains the individual .class files:
![clipboard3](https://user-images.githubusercontent.com/5669954/28005849-fd29e0e4-654c-11e7-8e25-8563f2219e18.png)

Copy the jar file to any location on the server, then go to the bin folder of your Spark installation.

Submit the Spark job: ./spark-submit --class "org.apache.spark.examples.JavaWordCount" --master local /root/devExpert/spark-1.4.1/example-java-build/JavaWordCount/target/JavaWordCount-1.jar
 
Use jd.exe to open the compiled Java class and make sure that the value passed to --class equals the fully qualified name of the class. In my example it is org.apache.spark.examples.JavaWordCount. Otherwise you will get a java.lang.ClassNotFoundException.
![clipboard4](https://user-images.githubusercontent.com/5669954/28005858-0459ae6c-654d-11e7-9c61-d61b36d20334.png)
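
If a decompiler such as jd.exe is not at hand, the fully qualified class names can also be read straight from the jar's entry list. The following is only a minimal sketch: `JarClassLister` is a hypothetical helper name (it is not part of Spark or Maven), and it assumes the jar path is passed as the first command-line argument.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper (not part of Spark or Maven): lists the fully
// qualified names of the classes in a jar, i.e. candidate values for --class.
class JarClassLister {

    // Converts a jar entry path such as "a/b/C.class" to "a.b.C".
    static String toClassName(String entryName) {
        String withoutSuffix =
            entryName.substring(0, entryName.length() - ".class".length());
        return withoutSuffix.replace('/', '.');
    }

    // Returns the class names of all .class entries in the given jar.
    static List<String> listClasses(String jarPath) throws IOException {
        List<String> names = new ArrayList<String>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".class")) {
                    names.add(toClassName(name));
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: java JarClassLister <path-to-jar>");
            return;
        }
        for (String className : listClasses(args[0])) {
            System.out.println(className);
        }
    }
}
```

Running it against target/JavaWordCount-1.jar should print one class name per line; the name of the class that contains main is the value to pass to --class.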

  4. ./spark-submit --class "org.apache.spark.examples.JavaWordCount" --master local /root/devExpert/spark-1.4.1/example-java-build/JavaWordCount/target/JavaWordCount-1.jar /root/devExpert/spark-1.4.1/bin/test.txt

The last argument is the input text file.

To debug, run the script with sh -x: sh -x ./spark-submit --class "org.apache.spark.examples.JavaWordCount" --master local /root/devExpert/spark-1.4.1/example-java-build/JavaWordCount/target/JavaWordCount-1.jar /root/devExpert/spark-1.4.1/bin/test.txt

This is equivalent to: /usr/jdk1.7.0_79/bin/java -cp /root/devExpert/spark-1.4.1/conf/:/root/devExpert/spark-1.4.1/assembly/target/scala-2.10/spark-assembly-1.4.1-hadoop2.4.0.jar:/root/devExpert/spark-1.4.1/lib_managed/jars/datanucleus-rdbms-3.2.9.jar:/root/devExpert/spark-1.4.1/lib_managed/jars/datanucleus-core-3.2.10.jar:/root/devExpert/spark-1.4.1/lib_managed/jars/datanucleus-api-jdo-3.2.6.jar -Xms512m -Xmx512m -XX:MaxPermSize=256m org.apache.spark.deploy.SparkSubmit --master local --class org.apache.spark.examples.JavaWordCount /root/devExpert/spark-1.4.1/example-java-build/JavaWordCount/target/JavaWordCount-1.jar /root/devExpert/spark-1.4.1/bin/test.txt

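-cp is the same as -classpath: it specifies where to find the other classes that the class depends on at run time, usually class libraries and jar files. Each jar must be given with its full path. Entries are separated by a semicolon ";" on Windows and by a colon ":" on Linux. Patterns such as *.jar are not supported, so each jar has to be listed (since Java 6, an entry of the form dir/* includes every jar in that directory). A single dot "." stands for the current directory.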
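
The separator rule can be illustrated with a small Java sketch. `ClasspathJoiner` and the jar paths in `main` are hypothetical names chosen for illustration; the only library fact relied on is that `java.io.File.pathSeparator` holds the platform's separator (":" on Linux, ";" on Windows).

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds the string passed to java -cp.
class ClasspathJoiner {

    // Joins the entries with an explicit separator.
    static String join(List<String> entries, String separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < entries.size(); i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(entries.get(i));
        }
        return sb.toString();
    }

    // Joins the entries with the separator of the current platform.
    static String join(List<String> entries) {
        return join(entries, File.pathSeparator);
    }

    public static void main(String[] args) {
        // "." is the current directory; the jar paths are made up.
        List<String> entries =
            Arrays.asList(".", "/opt/libs/a.jar", "/opt/libs/b.jar");
        System.out.println(join(entries));
    }
}
```

Taking the separator from `File.pathSeparator` instead of hard-coding ":" keeps the same code usable on both Windows and Linux.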
Output:
![clipboard6](https://user-images.githubusercontent.com/5669954/28005861-095700e0-654d-11e7-86da-e3b08feda93b.png)
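
To sanity-check the counts for a small input file without a Spark installation, the same computation can be approximated in plain Java. This is a rough sketch and not the source of Spark's JavaWordCount: `LocalWordCount` is a hypothetical class name, and the sketch assumes that words are separated by single spaces, skips empty tokens, and prints results sorted by word as `word: count`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local stand-in for a word count job (no Spark involved).
class LocalWordCount {

    // Counts how often each space-separated word occurs in the lines.
    static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                if (word.isEmpty()) {
                    continue; // produced by blank lines or repeated spaces
                }
                Integer old = counts.get(word);
                counts.put(word, old == null ? 1 : old + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: java LocalWordCount <text-file>");
            return;
        }
        List<String> lines =
            Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (Map.Entry<String, Integer> e : count(lines).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

Passing the same test.txt lets the counts be compared with those printed by the Spark job, keeping in mind that the ordering of the Spark output may differ.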
