Spark 3.0 Tutorial: Computing the Average Age of 100,000 People (Java)

Author: 马育民 • 2025-12-09 09:19

# Introduction

This example uses Java to compute the average age of 100,000 people. The code is written a little differently from the Scala version, but the approach is the same.

# Creating the project

See: [Spark 3.0 Tutorial: First Program, WordCount](https://www.malaoshi.top/show_1IX1I8nV4jl2.html "Spark 3.0 Tutorial: First Program, WordCount")

# Java code

### Generating test data

```
package top.malaoshi.spark.age;

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Random;

public class DataFileGenerator {
    // Pool of names; each record picks one at random
    private final static String[] NAMES = {
        "李雷","韩梅梅","lucy","lili","张三","李四","王五","杜子腾","史珍香","张吉惟",
        "林国瑞","林玟书","林雅南","江奕云","刘柏宏","阮建安","林子帆","夏志豪","吉茹定","李中冰",
        "黄文隆","谢彦文","傅智翔","洪振霞","刘姿婷","荣姿康","吕致盈","方一强","黎芸贵","郑伊雯",
        "雷进宝","吴美隆","吴心真","王美珠","郭芳天","李雅惠","陈文婷","曹敏侑","王依婷","陈婉璇",
        "吴美玉","蔡依婷","郑昌梦","林家纶","黄丽昆","李育泉","黄芸欢","吴韵如","李肇芬","卢木仲",
        "李成白","方兆玉","刘翊惠","丁汉臻","吴佳瑞","舒绿珮","周白芷","张姿妤","张虹伦","周琼玟",
        "倪怡芳","郭贵妃","杨佩芳","黄文旺","黄盛玫","郑丽青","许智云","张孟涵","李小爱","王恩龙",
        "朱政廷","邓诗涵","陈政倩","吴俊伯","阮馨学","翁惠珠","吴思翰","林佩玲","邓海来","陈翊依",
        "李建智","武淑芬","金雅琪","赖怡宜","黄育霖","张仪湖"};

    public static void main(String[] args) {
        File file = new File("data/DataFile.txt"); // path of the generated file
        // Create the data/ directory if it is missing; FileWriter fails otherwise
        File dir = file.getParentFile();
        if (dir != null) {
            dir.mkdirs();
        }
        // try-with-resources closes the writer even if a write fails
        try (FileWriter fileWriter = new FileWriter(file)) {
            Random rand = new Random();
            int namesLen = NAMES.length;
            for (int i = 1; i <= 100000; i++) {
                String name = NAMES[rand.nextInt(namesLen)];
                // One record per line: name, a space, then an age from 1 to 100
                fileWriter.write(name + " " + (rand.nextInt(100) + 1));
                fileWriter.write(System.getProperty("line.separator"));
            }
            fileWriter.flush();
            System.out.println("output success");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```

This generates the file `data/DataFile.txt` under the project directory. Its contents look like this (the second column is the age):

```
周琼玟 51
武淑芬 59
谢彦文 52
江奕云 54
林国瑞 34
```

### Computing the statistics

```
package top.malaoshi.spark.age;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;

public class AvgAgeCalculator2 {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("AvgAgeCalculator").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        // Read the file; each RDD element is one line of text
        JavaRDD<String> dataFile = sc.textFile("data/DataFile.txt");

        // Split each line on the space, take the second element (the age), and convert it to an int
        JavaRDD<Integer> ageData = dataFile.map(new Function<String, Integer>() {
            @Override
            public Integer call(String s) throws Exception {
                return Integer.parseInt(s.split(" ")[1]);
            }
        });
        // System.out.println(ageData.collect()); // inspect the data in the ageData RDD

        // Number of people
        long count = ageData.count();

        // Sum of all ages
        Integer totalAge = ageData.reduce(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer v1, Integer v2) throws Exception {
                return v1 + v2;
            }
        });

        // The average is a double
        Double avgAge = totalAge.doubleValue() / count;

        System.out.println("totalAge:" + totalAge);
        System.out.println("count:" + count);
        System.out.println("avg:" + avgAge);

        // Release the Spark context
        sc.close();
    }
}
```

Output (the ages are random, so the exact figures differ from run to run):

```
totalAge:5056261 // sum of ages
count:100000 // number of people
avg:50.56261 // average age
```

Original source: http://www.malaoshi.top/show_1GW2NFGMrVlg.html
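The parse, count, sum, and divide steps of the statistics program can be checked without starting Spark. The following is a minimal sketch that runs the same logic over a small in-memory list with `java.util.stream`, using the five sample lines shown earlier. The class `AvgAgeSketch` and its method names are hypothetical helpers introduced here for illustration; they are not part of the original tutorial or of the Spark API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration only; not part of the tutorial project.
class AvgAgeSketch {

    // Same parsing rule as the Spark map step: split on a space, take the second field.
    static int parseAge(String line) {
        return Integer.parseInt(line.split(" ")[1]);
    }

    // Plays the role of ageData.reduce(...): adds up all parsed ages.
    // Uses 0 as the identity, so an empty list yields 0.
    static int totalAge(List<String> lines) {
        return lines.stream().map(AvgAgeSketch::parseAge).reduce(0, Integer::sum);
    }

    // Plays the role of totalAge.doubleValue() / count.
    static double averageAge(List<String> lines) {
        long count = lines.size();
        return (double) totalAge(lines) / count;
    }

    public static void main(String[] args) {
        // The five sample records from the generated file shown in the article
        List<String> sample = Arrays.asList(
                "周琼玟 51", "武淑芬 59", "谢彦文 52", "江奕云 54", "林国瑞 34");
        System.out.println("totalAge:" + totalAge(sample)); // totalAge:250
        System.out.println("count:" + sample.size());       // count:5
        System.out.println("avg:" + averageAge(sample));    // avg:50.0
    }
}
```

Keeping the parsing rule in its own method makes it easy to test on a handful of lines before running the full job on the 100,000-record file.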