Azkaban 3.81.0: Comprehensive Case Study - Weibo Data Analysis - Creating Hive UDFs - Teacher Ma Yumin

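Referenced by the following articles:

- [Azkaban 3.81.0: Comprehensive Case Study - Weibo Data Analysis (HDFS, Hive, UDF)](https://www.malaoshi.top/show_1GW2MhjbLUav.html "Azkaban 3.81.0: Comprehensive Case Study - Weibo Data Analysis (HDFS, Hive, UDF)")

- [Hive 3.1.x Tutorial: Case Study - Weibo Data Analysis](https://www.malaoshi.top/show_1GW2Mr1A0tF8.html "Hive 3.1.x Tutorial: Case Study - Weibo Data Analysis")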

# Introduction

Create Hive UDFs that carry out the statistical analysis of the Weibo data.

# Create the Maven project

Omitted.

### pom.xml

```
<dependencies>
    
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>3.0.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>3.0.3</version>
    </dependency>
    
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>3.1.2</version>
    </dependency>
</dependencies>
<build>
    
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>2.2</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <filters>
                            <filter>
                                <artifact>*:*</artifact>
                                <excludes>
                                    
                                    <exclude>META-INF/*.SF</exclude>
                                    <exclude>META-INF/*.DSA</exclude>
                                    <exclude>META-INF/*.RSA</exclude>
                                </excludes>
                            </filter>
                        </filters>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
```

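# Implement the first UDF

The parameter `num1` is the **number of likes** on a post and `num2` is the **number of reposts**. The UDF adds the **number of likes** and the **number of reposts** and returns the sum.

**Note:** remember the **package path** and the **class name**; both are needed later.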

```
package top.malaoshi.weibo.udf;

import org.apache.hadoop.hive.ql.exec.UDF;

public class Sum extends UDF {
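    /**
     * Adds the like count and the repost count of a post.
     *
     * @param num1 number of likes on the post
     * @param num2 number of reposts of the post
     * @return num1 + num2, or 0 if either argument is null
     */
    public Integer evaluate(Integer num1, Integer num2) {
        if (num1 == null || num2 == null) {
            return 0;
        }
        return num1 + num2;
    }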
}
```
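The null handling of `evaluate` can be tried out without a Hive installation. The following is a minimal standalone sketch: `SumDemo`, `sum`, and `sumNullAsZero` are hypothetical names used only for illustration and are not part of the project. `sum` mirrors the behavior of the UDF above, while `sumNullAsZero` shows one possible alternative that keeps the non-null count instead of returning 0.

```java
// Standalone sketch, no Hive dependency. All names here are hypothetical.
class SumDemo {

    // Same behavior as Sum.evaluate: if either argument is null, the result is 0.
    static Integer sum(Integer num1, Integer num2) {
        if (num1 == null || num2 == null) {
            return 0;
        }
        return num1 + num2;
    }

    // Alternative (an assumption, not the article's method):
    // treat a null count as 0 so the other value is preserved.
    static Integer sumNullAsZero(Integer num1, Integer num2) {
        int likes = (num1 == null) ? 0 : num1;
        int reposts = (num2 == null) ? 0 : num2;
        return likes + reposts;
    }

    public static void main(String[] args) {
        System.out.println(sum(120, 30));             // prints 150
        System.out.println(sum(120, null));           // prints 0
        System.out.println(sumNullAsZero(120, null)); // prints 120
    }
}
```

Which variant is appropriate depends on whether a missing count in the data should discard the post's total or simply count as zero.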

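# Implement the second UDF

This UDF counts how many times a keyword such as **"北京"** (Beijing) appears in a post's content, so that the users who mention it most often can be identified.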

```
package top.malaoshi.weibo.udf;

import org.apache.hadoop.hive.ql.exec.UDF;

public class Count extends UDF {
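    /**
     * Counts the non-overlapping occurrences of a keyword in a string.
     *
     * @param str  content of the source field of the Weibo data
     * @param find keyword to search for, e.g. "北京" (Beijing)
     * @return number of occurrences of find in str; 0 if either is null or find is empty
     */
    public Integer evaluate(String str, String find) {
        // Guard: a null argument would throw, and an empty keyword would loop forever
        if (str == null || find == null || find.isEmpty()) {
            return 0;
        }
        // Number of times the keyword has been found
        int count = 0;
        // Position from which the next search starts
        int index = 0;
        // Keep searching until the keyword no longer occurs
        while ((index = str.indexOf(find, index)) != -1) {
            // Move past this match before searching again
            index = index + find.length();
            count++;
        }
        return count;
    }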
}
```
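The counting loop can be checked on its own before the jar is built. The sketch below is a standalone version with no Hive dependency; `CountDemo` and `countOccurrences` are hypothetical names, and the guard for null or empty arguments is included as a defensive assumption. Because the search position jumps past each match, occurrences are counted without overlap.

```java
// Standalone sketch, no Hive dependency. All names here are hypothetical.
class CountDemo {

    // Counts non-overlapping occurrences of `find` in `str`.
    static int countOccurrences(String str, String find) {
        // Defensive guard: null input or an empty keyword yields 0
        if (str == null || find == null || find.isEmpty()) {
            return 0;
        }
        int count = 0;
        int index = 0;
        while ((index = str.indexOf(find, index)) != -1) {
            index += find.length(); // continue after the current match
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOccurrences("北京欢迎你，我爱北京", "北京")); // prints 2
        System.out.println(countOccurrences("上海", "北京"));               // prints 0
        System.out.println(countOccurrences("aaaa", "aa"));                // prints 2
        System.out.println(countOccurrences(null, "北京"));                 // prints 0
    }
}
```

The `"aaaa"` case shows the non-overlapping behavior: the matches at positions 0 and 2 are counted, the overlapping one at position 1 is not.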

# Package

Package the project with Maven, as shown below:

[![](https://www.malaoshi.top/upload/0/0/1GW2MFS9eNuN.png)](https://www.malaoshi.top/upload/0/0/1GW2MFS9eNuN.png)

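In most cases the `original-hive_weibo-1.0-SNAPSHOT.jar` package is sufficient; it is the **smallest jar**.

**Note:** if an error occurs, use the **larger** `hive_weibo-1.0-SNAPSHOT.jar` instead, which **includes all dependencies**.

# Upload

Upload `original-hive_weibo-1.0-SNAPSHOT.jar` to the `/azkaban/weibo_sh/` directory on the `hadoop1` server.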

Original source: http://www.malaoshi.top/show_1GW2MhjD8got.html