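You can use Hive's built-in functions and statements to write queries that count the missing (NULL) values in each attribute. Suppose we have a table named "my_table" that contains several attribute columns (attr1, attr2 and attr3 below). Here are a few possible queries: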
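-- 1. Number of missing (NULL) values in each column.
-- COUNT ignores NULLs, so only rows where the CASE yields 1 are counted.
SELECT
  COUNT(CASE WHEN attr1 IS NULL THEN 1 ELSE NULL END) AS attr1_missing,
  COUNT(CASE WHEN attr2 IS NULL THEN 1 ELSE NULL END) AS attr2_missing,
  COUNT(CASE WHEN attr3 IS NULL THEN 1 ELSE NULL END) AS attr3_missing
FROM
  my_table;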
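To see what the first query returns, here is a small sketch that runs the same SQL against an in-memory SQLite database from Python. SQLite is only a stand-in for Hive here (an assumption: the CASE and COUNT semantics used are standard SQL and behave the same way), and the sample rows are made up. It also checks the shorthand COUNT(*) - COUNT(attr1), which gives the same result because COUNT(column) counts only non-NULL values.

```python
import sqlite3

# Made-up sample rows standing in for my_table (None becomes SQL NULL).
ROWS = [
    (1, None, "a"),
    (None, None, "b"),
    (3, 30, None),
    (4, 40, "d"),
]

# The per-column query from the text.
PER_COLUMN_SQL = """
SELECT
  COUNT(CASE WHEN attr1 IS NULL THEN 1 ELSE NULL END) AS attr1_missing,
  COUNT(CASE WHEN attr2 IS NULL THEN 1 ELSE NULL END) AS attr2_missing,
  COUNT(CASE WHEN attr3 IS NULL THEN 1 ELSE NULL END) AS attr3_missing
FROM my_table
"""

# Shorthand: COUNT(col) skips NULLs, so COUNT(*) - COUNT(col) is the NULL count.
SHORTHAND_SQL = """
SELECT
  COUNT(*) - COUNT(attr1),
  COUNT(*) - COUNT(attr2),
  COUNT(*) - COUNT(attr3)
FROM my_table
"""

def make_table() -> sqlite3.Connection:
    """Create an in-memory my_table filled with ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (attr1 INTEGER, attr2 INTEGER, attr3 TEXT)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", ROWS)
    return conn

conn = make_table()
per_column = conn.execute(PER_COLUMN_SQL).fetchone()
shorthand = conn.execute(SHORTHAND_SQL).fetchone()
print(per_column)  # (1, 2, 1)
print(shorthand)   # (1, 2, 1)
```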
-- 2. Total number of missing values across all three columns.
SELECT
  COUNT(CASE WHEN attr1 IS NULL THEN 1 ELSE NULL END) +
  COUNT(CASE WHEN attr2 IS NULL THEN 1 ELSE NULL END) +
  COUNT(CASE WHEN attr3 IS NULL THEN 1 ELSE NULL END) AS total_missing
FROM
  my_table;
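With the same kind of made-up data, the sketch below (again using SQLite as a stand-in for Hive) confirms that total_missing is the sum of the per-column counts, i.e. the number of NULL cells. That is not the same as the number of rows that have at least one missing value; if that is what you need, the extra rows_with_missing column (a hypothetical name added for illustration) shows one way to count it with an OR condition.

```python
import sqlite3

# Made-up sample rows standing in for my_table (None becomes SQL NULL).
ROWS = [
    (1, None, "a"),
    (None, None, "b"),
    (3, 30, None),
    (4, 40, "d"),
]

TOTAL_SQL = """
SELECT
  COUNT(CASE WHEN attr1 IS NULL THEN 1 ELSE NULL END) +
  COUNT(CASE WHEN attr2 IS NULL THEN 1 ELSE NULL END) +
  COUNT(CASE WHEN attr3 IS NULL THEN 1 ELSE NULL END) AS total_missing,
  -- Rows where at least one attribute is NULL (each row counted once).
  COUNT(CASE WHEN attr1 IS NULL OR attr2 IS NULL OR attr3 IS NULL
             THEN 1 ELSE NULL END) AS rows_with_missing
FROM my_table
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (attr1 INTEGER, attr2 INTEGER, attr3 TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", ROWS)

total_missing, rows_with_missing = conn.execute(TOTAL_SQL).fetchone()
print(total_missing)      # 4: four NULL cells in total
print(rows_with_missing)  # 3: rows 1, 2 and 3 each contain a NULL
```

The second row has two NULLs, which is why the cell total (4) exceeds the row count (3).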
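-- 3. Percentage of missing values in each column.
-- AVG of a 1.0/0.0 flag is the fraction of NULL rows; * 100 turns it into a percentage.
SELECT
  AVG(CASE WHEN attr1 IS NULL THEN 1.0 ELSE 0.0 END) * 100 AS attr1_missing_pct,
  AVG(CASE WHEN attr2 IS NULL THEN 1.0 ELSE 0.0 END) * 100 AS attr2_missing_pct,
  AVG(CASE WHEN attr3 IS NULL THEN 1.0 ELSE 0.0 END) * 100 AS attr3_missing_pct
FROM
  my_table;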
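A quick way to convince yourself that averaging a 0/1 flag gives the missing ratio is to compare it with the explicit "NULL count divided by row count" form. The sketch below does that in SQLite (a stand-in for Hive, on made-up rows); with 1, 2 and 1 NULLs out of 4 rows, both forms should yield 25, 50 and 25 percent.

```python
import sqlite3

# Made-up sample rows standing in for my_table (None becomes SQL NULL).
ROWS = [
    (1, None, "a"),
    (None, None, "b"),
    (3, 30, None),
    (4, 40, "d"),
]

# The percentage query from the text.
PCT_SQL = """
SELECT
  AVG(CASE WHEN attr1 IS NULL THEN 1.0 ELSE 0.0 END) * 100 AS attr1_missing_pct,
  AVG(CASE WHEN attr2 IS NULL THEN 1.0 ELSE 0.0 END) * 100 AS attr2_missing_pct,
  AVG(CASE WHEN attr3 IS NULL THEN 1.0 ELSE 0.0 END) * 100 AS attr3_missing_pct
FROM my_table
"""

# Explicit form: NULL count / row count. The 100.0 literal keeps SQLite
# from doing integer division.
RATIO_SQL = """
SELECT
  100.0 * (COUNT(*) - COUNT(attr1)) / COUNT(*),
  100.0 * (COUNT(*) - COUNT(attr2)) / COUNT(*),
  100.0 * (COUNT(*) - COUNT(attr3)) / COUNT(*)
FROM my_table
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (attr1 INTEGER, attr2 INTEGER, attr3 TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", ROWS)

pct = conn.execute(PCT_SQL).fetchone()
ratio = conn.execute(RATIO_SQL).fetchone()
print(pct)    # (25.0, 50.0, 25.0)
print(ratio)  # (25.0, 50.0, 25.0)
```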
These are some common Hive queries for computing the number or percentage of missing values in a table's attribute columns.
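Listing every column by hand gets tedious on wide tables. A hypothetical helper such as build_missing_count_sql below generates the per-column query for any list of column names; in Hive, that list could be taken from the output of SHOW COLUMNS IN my_table. The helper quotes identifiers with backticks, which is Hive's quoting style and is also accepted by SQLite, used here only to check that the generated SQL runs. It is a sketch that assumes column names contain no backticks.

```python
import sqlite3

def build_missing_count_sql(table: str, columns: list) -> str:
    """Hypothetical helper: build the per-column NULL-count query for any columns.

    Assumes no column or table name contains a backtick.
    """
    if not columns:
        raise ValueError("at least one column is required")
    exprs = [
        f"COUNT(CASE WHEN `{c}` IS NULL THEN 1 ELSE NULL END) AS `{c}_missing`"
        for c in columns
    ]
    return "SELECT\n  " + ",\n  ".join(exprs) + f"\nFROM `{table}`"

columns = ["attr1", "attr2", "attr3"]
sql = build_missing_count_sql("my_table", columns)
print(sql)

# Check the generated SQL against made-up rows in SQLite (a stand-in for Hive).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (attr1 INTEGER, attr2 INTEGER, attr3 TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, None, "a"), (None, None, "b"), (3, 30, None), (4, 40, "d")],
)
result = dict(zip(columns, conn.execute(sql).fetchone()))
print(result)  # {'attr1': 1, 'attr2': 2, 'attr3': 1}
```

Generating the SQL string outside Hive keeps the query itself identical to the hand-written version, so it can be pasted into the Hive CLI or submitted from any client.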