If the AWS Glue PySpark Filter transform is not working, try the following approach:
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.transforms import Filter
# GlueContext is documented to take a SparkContext, not a SparkSession
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
# The SparkSession is available from the GlueContext if needed
spark = glueContext.spark_session
# Load the source table from the Glue Data Catalog
dynamic_frame = glueContext.create_dynamic_frame.from_catalog(database="database_name", table_name="table_name")
# Keep only the records whose column equals the desired value
filtered_frame = Filter.apply(frame=dynamic_frame, f=lambda x: x["column_name"] == "desired_value")
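Adjust the filter condition in the lambda function as needed. To inspect the result, convert the filtered frame to a DataFrame: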
data_frame = filtered_frame.toDF()
data_frame.show()
Be sure to replace the database name, table name, column name, and desired value with your actual values.
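If the filter still returns nothing or raises an error, one possible cause is that some records lack the column, hold a null, or store the value as a different type (for example the integer 42 rather than the string "42"). The sketch below shows a more defensive predicate that could be passed as `f` in place of the lambda. It is only an illustration: `make_predicate` is a hypothetical helper, not part of the Glue API, and the sample records are plain dicts standing in for Glue records on the assumption that they support the same `record["column_name"]` lookup used above.

```python
def make_predicate(column, desired):
    """Build a filter function that tolerates missing, null, or differently typed values."""
    def predicate(record):
        try:
            value = record[column]
        except KeyError:
            # Column is absent from this record: drop it instead of failing
            return False
        if value is None:
            return False
        # Compare as strings so that 42 and "42" are treated as equal
        return str(value) == str(desired)
    return predicate


# Plain dicts used as stand-ins for records (an assumption for illustration)
sample_records = [
    {"id": 1, "column_name": "42"},
    {"id": 2, "column_name": 42},
    {"id": 3, "column_name": None},
    {"id": 4},
    {"id": 5, "column_name": "other"},
]

keep = make_predicate("column_name", "42")
kept_ids = [r["id"] for r in sample_records if keep(r)]
print(kept_ids)
```

In a Glue job, the same function would be passed as `Filter.apply(frame=dynamic_frame, f=make_predicate("column_name", "desired_value"))`. Comparing as strings is a deliberate simplification; if the column type is known, a direct typed comparison is stricter.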