Read and write text files

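The text format reads each line of a text file as one row of a DataFrame. Each row contains a single column named value of type StringType. Azure Databricks users typically use this format for log analysis, for ingesting raw data before further processing, or for any workflow that needs line-by-line access to file content. Azure Databricks supports reading and writing text files with Apache Spark, including writing compressed output.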

Prerequisites

Azure Databricks requires no additional configuration to use text files. To stream text files, however, you need Auto Loader.
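
As a rough illustration of the streaming case, the following sketch reads text files incrementally with Auto Loader. The paths, the table name, and the helper function stream_text_files are hypothetical placeholders, and the explicit "value STRING" schema is an assumption made here to keep the example self-contained.

```python
# Hypothetical placeholder paths and table name; replace with your own.
SOURCE_PATH = "/Volumes/<catalog>/<schema>/<volume>/incoming_logs"
CHECKPOINT_PATH = "/Volumes/<catalog>/<schema>/<volume>/_checkpoints/incoming_logs"
TARGET_TABLE = "<catalog>.<schema>.raw_log_lines"


def stream_text_files(spark, source_path, checkpoint_path, target_table):
    """Incrementally load new text files into a table with Auto Loader."""
    lines = (
        spark.readStream.format("cloudFiles")     # Auto Loader source
        .option("cloudFiles.format", "text")      # each line becomes one row
        .schema("value STRING")                   # assumed explicit single-column schema
        .load(source_path)
    )
    return (
        lines.writeStream
        .option("checkpointLocation", checkpoint_path)  # tracks files already processed
        .trigger(availableNow=True)                     # process available files, then stop
        .toTable(target_table)
    )


# In a Databricks notebook, where `spark` is predefined:
# query = stream_text_files(spark, SOURCE_PATH, CHECKPOINT_PATH, TARGET_TABLE)
# query.awaitTermination()
```

The availableNow trigger makes the stream behave like an incremental batch job. Remove it if the stream should keep running.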

Options

Use the .option() and .options() methods of DataFrameReader and DataFrameWriter to configure the text data source. For the full list of supported options, see DataFrameReader text options and DataFrameWriter text options.
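
A minimal sketch of how options might be passed, using a few options of the Spark text data source (wholetext and lineSep for reading, compression and lineSep for writing). The helper functions read_text and write_text and the chosen option values are illustrative assumptions, not part of any Databricks API.

```python
# Illustrative option sets for the Spark text data source.
READ_OPTIONS = {
    "wholetext": "false",  # "true" would read each file as a single row
    "lineSep": "\n",       # line separator used to split rows
}
WRITE_OPTIONS = {
    "compression": "gzip",  # write gzip-compressed output files
    "lineSep": "\n",        # line separator written between rows
}


def read_text(spark, path, options=READ_OPTIONS):
    """Read text files into a DataFrame with a single 'value' column."""
    return spark.read.format("text").options(**options).load(path)


def write_text(df, path, options=WRITE_OPTIONS):
    """Write a single-string-column DataFrame as text files."""
    df.write.format("text").options(**options).save(path)


# In a Databricks notebook, where `spark` is predefined:
# df = read_text(spark, "/Volumes/<catalog>/<schema>/<volume>/review_comments")
# write_text(df, "/Volumes/<catalog>/<schema>/<volume>/review_comments_gz")
```

Passing a dictionary to .options() is equivalent to chaining one .option() call per key.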

Usage

The following examples use the Wanderbricks dataset to show how to read and write text files with the Spark DataFrame API and SQL.

Read text files with SQL

To query text files without registering a table, use read_files. Unity Catalog permissions on the external location are applied automatically.

SELECT * FROM read_files(
  '/Volumes/<catalog>/<schema>/<volume>/review_comments',
  format => 'text'
)

Read and write text files

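The text format requires a DataFrame with exactly one column of type StringType. The following examples write Wanderbricks review comments as text files and then read them back.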

Python

from pyspark.sql.functions import col

# Write wanderbricks review comments as a text file
df = spark.read.table("samples.wanderbricks.reviews").select(col("comment").alias("value"))
df.write.format("text").save("/Volumes/<catalog>/<schema>/<volume>/review_comments")

# Read a text file — each line becomes a row in the "value" column
df = spark.read.format("text").load("/Volumes/<catalog>/<schema>/<volume>/review_comments")
display(df)

Scala

import org.apache.spark.sql.functions.col

// Write wanderbricks review comments as a text file
val df = spark.read.table("samples.wanderbricks.reviews").select(col("comment").alias("value"))
df.write.format("text").save("/Volumes/<catalog>/<schema>/<volume>/review_comments")

// Read a text file — each line becomes a row in the "value" column
val text = spark.read.format("text").load("/Volumes/<catalog>/<schema>/<volume>/review_comments")
text.show()
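
Because the text format accepts only a single string column, a DataFrame with several columns has to be collapsed into one column before writing. The following sketch builds a concat_ws expression for that purpose. The helper single_value_expr, the pipe separator, and the rating column are assumptions for illustration; only the comment column appears in the examples above.

```python
def single_value_expr(columns, sep="|"):
    """Build a Spark SQL expression that joins columns into one string column named 'value'."""
    casted = ", ".join(f"CAST({c} AS STRING)" for c in columns)
    return f"concat_ws('{sep}', {casted}) AS value"


# The expression for two columns, joined with the default separator.
expr = single_value_expr(["rating", "comment"])
print(expr)

# In a Databricks notebook, where `spark` is predefined
# (the `rating` column is assumed to exist in the table):
# df = spark.read.table("samples.wanderbricks.reviews").selectExpr(expr)
# df.write.format("text").save("/Volumes/<catalog>/<schema>/<volume>/review_lines")
```

Note that concat_ws skips null values, so rows with a null column produce fewer fields than other rows.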

Additional resources

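Read and write CSV files: If your text data is delimited or tabular, the CSV format provides structured parsing with schema inference, header support, and configurable delimiters.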

Last updated on 2026-06-24