Apify: Using Regular Expressions to Match URLs Containing a Specific Keyword

Founder

2024-09-07 16:00:12

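When crawling with the Apify framework, you can use a regular expression to pick out the URLs that contain a specific keyword. The example below shows one way to do this. It is written against the older Apify SDK API (Apify.main and handlePageFunction):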

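const Apify = require('apify');

Apify.main(async () => {
    // Create a new request queue
    const requestQueue = await Apify.openRequestQueue();

    // Add the start URL to the queue
    await requestQueue.addRequest({ url: 'https://example.com/page1' });

    // Create a new crawler instance
    const crawler = new Apify.CheerioCrawler({
        requestQueue,
        handlePageFunction: async ({ request, $ }) => {
            // Keyword to look for, case-insensitive.
            // Note: on the start URL above this also matches the host name example.com.
            const keywordRegex = /example/i;

            // Resolve each href against the page URL so relative links become absolute,
            // then keep only the http(s) URLs that match the keyword
            const baseUrl = request.loadedUrl || request.url;
            const matchingUrls = $('a[href]')
                .map((i, el) => $(el).attr('href'))
                .get()
                .map((href) => { try { return new URL(href, baseUrl).href; } catch { return null; } })
                .filter((url) => url && /^https?:/.test(url) && keywordRegex.test(url));

            // Print the matching URLs
            matchingUrls.forEach((url) => console.log(url));

            // Add the matching URLs to the request queue, awaiting each call
            for (const url of matchingUrls) {
                await requestQueue.addRequest({ url });
            }
        },
    });

    // Start the crawl
    await crawler.run();
});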

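In the code above, we create a CheerioCrawler instance to parse and process HTML pages. Inside handlePageFunction, we use Cheerio to select the link elements on the page and a regular expression to keep only the URLs that contain the keyword.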
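The filtering step can also be tried in isolation, without running a crawler. The following is a minimal sketch, assuming a hypothetical helper named filterUrlsByKeyword (not part of Apify) that relies only on the built-in URL class. The sample hrefs and the base URL are made up for illustration.

```javascript
// Hypothetical helper, not part of the Apify API.
// Resolves each href against baseUrl, keeps http(s) URLs that match
// keywordRegex, and removes duplicates while preserving order.
// Pass a regex without the "g" flag, because test() is stateful on global regexes.
function filterUrlsByKeyword(hrefs, baseUrl, keywordRegex) {
    const matches = new Set();
    for (const href of hrefs) {
        let url;
        try {
            url = new URL(href, baseUrl);
        } catch (err) {
            continue; // skip hrefs that cannot be parsed as URLs
        }
        if (url.protocol !== 'http:' && url.protocol !== 'https:') continue;
        if (keywordRegex.test(url.href)) matches.add(url.href);
    }
    return [...matches];
}

// Made-up sample data for illustration only.
const sampleHrefs = [
    '/docs/example-page',
    'https://other.test/about',
    'mailto:someone@example.com',
    '/docs/example-page',
];
const result = filterUrlsByKeyword(sampleHrefs, 'https://site.test/index.html', /example/i);
console.log(result);
```

Keeping the filter as a plain function makes it easy to check the regular expression against a handful of known links before wiring it into the crawler.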

When a URL matches, we print it and add it to the request queue so that it is crawled later in the run.
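One thing to watch for is that a pattern such as /example/i also matches the host name of every link on example.com, so it may select more pages than intended. A possible refinement, sketched here under stated assumptions, is to build the regular expression from a keyword string with special characters escaped and to test it against the path and query string only. The helper names escapeRegExp, buildKeywordRegex, and pathContainsKeyword are hypothetical and not part of Apify.

```javascript
// Hypothetical helpers, not part of the Apify API.

// Escape characters that have a special meaning in regular expressions,
// so the keyword is matched literally.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a case-insensitive regex that matches the keyword literally.
function buildKeywordRegex(keyword) {
    return new RegExp(escapeRegExp(keyword), 'i');
}

// Test the keyword against the path and query string only, ignoring the host.
function pathContainsKeyword(urlString, keywordRegex) {
    const { pathname, search } = new URL(urlString);
    return keywordRegex.test(pathname + search);
}

const keywordRegex = buildKeywordRegex('example');
const hostOnly = pathContainsKeyword('https://example.com/about', keywordRegex);
const inPath = pathContainsKeyword('https://example.com/docs/Example-Guide', keywordRegex);
console.log(hostOnly, inPath);

// A keyword containing a dot is matched literally, not as "any character".
const versionRegex = buildKeywordRegex('v1.0');
const literalDot = versionRegex.test('/release/v1.0/notes');
const anyChar = versionRegex.test('/release/v1x0/notes');
console.log(literalDot, anyChar);
```

Whether the host should count as a match depends on the crawl; if it should, test against the full URL as in the original example.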
