Puppeteer - NodeJS Scraping

From Wikizone

Revision as of 17 August 2022, 14:47

Quickstart

https://www.youtube.com/watch?v=Sag-Hz9jJNg

Prerequisites: Visual Studio Code and Node.js are installed.

Create a folder and initialize a Node.js project.

Terminal

npm init -y
npm install puppeteer

This also installs Chromium. Take a look in the …

Create index.js. Load Puppeteer and wrap the script in an asynchronous function. This function …

const puppeteer = require("puppeteer");
(async () => {
}) ();

Example: take a screenshot of a page:

const puppeteer = require("puppeteer");
(async () => {
  const browser = await puppeteer.launch({headless: false}) // headless: false opens a visible browser window
  const page = await browser.newPage() // open a new tab in the browser
  await page.goto("https://schlegel.media")
  await page.screenshot({path: "screenshot.png"})

  await browser.close()
}) ();

Run it with

node index.js

Example scripts

Read a DOM element

const puppeteer = require("puppeteer");
(async () => {
  const browser = await puppeteer.launch({headless: false}) // headless: false opens a visible browser window
  const page = await browser.newPage() // open a new tab in the browser
  await page.goto("https://schlegel.media")

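  // page.evaluate runs the callback inside the page and returns its result
  const grabSlogan = await page.evaluate( () => {
    const slogan = document.querySelector(".uk-text-lead")
    return slogan ? slogan.innerHTML : null // null if the element is missing
  })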

  console.log(grabSlogan)
  await browser.close()
}) ();