Puppeteer - NodeJS Scraping

From Wikizone

Revision as of 17 August 2022, 15:26

Quickstart

https://www.youtube.com/watch?v=Sag-Hz9jJNg

Prerequisites: Visual Studio Code and Node.js installed

Create a folder and initialize a Node.js project

Terminal

npm init -y
npm install puppeteer

This also installs Chromium. Take a look in the

Create index.js. Load Puppeteer and wrap the code in an asynchronous function. This function is invoked immediately by the trailing ().

const puppeteer = require("puppeteer");
(async () => {
}) ();
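
One weakness of the bare skeleton is that an error inside the async function can leave the browser process running. A minimal sketch of one way to guard against that follows; withBrowser is a hypothetical helper name, not part of Puppeteer, and the launch function is passed in as a parameter so the helper does not depend on how the browser is started.

```javascript
// Hypothetical helper (not part of Puppeteer): opens a browser via `launch`,
// hands it to `task`, and closes it even when `task` throws.
async function withBrowser(launch, task) {
  const browser = await launch();
  try {
    return await task(browser);
  } finally {
    await browser.close();
  }
}

// Possible use in index.js (assumes `npm install puppeteer` has been run):
//
// const puppeteer = require("puppeteer");
// withBrowser(() => puppeteer.launch(), async (browser) => {
//   const page = await browser.newPage();
//   await page.goto("https://schlegel.media");
// }).catch((err) => {
//   console.error(err);
//   process.exitCode = 1;
// });
```

The try/finally block is the point of the sketch: browser.close() runs whether the task succeeds or fails, and the error still reaches the caller.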

Example: take a screenshot of a page:

const puppeteer = require("puppeteer");
(async () => {
  const browser = await puppeteer.launch({headless: false}) // headless: false opens a visible browser window; omit it to run headless
  const page = await browser.newPage() // open new tab in browser
  await page.goto("https://schlegel.media")
  await page.screenshot({path: "screenshot.png"})

  await browser.close()
}) ();

Run it with

node index.js
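
By default page.screenshot only captures the visible viewport. A rough sketch of a variant that captures the whole page is shown here. captureFullPage is a hypothetical helper name, the viewport size is an arbitrary assumption, and page is expected to be a Puppeteer page such as the one returned by browser.newPage().

```javascript
// Hypothetical helper: `page` is a Puppeteer Page, e.g. from browser.newPage().
async function captureFullPage(page, url, path) {
  // Fixed window size so screenshots are comparable between runs (assumed values).
  await page.setViewport({ width: 1280, height: 800 });
  // Wait until network activity has mostly settled before capturing.
  await page.goto(url, { waitUntil: "networkidle2" });
  // fullPage: true captures the whole scrollable page, not only the viewport.
  await page.screenshot({ path, fullPage: true });
  return path;
}

// Possible use inside the async function from the example above:
//
// await captureFullPage(page, "https://schlegel.media", "screenshot-full.png");
```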

Example scripts

Read a DOM element

const puppeteer = require("puppeteer")
(async () => {
  const browser = await puppeteer.launch({headless: false}) // headless: false opens a visible browser window; omit it to run headless
  const page = await browser.newPage() // open new tab in browser
  await page.goto("https://schlegel.media")

  const grabSlogan = await page.evaluate( () => {
    const slogan = document.querySelector(".uk-text-lead")
    //return slogan.innerHTML // with html tags
    return slogan.innerText // only the text
  })

  console.log(grabSlogan)
  await browser.close()
}) ()
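
The script above assumes that an element matching .uk-text-lead exists; if it does not, slogan is null and reading innerText throws inside page.evaluate. A minimal sketch of a more defensive variant, using page.$ (which resolves to null when nothing matches), could look like this. readText is a hypothetical helper name.

```javascript
// Hypothetical helper: returns the trimmed text of the first element that
// matches `selector`, or null when the page has no such element.
async function readText(page, selector) {
  const handle = await page.$(selector); // null when nothing matches
  if (!handle) {
    return null;
  }
  // The callback runs in the browser context with the matched element.
  return handle.evaluate((el) => el.innerText.trim());
}

// Possible use inside the async function from the example above:
//
// const slogan = await readText(page, ".uk-text-lead");
// console.log(slogan === null ? "no slogan found" : slogan);
```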

// grab multiple elements

// ... same as above
  const grabList = await page.evaluate( () => {
    const listTags = document.querySelectorAll(".uk-nav-default li")
    let listItems = []
    listTags.forEach((tag) => {
      listItems.push(tag.innerText)
    })

    return listItems
  })
  console.log(grabList)
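
The same list can also be collected with page.$$eval, which runs querySelectorAll in the page and passes the matches to a callback as an array. The following is only a sketch of that alternative. grabTexts is a hypothetical helper name, and dropping empty entries is an added assumption about what is wanted.

```javascript
// Hypothetical helper: collects the trimmed text of every element matching
// `selector` and drops entries that are empty after trimming.
async function grabTexts(page, selector) {
  // The callback runs in the browser context with an array of matched elements.
  const texts = await page.$$eval(selector, (nodes) =>
    nodes.map((node) => node.innerText.trim())
  );
  return texts.filter((text) => text.length > 0);
}

// Possible use inside the async function from the examples above:
//
// const grabList = await grabTexts(page, ".uk-nav-default li");
// console.log(grabList);
```

If no element matches, $$eval passes an empty array to the callback, so the helper returns an empty list instead of throwing.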