Puppeteer - NodeJS Scraping: Difference between revisions

Revision as of 17 August 2022, 19:00

Quickstart

https://www.youtube.com/watch?v=Sag-Hz9jJNg

Prerequisites: Visual Studio Code and Node.js installed

Create a folder and initialize a Node.js project

Terminal

npm init -y
npm install puppeteer

This also installs Chromium. Take a look in the ...

Create index.js. Load Puppeteer and add an asynchronous function. This function ...

const puppeteer = require("puppeteer");
(async () => {
}) ();
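The empty wrapper above does not yet deal with errors. A minimal sketch of one way to make sure the browser is always closed, even when a step throws; `withBrowser` and its `launch` parameter are hypothetical names chosen for this example, not part of Puppeteer:

```javascript
// Hypothetical helper: runs `task` with a browser and always closes it afterwards.
// `launch` is a function that returns a browser, e.g. () => puppeteer.launch().
async function withBrowser(launch, task) {
  const browser = await launch();
  try {
    return await task(browser);
  } finally {
    // Runs on success and on error, so no browser process is left behind.
    await browser.close();
  }
}
```

With Puppeteer loaded, a call could look like withBrowser(() => puppeteer.launch(), async (browser) => { ... }). Passing the launcher in as a function keeps the helper independent of how the browser is started.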

Example: take a screenshot of a page:

const puppeteer = require("puppeteer");
(async () => {
  const browser = await puppeteer.launch({headless: false}) // headless: false shows the browser window
  const page = await browser.newPage() // open new tab in browser
  await page.goto("https://schlegel.media")
  await page.screenshot({path: "screenshot.png"})

  await browser.close()
}) ();

Run it with

node index.js
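By default the screenshot only covers the visible viewport. A minimal sketch of a variant that captures the whole scrollable page; `captureFullPage` is a hypothetical helper name, and `page` is assumed to be a Puppeteer page such as the one created with browser.newPage():

```javascript
// Hypothetical helper: loads `url` in the given page and saves a full-page screenshot.
// `page` is expected to be a Puppeteer Page.
async function captureFullPage(page, url, path) {
  // Wait until network activity has mostly settled before capturing.
  await page.goto(url, { waitUntil: "networkidle2" });
  // fullPage captures the whole scrollable page, not only the viewport.
  await page.screenshot({ path, fullPage: true });
  return path;
}
```

Inside the async wrapper this could be called as await captureFullPage(page, "https://schlegel.media", "screenshot.png").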

Example scripts

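Note: The scripts in this setup are CommonJS files, not ES modules, and I ran into problems in Node when leaving out the semicolons. The line before the (async () => { ... })() wrapper has to end with a semicolon; otherwise JavaScript treats the opening parenthesis as a call on the result of require(...).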
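The semicolon problem can be reproduced without Puppeteer. A minimal sketch, in which `loadModule` is a hypothetical stand-in for require("puppeteer"):

```javascript
// Hypothetical stand-in for require("puppeteer"); it returns a plain object.
function loadModule() {
  return { name: "demo" };
}

let errorName = null;
try {
  // No semicolon after loadModule(): the "(" on the next line continues the
  // statement, so this is parsed as loadModule()(() => { ... })().
  const lib = loadModule()
  (() => { console.log("not reached"); })();
} catch (err) {
  errorName = err.name;
}

// With the semicolon, the wrapper is a separate statement and runs normally.
let ranWithSemicolon = false;
const lib = loadModule();
(() => { ranWithSemicolon = true; })();

console.log(errorName, ranWithSemicolon); // prints "TypeError true"
```

The first variant fails because the object returned by loadModule() is called like a function, which is what happens to the Puppeteer module in a script without semicolons.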

Scraping DOM elements with evaluate

The evaluate function is well suited for scraping.

const puppeteer = require("puppeteer"); // semicolon required: the next line starts with "("
(async () => {
  const browser = await puppeteer.launch({headless: false}) // headless: false shows the browser window
  const page = await browser.newPage() // open new tab in browser
  await page.goto("https://schlegel.media")

  const grabSlogan = await page.evaluate( () => {
    const slogan = document.querySelector(".uk-text-lead")
    //return slogan.innerHTML // with html tags
    return slogan.innerText // only the text
  })

  console.log(grabSlogan)
  await browser.close()
}) ()

// grab multiple elements

// ... same setup as above
  const grabList = await page.evaluate( () => {
    const listTags = document.querySelectorAll(".uk-nav-default li")
    let listItems = []
    listTags.forEach((tag) => {
      listItems.push(tag.innerText)
    })

    return listItems
  })
  console.log(grabList)
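The callback passed to evaluate runs inside the browser page, so it cannot read variables from the Node script. Values have to be handed over as extra arguments. A minimal sketch of the list example with the selector as a parameter; `grabTexts` is a hypothetical helper name, and `page` is assumed to be a Puppeteer page:

```javascript
// Hypothetical helper: collects the trimmed text of every element matching `selector`.
// `page` is expected to be a Puppeteer Page.
async function grabTexts(page, selector) {
  // The callback runs in the browser and cannot see Node variables,
  // so `selector` is passed as an extra argument to evaluate.
  return page.evaluate((sel) => {
    const tags = document.querySelectorAll(sel);
    return Array.from(tags, (tag) => tag.innerText.trim());
  }, selector);
}
```

Inside the async wrapper this could be called as await grabTexts(page, ".uk-nav-default li"). The return value must be serializable, which is why the function returns plain strings instead of DOM nodes.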

More complex DOM access

const puppeteer = require("puppeteer");
(async () => {
  const browser = await puppeteer.launch({headless: false}); // headless: false shows the browser window
  const page = await browser.newPage(); // open new tab in browser
  await page.goto("https://quotes.toscrape.com/");

  const grab = await page.evaluate( () => {
    let arrElements = [];
    const quotes = document.querySelectorAll(".quote");
    quotes.forEach( (quote) => {
      const quoteSpans = quote.querySelectorAll("span");
      const quoteText = quoteSpans[0].innerHTML;
      const quoteAuthor = quoteSpans[1].querySelector("small").innerHTML;
      arrElements.push({'quote': quoteText, 'author': quoteAuthor});
    });
    return arrElements;
  });

  console.log(grab);
  await browser.close();
}) ();
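The array returned by evaluate is plain data and can be cleaned up in Node afterwards. A minimal sketch, assuming the quote text may be wrapped in quotation marks; `cleanQuotes` is a hypothetical helper name and the sample values are made up for illustration:

```javascript
// Hypothetical helper: cleans objects of the shape { quote, author } returned by evaluate.
function cleanQuotes(rows) {
  return rows.map(({ quote, author }) => ({
    // Remove surrounding whitespace and straight or curly quotation marks.
    quote: quote.trim().replace(/^["\u201C]+|["\u201D]+$/g, ""),
    author: author.trim(),
  }));
}

// Sample input in the same shape as `grab`; the values are made up.
const sample = [{ quote: "\u201CAn example quote.\u201D", author: " Jane Doe " }];
console.log(JSON.stringify(cleanQuotes(sample), null, 2));
```

In the script above, the same call would be cleanQuotes(grab) right before the console.log.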

Simulating user actions

const puppeteer = require("puppeteer");
(async () => {
  const browser = await puppeteer.launch({headless: false}); // headless: false shows the browser window
  const page = await browser.newPage(); // open new tab in browser
  await page.goto("https://quotes.toscrape.com/");

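  await page.click('a[href="/login"]'); // click login link
  await page.waitForSelector('#username'); // wait until the login form has loaded
  await page.type('#username','myUserName',{delay:300}); // delay: ms between key presses
  await page.type('#password','mySecret');
  await page.click('input[type="submit"]');
  //await browser.close(); // left open to inspect the result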
}) ();
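Clicks that load a new page are a common source of timing problems. A minimal sketch of the login steps as a reusable function that waits for each navigation; `login` is a hypothetical helper name, `page` is assumed to be a Puppeteer page, and the selectors are the ones used above:

```javascript
// Hypothetical helper: logs in and waits for each navigation to finish.
// `page` is expected to be a Puppeteer Page.
async function login(page, username, password) {
  // Start waiting before clicking, so the navigation cannot be missed.
  await Promise.all([
    page.waitForNavigation(),
    page.click('a[href="/login"]'),
  ]);
  await page.type("#username", username, { delay: 300 }); // delay in ms between key presses
  await page.type("#password", password);
  await Promise.all([
    page.waitForNavigation(),
    page.click('input[type="submit"]'),
  ]);
}
```

Inside the async wrapper this could be called as await login(page, "myUserName", "mySecret"). Keeping the credentials as parameters avoids hard-coding them in the click sequence.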

Computed styles of DOM elements

Find the styles of a DOM element. Here we use the $eval function.

const puppeteer = require("puppeteer");
(async () => {

  const browser = await puppeteer.launch({headless: true}); // headless: true runs without a visible window
  const page = await browser.newPage(); // open new tab in browser
  await page.goto("https://schlegel.media/");

  // get styles of element
  const myStyles = await page.$eval('body', el =>
    getComputedStyle(el).getPropertyValue('font-family')
  );
  console.log(myStyles);

  await browser.close();
}) ();

Evaluate version: easier to debug. For the differences in execution, see: https://stackoverflow.com/questions/55664420/page-evaluate-vs-puppeteer-methods

const puppeteer = require("puppeteer");
(async () => {
  const browser = await puppeteer.launch({headless: true}); // headless: true runs without a visible window
  const page = await browser.newPage(); // open new tab in browser
  await page.goto("https://schlegel.media/");

  // get styles of element
  const getStyles = await page.evaluate( () =>{
    const el = document.querySelector('body');
    const myStyle = getComputedStyle(el).getPropertyValue('font-family');
    return myStyle;
  });
  console.log(getStyles);

  await browser.close();
}) ();
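Both versions read a single property. A minimal sketch that reads several computed properties in one $eval call; `getStyles` is a hypothetical helper name, and `page` is assumed to be a Puppeteer page:

```javascript
// Hypothetical helper: reads several computed style properties of one element.
// `page` is expected to be a Puppeteer Page.
async function getStyles(page, selector, properties) {
  // Arguments after the callback are forwarded to it inside the browser.
  return page.$eval(selector, (el, props) => {
    const computed = getComputedStyle(el);
    const result = {};
    for (const prop of props) {
      result[prop] = computed.getPropertyValue(prop);
    }
    return result;
  }, properties);
}
```

Inside the async wrapper this could be called as await getStyles(page, "body", ["font-family", "font-size"]), which returns an object keyed by property name.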