Puppeteer - NodeJS Scraping: Difference between revisions
From Wikizone
Revision as of 17 August 2022, 15:26
Quickstart
https://www.youtube.com/watch?v=Sag-Hz9jJNg
Prerequisites: Visual Studio Code and Node.js installed
Create a folder and initialize a Node.js project
Terminal
npm init -y
npm install puppeteer
This also installs Chromium. Take a look in the
Create index.js. Load Puppeteer and wrap the script in an asynchronous function. This function
const puppeteer = require("puppeteer");
(async () => {
}) ();
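The wrapper is needed because index.js is loaded with require-style (CommonJS) modules, where `await` is only allowed inside an async function. The following is a minimal sketch of how such an immediately invoked async function behaves. The names `order` and `done` are illustrative only, and `Promise.resolve()` merely stands in for an awaited Puppeteer call.

```javascript
// Records the order in which the statements run (illustrative only).
const order = [];

// The async function is defined and called in one expression.
// It returns a promise that settles when the body has finished.
const done = (async () => {
  order.push("start of async body");
  await Promise.resolve(); // placeholder for e.g. `await puppeteer.launch()`
  order.push("after await");
})();

// Code after the wrapper runs while the async body is paused at its first await.
order.push("sync code after the wrapper");

done.then(() => console.log(order.join(" -> ")));
```

Anything that depends on the result of the script therefore belongs inside the async body, not after it.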
Example: take a screenshot of a page:
const puppeteer = require("puppeteer");
(async () => {
  // headless: false opens a visible browser window; use true to run without a UI
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage(); // open a new tab in the browser
  await page.goto("https://schlegel.media");
  await page.screenshot({path: "screenshot.png"}); // saved relative to the working directory
  await browser.close();
})();
Run it with
node index.js
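If `page.goto` or `page.screenshot` throws, a script like the screenshot example never reaches `browser.close()`, and the browser process can stay open. One possible way to guard against that is a small wrapper with `try`/`finally`. This is only a sketch: `withBrowser` is a hypothetical helper, not part of Puppeteer, and it takes the launch function as a parameter so it does not depend on a particular launch configuration.

```javascript
// Hypothetical helper (not a Puppeteer API): opens a browser, runs `task`,
// and closes the browser afterwards, even if `task` throws.
// `launch` is any function returning a promise for a browser object,
// for example () => puppeteer.launch({ headless: false }).
async function withBrowser(launch, task) {
  const browser = await launch();
  try {
    return await task(browser);
  } finally {
    await browser.close();
  }
}

// Usage sketch (assumes puppeteer has been installed with npm):
// const puppeteer = require("puppeteer");
// withBrowser(() => puppeteer.launch({ headless: false }), async (browser) => {
//   const page = await browser.newPage();
//   await page.goto("https://schlegel.media");
//   await page.screenshot({ path: "screenshot.png" });
// }).catch(console.error);
```

The error from `task` is still propagated to the caller, so the final `.catch` in the usage sketch decides how to report it.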
Example scripts
Reading a DOM element
// The semicolon matters: without it, the "(" on the next line is parsed as a call on the result of require()
const puppeteer = require("puppeteer");
(async () => {
  // headless: false opens a visible browser window; use true to run without a UI
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage(); // open a new tab in the browser
  await page.goto("https://schlegel.media");
  // the callback runs inside the page; only its return value comes back to Node
  const grabSlogan = await page.evaluate(() => {
    const slogan = document.querySelector(".uk-text-lead");
    //return slogan.innerHTML // with HTML tags
    return slogan.innerText; // only the text
  });
  console.log(grabSlogan);
  await browser.close();
})();
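In the script above, `document.querySelector` returns `null` when nothing matches the selector, and reading `innerText` then throws inside the page. A minimal sketch of a guarded variant follows. `grabText` is a hypothetical helper name; it relies on `page.evaluate` accepting extra arguments that are passed on to the page function.

```javascript
// Hypothetical helper: meant to be executed inside the page via page.evaluate,
// where the global `document` exists. Returns null instead of throwing
// when no element matches the selector.
function grabText(selector) {
  const element = document.querySelector(selector);
  return element ? element.innerText.trim() : null;
}

// Usage sketch inside the async function from the script above:
// const slogan = await page.evaluate(grabText, ".uk-text-lead");
// console.log(slogan === null ? "selector not found" : slogan);
```

Because the function is sent to the browser as source text, it should only use its own arguments and browser globals, not variables from the Node script.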
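// grab multiple elements
// ... setup as above (launch, newPage, goto)
  const grabList = await page.evaluate(() => {
    const listTags = document.querySelectorAll(".uk-nav-default li");
    let listItems = [];
    // collect the visible text of every matching list item
    listTags.forEach((tag) => {
      listItems.push(tag.innerText);
    });
    return listItems; // a plain array of strings can be returned to Node
  });
  console.log(grabList);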
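The loop above can also be written more compactly with `Array.from` and a mapping function. The following is a sketch under the same assumptions as the snippet above. `grabTexts` is a hypothetical helper name, and trimming and dropping empty entries is an addition that the original snippet does not perform.

```javascript
// Hypothetical helper: meant to be executed inside the page via page.evaluate.
// Returns the trimmed text of every element matching the selector,
// skipping entries that are empty after trimming.
function grabTexts(selector) {
  return Array.from(document.querySelectorAll(selector), (el) => el.innerText.trim())
    .filter((text) => text.length > 0);
}

// Usage sketch inside the async function:
// const grabList = await page.evaluate(grabTexts, ".uk-nav-default li");
// console.log(grabList);
```

An empty array is returned when nothing matches, so the caller does not need a separate null check.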