CURL

The namespace for all CURL/Puppeteer utility functions.

Methods

(static) addMarginsToScreenshot(arg0_screenshot_buffer, arg1_options) → {Buffer}

Adds margins to a screenshot using an HTML5 canvas.

Parameters:
Name                    Type    Description
arg0_screenshot_buffer  Object  The current screenshot buffer.
arg1_options            Object
Name           Type    Attributes  Default  Description
height         number
width          number
margin_bottom  number  <optional>  0
margin_left    number  <optional>  0
margin_right   number  <optional>  0
margin_top     number  <optional>  0
Returns:
Type:
Buffer
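
The margin options determine how large the output canvas must be and where the original screenshot lands on it. The following is a minimal sketch of that geometry only, assuming all values are pixels and the screenshot is drawn at (margin_left, margin_top). The helper name computeCanvasLayout and its return fields are hypothetical and not part of CURL.

```javascript
// Hypothetical helper (not part of CURL): derives the canvas size and the
// draw offset implied by the documented options. Assumes pixel units.
function computeCanvasLayout(options) {
  const {
    height,
    width,
    margin_bottom = 0,
    margin_left = 0,
    margin_right = 0,
    margin_top = 0,
  } = options;

  return {
    canvas_width: margin_left + width + margin_right,
    canvas_height: margin_top + height + margin_bottom,
    // Top-left corner at which the original screenshot would be drawn.
    offset_x: margin_left,
    offset_y: margin_top,
  };
}

const layout = computeCanvasLayout({
  width: 800,
  height: 600,
  margin_top: 20,
  margin_left: 10,
});
console.log(layout);
```

Omitted margins fall back to the documented default of 0, so only the sides that need padding have to be specified.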

(static) generatePlaintext(arg0_options) → {string}

Generates a plaintext string dump from specific scraping rules.

Parameters:
Name          Type    Description
arg0_options  Object
Name          Type            Attributes  Default     Description
cache         boolean         <optional>  false       Whether to cache the current dump.
cache_folder  string          <optional>  './cache/'
cache_prefix  string          <optional>  ''
scrape_urls   Array.<Object>
Returns:
Type:
string

(static) generatePlaintextRecursively(arg0_options) → {string}

Helper function for generatePlaintext() to handle recursion and DOM traversal.

Parameters:
Name Type Description
arg0_options Object
Returns:
Type:
string

(static) getPlaintextFromSelectors(arg0_url, arg1_selectors) → {string}

Fetches plaintext from specific CSS selectors on a URL.

Parameters:
Name Type Description
arg0_url string
arg1_selectors string | Array.<string>
Returns:
Type:
string
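
Because arg1_selectors accepts either a single string or an array of strings, a caller or implementation typically normalises it to an array first. A minimal sketch of that step, assuming blank entries should be dropped; normalizeSelectors is a hypothetical name and not part of CURL.

```javascript
// Hypothetical helper (not part of CURL): accepts a selector string or an
// array of selector strings and always returns a clean array.
function normalizeSelectors(selectors) {
  const list = Array.isArray(selectors) ? selectors : [selectors];
  return list
    .map((selector) => selector.trim())
    .filter((selector) => selector.length > 0); // Assumption: skip blanks.
}

console.log(normalizeSelectors("h1"));
console.log(normalizeSelectors([" h1 ", "", ".content"]));
```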

(static) getWebsiteHTML(arg0_url) → {string}

Fetches the HTML content of a website, falling back to Puppeteer if needed.

Parameters:
Name Type Description
arg0_url string
Returns:
Type:
string
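
The description does not say when the Puppeteer fallback is triggered. The sketch below shows one common fetch-then-fallback pattern, assuming the fallback runs when the first attempt throws or returns an empty body. fetchWithFallback and its two callback parameters are hypothetical; in practice the first could be a plain HTTP request and the second a headless-browser load.

```javascript
// Hypothetical sketch of a fetch-then-fallback pattern; not CURL's own code.
// `primary` and `fallback` are caller-supplied async functions taking a URL.
async function fetchWithFallback(url, primary, fallback) {
  try {
    const html = await primary(url);
    // Assumption: an empty body counts as a failure.
    if (typeof html === "string" && html.trim().length > 0) return html;
  } catch (error) {
    // Ignore the error and fall through to the fallback below.
  }
  return fallback(url);
}

// Stand-in fetchers so the sketch runs without network access.
const failing_fetcher = async () => {
  throw new Error("request blocked");
};
const browser_fetcher = async () => "<p>rendered</p>";

fetchWithFallback("https://example.com", failing_fetcher, browser_fetcher)
  .then((html) => console.log(html));
```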

Fetches all anchor links from a URL with filtering options.

Parameters:
Name          Type    Attributes  Description
arg0_url      string
arg1_options  Object  <optional>
Name             Type            Attributes  Default  Description
allowed_domains  Array.<string>  <optional>
exclude_domains  Array.<string>  <optional>
attempts         number          <optional>  1
max_attempts     number          <optional>  15
Returns:
Type:
Array.<string>
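
The allowed_domains and exclude_domains options suggest hostname-based filtering of the collected links. A minimal sketch of such a filter, assuming exact hostname matches, that exclusions take priority, and that an empty allow-list permits every domain. filterLinksByDomain is a hypothetical helper, not part of CURL.

```javascript
// Hypothetical helper (not part of CURL): filters absolute links by hostname.
function filterLinksByDomain(links, options = {}) {
  const { allowed_domains = [], exclude_domains = [] } = options;

  return links.filter((link) => {
    let hostname;
    try {
      hostname = new URL(link).hostname;
    } catch (error) {
      return false; // Skip strings that are not valid absolute URLs.
    }
    if (exclude_domains.includes(hostname)) return false;
    // Assumption: an empty allow-list means "allow every domain".
    return allowed_domains.length === 0 || allowed_domains.includes(hostname);
  });
}

const links = [
  "https://example.com/a",
  "https://ads.example.net/b",
  "https://docs.example.org/c",
  "not a url",
];
const filtered = filterLinksByDomain(links, {
  exclude_domains: ["ads.example.net"],
});
console.log(filtered);
```

Matching on the parsed hostname rather than on a substring of the link avoids false positives such as a domain name appearing inside a query string.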

(static) getWebsitePlaintext(arg0_url) → {string}

Fetches and returns a stripped plaintext version of a website.

Parameters:
Name Type Description
arg0_url string
Returns:
Type:
string

(static) stripHTML(arg0_html) → {string}

Strips HTML tags and excessive whitespace from a string.

Parameters:
Name Type Description
arg0_html string
Returns:
Type:
string
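
As an illustration of the behaviour described, the sketch below removes tags with a regular expression and then collapses whitespace. It is an assumption about the approach, not CURL's actual implementation, and a regular expression is not a full HTML parser (entities, comments, and script contents are left untouched). stripHTMLSketch is a hypothetical name.

```javascript
// Hypothetical sketch, not CURL's actual implementation.
function stripHTMLSketch(html) {
  return html
    .replace(/<[^>]+>/g, " ") // Replace each tag with a space.
    .replace(/\s+/g, " ") // Collapse runs of whitespace.
    .trim();
}

const stripped = stripHTMLSketch("<h1>Title</h1>\n\n  <p>Hello   <b>world</b></p>");
console.log(stripped); // prints "Title Hello world"
```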

(static) writeTextFile(arg0_filepath, arg1_text)

Writes a text file to a specified path.

Parameters:
Name Type Description
arg0_filepath string
arg1_text string
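
A minimal sketch of an equivalent helper using Node's built-in fs module, assuming UTF-8 encoding and that missing parent directories should be created first. Neither behaviour is confirmed by this page, and writeTextFileSketch is a hypothetical name.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical sketch, not CURL's actual implementation.
function writeTextFileSketch(filepath, text) {
  // Assumption: create missing parent directories before writing.
  fs.mkdirSync(path.dirname(filepath), { recursive: true });
  fs.writeFileSync(filepath, text, "utf8");
}

// Write into the OS temp directory so the example leaves the project untouched.
const demo_path = path.join(os.tmpdir(), "curl-docs-demo", "dump.txt");
writeTextFileSketch(demo_path, "Hello, world!\n");
console.log(fs.readFileSync(demo_path, "utf8"));
```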