Recently, I have been learning web scraping. It struck me that you cannot get good at web scraping if you don't know anything about the web. So here I am.

The main reference for this journey is the JavaScript HTML DOM tutorial from W3Schools.

DOM: The W3C Document Object Model (DOM) is a platform and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure, and style of a document.

Common Methods

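getElementById: finds an element by its id
Example:
document.getElementById("demo")
Note: JavaScript has no keyword arguments, so pass the id as a plain string. Writing id="demo" inside the call would be an assignment, not a named parameter.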

Common Property:
innerHTML: gets or sets the HTML content of an element
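
To see the method and the property working together, here is a minimal sketch. The helper name swapContent and the "demo" paragraph are my own assumptions for illustration; the function takes the document as a parameter so that it can also be tried outside a browser.

```javascript
// Hypothetical helper: read an element's HTML content by id, then replace it.
// `doc` is any Document-like object; in a browser you would pass `document`.
function swapContent(doc, id, newHtml) {
  const el = doc.getElementById(id); // null when no element has this id
  if (el === null) {
    return null;
  }
  const oldHtml = el.innerHTML; // read the current content
  el.innerHTML = newHtml;       // write the new content
  return oldHtml;
}

// Browser usage, assuming the page contains <p id="demo">Hello</p>.
if (typeof document !== "undefined") {
  console.log(swapContent(document, "demo", "Hello <b>DOM</b>"));
}
```

Checking for null first matters because getElementById returns null when nothing matches, and reading innerHTML from null throws an error.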

Finding HTML Elements

| Method | Description |
| --- | --- |
| document.getElementById(id) | Find an element by element id |
| document.getElementsByTagName(name) | Find elements by tag name |
| document.getElementsByClassName(name) | Find elements by class name |
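
A small sketch of the two plural finders, under the assumption that you want to loop over the results. The helper names htmlByTag and countByClass are made up for this note. Both methods return a collection rather than an array, so the sketch copies the result with Array.from before calling map.

```javascript
// Hypothetical helpers; `doc` is a Document-like object (pass `document` in a browser).

// Collect the inner HTML of every element with the given tag name.
function htmlByTag(doc, tagName) {
  // getElementsByTagName returns an HTMLCollection, which has no map method,
  // so it is copied into a real array first.
  return Array.from(doc.getElementsByTagName(tagName)).map((el) => el.innerHTML);
}

// Count the elements that carry the given class name.
function countByClass(doc, className) {
  return doc.getElementsByClassName(className).length;
}

// Browser usage: list the content of all <p> elements on the current page.
if (typeof document !== "undefined") {
  console.log(htmlByTag(document, "p"));
  console.log(countByClass(document, "intro"));
}
```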

Changing HTML Elements

| Property | Description |
| --- | --- |
| element.innerHTML = new html content | Change the inner HTML of an element |
| element.attribute = new value | Change the attribute value of an HTML element |
| element.style.property = new style | Change the style of an HTML element |

| Method | Description |
| --- | --- |
| element.setAttribute(attribute, value) | Change the attribute value of an HTML element |
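
The four ways of changing an element can be sketched in one function. The helper name decorateLink, the "myLink" id, and the specific values are assumptions for illustration only.

```javascript
// Hypothetical helper: apply each kind of change from the tables above to one element.
function decorateLink(el, url) {
  el.innerHTML = "Visit <b>W3Schools</b>"; // element.innerHTML = new html content
  el.href = url;                           // element.attribute = new value
  el.style.color = "red";                  // element.style.property = new style
  el.setAttribute("target", "_blank");     // element.setAttribute(attribute, value)
  return el;
}

// Browser usage, assuming the page contains <a id="myLink">...</a>.
if (typeof document !== "undefined") {
  const link = document.getElementById("myLink");
  if (link !== null) {
    decorateLink(link, "https://www.w3schools.com");
  }
}
```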

Adding and Deleting Elements

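| Method | Description |
| --- | --- |
| document.createElement(element) | Create an HTML element |
| parent.removeChild(element) | Remove a child element from its parent node |
| parent.appendChild(element) | Add an element as the last child of a parent node |
| parent.replaceChild(new, old) | Replace a child element of a parent node |
| document.write(text) | Write into the HTML output stream |

Here parent is the node that contains (or will contain) the element. W3Schools lists these three as document.removeChild(...) and so on, but in practice they are called on the parent node.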
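
A rough sketch of creating, adding, replacing, and removing elements. The helper names appendThenReplace and removeFrom and the "container" id are hypothetical. Note the argument order of replaceChild: the new node comes first.

```javascript
// Hypothetical helpers; `doc` is a Document-like object and `parent` is any node
// that can hold children (for example a <div>).

// Append a draft paragraph, then swap it for a final one. Returns the final paragraph.
function appendThenReplace(doc, parent) {
  const draft = doc.createElement("p");
  draft.innerHTML = "first draft";
  parent.appendChild(draft); // the draft is now the last child of parent

  const final = doc.createElement("p");
  final.innerHTML = "final version";
  parent.replaceChild(final, draft); // (new, old)
  return final;
}

// Remove a child from its parent and return the removed node.
function removeFrom(parent, child) {
  return parent.removeChild(child);
}

// Browser usage, assuming the page contains <div id="container"></div>.
if (typeof document !== "undefined") {
  const container = document.getElementById("container");
  if (container !== null) {
    const added = appendThenReplace(document, container);
    removeFrom(container, added);
  }
}
```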

Finding HTML Objects

Basically, use properties of the document object to retrieve elements and values.

| Property | Description | DOM Level |
| --- | --- | --- |
| document.anchors | Returns all <a> elements that have a name attribute | 1 |
| document.applets | Returns all <applet> elements (Deprecated in HTML5) | 1 |
| document.baseURI | Returns the absolute base URI of the document | 3 |
| document.body | Returns the <body> element | 1 |
| document.cookie | Returns the document's cookie | 1 |
| document.doctype | Returns the document's doctype | 3 |
| document.documentElement | Returns the <html> element | 3 |
| document.documentMode | Returns the mode used by the browser | 3 |
| document.documentURI | Returns the URI of the document | 3 |
| document.domain | Returns the domain name of the document server | 1 |
| document.domConfig | Obsolete. Returns the DOM configuration | 3 |
| document.embeds | Returns all <embed> elements | 3 |
| document.forms | Returns all <form> elements | 1 |
| document.head | Returns the <head> element | 3 |
| document.images | Returns all <img> elements | 1 |
| document.implementation | Returns the DOM implementation | 3 |
| document.inputEncoding | Returns the document's encoding (character set) | 3 |
| document.lastModified | Returns the date and time the document was updated | 3 |
| document.links | Returns all <area> and <a> elements that have a href attribute | 1 |
| document.readyState | Returns the (loading) status of the document | 3 |
| document.referrer | Returns the URI of the referrer (the linking document) | 1 |
| document.scripts | Returns all <script> elements | 3 |
| document.strictErrorChecking | Returns if error checking is enforced | 3 |
| document.title | Returns the document's title (the text of the <title> element) | 1 |
| document.URL | Returns the complete URL of the document | 1 |
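
For scraping, a handful of these properties already give a quick overview of a page. Below is a minimal sketch; the helper name describeDocument and the choice of fields are my own, not something from the tutorial.

```javascript
// Hypothetical helper: summarize a page using a few document properties.
// `doc` is a Document-like object (pass `document` in a browser).
function describeDocument(doc) {
  return {
    title: doc.title,              // the title text, as a string
    url: doc.URL,                  // the complete URL
    readyState: doc.readyState,    // "loading", "interactive", or "complete"
    formCount: doc.forms.length,   // number of <form> elements
    imageCount: doc.images.length, // number of <img> elements
    linkCount: doc.links.length,   // number of <a> and <area> elements with href
  };
}

// Browser usage: print a summary of the current page.
if (typeof document !== "undefined") {
  console.log(describeDocument(document));
}
```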

Node Relationship

source: https://www.w3schools.com/js/js_htmldom.asp
  • document node: the entire document
  • element node: every HTML element
  • text node: the text inside an HTML element
  • comment node: every comment

Terms

parent, child, sibling
root: top node
sibling: nodes with the same parent
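
These terms map onto node properties. The sketch below relies on parentNode and childNodes, which are standard DOM properties but are not covered in my notes above, so treat the helper names siblingsOf and rootOf as illustrative assumptions.

```javascript
// Hypothetical helpers for walking the node tree.

// Siblings: all other nodes that share this node's parent.
function siblingsOf(node) {
  const parent = node.parentNode;
  if (parent === null) {
    return []; // the top node has no parent, hence no siblings
  }
  return Array.from(parent.childNodes).filter((other) => other !== node);
}

// Root: keep moving to the parent until there is none left.
function rootOf(node) {
  let current = node;
  while (current.parentNode !== null) {
    current = current.parentNode;
  }
  return current;
}

// Browser usage: starting from <body>, the top node reached is the document itself.
if (typeof document !== "undefined" && document.body) {
  console.log(rootOf(document.body) === document);
  console.log(siblingsOf(document.body).length);
}
```

Keep in mind that childNodes includes text and comment nodes, so in a real page the siblings of an element often include whitespace text nodes.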

Attention: An element node DOES NOT contain text

Example:

<title id="demo">DOM Tutorial</title>

The element node <title> (in the example above) does not contain text.

It contains a text node with the value "DOM Tutorial".

Property

nodeValue

  • nodeValue for element nodes is null
  • nodeValue for text nodes is the text itself
  • nodeValue for attribute nodes is the attribute value
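
The difference between an element node and its text node can be checked with a few lines. This is only a sketch: the helper name compareValues is hypothetical, and it assumes the first child of the element is its text node, as in the <title id="demo"> example above.

```javascript
// Hypothetical helper: compare nodeValue on an element and on its first child.
function compareValues(el) {
  const child = el.firstChild; // for <title id="demo">DOM Tutorial</title> this is the text node
  return {
    elementValue: el.nodeValue,                            // null for element nodes
    textValue: child === null ? null : child.nodeValue,    // the text itself
  };
}

// Browser usage, assuming the page contains <title id="demo">DOM Tutorial</title>.
if (typeof document !== "undefined") {
  const titleEl = document.getElementById("demo");
  if (titleEl !== null) {
    console.log(compareValues(titleEl));
  }
}
```

This is why reading text through the node tree takes two steps: first reach the text node, then read its nodeValue.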

nodeType
Read only. Returns type of a node.

| Node | Type | Example |
| --- | --- | --- |
| ELEMENT_NODE | 1 | <h1 class="heading">W3Schools</h1> |
| ATTRIBUTE_NODE | 2 | class = "heading" (deprecated) |
| TEXT_NODE | 3 | W3Schools |
| COMMENT_NODE | 8 | <!-- This is a comment --> |
| DOCUMENT_NODE | 9 | The HTML document itself (the parent of <html>) |
| DOCUMENT_TYPE_NODE | 10 | <!Doctype html> |
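
Since nodeType is just a number, a lookup table makes it readable. A minimal sketch; the names NODE_TYPE_NAMES and nodeTypeName are my own, and the table only covers the types listed above.

```javascript
// Hypothetical lookup table built from the node types listed above.
const NODE_TYPE_NAMES = {
  1: "ELEMENT_NODE",
  2: "ATTRIBUTE_NODE",
  3: "TEXT_NODE",
  8: "COMMENT_NODE",
  9: "DOCUMENT_NODE",
  10: "DOCUMENT_TYPE_NODE",
};

// Translate a node's numeric nodeType into its constant name.
function nodeTypeName(node) {
  return NODE_TYPE_NAMES[node.nodeType] || "UNKNOWN";
}

// Browser usage: the document itself is a DOCUMENT_NODE.
if (typeof document !== "undefined") {
  console.log(nodeTypeName(document));
}
```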

Unfinished List

  • DOM HTML
  • DOM CSS
  • DOM Animations
  • DOM Events
  • DOM Event Listener
  • DOM Nodes
  • DOM Collections
  • DOM Node Lists

Cover Photo by Jillian Kim on Unsplash