Get element text, including alt text for images, with JavaScript
Sometimes I find myself wanting to get the text contents of an element and its descendants. There is a DOM property called textContent that can be used for this. There is also jQuery’s text() method. Unfortunately neither returns what I want.
In both cases, elements that can have alt attributes are omitted from the returned string. In my opinion, alt text is the text content of an img, input[type=image], or area element and should be returned by methods like these. I also find it a bit weird that they return the contents of script elements.
Not having any luck finding a method that includes alternative text and omits script elements when getting text content, I wrote my own:
var getElementText = function(el) {
    var text = '';
    // Text node (3) or CDATA node (4) - return its text
    if ( (el.nodeType === 3) || (el.nodeType === 4) ) {
        text = el.nodeValue;
    // If node is an element (1) and an img, input[type=image], or area element, return its alt text
    } else if ( (el.nodeType === 1) && (
            (el.tagName.toLowerCase() == 'img') ||
            (el.tagName.toLowerCase() == 'area') ||
            ((el.tagName.toLowerCase() == 'input') && el.getAttribute('type') && (el.getAttribute('type').toLowerCase() == 'image'))
            ) ) {
        text = el.getAttribute('alt') || '';
    // Traverse children unless this is a script or style element
    } else if ( (el.nodeType === 1) && !el.tagName.match(/^(script|style)$/i) ) {
        var children = el.childNodes;
        for (var i = 0, l = children.length; i < l; i++) {
            text += getElementText(children[i]);
        }
    }
    return text;
};
It expects the argument to be a reference to an element.
To get the text contents of an entire document, you’d call it like this:
var bodyText = getElementText(document.body);
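To see what the function returns for a small piece of markup, here is a rough sketch that runs without a browser, for example in Node. The helpers textNode and elementNode are hypothetical stand-ins that I made up for this illustration. They only mimic the few DOM properties the function reads: nodeType, nodeValue, tagName, childNodes, and getAttribute. The function itself is repeated in a condensed form, with the same logic as above, so that the snippet can run on its own.

```javascript
// Hypothetical stand-ins for DOM nodes so the sketch runs without a browser.
function textNode(value) {
  return { nodeType: 3, nodeValue: value };
}

function elementNode(tagName, attributes, children) {
  return {
    nodeType: 1,
    tagName: tagName.toUpperCase(), // HTML documents report upper-case tag names
    childNodes: children || [],
    getAttribute: function (name) {
      return Object.prototype.hasOwnProperty.call(attributes, name) ? attributes[name] : null;
    }
  };
}

// Condensed copy of getElementText, same logic as the function above.
function getElementText(el) {
  // Text node (3) or CDATA node (4): return its text
  if (el.nodeType === 3 || el.nodeType === 4) {
    return el.nodeValue;
  }
  // Anything that is not an element (comments, for example) contributes nothing
  if (el.nodeType !== 1) {
    return '';
  }
  var tag = el.tagName.toLowerCase();
  var type = (el.getAttribute('type') || '').toLowerCase();
  // img, area, and input[type=image]: return the alt text
  if (tag === 'img' || tag === 'area' || (tag === 'input' && type === 'image')) {
    return el.getAttribute('alt') || '';
  }
  // Skip script and style elements entirely
  if (tag === 'script' || tag === 'style') {
    return '';
  }
  // Any other element: concatenate the text of its children
  var text = '';
  for (var i = 0; i < el.childNodes.length; i++) {
    text += getElementText(el.childNodes[i]);
  }
  return text;
}

// Roughly: <p>Photo: <img alt="A red bicycle"><script>track();</script> for sale</p>
var paragraph = elementNode('p', {}, [
  textNode('Photo: '),
  elementNode('img', { alt: 'A red bicycle' }, []),
  elementNode('script', {}, [textNode('track();')]),
  textNode(' for sale')
]);

var result = getElementText(paragraph);
console.log(result); // "Photo: A red bicycle for sale"
```

The alt text takes the place of the image and the script contents are dropped. For the same markup in a browser, textContent would leave out the alt text and include track(); instead.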
There could be better ways of doing this, of course, but none that I have been able to find.