Get element text, including alt text for images, with JavaScript

Sometimes I find myself wanting to get the text contents of an element and its descendants. There is a DOM property called textContent that can be used for this, and jQuery has its text() method. Unfortunately, neither returns what I want.

In both cases, the alt text of elements that can have an alt attribute is left out of the returned string. In my opinion, alt text is the text content of an img, input[type=image] or area element and should be returned by textContent and text(). I also find it a bit weird that they return the contents of script elements.

Not having any luck finding a method that includes alternative text and omits script elements when getting text content, I wrote my own:

var getElementText = function(el) {
	var text = '';
	// Lowercased tag name for element nodes (1), empty string for everything else
	var tag = (el.nodeType === 1) ? el.tagName.toLowerCase() : '';
	// Lowercased type attribute, only looked up for input elements
	var type = (tag === 'input') ? (el.getAttribute('type') || '').toLowerCase() : '';
	if ( (el.nodeType === 3) || (el.nodeType === 4) ) {
		// Text node (3) or CDATA node (4) - return its text
		text = el.nodeValue;
	} else if ( (tag === 'img') || (tag === 'area') || (type === 'image') ) {
		// img, input[type=image], or area element - return its alt text
		text = el.getAttribute('alt') || '';
	} else if ( (el.nodeType === 1) && (tag !== 'script') && (tag !== 'style') ) {
		// Any other element except script and style - collect the text of its children
		var children = el.childNodes;
		for (var i = 0, l = children.length; i < l; i++) {
			text += getElementText(children[i]);
		}
	}
	return text;
};

It expects the argument to be a reference to an element.

To get the text contents of an entire document, you’d call it like this:

var bodyText = getElementText(document.body);
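
To see what the function returns without opening a browser, here is a minimal sketch that runs it against hand-built stand-ins for DOM nodes. The textNode and element helpers are hypothetical and only provide the members the function reads (nodeType, nodeValue, tagName, childNodes and getAttribute). The function is restated compactly so the snippet runs on its own, and the sample markup is made up for illustration.

```javascript
// Compact restatement of getElementText so this snippet is self-contained.
function getElementText(el) {
	if (el.nodeType === 3 || el.nodeType === 4) {
		return el.nodeValue; // text or CDATA node
	}
	if (el.nodeType !== 1) {
		return ''; // comments and other node types contribute nothing
	}
	var tag = el.tagName.toLowerCase();
	var type = (el.getAttribute('type') || '').toLowerCase();
	if (tag === 'img' || tag === 'area' || (tag === 'input' && type === 'image')) {
		return el.getAttribute('alt') || '';
	}
	if (tag === 'script' || tag === 'style') {
		return '';
	}
	var text = '';
	for (var i = 0; i < el.childNodes.length; i++) {
		text += getElementText(el.childNodes[i]);
	}
	return text;
}

// Hypothetical stand-ins for DOM nodes (assumption: not real DOM objects).
function textNode(value) {
	return { nodeType: 3, nodeValue: value };
}
function element(tagName, attributes, childNodes) {
	return {
		nodeType: 1,
		tagName: tagName.toUpperCase(),
		childNodes: childNodes || [],
		getAttribute: function (name) {
			return Object.prototype.hasOwnProperty.call(attributes, name) ? attributes[name] : null;
		}
	};
}

// Roughly: <p>Photo: <img src="bike.jpg" alt="A red bicycle"><script>var x = 1;</script> (2011)</p>
var paragraph = element('p', {}, [
	textNode('Photo: '),
	element('img', { src: 'bike.jpg', alt: 'A red bicycle' }, []),
	element('script', {}, [textNode('var x = 1;')]),
	textNode(' (2011)')
]);

var demoText = getElementText(paragraph);
console.log(demoText); // "Photo: A red bicycle (2011)"
```

The alt text shows up in place of the image, and the script contents are skipped. On a real element, textContent would leave out the alt text and include the script contents.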

There could be better ways of doing this, of course, but none that I have been able to find.
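
One possible variation, if very deeply nested markup is a concern, is to walk the tree with an explicit stack instead of recursion. This is only a sketch under the same rules as the function above, not a claim that it is better. The name getElementTextIterative is made up for this example.

```javascript
// Non-recursive sketch; getElementTextIterative is a hypothetical name.
function getElementTextIterative(root) {
	var text = '';
	var stack = [root];
	while (stack.length > 0) {
		var node = stack.pop();
		// Text node (3) or CDATA node (4) - append its text
		if (node.nodeType === 3 || node.nodeType === 4) {
			text += node.nodeValue;
			continue;
		}
		// Ignore anything that is not an element (comments and so on)
		if (node.nodeType !== 1) {
			continue;
		}
		var tag = node.tagName.toLowerCase();
		var type = (node.getAttribute('type') || '').toLowerCase();
		if (tag === 'img' || tag === 'area' || (tag === 'input' && type === 'image')) {
			text += node.getAttribute('alt') || '';
		} else if (tag !== 'script' && tag !== 'style') {
			// Push children in reverse so they are popped in document order
			for (var i = node.childNodes.length - 1; i >= 0; i--) {
				stack.push(node.childNodes[i]);
			}
		}
	}
	return text;
}

// Only runs where a DOM is available, such as a browser.
if (typeof document !== 'undefined') {
	console.log(getElementTextIterative(document.body));
}
```

It is called the same way as the recursive version and should return the same string for the same element.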

Posted on May 16, 2011 in JavaScript