Ben Nadel at InVision In Real Life (IRL) 2018 (Hollywood, CA) with: Cristoffer Gallardo

jSoup Error: Index Out Of Bounds For Length

Over on my Feature Flags Book site, I'm starting to move some of the content behind a paywall; and, to do this, I'm using jSoup to replace multiple content paragraphs with a single purchase-notice paragraph within designated chapters. However, in my first attempt at this algorithm, I was getting the following jSoup error:

Index 1 out of bounds for length 0

The error isn't terribly helpful; but, I believe what's happening here is that when I remove elements from the jSoup DOM (Document Object Model) by calling .empty() on their parent, jSoup is not breaking the parent-child relationship to the removed elements, which then causes an issue when I go to re-append one of the removed elements back into the same parent.

I can reproduce this error with a simple jSoup demo using this HTML document:

<body>
	<p>jSoup + ColdFusion = Noice!</p>
</body>

To reproduce the error with ColdFusion (Lucee CFML), I'm going to .empty() the body and then re-append the single p element:

<cfscript>

	body = javaNew( "org.jsoup.Jsoup" )
		.parseBodyFragment( fileRead( "./content.htm" ) )
		.body()
	;

	paragraph = body.firstElementChild();

	// Remove all the children from the BODY and then try to re-add the paragraph.
	body
		.empty()
		.appendChild( paragraph )
	;

	// Output resultant HTML to the page.
	echo( body.outerHtml() );

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I create a new Java class wrapper using the jSoup JAR files.
	*/
	public any function javaNew( required string className ) {

		var jarPaths = [
			expandPath( "./jsoup-1.16.1.jar" )
		];

		return( createObject( "java", className, jarPaths ) );

	}

</cfscript>

And, when we run this ColdFusion code, we get the following error:

Index 1 out of bounds for length 0

For anyone Googling to get here, this is the stacktrace that I get:

lucee.runtime.exp.NativeException: Index 1 out of bounds for length 0
  at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
  at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
  at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248)
  at java.base/java.util.Objects.checkIndex(Objects.java:372)
  at java.base/java.util.ArrayList.remove(ArrayList.java:536)
  at org.jsoup.helper.ChangeNotifyingArrayList.remove(ChangeNotifyingArrayList.java:37)
  at org.jsoup.nodes.Node.removeChild(Node.java:504)
  at org.jsoup.nodes.Node.setParentNode(Node.java:482)
  at org.jsoup.nodes.Node.reparentChild(Node.java:563)
  at org.jsoup.nodes.Element.appendChild(Element.java:577)

To fix this error, we need to call .remove() on the p element before we try to re-append it to the body:

<cfscript>

	body = javaNew( "org.jsoup.Jsoup" )
		.parseBodyFragment( fileRead( "./content.htm" ) )
		.body()
	;

	paragraph = body.firstElementChild();

	// In order to re-append the paragraph back into the document, we have to first BREAK
	// THE PARENT RELATIONSHIP to the body. We can do that by calling remove() on the
	// paragraph itself.
	paragraph.remove();

	// Remove all the remaining children from the BODY and then re-add the paragraph.
	body
		.empty() // Remove any remaining non-element nodes (ex, comments).
		.appendChild( paragraph )
	;

	// Output resultant HTML to the page.
	echo( body.outerHtml() );

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I create a new Java class wrapper using the jSoup JAR files.
	*/
	public any function javaNew( required string className ) {

		var jarPaths = [
			expandPath( "./jsoup-1.16.1.jar" )
		];

		return( createObject( "java", className, jarPaths ) );

	}

</cfscript>

The only difference in this version of the code is that I'm calling paragraph.remove() before adding the node back into the DOM. Whatever this is doing behind the scenes, it is properly breaking the parent-child relationship in a way that calling .empty() does not.

ASIDE: Some jSoup methods, like .children(), return a collection of Element nodes called Elements (which extends Java's ArrayList). This collection has its own .remove() method that will call .remove() on each of the nodes in the collection.

I don't know enough about jSoup — or the intention of these methods — in order to call this a "bug"; but, I will say that it seems unexpected to me. In fact, I would expect an .empty() method to be little more than a short-hand implementation for looping over all the child-nodes and calling .remove() on them in turn.
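
To make that expectation concrete, here's a minimal sketch of what such a "manual empty" might look like in Lucee CFML. It assumes the same javaNew() helper, JAR file, and content.htm file as the demos above. The removeAllChildNodes() function is a hypothetical helper of my own naming (it is not part of jSoup); it leans only on jSoup's Node methods .childNodeSize(), .childNode(), and .remove():

```cfml
<cfscript>

	body = javaNew( "org.jsoup.Jsoup" )
		.parseBodyFragment( fileRead( "./content.htm" ) )
		.body()
	;

	paragraph = body.firstElementChild();

	// Detach every child node individually instead of calling .empty().
	removeAllChildNodes( body );

	// The paragraph was detached via .remove(), so it can be appended again.
	body.appendChild( paragraph );

	// Output resultant HTML to the page.
	echo( body.outerHtml() );

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* HYPOTHETICAL HELPER (not part of jSoup): I remove all of the child nodes from the
	* given node by calling .remove() on each child in turn.
	*/
	public void function removeAllChildNodes( required any node ) {

		// Always remove the FIRST child - the remaining children shift down after
		// each removal, so the loop ends once the node has no children left.
		while ( node.childNodeSize() > 0 ) {

			node.childNode( javaCast( "int", 0 ) ).remove();

		}

	}

	/**
	* I create a new Java class wrapper using the jSoup JAR files.
	*/
	public any function javaNew( required string className ) {

		var jarPaths = [
			expandPath( "./jsoup-1.16.1.jar" )
		];

		return( createObject( "java", className, jarPaths ) );

	}

</cfscript>
```

I'm removing the child at index 0 on each pass, rather than iterating over .childNodes() directly, so that I'm never mutating a list while I'm looping over it.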

Reader Comments

1 Comments

Thanks Ben, good catch! I have fixed this in jsoup and it'll be in the next release (1.16.2).

See bug #2013.

Please do feel free to raise issues directly on the jsoup tracker -- whether it's a hardline "bug" or just a rough edge, am always happy for feedback.

15,880 Comments

@Jonathan,

Wow, thanks for knocking that out! 🔥 As I was looking at the stacktrace, I saw a number of core Java calls, so I wasn't sure if this was something in jSoup itself, or something in the way Java's ArrayList worked. Glad to see it was only a simple for-loop change on your end.

jSoup is awesome! I'm using it more and more these days. 💪

448 Comments

Hi Ben

Just thinking off the top of my head, so this could be a completely idiotic suggestion, but couldn't you just create a deep copy of that object, like:

Duplicate(object)

Example:

paragraph = Duplicate( body.firstElementChild() );

15,880 Comments

@Charles,

To be honest, I don't know how duplicate() plays with Java objects. We're consuming this stuff in ColdFusion; but, the jSoup library is ultimately a Java library; and, I'm not sure how "deep" the "duplicate" logic will run. Meaning, if the issue we have here is with parent-child pointers being left in place, it's very possible that duplicate() will just copy-over the same pointers into the new structure.

Now, that said, it does look like jSoup has a deep clone() method. So, it's very possible that this does exactly what you are suggesting it would do. I'd be curious to see if this would have an effect - I'm assuming it would.

All good thoughts!

2 Comments

Great article, Ben! Your detailed exploration of the jSoup error and the fix is incredibly helpful. It's not uncommon to encounter such issues while working with libraries like jSoup, and your solution will undoubtedly save others a lot of troubleshooting time. It's also fantastic to see how responsive the jSoup team is to address issues promptly. Keep up the great work, and thanks for sharing your insights and solutions with the community! 👍🔥

I believe in love. I believe in compassion. I believe in human rights. I believe that we can afford to give more of these gifts to the world around us because it costs us nothing to be decent and kind and understanding. And, I want you to know that when you land on this site, you are accepted for who you are, no matter how you identify, what truths you live, or whatever kind of goofy shit makes you feel alive! Rock on with your bad self!
Ben Nadel