eigenclass logo
MAIN  Index  Search  Changes  PageRank  Login

Object removal using seam carving, still fast

The content-aware image resizer I presented a few days ago has gotten somewhat popular and triggered some discussion on reddit largely centered around its speed relative to the LiquidRescale GIMP plugin and whether OCaml can be "as fast as C". Here follows some more information about this particular case, where some OCaml code is outperforming some C code by a 6X margin.

In order to make the comparison meaningful, I implemented the main feature LiquidScale had over my code, energy biasing for object removal and/or preservation.

Take this image*1

policemen_by_jijis.jpg

Apply this energy bias, which indicates that we want the third policeman from the left to be removed

policemen_bias.jpg

Here's what you get:

policemen_carved.jpg

Implementation choices, cache efficiency

Both implementations use essentially the same algorithm, the main difference being the data structure used to represent the image and the energy map.

The governing principle is: if different parts of the structure are accessed at different times, separate them and pack each one as densely as possible.

LiquidScale uses a matrix of points which themselves contain RGB data, energy, bias, visibility information, etc. Points are never removed from the matrix; instead, they are made invisible. Each point has got a visibility level; those points whose level is smaller than the current global level (which is incremented by one in each iteration) become invisible. How so? Access to the matrix is done via smart cursors that skip invisible points.

My code uses three separate structures, one holding the RGB data, another with the energy map and a third one used for the dynamic programming algorithm that finds optimum seams. This is much simpler and allows me to use plain old integer matrices in the inner loops. The resulting code is much faster, for it needs fewer instructions and makes better use of the cache. A large part of the energy map can be held in L2 cache and a row will be processed in full when it's been brought to L1 cache; the code shouldn't be waiting for much more than a cache line fill per row. Whereas 16 points fit in a cache line when you're using an integer matrix, in LiquidScale you get only one point and half of another. Moreover, if you're not lucky they might not even be visible --- a whole cache line gone to waste.

In the previous post, I was careful enough to write that "some of the extra info [held in LiquidScale's structure] is used for things my implementation doesn't do yet, such as energy biasing", but actually only 4 out of 40 bytes in LiquidScale's point structure are needed for the energy bias. I decided to implement energy biasing to verify that I wasn't forgetting something that would make the extra data necessary.

It took me little time, and as I anticipated there's no major effect on speed. Seam carving an image with a given energy bias is only 10-15% slower than without biasing (that code hasn't undergone any optimization yet so I can probably bring it even closer), and the overhead is exactly 0 when you're not biasing the energy map.

Energy biasing was easy because the seam carving algorithm is abstracted over the energy computation:

module type ENERGY_COMPUTATION =
sig
  type energy
  type t
  val compute_energy : t -> image -> energy
  val extract_energy_matrix : energy -> int array array
  val update_energy_h : energy -> image -> path -> unit
end

I just had to write an EnergyBias functor that takes an ENERGY_COMPUTATION module and creates a new ENERGY_COMPUTATION that biases the energy map using an image supplied by the user. Red areas are to be deleted in the target image, and green sections are preserved while resizing (iow., the EnergyBias functor uses the G and R channels for positive and negative biasing).

I'd have liked to say it worked on the first try as soon as the code typed, but there were two bugs (it was fairly late and I ran it right away without reading the diffs first):

  • a faulty loop bound that was detected by OCaml's array bound checking: "for j = 0 to img.width - 1 do" instead of "for j = 0 to img.height - 1 do"
  • a typo, "carve_v" instead of "carve_h", which resulted in vertical resizing being performed when you specified horizontal resizing with a bias. This was also detected at runtime, since the code verifies that the target and the bias images have the same dimensions.

Still, chasing a wild pointer is harder...



No Title - Danno (2007-10-05 (Fri) 06:41:25)

A) This is a great example of how good data structure and algorithm choice is the most important part of optimization (which is kinda funny since the algorithm usually dictates what your code looks like and therefor has to be premature).

B) Seam carving is at once cool and scary. I mean, how hard would it be to automate the energy bias over a given object in the frames of a video? We'll be even less able to trust what we see.

mfp 2007-10-05 (Fri) 08:34:33

a) The strange thing is that my code uses simpler structures, mainly plain matrices. I can't see why LiquidScale goes out of its way to combine RGB, energy, bias, level & aux data in one structure that requires special cursors.

I told myself it was surely needed for energy biasing. But then I added it to my code in a mere ~60 lines. What's next? Seam insertion. But this is easily implemented by performing seam carving while recording the connected paths and then inserting new pixels along them. So I don't really know the reason behind the extra effort that makes the thing slower. I could understand, say, a representation of pixel rows using balanced trees to remove pixels in O(1) time (I think the constants would play against you, though), but I just don't see the gain in the current structure.

b) Sounds doable with maybe some kind of image segmentation and object tracking. Yeah we cannot trust anything anymore.

Anonymous 2008-02-06 (Wed) 07:33:11

hi there


ORuby? - Daniel Berger (2007-10-05 (Fri) 10:39:03)

C'mon, you know you could do it. :)

mfp 2007-10-05 (Fri) 15:34:40

sssshhh!


Used to be a ruby blog - the_dormant (2007-10-05 (Fri) 14:32:51)

This is interesting stuff, would you like to comment about your last posts being about Ocaml?

mfp 2007-10-05 (Fri) 15:22:40

I've done little Ruby as of late; anyway it's just too slow for something like this. OCaml is better at the sort of things I've been interested in as of late, and in addition to the superior performance I feel more confident about the code in a way unit tests alone cannot approach.

The problem is not at the unit level, but rather at the integration/functional level. Say you have two classes, one being a client of the other. You can unit test them separately, by stubbing the former. But who checks that the stubs correspond to the actual interface? There's not much to be done there short of executing the code, which isn't always an option. It also means that if I have N classes and M clients, I might have to prepare something close to N*M tests (or hand-code respond_to?/instance_method checks that amount to incomplete type specifications), instead of N+M. So far I haven't seen anything in Ruby-land nearly as satisfactory as the module signatures (inferred or given manually) in OCaml. Also, module signatures are great for documentation purposes, much better than the informal conventions followed in RDoc, where you often write things like

 def foo(an_io_or_string)

i.e., you *are* thinking with types, even when "duck typing" ("... where arg1 responds to the baz method").

Going back to performance, Ruby pretty much forces you to think in terms of arrays and hashes --- core structures implemented in C. Very often, the asymptotic gains from another structure are more than offset by the multiplicative constant, as low-level code in Ruby is at least 100-200 times slower... Being able to code a new data structure in 20 lines when you need it (see the RB tree example included in rocaml that is 3X faster than the rbtree extension) feels great. Especially with functional ones.

That said, there are still many things where Ruby is much more convenient. String processing, some IO, networking... Ruby's stdlib is much richer than OCaml's. And unit testing + functional tests alone is also adequate in many projects. I've written lots of Ruby in the past and I will do so in the future (see rocaml for a recent example).


seam duplication - TML (2007-10-10 (Wed) 20:02:30)

This is great. I've been looking for something to really help me dig into ocaml. Are you planning on implementing seam duplication at some point?

mfp 2007-10-11 (Thr) 03:10:26

Yes, I have it half-implemented (have to expand the cmdline & check it not only types but also works ;) already in ~50 locs...

If you're looking for ocaml code to read, there's better looking out there. Seam carving calls for imperative code if you care about the multiplicative constant and not only complexity, so there are lots of for loops and some refs in my code.

You can take a look at the stdlib (be sure to skip the Obj magic though), and for instance OCaml Reins, which includes several purely functional data structures (while you're at it, read Okasaki's book). If you are interested in some particular domain, search the Caml hump.

Coming from Ruby, I've found documentation in the OCaml world vastly superior; module interfaces and types play no small role in that. The reference manual is dry but precise and up-to-date. O'Reilly's "Developing Applications With Objective Caml" is OK (I read it the original French). Judging from online reviews, "Practical OCaml" is to be avoided like the plague.


No Title - nec (2007-10-27 (Sat) 06:15:45)

i tried object removal with your images and implementation, but couldn't get it work.

i downloaded both policeman image and energy bias image and run as

./carve -vert 0 -bias policemen_bias.jpg policemen_by_jijis.jpg

the output image is the same with the original one. am i doing it wrong?

mfp 2007-10-27 (Sat) 07:48:20

Try with -horiz 30. (-vert 0 downsizes the image vertically by... 0 pixels, leaving it unchanged).

nec 2007-10-27 (Sat) 12:58:51

it worked perfectly now. i thought applying biases shouldn't need resizing, so i used 0.

how do you generate the bias image?

mfp 2007-10-28 (Sun) 05:07:46

No magic there; I made it manually by painting over a superimposed semi-transparent layer that I saved separately.

nec 2007-10-28 (Sun) 08:06:16

it's so cool i want to write a GUI for this. Is it a wrong a way trying GUI dev to learn OCaml? What's the state of GUI development for OCaml? Any suggestions?

mfp 2007-10-28 (Sun) 16:24:03

I haven't done any GUI development with OCaml and I don't know if it's a good way to learn the language. It would expose you very early to the OO part, which most caml riders avoid, for there are usually better solutions. I guess it depends on how you like to learn, with a more hands-on approach or by going through exercises of increasing difficulty. I myself learned OCaml mostly by reading Developing applications with Objective Caml in the original French version (and then by writing >50KLoc worth of OCaml :-)

As I said, I haven't done GUIs with OCaml, but AFAIK the two best options are CamlTk and LablGTK2. LablGTK looks great and is reportedly mature. You can compare the "scribble" example found in GTK's tutorial written in C with the OCaml version.


*1 I thank jijis for making this image available through a creative commons license