[Awaiting Admin Approval] [nutki] 32bit precision entry (1012B, 58 days, 20 hours before the deadline)

This has been a long road, but finally the site has been put together and all of the necessary prerequisites are in place for the first contest.

Some of you may realize that the idea for this site was formed in November 2007, and it's taken me a lot of time to get it all set up, but now I'm ready to unveil the first topic, which has been waiting for this website for almost 6 months now :)

Please note that this site is a work in progress, and is brand-new, completely written from scratch.

I consider myself to be pretty good about php development so I doubt there are any security vulnerabilities, though if you do find one, please let me know ;)

If you think the problem statement is not sufficient, or have other problems related to the contest, please let me know as well.

Hint: A good way to let me know is to post comments!

Hint 2: If you can't register for whatever reason, try the contact info on the Main Page

The test program (v4) in action.

The last 2 tests from v5.

First of all, this is a speed optimisation contest; so, the winner will be the entry which completes the tasks set before it the fastest. There are lots of ways to improve the speed of such a process, and it's a classical mathematical and computer programming problem, so I thought this was appropriate to start the competition with.

Here are the specific requirements of the code to be optimized:

* It needs to plot a rectangular section of the Mandelbrot set into an array (That array may be rotated or scaled, it's not axis-aligned)

* The rectangular section will be defined by top left point ( a complex value ), complex step values in the X and Y directions, and the width and height (in array elements) of the output

* The top left point and step values will be complex numbers represented by 2 64bit values (one for the real, one for imaginary dimension), each value being defined as a 4:60 precision fixed point number (-8 to +7.999...)

* The output array will be an array of 16bit values corresponding to the number of iterations before the value at that point diverges (up to some cap given in the function parameters) (this is defined more specificly in the section below)

* You will be given a 512kiB block of contiguous memory, which is aligned to a cache line boundary. You don't have to use it, but if you need to use a lot of memory, it's where you should be putting things.

* Regarding precision: You must keep 64bits of precision through all operations, the C code will do this, so if you don't, your submission will probably be disqualified. Without some guarantees on precision, the code isn't going to be very useful. Also, since it's a speed optimisation contest, you are welcome to make your code more precise, but you're probably better off not doing so. Good luck! A small amount of difference is understandable, too, given differences in math that can be used, so your output won't have to be exactly like the example code, just not completely different.

* Here's the function prototype for the function that does all this: void MandelFunc(u32 * rectangle, int max_iteration, int width, int height, u16 * output_array, void * workram);

* Rectangle is a list of 64bit numbers, 64bit numbers are stored as the low 32bits followed by the high 32 bits - the numbers are topleft_real, topleft_imaginary, stepx_real, stepx_imaginary, stepy_real, stepy_imaginary (total of 6 64bit values, or 48 bytes)

To make this process a lot simpler, I've written a small program for DS that can run code snippets in a rather controlled environment. It's in a zip file package below. This package contains the test engine, which is a bit hacky and slow ;), it also contains 2 other important elements: the example C++ code that does everything an entry needs to (you could submit it if you want, but then we might laugh at you.) - also included are some templates, bare bones C++ and asm files that show you how to start your own routine. The test app can be easily built with the latest versions of devkitARM and libnds, and will allow you to test the speed of one or more implementations you have created. The test app uses seeded random data to compare the output of your functions to the reference function, so you also get verification of whether or not your code is working correctly.

Download the Source package here (version 5) - This includes a test application which has a reference implementation of the function to be optimised as well as some templates to choose from.

Now, about Rules:

Every compo needs 'em!

CAN means you may; SHOULD means you are probably missing out if you don't, MUST and MUST NOT indicate mandates from the heavens; ask me before doing the opposite and be prepared to be told not to.

* You MUST NOT use any global variables (besides constant data)

* You MUST NOT put any code or constant data in ITCM or DTCM

* You CAN have constant data, but not too much, 64k is more than enough space

* You CAN use the hardware divide / sqrt capabilities of the DS

* You MUST NOT use timers, graphics hardware, or other DS hardware outside of the hardware math functionality. I can't think of any useful exceptions but may be willing to grant some if I'm wrong.

* You CAN use malloc/free/new/delete, and other standard library functions in either the C or C++ standard libraries. It's your cputime, who am I to stop you ;)

* You CAN use stack-allocated memory, just be aware that the full 16k of DTCM (where the stack resides) may not be available to you.

* Your code can't be too terribly big. I'm thinking 64k is the threshhold of "starting to get too big" and 128k is the threshhold for "yeah, that's just too big." - if you do manage to use that many instructions though, you won't lose that much speed by halving the size of your unrolled loop anyway :P (And I mean compiled size, not source size)

The Mandelbrot set is a fairly simple relationship based on complex numbers (see link for more info). In short, complex numbers are 2-dimensional numbers, with one dimension in the "real" plane (normal numbers), and one dimension in the "imaginary" plane (numbers multiplied by "i", or sqrt(-1) ).

So, adding and subtracting complex numbers works a lot like normal 2d vectors; (a+b*i) + (c+d*i) = ( (a+c) + (b+d)*i )

Multiplication and division are more complex though. Multiplication winds up looking like a polynomial expansion: (a+b*i) * (c+d*i) = a*c + a*d*i + b*c*i + b*d*i*i, and since i*i is -1, you're left with = (a*c - b*d) + (a*d + b*c)*i;

And division is even more complex, requiring first a step to reduce the denominator to a real number: (a+b*i) / (c+d*i) = (a+b*i)*(c-d*i) / (c+d*i)*(c-d*i) = ((a*c+b*d)+(b*c-a*d)*i) / (c*c+d*d) - Which is just a multiplication problem and division by a constant, so I won't complete the expansion here.

Fortunately, the Mandelbrot set only uses Multiplication and Addition, which are both generally pretty easy.

Specificly the Mandelbrot set consists of all of the points in the complex plane where a specific recursive relation never diverges to infinity. The relation that defines the Mandelbrot set is P(n+1) = P(n)*P(n) + c, where P(0) is 0, and c is the value of the point you're testing in the complex plane.

The Mandelbrot set exists completely inside a circle with radius 2, the furthest point in the set is (-2 + 0*i), so if P(n) is ever outside of that circle, we know that point will converge to infinity and not be in the Mandelbrot set.

We're not purely interested in whether the points are in the set though, we want to know how many iterations it took if they were pushed out of that circle, that's how we get the fancy colors! So, for this problem store the first "n" that was outside of the radius-2 circle into the 16bit array value for that location. You will be given a "maximum" value, and if your n passes the maximum value, you should set the maximum value for that array element and move on.

I do realize that this is a pretty tough problem! If you'd like to be involved in an optimization contest but this is too hard, tell me! I'm currently considering opening up a second "branch" of competitions, for easier optimization contests, and whether I do will depend on if anyone bugs me about it!

This competition is going to be running for 2 months - which should be long enough to implement and optimize something like this; I'm mostly using 2 months because I figure I'll have more time around then, the next month looks pretty hectic for me.

Also, I have planned to create an update for the test application, to allow timer-sampled profiling (with profiler in ITCM so it doesn't screw up the cache profiling much), which should prove a powerful tool for finding issues and optimizing code! look for that around a month from now.

[Awaiting Admin Approval] [nutki] 32bit precision entry (1012B, 58 days, 20 hours before the deadline)

[Awaiting Admin Approval] [pepsiman] pepsiman (2.66kiB, 8 days, 14 hours before the deadline)

[eKid] eKid's Magical Mandelbrot Function
[Deleted]

[Awaiting Admin Approval] [eKid] eKid's Magical Mandelbrot Function (3.85kiB, 1 hour, 22 minutes before the deadline)

4 entries have been received.

Compo has ended, entries can no longer be added.

Posted by [julien_c], 3342 days, 19 hours ago

I found that page very interesting, for Mandelbrot set beginners like me : http://warp.povusers.org/Mandelbrot/

Posted by [nutki], 3342 days, 10 hours ago

My first entry is likely to be disqualified. But it can be used to determine how error ratio or test cases should be set to invalidate such entries.Posted by [pepsiman], 3342 days, 4 hours ago

The example code seems to have problems multiplying 2 negative numbers:-5.96046e-08 * -5.96046e-08 gave: 1.19209e-07 should be: 3.55271e-15

ffffffefffffffff * ffffffefffffffff gave: 2000001000 should be: 1000

-5.96046e-08 * -5.96046e-08 gave: 1.19209e-07 should be: 3.55271e-15

ffffffeffffffffe * ffffffeffffffffe gave: 2000001000 should be: 1000

-8.67362e-19 * -8.67362e-19 gave: 1.19209e-07 should be: 0

ffffffffffffffff * ffffffffffffffff gave: 2000000000 should be: 0

-0.000764745 * -0.000764745 gave: 7.04045e-07 should be: 5.84835e-07

fffcde1b3cb80000 * fffcde1b3cb80000 gave: bcfd95a74f should be: 9cfd95a74f

-3.64659 * -3.64659 gave: -2.70239 should be: -2.70239

c5a791d780000000 * c5a791d780000000 gave: d4c3078453ed4684 should be: d4c3076453ed4684

-2.55755e-06 * -2.55755e-06 gave: 1.19216e-07 should be: 6.54104e-12

fffffd5176bb6000 * fffffd5176bb6000 gave: 200073123f should be: 73123f

-1.82113e-06 * -1.82113e-06 gave: 1.19213e-07 should be: 3.3165e-12

fffffe17253a3000 * fffffe17253a3000 gave: 20003a582c should be: 3a582c

-0.0911961 * -0.0911961 gave: 0.00831685 should be: 0.00831673

fe8a75f070000000 * fe8a75f070000000 gave: 2210d9bce68323 should be: 2210b9bce68323

-0.00442508 * -0.00442508 gave: 1.97005e-05 should be: 1.95813e-05

ffeddff7ac000000 * ffeddff7ac000000 gave: 14a852de5455 should be: 148852de5455

-7.77272e-09 * -7.77272e-09 gave: 1.19209e-07 should be: 5.9848e-17

fffffffde9dce500 * fffffffde9dce500 gave: 2000000045 should be: 45

-1.93403e-09 * -1.93403e-09 gave: 1.19209e-07 should be: 3.46945e-18

ffffffff7b183690 * ffffffff7b183690 gave: 2000000004 should be: 4

-0.000299141 * -0.000299141 gave: 2.08694e-07 should be: 8.94851e-08

fffec65412680000 * fffec65412680000 gave: 38055de74f should be: 18055de74f

-0.00996175 * -0.00996175 gave: 9.93558e-05 should be: 9.92365e-05

ffd73257f9800000 * ffd73257f9800000 gave: 682e9b890b18 should be: 680e9b890b18

Posted by [nutki], 3342 days ago

@pepsimanYou are right. The problem is that mid1 and mid2 parts should be signed. Here is the correct code:

s64 mid1 = (a&0xFFFFFFFF)*(b>>32);

s64 mid2 = (b&0xFFFFFFFF)*(a>>32);

Edit:

Oh, I did not notice that mid1 parts are s64 casted, but after the harm is done. So other solution would be to change second last line:

output += (((s64)mid1>>32)<<4) + (((s64)mid2>>32)<<4) + (hi<<4);

Posted by [**sgstair**], 3341 days, 23 hours ago

pepsiman: I see; yeah, I should apparently have paid a lot more attention to that function. Ok, I'll update the source package again (at this rate is there anything I've done right? :>)nutki: Thanks for the investigation & solutions :)

Posted by [strager], 3340 days, 18 hours ago

I've noticed some of the parameters to the Mandelbrot function are declared non-const (e.g. rectangle), but modifying their contents cause side-effects (ruining the next tests). This can be changed by declaring the pointed data const.
Posted by [strager], 3340 days, 16 hours ago

I've also noticed the test code is inaccurate. Porting it to used double or long double instead of a fixed number leads to different results (as might be expected). Interestingly though, using my own fixed multiply for the fixed-point version led to the same "inaccuracies". I conclude the current multiply function is bad (still).Here's the implementation using doubles (or long doubles, or floats; change the Number typedef): http://nightshade.nl/~strager/stuff/akkit-1.cpp

I've also found some slightly obvious over-checking in the example code, but I will leave it to other contestents to find and correct them if needed. =]

Posted by [nutki], 3340 days, 9 hours ago

@stragerI think, that double calculations will be less accurate than 64 bit fixed point. When converting:

return (Number)x / (Number)((s64)1 << 60);

you are losing up to 11 significant bits (usually 8 - 9 as you use only 2 bits of 4bit integer part).

You will gain for points close to axes, but for majority of set area the exponent stays the same, so the 11 exponent bits are not used much. The bottom line is that you would need larger type (like quad precision) to verify 4.60 multiplication correctness.

Edit:

I see you were using 'long double' also, but gcc on ARM will just use double precision for this, as far as I know.

Posted by [**sgstair**], 3340 days, 9 hours ago

strager: Yeah, I just ran some accuracy tests and it's not coming out how I thought it would; Out of time to think about it tonight though, I'll try to see what's up with it tomorrow.To clarify, I'm calculating in a way that should provide accuracy through the entire 64bit value. But, in my analysis I'm seeing errors much larger than should be possible. I've got a test application that's graphing the error into the problem domain graphically, and it's showing much more than just the bottom bit is error prone, which doesn't make sense.

I'm testing on PC, with long double as the comparison, my procedure is as follows:

generate random "a" and "b" 64bit values - create long double representations of the 4:60 number - run fixed point multiply and long double multiply, convert fixed point result to long double, compare results. The graphing program maps "a" and "b" to x and y, and maintains the average and the maximum error per pixel in a 1024x1024 image (white = <1bit error, blue = 1bit error, cyan=5bit error, green = 9bit error, yellow=13bit error, red=17bit error - it's a stepped linear system.) - Average error is on the left, maximum is on the right. Display goes from -8 to +8 in both directions

Warning: eye burning colors :)

http://akkit.org/sekrit/function-error.png

Posted by [nutki], 3340 days, 7 hours ago

I am not sure what you meant by 1bit error. But my testing your example function against 80 bit long double on my PC show differences below 1e-18 which is pretty accurate considering that 1/(1<<60) is 8.67361737988404e-19.Posted by [**sgstair**], 3340 days, 5 hours ago

nutki: Ok, interesting - that's actually the result I'm expecting. I wonder if the error my system is showing is due to long double not actually being more than double. I need to validate a few things.
Posted by [strager], 3340 days, 4 hours ago

The main issue I'm seeing is that not only does using (long) double cause different results than your fixed multiply, my own fixed multiplication function has the exact same difference (19/122880 in the test case template). I could send you the function if you'd like (in private, of course. It's much faster than your implementation. =])Too bad I can't test a simple multiply function (while(a) { out += b; --a; }) because it takes too long to execute. I believe that would return the most accurate and predictable answer, however.

Also, I didn't know gcc reduced `long double` to `double` on the DS. Interesting! I tried `long long double` but I think the compiler took that as `long long` (s64). xD

Posted by [**sgstair**], 3340 days, 3 hours ago

Technically the multiply should be completely accurate, as it's based off a true 64*64=128 multiply, and then chops out the section that's needed. The only minor nit really is that it doesn't round, which causes the loss of 1/2 bit of accuracy. I didn't think "long double" would reduce to "double" on pc compilers, I can certainly see it not being done on DS, because 128-bit floating point math in software is rather insane. But I'm going to check that now, to verify in my test whether I'm actually using 64bit or 128bit float. As nutki said, for most of the expected range the fixed point multiply works in, the 4:60 number actually does have several bits more precision than a double can. so, double isn't really an appropriate test.Edit:

Gah, indeed long double on MSVC is the same as double - I'm happily graphing the errors present in the double continaer, not the 64bit number. I think the multiply is correct at this point, coming up with another way to verify. And note, if you do implement the repeated add, besides it taking several minutes per test, you will come up with the same value I do :P

Posted by [Cydrak], 3339 days, 23 hours ago

Hmm, few questions.1) Can we submit more than one entry? Can they be updated? (spontaneous ideas at 3 AM for example ;p)

2) Are you going to post timing counts before this ends? How do we know what to shoot for? ^_^

3) When you say "must keep 64 bits precision", what's that mean? There are lots of ways to draw Mandelbrots, but some of the faster ones sacrifice a pixel or two (eg. from aliasing, which isn't a even a precision problem...). Others use a bit size suited for the zoom level. For example, in my testing, 32 bit math differs by 1/122880 (!) using the default settings. Considering it's around twice as fast, I'd say that's hardly useless, pretty significant actually! (And if it was 0, why should the internals matter at all? Efficiency does seem like a goal for an optimization contest. :)

And speed and precision are often enemies, a mandate on the latter leaves far less room for a good tradeoff, if you ask me. I juuust squeezed my iterator into the 14 regs available. (Don't try this de-caffeinated, ugh :D) Since the math now dwarfs the RAM and stepping overhead, I'm not sure (without smarter/riskier code) how much faster I could make it. :/ Hopefully, I'm just misunderstanding, and that's why *workram is there after all!

http://www.mrob.com/pub/muency/speedimprovements.html

This page has gobs of interesting material on that note...

As to the multiply, I compiled an arb-precision library:

http://spinning-yarns.org/michael/mpi/

and checked 100k random products on the DS. Between my ASM, your C template and MPI they are all identical, except rounding on the low bit. (MPI uses sign-magnitude, not two's complement, so shifting rounds toward zero.) This time, it looks like you're good. :)

Posted by [**sgstair**], 3339 days, 13 hours ago

Cydrak: Ah yes, I should post the answers to those questions in the main compo text;1) Yes, you may submit more than one entry. You may revoke/"Delete" entries once they're submitted (can't change this status once the deadline passes) - Entries can't be updated, but this system lets you upload newer versions.

2) I'm not going to post timing counts before compo ends, but you're welcome to post yours to encourage competition :P I will probably post my times when I get going with it seriously.

3) Yeah, I'm aware it's not clear yet. If you manually/randomly come up with some much more zoomed in tests though, such that the bottom bit moves around (1>>28) unit per pixel, that's about the precision the final tests are going to be done at. They might be smaller even, but I'm not really sure yet. :P There will be a new test app in a few weeks that will provide these "harder" tests, and a way to explore the fractal to find where your algorithm breaks down in comparison to the reference. (or just look at the pictures :)

Pretty much the point of this is to produce useful code, so optimizing it to the point that it's impractical isn't really helpful. I could define the precision requirements more strictly, but I will probably define it as being very close to the reference algorithm at a specific zoom level with a certain number of iterations - just for simplicitly's sake. I didn't mean to make this so complex, and I didn't really want to make this into a judgement call contest either, but I guess everything becomes clearer after you actually jump into it ;)

Side note: Everything deleted on this site can be undeleted, Just because that's not entirely clear without saying so.

Posted by [strager], 3339 days, 5 hours ago

I've noticed my algorithm introduces quite a bit of error (449 in the test). I'd like to find where this error is, but I can't see any difference between my generated image and the comparison image. Would it be possible to output a pixel-per-pixel difference image? That'd be really helpful.Currently my algorithm uses 0x000003AD1962 cycles, with an error of 449, at -O2. ARM ASM is used for the fixed multiplication (non-inline, sadly).

Posted by [**sgstair**], 3339 days, 4 hours ago

strager: That's something I've been considering. May have to wait a while for it though, it's something that would probably be added in the next major update to the test app - you're welcome to hack the test system a bit for personal testing though, to get that sort of thing earlier.Also I think you should pay a bit more attention to your multiply, I think you may be losing some bits or counting some twice, and should take a closer look.

Posted by [Cydrak], 3339 days ago

Yep, it was quite tempting to hack the pixel difference in. Could help with higher level optimizations, to see the problem areas. Also interactive tests, where it might help to know the parameters to copy down and test elsewhere. Good to hear. I've got other things on my mind, so it's no rush. :)What was it meant to be useful for? I suppose that's where the subjectivity comes in. I do mean to be practical, I'm just tempted to push the rules if it's more fun. Sorry if I'm being a pain, I don't mean it that way. ;)

Having a benchmark is good. I think what I'm getting at, is that it might favor some approaches over others. If the contest is "write a complex dword SMLAL" then that's no problem. If it's actually Mandelbrot then I suspect deep fancy zooms would favor cleaner, simpler code (since all the math has to be done anyway), while a range of tests benefits more practical, scalable, approximations. IMHO the latter are more interesting, but that's up to you.

My MandelFunc is all ASM, and I'm down to 0x000004c92872... eep, maybe there's more room for improvement than I thought! Thing is, I'm *completely* out of regs: the full Zr*Zi uses 7. That leaves 4 for (Cr,Ci), 2 for the pending Zr result and 1 for the iter counter. There are tradeoffs to be made, for sure... it's hard to cope with 1 temp in the square/multiply, even if the input's cannibalised (Rs == Rlo or Rhi).

strager: 449 seems like not even 32 bits accuracy. Maybe a sign error? I would compare it with working multiplies, especially print out the partial products so you can see how they should add up.

I wound up doing three types of multiplies:

- L*L, which is unsigned, UMULL works there.

- H*H, which can use SMULL (sign-extends to 64 bit before multiplying)

- H*L (the two "mid" products), these are the ones that can cause trouble.

For H*L, only the signed H arg should be extended. E.g. if you have 0x87654321 * 0xF5555555, you actually want 0xFFFFFFFF87654321 * 0xF5555555 for the two's complement to work out. So I UMULL, then subtract L<<32 if H was negative. That's if I understand this right! I found it easier to grab some paper and work it out in 4/8 bit first.

For addition: I'd watch the carries, and don't shift too soon like I did. Since H*L can go negative, you may have to sign extend when adding. Hm... or maybe abs(a)*abs(b) and fix the sign afterward. I'm not sure which is faster.

Posted by [strager], 3338 days, 14 hours ago

Turns out the main cause of the 449 was not the multiply function. But thanks for you help, Cydrak. =]For my multiply I convert the oprands to unsigned and add the sign at the end. That way things are simpler in my head.

This comment thing seems to be turning into a forum thread rather than a real comment listing. xD

Posted by [**sgstair**], 3338 days, 8 hours ago

Cydrak: What's it useful for? Well. Um. Actually I don't have a really good answer. This problem is however the one that motivated me to make this competition in the first place, but I promise the future competitions will be more relevant to general purpose programming. Probably things like huffman decoding, 1D and 2D cosine-related transforms. I don't have that many clear ideas yet, suggestions are appreciated.I understand that a lot of the "fun" here can probably be derived from the more abstract approximation techniques and similar. The reason I'm making this more subjective than trying to impose a strict accuracy standard is that I would like to allow really clever tricks to approximate as much as is possible and still get a good result ;)

strager: yeah, this is a lot like a forum thread. That's about what I intended, but it may get too long for one page. I considered adding pagination for comments, still not sure if it's a good idea or not :P

Posted by [strager], 3337 days, 20 hours ago

I have a question regarding the parameters which will be used to compare entries.Will the parameters be approximately the same as the test ones? Will the `rectangle` values be smaller, or will the `max_iterations` be significantly larger? Or will the seed simply be changed a bit?

And regarding error, what is a reasonable limit? 1%? Or what?

Thanks.

Posted by [**sgstair**], 3332 days ago

strager: I did already sortof answer this about as well as I'm probably going to for now; I really don't have a clear plan yet, only that the rectangle will be much more zoomed in and the locations will probably be hand picked. I will probably update the logic in the test app to generate similar sorts of zoomed in views randomly for testing purposes.What I've decided to do, to determine what level of error is acceptable, is to develop a test app on PC to compare the 4:60 precision with a much higher level of fixed point precision, and base a decision on how deep can be zoomed in practically on that. Also will probably base a decision on the reasonable limit for error on that :) for now I can just speculate as to what that's going to be though.

Posted by [**sgstair**], 3295 days, 18 hours ago

Well, this has been idle for a while :) Time to reawaken the monster.I'm working on clarification for precision, and my own implementation today. And probably a new set of test images that will provide pass/fail decisions for your code rather than just looking pretty.

Much to do, much to do. I'm very much considering adding RSS feeds to this site too, that might help discussion a bit...

I do have a workable solid model for testing precision now - just need to implement it (partly done at the moment) More details will follow soon. Unfortunately my time has been a lot more limited than I imagined lately, so the interactive and profiling code will have to wait 'til next competition.

Posted by [**sgstair**], 3294 days, 18 hours ago

(woo! v5 is posted now!) Ok, bug me if anything is out of place, I will have an implementation of my own to provide numbers from soon.
Posted by [pepsiman], 3289 days, 1 hour ago

I've submitted 2 entries, the slow one uses 93CF7B06 cycles on my DS for the test images in v5.
Posted by [eKid], 3283 days, 22 hours ago

Whee! I hope some last minute entries show up.
Posted by [**sgstair**], 3283 days, 14 hours ago

I smartly wasted my time doing other things, so my entry isn't ready. It's not as fast as the others yet, but I might release it later just to do so.Opti's probably going to be inactive for a while after this compo, I've got some serious RL things to take care of. Next compo will be better - and thanks all who participated :)

Posted by [Cydrak], 3283 days, 12 hours ago

Arg! That'll teach me to post last minute. Trouble cropped up, as expected. (Did I ever mention I loathe Make?)It's alright though, mine wasn't terribly polished either. I had another big project while waiting for the new tests, and never moved onto more adaptive algorithms. :P So this is just a minimal iterator. It takes around 0xc2100000 cycles under V5, or about a quarter of the example's time. (And should be bit-for-bit identical.) Sorry about that!

I might poke it later on. For the extra-curious, I've tossed it up here:

http://draci.chaosnet.org/code/opti_c1_v5_Cydrak.zip

Posted by [eKid], 3283 days, 12 hours ago

Yay :) I wanted to see how you implemented your multiply functions
Posted by [eKid], 3267 days, 14 hours ago

Still awaiting approval? :PI optimized mine a little more; mixed the math functions together a bit...

http://ekid.nintendev.com/mbrot2.zip :)

It dropped a bunch of cycles, 6B1A8EF8 in no$gba, a bit higher on hardware. I'm not really sure why it lost a little accuracy though : (maybe overflowing somewhere...)

Posted by [Cydrak], 3263 days, 20 hours ago

Hmm, I played around some, and I don't think it's overflowing. At least, not often enough to matter. (Nice, pre-shifting and still a spare bit to avoid the mid carries..)I did notice a few things:

- Not adding the low*low product (may carry into the top two words)

- Negation as ~x rather than ~x + 1

- Taking absolute values before multiplying (maybe due to rounding direction?)

- Maybe different escape checks?

The first three I tried--and they increased the diff for me. Naturally, attempting to "fix" them only slows it down. Arg! Guess it depends... what's ~7000/147000 (average off-by-one for 5% of pixels) worth to you? For the supplied tests, I can barely see a difference.

Posted by [eKid], 3263 days, 6 hours ago

By less accuracy I mean less than it was already (around 3600 in the original release) :PYeah.. I tried to do the negate X with '~x + 1' but it dropped the accuracy more??.. (maybe I messed up somewhere there) :

I thought maybe the +-1 error from not doing the low*low product wouldn't be noticeable in the output. (the example doesn't round the bits)

The whole thing is rather confusing.. x_x

You must be logged in to submit a comment.

Main Page - Competition List - Member List - Rules

Generated in 0.0151 seconds - 3 DB queries performed.

Generated in 0.0151 seconds - 3 DB queries performed.

sgstair], Started at 2008-03-29 00:00:00 (Competition Ended 3283 days, 14 hours ago.)