Author Topic: Premature optimization is the root of all evil, but...  (Read 2758 times)

0 Members and 1 Guest are viewing this topic.

Offline Kerame Pxel Nume

  • Taronyu
  • ****
  • Posts: 736
  • Karma: 6
  • CPU follow my command since 1991
Premature optimization is the root of all evil, but...
« on: June 07, 2010, 09:51:31 am »
when developing stuff one should at least keep his mind on things like L1 code cache behaviour:
http://x264dev.multimedia.cx/?p=201

Now take this small C++ snippet, which uses STL for only two things (a vector and a algorithm), compile it with
g++ -Os -c $SRCFILE

Code: [Select]
#include <vector>
#include <algorithm>
#include <iostream>

struct vertex;

void serialize_vertex(vertex const &);

struct vertex {
public:
vertex(float x, float y, float z)
{
pos[0] = x;
pos[1] = y;
pos[2] = z;
}

vertex& operator=(vertex const &v)
{
for(int i=0; i<3; ++i)
pos[i] = v.pos[i];

return *this;
}

private:
float pos[3];

friend void serialize_vertex(vertex const &);
};

typedef std::vector<vertex> vertex_vector;

void serialize_vertex(vertex const &v)
{
std::cout << v.pos[0] << ' ' << v.pos[1] << ' ' << v.pos[2] << std::endl;
}

void serialize_vertex_vector(vertex_vector &v)
{
std::for_each(v.begin(), v.end(), serialize_vertex);
}

int main()
{
vertex_vector V;
for(int i=0; i < 50; ++i) {
V.push_back(vertex(rand(), rand(), rand()));
}

serialize_vertex_vector(V);

return 0;
}

And look at it's size. If I compile with gcc-4.3.4 for x86_64 it's about 0.8k in the .text segment.

Okay, that't not a problem yet, but now imagine if you'd implement a real algorithm, like eliminating all double vertices in a terrain grid (after LOD you may and up with exactly those). Aside the fact that STL algorithm is suboptimal for that ( O(n²) in a naive program, also a lot of data gets copied around due to RAII), You'll easily go over that L1 code cache limit of 32K.

Now let's look at the equivalent C program:
Code: [Select]
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct vertex {
        float pos[3];
} vertex;

typedef struct vertex_array {
        size_t len;
        vertex *v;
} vertex_array;

void vertex_array_push_vertex(vertex_array * const VA, vertex const * const v)
{
        vertex * new = 0;
        if(!VA)
                return;

        new = realloc(VA->v, sizeof(vertex)*(VA->len + 1));
        if(!new)
                return;

        VA->v = new;
        memcpy(VA->v + VA->len, v, sizeof(vertex));
        VA->len++;
}

void serialize_vertex(vertex const * const v)
{
        printf("%f %f %f\n", v->pos[0], v->pos[1], v->pos[2]);
}

void serialize_vertex_array(vertex_array const * const V)
{
        int i;
        for(i=0; i < V->len; ++i)
                serialize_vertex(V->v + i);
}

int main()
{
        int i;
        vertex_array V = {0,0};

        for(i=0; i < 50; ++i) {
                vertex v = {{rand(), rand(), rand()}};
                vertex_array_push_vertex(&V, &v);
        }

        serialize_vertex_array(&V);

        return 0;
}

This does the very same like the C++ equivalent, yet its .text segment size is only 0.3k. And since you keep working on the very same dataset (no RAII, no duplicates) as a side effect you also don't trash the L1 data cache and the L2 cache in total (which is shared by code and data). So if you've got a complex algorithm, doing it in pure C will give you that reduction in size of fit it into L1 code cache, resulting in a order of magnitude better performance.

Offline bommel

  • Palulukan Makto
  • *****
  • *
  • Posts: 3865
  • de Germany
  • Karma: 22
  • Addicted to music!
Re: Premature optimization is the root of all evil, but...
« Reply #1 on: June 08, 2010, 11:58:50 am »
Sometimes I even break it down to asm level (for example when using SSE).
For all of you interested, I can recommend the Intel Software Optimization guide as a good reference. You can download it for free at Intel (don't have the link atm) or pm me.

 

Become LearnNavi's friend on Facebook Follow LearnNavi on Twitter! Watch LearnNavi's videos on YouTube

SMF 2.0.15 | SMF © 2017, Simple Machines
Privacy Policy
| XHTML | RSS | WAP2 | Site Rules

LearnNavi is not affiliated with the official Avatar website,
James Cameron, or the Twentieth Century-Fox Film Corporation.
All trademarks and servicemarks are the properties of their respective owners.
Images in the LearnNavi.org Forums and Gallery may not be used without permission.

LearnNavi Affiliates:
ToS

LearnNavi is the community to learn Na'vi, the Avatar Language
"A place where real friendships are made." -Paul Frommer

AvatarMeet | Learn Na'vi Forum | Learn Na'vi Wiki | Na'viteri

LearnNavi