First post, by frobme
EDIT: Some sort of error on my part appears to have generated unusually bad GCC run time results. Since I use GCC myself all the time, I'm at a complete loss as to why, but I'm editing this post to reflect the second run results which are competitive with the other compilers.
Hola. Recently, for my own education and to test some optimizations I was working on, I set out to profile the performance of various posted builds, plus different compilers/environments I could verify myself.
This is by no means a comprehensive set of benchmarks. The performance characteristics of DOS games vary widely and many will exercise functionality not covered by my test, which could result in edge cases that behave contrary to these results. You have been warned, mileage may vary, blah blah.
I chose to use Doom and the famous -timedemo as my reference test. As I find more highly repeatable benchmark situations that run in DosBox, I will try to add them here (most of the "system benchmark" type of apps don't get along with DosBox, due to esoteric hardware calls). So for all of these tests, I ran "doom.exe -timedemo demo3".
My system is a Core 2 Extreme X6800 2.93GHz, 2GB RAM, with a GeForce 8800 GTX 512MB. It is not overclocked, and for the purposes of the tests I ran WinXP 32-bit.
I compared the following CVS versions:
The AEP Emulation daily CVS build from 01/07/2007, referred to here as "AEP"
Gulikoza's CVS build from 11/17/2006, referred to here as "Gulikoza"
YKHwong's CVS build from 01/05/2007, referred to here as "YKHwong"
My own from scratch GCC-3.4.5 build, built with "-march=pentium4 -O3 -fomit-frame-pointer" and stripped, referred to here as "gcc-3"
My own from scratch GCC-4.1.1 build, built with "-march=pentium4 -O3 -fomit-frame-pointer" and stripped, referred to here as "gcc-4"
Both GCC builds were built under MSYS/MinGW. If you're wondering, yes that means I built GCC-4.1.1 myself, and yes most of the posted instructions on how to do that are wrong =). I also built 4.3, but it was unstable and I haven't tested it yet due to that.
An Intel ICC Compiler build, version 9.1.33, built using Visual Studio as the IDE but with ICC doing all the compiling and linking. It was optimized specifically for Core Duo. Referred to as "ICC". I did not do a profile-guided optimization (PGO) pass for this compiler.
My own Visual Studio 2005 build, in three flavors:
- Built with the free edition of Visual Studio that anybody can download (referred to as "VisStudio-free")
- Built with commercial VS2005 SP1 using standard release optimizations, including whole program optimization (referred to as "VisStudio-regular")
- Built with commercial VS2005 SP1 using release optimizations as before, but with one pass of profile-guided optimization, or PGO (a two pass optimization system which requires you to run the program first), referred to as "VisStudio-PGO"
All of the tests were done with the same version of SDL.dll, a 1.2.11 version that was built from scratch using VisStudio 2005.
As stated, I used original DOS Doom (1.9s) and -timedemo demo3. This is self running and exits with a "realtics" value, showing how many "real time" ticks occurred for the fixed number of "game ticks" that occur in the demo. Of course within DosBox they aren't real real time ticks, but it still works as a reference. If you'd like to see a description of Doom as a benchmark and see a huge chart of real world machine results, go here. You can run the demo yourself and see where your DosBox virtual machine compares to other real hardware.
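For anyone who wants to turn their own realtics number into FPS, here is a rough sketch of the conversion. It assumes demo3 plays back 2134 game tics (one rendered frame each) and that tics are counted at Doom's 35 per second; the 2134 figure is not stated anywhere in this post, it is simply the value that reproduces the FPS numbers in the results. The function name is made up for illustration.

```python
# Assumption: demo3 in Doom 1.9 plays back 2134 game tics. This value is
# inferred because it reproduces the fps figures in this post.
GAMETICS = 2134
TICRATE = 35  # Doom counts time in tics of 1/35 second


def timedemo_fps(realtics, gametics=GAMETICS):
    """Convert a -timedemo realtics value to average frames per second.

    A timedemo renders one frame per game tic, so the demo draws
    `gametics` frames in `realtics / TICRATE` seconds.
    """
    if realtics <= 0:
        raise ValueError("realtics must be positive")
    return gametics * TICRATE / realtics


# Averages taken from the results in this post.
for name, realtics in [("AEP", 1070), ("VisStudio-PGO", 880), ("P3/1GHz", 626)]:
    print(f"{name}: {timedemo_fps(realtics):.1f} fps")
```

Lower realtics means the demo finished sooner, so FPS goes up as realtics goes down.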
For each build of DosBox, I ran the timedemo 3 times and averaged the results. DosBox settings were always the same: windowed mode, "surface" display, 0 frameskip, normal 2x scaler, dynamic core, max cycles (so it runs as fast as it can), 16MB virtual machine, SB16 audio. The only modification to the Doom setup was to turn off audio - not because it was a problem, but running 30-something tests with it on was going to drive me crazy, sorry.
I used the incredibly sweet DosBox Game Launcher from Ronald Blankendaal to make all the launching less burdensome. Thanks for an awesome front end Ron.
Results
AEP: 1104,1060,1048 avg 1070 fps 69.8
ICC: 1038,1062,1019 avg 1040 fps 71.8
VisStudio-free: 1003,996,961 avg 987 fps 75.7
VisStudio-regular: 973,976,989 avg 979 fps 76.3
YKHwong: 960,949,1015 avg 975 fps 76.6
gcc-3: 916,1012,1013 avg 980 fps 76.2
gcc-4: 925,928,922 avg 925 fps 80.8
Gulikoza: 889,886,936 avg 904 fps 82.6
VisStudio-PGO: 850,890,900 avg 880 fps 84.9
As a comparison case, I ran the same test against my retro machine. This is a P3-1GHz, 768MB RAM, GeForce 4 Ti4600 running DOS 6.2 with nothing resident except QEMM for memory management:
P3/1GHz GeForce Ti4600: 627,626,626 avg 626 fps 119.3
Conclusions
Due to some error on my part, the initial gcc builds looked poor, but this has since been solved (I still don't know what was happening; I used the same environment and settings for the second builds). Because of this I'll further investigate a gcc-4.3 build and a PGO build with gcc and see what those results are.
The results (now) are all closely packed together. Obviously the PGO VS2005 build had the highest FPS, but keep in mind that the PGO run was against Doom specifically, so it looked for all the functions that got hit hardest in Doom and did what it could with them (PGO reported 50 functions in DosBox as targets, fewer than 1.5% of the total functions).
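To put a number on "closely packed", here is a small sketch that compares how much each build wobbles between its own three runs against how far it sits behind the fastest build. The data is copied from the Results list; the helper name and the choice of "range as a percentage of the mean" as the spread measure are my own assumptions, not anything standard.

```python
# Raw realtics per run, copied from the Results list (lower is faster).
runs = {
    "AEP": [1104, 1060, 1048],
    "ICC": [1038, 1062, 1019],
    "VisStudio-free": [1003, 996, 961],
    "VisStudio-regular": [973, 976, 989],
    "YKHwong": [960, 949, 1015],
    "gcc-3": [916, 1012, 1013],
    "gcc-4": [925, 928, 922],
    "Gulikoza": [889, 886, 936],
    "VisStudio-PGO": [850, 890, 900],
}


def spread_percent(values):
    """Range of the runs (max - min) as a percentage of their mean."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean * 100


means = {name: sum(v) / len(v) for name, v in runs.items()}
fastest = min(means, key=means.get)  # lowest mean realtics wins

# Print builds from fastest to slowest.
for name in sorted(means, key=means.get):
    behind = (means[name] / means[fastest] - 1) * 100
    print(f"{name:18s} mean {means[name]:7.1f}  "
          f"spread {spread_percent(runs[name]):4.1f}%  "
          f"{behind:4.1f}% behind {fastest}")
```

With only three runs per build, a gap between two builds that is smaller than the run-to-run spread of either one (gcc-3 alone ranges from 916 to 1013) is hard to tell apart from noise, so the exact ordering in the middle of the pack should be taken loosely.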
I also did several one-off tests of various optimizations that aren't listed here, since the results were not interesting. Mostly DosBox is resistant to large performance variance due to optimization features, beyond the obvious ones the compiler makes easily. So if you are spending a lot of time playing with esoteric flags on GCC like -funroll-all-loops and such, you're most likely wasting your time.
And there you have it. Hope people get something out of it, I spent way too much time on it already =). Suggestions etc. are welcome. Yes, I know there need to be more test cases to validate results. Feed me repeatable benchmark conditions that are game-like and run in DosBox, and I'll put them through their paces.
Couple of side notes: the fastest display surface for me on my machine is "opengl" (windowed), which resulted in 119.3 FPS in this test. It's too much work to test an X64 build with Visual Studio, because Microsoft literally removed the _asm keyword from the compiler for X64, which would require a considerable restructuring of the DosBox code that I'm not up for right now (you can still compile assembly, it just has to be in a module by itself). Besides, pretty much all of that code should be rewritten for X64 if you're going to the trouble of a native build. Something for later, and it wouldn't have a major performance impact anyway.
Enjoy.