Crash Offsets

Cannon

The boot.ini /usepmtimer switch is a fix for an FLServer issue rather than for FLHook or FLAC. I should have mentioned that we have an Intel CPU.

Also the test server has been running for 12 hours now. Yay!

robocop

HeIIoween wrote:
Not only for AMD.

Robocop, try /ONECPU.

In Control Panel, where the power settings are, set all power settings to “on”.

Disable CPU power management support in the BIOS.

This is called the Time Drift Bug.

In Linux it is called the PM-Timer bug.

I’ve implemented the boot.ini workaround, we’ll see what that does for us.

@Helloween, for power options I have set ‘Always On’. Is that what you mean?

What is /ONECPU?

At the next reboot I’ll disable CPU power management support in the BIOS.

R

robocop

Well, daily misery report again…
I implemented the /usepmtimer suggestion in the boot.ini file, with no joy. 18 minutes after the reboot I got the triple .\HookFunction.cpp(887): *** ERROR: Exception in Hook_IServerImpl_TradeResponse (unhandled exception) message, and everyone online got booted.

So the fact that it happens within 18 minutes of a reboot indicates to me that it’s not likely a memory leak, especially since the server often runs fine for hours. It’s after 1630 server time, so this is now going to go on every 15-20 minutes or so as long as two or more players are online. At least until midnight, anyway…

AlphaWolf seems to think the TradeResponse message is related to NPCs scanning something or someone, but I don’t know. Nothing in that regard has been changed, and this behavior happens whether the mod is activated or not…

? Offline

/ONECPU: use only one CPU on a multiprocessor system.
http://en.wikipedia.org/wiki/NTLDR

? Offline

I still have one problem: flserver freezes and its window turns white, while the FLHook window looks fine but of course doesn't respond to commands.

Same as before: no logs, no events, nothing.

Because all my servers are on physical machines, I don't know yet how they will behave after a restart with the /usepmtimer switch in boot.ini.

I will also try disabling the APIC/ACPI functions in the BIOS; Google says that should help.

robocop

/usepmtimer and /onecpu did not fix my problem.

Still crashing in dalib.dll.

What causes hash collisions?

adoxa

Hash functions are very good, but not infallible, so sometimes two different strings will produce the same result. In the vanilla game, there were five collisions (search jflp.txt for clash; createid -s then whatis -t will find them).
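To illustrate what a collision is: any function that maps arbitrary strings onto a fixed number of bits must send some distinct strings to the same value. The sketch below uses 32-bit FNV-1a as a stand-in hash. It is not Freelancer's actual ID hash. The result is masked to fewer bits so collisions appear in a short list. The names stand_in_hash and report_collisions are hypothetical. For real game data, the createid/whatis commands mentioned above are the tool to use.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in hash: 32-bit FNV-1a. NOT the game's own ID hash. */
static uint32_t stand_in_hash(const char *s)
{
    uint32_t h = 2166136261u;            /* FNV offset basis */
    for (; *s; ++s) {
        h ^= (unsigned char)*s;
        h *= 16777619u;                  /* FNV prime */
    }
    return h;
}

/* Prints every pair of distinct strings whose hashes match once masked to
   `bits` bits, and returns the number of such pairs. O(n^2), which is fine
   for a short list. Fewer bits means more forced collisions. */
static int report_collisions(const char *const *names, int n, unsigned bits)
{
    uint32_t mask = bits >= 32 ? 0xFFFFFFFFu : ((1u << bits) - 1u);
    int pairs = 0;
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j)
            if ((stand_in_hash(names[i]) & mask) == (stand_in_hash(names[j]) & mask)
                && strcmp(names[i], names[j]) != 0) {
                printf("clash: %s / %s\n", names[i], names[j]);
                ++pairs;
            }
    return pairs;
}
```

With bits set to 0, every pair of distinct names collides. That is the pigeonhole argument in its extreme form. With all 32 bits kept, collisions become rare but still possible, as the five in the vanilla game show.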

robocop

Well, here’s my issue in a nutshell, and the specific post that explains why I'm interested in what causes a ‘hash collision’.

http://the-starport.net/modules/newbb/viewtopic.php?topic_id=2191&forum=11&post_id=24563#forumpost24563

? Offline

content.dll 0x490a5: formation errors; check the formation values in faction_prop.ini.

Huor

I am still chasing this crash at engbase.dll (+0x0124bd). I attached a debugger, and it always halts with this stack backtrace:

 #   Memory  ChildEBP RetAddr  Args to Child  
00           00129544 066123db 00d1cec8 08e5c568 0012d5e4 EngBase+0x124bd
01        20 00129564 0661ae06 06612567 08e5c568 00c46318 EngBase+0x23db
02         4 00129568 06612567 08e5c568 00c46318 08e07aac EngBase+0xae06
03        a8 00129610 4fdf4e22 00002800 00000000 0b662220 EngBase+0x2567
04        14 00129624 4fd9c47d 0b662220 00000000 00000008 d3d9!CD3DDDIDX8::LockVB+0x32 (FPO: [2,0,0])
05        14 00129638 06d12996 0b662220 00006300 00000120 d3d9!CDriverVertexBuffer::Lock+0x4d (FPO: [5,0,4])
06      40a8 0012d6e0 7c9201db 77bfc3c9 00330000 00000000 RP8+0x12996
07       234 0012d914 77bfc3c9 00330000 00000000 77bfc3ce ntdll!RtlAllocateHeap+0xeac (FPO: [Non-Fpo])
08        40 0012d954 77bfc3e7 00000014 0012d970 77bf9cd4 msvcrt!_heap_alloc+0xe0 (FPO: [Non-Fpo])
09         c 0012d960 77bf9cd4 00000014 00000001 06be0038 msvcrt!_nh_malloc+0x13 (FPO: [2,0,0])
0a        10 0012d970 06b73f73 00000014 00000000 09061c28 msvcrt!operator new+0xf (FPO: [1,0,0])
0b        14 0012d984 06b71935 0012d9a8 090603c0 0012d9ac ReadFile+0x3f73
0c           00000000 00000000 00000000 00000000 00000000 ReadFile+0x1935

As far as I could find out, ReadFile has something to do with hudframe021004123005. I don't know; maybe the problem comes from there…

Does anyone have an idea what is happening here? Does it have something to do with graphics problems in d3d9!CD3DDDIDX8?

Addendum: this code seems to be reached when the server connection is lost.
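A note on reading these traces: an offset such as EngBase+0x124bd is the faulting address minus the address the DLL was loaded at. That is why it stays the same from run to run even if the load base changes. In the trace above, frame 00 returns to 066123db, which WinDbg labels EngBase+0x23db in frame 01. So EngBase was loaded at 0x06610000 in that session. The two helpers below are a minimal sketch of that arithmetic; their names are hypothetical.

```c
#include <stdint.h>

/* Module-relative offset ("crash offset") of an absolute address,
   given the base address the module was loaded at (32-bit process). */
static uint32_t module_offset(uint32_t address, uint32_t module_base)
{
    return address - module_base;
}

/* Inverse: absolute address of a known offset, e.g. to set a breakpoint. */
static uint32_t absolute_address(uint32_t module_base, uint32_t offset)
{
    return module_base + offset;
}
```

With a base of 0x06610000, the crash offset 0x124bd corresponds to the absolute address 0x066224bd.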

Correction: the stack backtrace is:

 #   Memory  ChildEBP RetAddr  Args to Child              
WARNING: Stack unwind information not available. Following frames may be wrong.
00           00127434 0628a428 00d1cec8 0932d098 00000000 EngBase+0x124bd
01        18 0012744c 06293cce 0932d098 00000000 0932cff0 Common!PhySys::FindSphereCollisions+0x438
02        2c 00127478 0629ed87 00127464 77c05c94 001274f0 Common!CNonPhysAttachment::Disconnect+0x26e
03        1c 00127494 77bfc2e3 0932cff0 0629ed87 00000000 Common!CExternalEquip::IsConnected+0x7
04        70 00127504 0629b344 00000001 093292bc 093291d8 msvcrt!free+0xc8 (FPO: [Non-Fpo])
05        20 00127524 062a922e 093293a8 093291d8 09329548 Common!CEquipManager::Clear+0x94
06        34 00127558 062b0c21 00000000 093291d8 093233d4 Common!CEqObj::~CEqObj+0x9e
07        2c 00127584 06288222 00000000 00000000 062af65b Common!CShip::~CShip+0x271
08         c 00127590 062af65b 00000001 09323344 00539599 Common!BaseWatcher::~BaseWatcher+0x52
09           00000000 00000000 00000000 00000000 00000000 Common!CObject::Release+0x1b

I still have no clue what happens in engbase.dll. Is there a way to find out which routines run there? The disassembly looks like this:

The crash offset is this line:

066224bd 8b4010 mov eax,dword ptr [eax+10h] ds:0023:09054e20=00000000

066224a9 90              nop
066224aa 90              nop
066224ab 90              nop
066224ac 90              nop
066224ad 90              nop
066224ae 90              nop
066224af 90              nop
066224b0 8b442408        mov     eax,dword ptr [esp+8]
066224b4 83f8ff          cmp     eax,0FFFFFFFFh
066224b7 740b            je      EngBase+0x124c4 (066224c4)
066224b9 85c0            test    eax,eax
066224bb 7407            je      EngBase+0x124c4 (066224c4)
066224bd 8b4010          mov     eax,dword ptr [eax+10h] ds:0023:09054e20=00000000
066224c0 85c0            test    eax,eax
066224c2 7503            jne     EngBase+0x124c7 (066224c7)
066224c4 83c8ff          or      eax,0FFFFFFFFh
066224c7 c20800          ret     8
066224ca 90              nop
066224cb 90              nop
066224cc 90              nop
066224cd 90              nop
066224ce 90              nop
066224cf 90              nop

Any hints would be appreciated. I am bad at understanding this assembler stuff ;(
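To make the listing easier to follow, here is a rough C rendering of the function that starts at EngBase+0x124b0 (066224b0), just after the nop padding. It is a hand-written sketch, not decompiler output. The names engbase_124b0, cmp_node and parent are hypothetical; parent follows adoxa's later finding that the call seems to return the parent object. Only the second stack argument ([esp+8]) is read, and ret 8 means the callee pops two 4-byte stack arguments, a calling convention the portable C below does not express.

```c
#include <stddef.h>
#include <stdint.h>

#define ENGBASE_ERROR 0xFFFFFFFFu   /* "or eax,-1": the error/none result */

/* Hypothetical layout: only the dword at +0x10 is known from the listing. */
struct cmp_node {
    uint8_t  unknown[0x10];   /* fields this function never touches */
    uint32_t parent;          /* +0x10: seems to be the parent (per adoxa) */
};
_Static_assert(offsetof(struct cmp_node, parent) == 0x10, "parent at +0x10");

/* Rough equivalent of EngBase+0x124b0 (second argument only). */
static uint32_t engbase_124b0(uint32_t unused_arg, const struct cmp_node *node)
{
    (void)unused_arg;                       /* [esp+4] is never read */
    /* cmp eax,-1 / je and test eax,eax / je: reject ERROR and NULL */
    if ((uintptr_t)node == UINTPTR_MAX || node == NULL)
        return ENGBASE_ERROR;
    /* mov eax,[eax+10h]: the instruction at the crash offset */
    uint32_t parent = node->parent;
    /* test eax,eax / jne: a zero field is also reported as ERROR */
    return parent != 0 ? parent : ENGBASE_ERROR;
}
```

Seen this way, the function cannot fault on a NULL or -1 pointer. A crash at 0x124bd would need a pointer that passes both checks but does not point at readable memory, for example one left over from an already freed object. That reading is an assumption, not something the listing proves.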

Chips

Are these servers running mods at all? Some of these problems, if they come from mods, sound like things simple testing should have caught…

I know it’s a non-contributory post; this whole thread just seemed an interesting concept, and it made me immediately suspect it would replace testing and experience (not trying to offend folks).

adoxa

@Huor: FindSphereCollisions could suggest a sur problem. Then again, looking at it, the warning seems to be right, so I wouldn’t put too much credence in the trace. It’s a really strange error: the code already tests for ERROR and NULL, so eax appears legitimate, but is not. Furthermore, the debugger appears to show what is at that address, so it really is legitimate. Where is the error coming from, then? Or is this just from a breakpoint, not a crash? Is there something I can test myself, or a remote connection?

Huor

I am not a server operator, just someone who understands a bit of the coding stuff, but not at Adoxa's level ;D

The stack backtrace is from a breakpoint. I set the breakpoint at the offset where the server crashes (engbase.dll + 0x0124bd), which we have been hunting for some weeks now. I stepped past the breakpoint several times, but the stack trace always looked nearly the same. So I assume that when it really crashes, one of these calling routines must lead to the crash. I only tested it on the client side, though, and the crash happens in flserver, so what I wrote may be wrong anyway.

We tried several things, and this crash offset seems to be the only one remaining. We have been using vanilla surs on the server for some weeks, so normally they should not be related to it. Spheres are used for several things, so could it also have something to do with NPCs crashing into a planet or something like that? When we disabled NPCs for a while, the error didn't occur. So it's really annoying not to find the reason for this crash.

adoxa

After tracing it myself, it appears to be related to cmp reading - it seems to return the parent object. It’s been called from GetRoot, Hierarchy::GetDepth and CEGun::ComputeTurretFrame. It appears there’s either something wrong with your cmp file, or with something that uses it. I’m afraid I can’t be more specific, without knowing where it’s actually crashing. If you look at [eax+0x0C], that should point you to which object is going wrong. For example, my current breakpoint has EAX = 0xA478770, [0xA47877C] is 0x9FF2550; [0x9FF2554] is 0x9FF25A1, a pointer to “equipment\models\weapons\li_laser_beam.cmp”. [eax+0x08] is similar, pointing to the particular .3db within the .cmp.
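The pointer chain described above can be sketched as follows. Rather than reading real process memory, this uses a tiny hypothetical table holding only the two dwords from the example. That keeps the layout assumptions explicit: [EAX+0x0C] points to a file object, and that object's dword at +0x04 is the address of the .cmp name. The names read32, cmp_name_address and example_memory are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of a 32-bit address space: only the two dwords from the
   example above. Not a real memory reader. */
struct dword_at { uint32_t addr; uint32_t value; };

static const struct dword_at example_memory[] = {
    { 0x0A47877Cu, 0x09FF2550u },   /* [EAX+0x0C] -> file object        */
    { 0x09FF2554u, 0x09FF25A1u },   /* [obj+0x04] -> address of the name */
};

/* Returns 1 and stores the dword at addr, or 0 if it is not in the table. */
static int read32(uint32_t addr, uint32_t *out)
{
    for (size_t i = 0; i < sizeof example_memory / sizeof example_memory[0]; ++i)
        if (example_memory[i].addr == addr) {
            *out = example_memory[i].value;
            return 1;
        }
    return 0;
}

/* Follows [eax+0x0C] and then [+0x04] to the address of the .cmp file name.
   Returns 1 on success, 0 if either read fails. */
static int cmp_name_address(uint32_t eax, uint32_t *name_addr)
{
    uint32_t file_obj;
    if (!read32(eax + 0x0C, &file_obj))
        return 0;
    return read32(file_obj + 0x04, name_addr);
}
```

Starting from EAX = 0xA478770, the walk yields 0x9FF25A1, the address of the li_laser_beam.cmp name in that session. In WinDbg the same walk could be done at the breakpoint with something like da poi(poi(@eax+0xc)+4).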

adoxa

Here’s a plugin to log what’s happening with engbase at 0x124bd. Add it to dacomsrv.ini and you’ll get EXE\EngBase-0124BD-YYYY-MM-DD.hhmmss.txt (the time when the server was started). Since there’s a lot of data, I reset it every 100 calls, so there’s a slight possibility the crash will occur with no context. I also try another test for a bad address (thus preventing the crash); if it occurs, the file is renamed as *-bad_N_.txt (at least, I hope it is, didn’t actually test it).
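Two details of the plugin, the log file name and the periodic reset, can be illustrated with a small sketch. This is not the plugin's code. The names log_file_name, should_reset, call_log and RESET_EVERY are hypothetical. The reset policy shown truncates the log before the 101st entry, so the file always holds the last 1 to 100 calls, and it is only one way to "reset every 100 calls".

```c
#include <stddef.h>
#include <time.h>

/* Builds EngBase-0124BD-YYYY-MM-DD.hhmmss.txt from the server start time.
   Returns the length written, or 0 if the buffer is too small. */
static size_t log_file_name(char *buf, size_t size, const struct tm *start)
{
    return strftime(buf, size, "EngBase-0124BD-%Y-%m-%d.%H%M%S.txt", start);
}

enum { RESET_EVERY = 100 };   /* hypothetical name for the 100-call limit */

struct call_log { unsigned calls_since_reset; };

/* Call once per hooked call, before writing its entry. Returns 1 when the
   log file should be truncated first; the entry then starts a new batch. */
static int should_reset(struct call_log *cl)
{
    if (cl->calls_since_reset == RESET_EVERY) {
        cl->calls_since_reset = 1;
        return 1;
    }
    ++cl->calls_since_reset;
    return 0;
}
```

With this policy, a crash right after a reset leaves only one or two entries of context. That matches the caveat above about occasionally getting a crash with no useful history.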

? Offline

Now we sometimes get a crash at 000c45a2 in content.dll. Something is wrong with NPCs, but what?

adoxa

That’s a really strange address for a crash - cmp dword[ecx+34], 1 when there’s mov [ecx+2c], eax a few instructions earlier.

? Offline

adoxa wrote:
That’s a really strange address for a crash - cmp dword[ecx+34], 1 when there’s mov [ecx+2c], eax a few instructions earlier.

I use the patch from http://the-starport.net/freelancer/forum/viewtopic.php?post_id=31645#forumpost31645, but I think the crash comes from wrong encounter parameters.

adoxa

Ah, that explains it. I did a better patch in an IM: 0C457F, 9981E2FF->7411EB05. Don’t forget to undo the other one.
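When switching from one byte patch to another, it helps to check that the target bytes are what you expect before writing. That way a patch is never applied on top of a different one. The sketch below assumes 0C457F is a file offset into content.dll, already loaded into memory as a byte buffer; the post does not say whether it is a file offset or an address. apply_patch and the PATCH_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes `repl` over n bytes at `off` only if those bytes currently equal
   `expect`. Returns 0 on success, -1 if out of range or on a mismatch. */
static int apply_patch(uint8_t *image, size_t size, size_t off,
                       const uint8_t *expect, const uint8_t *repl, size_t n)
{
    if (off > size || n > size - off)
        return -1;
    if (memcmp(image + off, expect, n) != 0)
        return -1;                 /* unexpected bytes: refuse to write */
    memcpy(image + off, repl, n);
    return 0;
}

/* Values from the post above: 0C457F, 9981E2FF -> 7411EB05 (content.dll). */
static const size_t  PATCH_OFFSET = 0x0C457F;
static const uint8_t PATCH_OLD[4] = { 0x99, 0x81, 0xE2, 0xFF };
static const uint8_t PATCH_NEW[4] = { 0x74, 0x11, 0xEB, 0x05 };
```

Because the function refuses to write when the current bytes don't match, it fails rather than stacking a patch on bytes that something else has already changed. Calling it with PATCH_NEW and PATCH_OLD swapped undoes the patch.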

? Offline

adoxa wrote:
Ah, that explains it. I did a better patch in an IM: 0C457F, 9981E2FF->7411EB05. Don’t forget to undo the other one.

Undo #34 and apply this?