Crash Offsets
-
Robo, don’t think FLAC is anything but FLHook with modifications. It most likely has the same bugs that Hook does.
-
As clarified previously, FLAC was not derived from FLHook. They use a similar approach because that’s the approach that’s needed, but the sources are different enough that the two are completely separate from each other.
The crash offset being raised does not seem to be triggered by FLAC, and likely wouldn’t be by FLHook either, as the errors appear to be triggered by FL itself. Whether there’s an issue further back along the line with the hooking functionality that’s causing the circumstance, I can’t tell so far, and there’s little to suggest that is the case.
Unfortunately I’ve virtually exhausted the options from my own investigations so far (prior to the post), but I will continue to dig and will post if I figure out anything new.
-Alpha
-
That MS boot.ini workaround refers specifically to servers using AMD procs. We have an Intel proc. Would running that workaround cause any issues or help at all?
Going through player files as we speak. Anyone know of any tools that can identify a corrupted player file? I’m using FL Player Cleaner to clean everything up, but I’ve been using it for months and still have probs. Willing to try another tool.
-
DSAM does a good job, but I don’t know how dependent it is on FLHook to be able to do that.
-
Not only for AMD.
Robocop, try /ONECPU.
In Control Panel, where the power settings are, set all power settings to “on”.
In the BIOS, disable power management support for the CPU.
This is known as the Time Drift Bug.
On Linux it is the PM-Timer Bug.
-
HeIIoween wrote:
Not only for Amd
Robocop try /ONECPU
Enable in control panel - where power setting is - all power settings t “on”
Disable in BIOS power management support for CPU
This named Time Drift Bug
In Linux PM-Timer Bug
I’ve implemented the boot.ini workaround, we’ll see what that does for us.
@Helloween, for power options I have set ‘Always On’. Is that what you mean?
What is /ONECPU?
Next reboot I’ll disable BIOS power management support for CPU.
R
-
Well, daily misery report again…
Implemented that /usepmtimer suggestion in the boot.ini file, no joy. 18 minutes after reboot I got the triple .\HookFunction.cpp(887): *** ERROR: Exception in Hook_IServerImpl_TradeResponse (unhandled exception) message and everyone online got booted. So, the fact that it happens within 18 minutes of a reboot indicates to me that it’s not likely a memory leak, especially when the server often runs fine for hours. It’s after 1630 server time, so this is going to go on now every 15-20 minutes or so as long as two or more players are online. At least until midnight anyway…
AlphaWolf seems to think that traderesponse message is related to NPCs scanning something/someone but I don’t know. Nothing in that regard has been changed and this behavior happens whether the mod is activated or not…
-
/ONECPU - use only one CPU on a multiprocessor system
http://en.wikipedia.org/wiki/NTLDR
-
I still have one problem: flserver freezes with its window going white, while the FLHook window looks fine but of course doesn’t respond to any commands.
All the same: no logs, no events, nothing.
Because all my servers are on a physical machine, I don’t know how it will be after a restart with the /usepmtimer key in boot.ini.
I will also try disabling the APIC/ACPI functions in the BIOS; Google says it should help.
-
/usepmtimer and /onecpu did not help my problem.
Still crashing in dalib.dll.
What causes hash collisions?
-
Well, here’s my issue in a nutshell and the specific post which refers to my interest in the causes for a ‘hash collision’.
-
content.dll - 0x490a5 : formation errors, check faction_prop.ini formation values
-
I am still chasing this crash at engbase.dll (+0x0124bd). I attached a debugger and it always halts with this stack backtrace:
 # Memory ChildEBP RetAddr  Args to Child
00        00129544 066123db 00d1cec8 08e5c568 0012d5e4 EngBase+0x124bd
01     20 00129564 0661ae06 06612567 08e5c568 00c46318 EngBase+0x23db
02      4 00129568 06612567 08e5c568 00c46318 08e07aac EngBase+0xae06
03     a8 00129610 4fdf4e22 00002800 00000000 0b662220 EngBase+0x2567
04     14 00129624 4fd9c47d 0b662220 00000000 00000008 d3d9!CD3DDDIDX8::LockVB+0x32 (FPO: [2,0,0])
05     14 00129638 06d12996 0b662220 00006300 00000120 d3d9!CDriverVertexBuffer::Lock+0x4d (FPO: [5,0,4])
06   40a8 0012d6e0 7c9201db 77bfc3c9 00330000 00000000 RP8+0x12996
07    234 0012d914 77bfc3c9 00330000 00000000 77bfc3ce ntdll!RtlAllocateHeap+0xeac (FPO: [Non-Fpo])
08     40 0012d954 77bfc3e7 00000014 0012d970 77bf9cd4 msvcrt!_heap_alloc+0xe0 (FPO: [Non-Fpo])
09      c 0012d960 77bf9cd4 00000014 00000001 06be0038 msvcrt!_nh_malloc+0x13 (FPO: [2,0,0])
0a     10 0012d970 06b73f73 00000014 00000000 09061c28 msvcrt!operator new+0xf (FPO: [1,0,0])
0b     14 0012d984 06b71935 0012d9a8 090603c0 0012d9ac ReadFile+0x3f73
0c        00000000 00000000 00000000 00000000 00000000 ReadFile+0x1935
As far as I could find out, the ReadFile has something to do with hudframe021004123005 << dunno, maybe something from there…
Does anyone have an idea what is done here? Does it have something to do with graphical problems due to d3d9!CD3DDDIDX8??
–> that seems to be addressed when the server connection is lost
Correction: the stack backtrace is:
 # Memory ChildEBP RetAddr  Args to Child
WARNING: Stack unwind information not available. Following frames may be wrong.
00        00127434 0628a428 00d1cec8 0932d098 00000000 EngBase+0x124bd
01     18 0012744c 06293cce 0932d098 00000000 0932cff0 Common!PhySys::FindSphereCollisions+0x438
02     2c 00127478 0629ed87 00127464 77c05c94 001274f0 Common!CNonPhysAttachment::Disconnect+0x26e
03     1c 00127494 77bfc2e3 0932cff0 0629ed87 00000000 Common!CExternalEquip::IsConnected+0x7
04     70 00127504 0629b344 00000001 093292bc 093291d8 msvcrt!free+0xc8 (FPO: [Non-Fpo])
05     20 00127524 062a922e 093293a8 093291d8 09329548 Common!CEquipManager::Clear+0x94
06     34 00127558 062b0c21 00000000 093291d8 093233d4 Common!CEqObj::~CEqObj+0x9e
07     2c 00127584 06288222 00000000 00000000 062af65b Common!CShip::~CShip+0x271
08      c 00127590 062af65b 00000001 09323344 00539599 Common!BaseWatcher::~BaseWatcher+0x52
09        00000000 00000000 00000000 00000000 00000000 Common!CObject::Release+0x1b
I still have no clue what happens in engbase.dll. Is there a way to find out what routines are working there? The disassembly looks like this.
The crash offset is this line:
066224bd 8b4010 mov eax,dword ptr [eax+10h] ds:0023:09054e20=00000000
066224a9 90              nop
066224aa 90              nop
066224ab 90              nop
066224ac 90              nop
066224ad 90              nop
066224ae 90              nop
066224af 90              nop
066224b0 8b442408        mov     eax,dword ptr [esp+8]
066224b4 83f8ff          cmp     eax,0FFFFFFFFh
066224b7 740b            je      EngBase+0x124c4 (066224c4)
066224b9 85c0            test    eax,eax
066224bb 7407            je      EngBase+0x124c4 (066224c4)
066224bd 8b4010          mov     eax,dword ptr [eax+10h] ds:0023:09054e20=00000000
066224c0 85c0            test    eax,eax
066224c2 7503            jne     EngBase+0x124c7 (066224c7)
066224c4 83c8ff          or      eax,0FFFFFFFFh
066224c7 c20800          ret     8
066224ca 90              nop
066224cb 90              nop
066224cc 90              nop
066224cd 90              nop
066224ce 90              nop
066224cf 90              nop
Any hints would be appreciated. I am bad at understanding this assembler stuff ;(
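For anyone else who finds the assembler hard to follow, here is a rough Python model of what the instructions between the nop padding appear to do. It is only a sketch: the names `get_field_0x10`, `read_dword` and `AccessViolation`, and the `memory` dict standing in for process memory, are made up for illustration and are not EngBase's real names.

```python
ERROR = 0xFFFFFFFF  # the -1 value the routine tests for and also returns


class AccessViolation(Exception):
    """Raised by the model when an unmapped address is read."""


def read_dword(memory, address):
    # Hypothetical helper: `memory` maps addresses to 32-bit values.
    if address not in memory:
        raise AccessViolation(hex(address))
    return memory[address]


def get_field_0x10(memory, arg):
    # mov eax,[esp+8] / cmp eax,-1 / je / test eax,eax / je
    if arg == ERROR or arg == 0:
        return ERROR
    # mov eax,[eax+10h]  <- EngBase+0x124bd, the reported crash offset
    value = read_dword(memory, arg + 0x10)
    # test eax,eax / jne / or eax,-1 / ret 8
    return value if value != 0 else ERROR


# Values from the debugger line: the dword at 09054e20 held 00000000,
# so eax was 09054e10 when the breakpoint was hit.
memory = {0x09054E20: 0x00000000}
result = get_field_0x10(memory, 0x09054E10)
```

In this model the routine only guards against 0 and -1. Any other pointer that does not refer to readable memory would fault on the read at +0x10, which is the instruction at the crash offset.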
-
Are these servers running mods at all? Some of these, if they are mod-related, sound like things that simple testing should have picked up and recognised…
I know it’s a non-contributory post, it’s just that this whole thread seemed an interesting concept and made me immediately suspect it would replace testing and experience (not trying to offend folks )
-
@Huor: FindSphereCollisions could suggest a sur problem. Then again, looking at it, it looks like the warning is right, so I wouldn’t put too much credence on the trace. It’s a really strange error, since it already tests for ERROR and NULL, so eax appears legitimate, but is not. Furthermore, it looks like it’s telling you what is there, so it really is legitimate, so where’s the error coming from? Or is this just from a breakpoint, not a crash? Is there something I can test myself, or a remote connection?
-
I am not a server operator, just someone who might understand a bit of that coding stuff, but not at the level of Adoxa ;D
The stack backtrace was taken at a breakpoint; I set the breakpoint at the offset where the server has been crashing (engbase.dll + 0x0124bd), which we have been hunting for some weeks now. I stepped past the breakpoint several times, and the stack trace always looked nearly the same. So I assume that when it really crashes, it must be one of these calling routines that leads to the crash. I also tested it only on the client side, while the crash happens in flserver, so what I wrote may be wrong anyway.
We tried several things and this crash offset seems to be the only one remaining. We have been using vanilla surs on the server for some weeks, so normally it should not be related to that. Spheres are used for several things, so could it also have something to do with NPCs crashing into a planet or something like that? When we disabled NPCs for some time, the error wasn’t there. So it’s really annoying not to find the reason for this crash.
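To make the offset arithmetic explicit: the absolute addresses in the debugger output and the `EngBase+0x...` offsets differ by the address the module was loaded at in that session. A small sketch using only values posted earlier in the thread; the function names are mine.

```python
def module_base(absolute_address, offset):
    # Load address of the module = absolute address minus module offset.
    return absolute_address - offset


def to_absolute(base, offset):
    # Where a given module offset ends up in this particular session.
    return base + offset


crash_va = 0x066224BD    # address of the faulting mov in the disassembly
crash_offset = 0x124BD   # engbase.dll + 0x0124bd
base = module_base(crash_va, crash_offset)

# Frame 01 of the first trace is EngBase+0x23db; its absolute address
# should equal the RetAddr shown on frame 00 (066123db).
ret_va = to_absolute(base, 0x23DB)
```

The base changes whenever the DLL is loaded somewhere else, so a breakpoint has to be set at module base plus 0x124bd for the current session rather than at a fixed absolute address.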
-
After tracing it myself, it appears to be related to cmp reading - it seems to return the parent object. It’s been called from GetRoot, Hierarchy::GetDepth and CEGun::ComputeTurretFrame. It appears there’s either something wrong with your cmp file, or with something that uses it. I’m afraid I can’t be more specific, without knowing where it’s actually crashing. If you look at [eax+0x0C], that should point you to which object is going wrong. For example, my current breakpoint has EAX = 0xA478770, [0xA47877C] is 0x9FF2550; [0x9FF2554] is 0x9FF25A1, a pointer to “equipment\models\weapons\li_laser_beam.cmp”. [eax+0x08] is similar, pointing to the particular .3db within the .cmp.
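The pointer chase in that example can be written out as follows. This is only a sketch: `memory` and `strings` are plain dicts standing in for a debugger's memory view, the helper names are hypothetical, and the +0x04 step is inferred from the example addresses (0x9FF2554 is 0x9FF2550 + 4) rather than from any known structure layout.

```python
def read_dword(memory, address):
    # Hypothetical stand-in for reading 4 bytes in the debugger.
    return memory[address]


def model_path_for(memory, strings, eax):
    owner = read_dword(memory, eax + 0x0C)       # [eax+0x0C]
    name_ptr = read_dword(memory, owner + 0x04)  # [owner+4], inferred offset
    return strings[name_ptr]                     # string at that address


# Values quoted in the post above.
memory = {
    0x0A47877C: 0x09FF2550,  # [0xA478770 + 0x0C]
    0x09FF2554: 0x09FF25A1,  # [0x9FF2550 + 0x04]
}
strings = {0x09FF25A1: "equipment\\models\\weapons\\li_laser_beam.cmp"}

path = model_path_for(memory, strings, 0x0A478770)
```

Doing the same walk by hand at the moment of the crash, starting from the EAX value the debugger reports, should name the .cmp file involved, provided the pointers are still intact at that point.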
-
Here’s a plugin to log what’s happening with engbase at 0x124bd. Add it to dacomsrv.ini and you’ll get EXE\EngBase-0124BD-YYYY-MM-DD.hhmmss.txt (the time when the server was started). Since there’s a lot of data, I reset it every 100 calls, so there’s a slight possibility the crash will occur with no context. I also try another test for a bad address (thus preventing the crash); if it occurs, the file is renamed as *-bad_N_.txt (at least, I hope it is, didn’t actually test it).
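The plugin's source is not shown here, so the following is only a sketch of the scheme as described: keep context for recent calls, start over every 100 calls, and flag a bad address instead of reading it. The class and method names are hypothetical, and where the real plugin is said to write and rename a file, this model just counts.

```python
class CallLog:
    """Keeps context for recent calls, cleared every `limit` calls."""

    def __init__(self, limit=100):
        self.limit = limit
        self.entries = []
        self.bad_hits = 0

    def record(self, address, is_readable):
        # Start over once `limit` entries have been collected.
        if len(self.entries) >= self.limit:
            self.entries.clear()
        self.entries.append(address)
        if not is_readable:
            # The described plugin renames its log file at this point.
            self.bad_hits += 1
            return False  # tell the caller to skip the read
        return True


log = CallLog(limit=100)
for i in range(250):
    log.record(0x1000 + i, True)
ok = log.record(0xDEAD0000, False)
```

Right after a reset the log holds a single entry, which is the "no context" case mentioned above: a bad call arriving at that moment would be recorded with nothing before it.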