Crash Offsets
-
Well, the hotfix didn’t work and neither did a reinstall with WinXPsp2 only.
Still crashing in dalib.dll at 00004353.
I’m willing to kill for a solution…
Also, during the freeze/crashes, procmon shows buffer overflows in csrss.exe and lsass.exe as well as a bunch of dxdiag stuff flooding the monitor.
That help at all?
-
I might have a fix for this problem: “FLServer locks up every 10-20 minutes, most commonly after about 18 minutes. It tends to lock up more often if flhook event socket stuff is happening. I have both DSPM and DAM connected to flhook doing unicode event socket traffic. DirectPlay traffic still works but no game events are passed.”
The lock up is caused by the thread CRemotePhysicsSimulation+0x2660 ceasing to do stuff. I think this is the main FLServer loop. FLhook event processing is called from it and so if this thread stops, the flhook socket stuff will stop too.
Reviewing one of the stack traces, I suspect that the problem might be related to a QueryPerformanceCounter call. On further investigation, it might be possible for this to return an invalid time, potentially causing FLServer to wait a very long time before doing any processing. I haven’t actually proven that this is happening, but it seems possible, particularly on multi-core processors and/or when running in virtual machines.
Adding /usepmtimer to your boot.ini as described in this article might work around this problem. See http://support.microsoft.com/kb/895980 for instructions.
I’ve applied this change and so far, I’m on 1 hour 30 minutes of no crashes. I’ll edit this post if it really does fix it or not.
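To illustrate the suspected failure mode: if a high-resolution counter occasionally jumps backwards or far forwards, a loop that trusts the raw difference can end up waiting for a very long time. The sketch below is not FLServer's code; `safe_deltas`, `frequency` and `max_step` are hypothetical names, and the readings are made-up numbers. It only shows the general idea of clamping each elapsed interval to a sane range.

```python
def safe_deltas(readings, frequency, max_step=0.5):
    """Convert raw counter readings into elapsed seconds per tick.

    Hypothetical helper: negative intervals (counter went backwards) are
    treated as 0, and implausibly large jumps are capped at max_step seconds.
    """
    deltas = []
    previous = readings[0]
    for current in readings[1:]:
        elapsed = (current - previous) / frequency
        if elapsed < 0.0:
            elapsed = 0.0          # counter went backwards: ignore the jump
        elif elapsed > max_step:
            elapsed = max_step     # counter leapt ahead: cap the wait
        deltas.append(elapsed)
        previous = current
    return deltas


# Made-up readings at an assumed 10 kHz counter: one backward jump, one huge leap.
print(safe_deltas([1000, 2000, 1500, 2500, 10_000_000], 10_000))
```

Without the clamping, the last interval in that example would be roughly 1000 seconds, which is the kind of stall described above.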
-
Still crashing in dalib.dll at 00004353
According to the code, this crash apparently happens when the server starts hosting. It is in the function call CDPServer::GetHostAddresses().
I seem to recall having this crash on GC ages ago. I believe it was related to a corrupt character file causing a stack corruption, which somehow ended up in this function. I’m sorry I’m not more certain than this.
-
I’m not sure your first suggestion about the boot.ini applies to us as we don’t use FLHook.
The second interests me, but I know each player character has been ‘cleaned’ by FL PlayerCleaner on more than one occasion, and FLAC is not identifying any attempts to log in with corrupt player chars.
So, short of doing a player wipe, how do you find the corrupt player char(s) if that’s indeed the problem?
-
Robo, don’t think FLAC is anything but FLHook with modifications. It most likely has the same bugs that Hook does.
-
As clarified previously, FLAC didn’t form from FLHook. They both use a similar approach because that’s the approach that’s needed but the sources are significantly different enough for them to be completely separate of each other.
The crash offset being raised doesn’t seem to be triggered by FLAC, and likely wouldn’t be by FLHook either, as the errors appear to be triggered by FL itself. Whether there’s an issue further back along the line with the hooking functionality that’s causing the circumstance, I can’t tell so far, and there’s little to suggest that that is the case.
Unfortunately I’ve virtually exhausted options from my own investigations so far (prior to the post) but will continue to dig and will post if there’s anything new I figure out.
-Alpha
-
That MS boot.ini workaround refers specifically to servers using AMD procs. We have an Intel proc. Would running that workaround cause any issues or help at all?
Going through player files as we speak. Anyone know of any tools that can identify a corrupted player file? I’m using FL Player Cleaner to clean everything up, but I’ve been using it for months and still have problems. Willing to try another tool.
-
DSAM does a good job, but I don’t know how dependent it is on FLHook to be able to do that.
-
Not only for AMD.
Robocop, try /ONECPU.
In Control Panel, where the power settings are, set all power settings to “on”.
In the BIOS, disable power management support for the CPU.
This is called the Time Drift Bug.
On Linux it is the PM-Timer Bug.
-
HeIIoween wrote:
Not only for AMD.
Robocop, try /ONECPU.
In Control Panel, where the power settings are, set all power settings to “on”.
In the BIOS, disable power management support for the CPU.
This is called the Time Drift Bug.
On Linux it is the PM-Timer Bug.
I’ve implemented the boot.ini workaround, we’ll see what that does for us.
@Helloween, for power options I have set ‘Always On’. Is that what you mean?
What is /ONECPU?
Next reboot I’ll disable BIOS power management support for CPU.
R
-
Well, daily misery report again…
Implemented that /usepmtimer suggestion in the boot.ini file, no joy. 18 minutes after reboot I got the triple .\HookFunction.cpp(887): *** ERROR: Exception in Hook_IServerImpl_TradeResponse (unhandled exception) message and everyone online got booted.
So, the fact that it happens within 18 minutes of a reboot indicates to me that it’s not likely a memory leak, especially when the server often runs fine for hours. It’s after 1630 server time, so this is going to go on now every 15-20 minutes or so as long as two or more players are online. At least until midnight anyway…
AlphaWolf seems to think that traderesponse message is related to NPCs scanning something/someone but I don’t know. Nothing in that regard has been changed and this behavior happens whether the mod is activated or not…
-
/ONECPU - use one CPU on multiprocessor system
http://en.wikipedia.org/wiki/NTLDR
-
I still have one problem: FLServer freezes with its window whiting out, while the FLHook window looks fine but of course does not respond to commands.
All the same: no logs, no events, nothing.
Because all my servers are on a physical machine, I don’t know how it will behave after a restart with the /usepmtimer switch in boot.ini.
I will also try disabling the APIC/ACPI functions in the BIOS. Google says it should help.
-
/usepmtimer and /onecpu did not help my problem.
Still crashing in dalib.dll.
What causes hash collisions?
-
Well, here’s my issue in a nutshell and the specific post which refers to my interest in the causes for a ‘hash collision’.
-
content.dll - 0x490a5 : formation errors, check faction_prop.ini formation values
-
I am still chasing this crash at engbase.dll (+0x0124bd). I attached a debugger and it always halts at this stack backtrace:
# Memory ChildEBP RetAddr Args to Child
00 00129544 066123db 00d1cec8 08e5c568 0012d5e4 EngBase+0x124bd
01 20 00129564 0661ae06 06612567 08e5c568 00c46318 EngBase+0x23db
02 4 00129568 06612567 08e5c568 00c46318 08e07aac EngBase+0xae06
03 a8 00129610 4fdf4e22 00002800 00000000 0b662220 EngBase+0x2567
04 14 00129624 4fd9c47d 0b662220 00000000 00000008 d3d9!CD3DDDIDX8::LockVB+0x32 (FPO: [2,0,0])
05 14 00129638 06d12996 0b662220 00006300 00000120 d3d9!CDriverVertexBuffer::Lock+0x4d (FPO: [5,0,4])
06 40a8 0012d6e0 7c9201db 77bfc3c9 00330000 00000000 RP8+0x12996
07 234 0012d914 77bfc3c9 00330000 00000000 77bfc3ce ntdll!RtlAllocateHeap+0xeac (FPO: [Non-Fpo])
08 40 0012d954 77bfc3e7 00000014 0012d970 77bf9cd4 msvcrt!_heap_alloc+0xe0 (FPO: [Non-Fpo])
09 c 0012d960 77bf9cd4 00000014 00000001 06be0038 msvcrt!_nh_malloc+0x13 (FPO: [2,0,0])
0a 10 0012d970 06b73f73 00000014 00000000 09061c28 msvcrt!operator new+0xf (FPO: [1,0,0])
0b 14 0012d984 06b71935 0012d9a8 090603c0 0012d9ac ReadFile+0x3f73
0c 00000000 00000000 00000000 00000000 00000000 ReadFile+0x1935
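For anyone comparing crash offsets between machines, the trace above can be used to work out where EngBase is loaded. This is a rough sketch, assuming each RetAddr value belongs to the module+offset shown on the following frame (the numbers in the trace are consistent with that reading); the helper name `module_base` is made up for illustration.

```python
def module_base(address, offset):
    """Load address of a module, given an absolute address and its module-relative offset."""
    return address - offset


# (RetAddr, offset on the next frame) pairs copied from the trace above.
pairs = [
    (0x066123DB, 0x23DB),  # frame 00 returns into EngBase+0x23db
    (0x0661AE06, 0xAE06),  # frame 01 returns into EngBase+0xae06
    (0x06612567, 0x2567),  # frame 02 returns into EngBase+0x2567
]

bases = {module_base(addr, off) for addr, off in pairs}
print([hex(b) for b in bases])  # one value means the pairs agree on the load address
```

With the load address known, any absolute address inside the module can be turned back into an offset by subtracting it, which is what makes offsets like +0x124bd comparable across servers.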
As far as I could find out, the ReadFile part has something to do with hudframe021004123005. I don’t know, maybe something comes from there…
Does anyone have an idea what is done here? Does it have something to do with graphical problems, because of d3d9!CD3DDDIDX8?
–> that seems to be addressed when the server connection is lost
Correction: the stack backtrace is:
# Memory ChildEBP RetAddr Args to Child
WARNING: Stack unwind information not available. Following frames may be wrong.
00 00127434 0628a428 00d1cec8 0932d098 00000000 EngBase+0x124bd
01 18 0012744c 06293cce 0932d098 00000000 0932cff0 Common!PhySys::FindSphereCollisions+0x438
02 2c 00127478 0629ed87 00127464 77c05c94 001274f0 Common!CNonPhysAttachment::Disconnect+0x26e
03 1c 00127494 77bfc2e3 0932cff0 0629ed87 00000000 Common!CExternalEquip::IsConnected+0x7
04 70 00127504 0629b344 00000001 093292bc 093291d8 msvcrt!free+0xc8 (FPO: [Non-Fpo])
05 20 00127524 062a922e 093293a8 093291d8 09329548 Common!CEquipManager::Clear+0x94
06 34 00127558 062b0c21 00000000 093291d8 093233d4 Common!CEqObj::~CEqObj+0x9e
07 2c 00127584 06288222 00000000 00000000 062af65b Common!CShip::~CShip+0x271
08 c 00127590 062af65b 00000001 09323344 00539599 Common!BaseWatcher::~BaseWatcher+0x52
09 00000000 00000000 00000000 00000000 00000000 Common!CObject::Release+0x1b
I still have no clue what happens in engbase.dll. Is there a way to find out what routines are working there? The crash offset is this line:
066224bd 8b4010 mov eax,dword ptr [eax+10h] ds:0023:09054e20=00000000
The surrounding disassembly looks like this:
066224a9 90 nop
066224aa 90 nop
066224ab 90 nop
066224ac 90 nop
066224ad 90 nop
066224ae 90 nop
066224af 90 nop
066224b0 8b442408 mov eax,dword ptr [esp+8]
066224b4 83f8ff cmp eax,0FFFFFFFFh
066224b7 740b je EngBase+0x124c4 (066224c4)
066224b9 85c0 test eax,eax
066224bb 7407 je EngBase+0x124c4 (066224c4)
066224bd 8b4010 mov eax,dword ptr [eax+10h] ds:0023:09054e20=00000000
066224c0 85c0 test eax,eax
066224c2 7503 jne EngBase+0x124c7 (066224c7)
066224c4 83c8ff or eax,0FFFFFFFFh
066224c7 c20800 ret 8
066224ca 90 nop
066224cb 90 nop
066224cc 90 nop
066224cd 90 nop
066224ce 90 nop
066224cf 90 nop
Any hints would be appreciated. I am bad at understanding this assembler stuff ;(
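For readers who find the assembler hard to follow, the routine quoted above is short enough to restate in a higher-level form. This is only a reading of those few instructions, not the real EngBase source: the name `lookup`, the `memory` dictionary standing in for process memory, and the interpretation of the argument as a pointer or handle are all assumptions. The function takes its second stack argument, returns -1 if it is -1 or 0, otherwise reads the dword at offset 0x10 from it and returns that (or -1 if the dword is 0).

```python
INVALID = 0xFFFFFFFF  # what "or eax,0FFFFFFFFh" leaves in eax, i.e. -1


def lookup(memory, handle):
    """Sketch of the routine at EngBase+0x124b0 (hypothetical name).

    memory is a dict of address -> dword used to model readable memory;
    a missing key plays the role of an access violation.
    """
    # cmp eax,0FFFFFFFFh / je   and   test eax,eax / je
    if handle == INVALID or handle == 0:
        return INVALID
    # mov eax,dword ptr [eax+10h]  <- the instruction at the crash offset
    value = memory[handle + 0x10]
    # test eax,eax / jne, otherwise fall through to "or eax,0FFFFFFFFh"
    return value if value != 0 else INVALID


# Address taken from the debugger line: eax+10h = 09054e20, so eax = 09054e10.
print(hex(lookup({0x09054E20: 0x00000000}, 0x09054E10)))
```

The routine guards against a null or -1 argument but cannot detect a pointer that is non-zero yet no longer valid, so one possible explanation (a guess, not a diagnosis) is that it is being handed a stale pointer, which would fit the object destructors visible in the corrected trace.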
-
Are these servers running mods at all? Some of these crashes, if they come from mods, sound like simple testing should have picked them up and recognised them…
I know it’s a non-contributory post; this whole thread just seemed an interesting concept and made me immediately suspect it’d replace testing and experience (not trying to offend folks).