Low Voltage, Intermittent Communication, No Regeneration - $1,200 EGR Valve?
A case study of diagnostic strategy and working with what you have. The make, model, or product line (on road or off) doesn't matter.
John Deere 244K-II (FT4) - 1LU244KTVZB046717 w/ Yanmar 4TNV98CT engine.
Before we delve into the laundry list of codes, let me give you a little info here. The 244K-II is a machine made for John Deere by Liebherr and equipped with a Yanmar engine. So we have a few different languages going on here, which can make things interesting at times.
Codes & Brief Description (the description is pretty much all I get in the service information - code set parameter documentation is almost non-existent):
- P068A - ECU Main Relay Early Opening (main relay is internal to the ECU) (*turning battery disconnect off with machine running will cause this)
- P1424 - DPF OP Interface Back-Up Mode
- P1421 - Stationary Regeneration Standby
- P2459 - Regen Defect (Stationary Regen Not Performed)
- U1303 - Y_DPFIF CAN Message Reception Timeout
- … - CCU Improper Shutdown
- … - CAN 1 Abnormal Update Rate
- … - CAN 2 Abnormal Update Rate
- … - CCU Operating Hours Not Saved
- … - Battery Voltage Low - Engine Running
- … - CCU EEPROM Error
This case study starts out, as most of mine do, with a phone call from a road technician. The customer had called in and requested a forced service regeneration because the machine had de-rated and was essentially useless for loading salt. It is a fairly common occurrence for operators to short trip machines like this or inhibit regeneration due to a lack of understanding of how to properly run the machine. The technician hooked up Service Advisor to the machine and was getting ready to start regeneration when he noticed a laundry list of codes popping up, a red STOP Engine light, and an incredibly annoying beep (all designed to force an operator to stop; sadly, it's rarely enough).
JD is pretty convenient with certain machines: they have a test called the DTC Priority test. It takes a list of codes such as this one and cross-references them. It then gives you a code to begin with and filters down so that you're focused on direct faults instead of a fault code caused by another fault code. This machine was not one of them, and you'll notice that they aren't labeled in the DTC list; that means manual look-up. I recognized the … (battery voltage low - engine running), so I had the tech check out the belt drive, charging system, and battery while I pulled up the descriptions. The engine load profile was also brought up at this time, as the initial complaint was regeneration related.
As you can see, this machine has spent the vast majority of its life idling with little to no load on it. That is a big no-no on these new engines with DOC/DPF and SCR (selective catalytic reduction). I got a call back from the tech at this point: the alternator drive belt was loose and battery voltage was down to 11.5V with the engine running. He tightened the belt and cleared the codes (we both agreed that codes relating directly or indirectly to low voltage would likely not re-occur). The low voltage code would be our … (low battery voltage). This caused our P068A (main relay opening early due to low voltage), which in turn caused our … (improper shutdown, as it has a cycle-down time after key off to store all its data), … (EEPROM error), and … (CCU operating hours not saved). The CCU (chassis control unit) is fed 5V from the ECU and remains powered up after the key is switched off to save data. If it loses power (due to the ECU losing power) it will throw all sorts of codes.
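The grouping logic in the paragraph above can be sketched in a few lines of Python. This is only an illustration: the cause map is hand-built from the reasoning in this write-up, not pulled from any Deere, Liebherr, or Yanmar service data, and the short labels are made-up names standing in for the codes.

```python
# Hypothetical cause map built from the reasoning in this case study.
# Key = fault, value = the fault believed to have caused it.
# The labels are invented for illustration; they are not real DTC names.
CAUSED_BY = {
    "ecu_main_relay_early_opening": "battery_voltage_low",  # P068A
    "ccu_improper_shutdown": "ecu_main_relay_early_opening",
    "ccu_eeprom_error": "ecu_main_relay_early_opening",
    "ccu_hours_not_saved": "ecu_main_relay_early_opening",
}


def root_cause(fault, caused_by=CAUSED_BY):
    """Follow the cause chain until reaching a fault with no known cause."""
    seen = set()
    while fault in caused_by and fault not in seen:
        seen.add(fault)  # guard against an accidental loop in the map
        fault = caused_by[fault]
    return fault


def group_by_root(active_faults, caused_by=CAUSED_BY):
    """Group active faults under the root fault each one traces back to."""
    groups = {}
    for fault in active_faults:
        groups.setdefault(root_cause(fault, caused_by), []).append(fault)
    return groups


if __name__ == "__main__":
    active = [
        "battery_voltage_low",
        "ecu_main_relay_early_opening",
        "ccu_improper_shutdown",
        "ccu_eeprom_error",
        "ccu_hours_not_saved",
        "dpfif_can_timeout",  # U1303, no known cause in the map
    ]
    for root, members in group_by_root(active).items():
        print(root, "->", members)
```

Anything that does not trace back to a known cause, like the U1303 here, ends up in its own group, and that is the group left to chase once the belt is tight and the battery is charged.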
After charging the battery up I had the tech perform a Tuple Error Correction test (internal computer memory correction essentially), reprogram of the CCU with the updated Liebherr software, and finally a Chassis Configuration Test (this ensures that the CCU is operational and correctly configured for the specific machine). The only codes that re-occurred were our regeneration defect codes, our U1303, and our CAN abnormal update rate codes. Watching some live data recordings from the tech didn't bring anything to my attention, so I packed up some equipment and headed out to meet.
I knew from the theory of operation that the exhaust after-treatment system is on its own leg of the CANbus (CAN 1). I hadn't seen any data to suggest that any components weren't working, so I pulled a schematic and found CAN 1 only runs to the EGR valve. The EGR valve has power, ground, CAN H, and CAN L wires. Nothing else is on that leg. Bi-directional control of the EGR valve was attempted but unsuccessful (I believed at the time that this was due to the regen codes locking out some of the controls as a failsafe, because EGR is commanded fully closed to perform regeneration). I was able to initiate a service regeneration, and this gave me a solid 2 hours to think about my direction and the system. I decided that we would attempt bi-directional controls again (they failed).
Seeing no other real direction, and wanting to get rid of communication errors before chasing anything else, I brought out the scope. Watching the CAN data, back-probed at the EGR valve, I didn't see anything out of the ordinary. I attempted bi-directional control once more while monitoring the CAN data and I saw this waveform. I am no network communication expert but I didn't like how it looked, and I also didn't like how I had no control over the valve. So I unplugged the valve and immediately the signal went back to a "standard" CAN signal. Plugging the valve back in caused the signal to continue to degrade. I unplugged the valve once again after this and continued on.
I cleared all codes and the only one that re-occurred was U1303 - CAN message timeout. Feeling 90% confident I next-day aired a new EGR valve and gaskets in. Made the repair and scoped the signal again. It was exactly what I expected to see.
So a confirmed issue where a CANbus EGR valve was able to scramble the network enough to cause communication errors. Definitely a first for me. Those more adept at interpreting network waveforms may be able to confirm or deny how much difference that network signal has. I know I replaced the EGR valve, loaded salt spreaders with the machine for about an hour and had no fault re-occurrences. I now also had bi-directional control of the EGR valve, as well as proper computer controlled operation while running.
Final verification was putting the machine into regeneration from the operator's seat; it went in and passed perfectly. Full repair verification complete.
Lesson here is that even when faced with a scary amount of codes there is no need to get nervous. In fact it is just more data pointing you to the right direction. The more strange the failure is, the easier it can be to track down. How many more variables are there to a simple low power complaint with no codes compared to a list of 13 codes all pointing in a general direction? By studying theory of operation and using logic we were able to knock out a lot of codes just by grouping them and tightening a loose belt. Then we were able to track down to one specific component based off a schematic. Using some network and tooling knowledge we were able to get to a relatively confident diagnosis and then confirm as best as we could given the lack of concrete SI.
I promise one day I will post a nice short 1 paragraph tech tip. Thanks to everyone who trudges through these write-ups and I hope they are useful in some way, shape or form.
Thanks for the write up!
Hey Chris, great write up. I would rather, as you said, "trudge through" a long write up with lots of detail than read a short write up lacking pertinent information. Thanks for taking the time.
I appreciate that. I've written them up for years for my own reference. I'm happy to be able to contribute them for others' use.
Chris, I wish the Verus had better resolution... but the first waveform looks like the other modules are trying to transmit the so-called "error frame" upon detecting a garbled message previously. Sort of like CAN bus SOS. If you connected headphones to the bus, you would hear ABBA playing. Umm, I'll see myself out.
I agree that the Verus isn't the best, but it does serve the purpose for my HD stuff. The time base for both is 1ms per screen. I've never been a fan of only being able to zoom out on Snap-On scope captures. A Pico would have been infinitely easier to get good captures with. One day... I think the main problem was how the signal would stay high or low without a full transition. I
Excellent observation, Chris -- there are lots of long pulses in the "bad waveform". However, it seems to be intentional: "every bit stream of more than 5 bits of the same polarity, dominant or recessive, is considered an error condition. As a matter of fact, CAN uses this rule to send an error frame, which contains of (minimum) 6 consecutive dominant bits. Each node in the network will
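The rule quoted above can be tried out on a decoded bit stream with a short Python sketch. It assumes you already have the bus decoded into a list of bits (0 = dominant, 1 = recessive), which is not something the scope captures in this thread provide directly; the function name is made up for illustration. Keep in mind the stuffing rule only applies inside the stuffed part of a frame, so long recessive runs at end of frame or while the bus sits idle are normal and not errors.

```python
def find_long_runs(bits, limit=5):
    """Find runs of identical bits longer than `limit`.

    `bits` is a sequence of 0 (dominant) and 1 (recessive) values.
    Returns a list of (start_index, run_length, bit_value) tuples.
    With the default limit of 5, a run of 6 or more is reported,
    which is what a CAN receiver would flag inside a stuffed frame.
    """
    runs = []
    run_start = 0
    for i in range(1, len(bits) + 1):
        # A run ends at the end of the data or when the bit value changes.
        if i == len(bits) or bits[i] != bits[run_start]:
            length = i - run_start
            if length > limit:
                runs.append((run_start, length, bits[run_start]))
            run_start = i
    return runs


if __name__ == "__main__":
    ok_frame = [1, 0, 1, 0, 0, 0, 0, 0, 1, 0]   # five dominant bits, then a stuff bit
    error_flag = [1, 0, 0, 0, 0, 0, 0, 1]        # six dominant bits in a row
    print(find_long_runs(ok_frame))
    print(find_long_runs(error_flag))
```

A run of six dominant bits in the middle of traffic is what an error flag would look like after decoding, so a capture full of them would fit the "CAN bus SOS" description rather than normal data.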
I'll read up on that more. Thanks Dmitriy. I'll have to think about the variables here some more as they have been bothering me a bit. One way to confirm would be to log that CAN message and then inject it back into the comm network and look for what results occur. Message priority might be a concern but it will flag eventually I would imagine. I've only played with CANbus data logging on my
Chris, I think the periods of time "without" full transitions is more indicative of a failing transceiver. It seems to me that the error frame transmissions should still modulate at the normal voltage levels. If anyone knows that to be incorrect please reply.
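One rough way to put a number on "without full transitions" is to look at the differential voltage (CAN H minus CAN L) sample by sample. The sketch below is only an assumption-laden illustration: it uses the usual high-speed CAN receiver thresholds (about 0.9 V and above reads dominant, about 0.5 V and below reads recessive), and it assumes the scope data has been exported as pairs of CAN H and CAN L voltages, which may not be practical with every scope. The function names are made up.

```python
# Typical high-speed CAN receiver thresholds on the differential voltage.
# Treat these as assumptions; check the transceiver data sheet for the real part.
DOMINANT_MIN_V = 0.9
RECESSIVE_MAX_V = 0.5


def classify_sample(can_h, can_l):
    """Classify one sample by its differential voltage (CAN H - CAN L)."""
    vdiff = can_h - can_l
    if vdiff >= DOMINANT_MIN_V:
        return "dominant"
    if vdiff <= RECESSIVE_MAX_V:
        return "recessive"
    return "undefined"  # neither level: the receiver cannot rely on this


def undefined_fraction(samples):
    """Fraction of (can_h, can_l) samples sitting between the two levels."""
    if not samples:
        return 0.0
    bad = sum(1 for h, l in samples if classify_sample(h, l) == "undefined")
    return bad / len(samples)


if __name__ == "__main__":
    healthy = [(3.5, 1.5), (2.5, 2.5), (3.5, 1.5), (2.5, 2.5)]
    weak = [(3.5, 1.5), (2.5, 2.5), (2.9, 2.2), (2.9, 2.2)]
    print(undefined_fraction(healthy))
    print(undefined_fraction(weak))
```

Every edge passes through the undefined band briefly, so a small fraction is expected on any real capture. What would point toward a weak transceiver is the signal dwelling there for a bit time or more, and that should be judged against a known good capture from the same scope.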
That's kind of where I was leaning Bob, that the problem was in the lack of full transition and not so much the packets themselves. I hope I phrased that correctly. I'm doing some reading into transceiver architecture and failure modes now. It makes sense with transceiver design (just looking at generic architecture), that one failure mode would be lack of full transition. This was a warranty
Hey, no fair, you've replaced a "known bad" waveform with a different one! The old one was not nearly as dysfunctional as the new one -- here the transitions are indeed screwed up. Could you re-upload the old one for comparison purposes?
My apologies. I was attempting to have both known bad up there. And.....it's fixed. Scope files uploaded as well. There is a brief glitch in the new EGR valve capture where the signal gets a little wonky, but clears out right away and didn't return (monitored the CANbus on and off while it ran through regen). I did cycle the key after this capture, so it was very likely that everything was still
Thanks, Chris, I will take a look soon. You've given us lots of food for thought lately!
Hi Chris, Always enjoy the HD write ups, thank you
Long as it's helpful to at least one person I'll post up case studies. I've got notebooks full of them.
Hi Chris: An excellent example of not succumbing to paralysis by analysis. Your …e study illustrates what many techs are exposed to. Unlike you, they end up like this: youtube.com/watch?v=bs-Q0J… I was taught that there is a hierarchy to diagnosing multiple codes: 1: Voltage Issues - a high side or low side issue can also cause communication, component or performance
Anthony, I believe so. We have Deere, Liebherr, and Yanmar all thrown together on these machines, so there is a lot of conflicting or missing service information. Generally I have found that codes with a 0 count are caused by something else. It's almost as if the fault detection strategy is telling you to treat them as secondary codes, which in this case they were. It may be that they are "soft" codes
Thank you, Chris! That is a … read. I even learned a new word- tuple. Sounds interesting. ;)
Glad you enjoyed it Marlin. I try to keep a solid thought process and game plan for every diagnosis (doesn't always happen). I know I learn all sorts of things by reading other people's diagnostic strategies. I find that my overall organizational process doesn't really change at all between automotive/equipment. Only differences are which tests I might choose based on ease of access. I have
Hi Chris: "I find that my overall organizational process doesn't really change at all between automotive/equipment. Only differences are which tests I might choose based on ease of access." That will be one of the most important observations that you will ever make. Guido
So the new word you learned "tuple"... Can you let us in on its meaning?
Roger, Tuple is a computer term that takes on rather specific meanings depending on its context but for here the generic definition should be sufficient. When you are using a relational database (think of it as a spreadsheet of computer logic and sensor values, for example, IF key is in crank [value 1] THEN send 12v on starter solenoid excite wire [relational value 6] these are made up
Hi Chris: Dmitriy is correct. I first observed it with 8 different GM vehicles and a Grand Caravan. On the GMs the complaint was the airbag lamp was illuminated. The Chrysler was a no start. What they had in common was the VIN had been erased in all of them. How it happened was the battery in each was so low, when attempting to start the vehicle using a jump pack, it got stupid (technical
Guido, I'm always happy to have your input, you always seem to be able to pull out the data that I just couldn't quite reach for. Surprisingly I haven't found nearly as much info on programming language in off-road ECUs but I would imagine it is still C, C++, C+³¹⁸⁴⁹, or whatever new variations will be coming. That makes your link all the more relevant. I believe that link will let Roger and
Hi Chris: No promises but I wonder if this may help when looking at an orphaned module. (Something that was awaiting me in my Inbox when I got home tonight.) searchsecurity.techtarget.com/news… HTH, Guido