It may not necessarily be data skew. The OP already mentioned that the data volume for this task is not much different from the other tasks.
But it is not clear from the question whether there is always exactly ONE task per executor that is much slower than the rest because its task deserialization takes much longer.
If that IS the case, it is most likely the time taken to ship the jars from the driver to the executors. You should only pay this cost once per SparkContext (assuming you are not adding more jars later on).
When you submit your Spark jobs, how large is your jar file? A few hundred KB is a very different story from a few hundred MB.
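If the jar really is large, one mitigation (just a sketch; the jar path, class name, and install location below are hypothetical) is to pre-distribute the jar to the same path on every worker and point the executors at the local copy, so the driver does not have to ship it at context startup:

```shell
# Hypothetical paths -- adapt to your deployment.
# 1. Check how large the application jar actually is:
ls -lh target/scala-2.12/myapp-assembly.jar

# 2. If it is hundreds of MB, copy it to the same location on every
#    worker, then reference the local copy on the executor classpath
#    instead of letting spark-submit ship it from the driver:
spark-submit \
  --class com.example.Main \
  --conf spark.executor.extraClassPath=/opt/spark-libs/myapp-assembly.jar \
  /opt/spark-libs/myapp-assembly.jar
```

Trimming the assembly (marking Spark and Hadoop dependencies as `provided`) also shrinks what has to be shipped in the first place.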