target: Drop lun_sep_lock for se_lun->lun_se_dev RCU usage

With se_port and t10_alua_tg_pt_gp_member being absored into se_lun,
there is no need for an extra lock to protect se_lun->lun_se_dev
assignment.

This patch also converts backend drivers to use call_rcu() release
to allow any se_device readers to complete.  The call_rcu() instead
of kfree_rcu() is required here because se_device is embedded into
the backend driver specific structure.

Also, convert se_lun->lun_stats to use atomic_long_t within the
target_complete_ok_work() completion callback, and add FIXME for
transport_lookup_tmr_lun() with se_lun->lun_ref.

Finally, update sbp_update_unit_directory() special case usage with
proper rcu_dereference_raw() and configfs symlink comment.

Reported-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Chris Boot <bootc@bootc.net>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
index afb87a8..c673980 100644
--- a/drivers/target/target_core_pscsi.c
+++ b/drivers/target/target_core_pscsi.c
@@ -579,6 +579,14 @@
 	return -ENODEV;
 }
 
+static void pscsi_dev_call_rcu(struct rcu_head *p)
+{
+	struct se_device *dev = container_of(p, struct se_device, rcu_head);
+	struct pscsi_dev_virt *pdv = PSCSI_DEV(dev);
+
+	kfree(pdv);
+}
+
 static void pscsi_free_device(struct se_device *dev)
 {
 	struct pscsi_dev_virt *pdv = PSCSI_DEV(dev);
@@ -610,8 +618,7 @@
 
 		pdv->pdv_sd = NULL;
 	}
-
-	kfree(pdv);
+	call_rcu(&dev->rcu_head, pscsi_dev_call_rcu);
 }
 
 static void pscsi_transport_complete(struct se_cmd *cmd, struct scatterlist *sg,