target: Drop lun_sep_lock for se_lun->lun_se_dev RCU usage

With se_port and t10_alua_tg_pt_gp_member being absored into se_lun,
there is no need for an extra lock to protect se_lun->lun_se_dev
assignment.

This patch also converts backend drivers to use call_rcu() release
to allow any se_device readers to complete.  The call_rcu() instead
of kfree_rcu() is required here because se_device is embedded into
the backend driver specific structure.

Also, convert se_lun->lun_stats to use atomic_long_t within the
target_complete_ok_work() completion callback, and add FIXME for
transport_lookup_tmr_lun() with se_lun->lun_ref.

Finally, update sbp_update_unit_directory() special case usage with
proper rcu_dereference_raw() and configfs symlink comment.

Reported-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Chris Boot <bootc@bootc.net>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
diff --git a/drivers/target/target_core_alua.c b/drivers/target/target_core_alua.c
index 1109c28..0a00873 100644
--- a/drivers/target/target_core_alua.c
+++ b/drivers/target/target_core_alua.c
@@ -1935,7 +1935,11 @@
 	size_t count)
 {
 	struct se_portal_group *tpg = lun->lun_tpg;
-	struct se_device *dev = lun->lun_se_dev;
+	/*
+	 * rcu_dereference_raw protected by se_lun->lun_group symlink
+	 * reference to se_device->dev_group.
+	 */
+	struct se_device *dev = rcu_dereference_raw(lun->lun_se_dev);
 	struct t10_alua_tg_pt_gp *tg_pt_gp = NULL, *tg_pt_gp_new = NULL;
 	unsigned char buf[TG_PT_GROUP_NAME_BUF];
 	int move = 0;
@@ -2189,7 +2193,11 @@
 	const char *page,
 	size_t count)
 {
-	struct se_device *dev = lun->lun_se_dev;
+	/*
+	 * rcu_dereference_raw protected by se_lun->lun_group symlink
+	 * reference to se_device->dev_group.
+	 */
+	struct se_device *dev = rcu_dereference_raw(lun->lun_se_dev);
 	unsigned long tmp;
 	int ret;