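Contents
I. Consecutive Logins
1.1 Users who logged in on 3 or more consecutive days
0 Problem Description
Find the users who logged in on 3 or more consecutive days (a ByteDance interview question).
1 Data Preparation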
create table if not exists table1 (id int comment 'user id', `date` string comment 'user login time');
insert overwrite table table1 values
(1,'2019-01-01 19:28:00'),
(1,'2019-01-02 19:53:00'),
(1,'2019-01-03 22:00:00'),
(1,'2019-01-05 20:55:00'),
(1,'2019-01-06 21:58:00'),
(2,'2019-02-01 19:25:00'),
(2,'2019-02-02 21:00:00'),
(2,'2019-02-04 22:05:00'),
(2,'2019-02-05 20:59:00'),
(2,'2019-02-06 19:05:00'),
(3,'2019-03-04 21:05:00'),
(3,'2019-03-05 19:10:00'),
(3,'2019-03-06 19:55:00'),
(3,'2019-03-07 21:05:00');
2 Data Analysis
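select distinct id
from (
    select
        id,
        diff
    from (
        select
            id,
            -- login date minus its per-user rank: constant within a run of consecutive days
            date_sub(dt, row_number() over (partition by id order by dt)) as diff
        from (
            -- A user may log in several times on the same day, so deduplicate to one row per user per day first
            select
                id,
                date_format(`date`, 'yyyy-MM-dd') as dt
            from table1
            -- current_date() returns today's date; this filter keeps only the last 7 days,
            -- so it excludes the 2019 sample rows above unless it is removed or widened
            where date_format(`date`, 'yyyy-MM-dd') between date_sub(current_date(), 7) and current_date()
            group by id, date_format(`date`, 'yyyy-MM-dd')
        ) tmp1
    ) tmp2
    group by id, diff
    having count(1) >= 3
) tmp3;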
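3 Summary
The key to "consecutive login" problems: rank the rows within each user, then subtract the rank from the date. If the dates are consecutive, the differences are all the same value.
(1) Rank the dates: row_number() over (partition by user_id order by login_date)
(2) Compute the difference diff between the date and its rank: date_sub(login_date, row_number() over (partition by user_id order by login_date)) as diff;
(3) Group by user and diff: group by user_id, diff;
(4) Filter with having count(1) >= 3: the user_id values that remain are the users who logged in on 3 or more consecutive days.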
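To see why the difference stays constant within a run of consecutive days, the four steps above can be traced outside of Hive. The following is only a minimal Python sketch of the same idea, not the query itself: the function name users_with_streak and the min_days parameter are made up for illustration, the rows mirror the sample data in table1, and the 7-day window filter from the query is deliberately left out.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample rows mirroring table1: (id, login timestamp string)
rows = [
    (1, "2019-01-01 19:28:00"), (1, "2019-01-02 19:53:00"),
    (1, "2019-01-03 22:00:00"), (1, "2019-01-05 20:55:00"),
    (1, "2019-01-06 21:58:00"),
    (2, "2019-02-01 19:25:00"), (2, "2019-02-02 21:00:00"),
    (2, "2019-02-04 22:05:00"), (2, "2019-02-05 20:59:00"),
    (2, "2019-02-06 19:05:00"),
    (3, "2019-03-04 21:05:00"), (3, "2019-03-05 19:10:00"),
    (3, "2019-03-06 19:55:00"), (3, "2019-03-07 21:05:00"),
]


def users_with_streak(rows, min_days=3):
    """Return ids of users with at least min_days consecutive login days.

    Hypothetical helper for illustration; it follows steps (1) to (4).
    """
    # Deduplicate to one entry per (user, calendar day), like the inner group by
    days_by_user = defaultdict(set)
    for user_id, ts in rows:
        days_by_user[user_id].add(date.fromisoformat(ts[:10]))

    result = []
    for user_id, days in sorted(days_by_user.items()):
        group_sizes = defaultdict(int)
        # Step (1): rank the days in ascending order, starting at 1
        for rank, day in enumerate(sorted(days), start=1):
            # Step (2): diff = day minus rank (in days)
            diff = day - timedelta(days=rank)
            # Step (3): count rows per (user, diff) group
            group_sizes[diff] += 1
        # Step (4): keep the user if any group has at least min_days rows
        if max(group_sizes.values()) >= min_days:
            result.append(user_id)
    return result


print(users_with_streak(rows))  # prints [1, 2, 3]
```

For user 1, the days 01-01, 01-02, 01-03 with ranks 1, 2, 3 all map to the diff 2018-12-31, while 01-05 and 01-06 with ranks 4 and 5 map to 2019-01-01. That gives groups of size 3 and 2, so user 1 qualifies through the first group.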
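1.2 Maximum number of consecutive login days for each user to date
0 Problem Description
Find, for each user, the maximum number of consecutive login days over their entire history to date.
1 Data Preparation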
cre