No more privacy: 202 million private resumes exposed (the largest data leak in January)

2019-02-02 22:04:58 1 1955

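The security team of BJ.58.com did not confirm that the data originated from their source: "We searched all over our databases and investigated all other storage, and it turned out that the sample data was not leaked from us."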

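On December 28, Bob Diachenko, Director of Cyber Risk Research at Hacken.io and the bug bounty platform HackenProof, analyzed the data stream of the BinaryEdge search engine and identified an open, unprotected MongoDB instance: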



The same IP also appeared in Shodan search results:



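On closer inspection, an 854 GB MongoDB database had been left unattended, with no password or login required to view and access what appeared to be more than 200 million highly detailed resumes of Chinese job seekers.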

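Each of the 202,730,434 records contained not only the candidate's skills and work experience but also personal information such as mobile phone number, email, marital status, children, political affiliation, height, weight, driver's license, literacy level, salary expectations and more.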

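The origin of the data remained unknown until one of my Twitter followers pointed to a GitHub repository (the page is no longer available but is still saved in Google's cache) containing the source code of a web app whose structural patterns were identical to those used in the exposed resumes: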







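The tool, named “data-import” (created 3 years ago), appears to have been built to scrape data (resumes) from various Chinese classifieds sites, such as bj.58.com and others.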



It is unknown whether it was an official application or an illegal one used to collect all the applicants' details, even those labeled “private”.

In response to a follow-up request, the security team of BJ.58.com did not confirm that the data originated from their source:

We searched all over our databases and investigated all other storage, and it turned out that the sample data was not leaked from us.

It seems that the data was leaked from a third party that scrapes data from many CV websites.


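Shortly after my notification on Twitter, the database was secured. It is worth noting that the MongoDB log showed at least a dozen IPs that might have accessed the data before it was taken offline.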

As of the date of this publication, there has been no official confirmation of who owns the data.

------------------------------------------Original English Ver------------------------------------------------

No more privacy: 202 Million private resumes exposed

On December 28th, Bob Diachenko, Director of Cyber Risk Research at Hacken.io and bug bounty platform HackenProof, analyzed the data stream of BinaryEdge search engine and identified an open and unprotected MongoDB instance:

PIC1

The same IP also appeared in Shodan search results:

PIC2

Upon closer inspection, an 854 GB MongoDB database had been left unattended, with no password or login authentication needed to view and access what appeared to be more than 200 million very detailed resumes of Chinese job seekers.

Each of the 202,730,434 records contained details not only of the candidate's skills and work experience but also of their personal information, such as mobile phone number, email, marital status, children, political affiliation, height, weight, driver's license, literacy level, salary expectations and more.

See more details in the PDF factsheet

The origin of the data remained unknown until one of my Twitter followers pointed to a GitHub repository (the page is no longer available but is still saved in Google's cache) which contained the source code of a web app with structural patterns identical to those used in the exposed resumes:

git

git2

git3
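As a rough illustration of how such a structural match can be checked, the sketch below compares the set of field paths in a schema taken from source code with the field paths of an exposed record, and scores the overlap with Jaccard similarity. It is only a sketch: the field names, `tool_schema`, `exposed_record` and both helper functions are hypothetical, invented for this example, and are not taken from the repository or from the leaked database.

```python
def field_paths(doc, prefix=""):
    """Return the set of dotted key paths found in a nested dict/list structure."""
    paths = set()
    if isinstance(doc, dict):
        for key, value in doc.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= field_paths(value, path)
    elif isinstance(doc, list):
        # List items share the path of the list itself.
        for item in doc:
            paths |= field_paths(item, prefix)
    return paths


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical field names, made up for illustration only.
tool_schema = {
    "name": "", "mobile": "", "email": "",
    "work": [{"company": "", "title": ""}],
}
exposed_record = {
    "name": "x", "mobile": "x", "email": "x",
    "work": [{"company": "x", "title": "x"}],
    "salary": 0,
}

similarity = jaccard(field_paths(tool_schema), field_paths(exposed_record))
print(f"schema overlap: {similarity:.2f}")
```

A score close to 1.0 means the two structures share almost all field paths. That is a hint about a common origin, not proof of it, since unrelated resume schemas can use similar field names.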

The tool, named “data-import” (created 3 years ago), seems to have been built to scrape data (resumes) from various Chinese classifieds sites, like bj.58.com and others.

PIC3

It is unknown whether it was an official application or an illegal one used to collect all the applicants’ details, even those labeled as ‘private’.

Upon additional request, the security team of BJ.58.com did not confirm that the data originated from their source:

We have searched all over the database of us and investigated all the other storage, turned out that the sample data is not leaked from us.

It seems that the data is leaked from a third party who scrape data from many CV websites.

Shortly after my notification on Twitter, the database was secured. It’s worth noting that the MongoDB log showed at least a dozen IPs that might have accessed the data before it was taken offline.

As of the date of this publication, there has been no official confirmation of who owns the data.

About the author

godblack: 468 posts, 1165 replies, T00ls certified expert.

A noble person, a pure person, a moral person, a person free of vulgar interests, a person who is of benefit to the people.
