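Project

I have been on this task for almost a month. I have learned a great deal, and there is still a lot I need to study more deeply and consolidate. Many thanks to my teacher and senior classmates for their guidance.

10/02~10/08 Accepted the task; modified two front-end pages; studied Java and Spring
10/09~10/16 Studied ansj and StanfordNLP
10/17~10/22 PageRank for keywords and summaries
10/23~10/29 Organized the question bank; studied nltk
10/30~11/05 Entity extraction with a custom dictionary: time, rocket model, part, launch site

I don't know whether I will pass the assessment, but I have pretty much done my best.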

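Life and study

I baked mooncakes over the National Day holiday. Since then life has been chaotic: my sleep schedule is off and I feel run-down. I switched to an electronic daily log and gave up on the paper planner.

I did not pay close attention in class, but the course projects went well; I put a lot of work into TransE and into machine-learning classification of webshells.

I sent out several résumés and two startups replied. My engineering experience is insufficient, and three months would not be enough to learn much. I need to think and plan seriously and not be impatient.

Reflection

No phone in bed. I have not kept up with exercise. Don't write journal entries when in a bad mood.

Plans

Read papers: 80 papers related to natural language processing.
Read a book: 数学之美 (The Beauty of Mathematics).
Open course: knowledge graphs.

Review the course material carefully to prepare for exams.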

Project

Project URL

Existing resources:

Based on the open-source TransE implementation https://github.com/wuxiyu/transE
Uses an open dataset from http://openkg.cn/dataset/


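What TransE does

It is a method for turning triples into embedding vectors.
A triple has the form (head entity, relation, tail entity); head and tail entities are collectively called entities. For simplicity, we write a triple as (h,r,t), where

h is the head entity
r is the relation
t is the tail entity

Our goal is to represent every entity and relation in the knowledge base as a low-dimensional vector. We denote the vectors corresponding to the triple (h,r,t) by the same letters, (h,r,t):

h is the vector for the head entity
r is the vector for the relation
t is the vector for the tail entity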
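
The excerpt stops before the scoring rule. In the usual formulation of TransE, a triple is considered plausible when the vector h + r lies close to t, so the distance between h + r and t can serve as a score. The following is a minimal sketch of that idea; the entity names, the relation name, and the 2-dimensional vectors are hand-picked assumptions for illustration, not trained embeddings or data from the project.

```python
import math


def transe_score(h, r, t):
    """Return the L2 distance ||h + r - t||; smaller means more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


# Hypothetical 2-dimensional embeddings, chosen by hand (not learned).
entity_vec = {
    "Beijing": [0.9, 0.1],
    "China": [1.0, 1.0],
    "Paris": [0.1, 0.2],
}
relation_vec = {"capital_of": [0.1, 0.9]}

# A triple that fits the relation should score lower than one that does not.
good = transe_score(entity_vec["Beijing"], relation_vec["capital_of"], entity_vec["China"])
bad = transe_score(entity_vec["Paris"], relation_vec["capital_of"], entity_vec["China"])
print(good < bad)
```

In a real system the vectors would be learned so that true triples get small distances and corrupted triples get large ones; the sketch only shows how a score is read off once the vectors exist.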


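Knowledge graph

What is a knowledge graph

In May 2012, search giant Google introduced the "Knowledge Graph" on its search pages for the first time: besides links to web pages, users also see more intelligent answers related to their query. Google senior vice president Dr. Amit Singhal summed up why the knowledge graph matters: the world is made of entities, not strings ("things, not strings").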


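IoC

Inversion of Control

Control over which concrete class implements a given interface is taken away from the calling class and handed to a third party; in Spring, the container makes that decision based on the bean configuration. The term itself is not very easy to understand.

DI (Dependency Injection) was proposed as a replacement term for IoC.

The calling class's dependency on an implementation of some interface is injected by a third party, which removes the calling class's direct dependency on that implementation class.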
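
To make the idea concrete, here is a minimal sketch of constructor injection. It is written in Python only to illustrate the concept; it is not Spring code, and every name in it (`MessageService`, `EmailService`, `SmsService`, `Notifier`, `build_notifier`) is hypothetical. The `build_notifier` function stands in for the third party: it reads a configuration and decides which implementation the calling class receives.

```python
from abc import ABC, abstractmethod


class MessageService(ABC):
    """The interface the calling class depends on."""

    @abstractmethod
    def send(self, text: str) -> str:
        ...


class EmailService(MessageService):
    def send(self, text: str) -> str:
        return f"email: {text}"


class SmsService(MessageService):
    def send(self, text: str) -> str:
        return f"sms: {text}"


class Notifier:
    """The calling class: it never names a concrete implementation."""

    def __init__(self, service: MessageService):
        self.service = service  # injected from outside

    def notify(self, text: str) -> str:
        return self.service.send(text)


def build_notifier(config: dict) -> Notifier:
    """Stand-in for the container: picks the implementation from configuration."""
    registry = {"email": EmailService, "sms": SmsService}
    return Notifier(registry[config["service"]]())


print(build_notifier({"service": "sms"}).notify("hi"))
```

Switching from SMS to email means changing the configuration, not the `Notifier` class, which is the point of moving the choice out of the caller.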


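Question answering

Factoid question: a question that can be answered with a short fact
Two paradigms:

IR-based/text-based question answering
Relies on a large body of text or a collection such as PubMed; given a question, passages are extracted from these documents.

Process the question to determine the likely answer type (usually a named entity)
Formulate a query and send it to a search engine
The ranked documents returned by the search engine are split into suitable passages, which are then ranked
Finally, candidate answers are extracted from the passages and ranked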
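
The four steps above can be sketched as a toy pipeline. This is only an illustration under strong assumptions: the two-document corpus, the stopword list, the single "YEAR" answer type, and all function names are made up for the example, and word overlap stands in for a real search engine and a real named-entity recognizer.

```python
import re

# Toy corpus standing in for a large document collection (an assumption).
DOCUMENTS = [
    "The Long March 5 is a Chinese heavy-lift launch vehicle. "
    "It first flew in 2016 from Wenchang.",
    "Wenchang is a launch site on Hainan island.",
]
STOPWORDS = {"the", "a", "an", "of", "in", "did", "is", "what", "when", "where", "who"}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def answer_type(question):
    """Step 1: guess the expected answer type from the question word."""
    return "YEAR" if question.lower().startswith("when") else "OTHER"


def formulate_query(question):
    """Step 2: keep the content words of the question as the query."""
    return [w for w in tokenize(question) if w not in STOPWORDS]


def retrieve_passages(query, documents):
    """Step 3: split documents into sentences and rank them by word overlap."""
    passages = [s for d in documents for s in re.split(r"(?<=\.)\s+", d) if s]
    scored = [(len(set(query) & set(tokenize(p))), p) for p in passages]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [p for score, p in ranked if score > 0]


def extract_answers(passages, a_type):
    """Step 4: pull candidates of the expected type out of the ranked passages."""
    if a_type != "YEAR":
        return passages[:1]  # fall back to the best passage itself
    answers = []
    for passage in passages:
        for year in re.findall(r"\b(?:19|20)\d{2}\b", passage):
            if year not in answers:
                answers.append(year)
    return answers


def answer_question(question, documents=DOCUMENTS):
    query = formulate_query(question)
    passages = retrieve_passages(query, documents)
    return extract_answers(passages, answer_type(question))


print(answer_question("When did the Long March 5 first fly?"))
```

Each function maps to one step of the list, so a stronger component (a real retrieval engine, an entity tagger) could replace any single stage without touching the others.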


Learning curve

learning-curve

Skewed classes

skew-data

Start of term

Advisor

Life

Holiday

Basic commands

Install
pip install scrapy
Start a project
scrapy startproject test
Run a spider
scrapy crawl yaoblog # spider name

Directory structure

…

examples

1. A Scrapy-based Zhihu spider

import json
import time

import scrapy
from scrapy.spiders import CrawlSpider
from bs4 import BeautifulSoup

# General config
email = 'youremail@163.com'
password = 'yourpassword'
get_url = "https://www.zhihu.com/"
login_url = "https://www.zhihu.com/login/email"
captcha_url = "https://www.zhihu.com/captcha.gif?r=%d&type=login" % (time.time() * 1000)
firstpage = 'https://www.zhihu.com/people/gchen20/activities'
headers = {
    "Accept": "text/html, application/xhtml+xml, */*",
    "Accept-Language": "zh-CN",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0",
    "Accept-Encoding": "gzip, deflate",
    "Host": "www.zhihu.com",
    "DNT": "1",
    "Connection": "Keep-Alive",
}


# Scrapy spider
class MySpider(CrawlSpider):
    name = 'zhihu'
    allowed_domains = ['zhihu.com']

    # The two initial requests: fetch the _xsrf token and the captcha image.
    def start_requests(self):
        urls = [get_url, captcha_url]
        for url in urls:
            yield scrapy.Request(url=url, headers=headers, callback=self.parse_item)

    # Handles the responses of the two initial requests: reads the _xsrf token
    # or saves the captcha, then issues the login request.
    # Note: this callback runs once per initial request, so each login request
    # carries only one of _xsrf / captcha.
    def parse_item(self, response):
        _xsrf = ""
        captChar = ""
        if response.url == get_url:
            soup = BeautifulSoup(response.body, 'lxml')
            _xsrf = soup.find("input", {'type': 'hidden', 'name': '_xsrf'}).attrs['value']
        else:
            with open('capt.gif', 'wb') as f:
                f.write(response.body)
            captChar = input('please read the image capt.gif and input the captcha:')
        yield scrapy.FormRequest(
            login_url,
            headers=headers,
            formdata={'_xsrf': _xsrf, 'captcha': captChar, "password": password, "email": email},
            callback=self.logged_in,
        )

    # Handles the login response: reports whether login succeeded and, if so,
    # issues a follow-up request.
    def logged_in(self, response):
        soup = BeautifulSoup(response.body, 'lxml')
        # Assumes the login endpoint returns JSON; json.loads replaces the
        # original eval() so that untrusted response text is never executed.
        login_result = json.loads(soup.p.string)
        result = login_result["msg"]
        yield {'login': result}
        if login_result["r"] == 0:
            yield scrapy.Request(url=firstpage, headers=headers, callback=self.parse_data)

    # Handles the response of the request issued after a successful login.
    def parse_data(self, response):
        soup = BeautifulSoup(response.body, 'lxml')
        yield {'data': soup.prettify()}

I pray to myself, for myself.

2017-10

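Project

Life and study

Reflection

Plans

Implementing similarity computation based on the TransE algorithm

Project

Existing resources:

Understanding the TransE algorithm

What TransE does

Theory behind similarity computation based on the TransE algorithm

Knowledge graph

What is a knowledge graph

Spring core

IoC

QuestionAnsweringNote

Question answering

ML:EvaluatingAlgorithm2

Learning curve

Skewed classes

ML:Overfitting

2017-09

Start of term

Advisor

Life

Holiday

ML:NeuralNetwork2

Basic commands

Directory structure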

examples