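How to save articles scraped with Python to WordPress

Preface

WordPress is a fairly popular blogging framework, and I have been using it myself for a long time. There are several ways to upload articles scraped with Python:

  1. Operate on the database directly
  2. Use the WordPress REST API
  3. Use the third-party wordpress_xmlrpc module

Of these, the third is the most pleasant to work with and the most beginner-friendly, so it is the one I recommend.

Let's go through each method in turn.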

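Operating on the database directly

We can use Python's pymysql library to connect straight to the MySQL database. I won't go into much detail here; the sample code is as follows: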

import pymysql

# Connect to the database
connect = pymysql.connect(
    host='database IP',
    port=3306,
    user='root',
    password='xxxx',
    database='database name',
    charset='utf8mb4'
)

# Get a cursor
cursor = connect.cursor()

# Insert one row into wp_posts
def insert(post_author, post_date, post_date_gmt, post_content, post_title, post_status, comment_status, ping_status, post_type, menu_order, post_excerpt, to_ping):
    # pymysql uses %s as the placeholder for every type (there is no %d),
    # and the number of columns must match the number of values
    cursor.execute('INSERT INTO wp_posts (post_author, post_date, post_date_gmt, post_content, post_title, post_status, comment_status, ping_status, post_type, menu_order, post_excerpt, to_ping) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)', (post_author, post_date, post_date_gmt, post_content, post_title, post_status, comment_status, ping_status, post_type, menu_order, post_excerpt, to_ping))

    connect.commit()
    print('Inserted', cursor.rowcount, 'row(s)')
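
The date columns are the easiest values to get wrong when inserting by hand. Below is a minimal sketch of how the twelve values for the insert might be prepared, assuming the `YYYY-MM-DD HH:MM:SS` datetime format and assuming the machine's local timezone matches the site's. `build_post_row` is a hypothetical helper, not part of pymysql or WordPress:

```python
from datetime import datetime, timezone

def build_post_row(title, content, author_id=1, now=None):
    """Return the 12 values for the wp_posts insert, in column order (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    fmt = '%Y-%m-%d %H:%M:%S'
    # Assumption: the local timezone of this machine is the site's timezone
    post_date = now.astimezone().strftime(fmt)
    post_date_gmt = now.astimezone(timezone.utc).strftime(fmt)
    return (
        author_id, post_date, post_date_gmt, content, title,
        'publish',  # post_status
        'open',     # comment_status
        'closed',   # ping_status
        'post',     # post_type
        0,          # menu_order
        '',         # post_excerpt
        '',         # to_ping
    )
```

It could then be used as `insert(*build_post_row('Title', '<p>Body</p>'))`.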

Using the WordPress REST API

The official REST API documentation is here:

https://developer.wordpress.org/rest-api/

Let's first try out what the API can do. The URL format is:

http://{domain}/index.php/wp-json/wp/v2/posts

For example:

http://www.jhcms.net/index.php/wp-json/wp/v2/posts

The response contains most of the information about each article.
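
To make the shape of that response concrete, here is a minimal sketch that reads the list and keeps only the ID, title and link of each post. It uses only the standard library; `fetch_posts` and `summarize_posts` are hypothetical names, and the field names (`id`, `title.rendered`, `link`) are those of the standard post object:

```python
import json
from urllib.request import urlopen

def summarize_posts(posts):
    """Reduce the JSON list returned by /wp/v2/posts to (id, title, link) tuples."""
    return [(p['id'], p['title']['rendered'], p['link']) for p in posts]

def fetch_posts(base_url, per_page=5):
    """GET the public post list; base_url is e.g. 'http://{domain}/index.php'."""
    url = f'{base_url}/wp-json/wp/v2/posts?per_page={per_page}'
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

For instance, `summarize_posts(fetch_posts('http://{domain}/index.php'))` would list the most recent posts of a site, with `{domain}` replaced by a real host.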

So how do we create a new article?

We refer to the official documentation: https://developer.wordpress.org/rest-api/reference/posts/#create-a-post

The key information is as follows:

Parameters

date: The date the object was published, in the site's timezone.
date_gmt: The date the object was published, as GMT.
slug: An alphanumeric identifier for the object unique to its type.
status: A named status for the object. One of: publish, future, draft, pending, private
password: A password to protect access to the content and excerpt.
title: The title for the object.
content: The content for the object.
author: The ID for the author of the object.
excerpt: The excerpt for the object.
featured_media: The ID of the featured media for the object.
comment_status: Whether or not comments are open on the object. One of: open, closed
ping_status: Whether or not the object can be pinged. One of: open, closed
format: The format for the object. One of: standard, aside, chat, gallery, link, image, quote, status, video, audio
meta: Meta fields.
sticky: Whether or not the object should be treated as sticky.
template: The theme file to use to display the object.
categories: The terms assigned to the object in the category taxonomy.
tags: The terms assigned to the object in the post_tag taxonomy.
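
Most of these parameters are optional. As an illustration, the sketch below assembles a small payload and checks `status` against the values listed above. `build_post_payload` is a hypothetical helper, and it assumes `categories` and `tags` are given as lists of term IDs:

```python
ALLOWED_STATUS = {'publish', 'future', 'draft', 'pending', 'private'}

def build_post_payload(title, content, status='draft', categories=None, tags=None):
    """Build a minimal dict of create-post parameters (hypothetical helper)."""
    if status not in ALLOWED_STATUS:
        raise ValueError(f'status must be one of {sorted(ALLOWED_STATUS)}')
    payload = {'title': title, 'content': content, 'status': status}
    # Only include the taxonomy fields when term IDs were actually supplied
    if categories:
        payload['categories'] = list(categories)
    if tags:
        payload['tags'] = list(tags)
    return payload
```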

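POST /wp/v2/posts means the data must be submitted with the POST method to the /wp/v2/posts endpoint.

By default the API is effectively read-only for anonymous requests. To submit data you need some form of authentication; here we install a JWT plugin. Once it is installed you can request a token, and when that token is passed along with the REST API call, the system no longer rejects your publish request.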
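
As a rough sketch of what "passing the token" looks like on the wire, the snippet below builds (but does not send) an authenticated create-post request with the standard library. It assumes the token is sent as a `Bearer` value in the `Authorization` header and that the body is JSON; `build_create_request` is a hypothetical name:

```python
import json
from urllib.request import Request

def build_create_request(site_url, token, payload):
    """Build a POST request to /wp/v2/posts carrying a Bearer token (not sent here)."""
    body = json.dumps(payload).encode('utf-8')
    return Request(
        f'{site_url}/wp-json/wp/v2/posts',
        data=body,
        method='POST',
        headers={
            'Content-Type': 'application/json',
            'Authorization': f'Bearer {token}',
        },
    )
```

The resulting object can be passed to `urllib.request.urlopen` once the site and token are real.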

Steps

  1. Install the JWT Auth plugin from the WordPress admin dashboard

  2. Add the following to the .htaccess file in the site root

    RewriteEngine on
    RewriteCond %{HTTP:Authorization} ^(.*)
    RewriteRule ^(.*) - [E=HTTP_AUTHORIZATION:%1]

    SetEnvIf Authorization "(.*)" HTTP_AUTHORIZATION=$1
  3. Add the following to wp-config.php:

    define('JWT_AUTH_SECRET_KEY', 'your-top-secret-key'); // any secret string of your choice
    define('JWT_AUTH_CORS_ENABLE', true);
  4. Send a POST request to http://{your-domain}/wp-json/jwt-auth/v1/token to obtain a token

  5. Use the token to publish the article

The core code is as follows:

# -*- coding:utf-8 -*-

import time

import requests


def get_token():
    """Log in through the JWT plugin and return the parsed JSON response."""
    session = requests.Session()
    url = 'http://sex.newban.cn/wp-json/jwt-auth/v1/token'
    data = {
        'username': "son3g",
        'password': "123456"
    }
    headers = {'user-agent': 'Mozilla/5.0'}
    resp = session.post(url, data=data, headers=headers, timeout=30)  # send the request
    return resp.json()


def _do_post(token=''):
    """Create a post, authenticating with the JWT token."""
    session = requests.Session()
    url = 'http://sex.newban.cn/wp-json/wp/v2/posts'
    data = {
        'date': time.strftime('%Y-%m-%d %H:%M:%S', time.localtime()),
        'date_gmt': time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime()),
        'slug': 'xx',
        'status': 'publish',
        'password': '',
        'title': 'REST API post publishing test',
        'content': 'System test: I think I am the sea, the winter sea',
        'author': 1,  # the author's user ID (an integer, not an email address); use your own
        'excerpt': '',
        'featured_media': '0',
        'comment_status': 'open',
        'ping_status': 'closed',
        'format': 'standard',
        'meta': [],
        'sticky': False,  # pin to the top
        'template': '',
        'categories': '1',  # 1 = Uncategorized
        'tags': ''
    }
    headers = {
        'user-agent': 'Mozilla/5.0',
        'Authorization': 'Bearer ' + token
    }
    resp = session.post(url, data=data, headers=headers, timeout=30)  # send the request
    print(resp.text)
    # On failure, resp.json() carries "code", "message" and "data" fields
    # (data["status"], data["params"]) that explain what was rejected.


if __name__ == '__main__':
    r = get_token()
    print(r)
    _do_post(r["data"]['token'])
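
When a create request is rejected, the response body is a JSON object with `code`, `message` and `data` fields, where `data` may contain a per-parameter `params` map. A small sketch for turning that into readable lines follows; `describe_error` is a hypothetical helper and assumes the body has already been parsed into a dict:

```python
def describe_error(body):
    """Return readable lines for a REST error body, or [] if it is not an error."""
    if not isinstance(body, dict) or 'code' not in body or 'message' not in body:
        return []
    lines = [f"{body['code']}: {body['message']}"]
    data = body.get('data')
    params = data.get('params') if isinstance(data, dict) else None
    # One line per rejected parameter, when the server reports them
    for name, reason in (params or {}).items():
        lines.append(f'  {name} => {reason}')
    return lines
```

It could be called as `print('\n'.join(describe_error(resp.json())))` right after the POST.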

Using the third-party wordpress_xmlrpc module

The steps are as follows:

  1. Install wordpress_xmlrpc

    pip install python-wordpress-xmlrpc
  2. Import the module

    from wordpress_xmlrpc import Client, WordPressPost
    from wordpress_xmlrpc.methods.posts import GetPosts,NewPost
  3. Publish a new article

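    # wp is the XML-RPC client; replace the URL, username and password with your own
    wp = Client('http://{your-domain}/xmlrpc.php', 'username', 'password')

    def push_article(post_title, post_content_html):
        post = WordPressPost()
        post.title = post_title
        post.slug = post_title
        post.content = post_content_html
        post.terms_names = {
            'post_tag': post_title.split(" "),
            'category': ["itarticle"]
        }
        post.post_status = 'publish'
        wp.call(NewPost(post))

    if __name__ == '__main__':
        push_article("Article title", "Article content")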

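    Attributes:

    • title: the article title
    • content: the article body
    • post_status: the article status; defaults to draft if omitted. private means private, draft means draft, publish means published
    • terms_names: the article's tags and categories
    • slug: the article's URL alias

The complete code is as follows: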

# -*- coding:utf-8 -*-
from wordpress_xmlrpc import Client, WordPressPost
from wordpress_xmlrpc.methods.posts import GetPosts, NewPost

# wp is the XML-RPC client; replace the URL, username and password with your own
wp = Client('http://{your-domain}/xmlrpc.php', 'username', 'password')

def push_article(post_title, post_content_html):
    post = WordPressPost()
    post.title = post_title
    post.slug = post_title
    post.content = post_content_html
    post.terms_names = {
        'post_tag': post_title.split(" "),
        'category': ["itarticle"]
    }
    post.post_status = 'publish'
    wp.call(NewPost(post))

if __name__ == '__main__':
    push_article("Article title", "Article content")

Simple, isn't it? For a batch upload, just call the function in a loop.
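
A minimal sketch of such a loop is shown below. It takes the publishing function as an argument so that one failed article does not stop the rest; `push_all` is a hypothetical helper, and `articles` is assumed to be an iterable of (title, html) pairs:

```python
def push_all(articles, publish):
    """Call publish(title, html) for each article; return the titles that failed."""
    failed = []
    for title, html in articles:
        try:
            publish(title, html)
        except Exception as exc:
            # Keep going so one bad article does not abort the whole batch
            print(f'failed: {title}: {exc}')
            failed.append(title)
    return failed
```

With the function defined earlier it would be called as `push_all(scraped_articles, push_article)`, where `scraped_articles` stands for whatever list the scraper produced.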

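This article is the author's original work. Please credit the source when reposting. Thank you.

乱码三千 – building up bit by bit. Welcome to the 乱码三千 tech blog.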
