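Finding the parameters
The 51job (前程无忧) job listing page used to return HTML; after a recent redesign it returns JSON.
The request looks like this: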
import requests

session = requests.session()
headers = {
    # "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36 Edg/110.0.1587.63",
    "sign": "71a398e6babd2a64e211a01944370e35effdb65fe1f4ddf636f8b592360b98c3",
}
url = "https://cupid.51job.com/open/noauth/search-pc"
params = {
    "api_key": "51job",
    "timestamp": "1679558531",
    "keyword": "",
    "searchType": "2",
    "function": "",
    "industry": "",
    "jobArea": "070000",
    "jobArea2": "",
    "landmark": "",
    "metro": "",
    "salary": "",
    "workYear": "",
    "degree": "",
    "companyType": "",
    "companySize": "",
    "jobType": "",
    "issueDate": "",
    "sortType": "0",
    "pageNum": "2",
    "requestId": "17f464414091e22343b04e1703c6d9bb",
    "pageSize": "50",
    "source": "1",
    "accountId": "",
    "pageCode": "sou|sou|soulb"
}
response2 = session.get(url, headers=headers, params=params)
print(response2.text)
print(response2)
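Deleting parameters one at a time and retesting shows that the two that matter are sign in the request headers and requestId in the query parameters.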
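The delete-and-retest routine can also be scripted. This is only a rough sketch, not what I actually ran: find_required_keys and the is_ok callback are made-up names, and is_ok stands for whatever check you use to decide a response is still valid (for example, sending the request and inspecting the JSON). An offline stand-in is used here so the snippet runs without touching the site.

```python
from typing import Callable, Dict, List


def find_required_keys(fields: Dict[str, str],
                       is_ok: Callable[[Dict[str, str]], bool]) -> List[str]:
    """Return the keys whose removal makes is_ok() fail."""
    required = []
    for key in fields:
        # Drop exactly one key and retest with everything else unchanged.
        trimmed = {k: v for k, v in fields.items() if k != key}
        if not is_ok(trimmed):
            required.append(key)
    return required


# Offline stand-in for a real request: pretend the server only checks these two.
def fake_check(fields: Dict[str, str]) -> bool:
    return "sign" in fields and "requestId" in fields


sample = {"sign": "x", "requestId": "y", "keyword": "", "pageNum": "2"}
print(find_required_keys(sample, fake_check))  # ['sign', 'requestId']
```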
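Pick a spot in the request's call stack and set a breakpoint there. Which spot? Just guess.
Stepping through, I got lucky and found something suspicious:
This looks like almost exactly what we are after!
Parameter 1: sign
1. The function: p.a.HmacSHA256
Presumably this is HMAC-SHA256, i.e. a keyed SHA-256 hash rather than plain SHA-256.
2. Argument 1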
t:
>t
<'/open/noauth/search-pc?api_key=51job&timestamp=1679560613&keyword=&searchType=2&function=&industry=&jobArea=070000&jobArea2=&landmark=&metro=&salary=&workYear=&degree=&companyType=&companySize=&jobType=&issueDate=&sortType=0&pageNum=3&requestId=f6c5c9072464c4862cf50fc46ebde4c5&pageSize=50&source=1&accountId=&pageCode=sou%7Csou%7Csoulb'
This one is simple: it is just the URL path concatenated with the query string.
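As a quick illustration of that concatenation, here is a minimal sketch in Python. build_sign_message is a name I made up, and it assumes the parameters are joined in the order they appear in the dict, as in the captured string.

```python
from urllib.parse import quote, urlencode

PATH = "/open/noauth/search-pc"


def build_sign_message(params: dict) -> str:
    """Join the path and the percent-encoded query string, keeping dict order."""
    # quote_via=quote encodes "|" as "%7C", as seen in the captured string.
    return PATH + "?" + urlencode(params, quote_via=quote)


demo = {"api_key": "51job", "keyword": "", "pageCode": "sou|sou|soulb"}
print(build_sign_message(demo))
# /open/noauth/search-pc?api_key=51job&keyword=&pageCode=sou%7Csou%7Csoulb
```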
3. Argument 2
>c["a"].state.commonStore.cupid_sign_key
<'abfc8f9dcf8c3f3d8aa294ac5f2cf2cc7767e5592590f39c3f503271dd68562b'
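Ugh. Let the breakpoint run on and see whether this produces the final result.
The result is fbdb7289b8fe28652967cacc2d49124343894132d255ac28d47535306243c3ba
So this one apparently needs to be worked out as well. First keep reading to see how the final value is produced, then come back to how c["a"].state.commonStore.cupid_sign_key is formed.
Oh, no need to read further: this e.headers.sign is already the final result.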

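This all feels familiar, but I can't remember where I wrote it up before. I really should keep better notes.
Next, work out where the key from the previous step comes from.
4. Working out the key
It looks hard-coded? Reopen the browser and check.
Try another computer...
Guessed right.
It appears to be stored in a vuex store.
To summarize: the sign is p.a.HmacSHA256(t, i), where t is the URL path plus the query string and i is a hard-coded key.
A quick search shows that HmacSHA256 is available in crypto-js, so there is no need to write it by hand. Code below.
step1: install crypto-js
Official documentation: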

Install it in the directory that holds the code (for example with npm install crypto-js).
step2: test the JS code
var SHA256 = require("crypto-js/hmac-sha256");
function hmac_sha256(t){
    var i = "abfc8f9dcf8c3f3d8aa294ac5f2cf2cc7767e5592590f39c3f503271dd68562b";
    var res = SHA256(t,i).toString();
    return res;
}
var t = "/open/noauth/search-pc?api_key=51job&timestamp=1679564225&keyword=&searchType=2&function=&industry=&jobArea=070000&jobArea2=&landmark=&metro=&salary=&workYear=&degree=&companyType=&companySize=&jobType=&issueDate=&sortType=0&pageNum=3&requestId=dcabe20e033fafd1c08bf634eb87302e&pageSize=50&source=1&accountId=&pageCode=sou%7Csou%7Csoulb";
console.log(hmac_sha256(t));
>>>9c212c2cd03562d3102ed8ac051ef6c436c4b4c04795d926178f3ad9db6154f5
Looks right.
step3: move it into Python
import time
from urllib import parse

import execjs
import requests


def get_sign(params):
    # String to sign: the URL path plus the percent-encoded query string.
    # Note: parse.quote encodes a space as %20 while requests sends "+",
    # so a keyword containing spaces would be signed differently from what is sent.
    t = "/open/noauth/search-pc?"
    t += "&".join(f"{k}={parse.quote(v)}" for k, v in params.items())
    # Needs a Node.js runtime, with crypto-js installed in the working directory (step1).
    ctx = execjs.compile("""var SHA256 = require("crypto-js/hmac-sha256");
function hmac_sha256(t){
    var i = "abfc8f9dcf8c3f3d8aa294ac5f2cf2cc7767e5592590f39c3f503271dd68562b";
    var res = SHA256(t,i).toString();
    return res
}""")
    sign = ctx.call("hmac_sha256", t)
    return sign


def req1():
    session = requests.session()
    url = "https://cupid.51job.com/open/noauth/search-pc"
    params = {
        "api_key": "51job",
        "timestamp": str(int(time.time())),  # current Unix time in seconds
        "keyword": "",
        "searchType": "2",
        "function": "",
        "industry": "",
        "jobArea": "070000",
        "jobArea2": "",
        "landmark": "",
        "metro": "",
        "salary": "",
        "workYear": "",
        "degree": "",
        "companyType": "",
        "companySize": "",
        "jobType": "",
        "issueDate": "",
        "sortType": "0",
        "pageNum": "2",
        "requestId": "17f464414091e22343b04e1703c6d9bb",
        "pageSize": "50",
        "source": "1",
        "accountId": "",
        "pageCode": "sou|sou|soulb"
    }
    # The sign must be computed from exactly the params that are sent.
    sign = get_sign(params)
    headers = {
        "sign": sign,
    }
    response2 = session.get(url, headers=headers, params=params)
    print(response2.text)
    print(response2)


if __name__ == '__main__':
    req1()
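The round trip through Node.js is probably optional. CryptoJS treats string arguments as UTF-8 text and toString() defaults to hex, so Python's standard hmac and hashlib modules should give the same digest. A minimal sketch under that assumption (I have not compared it against the site here); hmac_sha256_hex is a name chosen for this example.

```python
import hashlib
import hmac

# Key observed in the browser console (c["a"].state.commonStore.cupid_sign_key).
SIGN_KEY = "abfc8f9dcf8c3f3d8aa294ac5f2cf2cc7767e5592590f39c3f503271dd68562b"


def hmac_sha256_hex(message: str, key: str = SIGN_KEY) -> str:
    """HMAC-SHA256 of message as lowercase hex, with key used as UTF-8 text."""
    # The key is used as text, not hex-decoded, mirroring CryptoJS string handling.
    return hmac.new(key.encode("utf-8"), message.encode("utf-8"), hashlib.sha256).hexdigest()


sign = hmac_sha256_hex("/open/noauth/search-pc?api_key=51job&timestamp=1679564225")
print(len(sign))  # 64 hex characters
```

If this matches the crypto-js output for the same string, get_sign could call it directly and drop the execjs dependency.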
Parameter 2: requestId
Not needed. As long as the sign is generated from the params that are actually sent, this parameter can simply be left out.
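Why leaving it out works can be shown with a toy model. This assumes, without any confirmation from the site, that the server just recomputes the HMAC over the path and query it receives. make_sign and server_accepts are hypothetical helpers, and the key is a placeholder rather than the real one.

```python
import hashlib
import hmac
from urllib.parse import quote, urlencode

KEY = b"demo-key"  # placeholder, not the real key
PATH = "/open/noauth/search-pc"


def make_sign(params: dict) -> str:
    """Sign the path plus the encoded query string built from params."""
    message = PATH + "?" + urlencode(params, quote_via=quote)
    return hmac.new(KEY, message.encode("utf-8"), hashlib.sha256).hexdigest()


def server_accepts(params: dict, received_sign: str) -> bool:
    """Assumed server check: recompute the HMAC over what was received."""
    return hmac.compare_digest(make_sign(params), received_sign)


full = {"api_key": "51job", "pageNum": "2", "requestId": "17f464414091e22343b04e1703c6d9bb"}
trimmed = {k: v for k, v in full.items() if k != "requestId"}

print(server_accepts(trimmed, make_sign(trimmed)))  # True: signed and sent params agree
print(server_accepts(trimmed, make_sign(full)))     # False: sign was made for other params
```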