Natural Language Processing (NLP) Learning Outline

Artificial Intelligence 2025-12-25 14:52:54

Learning stage	Core module	Knowledge points (including 2024-2025 developments)	Learning sites (with URLs)	GitHub projects	Video resources (with URLs)
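Foundation stage (2-3 months)	Math and programming fundamentals	1. Core math: linear algebra (matrix operations, eigendecomposition), probability (Bayes' theorem, probability distributions), discrete math (graph theory, set theory), optimization (gradient descent, the Adam optimizer) 2. Why it matters: underpins the derivation of NLP algorithms (e.g., word embeddings, attention) and model optimization	1. 3Blue1Brown (https://www.3blue1brown.com/) 2. Khan Academy (https://www.khanacademy.org/) 3. MIT OpenCourseWare (https://ocw.mit.edu/) 4. An Introduction to Statistical Learning (https://statlearning.com/)	1. https://github.com/kenjihiranabe/The-Art-of-Linear-Algebra 2. https://github.com/ashishpatel26/Mathematics-for-Machine-Learning 3. https://github.com/tdhopper/statistics-for-machine-learning	1. Bilibili: Essence of Linear Algebra (https://www.bilibili.com/video/BV1ys411472E/) 2. MIT probability open course (https://www.youtube.com/playlist?list=PLUl4u3cNGP61iGlKMZ3s6FsT9_uvxle8_) 3. China University MOOC: Optimization Methods (https://www.icourse163.org/course/HIT-1001907001)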
		1. Core Python: syntax basics, functional programming, object-oriented programming, file I/O, regular expressions 2. NLP-oriented skills: text data processing, efficient string operations, batch data processing	1. Python official documentation (https://www.python.org/doc/) 2. Runoob Python tutorial (https://www.runoob.com/python/python-tutorial.html) 3. Real Python (https://realpython.com/) 4. Python Data Science Handbook (https://jakevdp.github.io/PythonDataScienceHandbook/)	1. https://github.com/TheAlgorithms/Python 2. https://github.com/jakevdp/PythonDataScienceHandbook 3. https://github.com/geekcomputers/Python	1. Bilibili: Atguigu Python basics (https://www.bilibili.com/video/BV1eW411t7rd/) 2. Coursera: Python for Everybody (https://www.coursera.org/specializations/python) 3. imooc: Python regular expressions in practice (https://www.imooc.com/course/list?c=python&keyword=正则表达式)
		1. Essential NLP libraries: NumPy (array operations), Pandas (data processing), Matplotlib/Seaborn (visualization), NLTK (basic text processing), spaCy (industrial-strength text processing) 2. Hands-on focus: reading text, tokenization, part-of-speech tagging, stop-word filtering, word-frequency counting	1. NumPy documentation (https://numpy.org/doc/) 2. Pandas (https://pandas.pydata.org/) 3. NLTK documentation (https://www.nltk.org/) 4. spaCy (https://spacy.io/)	1. https://github.com/numpy/numpy/tree/main/examples 2. https://github.com/pandas-dev/pandas/tree/main/examples 3. https://github.com/nltk/nltk 4. https://github.com/explosion/spaCy	1. Bilibili: NLTK text processing in practice (https://www.bilibili.com/video/BV1Z4411o7iL/) 2. spaCy 101 official tutorial (https://spacy.io/usage/spacy-101) 3. imooc: Pandas text data processing (https://www.imooc.com/course/list?c=data&keyword=Pandas%20%E6%96%87%E6%9C%AC)
		1. Development tools: Git/GitHub (version control), the Linux command line, Anaconda (environment setup), Jupyter Notebook (interactive development) 2. NLP engineering basics: virtual environment management, dependency configuration, debugging	1. GitHub Learning Lab (https://lab.github.com/) 2. Linuxidc (Linux 公社) (https://www.linuxidc.com/) 3. Anaconda documentation (https://docs.anaconda.com/) 4. Jupyter (https://jupyter.org/)	1. https://github.com/git-guides 2. https://github.com/justjavac/free-programming-books-zh_CN#linux 3. https://github.com/jupyter/notebook	1. Bilibili: Git for beginners (https://www.bilibili.com/video/BV1FE411P7B3/) 2. Jupyter Notebook hands-on tutorial (https://www.bilibili.com/video/BV12E411A7ZQ/) 3. Linux command-line text processing tools (https://www.youtube.com/watch?v=ZtqBQ68cfJc)
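	Introduction to NLP	1. Core concepts: definition, history (rule-based → statistical → deep learning → large models), applications (text classification, machine translation, question answering, chatbots) 2. Schools of technique: traditional NLP vs. deep-learning NLP vs. large language models (LLMs) 3. Industry trends: multimodal fusion, low-resource language processing, lightweight large models	1. Jiqizhixin (机器之心) (https://www.jiqizhixin.com/) 2. Zhidongxi (智东西) (https://www.zhidx.com/) 3. Stanford CS224n (https://web.stanford.edu/class/cs224n/) 4. Hugging Face NLP Course (https://huggingface.co/learn/nlp-course/)	1. https://github.com/keon/awesome-nlp 2. https://github.com/yizhen20133868/awesome-deep-learning-nlp 3. https://github.com/huggingface/awesome-huggingface	1. Bilibili: A Brief History of NLP (https://www.bilibili.com/video/BV1Mb411i7oe/) 2. Stanford CS224n 2025 introduction (https://www.youtube.com/playlist?list=PLoROMvodv4rOSH4v6133s9LFPRHjEmbmJ) 3. imooc: NLP from beginner to advanced (https://www.imooc.com/course/list?c=ai&keyword=NLP)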
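	Text preprocessing basics	1. Text fundamentals: character encoding (UTF-8), text formats (TXT/JSON/XML), linguistic levels (lexical/syntactic/semantic) 2. Preprocessing pipeline: text cleaning (noise removal, deduplication, lowercasing), tokenization (Chinese: jieba/THULAC; English: NLTK/spaCy), part-of-speech tagging, basic named entity recognition, stop-word filtering 3. Data standardization: text normalization, spelling correction	1. jieba repository (https://github.com/fxsjy/jieba) 2. THULAC documentation (http://thulac.thunlp.org/) 3. nlp_chinese_corpus, a collection of Chinese NLP corpora (https://github.com/brightmart/nlp_chinese_corpus) 4. pyspellchecker documentation (https://pyspellchecker.readthedocs.io/)	1. https://github.com/fxsjy/jieba 2. https://github.com/thunlp/THULAC 3. https://github.com/stopwords-iso/stopwords-zh 4. https://github.com/barrust/pyspellchecker	1. Bilibili: jieba tokenization in practice (https://www.bilibili.com/video/BV1Qt411u7Y8/) 2. spaCy text preprocessing tutorial (https://www.youtube.com/watch?v=6Zv-2T84X3A) 3. China University MOOC: Text Information Processing (https://www.icourse163.org/course/BEIJING-1003379001)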
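Traditional NLP stage (2-3 months)	Traditional text representation and feature engineering	1. Text representation: bag-of-words (BoW), TF-IDF, n-grams, co-occurrence matrices 2. Feature engineering: part-of-speech features, syntactic features, keyword extraction (TF-IDF, TextRank), text similarity (cosine similarity, Jaccard coefficient), topic models (LDA/LSA) 3. Limitations: the lexical gap (different words with the same meaning do not match) and no contextual information	1. scikit-learn text feature extraction documentation (https://scikit-learn.org/stable/modules/feature_extraction.html#text-feature-extraction) 2. TextRank paper (https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf) 3. Gensim LdaModel documentation (https://radimrehurek.com/gensim/models/ldamodel.html)	1. https://github.com/scikit-learn/scikit-learn/tree/main/examples/text 2. https://github.com/letiantian/TextRank4ZH 3. https://github.com/RaRe-Technologies/gensim 4. https://github.com/davidadamojr/TextRank	1. Bilibili: TF-IDF principles and practice (https://www.bilibili.com/video/BV1hE411t7RN/) 2. scikit-learn text feature engineering tutorial (https://www.youtube.com/watch?v=8Ivy066u3l8) 3. imooc: TextRank keyword extraction (https://www.imooc.com/course/list?c=data&keyword=TextRank)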
	Core traditional NLP algorithms	1. Text classification: naive Bayes, SVM, logistic regression, random forests (scikit-learn implementations) 2. Syntactic parsing: phrase-structure (constituency) parsing, dependency parsing (Stanford Parser) 3. Semantic analysis: WordNet semantic similarity, FrameNet frame semantics 4. Named entity recognition (NER): rule-based and statistical (CRF) approaches	1. scikit-learn classification documentation (https://scikit-learn.org/stable/modules/classification.html) 2. Stanford Parser (https://nlp.stanford.edu/software/lex-parser.shtml) 3. WordNet (https://wordnet.princeton.edu/) 4. CRFsuite documentation (https://crfsuite.readthedocs.io/)	1. https://github.com/scikit-learn/scikit-learn/tree/main/examples/classification 2. https://github.com/nltk/nltk/tree/main/nltk/parse 3. https://github.com/PrincetonML/WordNet 4. https://github.com/chokkan/crfsuite	1. Bilibili: naive Bayes text classification in practice (https://www.bilibili.com/video/BV1sb411i7aG/) 2. Stanford Parser syntactic parsing tutorial (https://www.youtube.com/watch?v=e9aG1yZ0Q4Q) 3. China University MOOC: Natural Language Processing (https://www.icourse163.org/course/NJU-1001571005)
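	Traditional application projects	1. Basic applications: spam detection, sentiment analysis (TF-IDF + SVM), keyword extraction, text clustering (K-Means) 2. Advanced applications: simple question answering (rule-based matching), extractive summarization (TextRank), machine translation (phrase-based statistical machine translation, SMT) 3. Performance bottlenecks: weak understanding of complex semantics, difficult multilingual adaptation	1. Kaggle SMS spam dataset (https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset) 2. IMDB 50K movie reviews sentiment dataset (https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews) 3. Chinese text summarization dataset (https://github.com/brightmart/nlp_chinese_corpus/tree/master/datasets/cn_summary) 4. WMT19 translation datasets (https://www.statmt.org/wmt19/)	1. https://github.com/graykode/spam-filter 2. https://github.com/cjhutto/vaderSentiment 3. https://github.com/ppwwyyxx/text-summarization 4. https://github.com/moses-smt/mosesdecoder	1. Bilibili: spam detection in practice (https://www.bilibili.com/video/BV1Qt411o7dF/) 2. Kaggle sentiment analysis tutorial (https://www.youtube.com/watch?v=oGq9vW4hN4s) 3. imooc: text summarization in practice (https://www.imooc.com/course/list?c=ai&keyword=文本摘要)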
Core deep-learning NLP stage (3-4 months)	Word embeddings and basic sequence models	1. Word embeddings: Word2Vec (Skip-gram/CBOW), GloVe, FastText (handles out-of-vocabulary words), Gensim implementations 2. Contextual extensions (introduced 2018-2019): ELMo (context-dependent embeddings), ERNIE 1.0 (knowledge-enhanced pretraining), SpanBERT (span-level pretraining) 3. Basic sequence models: RNN, LSTM (mitigates vanishing gradients), GRU (a lighter LSTM variant), bidirectional LSTM	1. Word2Vec paper (https://arxiv.org/abs/1301.3781) 2. GloVe project page (https://nlp.stanford.edu/projects/glove/) 3. FastText (https://fasttext.cc/) 4. ELMo paper (https://arxiv.org/abs/1802.05365) 5. ERNIE repository (https://github.com/PaddlePaddle/ERNIE)	1. https://github.com/tmikolov/word2vec 2. https://github.com/stanfordnlp/GloVe 3. https://github.com/facebookresearch/fastText 4. https://github.com/allenai/allennlp/tree/master/allennlp/modules/elmo 5. https://github.com/PaddlePaddle/ERNIE	1. Bilibili: Word2Vec principles visualized (https://www.bilibili.com/video/BV1aE411x7mC/) 2. GloVe hands-on tutorial (https://www.youtube.com/watch?v=R39tWYYKNcI) 3. imooc: ELMo contextual word embeddings (https://www.imooc.com/course/list?c=ai&keyword=ELMo)
	Encoder-decoder models and attention	1. Encoder-decoder architecture: Seq2Seq (the basis of machine translation and text generation), with an LSTM/GRU encoder and an LSTM/GRU decoder 2. Attention mechanisms: Bahdanau (additive) attention, Luong (multiplicative) attention, self-attention 3. Key breakthrough: mitigates the long-range dependency problem and improves translation/generation quality 4. Applications: machine translation, text summarization, dialogue generation	1. PyTorch Seq2Seq tutorial (https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html) 2. TensorFlow attention tutorial (https://www.tensorflow.org/tutorials/text/nmt_with_attention) 3. Bahdanau attention paper (https://arxiv.org/abs/1409.0473) 4. OpenNMT (https://opennmt.net/)	1. https://github.com/pytorch/tutorials/tree/main/intermediate_source 2. https://github.com/tensorflow/models/tree/master/official/nlp/models 3. https://github.com/spro/practical-pytorch/tree/master/seq2seq-translation 4. https://github.com/OpenNMT/OpenNMT-py	1. Bilibili: LSTM principles and practice (https://www.bilibili.com/video/BV1Sb411i7aE/) 2. Seq2Seq machine translation tutorial (https://www.youtube.com/watch?v=EoGUlvhRYpk) 3. Stanford CS224n attention lecture (https://www.youtube.com/watch?v=UXF8C0i61YI)
	Transformer architecture and classic pretrained models	1. Transformer core: encoder/decoder structure, multi-head attention, position-wise feed-forward network, positional encoding 2. Classic pretrained models: BERT (bidirectional attention), RoBERTa (a more robustly trained BERT), ALBERT (a parameter-efficient BERT), GPT (unidirectional, generative) 3. Listed as 2024-2025 upgrades: BERT-base-24 (longer sequence support) and DistilBERTv3 (higher compression), names that do not match any widely known release and should be verified; T5 (2019; casts every task as text-to-text) 4. Fine-tuning: full-parameter fine-tuning, fine-tuning with frozen layers, LoRA	1. Transformer paper (https://arxiv.org/abs/1706.03762) 2. BERT repository (https://github.com/google-research/bert) 3. Hugging Face Transformers documentation (https://huggingface.co/docs/transformers/index) 4. T5 paper (https://arxiv.org/abs/1910.10683) 5. LoRA paper (https://arxiv.org/abs/2106.09685)	1. https://github.com/google-research/bert 2. https://github.com/huggingface/transformers 3. https://github.com/facebookresearch/fairseq/tree/main/fairseq/models/roberta 4. https://github.com/google-research/text-to-text-transfer-transformer 5. https://github.com/microsoft/LoRA	1. Bilibili: Transformer principles visualized (https://www.bilibili.com/video/BV1pu411o7BE/) 2. BERT fine-tuning hands-on tutorial (https://www.youtube.com/watch?v=7kLi8u2dJz0) 3. Stanford CS224n Transformer lecture (https://www.youtube.com/watch?v=5vcj8kSwBCY)
LLM frontier stage (3-4 months)	Core LLM technology	1. Mainstream models: GPT-4o (multimodal, released 2024), Claude 3 Opus, Qwen2 (Tongyi Qianwen, released 2024), Llama 3 (open model from Meta), Mistral Large 2 (Mistral AI) 2. Core techniques: autoregressive generation, instruction tuning, RLHF (reinforcement learning from human feedback), alignment (value alignment, safety alignment) 3. Frontier topics (2024-2025): long-context understanding (GPT-4o, 128k tokens), multimodal fusion (text + image + speech), MoE (mixture of experts, e.g., GLaM)	1. OpenAI (https://openai.com/) 2. Anthropic (https://www.anthropic.com/) 3. Alibaba Cloud Tongyi Qianwen (https://tongyi.aliyun.com/) 4. Meta Llama (https://ai.meta.com/llama/) 5. Mistral AI (https://mistral.ai/) 6. RLHF paper (https://arxiv.org/abs/1909.08593)	1. https://github.com/openai/gpt-4o 2. https://github.com/meta-llama/llama3 3. https://github.com/QwenLM/Qwen2 4. https://github.com/mistralai/mistral-src 5. https://github.com/facebookresearch/llama-recipes 6. https://github.com/lvwerra/trl	1. Bilibili: GPT-4o multimodal in practice (https://www.bilibili.com/video/BV1xH4y1W7bX/) 2. Llama 3 local deployment tutorial (https://www.youtube.com/watch?v=9G7P3LzQEVs) 3. imooc: RLHF principles and practice (https://www.imooc.com/course/list?c=ai&keyword=RLHF)
	LLM training and fine-tuning	1. Training: pretraining (massive text corpora), continued pretraining (domain adaptation), instruction tuning (general capability), domain fine-tuning (e.g., medical/legal NLP) 2. Parameter-efficient fine-tuning: LoRA (low-rank adaptation), QLoRA (quantized LoRA), AdaLoRA (adaptive LoRA), prefix tuning 3. Training frameworks: Megatron-LM, DeepSpeed, Colossal-AI 4. Hardware: GPU clusters, TPUs, distributed training of models with hundreds of billions of parameters	1. DeepSpeed (https://www.deepspeed.ai/) 2. Megatron-LM repository (https://github.com/NVIDIA/Megatron-LM) 3. Colossal-AI (https://www.colossalai.org/) 4. Hugging Face PEFT documentation (https://huggingface.co/docs/peft/index) 5. QLoRA paper (https://arxiv.org/abs/2305.14314)	1. https://github.com/microsoft/DeepSpeed 2. https://github.com/NVIDIA/Megatron-LM 3. https://github.com/hpcaitech/ColossalAI 4. https://github.com/huggingface/peft 5. https://github.com/artidoro/qlora 6. https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling	1. Bilibili: LoRA fine-tuning in practice (https://www.bilibili.com/video/BV1yV4y1o7aK/) 2. DeepSpeed distributed training tutorial (https://www.youtube.com/watch?v=zR1lO2Z0Xz0) 3. imooc: LLM domain fine-tuning (https://www.imooc.com/course/list?c=ai&keyword=大模型微调)
	LLM applications and development	1. Core applications: dialogue systems (chatbots), code generation (e.g., with GPT-4o), text generation (fiction/reports), question answering over knowledge bases, multilingual machine translation 2. Development frameworks: LangChain (chained calls), LlamaIndex (knowledge indexing), agents (e.g., AutoGPT) 3. 2025 directions: LLM plugin development, multi-agent collaboration, personalized model customization	1. LangChain (https://www.langchain.com/) 2. LlamaIndex (https://www.llamaindex.ai/) 3. AutoGPT (https://agpt.co/) 4. Hugging Face Agents documentation (https://huggingface.co/docs/transformers/main_classes/agents) 5. OpenAI plugin development guide (https://platform.openai.com/docs/plugins/introduction)	1. https://github.com/langchain-ai/langchain 2. https://github.com/run-llama/llama_index 3. https://github.com/Significant-Gravitas/AutoGPT 4. https://github.com/huggingface/transformers/tree/main/examples/community 5. https://github.com/facebookresearch/llama-agentic-system	1. Bilibili: LangChain hands-on tutorial (https://www.bilibili.com/video/BV1n8411i7bH/) 2. LlamaIndex knowledge-base QA (https://www.youtube.com/watch?v=3yPBVii7Ct0) 3. imooc: LLM agent development (https://www.imooc.com/course/list?c=ai&keyword=大模型 Agent)
	LLM evaluation and optimization	1. Evaluation metrics: perplexity (PPL), BLEU (translation), ROUGE (summarization), MMLU (multitask understanding), MT-Bench (dialogue quality) 2. Optimization: quantization (INT4/INT8), compression (distillation, pruning), context compression and retrieval augmentation (RAG) 3. Safety and alignment: adversarial-example defense, bias mitigation, content moderation, red teaming	1. EvalAI (https://eval.ai/) 2. MT-Bench in FastChat (https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge) 3. Pinecone RAG series (https://www.pinecone.io/learn/series/rag/) 4. GPTQ model quantization tool (https://github.com/facebookresearch/GPTQ) 5. LLM safety guide (https://ai.meta.com/research/publications/llama-3-responsible-ai-usage-guide/)	1. https://github.com/lm-sys/FastChat 2. https://github.com/EleutherAI/lm-evaluation-harness 3. https://github.com/pinecone-io/examples/tree/master/learn/generation/llm-field-guide 4. https://github.com/facebookresearch/GPTQ 5. https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation	1. Bilibili: LLM evaluation metrics in practice (https://www.bilibili.com/video/BV1sH4y1o7qZ/) 2. RAG tutorial (https://www.youtube.com/watch?v=3K0ZgGQH04A) 3. imooc: LLM quantization and deployment (https://www.imooc.com/course/list?c=ai&keyword=模型量化)
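Vertical domains and multimodal stage (2-3 months)	NLP in vertical domains	1. General tasks: sentiment analysis, text classification, named entity recognition, relation extraction, event extraction 2. Industry domains: medical NLP (electronic health record analysis, medical QA), legal NLP (contract review, statute retrieval), financial NLP (public-opinion analysis, risk alerts), educational NLP (automated Q&A, essay grading) 3. Low-resource languages: NLP for less widely spoken languages, dialect identification, cross-lingual transfer learning	1. medRxiv, a medical preprint server (https://www.medrxiv.org/) 2. LexPredict legal NLP platform (https://www.lexpredict.com/) 3. Kaggle financial news sentiment dataset (https://www.kaggle.com/datasets/cnic92/financial-news-sentiment-analysis) 4. European Language Resources Association, ELRA (https://www.elra.info/) 5. XLM-RoBERTa documentation (https://huggingface.co/docs/transformers/model_doc/xlm-roberta)	1. https://github.com/medspacy/medspacy 2. https://github.com/lexpredict/lexpredict-contraxsuite 3. https://github.com/facebookresearch/fairseq/tree/main/examples/xlmr 4. https://github.com/brightmart/nlp_chinese_corpus/tree/master/datasets/finance 5. https://github.com/thunlp/OpenKE	1. Bilibili: medical NLP in practice (https://www.bilibili.com/video/BV1vH4y1o7bX/) 2. Legal NLP contract review tutorial (https://www.youtube.com/watch?v=5fRcH2R9X7c) 3. imooc: financial public-opinion analysis (https://www.imooc.com/course/list?c=ai&keyword=金融舆情)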
	Multimodal NLP	1. Core techniques: text-image (CLIP, BLIP-2), text-speech (Whisper for recognition, TTS for synthesis), text-video (Video-LLaMA) 2. Notable multimodal models: GPT-4o (unified multimodal model), Qwen-VL-Max (vision-language model), Flamingo (vision-language model, 2022) 3. Applications: image-text generation (text-to-image / image-to-text), visual question answering (VQA), speech recognition and synthesis, video captioning	1. CLIP (https://openai.com/research/clip) 2. BLIP-2 paper (https://arxiv.org/abs/2301.12597) 3. Whisper (https://openai.com/research/whisper) 4. Video-LLaMA repository (https://github.com/DAMO-NLP-SG/Video-LLaMA) 5. Hugging Face Diffusers documentation (https://huggingface.co/docs/diffusers/index)	1. https://github.com/openai/CLIP 2. https://github.com/salesforce/LAVIS/tree/main/projects/blip2 3. https://github.com/openai/whisper 4. https://github.com/DAMO-NLP-SG/Video-LLaMA 5. https://github.com/huggingface/diffusers 6. https://github.com/coqui-ai/TTS	1. Bilibili: CLIP principles and practice (https://www.bilibili.com/video/BV1wH4y1o7rZ/) 2. Whisper speech recognition tutorial (https://www.youtube.com/watch?v=4z6tPpB5z0Y) 3. imooc: multimodal large models in practice (https://www.imooc.com/course/list?c=ai&keyword=多模态大模型)
Engineering and project stage (2-3 months)	NLP projects	1. Basic projects: sentiment analysis system (BERT), NER tool (BERT-CRF), keyword extraction and summarization (TextRank/T5) 2. Advanced projects: chatbot (Llama 3 + LangChain), knowledge-base QA system (RAG), machine translation system (Transformer-based NMT) 3. Frontier projects: multimodal visual question answering (VQA), LLM agents, low-resource language translation	1. Kaggle NLP competitions (https://www.kaggle.com/categories/nlp) 2. GLUE benchmark (https://gluebenchmark.com/) 3. SQuAD question answering dataset (https://rajpurkar.github.io/SQuAD-explorer/) 4. VQA multimodal dataset (https://visualqa.org/) 5. Chinese NLP corpora (https://github.com/brightmart/nlp_chinese_corpus)	1. https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification 2. https://github.com/facebookresearch/DrQA 3. https://github.com/langchain-ai/langchain/tree/master/examples/chatbot 4. https://github.com/facebookresearch/fairseq/tree/main/examples/translation 5. https://github.com/allenai/allennlp/tree/master/allennlp/models/reading_comprehension	1. Bilibili: BERT sentiment analysis in practice (https://www.bilibili.com/video/BV1rH4y1o7mZ/) 2. LangChain chatbot tutorial (https://www.youtube.com/watch?v=2xxziIWmaSA) 3. imooc: knowledge-base QA system in practice (https://www.imooc.com/course/list?c=ai&keyword=知识库问答)
	NLP engineering skills	1. Data engineering: dataset annotation (Prodigy/doccano), data augmentation (EDA, i.e., Easy Data Augmentation / back-translation), dataset splitting and validation, building data-cleaning tools 2. Model training: distributed training (PyTorch Distributed/DeepSpeed), hyperparameter tuning (Optuna/Weights & Biases), training monitoring and logging 3. Model deployment: Docker containerization, RESTful API development, model serving (FastAPI/Flask), cloud deployment (AWS/GCP/Alibaba Cloud) 4. Operations and monitoring: model performance monitoring, version management, troubleshooting, service scaling	1. doccano (https://doccano.herokuapp.com/) 2. Prodigy (https://prodi.gy/) 3. Optuna documentation (https://optuna.readthedocs.io/) 4. Weights & Biases (https://wandb.ai/) 5. FastAPI documentation (https://fastapi.tiangolo.com/) 6. Docker (https://www.docker.com/)	1. https://github.com/doccano/doccano 2. https://github.com/optuna/optuna 3. https://github.com/wandb/client 4. https://github.com/tiangolo/fastapi 5. https://github.com/pytorch/examples/tree/master/distributed/ddp 6. https://github.com/microsoft/DeepSpeedExamples	1. Bilibili: doccano data annotation in practice (https://www.bilibili.com/video/BV1qH4y1o7xZ/) 2. FastAPI model deployment tutorial (https://www.youtube.com/watch?v=7t2alSnE2-I) 3. imooc: Docker containerized deployment (https://www.imooc.com/course/list?c=cloud&keyword=Docker%20部署)
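	Job hunting and further growth	1. Job preparation: collecting NLP engineer interview questions, writing a project-focused résumé, technical blogging (e.g., Zhihu/Juejin), open-source contributions 2. Career directions: LLM research scientist, NLP engineering architect, vertical-domain NLP specialist (medical/legal/financial), multimodal AI engineer 3. Academic research: reading top-conference papers (ACL/EMNLP/NAACL), reproducing papers, exploring research directions (e.g., low-resource NLP, LLM alignment)	1. ACL (https://www.aclweb.org/) 2. EMNLP 2024 (https://2024.emnlp.org/) 3. Zhihu NLP topic (https://www.zhihu.com/topic/19551241) 4. Juejin tech community (https://juejin.cn/topic/6848723216349775885) 5. GitHub Jobs (https://jobs.github.com/; the service was discontinued in 2021)	1. https://github.com/graykode/nlp-paper-with-code 2. https://github.com/sebastianruder/NLP-progress 3. https://github.com/huggingface/papers-with-code 4. https://github.com/allenai/allennlp 5. https://github.com/facebookresearch/fairseq	1. Bilibili: NLP interview questions explained (https://www.bilibili.com/video/BV1fH4y1o7qZ/) 2. ACL paper reading series (https://www.youtube.com/watch?v=zR1lO2Z0Xz0) 3. imooc: open-source contribution guide (https://www.imooc.com/course/list?c=opensource&keyword=开源贡献)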
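
The traditional NLP stage lists TF-IDF, cosine similarity, and the Jaccard coefficient as text-similarity tools. The sketch below is one minimal way to compute them with only the Python standard library. The whitespace tokenizer, the smoothed IDF formula, the function names, and the three sample sentences are all assumptions made for illustration; a real pipeline would more likely use a library such as scikit-learn or Gensim.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real tokenizer)."""
    return text.lower().split()

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * smoothed idf."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def jaccard(a_text, b_text):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    a, b = set(tokenize(a_text)), set(tokenize(b_text))
    return len(a & b) / len(a | b) if a | b else 0.0

docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices fell sharply today",
]
vecs = tfidf_vectors(docs)
sim_related = cosine_similarity(vecs[0], vecs[1])    # documents sharing words
sim_unrelated = cosine_similarity(vecs[0], vecs[2])  # documents sharing no words
jac = jaccard(docs[0], docs[1])

print("cosine (related):  ", round(sim_related, 3))
print("cosine (unrelated):", round(sim_unrelated, 3))
print("jaccard (related): ", round(jac, 3))
```

The unrelated pair scores zero because the two sentences share no terms, which is the lexical-gap limitation named in the same stage: two sentences with the same meaning but different words would also score zero.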
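
Self-attention, which the deep-learning stage lists as the core of the Transformer, can be illustrated with scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, as defined in the Transformer paper linked in the outline. The sketch below is a pure-Python illustration under simplifying assumptions: there are no learned projection matrices, no masking, and a single head, and the toy input `X` is made up.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Q is n x d_k, K is m x d_k, V is m x d_v (lists of lists).

    Returns (output, weights): output is n x d_v, weights is n x m.
    """
    d_k = len(K[0])
    weights, output = [], []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Weighted average of the value vectors.
        output.append([sum(wj * v[c] for wj, v in zip(w, V))
                       for c in range(len(V[0]))])
    return output, weights

# Self-attention over three 2-dimensional token vectors, with Q = K = V = X.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = scaled_dot_product_attention(X, X, X)

for row in attn:
    print([round(w, 3) for w in row])
```

Each row of `attn` is a probability distribution over the input tokens. Multi-head attention repeats this computation with several learned projections of Q, K, and V and concatenates the results.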
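
LoRA appears in the outline as a parameter-efficient fine-tuning method. Per the LoRA paper linked above, the pretrained weight W (d x k) stays frozen and only a low-rank update is trained, so the effective weight is W + (alpha / r) * B A, with B of shape d x r initialized to zero and A of shape r x k. The following sketch shows that arithmetic in plain Python. The helper names, matrix values, and sizes are hypothetical and chosen only for illustration; practical work would use a library such as Hugging Face PEFT.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def lora_effective_weight(W, A, B, alpha):
    """Return W + (alpha / r) * (B @ A), where B is d x r and A is r x k."""
    r = len(A)
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

def trainable_params(d, k, r):
    """Parameter counts for one d x k matrix: (full fine-tuning, LoRA)."""
    return d * k, r * (d + k)

d, k, r = 4, 3, 1
W = [[0.1 * (i + j) for j in range(k)] for i in range(d)]  # frozen pretrained weight
A = [[0.5, -0.5, 1.0]]                                      # r x k, trainable
B_init = [[0.0] for _ in range(d)]                          # d x r, zero at initialization
B_trained = [[1.0], [0.0], [0.0], [0.0]]                    # hypothetical values after training

W_start = lora_effective_weight(W, A, B_init, alpha=2.0)    # equals W: the update starts at zero
W_tuned = lora_effective_weight(W, A, B_trained, alpha=2.0)
full, lora = trainable_params(4096, 4096, 8)

print("first row after tuning:", W_tuned[0])
print("trainable parameters, full vs LoRA:", full, lora)
```

For a 4096 x 4096 matrix with rank 8, the update trains 65,536 parameters instead of 16,777,216, which is why the method suits limited hardware.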
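
Perplexity (PPL), the first evaluation metric in the outline, is the exponential of the average negative log-probability that a model assigns to each token of a sequence. The sketch below computes it from a list of per-token probabilities. The probability values are invented for illustration and do not come from any real model.

```python
import math

def perplexity(token_probs):
    """exp(-(1/N) * sum(log p_i)) over the probabilities of the observed tokens."""
    n = len(token_probs)
    avg_nll = -sum(math.log(p) for p in token_probs) / n
    return math.exp(avg_nll)

# A model that spreads probability uniformly over 4 choices at every step.
uniform = perplexity([0.25] * 8)
# A model that is fairly sure of each observed token.
confident = perplexity([0.9, 0.8, 0.95, 0.85])

print("uniform over 4 choices:", round(uniform, 3))
print("confident model:       ", round(confident, 3))
```

Lower is better. A perplexity of about 4 means the model is, on average, as uncertain as if it were choosing uniformly among four tokens.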