
Reading materials in Natural Language Processing and Text Mining

Natural Language Processing

Dan Jurafsky and James H. Martin, Speech and Language Processing (3rd ed. draft)

Bonan Min, Hayley Ross, Elior Sulem, Amir Pouran Ben Veyseh, Thien Huu Nguyen, Oscar Sainz, Eneko Agirre, Ilana Heintz, and Dan Roth. Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey. November 2021. arXiv.

Gilles Adda, Annelies Braffort, Ioana Vasilescu, François Yvon, and Jean-François Nominé. État de l’art des technologies linguistiques pour la langue française [State of the art of language technologies for French]. Research report, CNRS - LISN. 2022.

Usman Naseem, Imran Razzak, Shah Khalid Khan, Mukesh Prasad, A Comprehensive Survey on Word Representation Models: From Classical to State-Of-The-Art Word Representation Language Models, ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 2020

Anna Rogers, Olga Kovaleva, and Anna Rumshisky. 2020. A Primer in BERTology: What We Know About How BERT Works. Transactions of the Association for Computational Linguistics, 8:842–866.

Deep Learning

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org ; https://github.com/janishar/mit-deep-learning-book-pdf

Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola. Dive into Deep Learning. 2020. https://d2l.ai

Sebastian Ruder, Matthew E. Peters, Swabha Swayamdipta, and Thomas Wolf. 2019. Transfer Learning in Natural Language Processing. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorials, pages 15–18. Association for Computational Linguistics. https://aclanthology.org/N19-5004/

Sebastian Ruder. Posts about different aspects of transfer learning. https://ruder.io/tag/transfer-learning/index.html

Tools and instruments

Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. Model Cards for Model Reporting. In FAT* '19: Conference on Fairness, Accountability, and Transparency, January 29–31, 2019, Atlanta, GA, USA. https://arxiv.org/abs/1810.03993

Annotation tools

Mariana Neves and Jurica Ševa. An extensive review of tools for manual annotation of documents. Briefings in Bioinformatics, Volume 22, Issue 1, January 2021, Pages 146–163. Companion repository: https://github.com/mariananeves/annotation-tools

Ethics, Green, Research practice

Ana Lucic, Maurits Bleeker, Samarth Bhargav, Jessica Forde, Koustuv Sinha, Jesse Dodge, Sasha Luccioni, and Robert Stojnic. 2022. Towards Reproducible Machine Learning Research in Natural Language Processing. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts, pages 7–11, Dublin, Ireland.

See in particular the Reproducibility check-list, ACL Rolling Review, and the ACL adoption of the ACM Code of Ethics.

Analysis of results