Large Language Models
with Python

Tips for navigating the slides:
  • Press O or Escape for overview mode.
  • Visit this link for a nice printable version
  • Press the copy icon on the upper right of code blocks to copy the code

About me

Photo of Pamela smiling with an Olaf statue

Python Cloud Advocate at Microsoft

Formerly: UC Berkeley, Khan Academy, Woebot, Coursera, Google

Find me online at:

Twitter @pamelafox


A raccoon studying robotics

The history of AI

AI box with ML box inside with Deep Learning box inside with Generative AI inside
  • 1956: Artificial Intelligence:
    The field of computer science that seeks to create intelligent machines that can replicate or exceed human intelligence
  • 1997: Machine Learning:
    Subset of AI that enables machines to learn from existing data and improve upon that data to make decisions or predictions
  • 2017: Deep Learning:
    A machine learning technique in which layers of neural networks are used to process data and make decisions
  • 2021: Generative AI:
    Create new written, visual, and auditory content given prompts, often using Large Language Models or Diffusion models

Large Language Models (LLMs)

An LLM is a model that is so large that it achieves general-purpose language understanding and generation.

Diagram of sentiment classification task using input prompting
Graphs comparing model scale to accuracy on tasks

From 📖 Characterizing Emergent Phenomena in LLMs

Generative Pretrained Transformer (GPT)

Diagram of multiple attention heads on tokens in a sentence

GPT models are LLMs based on the Transformer architecture from the
📖 "Attention is all you need" paper
by Google Brain

Learn more:

You've probably used an LLM...

Screenshot of Bing Copilot answering a question about salary payment
Screenshot of GitHub Copilot answering a question about Python generators
Screenshot of ChatGPT answering a question about lunch recipes
Screenshot of GitHub Copilot completing a Python function
Screenshot of GitHub Copilot inline chat

ChatGPT, GitHub Copilot, Bing Copilot, and many other tools are powered by LLMs.

Hosted Large Language Models

Hosted LLMs can only be accessed via an API from a company that hosts the model and infrastructure for you.

Company Model Parameters
OpenAI GPT-3.5 175B
OpenAI GPT-4 Undisclosed
Google PaLM 540B
Google Gemini 1, 1.5 Undisclosed
Anthropic Claude 3 family Undisclosed

🔗 OpenAI models overview

Demo: Azure OpenAI Playground

Screenshot of Azure OpenAI Playground

Local LLMs

A local LLM can be downloaded and used by anyone, as long as they have the computational resources to run it.

Company LLM Parameters
Meta Llama 2 7b, 13b, 70b
Google Gemma 2b, 7b
Microsoft Research Phi-2 2.7b
Mistral AI Mistral 7b
Mistral AI Mixtral 8x7b
Researchers LLaVA 7b, 13b, 34b

Demo: Ollama

Ollama is a tool for easily running local LLMs on your computer.

Screenshot of Ollama running a local LLM that answers a question

Using LLMs
in Python

A raccoon conjuring Python from their laptop (like a Snake charmer)


Request access from OpenAI or Azure OpenAI.

Once you have access, you can use the API from Python or any other language.

Install the OpenAI Python library:

                pip install openai

OpenAI API authentication

For OpenAI, set your API key:

                client = openai.OpenAI(api_key="your-api-key")

For Azure OpenAI, use Azure default credentials:

                azure_credential = azure.identity.DefaultAzureCredential()
                token_provider = azure.identity.get_bearer_token_provider(azure_credential,
                    "https://cognitiveservices.azure.com/.default")

                client = openai.AzureOpenAI(
                    api_version="2024-02-15-preview",
                    azure_endpoint="https://your-service.openai.azure.com",
                    azure_ad_token_provider=token_provider)

Using OpenAI APIs with Ollama

Configure the client to point at local server:

                client = openai.OpenAI(base_url="http://localhost:11434/v1",
                                       api_key="nokeyneeded")

📖 Ollama OpenAI compatibility
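
As a quick check, the same chat completions call then runs against the local server; this sketch assumes a model such as llama2 has already been pulled with Ollama:

                # "model" is the name of the locally pulled Ollama model
                response = client.chat.completions.create(
                    model="llama2",
                    messages=[{"role": "user", "content": "Say hello!"}])
                print(response.choices[0].message.content)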

Call the Chat Completion API

Using chat completions API:

                response = client.chat.completions.create(
                    model="gpt-3.5-turbo",  # or your Azure OpenAI deployment name
                    messages=[
                        {"role": "system", "content": "You are a helpful assistant."},
                        {"role": "user", "content": "What can I do on my trip to Tokyo?"},
                    ])
                print(response.choices[0].message.content)


Full example:

Stream the response

                completion = client.chat.completions.create(
                    model="gpt-3.5-turbo",
                    messages=[
                        {"role": "system", "content": "You are a helpful assistant."},
                        {"role": "user", "content": "What can I do on my trip to Tokyo?"},
                    ],
                    stream=True)
                for event in completion:
                    if event.choices:
                        print(event.choices[0].delta.content or "", end="")

Full example:

Use asynchronous calls

Using Python async/await constructs:

                client = openai.AsyncOpenAI(api_key="your-api-key")
                response = await client.chat.completions.create(
                    model="gpt-3.5-turbo",
                    messages=[
                        {"role": "system", "content": "You are a helpful assistant."},
                        {"role": "user", "content": "What can I do on my trip to Tokyo?"},
                    ])

Learn more: 📖 Best practices for OpenAI Chat apps: Concurrency
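
The concurrency tip from that guide boils down to sending several requests at once with asyncio.gather; a minimal sketch (the ask helper and questions are illustrative, not from the full example):

                import asyncio
                import openai

                client = openai.AsyncOpenAI(api_key="your-api-key")

                async def ask(question):
                    response = await client.chat.completions.create(
                        model="gpt-3.5-turbo",
                        messages=[{"role": "user", "content": question}])
                    return response.choices[0].message.content

                async def main():
                    # Both requests run concurrently instead of one after the other
                    answers = await asyncio.gather(
                        ask("What can I do on my trip to Tokyo?"),
                        ask("What can I do on my trip to Kyoto?"))
                    print(answers)

                asyncio.run(main())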

Full example:

LLMs: Pros and Cons


Pros:
  • Creative 😊
  • Great with patterns
  • Good at syntax (natural and programming)

Cons:
  • Creative 😖
  • Makes stuff up (unknowingly)
  • Limited context window (4K-32K)

Ways to improve LLM output

  • Prompt engineering: Request a specific tone and format
  • Few-shot examples: Demonstrate desired output format (see the sketch after this list)
  • Chained calls: Get the LLM to reflect, slow down, break it down
  • Retrieval Augmented Generation (RAG): Supply just-in-time facts
  • Fine tuning: Teach LLM new facts/syntax by permanently altering weights
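
For example, few-shot prompting just means putting a sample exchange in the message list before the real question so the model imitates the format; a minimal sketch (the sample wording and model name are illustrative):

                response = client.chat.completions.create(
                    model="gpt-3.5-turbo",
                    messages=[
                        {"role": "system", "content": "You answer with a one-line summary followed by a bulleted list."},
                        # Few-shot example: demonstrate the desired output format once
                        {"role": "user", "content": "What can I do on my trip to Paris?"},
                        {"role": "assistant", "content": "Paris highlights:\n- Visit the Louvre\n- Walk along the Seine"},
                        # The real question follows the demonstration
                        {"role": "user", "content": "What can I do on my trip to Tokyo?"},
                    ])
                print(response.choices[0].message.content)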

Retrieval Augmented Generation

A raccoon that looks like Neo from Matrix movie

Retrieval Augmented Generation (RAG)

Use a retrieval system to find the best context for the generation model.

RAG diagram

Retrieval + Generation

Retrieval system (Search) ➡ Generative model (LLM)
  • Organize knowledge to fit needs of models
  • Retrieve relevant information
  • Ensure data freshness
  • Enforce access control
  • Summarize information
  • Answer questions
  • Suggest follow-up questions

Demo: OpenAI + Cognitive Search

RAG demo

RAG flow

RAG flow: User question, document search, LLM, response

RAG: Search step

Query Azure Cognitive Search using both text and vectors:

Search flow: vectors and keywords, combined with RRF algorithm, then semantic re-ranker step

                r = await search_client.search(
                        q,
                        vector_queries=[vector_query],  # vector built from the question (see sketch below)
                        top=3)

                results = [doc["sourcepage"] +
                            ": " + doc["content"]
                           async for doc in r]
                content = "\n".join(results)
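
The vector_query used above could be built by embedding the question first; a minimal sketch, assuming an async OpenAI client named openai_client, the text-embedding-ada-002 model, and an index vector field named "embedding":

                from azure.search.documents.models import VectorizedQuery

                # Embed the user question, then wrap it as a vector query for hybrid search
                embedding_response = await openai_client.embeddings.create(
                    model="text-embedding-ada-002", input=q)
                vector_query = VectorizedQuery(
                    vector=embedding_response.data[0].embedding,
                    k_nearest_neighbors=50,
                    fields="embedding")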

RAG: Search results

Use the search results to create a prompt for the LLM:

                messages = [system_prompt]
                user_content = f"{q}\nSources:\n {content}"
                messages.append({"role": "user", "content": user_content})

                chat_completion = await openai_client.chat.completions.create(
                    model="gpt-3.5-turbo",
                    messages=messages)

Responsible AI

Raccoons with laptops

Risks of LLMs

  • Ungrounded outputs and errors
  • Jailbreaks & prompt injection attacks
  • Harmful content & code
  • Copyright infringement
  • Manipulation and human-like behavior

Mitigation layers

Diagram of mitigation layers: model, safety system, metaprompt, UI

Azure AI Content Safety

A configurable system to detect safety violations:

Diagram of Azure AI Content Safety filter levels UI
  • Detects violations in prompts and responses
  • Detects jailbreak attempts
  • Detects protected material use

Handling violations in Python

Catch and handle violations in your code:

                try:
                    response = client.chat.completions.create(
                        model="gpt-3.5-turbo",  # model or Azure deployment name
                        messages=[
                            {"role": "system", "content": "You are helpful."},
                            {"role": "user", "content": "How to make a bomb?"}
                        ])
                    print(response.choices[0].message.content)
                except openai.APIError as error:
                    if error.code == "content_filter":
                        print("Please remember our code of conduct.")

Full example:

More resources

Microsoft for Startups Founders Hub

Sign up in minutes at

  • Get $150k of Azure credits to access OpenAI GPT-3.5 Turbo and GPT-4 through Azure OpenAI Service
  • Experiment with LLMs for free with $2,500 in OpenAI credits
  • Receive 1:1 advice from Microsoft AI experts
  • Free access to development and productivity tools like GitHub, Microsoft 365, LinkedIn Premium, and more

Any questions?

A bunch of raccoon students with computers