Python Cloud Advocate at Microsoft
Formerly: UC Berkeley, Khan Academy, Woebot, Coursera, Google
Find me online at:
Mastodon | @pamelafox@fosstodon.org |
@pamelafox | |
GitHub | www.github.com/pamelafox |
Website | pamelafox.org |
An LLM (Large Language Model) is a language model so large that it achieves general-purpose language understanding and generation.
GPT models are LLMs based on Transformer architecture from:
📖 "Attention is all you need" paper
by Google Brain
Learn more:
ChatGPT, GitHub Copilot, Bing Copilot, and many other tools are powered by LLMs.
A hosted LLM can only be accessed via an API from the company that hosts the model and infrastructure for you.
Company | Model | Parameters |
---|---|---|
OpenAI | GPT-3.5 | 175B |
OpenAI | GPT-4 | Undisclosed |
Google | PaLM | 540B |
Google | Gemini 1, 1.5 | Undisclosed |
Anthropic | Claude 3 family | Undisclosed |
A local LLM can be downloaded and used by anyone, as long as they have the computational resources to run it.
Company | LLM | Parameters |
---|---|---|
Meta | Llama 2 | 7b, 13b, 70b |
Google | Gemma | 2b, 7b |
Microsoft Research | Phi-2 | 2.7b |
Mistral AI | Mistral | 7b |
Mistral AI | Mixtral | 8x7b |
Researchers | LLaVA | 7b, 13b, 34b |
Ollama is a tool for easily running local LLMs on your computer.
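As a minimal sketch (assuming Ollama is already installed; the model name is an example, check the Ollama library for current tags), you can pull and run a model from the terminal:

```shell
# Download a model (example name; see ollama.com/library for available tags)
ollama pull llama2

# Chat with it interactively in the terminal
ollama run llama2

# Ollama also serves an OpenAI-compatible HTTP API on localhost:11434
ollama serve
```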
The OpenAI API is an HTTP API with endpoints for different tasks, like chat completions and embeddings.
Use with either:
Once you have access, you can use the API from Python. Get started with:
If you're not inside a dev container or Jupyter notebook, create a virtual environment:
python3 -m venv venv
source venv/bin/activate
Install the OpenAI Python library:
pip install openai
For the openai.com OpenAI API, create a client with your API key:
import openai

client = openai.OpenAI(api_key="your-api-key")
For Azure OpenAI, use Azure default credentials:
import azure.identity
import openai

azure_credential = azure.identity.DefaultAzureCredential()
token_provider = azure.identity.get_bearer_token_provider(
    azure_credential, "https://cognitiveservices.azure.com/.default")
client = openai.AzureOpenAI(
    api_version="2024-03-01-preview",
    azure_endpoint="https://your-openai-service.openai.azure.com",
    azure_ad_token_provider=token_provider,
)
Configure the client to point at the local server:
client = openai.OpenAI(
base_url="http://localhost:11434/v1",
api_key="nokeyneeded",
)
Using chat completions API:
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What can I do on my trip to Tokyo?"},
    ],
    max_tokens=400,
    temperature=1,
    top_p=0.95,
)
print(response.choices[0].message.content)
completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    stream=True,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What can I do on my trip to Tokyo?"},
    ],
)
for event in completion:
    # Deltas can be empty (e.g. the final chunk), so check before printing
    if event.choices and event.choices[0].delta.content is not None:
        print(event.choices[0].delta.content, end="")
Using Python async/await constructs:
# Requires an async client, e.g. client = openai.AsyncOpenAI(...)
response = await client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What can I do on my trip to Tokyo?"},
    ],
)
Learn more: 📖 Best practices for OpenAI Chat apps: Concurrency
Pros:
Cons:
Use a retrieval system to find the best context for the generation model.
Retrieval system (Search) ➡ Generative model (LLM)
github.com/Azure-Samples/rag-postgres-openai-python/
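The retrieve-then-generate flow can be sketched in a few lines. This is a minimal illustration, not the sample's actual code: the `search` function below is a hypothetical stand-in that returns hardcoded chunks, where a real app would query a search index (for example, PostgreSQL with pgvector, as in the sample above).

```python
# Minimal RAG sketch: retrieval is stubbed with hardcoded results.
def search(query):
    # Hypothetical retrieval step returning matching document chunks
    return [
        "Tokyo's subway lines run from early morning until around midnight.",
        "A prepaid IC card (Suica/Pasmo) works on trains, subways, and buses.",
    ]

def build_rag_messages(question):
    # Join the retrieved chunks into a sources block for the system prompt
    sources = "\n".join(search(question))
    return [
        {"role": "system",
         "content": "Answer ONLY using the sources below.\n\nSources:\n" + sources},
        {"role": "user", "content": question},
    ]

messages = build_rag_messages("How do I get around Tokyo?")
# `messages` can then be passed to client.chat.completions.create(...)
```

Grounding the system prompt in retrieved sources is what lets the model answer from your data rather than from its training set.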
Use query rewriting to improve search results:
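One common approach is to ask the LLM itself to rewrite the latest question into a standalone search query. The helper below is a hypothetical sketch of the prompt-construction step only; the resulting messages would be sent to the chat completions API, and the model's reply used as the search query.

```python
# Hypothetical query-rewriting prompt: turn chat history plus the latest
# question into a single standalone search query.
def build_rewrite_messages(history, question):
    # Flatten the conversation so the model sees the full context
    convo = "\n".join(f"{turn['role']}: {turn['content']}" for turn in history)
    return [
        {"role": "system",
         "content": ("Rewrite the user's last question as a standalone "
                     "search query, using the conversation for context. "
                     "Return only the query.")},
        {"role": "user", "content": convo + "\nuser: " + question},
    ]

history = [
    {"role": "user", "content": "Tell me about the Shinkansen."},
    {"role": "assistant", "content": "It is Japan's high-speed rail network."},
]
messages = build_rewrite_messages(history, "How fast is it?")
# Send `messages` to the chat completions API; its reply ("Shinkansen top
# speed", say) makes a far better search query than "How fast is it?" alone.
```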
Answer questions about documents (PDFs/docx/etc).
github.com/Azure-Samples/azure-search-openai-demo
A configurable system to detect safety violations:
Catch and handle violations in your code:
try:
    response = client.chat.completions.create(
        model=MODEL_NAME,
        messages=[
            {"role": "system", "content": "You are helpful."},
            {"role": "user", "content": "How to make a bomb?"},
        ],
    )
    print(response.choices[0].message.content)
except openai.APIError as error:
    if error.code == "content_filter":
        print("Please remember our code of conduct.")
Sign up in minutes at startups.microsoft.com